Data Engineer
Our internal team is looking for a non-traditional Data Engineer (DE), one who blends data engineering with data science (DS). This person will develop and optimize data architectures that support business intelligence, predictive analytics, and AI/ML applications. The role involves designing and structuring data pipelines that connect SAP, Databricks, and Microsoft CRM, enabling trend analysis, forecasting models, and AI-driven insights. The focus extends beyond ETL pipelines to building a feature store rather than traditional data models.
This role requires a deep understanding of how data should be organized to support efficient reporting, historical trend analysis, and advanced analytics.
Given our global operations, fluency in both English and Chinese is highly beneficial.
Duration: 9+ months contract
Pay Range: $60/hr to $85/hr (based on experience level)
Responsibilities:
- Develop and implement scalable data models that link SAP, Databricks, and Microsoft CRM, ensuring they support business intelligence, forecasting, and AI-driven analytics.
- Build robust data pipelines to extract, process, and structure information from SAP HANA, SAP BW, OData, and Microsoft CRM, ensuring accuracy and usability for analytical tools.
- Design data schemas that enhance historical trend analysis, predictive modeling, and performance monitoring, rather than simply storing raw transactional data.
- Construct data warehouses and structured datasets that allow for efficient querying and insightful analysis, reducing the need for complex transformations downstream.
- Ensure data processing frameworks can accommodate both real-time updates and scheduled batch processing, supporting diverse analytical needs.
- Work closely with stakeholders across business functions to align data structures with operational goals, ensuring usability and relevance.
- Automate data ingestion and transformation using Python, SQL, and cloud-based technologies, streamlining data workflows.
- Implement data governance policies, ensuring compliance with security protocols, access management, and audit logging.
- Maintain and troubleshoot data pipelines, minimizing downtime and ensuring smooth data availability for reporting and analytics teams.
- Develop metadata and lineage tracking strategies, improving data transparency and usability across the organization.
Candidate Profile:
1. Strong Data Engineering (DE) Background
- Experience in SQL, Python, ETL, and data modeling
- Hands-on with Databricks including Delta Lake storage, efficient partitioning strategies, and query performance tuning
- Familiar with Apache Airflow for orchestration
- Experience with dbt (nice to have)
2. Experience in Feature Engineering & Data Science (DS)
- Some exposure to data modeling and ML feature engineering
- Experience with feature store development to support predictive modeling is a plus
- Proficiency in Python for data manipulation & ML
- Familiarity with ML libraries such as scikit-learn
- Experience with ML platforms
- Prior exposure to predictive modeling/propensity modeling is a plus
3. Business Acumen & Predictive Feature Design
- Strong understanding of SAP and CRM data
- Ability to identify features with predictive power
- Experience in creating business-relevant features, e.g., RFM (Recency, Frequency, Monetary) modeling
- Familiarity with Microsoft CRM (preferably Dynamics) and its underlying data structures for business analysis
- 3-6 years of experience in data engineering, with a strong background in data modeling and integration
- Proficiency in Python and SQL for data transformation, pipeline automation, and performance tuning
- Deep knowledge of Databricks
- Expertise in SAP data structures, including SAP HANA, SAP BW, OData, BAPI, and IDocs
- Hands-on experience designing optimized data architectures that support trend analysis, forecasting models, and AI-powered insights.
- Experience structuring data to facilitate business intelligence reporting, advanced forecasting, and predictive modeling.
- Ability to manage large-scale data pipelines, optimizing them for performance and scalability.
- Strong understanding of workflow orchestration tools (e.g., Apache Airflow, Prefect, or Azure Data Factory) to automate and schedule data tasks
- Prior experience implementing security controls, access management, and governance frameworks to maintain data integrity
- Bilingual proficiency in Chinese and English is highly preferred
- Business-oriented mindset, with the ability to align data structures with operational and strategic goals.
- Experience in forecasting, predictive analytics, or AI/ML model deployment, and familiarity with automated ML pipelines
- Familiarity with CI/CD pipelines for managing and deploying data infrastructure
- MLOps, DevOps, and cloud services skills
- Exposure to cloud-based data lake architectures, optimizing storage for cost and performance
- Knowledge of metadata-driven data engineering, improving discoverability and tracking across datasets
- Advanced education (MBA, MA, or Ph.D.) is a plus, as we welcome professionals with strong analytical backgrounds
**No C2C profiles are accepted**
Thank you!
FocusKPI Hiring Team
Founded in 2010, FocusKPI, Inc. (FocusKPI) is a data science and technology firm specializing in predictive analytics practice and methodologies. FocusKPI is a US company headquartered in Silicon Valley, California, with an East Coast office in Boston, Massachusetts.
NOTICE: Please be aware of fraudulent emails regarding job postings, job offers, and fake checks. FocusKPI's recruiting team will reach out only via the @focuskpi.com email domain. If you have received fraudulent emails now or in the past, please report them to https://reportfraud.ftc.gov/ .
The domain @focuskpijobs.com is fraudulent and not related to FocusKPI. Please do not reply to or communicate with anyone using @focuskpijobs.com.