Role Summary
We are seeking an experienced Databricks Operations & Implementation Engineer to design, implement, and manage high-performance data pipelines and operational processes within the Databricks environment. The ideal candidate will combine deep technical expertise in Databricks, Apache Spark, and AWS Cloud with strong operational discipline, ensuring platform stability, governance, and continuous optimization.
Key Responsibilities
Implementation
- Design, build, and optimize ETL/ELT pipelines leveraging Databricks' native capabilities to process large-scale structured and unstructured datasets.
- Implement data quality frameworks and monitoring solutions using Databricks' built-in features to ensure data reliability and consistency.
- Establish governance, security, and compliance best practices across Databricks environments and integrate with enterprise systems.
Operational Management
- Monitor and maintain production data pipelines to ensure 99.9% uptime and optimal performance.
- Implement logging, alerting, and monitoring solutions using Databricks and enterprise tools.
- Perform cluster health checks, resource utilization reviews, and performance tuning to prevent bottlenecks.
- Manage incident response for Databricks pipeline failures, including root cause analysis and resolution.
- Develop and maintain disaster recovery and backup strategies for critical data assets.
- Conduct cost and performance optimization of Spark jobs and Databricks clusters.
- Implement automated testing frameworks (unit, integration, and data validation tests) for Databricks pipelines.
- Maintain detailed runbooks, operational documentation, and troubleshooting guides.
- Coordinate system upgrades and maintenance windows with minimal business disruption.
- Manage user access, workspace configuration, and security controls within Databricks.
- Oversee data lineage and metadata using Databricks Unity Catalog for transparency and compliance.
- Conduct capacity planning and cost forecasting for Databricks infrastructure and workloads.
Collaboration & Leadership
- Provide technical mentorship to team members on Databricks best practices and data engineering techniques.
- Participate in on-call rotations for production systems and ensure platform stability.
- Lead operational reviews and contribute to continuous improvement initiatives for platform reliability.
- Collaborate with infrastructure and security teams on cluster provisioning, networking, and access controls.
Requirements / Qualifications
Education & Experience
- Bachelor's degree in Computer Science, Computer Engineering, or an equivalent field.
- 8-10 years of experience in system operations, data platform management, or cloud operations.
- Hands-on project experience with the Databricks platform (primary requirement).
- Proven experience in cloud operations or architecture (AWS preferred).
- AWS Cloud Certification required; Databricks Certification highly preferred.
Core Technical Skills
- Expert proficiency in Databricks platform administration, workspace management, cluster configuration, and job orchestration.
- Deep expertise in Apache Spark (Spark SQL, DataFrames, RDDs) within Databricks.
- Strong experience with Delta Lake (ACID transactions, versioning, time travel).
- Hands-on experience with Databricks Unity Catalog for metadata management and data governance.
- Comprehensive understanding of data warehousing, data profiling, validation, and analytics concepts.
- Strong knowledge of monitoring, incident management, and cloud cost optimization.
Technology Stack Exposure
- Databricks (core platform expertise).
- AWS Cloud Services & Architecture.
- Informatica Data Management Cloud (IDMC).
- Tableau for reporting and visualization.
- Oracle Database administration.
- MLOps practices within Databricks (advantageous).
- Familiarity with Stata, Amazon SageMaker, and DataRobot integrations (nice-to-have).
If you are interested in this role and would like to discuss the opportunity further, please click Apply Now or contact Chew Kai-Xinn for more information.
Only shortlisted candidates will be contacted. If you do not receive a reply within 14 days, please accept this as notification that you have not been shortlisted.
Morgan McKinley Talent Solutions
Morgan McKinley Pte Ltd EA Licence No: 11C5502
EAP Registration No: R2196712
EAP Name: Chew Kai-Xinn
