We are seeking a Lead Data Engineer to join our talented team, helping financial institutions fight money laundering and fraud by building the resilient, governed, high-quality data platforms that power advanced analytics and financial crime detection. You will design and evolve our Databricks + AWS lakehouse, enabling investigators and product teams to derive insight into criminal behaviors and act decisively.
As part of the development team, your role will focus on designing data solutions that leverage Databricks and Snowflake as the platform.
Role
* Own end-to-end design, build, optimisation, and support of scalable Spark / PySpark data pipelines on Databricks (batch and streaming).
* Define and enforce lakehouse & Medallion architecture standards (bronze / silver / gold), schema governance, lineage, quality SLAs, and cost controls.
* Leverage data ingestion frameworks (Apache NiFi, SFTP/FTPS, API) to onboard diverse internal and external datasets securely and repeatably.
* Architect secure, compliant AWS data infrastructure (S3, IAM, KMS, Glue, Lake Formation, EC2/EKS, Lambda, Step Functions, CloudWatch, Secrets Manager).
* Implement orchestration using Airflow, Databricks Workflows, and Step Functions; standardise DAG patterns (idempotency, retries, observability).
* Champion data quality (expectations, anomaly detection, reconciliation, contract tests) and reliability (SLIs/SLOs, alerting, run books).
* Embed lineage & metadata (Unity Catalog / Glue / OpenLineage) to support audit, impact analysis, and regulatory transparency.
* Drive CI/CD for data assets (infra as code, notebook/test automation, artifact versioning, semantic tagging, environment promotion).
* Mentor engineers on distributed data performance, partitioning, file layout (Delta Lake optimization), caching, and cost/perf trade‑offs.
* Collaborate with data science, product, and compliance to translate analytical / detection needs into robust data models and serving layers.
* Review code (PySpark, SQL, infra templates) for efficiency, readability, and consistency; lead technical design reviews and trade‑off decisions.
* Establish secure secrets management, key rotation, data masking/tokenisation, and row/column-level access controls.
* Drive continuous improvement: backlog triage, sizing, delivery tracking, and stakeholder demos.
* Support incident response (root cause, postmortems, preventative engineering).
All About You
* Expert in SQL development with hands-on experience in Databricks, Snowflake, Python, and PySpark for designing and implementing advanced data engineering solutions.
* Skilled in architecting and developing scalable, reusable data models, pipelines, and frameworks leveraging Hadoop, Apache NiFi, and modern Cloud Data Lake architectures.
* Hands-on with orchestration: Airflow (DAG design, sensors, task groups), Databricks Workflows, Step Functions.
* Strong AWS data ecosystem expertise: S3 layout strategies, IAM least privilege, Glue catalog, Lake Formation, networking (VPC, endpoints), encryption.
* Proficient with CI/CD (Git branching, PR workflows, automated deployment to Databricks & AWS, infrastructure as code with Terraform/CloudFormation).
* Familiar with governance & lineage tooling (Unity Catalog, OpenLineage, Atlas, or equivalent) and audit/compliance needs (PII/PCI handling, retention).
* Proven production experience building and optimising large-scale Spark / PySpark pipelines on Databricks (jobs, clusters, Delta Lake, Photon).
* Bonus: Python packaging for shared libraries, OpenTelemetry for data pipeline observability, and financial crime domain exposure.
* Proven experience collaborating with stakeholders and cross-functional teams to understand business requirements and deliver reliable, high-impact cloud data solutions across Azure, AWS, and cloud data warehouse platforms.
* Strong leadership and communication skills, with the ability to guide and mentor data engineering teams, ensuring effective collaboration between technical and non-technical stakeholders.
* Adept at cost optimisation (storage tiering, workload right-sizing, spot usage, cache strategies).
* Comfortable leading design sessions, mentoring engineers, and engaging varied stakeholders (data science, security, compliance, product).
* Pragmatic mindset: balance robustness with delivery speed; automate where repeatable; document critical paths.
* Incident management experience (on-call readiness, observability dashboards, MTTR reduction).