Weekday, Inc.

Lead - Data & ML Platform Engineering

Posted 8 Days Ago
In-Office
Mumbai, Maharashtra
Expert/Leader
Architect, build, and operate a Databricks-based Lakehouse and ML platform across four pillars: Data Platform, ML Platform & MLOps, Platform Operations & FinOps, and Data Governance & Quality. Deliver sub-second inference, industrialize ML lifecycles with MLflow and Mosaic AI, implement governance-as-code, run FinOps for DBU cost allocation, and ensure platform reliability for retail-scale traffic and thousands of developers.
The summary above was generated by AI

This role is for one of Weekday's clients

Min Experience: 10+ years

Location: Bengaluru, Mumbai

Job Type: Full-time

Focus Areas: (i) Data Platform Engineering, (ii) ML Platform & MLOps, (iii) Platform Operations & FinOps, (iv) Data Governance & Quality

Experience: 14–20 years total | 8–12 years in Data/ML Platform Engineering

Core Platform: Databricks Data Intelligence Platform (Unity Catalog, Delta Lake, MLflow, Mosaic AI)

The Context

We are currently developing the “v2.0” intelligence layer atop the Lakehouse. The goals are to standardize MLOps, expand Agentic AI capabilities, and guarantee that the platform delivers sub-second latency across the entire retail network, which includes tens of thousands of stores and high-traffic digital channels.

The Data & ML Platforms group (Group A in Enterprise IT) serves as the driving force behind this transformation. It is led by a VP (L2) and organized into four AVP-led pillars, supported by 10 AI-ready Platform Engineers and a transitioning team of Data Engineers. Each AVP is responsible for a specific platform layer and functions as a builder-leader—expected not only to manage but also to architect, perform code reviews, and actively contribute to development alongside their team.

The Four Pillars

We are seeking to hire four AVPs, each heading one of the platform pillars. While each AVP has full ownership of their respective pillar, all four collaborate closely as a unified leadership team under the VP. Candidates may be evaluated for placement in any pillar depending on their strengths and fit.


Requirements

(i) Data Platform Engineering

Mission: Take full ownership of the core Lakehouse infrastructure, encompassing storage, compute, and developer platform layers that support all other operations.

  • Design and maintain the Delta Lake storage layer, Photon compute engine, and Unity Catalog abstraction, serving over 1,000 developers across various retail sectors.
  • Implement advanced optimization techniques including query plan tuning, cluster auto-scaling policies, Z-ordering strategies, and partitioning schemes for datasets with trillions of rows.
  • Manage the internal developer platform by developing SDKs, CLI tools, templates, and enabling self-service onboarding to accelerate new teams' time-to-first-query.
  • Lead the technical cleanup of Phase-1 migration challenges, including schema standardization, pipeline consolidation, and deduplication of source of record (SOR) systems across hundreds of sources.
  • Oversee the Data Engineer transition cohort within this pillar, establishing engineering standards, enforcing code review processes, and defining career progression paths.

(ii) ML Platform & MLOps

Mission: Industrialize machine learning by building infrastructure that efficiently moves models from experimentation notebooks to production at retail scale.

  • Develop and maintain the end-to-end ML lifecycle leveraging MLflow, including experiment tracking, model registry, automated retraining, A/B testing, and canary deployments.
  • Design the real-time inference architecture to deliver model serving with sub-100ms latency across recommendation, pricing, and demand forecasting applications.
  • Construct the Agentic AI infrastructure comprising RAG pipelines, vector stores, fine-tuning workflows for Foundation Models (utilizing Mosaic AI), and agent orchestration frameworks.
  • Establish governance for the Feature Store by standardizing feature definitions, enforcing freshness SLAs, lineage tracking, and promoting feature reuse across retail divisions.
  • Ensure reliability of the ML platform through GPU/TPU cluster management, training job scheduling, cost attribution per model, and managing incident response for production model degradations.

(iii) Platform Operations & FinOps

Mission: Maintain platform stability, performance, and cost-efficiency, especially during critical periods.

  • Ensure 99.99% platform uptime, providing leadership during critical events such as festive sales, store openings, and retail peak periods.
  • Establish and run the FinOps practice focusing on DBU cost allocation by team and workload, implementing chargeback models, automating resource right-sizing, and delivering executive cost dashboards.
  • Design and manage monitoring and observability systems covering pipeline health, query performance, cluster utilization, and data freshness SLAs across all six value streams.
  • Lead capacity planning by forecasting compute and storage demand in line with retail seasonality (festive cycles, new store launches, category introductions) and provisioning resources ahead of that demand.
  • Oversee incident management, develop runbooks, and conduct post-mortem evaluations for the Databricks platform, ensuring targets for mean time to recovery are met and continually improved.

(iv) Data Governance & Quality

Mission: Serve as the technical steward for India’s largest consumer dataset, ensuring its trustworthiness, compliance, and discoverability.

  • Develop “Governance-as-Code” frameworks on Unity Catalog, incorporating automated access controls, data classification, PII masking, and audit trails to comply with DPDP Act requirements.
  • Design and implement a data quality framework that includes automated profiling, anomaly detection, schema enforcement, and freshness monitoring across thousands of datasets.
  • Manage the data catalog and discovery platform, providing metadata management, lineage visualization, business glossary, and search tools to support over 1,000 users.
  • Build consent management infrastructure to monitor, enforce, and audit user consent signals throughout the comprehensive “Phygital” retail ecosystem (online and offline).
  • Drive enterprise-wide data standards by defining naming conventions, rules for SOR deduplication, master data alignment, and data contract enforcement between producing and consuming teams.

Minimum Qualifications (All Pillars)

  • 14 to 20 years of professional experience in software engineering, data engineering, or ML infrastructure, including a minimum of 3 years leading a platform team of 5 or more engineers.
  • 8 to 12 years of hands-on experience in building and scaling data or ML platforms such as Lakehouse architectures, Feature Stores, Streaming Engines, or MLOps pipelines.
  • Strong technical expertise within the Databricks ecosystem or similar distributed data platforms (e.g., Spark, Presto/Trino, Flink, or Kafka at scale), with a strong preference for Databricks experience.
  • Proven “builder-leader” approach: actively involved in code review, production debugging, and architectural decision-making without fully delegating technical responsibilities.
  • Experience operating within large and complex technology organizations featuring inherited teams, cross-functional dependencies, and enterprise-grade compliance requirements.
  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related discipline, or equivalent expertise acquired through industry experience and open-source contributions.

Preferred Qualifications

  • Previous experience managing India-scale data platforms handling billions of events per day, petabyte-scale data warehouses, or real-time serving at over 10,000 queries per second.
  • Hands-on experience running MLflow, Mosaic AI, or similar ML infrastructure platforms in production, not only in experimentation.
  • Familiarity with retail or e-commerce data domains such as product catalogs, inventory management, order processing, customer behavior signals, or supply chain datasets.
  • Demonstrated success in building internal tooling or developer platforms that have gained widespread organic adoption within large engineering organizations.
  • Experience with FinOps practices including DBU/compute cost attribution, chargeback modeling, and enterprise-scale cloud cost optimization.
  • Knowledge of Indian data privacy regulations (DPDP Act) or global frameworks (GDPR, CCPA) in the context of data platform governance.

Organisation Context

This position reports directly to the VP & Head of Data & ML Platforms, who in turn reports to the Head of Enterprise IT, and ultimately to the CEO. You will collaborate as a peer with three other AVPs within the Data & ML Platforms group and work closely with more than 10 AI-ready Platform Engineers at Architect and Principal levels, alongside the transitioning Data & Platforms Engineers cohort.

The broader Enterprise IT division comprises five additional L2 groups: CISO/Cybersecurity, HR/Finance/Legal Platforms, SAP-Core, Systems & AI Architects, and CIO + Cloud & Infrastructure.

Must-have skills

Data & ML Platform, Databricks, Platform Architecture

Good-to-have skills

MLOps, System Architecture, Retail
