Morningstar

Lead Machine Learning Engineer

Posted Yesterday
Hybrid
Navi Mumbai, Thane, Maharashtra
Senior level
Lead the design and operation of ML systems for data collection and processing. Mentor teams, oversee technical direction, and ensure system reliability and efficiency. Responsible for cloud deployments and integrating advanced AI technologies.
The summary above was generated by AI
Title: Lead Machine Learning Engineer
Location: Vashi, Navi Mumbai
As a Lead Machine Learning Engineer, you will be the hands-on technical owner of ML systems that power large-scale data collection, extraction, enrichment, and understanding of unstructured content. You'll design, build, and operate end-to-end solutions, from feature generation and training to low-latency inference and observability. These solutions will measurably improve coverage, freshness, quality, and unit cost across our data pipelines. Your toolbox spans classical ML, NLP, LLMs/GenAI, Agentic AI, Retrieval-Augmented Generation (RAG) frameworks, and Model Context Protocol (MCP). You will use these to deliver retrieval, extraction, classification, summarization, and autonomous tasking capabilities integrated cleanly into production workflows.
You'll own the architecture and implementation across AWS and GCP clouds, selecting managed services pragmatically and deploying resilient services via Docker and Kubernetes with CI/CD, autoscaling, canary/shadow releases, and tight SLIs/SLOs. You will institute MLOps best practices (experiment tracking, model and prompt registries, evaluation harnesses, data/feature drift detection, guardrails and policy enforcement, lineage and access controls) so teams can ship faster with confidence. Day to day, you'll write production-grade Python and SQL, apply GitHub Copilot to accelerate development responsibly, and partner with Product, Data, Platform/SRE, and Security to translate ambiguous problems into staged, observable deliveries.
You bring a curiosity to understand the domain by studying the applications, dataflow, and data schemas, and you use that context to design simpler, more accurate systems. Familiarity with public and private equity data and related entity models is a plus, as it enables smarter features, evaluation sets, and downstream integrations. As a lead IC, you mentor through design and code reviews, set technical direction, and improve reliability, security, and developer experience. You will champion cost-aware, privacy-first designs; lead deep dives to resolve complex issues; and iterate quickly to achieve measurable outcomes (precision/recall, latency, error budgets, and cost per document). This role is ideal for an engineer who thrives on shipping robust ML/LLM systems at scale and influencing cross-functional teams through exceptional technical judgment and execution.
Team Overview
You will be part of a multidisciplinary team of ML engineers and data scientists responsible for building AI and ML solutions and services as part of robust data collection pipelines that handle large volumes of unstructured data. The team focuses on building scalable, reliable systems to process and categorize the data that downstream data collection processes depend on.
Outline of Duties and Responsibilities
  • AI & ML Data Collection Leadership: Convert business goals into a clear AI/ML roadmap for data acquisition, extraction, enrichment, and measurable outcomes.
  • Technical Oversight: Architect and ship scalable ML/NLP/LLM (RAG, embeddings, reranking, Agentic AI, MCP) services with high reliability and efficiency.
  • Peer Leadership & Development: Mentor engineers and data scientists through design/code reviews, setting technical standards and elevating craftsmanship.
  • NLP Technologies: Build and integrate classifiers, transformers, LLMs, and evaluators that process and categorize unstructured data at scale.
  • Data Pipeline Engineering: Design, operate, and optimize high-throughput collection pipelines with robust orchestration, messaging, storage, and SLAs.
  • Cross-functional Collaboration: Partner with Product, Data Collection Engineering, Platform/SRE, and Security to turn ambiguous needs into phased, observable deliveries.
  • Innovation & Continuous Improvement: Pilot and productionize advances in GenAI, Agentic AI, RAG, and MCP to improve quality, speed, and cost.
  • System Integrity & Security: Enforce data governance, privacy, and model transparency with least-privilege IAM, secrets management, and auditability.
  • Process Improvement: Apply Agile/Lean/Fast-Flow practices to reduce cycle time, raise quality, and remove toil via automation.
  • Cloud & Deployment: Deliver cloud-native solutions on AWS and GCP using Docker/Kubernetes, autoscaling, and progressive delivery patterns.
  • MLOps & Reliability: Establish experiment tracking, registries, CI/CD, drift detection, SLIs/SLOs, and runbooks for dependable operations.
  • Retrieval Quality & Evaluation: Implement offline/online evals (e.g., nDCG/MRR/precision@k), golden sets, and guardrails for RAG and search relevance.
  • Cost, Performance & Observability: Optimize latency and unit cost with caching, batching, distillation, right-sizing, and clear dashboards/alerts.
  • Documentation & Knowledge Sharing: Produce concise design docs, ADRs, and playbooks to ensure durable, cross-site knowledge transfer.

Experience, Skills and Qualifications
  • Bachelor's, Master's, or PhD in Computer Science, Mathematics, Data Science, or a related field.
  • 5+ years of experience in ML engineering and data science, with a focus on LLM and GenAI technologies, particularly in data collection and unstructured data processing.
  • 1+ years of experience in a technical lead position.
  • Strong expertise in NLP and machine learning, with hands-on experience in classifiers, large language models (LLMs), Model Context Protocol (MCP), Agentic AI, and other advanced NLP techniques.
  • Extensive experience with data pipeline and messaging technologies such as Apache Kafka, Airflow, and cloud data platforms (e.g., Snowflake).
  • Expert-level proficiency in Python, SQL, and other relevant programming languages and tools.
  • Proficiency in Amazon Web Services (AWS) and Google Cloud Platform (GCP).
  • Strong understanding of cloud-native technologies and containerization (e.g., Kubernetes, Docker) with experience in managing these systems globally.
  • Demonstrated ability to solve complex technical challenges and deliver scalable solutions.
  • Excellent communication skills with a collaborative approach to working with global teams and stakeholders.
  • Experience working in fast-paced environments, particularly in industries that rely on data-intensive technologies (experience in fintech is highly desirable).

Working Conditions
This position is based in a standard office setting. Employees in this position use a PC and phone on an ongoing basis throughout the day. Limited corporate travel may be required to remote offices or other business meetings and events.
Morningstar's hybrid work environment gives you the opportunity to collaborate in-person each week as we've found that we're at our best when we're purposely together on a regular basis. In most of our locations, our hybrid work model is four days in-office each week. A range of other benefits are also available to enhance flexibility as needs change. No matter where you are, you'll have tools and resources to engage meaningfully with your global colleagues.
Legal Entity: Morningstar India Private Ltd. (Delhi)

Top Skills

Airflow
Apache Kafka
AWS
Docker
GCP
Kubernetes
Python
SQL
