dunnhumby

Lead Engineer

Posted 2 Days Ago

Gurgaon, Gurugram, Haryana

Senior level

As Lead Engineer at dunnhumby, you'll lead a Site Reliability Engineering team to ensure the reliability and performance of cloud services, implement infrastructure automation, manage critical systems, and mentor junior engineers while fostering collaboration and continuous improvement.

The summary above was generated by AI

dunnhumby is the global leader in Customer Data Science, empowering businesses everywhere to compete and thrive in the modern data-driven economy. We always put the Customer First.

Our mission: to enable businesses to grow and reimagine themselves by becoming advocates and champions for their Customers. With deep heritage and expertise in retail – one of the world’s most competitive markets, with a deluge of multi-dimensional data – dunnhumby today enables businesses all over the world, across industries, to be Customer First.

dunnhumby employs nearly 2,500 experts in offices throughout Europe, Asia, Africa, and the Americas working for transformative, iconic brands such as Tesco, Coca-Cola, Meijer, Procter & Gamble and Metro.

The Cloud Site Reliability Engineer ensures that dunnhumby’s cloud-hosted services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to users' needs and a fast rate of improvement, while keeping an ever-watchful eye on capacity and performance.

This is a unique opportunity to help transform how we deliver dunnhumby's cutting-edge customer science and machine learning research and leverage our unique access to big data. At dunnhumby we are passionate about cloud and open-source technologies and are committed to a long-term, sustained investment in people aligned to our goals.

Key Accountabilities

Lead and mentor a team of Site Reliability Engineers, fostering a culture of collaboration, learning, and continuous improvement.
Maintain and support infrastructure services across development, integration, and production environments.
Design, implement, and manage robust, scalable, and high-performance systems and infrastructure.
Ensure the reliability, availability, and performance of critical services through proactive monitoring, incident response, and root cause analysis.
Drive the adoption of automation, CI/CD practices, and infrastructure as code (IaC) to streamline operations and improve operational efficiency.
Collaborate with development teams to ensure that applications are designed for scalability, reliability, and fault tolerance.
Define and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) to monitor and improve service health.
Lead incident management, troubleshooting, and postmortems to identify and address operational challenges.
Manage capacity planning, scaling strategies, and disaster recovery for cloud-based environments (GCP, Azure).
Drive improvements in operational tooling, monitoring, alerting, and reporting.
Act as a subject matter expert in reliability engineering best practices and promote these practices across the organization.
Contribute to creating and improving processes for release management, change management, and configuration management.
Participate in on-call rotations and respond to production incidents as necessary.
Review services before they go live in production
Enforce rigor on incident response and post-mortems
Design proactive monitoring and metrics for supported environments
Focus on automation to improve scale and reliability
Identify and propose alternative technologies to create scalable implementations and achieve results
Coordinate and troubleshoot complex technical issues until resolution
Identify and prioritize what technical debt will be eliminated
Identify opportunities to influence the roadmap of infrastructure services

Qualifications

8+ years of experience in an engineering role with hands-on experience in the public cloud; Google Cloud Platform (GCP) is preferred but not required, along with exposure to another public cloud provider, ideally Azure.
Strong experience in designing and managing large-scale, distributed systems.
Expertise in cloud technologies (GCP, Azure) and infrastructure automation tools (Terraform, Ansible, Puppet, etc.).
Proficiency in containerization and orchestration technologies such as Docker, Kubernetes, and Helm.
Experience with monitoring and observability tools like Prometheus, Grafana, New Relic, or similar.
Strong knowledge of CI/CD pipelines and related automation tools.
Proficient in scripting languages like Python, Bash, or Go.
Strong troubleshooting and problem-solving skills in production environments.
Experience leading and mentoring engineering teams, with a strong focus on collaboration and communication.
Familiarity with incident management processes and tools (e.g., ServiceNow, xMatters).
Experience with infrastructure as code (IaC) and version control systems (Git).
Knowledge of scripting in Python/Bash
Knowledge of Go programming language
Knowledge of Ansible & Terraform for writing most of the infrastructure automation
Experience with Kubernetes
Understanding of metrics collectors such as Graphite or Prometheus
Experience with DevOps tools
Ability to learn and adapt in a fast-paced environment, while producing quality code
Ability to work collaboratively on a cross-functional team with a wide range of experience levels
Ability to analyse existing services and identify technical debt to work toward increasing sustainability
Ability to find creative ways to execute even when there is no historical context or known path forward
Ability to design roadmaps and relevant solutions for end-users to access interfaces
Ability to assess the benefits, risks and success factors of potential applications
Strong mentoring and coaching skills that encourage growth for more junior members

What you can expect from us

We won’t just meet your expectations. We’ll defy them. So you’ll enjoy the comprehensive rewards package you’d expect from a leading technology company. But also, a degree of personal flexibility you might not expect. Plus, thoughtful perks, like flexible working hours and your birthday off.

You’ll also benefit from an investment in cutting-edge technology that reflects our global ambition. But with a nimble, small-business feel that gives you the freedom to play, experiment and learn.

And we don’t just talk about diversity and inclusion. We live it every day – with thriving networks including dh Gender Equality Network, dh Proud, dh Family, dh One and dh Thrive as the living proof. We want everyone to have the opportunity to shine and perform at their best throughout our recruitment process. Please let us know how we can make this process work best for you. For an informal and confidential chat, please contact [email protected] to discuss how we can meet your needs.

Our approach to Flexible Working

At dunnhumby, we value and respect difference and are committed to building an inclusive culture by creating an environment where you can balance a successful career with your commitments and interests outside of work.

We believe that you will do your best at work if you have a work-life balance. Some roles lend themselves to flexible options more than others, so if this is important to you, please raise it with your recruiter, as we are open to discussing agile working opportunities during the hiring process.

For further information about how we collect and use your personal information, please see our Privacy Notice.

Top Skills

Bash

Python
