A.P. Moller - Maersk

Software Engineer - SRE

Posted 10 Days Ago
In-Office
2 Locations
Mid level
The Software Engineer will enhance SRE capabilities using AI/ML, improve reliability and operational efficiency, and collaborate with various teams.
The summary above was generated by AI

About the Role

We are looking for a highly skilled Software Engineer with strong AI/ML expertise and a foundational understanding of SRE principles to help transform reliability engineering through intelligent, automation-driven solutions.

This role is not just about applying AI; it’s about bringing an engineering mindset and AI capabilities to reliability problems. You should be comfortable writing clean, maintainable code and have an understanding of SRE principles such as observability, incident response, and automation. By combining software skills with practical knowledge of operational challenges, you'll help eliminate toil, drive proactive reliability improvements, and embed intelligence into day-to-day engineering workflows.

Your work will directly contribute to unifying reliability efforts across teams, enabling consistent engineering standards, and fostering a shared accountability model for service health. By driving operational discipline and aligning reliability goals with business priorities, you will help create a culture where platform stability, developer productivity, and customer experience go hand in hand. These contributions will play a vital role in supporting the organization's broader strategy: enabling faster innovation, scalable growth, and a resilient technology foundation aligned with long-term business outcomes.

Key Responsibilities

· Support initiatives to enhance SRE capabilities using AI/ML, ensuring strong foundations in reliability engineering and operational excellence.

· Leverage AI and machine learning technologies to architect and implement solutions that advance the overall SRE agenda—improving reliability, automation, observability, and operational efficiency across complex systems.

· Contribute to incident management, change management, and release processes—bringing structure, automation, and intelligent insights to drive stability, safety, and velocity.

· Participate in and drive key SRE practices and routines, including initiating and facilitating an SRE Community of Practice (CoP), aligning SLAs/SLOs, launching error budget governance, and enabling data-driven process improvements across reliability areas.

· Partner effectively with SREs, platform engineers, and data teams to develop production-grade, measurable, and reliable models and tools.

· Develop and maintain internal frameworks and tooling to accelerate AI/ML adoption across reliability use cases.

· Partner with teams to understand and help drive Zero-Touch Operations by enabling platforms to detect, analyze, and resolve issues autonomously.

· Utilize metrics, logs, and historical incident data to build actionable insights and reliability dashboards.

· Actively participate in on-call rotations, improving incident response processes and escalation management.

· Integrate security best practices into workflows and collaborate with security teams to ensure platform stability.

· Contribute significantly to shaping the AI-in-SRE strategy and mentor junior team members.

Required Skills & Qualifications

· 3–5 years of experience as a software engineer or platform engineer, with a strong focus on building production-grade systems, developer tooling, or intelligent automation.

· LLM-Native Development Approach: Proficiency in leveraging LLM-powered tools (e.g., for research, code generation, or automation). Demonstrated experience building AI-assisted workflows or custom automations that enhance engineering efficiency, reduce manual effort, or accelerate operational tasks.

· Proficient in Python, Go, or equivalent, with strong software engineering fundamentals—testing, version control, CI/CD, and clean code practices.

· Understanding of core SRE principles (SLIs/SLOs, incident response, error budgets), with the ability to partner with SREs to productionize reliability tooling.

· Hands-on experience with cloud platforms (AWS, GCP, Azure), containers/orchestration (Docker, Kubernetes), and infrastructure-as-code patterns.

· Familiarity with observability and telemetry systems—building or integrating with tools like Prometheus, OpenTelemetry, or Elastic stack.

· Comfortable working with Linux-based systems, debugging performance issues, and understanding systems-level behavior.

· Ability to translate operational pain points into intelligent, automated solutions using software, AI, and data-driven techniques.

Preferred Qualifications

· Advanced SRE Practice Exposure: Familiarity with operating in mature SRE environments, such as participating in production readiness reviews, chaos engineering exercises, capacity planning, error budget governance, and operational health reviews.

· Exposure to building AI-assisted tools using LLMs, vector databases, or prompt engineering techniques to streamline engineering or operational workflows would be a big plus.

Maersk is committed to a diverse and inclusive workplace, and we embrace different styles of thinking. Maersk is an equal opportunities employer and welcomes applicants without regard to race, colour, gender, sex, age, religion, creed, national origin, ancestry, citizenship, marital status, sexual orientation, physical or mental disability, medical condition, pregnancy or parental leave, veteran status, gender identity, genetic information, or any other characteristic protected by applicable law. We will consider qualified applicants with criminal histories in a manner consistent with all legal requirements.

We are happy to support your need for any adjustments during the application and hiring process. If you need special assistance or an accommodation to use our website, apply for a position, or to perform a job, please contact us by emailing [email protected]

Top Skills

AI
AWS
Azure
Docker
Elastic Stack
GCP
Go
Kubernetes
ML
OpenTelemetry
Prometheus
Python

Similar Jobs

5 Days Ago
Hybrid
Bengaluru, Karnataka, IND
Junior
Financial Services
As a Software Engineer II, you'll ensure system reliability and performance, automate processes, and collaborate on infrastructure maintenance for applications.
Top Skills: AWS, Azure, Bash, Docker, ELK Stack, GitLab CI, GCP, Grafana, Jenkins, Kubernetes, MySQL, NoSQL, Postgres, Prometheus, Python, Terraform
25 Days Ago
Hybrid
Bengaluru, Karnataka, IND
Mid level
Financial Services
The Software Engineer III role focuses on solving business problems with innovative solutions. Responsibilities include infrastructure management, collaborating with teams, and implementing design best practices.
Top Skills: .NET, AWS, Datadog, Docker, Dynatrace, ECS, GitLab, Grafana, Java/Spring Boot, Jenkins, Kubernetes, Prometheus, Python, Splunk, SRE, Terraform
9 Days Ago
In-Office
2 Locations
Senior level
Logistics • Transportation
The Senior SRE will enhance reliability engineering through AI, oversee incident management, improve automation, and drive best practices across teams.
Top Skills: AWS, Azure, Docker, GCP, Go, Kubernetes, Python

What you need to know about the Pune Tech Scene

Once a far-out concept, AI is now a tangible force reshaping industries and economies worldwide. While its adoption will automate some roles, AI has created more jobs than it has displaced, with an expected 97 million new roles to be created in the coming years. This is especially true in cities like Pune, which is emerging as a hub for companies eager to leverage this technology to develop solutions that simplify and improve lives in sectors such as education, healthcare, finance, e-commerce and more.
