A.P. Moller - Maersk

Senior Software Engineer - SRE

Posted 2 Days Ago
In-Office
2 Locations
Senior level

We are looking for a highly skilled and experienced Site Reliability Engineer (SRE) who will play a key role in transforming reliability engineering through AI-based innovation—while bringing deep expertise in core SRE practices.
 

This role is not just about applying AI; it’s about being a hands-on SRE first: someone who understands real-world operational pain points and knows how to drive systemic improvements through automation, observability, and intelligent tooling. You will help institutionalize SRE best practices and routines and embed intelligence-driven operations into the engineering culture.
Your work will directly contribute to unifying reliability efforts across teams, enabling consistent engineering standards, and fostering a shared accountability model for service health. By driving operational discipline and aligning reliability goals with business priorities, you will help create a culture where platform stability, developer productivity, and customer experience go hand in hand. These contributions will support the organization's broader strategy: faster innovation, scalable growth, and a resilient technology foundation aligned with long-term business outcomes.

Key Responsibilities

· Drive strategic initiatives to transform SRE capabilities through AI/ML innovation—while setting the vision for reliability engineering and operational excellence.

· Leverage AI and machine learning technologies to architect and oversee solutions that advance the overall SRE agenda—improving reliability, automation, observability, and operational efficiency across complex systems.

· Own, govern, and continuously improve incident management, change management, and release processes to ensure the highest levels of stability, safety, and velocity.

· Lead and champion key SRE practices and routines, driving organization-wide adoption of the SRE Community of Practice (CoP), SLA/SLO alignment, error budget governance, and data-driven process optimization.

· Guide and influence cross-functional teams, including SREs and platform engineers, to develop reliable, scalable AI/ML tools and frameworks.

· Oversee engineering strategies that improve service reliability, availability, and performance at scale.

· Define, build, and evangelize internal frameworks and tooling to accelerate AI/ML adoption across all reliability domains.

· Lead the Zero-Touch Operations initiatives and roadmap, enabling platforms to detect and resolve issues autonomously.

· Leverage advanced metrics, telemetry, and incident data analytics to inform strategic decisions and build enterprise-grade reliability dashboards.

· Own on-call strategy, escalation policies, and incident response governance across teams.

· Drive security integration across all reliability workflows, leading vulnerability management, compliance, and collaboration with security leadership.

· Shape and own the AI-in-SRE strategic vision—serving as a thought leader and mentor to the entire SRE organization.

Required Skills & Qualifications

· Extensive experience (5+ years) as a senior SRE, Platform Engineer, or DevOps Engineer responsible for large-scale, complex distributed systems, with a strong understanding of AI/ML fundamentals and hands-on experience applying AI-powered tools.

· Automation-First Mindset: Demonstrated ability to drive end-to-end automation across incident response, change/release workflows, observability, and daily operations. Strong “never do it twice manually” attitude with a proven track record of eliminating toil through intelligent tooling, scripting, and systematic process optimization.

· Expert-level programming and scripting skills (Python, Go, or similar) with experience designing automation at scale.

· AI-Accelerated Mindset: You actively leverage modern AI tools (e.g., LLMs) to boost productivity, streamline development workflows, and augment traditional engineering tasks—demonstrating a willingness to adapt and innovate with evolving technologies.

· Mastery of core SRE principles including SLIs/SLOs, incident management governance, root cause analysis, scalability, fault tolerance, and capacity planning.

· Proven leadership in incident, change, and release management, driving automation, auditability, and continuous improvements in service reliability.

· Strategic ability to establish and evangelize reliability frameworks, rituals, and operational excellence aligned with enterprise-wide goals.

· Deep expertise in cloud architectures (AWS, GCP, Azure), container ecosystems (Docker), and orchestration platforms (Kubernetes).

· Advanced knowledge of observability systems and the ability to architect enterprise-grade monitoring and alerting solutions.

· In-depth understanding of Linux/Unix internals, performance optimization, and complex OS-level production troubleshooting.

· Strong grasp of networking, security best practices, vulnerability management, and compliance requirements.

· Experience influencing cross-team collaboration and mentoring junior engineers in SRE practices.


Preferred Skills

· LLM-Native Development Approach: Proficiency in using LLM-powered tools for research, automation, or code generation. Experience building custom AI-assisted automations or tools that deliver measurable engineering efficiency gains.

· Statistical Quality Verification: Hands-on experience with experimental design, statistical analysis, and scripting to measure the impact of system changes. Familiarity with confidence intervals, significance testing, and frameworks for validating probabilistic AI/ML models.

Maersk is committed to a diverse and inclusive workplace, and we embrace different styles of thinking. Maersk is an equal opportunities employer and welcomes applicants without regard to race, colour, gender, sex, age, religion, creed, national origin, ancestry, citizenship, marital status, sexual orientation, physical or mental disability, medical condition, pregnancy or parental leave, veteran status, gender identity, genetic information, or any other characteristic protected by applicable law. We will consider qualified applicants with criminal histories in a manner consistent with all legal requirements.

 

We are happy to support your need for any adjustments during the application and hiring process. If you need special assistance or an accommodation to use our website, apply for a position, or perform a job, please contact us by emailing [email protected]

Top Skills

AWS
Azure
Docker
GCP
Go
Kubernetes
Python
