The Site Reliability Engineer III will ensure application reliability and stability, manage incidents, and develop automation while promoting team collaboration and operational excellence.
Job Description
As a Site Reliability Engineer III at JPMorgan Chase within the Chief Technology Office, you will collaborate with engineering, support, and operations teams to maintain and improve the reliability of mission-critical applications. You'll participate in incident management, troubleshooting, and continuous improvement, and help implement automation and monitoring solutions. On-call rotation is part of the role, requiring effective action during production incidents and a commitment to operational excellence. You'll share knowledge, follow best practices, and contribute to a culture of learning and innovation. We value team players who communicate clearly, solve problems proactively, and focus on customer needs.
Job responsibilities
- Design, develop, and operate solutions for application reliability, monitoring, and automation.
- Execute incident response, troubleshooting, and root cause analysis to resolve production issues and improve system stability.
- Build and maintain CI/CD pipelines using Jenkins (including global libraries), and implement infrastructure as code with Terraform.
- Develop and support containerized applications using Docker and Kubernetes, ensuring robust deployments and scalability.
- Implement and maintain observability solutions using tools such as Grafana, Prometheus, Splunk, and OpenTelemetry.
- Collaborate with engineering and support teams to drive continuous improvement and operational excellence.
- Participate in on-call rotation, responding to production incidents and ensuring timely resolution.
Required qualifications, capabilities, and skills
- Formal training or certification in Site Reliability Engineering concepts and 3+ years of applied experience.
- Experience in SRE, DevOps, or application support roles, with knowledge of SLIs/SLOs, incident response, and troubleshooting.
- Familiarity with monitoring and observability tools (e.g., Grafana, Prometheus, Splunk, OpenTelemetry).
- Hands-on experience with CI/CD pipelines (Jenkins, including global libraries), infrastructure as code (Terraform), version control (Git), containerization (Docker), and orchestration (Kubernetes).
- Exposure to cloud platforms (AWS, GCP, or Azure) and automating infrastructure and deployments.
- Willingness to participate in on-call rotation and respond to production incidents.
- Ability to break down issues, document solutions, and communicate effectively with team members and customers.
Preferred qualifications, capabilities, and skills
- Familiarity with banking, fintech, or regulated environments.
- Participation in game days or chaos engineering.
- Interest in sharing knowledge and best practices with peers.
Top Skills
AWS
Azure
Docker
GCP
Git
Grafana
Jenkins
Kubernetes
OpenTelemetry
Prometheus
Splunk
Terraform