Lead SRE at JPMorgan Chase focusing on site reliability, defining requirements, managing incidents, mentoring, and developing AI/ML solutions.
Job Description
Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability.
As a Principal Site Reliability Engineer at JPMorgan Chase within the AI/ML & Data platform team, you work with your fellow stakeholders to define non-functional requirements (NFRs) and availability targets for the services in your application and product lines. You will ensure those NFRs are accounted for in your products' design and test phases, that your service level indicators are effectively measuring customer experience, and that service level objectives are defined with stakeholders and implemented in production.
Job responsibilities
- Demonstrate expertise in application development and support across technologies such as Databricks, Snowflake, AWS, and Kubernetes.
- Coordinate incident management coverage to ensure effective resolution of application issues.
- Collaborate with cross-functional teams to perform root cause analysis and implement production changes.
- Mentor and guide team members to foster innovation and strategic change.
- Develop and support AI/ML solutions for troubleshooting and incident resolution.
Required qualifications, capabilities, and skills
- Formal training or certification in SRE concepts and 5+ years of applied experience.
- Proficiency in site reliability culture and principles, and familiarity with implementing site reliability within an application or platform.
- Proficiency in running production incident calls and managing incident resolution.
- Experience in observability, including white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
- Strong understanding of SLIs, SLOs, SLAs, and error budgets.
- Proficiency in Python or PySpark for AI/ML modeling.
- Ability to reduce toil by building tools that automate repetitive tasks.
- Hands-on experience in system design, resiliency, testing, operational stability, and disaster recovery.
- Understanding of network topologies, load balancing, and content delivery networks.
- Awareness of risk controls and compliance with departmental and company-wide standards.
- Ability to work collaboratively in teams and build meaningful relationships to achieve common goals.
Preferred qualifications, capabilities, and skills
- Experience in an SRE or production support role with AWS Cloud, Databricks, Snowflake, or similar technologies.
- AWS and Databricks certifications.
Top Skills
AWS
Databricks
Datadog
Dynatrace
Grafana
Kubernetes
Prometheus
PySpark
Python
Snowflake
Splunk

