Qualys

Lead Site Reliability Engineer, DevOps

Posted 19 Days Ago

In-Office

Pune, Mahārāshtra

Senior level

The Senior Site Reliability Engineer will enhance observability and reliability in large distributed systems through monitoring, incident response, and automation.

The summary above was generated by AI

Come work at a place where innovation and teamwork come together to support the most exciting missions in the world!

Job Title

Senior Site Reliability Engineer (SRE) – Observability & DevOps

Role Summary

We are looking for a Senior SRE who will own and evolve our observability and reliability platform. The ideal candidate has strong Linux fundamentals, hands-on experience with modern monitoring stacks, and the ability to design scalable alerting and metrics pipelines for large, distributed systems.

This role requires both deep technical expertise and a production-ownership mindset.

Primary Responsibilities

Observability & Monitoring

Design, implement, and maintain end-to-end observability using:
- Prometheus for metrics collection
- Alertmanager for alert routing, deduplication, and escalation
- Grafana for visualization and dashboards
- AppDynamics for APM, transaction tracing, and application health
Build actionable dashboards for:
- SLIs, SLOs, and error budgets
- Application, infrastructure, and platform health
Reduce alert fatigue by implementing signal-based alerting and proper severity models
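
For readers less familiar with the term, an error budget is the amount of unreliability an SLO permits over a window. The sketch below shows one common way to compute how much of that budget remains, assuming a simple request-based availability SLI. The function name, parameters, and figures are illustrative assumptions and do not describe Qualys tooling.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative once the SLO is breached).

    Assumes a request-based availability SLI: good requests / total requests.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        # No traffic in the window means no budget has been consumed.
        return 1.0
    # Number of failures the SLO tolerates for this volume of traffic.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical window: 99.9% target, one million requests, 250 failures.
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"{remaining:.0%} of the error budget remains")
```

A dashboard panel could plot this value over a rolling window; a negative result means the SLO has already been missed for that window.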

Data & Metrics Platform

Manage and optimize ClickHouse for:
- High-volume metrics, logs, or traces
- Long-term retention and fast analytical queries
Work on schema design, performance tuning, and cost optimization

Reliability & Operations

Define and measure SRE best practices (SLIs, SLOs, SLAs)
Participate in incident response, postmortems, and root cause analysis
Drive reliability improvements through automation and capacity planning

Automation & Engineering

Develop tooling and automation using at least one scripting/programming language
Automate monitoring onboarding, alert generation, and dashboard creation
Improve operational efficiencies across DevOps tooling

Required Technical Skills (Must-Have)

Core Skills

Strong Linux fundamentals
- Troubleshooting, performance tuning, networking, system internals
Scripting / Programming (Any one or more):
- Python (preferred), Bash, Go, or similar
Observability Tools (Hands-on):
- Prometheus
- Alertmanager
- Grafana
- AppDynamics
Data Platform:
- Hands-on experience with ClickHouse

Monitoring & Alerting Concepts

Metrics vs logs vs traces
Golden signals (latency, traffic, errors, saturation)
Alert thresholds, routing policies, escalation strategies
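
As a rough illustration of a routing policy keyed on severity, the Python sketch below maps each alert to a receiver and falls back to the least urgent receiver for unknown severities. The receiver names, the `Alert` type, and the `route` function are hypothetical; in practice this logic would normally live in Alertmanager's routing configuration rather than in application code.

```python
from dataclasses import dataclass

# Hypothetical severity-to-receiver table, for illustration only.
ROUTES = {
    "critical": "pager",   # wake someone up
    "warning": "chat",     # visible during working hours
    "info": "ticket",      # reviewed asynchronously
}


@dataclass(frozen=True)
class Alert:
    name: str
    severity: str


def route(alert: Alert) -> str:
    """Pick a receiver by severity; unknown severities go to the least urgent receiver."""
    return ROUTES.get(alert.severity, "ticket")


if __name__ == "__main__":
    print(route(Alert("HighErrorRate", "critical")))
    print(route(Alert("DiskFillingSlowly", "warning")))
```

Keeping the severity model small and explicit like this is one way to make sure only symptoms that need a human response reach the pager.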

Preferred / Nice-to-Have Skills

Kubernetes monitoring (Prometheus Operator, kube-state-metrics)
Infrastructure as Code (Terraform, Helm)
CI/CD observability
Cloud platforms (AWS / Azure / GCP)
Experience managing observability at scale (100+ services / platforms)

Senior-Level Expectations

Ability to architect observability solutions, not just operate them
Strong production troubleshooting and incident ownership
Mentoring junior engineers
Influence DevOps and SRE best practices across teams
Communicate clearly with developers and leadership

Experience & Qualification

5-7 years of experience in SRE / DevOps / Production Engineering
Experience operating high-availability, large-scale systems
Proven background in observability-driven reliability improvements

Top Skills

Alertmanager

AppDynamics

Bash

ClickHouse

Grafana

Prometheus

Python

Survey No. 20, 10th to 16th Floor, Tower B Panchshil Business Park, Balewadi, Pune, Maharashtra, India, 411045

Survey No. 20, 10th to 16th Floor, Tower B Panchshil Business Park, Shivaji Nagar, 411005, India


