Qualys

Lead Site Reliability Engineer, DevOps

Posted 19 Days Ago
In-Office
Pune, Mahārāshtra
Senior level
The Senior Site Reliability Engineer will enhance observability and reliability in large distributed systems through monitoring, incident response, and automation.
The summary above was generated by AI

Come work at a place where innovation and teamwork come together to support the most exciting missions in the world!

Job Title

Senior Site Reliability Engineer (SRE) – Observability & DevOps

Role Summary

We are looking for a Senior SRE who will own and evolve our observability and reliability platform. The ideal candidate has strong Linux fundamentals, hands-on experience with modern monitoring stacks, and the ability to design scalable alerting and metrics pipelines for large, distributed systems.

This role requires both deep technical expertise and a production-ownership mindset.

Primary Responsibilities

Observability & Monitoring
  • Design, implement, and maintain end-to-end observability using:
    • Prometheus for metrics collection
    • Alertmanager for alert routing, deduplication, and escalation
    • Grafana for visualization and dashboards
    • AppDynamics for APM, transaction tracing, and application health
  • Build actionable dashboards for:
    • SLIs, SLOs, and error budgets
    • Application, infrastructure, and platform health
  • Reduce alert fatigue by implementing signal-based alerting and proper severity models
Data & Metrics Platform
  • Manage and optimize ClickHouse for:
    • High-volume metrics, logs, or traces
    • Long-term retention and fast analytical queries
  • Work on schema design, performance tuning, and cost optimization
Reliability & Operations
  • Define and measure SLIs, SLOs, and SLAs in line with SRE best practices
  • Participate in incident response, postmortems, and root cause analysis
  • Drive reliability improvements through automation and capacity planning
Automation & Engineering
  • Develop tooling and automation using at least one scripting/programming language
  • Automate monitoring onboarding, alert generation, and dashboard creation
  • Improve operational efficiency across DevOps tooling
Required Technical Skills (Must-Have)

Core Skills
  • Strong Linux fundamentals
    • Troubleshooting, performance tuning, networking, system internals
  • Scripting / Programming (Any one or more):
    • Python (preferred), Bash, Go, or similar
  • Observability Tools (Hands-on):
    • Prometheus
    • Alertmanager
    • Grafana
    • AppDynamics
  • Data Platform:
    • Hands-on experience with ClickHouse
Monitoring & Alerting Concepts
  • Metrics vs logs vs traces
  • Golden signals (latency, traffic, errors, saturation)
  • Alert thresholds, routing policies, escalation strategies
Preferred / Nice-to-Have Skills
  • Kubernetes monitoring (Prometheus Operator, kube-state-metrics)
  • Infrastructure as Code (Terraform, Helm)
  • CI/CD observability
  • Cloud platforms (AWS / Azure / GCP)
  • Experience managing observability at scale (100+ services / platforms)
Senior-Level Expectations
  • Ability to architect observability solutions, not just operate them
  • Strong production troubleshooting and incident ownership
  • Mentoring junior engineers
  • Influence DevOps and SRE best practices across teams
  • Communicate clearly with developers and leadership
Experience & Qualification
  • 5-7 years of experience in SRE / DevOps / Production Engineering
  • Experience operating high-availability, large-scale systems
  • Proven background in observability-driven reliability improvements

Top Skills

Alertmanager
AppDynamics
Bash
ClickHouse
Go
Grafana
Prometheus
Python

Qualys Pune, Mahārāshtra, IND Office

Survey No. 20, 10th to 16th Floor, Tower B Panchshil Business Park, Balewadi, Pune, Maharashtra, India, 411045

Qualys Shivaji Nagar, Maharashtra, IND Office

Survey No. 20, 10th to 16th Floor, Tower B Panchshil Business Park, Shivaji Nagar, 411005, India
