Qualys

Site Reliability Engineer, Cloud Platform

Posted 8 Days Ago

Pune, Maharashtra

Senior level

The Site Reliability Engineer will co-develop and enhance cloud platform services, focusing on performance, reliability, and automation. Responsibilities include system design, monitoring, incident response, process improvement, and leading post-mortem analyses, as well as maintaining production systems and supporting new features.

The summary above was generated by AI

Come work at a place where innovation and teamwork come together to support the most exciting missions in the world!

Description

Co-develop and participate in the full lifecycle of cloud platform services, from inception and design through deployment, operation, and improvement, applying scientific principles.
Increase the effectiveness, reliability, and performance of cloud platform technologies by identifying and measuring key indicators, making automated changes to production systems, and evaluating the results.
Support the cloud platform team before technologies are released to production through activities such as system design, capacity planning, automating key deployments, building a strategy for production monitoring and alerting, and participating in testing and verification.
Ensure that the cloud platform technologies are maintained properly by measuring and monitoring availability, latency, performance and system health.
Advise the cloud platform team on improving the reliability of production systems and scaling them as needed.
Participate in the development process by supporting new features, services, and releases, and maintain an ownership mindset for the cloud platform technologies.
Develop tools and automate processes to achieve large-scale provisioning and deployment of cloud platform technologies.
Participate in the on-call rotation for cloud platform technologies. During incidents, lead the incident response and help write detailed, blameless postmortem analysis reports that are brutally honest.
Propose improvements and drive efficiencies in systems and processes related to capacity planning, configuration management, service scaling, performance tuning, monitoring, alerting, and root cause analysis.
Requirements
4+ years of relevant experience in running distributed systems at scale in production.
Expertise in one of the following programming languages: Java, Python, or Go.
Proficient in writing Bash scripts.
Good understanding of SQL and NoSQL systems
Good understanding of systems programming (network stack, file system, OS services)
Understanding of network elements such as firewalls, load balancers, DNS, NAT, TLS/SSL, VLANs, etc.
Skilled in identifying performance bottlenecks, identifying anomalous system behavior, and determining the root cause of incidents.
Knowledge of JVM concepts like garbage collection, heap, stack, profiling, class loading, etc.
Knowledge of best practices related to security, performance, high-availability, and disaster recovery.
Proven record of handling production issues, planning escalation procedures, conducting postmortems, impact analysis, risk assessments, and other related procedures.
Able to drive results and set priorities independently
BS/MS degree in Computer Science, Applied Math or related field
Bonus Points if you have:
Experience managing large-scale deployments of search engines such as Elasticsearch
Experience managing large-scale deployments of message-oriented middleware such as Kafka
Experience managing large-scale deployments of relational databases (RDBMS) such as Oracle
Experience managing large-scale deployments of NoSQL databases such as Cassandra
Experience managing large-scale deployments of in-memory caches such as Redis, Memcached, etc.
Experience with container and orchestration technologies such as Docker, Kubernetes, etc.
Experience with monitoring tools such as Graphite, Grafana and Prometheus
Experience with HashiCorp technologies such as Consul, Vault, Terraform, and Vagrant
Experience with configuration management tools such as Chef, Puppet or Ansible
In-depth experience with continuous integration and continuous deployment pipelines
Exposure to Maven, Ant or Gradle for builds

Top Skills

Java

Python

Survey No. 20, 10th to 16th Floor, Tower B, Panchshil Business Park, Balewadi, Pune, Maharashtra, India, 411045

Survey No. 20, 10th to 16th Floor, Tower B, Panchshil Business Park, Shivaji Nagar, 411005, India

Similar Jobs

Senior Cloud Site Reliability Engineer

12 Hours Ago

Pune, Maharashtra, IND

Hybrid

13,000 Employees

Mid level

Artificial Intelligence • Healthtech • Professional Services • Analytics • Consulting

As a Senior Cloud Site Reliability Engineer, you will analyze, maintain, and nurture cloud solutions and products. You will coordinate emergency responses, conduct root cause analysis, and identify improvements to system performance. You are expected to promote industry best practices, troubleshoot across infrastructure and software stacks, and collaborate with teams to enhance the quality and reliability of cloud services.

Morningstar

Site Reliability Engineer

2 Days Ago

Navi Mumbai, Thane, Maharashtra, IND

Hybrid

12,700 Employees

Mid level

Enterprise Web • Fintech • Financial Services

The Site Reliability Engineer will onboard users to observability platforms, ensure best practices are followed, collaborate with teams to educate on observability features, assist with anomaly analysis, automate tasks, and maintain operational documentation.

JPMorganChase

Site Reliability Engineer III

2 Days Ago

Mumbai, Maharashtra, IND

Hybrid

289,097 Employees

Mid level

Financial Services

As a Site Reliability Engineer III at JPMorgan Chase, you will optimize applications and infrastructure, develop deployment strategies using CI/CD, and enhance reliability and scalability. You will collaborate with teams to solve complex problems, support SRE best practices, and implement infrastructure as code.

What you need to know about the Pune Tech Scene

Once a far-out concept, AI is now a tangible force reshaping industries and economies worldwide. While its adoption will automate some roles, AI has created more jobs than it has displaced, with an expected 97 million new roles to be created in the coming years. This is especially true in cities like Pune, which is emerging as a hub for companies eager to leverage this technology to develop solutions that simplify and improve lives in sectors such as education, healthcare, finance, e-commerce and more.
