Site Reliability Engineer, Cloud Platform

Pune, Maharashtra
3-5 Years Experience
Information Technology • Security • Cybersecurity
The Role
Co-develop cloud platform services across their full lifecycle, increase the effectiveness and reliability of the underlying technologies, monitor and maintain system health, automate processes, participate in on-call rotations, and drive improvements in efficiency and scalability.

Come work at a place where innovation and teamwork come together to support the most exciting missions in the world!

Site Reliability Engineer, Cloud Platform

Description

Co-develop and participate in the full development lifecycle of cloud platform services, from inception and design through deployment, operation and improvement, by applying scientific principles.
Increase the effectiveness, reliability and performance of cloud platform technologies by identifying and measuring key indicators, making changes to the production systems in an automated way and evaluating the results.
Support the cloud platform team before technologies are pushed to production release through activities such as system design, capacity planning, automation of key deployments, building a strategy for production monitoring and alerting, and participating in the testing/verification process.
Ensure that the cloud platform technologies are maintained properly by measuring and monitoring availability, latency, performance and system health.
Advise the cloud platform team on improving the reliability of the systems in production and scaling them based on need.
Participate in the development process by supporting new features, services and releases, and hold an ownership mindset for the cloud platform technologies.
Develop tools and automate processes for large-scale provisioning and deployment of cloud platform technologies.
Participate in the on-call rotation for cloud platform technologies. During incidents, lead the incident response and contribute to detailed postmortem analysis reports that are brutally honest and blameless.
Propose improvements and drive efficiencies in systems and processes related to capacity planning, configuration management, scaling services, performance tuning, monitoring, alerting and root cause analysis.
Requirements
4+ years of relevant experience in running distributed systems at scale in production.
Expertise in at least one of the following programming languages: Java, Python or Go.
Proficient in writing bash scripts
Good understanding of SQL and NoSQL systems
Good understanding of systems programming (network stack, file system, OS services)
Understanding of network elements such as firewalls, load balancers, DNS, NAT, TLS/SSL, VLANs, etc.
Skilled in identifying performance bottlenecks, identifying anomalous system behavior, and determining the root cause of incidents.
Knowledge of JVM concepts like garbage collection, heap, stack, profiling, class loading, etc.
Knowledge of best practices related to security, performance, high-availability, and disaster recovery.
Proven record of handling production issues, planning escalation procedures, and conducting post-mortems, impact analyses, risk assessments and other related procedures.
Able to drive results and set priorities independently
BS/MS degree in Computer Science, Applied Math or related field
Bonus Points if you have:
Experience with managing large-scale deployments of search engines like Elasticsearch
Experience with managing large-scale deployments of message-oriented middleware such as Kafka
Experience with managing large-scale deployments of RDBMSs such as Oracle
Experience with managing large-scale deployments of NoSQL databases such as Cassandra
Experience with managing large-scale deployments of in-memory caches such as Redis, Memcached, etc.
Experience with container and orchestration technologies such as Docker, Kubernetes, etc.
Experience with monitoring tools such as Graphite, Grafana and Prometheus
Experience with HashiCorp technologies such as Consul, Vault, Terraform and Vagrant
Experience with configuration management tools such as Chef, Puppet or Ansible
In-depth experience with continuous integration and continuous deployment pipelines
Exposure to Maven, Ant or Gradle for builds

Top Skills

Go
Java
Python
The Company
Shivaji Nagar, 411005
2,736 Employees
On-site Workplace
Year Founded: 1999

What We Do

Qualys, Inc. (NASDAQ: QLYS) is a pioneer and leading provider of disruptive cloud-based security, compliance and IT solutions with more than 10,000 subscription customers worldwide, including a majority of the Forbes Global 100 and Fortune 100. Qualys helps organizations streamline and automate their security and compliance solutions onto a single platform for greater agility, better business outcomes, and substantial cost savings.
The Qualys Cloud Platform leverages a single agent to continuously deliver critical security intelligence while enabling enterprises to automate the full spectrum of vulnerability detection, compliance, and protection for IT systems, workloads and web applications across on premises, endpoints, servers, public and private clouds, containers, and mobile devices. Founded in 1999 as one of the first SaaS security companies, Qualys has strategic partnerships and seamlessly integrates its vulnerability management capabilities into security offerings from cloud service providers, including Amazon Web Services, the Google Cloud Platform and Microsoft Azure, along with a number of leading managed service providers and global consulting organizations. For more information, please visit http://www.qualys.com
