Pattern

Senior Site Reliability Engineer

Posted 9 Days Ago

Pune, Maharashtra

Senior level

The Senior Site Reliability Engineer will build and manage scalable infrastructure, automate deployments, ensure system reliability, and collaborate with teams on innovative solutions.

The summary above was generated by AI

Job Description:

The role of our Site Reliability Engineer encompasses software, systems, and operations engineering. If you have a passion for constructing stable, scalable systems for an expanding array of innovative products and enjoy easing the deployment process for our engineering team, then Pattern is the ideal workplace for you. Join us in building a best-in-class platform for remarkable growth.

Key Responsibilities

Infrastructure and Automation
- Design, build, and manage scalable and reliable infrastructure in AWS (Postgres, Redis, Docker, Queues, Kinesis Streams, S3, etc.)
- Develop Python or shell scripts for automation, reducing operational toil.
- Implement and maintain CI/CD pipelines for efficient build and deployment processes using GitHub Actions.
Monitoring and Incident Response
- Establish robust monitoring and alerting systems using observability methods, logs, and APM tools.
- Participate in on-call rotations to respond to incidents, troubleshoot problems, and ensure system reliability.
- Perform root cause analysis on production issues and implement preventative measures to mitigate future incidents.
Cloud Administration
- Manage AWS resources, including Lambda functions, SQS, SNS, IAM, RDS, etc.
- Perform Snowflake administration and set up backup policies for various databases.
Reliability Engineering
- Define Service Level Indicators (SLIs) and track them against Service Level Objectives (SLOs) to maintain high system reliability.
- Utilise Infrastructure as Code (IaC) tools like Terraform for managing and provisioning infrastructure.
Collaboration and Empowerment
- Collaborate with development teams to design scalable and reliable systems.
- Empower development teams to deliver value quickly and accurately.
- Document system architectures, procedures, runbooks, and best practices.
- Assist developers in creating automation scripts and workflows to streamline operational tasks and deployments.
Innovative Infrastructure Solutions
- Spearhead the exploration of innovative infrastructure solutions and technologies aligned with industry best practices.
- Embrace a research-based approach to continuously improve system reliability, scalability, and performance.
- Encourage a culture of experimentation to test and implement novel ideas for system optimization.

Required Qualifications

Bachelor’s degree in a technical field or relevant work experience.
6+ years of experience in engineering, development, or DevOps/SRE fields.
3+ years of experience deploying and managing systems using Amazon Web Services.
Proven “doer” attitude: the ability to self-start, take ownership of a project, and carry it through to completion.
Familiarity with container orchestration tools like Kubernetes, Fargate, etc.
Familiarity with Infrastructure as Code tooling like Terraform, CloudFormation, Ansible, or Puppet.
Experience working with CI/CD automated deployments using tools like GitHub Actions, Jenkins, or CircleCI.
Experience working with observability tools like Datadog, New Relic, Dynatrace, Grafana, Prometheus, etc.
Experience with Linux server management, bash scripting, SSH keys, SSL/TLS certificates, MFA, cron, and log files.
Deep understanding of AWS networking (VPCs, subnets, security groups, route tables, internet gateways, NAT gateways, NACLs), IAM policies, DNS, Route 53, and domain management.
Strong problem-solving and troubleshooting skills.
Excellent communication and collaboration abilities.
Desire to help take Pattern to the next level through exploration and innovation.

Preferred Qualifications

Experience in deploying applications on ECS or Fargate with ELB/ALB and Auto Scaling Groups.
Experience in deploying serverless applications with Lambda, API Gateway, Cognito, and CloudFront.
Experience in deploying applications built using JavaScript, Ruby, Go, or Python.
Experience with Infrastructure as Code (IaC) using Terraform.
Experience with database administration for Snowflake and Postgres.
AWS Certification would be a plus.
A focus on adopting security best practices while building great tools.

What We're About

Data Fanatics: Our edge is always found in the data
Partner Obsessed: We are obsessed with partner success
Team of Doers: We have a bias for action
Game Changers: We encourage innovation

Pattern is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Top Skills

AWS, CircleCI, Datadog, Docker, Dynatrace, Fargate, GitHub Actions, Grafana, Jenkins, Kubernetes, Linux, New Relic, Postgres, Prometheus, Python, Redis, Shell, Snowflake, Terraform
