Smarsh

Sr. Site Reliability Engineer

Posted 4 Days Ago

Hybrid

India

Senior level

The Senior Site Reliability Engineer will enhance production performance and reliability through engineering practices while focusing on observability and minimizing operational toil.

Who are we?

Smarsh empowers its customers to manage risk and unleash intelligence in their digital communications. Our growing community of over 6500 organizations in regulated industries counts on Smarsh every day to help them spot compliance, legal or reputational risks in 80+ communication channels before those risks become regulatory fines or headlines. Relentless innovation has fueled our journey to consistent leadership recognition from analysts like Gartner and Forrester, and our sustained, aggressive growth has landed Smarsh in the annual Inc. 5000 list of fastest-growing American companies since 2008.

About the team

Are you an SRE with excellent observability, containerization and orchestration skills? As a Site Reliability Engineer (SRE) on the Smarsh SaaS Operations team, you'll be part of a team that measures and improves production performance and reliability through sustainable engineering practices for our suite of applications. Toil will be your number one enemy, observability your closest friend, and your mission will be to drive operational burden as close to zero as you can.

Responsibilities

Attend and actively participate in team ceremonies (stand-ups, retros, and planning meetings). Occasionally run these meetings.
Respond to incidents coordinated by SRE and Incident Response teams. Act as an Incident Commander during incidents.
Help define technology choices, best practices and processes for the team.
Develop and maintain documentation standards for the team.
Develop tools and libraries for broader use by SaaS Operations and Engineering teams.
Enable engineering teams to discover and understand problems quicker.
Work closely with Engineering and peer SRE teams to design and use Smarsh coding standards and best practices.
Work with product architects and make suggestions for architectural changes and design platform component roadmaps.
Coordinate with other senior leaders to help set process and direction for the platform as a whole.
Develop novel DevOps tools and systems that aren't used anywhere else.
Demonstrate technical leadership to groups inside and outside the company.
Assist engineering teams in deep troubleshooting and application code review to find opportunities to improve performance and scalability.
Collaborate with the team during US hours and provide weekend support if needed.
Act as a subject matter expert (SME) for the majority of platform components.
Adopt and embrace qualities of an SRE as defined in our team charter. Help set them for the rest of the team.
Mentor and train junior members of the organization. Design training curriculum for the Ops organization as a whole.

Desired skills & experience

A minimum of 8-10 years of industry experience
Master's in CS or equivalent desired
Cloud infrastructure, Identity management, and networking experience (GCP)
Experience managing Elasticsearch and Hadoop infrastructure
Experience managing MySQL databases
Experience with Ansible and Terraform
Experience with builds and packaging in a Linux and Java environment strongly preferred
Broad range of programming/scripting experience (e.g., Python, Bash, Go)
Strong background in managing code with Git
Experience managing continuous integration systems (Jenkins)
Experience with automated configuration management and deployment tools
Background working in a multi-platform environment (Linux, Windows)
Experience with containerization (Docker, Kubernetes, etc.)
Experience with Datadog

Additional Skills

Exceptional analytical and problem-solving skills
Expert-level administration and/or programming skills in relevant languages
Strong communication and collaboration skills
Deep understanding of modern software architecture
Deep domain knowledge of the industry, platform, and existing processes
Fault-tolerant design & maintenance
Knowledge and understanding of modern software programming/engineering
Product delivery lifecycle - requirement refinement through ops

About our culture

Smarsh hires lifelong learners with a passion for innovating with purpose, humility and humor. Collaboration is at the heart of everything we do. We work closely with the most popular communications platforms and the world’s leading cloud infrastructure platforms. We use the latest in AI/ML technology to help our customers break new ground at scale. We are a global organization that values diversity, and we believe that providing opportunities for everyone to be their authentic self is key to our success. Smarsh leadership, culture, and commitment to developing our people have all garnered Comparably.com Best Places to Work Awards. Come join us and find out what the best work of your career looks like.

Top Skills

Containerization

Observability

Orchestration
