Cisco ThousandEyes

Senior Site Reliability Engineering Manager, Network Assurance Data Platform

Posted Yesterday

Hybrid

Bengaluru, Bengaluru Urban, Karnataka

Senior level

Lead site reliability engineering teams to enhance cloud, big data, and ML/AI infrastructure while ensuring reliability and operational excellence.

The summary above was generated by AI

Please note that we have a hybrid approach to work and would like to find someone who can come into our office in Bangalore at least two days per week.

Who We Are

Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network – even the ones they don’t own. Powered by AI and an unmatched set of cloud, internet, and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues – before they impact end-user experiences.

ThousandEyes is deeply integrated across the entire Cisco technology portfolio and beyond, helping customers deploy at scale while also delivering AI-powered assurance insights within Cisco’s leading Networking, Security, Collaboration, and Observability portfolios.

About the Role

As the Senior Site Reliability Engineering Manager for our Network Assurance Data Platform, you will play a critical role in shaping and executing our cloud, big data, and ML/AI infrastructure strategy, driving operational excellence, and ensuring the highest levels of system reliability and security. You will lead teams of talented engineers and collaborate closely with cross-functional teams, including software development, operations, and security, to design, build, and maintain our infrastructure, cloud platforms, and security practices at multi-region scale.

What You'll Do

Lead and inspire a talented team of site reliability engineers, fostering a culture of innovation, collaboration, and excellence in the development and operation of infrastructure platforms
Drive the strategic vision for the development, implementation, and management of cloud, data, and ML/AI platforms
Collaborate closely with cross-functional teams, including development, product management, and security, to define and implement reliable, secure, and scalable infrastructure platforms
Provide oversight and direction in the development and operation of cloud platforms, ensuring high-quality, scalable, and reliable solutions that meet customer needs
Drive excellence in operations and security processes
Mentor and develop engineering talent, fostering a culture of continuous learning and professional growth within the site reliability engineering group

Qualifications

You have a deep understanding of distributed systems design and cloud technology, including the components, dependencies, and code that define infrastructure
You possess a deep understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts
You have extensive hands-on experience building cloud, big data, and/or ML/AI infrastructure (e.g., EMR, Airflow, Comet ML, AWS SageMaker, Spark)
You have extensive hands-on experience operating mission-critical services in production environments that require high availability and reliability
You have a proven ability to think strategically and align technical initiatives with business objectives
You can provide a strong technical vision for your teams and ensure consistent delivery of objectives
You have experience formulating a team's technical strategy and roadmap, and you have partnered effectively with several other teams to execute shared goals
You understand how to balance tactical needs with strategic growth and quality-based initiatives that can span multiple quarters
You have proven site reliability engineering management experience leading multiple teams

Cisco values the perspectives and skills that emerge from employees with diverse backgrounds. That's why Cisco is expanding the boundaries of discovering top talent by not only focusing on candidates with educational degrees and experience but also placing more emphasis on unlocking potential. We believe that everyone has something to offer and that diverse teams are better equipped to solve problems, innovate, and create a positive impact.
We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification. Research shows that people from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy. We urge you not to prematurely exclude yourself and to apply if you're interested in this work.

Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. Cisco will consider for employment, on a case-by-case basis, qualified applicants with arrest and conviction records.

Top Skills

Airflow

AWS SageMaker

Comet ML

EMR

Spark
