Tecsys

Site Reliability Engineer

Posted 7 Days Ago

In-Office or Remote

Hiring Remotely in Bengaluru, Bengaluru Urban, Karnataka

Senior level

Seeking Site Reliability Engineers for NSOC to improve reliability and uptime of platforms through automation, monitoring, and collaboration with global teams.

The summary above was generated by AI

Description

About Tecsys

Tecsys is a global supply chain technology company that helps organizations achieve operational excellence through smarter supply chains. With a strong customer base across healthcare, retail, distribution, and complex logistics, we continue to grow our global footprint—and we’re excited to expand our team in India.

Earlier this year, we established Tecsys Supply Chain Solutions PVT Limited in Bangalore, further strengthening our global presence. This office builds on our existing India-based support capabilities by introducing new roles and functions that are critical to our 24/7 "follow the sun" global support model. This approach allows us to better serve customers across time zones while ensuring a balanced workload for our teams around the world.

Our growing India team plays a key role in supporting and enhancing our solutions, contributing to service delivery, innovation, and the ongoing success of some of the world’s most respected brands.

At Tecsys, we believe in empowering our people, fostering collaboration, and building a workplace where talent thrives. Join us and be part of a globally connected team that’s transforming the future of supply chain.

About the Role

We are looking for Site Reliability Engineers to join our Network and Security Operations Center (NSOC) team. The NSOC team takes a data-driven approach to improving the reliability and uptime of our platform and applications, serving both internal and external customers.

This role formally reports to the Director of Support and Services in India, but it involves close coordination with the NSOC team manager based in Canada. You'll work with a high degree of autonomy. Because this is a globally collaborative role, flexibility in working hours is essential to accommodate regular coordination and meetings with colleagues in North American time zones.

Responsibilities

Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Develop tools and automation on top of Azure and AWS to continuously reduce the need for manual intervention.
Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
Be on-call.
Practice sustainable incident response and blameless postmortems.
Implement automated solutions for continuous integration and delivery (CI/CD).
Implement monitoring, logging, alerting, and SLA reporting.
Implement service monitoring dashboards displaying key metrics.
Create and maintain technical documentation.
Apply SRE best practices.
Take command of high-severity incidents and facilitate their resolution.
Provide support for our planning and deployment teams to enable stability, predictability, and scale in our continued growth.
Collaborate with members of the Platform Engineering team to implement and support far-reaching strategic efforts, provide constructive feedback, and foster a collaborative environment.
Work cross-functionally with internal teams and vendors to manage our growth around the globe, with a strong focus on maintaining the high level of performance, availability, and reliability for our users.

Requirements

Qualifications

Bachelor's degree in computer science or related technical discipline.
At least 5 years of systems engineering experience, with demonstrable technical experience in new platform development, orchestration, product ownership, and iterative design and deployment.
Experience designing and deploying large-scale systems, multi-vendor platforms, and globally distributed infrastructure.
Strong knowledge of system design; high performance computing; file, block, and storage technologies; integration of compute, storage, and network technologies to deliver cohesive infrastructure solutions.
Strong understanding of full-stack automation, with examples of projects you have executed. Our scale will require a lot of it: as we grow, we aim to rely less on manual intervention and to use both internal and open-source tools to automate day-to-day activities.
Self-organize, collaborate, and manage efforts with peers and teams across responsibility areas, languages, geography, and time zones.
Be a self-starter, curious, and not afraid to ask questions and challenge the way things are done today.
See a problem or opportunity, take ownership and act on it independently.
Knowledge of Datadog (or a similar product) preferred.
Knowledge of Rapid7 Insight (or a similar product) preferred.
Knowledge and experience of AWS or Azure required.
Basic knowledge of Java- or .NET-based development required.
Knowledge of GitLab (enterprise license) preferred; at minimum, Jenkins experience required.
Experience with a SaaS company is a strong asset.
Experience with FedRAMP (the Federal Risk and Authorization Management Program) compliance is a strong asset.
Proficient English communication skills, both written and spoken, are essential for effective correspondence with customers, business partners and colleagues worldwide.

Additional requirements:

Participation in the escalation on-call rotation.

At Tecsys, we are committed to fostering a diverse and inclusive workplace where all employees feel valued, respected, and empowered. We believe that diversity drives innovation and strengthens our ability to deliver exceptional solutions. We welcome and encourage applicants from all backgrounds, experiences, and perspectives to join our team.

Tecsys is an equal opportunity employer. Accommodation is available for applicants selected for an interview.

Top Skills

.NET

AWS

Azure

Datadog

GitLab

Java

Jenkins

Rapid7 Insight
