Orion Innovation

Site Reliability Engineer

India
Mid level
Design and maintain resilient deployment patterns, optimize logs and metrics, troubleshoot GKE workloads, and improve Terraform modules while collaborating with developers on delivery pipelines.

Orion Innovation is a premier, award-winning, global business and technology services firm. Orion delivers game-changing business transformation and product development rooted in digital strategy, experience design, and engineering, with a unique combination of agility, scale, and maturity. We work with a wide range of clients across many industries, including financial services, professional services, telecommunications and media, consumer products, automotive, industrial automation, professional sports and entertainment, life sciences, ecommerce, and education.

Key Responsibilities
  • Design and maintain resilient deployment patterns (blue-green, canary, GitOps syncs) across services.
  • Instrument and optimize logs, metrics, traces, and alerts to reduce noise and improve signal.
  • Review backend code (e.g., Django, Node.js, Go, Java) with a focus on infra touchpoints like database usage, timeouts, error handling, and memory consumption.
  • Tune and troubleshoot GKE workloads, HPA configs, network policies, and node pool strategies.
  • Improve or author Terraform modules for infrastructure resources (e.g., VPC, CloudSQL, Secrets, Pub/Sub).
  • Diagnose production issues from logs, traces, and dashboards, and lead or support incident response.
  • Reduce config drift across environments and standardize secrets, naming, and resource tagging.
  • Collaborate with developers to harden delivery pipelines, standardize rollout readiness, and clean up infra smells in code.
Requirements
  • Have 4–6+ years of experience in backend or infra-focused engineering roles (e.g., SRE, platform, DevOps, or fullstack).
  • Can confidently write or review production-grade code and infra-as-code (Terraform, Helm, GitHub Actions, etc.).
  • Have deep hands-on experience with Kubernetes in production, ideally on GKE, including workload autoscaling and ingress strategies.
  • Understand cloud concepts like IAM, VPCs, secret storage, workload identity, and CloudSQL performance characteristics.
  • Think in systems: you understand cascading failure, timeout boundaries, dependency health, and blast radius.
  • Regularly contribute to incident mitigation or long-term fixes (not just closing alerts).
  • Can influence through well-written PRs, documentation, and thoughtful design reviews.
Tools and Expectations
  • Datadog - Monitor infrastructure health, capture service-level metrics, and reduce alert fatigue through high-signal thresholds.
  • PagerDuty - Own the incident management pipeline. Route alerts by severity and align with business SLAs.
  • GKE / Kubernetes - Improve cluster stability and workload isolation. Define auto-scaling configurations and tune for efficiency.
  • Helm / GitOps (ArgoCD/Flux) - Validate release consistency across clusters. Monitor sync status and rollout safety.

Orion is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, citizenship status, disability status, genetic information, protected veteran status, or any other characteristic protected by law.

Candidate Privacy Policy

Orion Systems Integrators, LLC and its subsidiaries and its affiliates (collectively, “Orion,” “we” or “us”) are committed to protecting your privacy. This Candidate Privacy Policy (orioninc.com) (“Notice”) explains:

  • What information we collect during our application and recruitment process and why we collect it;
  • How we handle that information; and
  • How to access and update that information.

Your use of Orion services is governed by any applicable terms in this notice and our general Privacy Policy.


Top Skills

Datadog
Django
GitOps
GKE
Go
Helm
Java
Kubernetes
Node.js
PagerDuty
Terraform
