Fortrea

Platform Reliability & Observability Lead (SRE)

Posted 2 Days Ago

In-Office or Remote

Hiring Remotely in Bangalore, Bengaluru Urban, Karnataka

Expert/Leader

Job Overview:

The Platform Reliability & Observability Lead (SRE) will own and elevate the reliability, availability, and operational excellence of Fortrea's hosting and platform services. This is an engineering-led role, accountable for measurable reliability outcomes across cloud and hybrid environments that support regulated clinical workloads. The role leads observability strategy, SLO and error budget programs, incident automation, and root cause engineering, ensuring that platforms are resilient, predictable, compliant, and scalable. This position is critical to enabling Operational Excellence, Embedded Quality, Financial Discipline, and Customer Trust.

Summary of Responsibilities:

Engineer reliability into hosting and platform services through design reviews, resilience patterns, and readiness assessments.
Define and enforce standards for availability, latency, durability, recoverability, and scalability.
Own end‑to‑end observability strategy, including metrics, logs, traces, alerting, dashboards, and service health reporting.
Establish and operationalize SLIs, SLOs, and error budgets to guide prioritization, release readiness, and risk decisions.
Design and automate incident detection, triage, mitigation, rollback, and diagnostics to improve MTTD and MTTR.
Lead blameless post‑incident reviews, identify systemic issues, and drive remediation to closure.
Reduce operational toil through automation, engineering rigor, and self‑service tooling.
Partner with cloud, hosting, IaC, and application teams to embed reliability into the SDLC.

Qualifications (Minimum Required):

Bachelor’s degree in Computer Science, Computer Engineering, or a related field.
Excellent communication and public speaking skills, with the ability to present complex architectural concepts to senior leadership, technical teams, and non‑technical stakeholders.
Fortrea may consider relevant and equivalent experience in lieu of educational requirements.

Required skills (Minimum Required):

9+ years in Site Reliability Engineering, Platform Engineering, or Production Engineering.
Proven ownership of production reliability in cloud or hybrid platforms.
Strong foundations in distributed systems, Linux, networking, and system internals.
Hands‑on experience with observability architectures and alerting best practices.
Strong expertise in SLIs, SLOs, SLAs, and error budgets.
Proficiency in Python, Go, Java, or equivalent, with a strong automation mindset.
Experience with Azure (preferred), AWS, or GCP.
Experience with Kubernetes and Infrastructure as Code (Terraform, Bicep, ARM, etc.).

Preferred Qualifications Include:

Regulated or GxP environments.
OpenTelemetry, distributed tracing, and service dependency mapping.
Chaos engineering, DR testing, or resilience validation.
FinOps and cost‑aware reliability engineering.
Building shared reliability or observability platforms.

Physical Demands / Work Environment:

Remote-Based, as requested by the line manager
Work Timings: 2:00 PM to 11:00 PM IST

Learn more about our EEO & Accommodations request here.

Top Skills

ARM

AWS

Azure

Bicep

GCP

Java

Kubernetes

Python

Terraform
