The DevOps Engineer will design, build, and manage secure platforms for AI/ML workloads on Azure and AWS, ensuring reliability and scalability. Responsibilities include automating CI/CD processes, managing cloud infrastructure, and collaborating with Data Science teams for optimal AI service delivery.
Optum is a global organization that delivers care, aided by technology to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.
We are seeking an accomplished DevOps Engineer to design, build, and operate secure, scalable, and automated platforms that support advanced AI/ML and Generative AI workloads across Azure and AWS, with solid capability to interoperate with GCP. You will own CI/CD, infrastructure-as-code, container orchestration, observability, and reliability engineering, partnering with Data Science and Security teams to deliver responsible, reliable AI services for healthcare analytics.
Role Summary
We're looking for a DevOps Engineer to design, build, and operate secure, scalable, and cost-efficient platform capabilities for AI/ML and GenAI workloads on Azure and AWS.
- Manage and operate cloud infrastructure to ensure reliability, scalability, and cost efficiency of applications and AI services
- Plan and execute CI/CD pipelines across the full lifecycle: plan → code → build → test → stage → release → config → monitor
- Onboard applications to the DevOps toolchain; standardize golden paths and reusable Terraform modules and Helm charts
- Automate testing and deployments end-to-end; enforce trunk-based development and automated quality gates
- Collaborate with developers to integrate application code with OS/runtime and production infrastructure (container images, base OS hardening, dependencies)
- Provide timely support on DevOps tooling; resolve incidents and requests within SLAs and follow the escalation matrix; perform RCA and implement durable fixes
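As a sketch of the "automated quality gates" mentioned above (the 80% coverage threshold and function names are illustrative assumptions, not part of this team's actual toolchain), a pipeline step might fail a build when test coverage drops below a target:

```python
# Hypothetical CI quality gate: fail the build when coverage drops below a
# threshold. The 80% default is an illustrative assumption.

def coverage_gate(covered_lines: int, total_lines: int,
                  threshold: float = 0.80) -> bool:
    """Return True if the measured coverage passes the gate."""
    if total_lines == 0:
        return False  # nothing measurable counts as a failure
    coverage = covered_lines / total_lines
    print(f"coverage: {coverage:.1%} (threshold {threshold:.0%})")
    return coverage >= threshold

if __name__ == "__main__":
    assert coverage_gate(850, 1000) is True   # 85% passes
    assert coverage_gate(700, 1000) is False  # 70% fails the gate
```

In practice a step like this would run inside the CI job and exit non-zero on failure, blocking the merge on a trunk-based workflow.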
Primary Responsibilities:
- Platform, Automation & Reliability
- Design, provision, and operate production-grade AKS (Azure) and EKS (AWS) clusters; implement autoscaling, multi-AZ/region topologies, and safe upgrades
- Implement Infrastructure-as-Code with Terraform/Terragrunt and Helm; enforce GitOps with Argo CD or Flux for declarative, auditable changes
- Build CI/CD with GitHub Actions and Azure DevOps; also support Jenkins, GitLab CI/CD; manage artifact provenance and deployment strategies (blue/green, canary)
- Establish observability using OpenTelemetry, Prometheus/Grafana, ELK/OpenSearch, Azure Monitor, and CloudWatch; define SLOs/SLIs
- Engineer networking and traffic controls: ingress controllers, API gateways (NGINX/Envoy/Kong), service mesh (Istio/Linkerd), and WAFs; implement rate limiting and DDoS protections
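To make the SLO/SLI work above concrete, here is a minimal error-budget calculation, a standard SRE idea; the 99.9% target and 30-day window are illustrative assumptions:

```python
# Illustrative SLO error-budget arithmetic (target and window are assumptions).
# For a 99.9% availability SLO, the error budget is the fraction of time
# (or requests) that is allowed to fail within the window.

def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed downtime for an availability SLO over the window."""
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes

def budget_remaining(slo: float, failed: int, total: int) -> float:
    """Fraction of the error budget still unspent, given request counts."""
    allowed_failures = (1.0 - slo) * total
    return 1.0 - failed / allowed_failures if allowed_failures else 0.0

print(f"{error_budget_minutes(0.999):.1f} minutes")  # 43.2 minutes per 30 days
```

Burn-rate alerts are then typically defined on top of `budget_remaining`, paging when the budget is being consumed faster than the window allows.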
- AI/ML & GenAI Enablement
- Operate AI training/inference platforms on Azure Machine Learning and Amazon SageMaker; manage model and data artifacts with MLflow/registries
- Operationalize RAG/LLM services with Azure OpenAI and Amazon Bedrock; standardize serving via KServe or managed endpoints; integrate vector databases
- Implement data/model lineage, drift detection, shadow testing, and automated rollback based on health and evaluation signals
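One common way to implement the drift detection mentioned above is the Population Stability Index (PSI) over binned feature distributions; the bin values, the conventional 0.2 alert threshold, and the epsilon smoothing below are illustrative assumptions:

```python
# Sketch of a data-drift check using the Population Stability Index (PSI).
# Bins, the 0.2 alert threshold, and the epsilon smoothing are illustrative.
import math

def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """PSI between two binned distributions (bin fractions summing to ~1)."""
    score = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        score += (a - e) * math.log(a / e)
    return score

baseline = [0.25, 0.25, 0.25, 0.25]   # training-time distribution
drifted  = [0.10, 0.20, 0.30, 0.40]   # distribution seen in production
print(f"PSI = {psi(baseline, drifted):.3f}")  # ~0.23, above the 0.2 alert line
```

A score above the threshold would feed the health/evaluation signals that gate automated rollback.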
- Security, Compliance & Governance
- Apply Zero-Trust and least-privilege access (Azure AD, AWS IAM); implement RBAC, workload identity, network segmentation, and pod security standards
- Centralize secrets with Azure Key Vault and AWS Secrets Manager/Parameter Store; implement rotation and access auditing
- Maintain SBOMs and image signing with attestations; prevent deployment of non-compliant artifacts; automate compliance evidence collection
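As a hedged sketch of "preventing deployment of non-compliant artifacts" (in production this lives in an admission controller such as OPA/Gatekeeper or Kyverno; the registry allowlist and rules below are invented for illustration):

```python
# Hypothetical pre-deployment policy check in the spirit of OPA/Kyverno rules:
# reject images that use a mutable ":latest" tag or come from an unapproved
# registry. The allowlist entries are illustrative placeholders.

APPROVED_REGISTRIES = {
    "myacr.azurecr.io",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com",
}

def violations(image: str) -> list[str]:
    """Return a list of policy violations for an image reference (empty = OK)."""
    problems = []
    registry, _, rest = image.partition("/")
    if registry not in APPROVED_REGISTRIES:
        problems.append(f"unapproved registry: {registry}")
    if ":" not in rest or rest.endswith(":latest"):
        problems.append("missing or mutable tag; pin an immutable tag or digest")
    return problems

print(violations("myacr.azurecr.io/ai/serve:1.4.2"))   # [] -> allowed
print(violations("docker.io/library/nginx:latest"))    # two violations
```

The same checks are typically paired with Cosign signature verification so that only signed, pinned artifacts reach the cluster.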
- Operations & Support
- Run on-call and incident response with playbooks and blameless postmortems; drive MTTR reduction and reliability improvements
- Provide timely support across multiple platforms; ensure customer satisfaction and SLA adherence; follow escalation matrix for complex cases
- GCP Interoperability
- Implement Infrastructure-as-Code with Terraform and Deployment Manager
- Build CI/CD pipelines with GitHub Actions (and Cloud Build where applicable)
- Containerize and deploy applications using Docker and Kubernetes (GKE)
- Automate operational tasks using Linux, Bash, and Python scripting
- Monitor systems with Prometheus, Grafana, Splunk, and Kibana
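The Linux/Python automation above can be as simple as a scheduled check; here is a minimal sketch (the 90% threshold and example paths are assumptions) that flags filesystems running out of space:

```python
# Illustrative Python ops automation: flag filesystems above a usage threshold.
# The 90% threshold and the paths are assumptions; on a live host,
# shutil.disk_usage supplies real (used, total) byte counts.
import shutil

def over_threshold(usage_by_path: dict[str, tuple[int, int]],
                   threshold: float = 0.90) -> list[str]:
    """Return the paths whose used/total ratio exceeds the threshold."""
    return [path for path, (used, total) in usage_by_path.items()
            if total and used / total > threshold]

if __name__ == "__main__":
    # Gather live numbers for the root filesystem and check it.
    du = shutil.disk_usage("/")
    print(over_threshold({"/": (du.used, du.total)}))
```

Separating the pure check from the data gathering keeps the logic unit-testable; the output would normally feed an alert rather than a print.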
- DevOps & SRE Competencies
- Monitoring and logging solutions: Prometheus, Grafana, ELK/Elastic Stack, OpenSearch, Splunk, Kibana
- Understanding of security best practices and compliance automation integrated into pipelines
- Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies in regards to flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so
Required Qualifications:
- Graduate degree or equivalent experience
- 6+ years in DevOps/Platform/SRE with 3+ years of operating Kubernetes in production
- Hands-on depth with Azure and AWS: AKS/EKS, Azure ML/SageMaker, ACR/ECR, IAM/Key Vault/Secrets Manager, and observability (Azure Monitor/CloudWatch)
- Hands-on experience in DevOps and CI/CD with a solid track record of successful project delivery
- Experience applying SRE principles, including SLOs/SLIs, error budgets, and availability management
- Deep knowledge of containerization (Docker) and orchestration (Kubernetes)
- Expertise in Infrastructure-as-Code with Terraform (and Ansible where applicable)
- Solid scripting and automation skills: Python and Bash
- Proficiency with Terraform/Terragrunt, Helm, and GitOps (Argo CD/Flux); CI/CD with GitHub Actions and Azure DevOps; exposure to Jenkins, GitLab CI/CD, and CircleCI
- Proven troubleshooting, root-cause analysis, and platform ownership skills; excellent communication
Preferred Qualifications:
- Cloud certifications (e.g., AWS Certified DevOps Engineer, Azure DevOps Engineer)
- LLMOps/RAG experience with Azure OpenAI and Amazon Bedrock; vector databases; evaluation pipelines
- Knowledge of service mesh (Istio/Linkerd), API gateways (NGINX/Envoy/Kong), and streaming platforms (Kafka/MSK/Event Hubs)
- Healthcare data privacy/compliance familiarity; audit evidence automation
- Representative Tech Stack
- Azure: AKS, Azure Machine Learning, Azure OpenAI, ACR, Key Vault, Azure Monitor/Application Insights, App Gateway, Data Factory
- AWS: EKS, SageMaker, Bedrock, ECR, IAM/KMS, Secrets Manager, CloudWatch, ALB/NLB
- GCP: GKE, Compute Engine, VPC, Cloud IAM, Cloud Run, Cloud Functions, Cloud DNS, Cloud Monitoring, MIGs
- DevOps & Infra: Terraform/Terragrunt, Helm, Argo CD/Flux, Docker, KServe, NGINX/Envoy
- CI/CD: GitHub Actions, Azure DevOps, Jenkins, GitLab CI/CD, CircleCI
- Security: OPA/Gatekeeper, Kyverno, Trivy, Snyk, Checkov, SonarQube, Cosign/SBOM (SPDX/CycloneDX)
- Observability: OpenTelemetry, Prometheus, Grafana, ELK/Elastic Stack, OpenSearch, Azure Monitor, CloudWatch, Splunk, Kibana
At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone-of every race, gender, sexuality, age, location and income-deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes - an enterprise priority reflected in our mission.
Top Skills
Argo CD
AWS
AWS Secrets Manager
Azure
Azure DevOps
Azure Key Vault
Bash
Docker
ELK
Flux
GCP
GitHub Actions
Grafana
Helm
Jenkins
Kubernetes
OpenTelemetry
Prometheus
Python
Terraform

