Role: Manager, Technology
Experience Required: 12+ years
Responsibilities
- Experience in platform release cycles, SRE best practices, infrastructure code reviews, and incident/defect management.
- Effectively manages a team of 6-8 engineers across SRE, Platform Operations, and platform capabilities functions.
- Out-of-the-box thinking and creative problem-solving skills are desired.
- Works with architects and product owners/managers to design and implement scalable platform solutions addressing reliability, operability, and developer experience.
- Works with engineering teams to establish SRE practices including monitoring, alerting, SLOs, and performance tuning across production infrastructure.
- Manages day-2 operations for AKS clusters, including twice-yearly upgrades, patching, and capacity planning.
- Identifies optimal technologies and practices to improve platform reliability and developer self-service. Stays current with the evolving cloud-native and platform engineering space to evaluate tools and capabilities for integration.
- Creates POCs and works with architects across groups to strategize platform development and build technical roadmaps.
- Drives Spec Driven Development (SDD) practices for platform capabilities and SRE automation — defining clear specifications for infrastructure contracts, APIs, and platform services before implementation to ensure predictable, well-documented outcomes.
- Leverages AI tools (Copilot, Claude, agentic workflows) to accelerate infrastructure-as-code development, incident triage, runbook automation, and platform self-service capabilities.
- Helps maintain platform documentation, runbooks, and operational playbooks.
- Supports and mentors team members by providing training, advice, coaching, and educational opportunities.
- Promotes a culture of using AI tools, including Copilot and Claude, across the SDLC.
Job Qualifications
- Bachelor's degree in information technology or related field.
- Preferred: 12+ years of experience in the SRE, Platform Engineering, or Infrastructure domain.
- Extensive experience working in Agile, participating in various L0, technical design, and roadmap initiatives.
- Strong hands-on experience with Kubernetes required (AKS preferred). Should be proficient in operating and managing Kubernetes clusters at an organizational level.
- Strong experience with Terraform, Argo CD and Argo Workflows, Helm/Kustomize, and GitOps practices.
- Experience with CI/CD pipelines (GitHub Actions, Azure DevOps) and infrastructure-as-code at scale.
- Strong experience in driving adoption of platform capabilities for developers, setting up COPs, and driving onboarding of engineering teams onto the platform.
- Proven experience defining and managing SLOs, SLIs, and error budgets to balance reliability with feature velocity across production systems.
- Experience building and running incident management processes including on-call rotations, escalation frameworks, blameless post-mortems, and RCA documentation.
- Hands-on experience with production operations at scale — capacity planning, disaster recovery, failover strategies, and business continuity for cloud-hosted infrastructure.
- Experience building internal developer platform capabilities — self-service tooling, golden paths, developer portals, and "platform as a product" approaches that reduce toil for engineering teams.
- Demonstrated ability to drive toil reduction through automation — scripting operational tasks, building self-healing systems, and implementing proactive alerting to reduce reactive firefighting.
- Experience conducting Production Readiness Reviews (PRRs) and defining operational standards for services transitioning to production.
- Experience with Spec Driven Development (SDD) methodologies — defining infrastructure and platform service specifications upfront to drive implementation, testing, and validation of platform capabilities.
- Familiarity with AI-assisted development tools and agentic automation patterns applied to SRE workflows such as intelligent alerting, automated remediation, and infrastructure provisioning.
- Passionate about sharing your experiences and knowledge and growing your team.
- Ability to handle challenges and obstacles creatively, devising solutions that balance immediate needs with longer-term ownership and maintenance.
- Preferred: experience with observability tooling (Datadog, Prometheus, Grafana) for platform monitoring.
- Preferred: knowledge of microservices architecture, service mesh, and cloud-native patterns.
- Strong interpersonal and communication skills, coupled with a solid teamwork ethic and customer focus.
To maintain a fair and genuine hiring process, we kindly ask that all candidates participate in interviews without the assistance of AI tools or external prompts. Our interview process is designed to assess your individual skills, experiences, and communication style. We value authenticity and want to ensure we’re getting to know you, not a digital assistant. To help maintain this integrity, we ask candidates to remove virtual backgrounds, and we include in-person interviews in our hiring process. Please note that the use of AI-generated responses or third-party support during interviews will be grounds for disqualification from the recruitment process.
Applicants may be required to appear onsite at a Wolters Kluwer office as part of the recruitment process.



