Prophecy

Site Reliability Engineer

Reposted 17 Days Ago

India

Senior level

As a Site Reliability Engineer, you will ensure the reliability of Prophecy's platform across multi-cloud environments, optimizing Kubernetes, managing networking and identity services, and enhancing observability and resilience in SaaS solutions.

The summary above was generated by AI

About Prophecy

Prophecy is a rapidly growing startup whose Low-Code Data Engineering Platform enables all data users to visually build data pipelines using modern software practices, including code on GitHub.

Prophecy is trusted by top Fortune 500 firms to replace their legacy ETL tools as they re-platform to the Cloud or Apache Spark. We're very well funded, backed by top VCs, and on the path to establishing ourselves as the leader on the cloud.

Prophecy is a Core Technology and Deep-IP company with engineering centered in India. Prophecy engineers often say that they have never worked in a more productive, higher-horsepower organization in their careers. The engineers love their work, are being challenged, and are doing the best work of their careers. To learn more, visit us on LinkedIn.

Position Summary

As a Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of Prophecy’s platform across multi-cloud and SaaS environments. You will provide technical expertise in Kubernetes, networking, identity, observability, and automation, working to resolve challenges that impact the availability and resilience of our platform. Customers and internal teams will look to you for solutions ranging from infrastructure troubleshooting to complex architectural designs spanning Kubernetes, cloud-native services, and enterprise security. You will partner closely with product engineering and support teams to deliver a highly reliable experience to our enterprise customers.

The Impact You Will Have

Operate and optimize Kubernetes platforms (EKS, AKS, GKE) with Helm, namespaces, pods, autoscaling, node pools.
Manage ingress & networking: NGINX, ALB/AGIC, DNS, TLS/certificates, proxies, VNET/VPC routing, PrivateLink/peering.
Implement identity & secrets management: SSO (OIDC/SAML), SCIM, service principals/managed identities, vaults, key rotation.
Maintain platform service health across UI, APIs, orchestrators, workflow services using readiness/liveness probes and capacity planning.
Enable storage & I/O: object stores (S3, ADLS, GCS), DBFS mounts, IAM roles, access connectors, throughput/quota optimization.
Execute release & upgrades: version rollouts, canary/blue-green strategies, rollback automation, image registries, SBOM/vulnerability scanning.
Deliver observability: build dashboards, log pipelines, SLO/SLA monitoring with Prometheus, Grafana, CloudWatch, Log Analytics, ELK.
Strengthen resilience & DR: multi-AZ architectures, backup/restore, chaos testing, RTO/RPO validation, recovery runbooks.
Drive release automation: GitOps (ArgoCD/Flux), pre-flight checks, automated smoke tests, post-upgrade validation suites.
Ensure cloud-specific reliability: IAM, private connectivity, security groups, application gateways across AWS, Azure, GCP.
Enforce security & compliance: CIS hardening, benchmarks, network segmentation, vulnerability management, auditability.
Support high-governance SaaS deployments: dedicated SaaS controls, change control, strict egress policies, artifact provenance, customer-owned KMS.

What We Look For

4+ years in SRE, platform engineering, or enterprise production support.
Strong hands-on experience with Kubernetes and multi-cloud (AWS, Azure, GCP).
Expertise in networking, identity, secrets, and platform automation.
Proven track record in observability, reliability engineering, and incident management.
Familiarity with GitOps/CI/CD pipelines and modern automation practices.
Strong problem-solving, ownership, and ability to work in a fast-moving startup culture.
Technical degree or equivalent experience.

What You'll Have At Prophecy

Great company culture.
Competitive compensation.
Fair and open equity awards for everyone.
Flexible hybrid/remote work environment.
Private medical insurance.
Learning and career development opportunities.
End-to-end project ownership and a high-growth career path.

Our Commitment to Diversity and Inclusion

At Prophecy, we hire for merit and foster an inclusive culture where people from diverse backgrounds can excel and do their best work. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Prophecy are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, or any other characteristic protected under applicable laws.

Top Skills

ALB/AGIC

AWS

Azure

Docker

ELK

GCP

GitOps

Grafana

Helm

Kubernetes

NGINX

Prometheus

SCIM

TLS
