Kontakt.io

Senior Site Reliability Engineer

Posted 4 Days Ago

Remote

Senior level

As a Senior Site Reliability Engineer at Kontakt.io, you will ensure the scalability, availability, and security of our AI-driven healthcare platform by designing and maintaining cloud infrastructure, implementing monitoring tools, automating processes, and collaborating with various teams to enhance operational efficiency and patient care.

Kontakt.io is building the platform that care operations run on.

We reduce waste, cut costs, and increase revenue by improving throughput, asset utilization, and staff productivity. Our platform uses AI, RTLS, and EHR data to enable self-learning agents that automate workflows, adapt in real time, and orchestrate all care delivery operations.

Easy to deploy and scale, it gives a clear picture of spaces, equipment, and people, eliminating inefficiencies and enhancing the patient experience. With measurable 10X ROI and more than 20 use cases, Kontakt.io is the go-to platform for better and faster care delivery operations.

As a Site Reliability Engineer (SRE) at Kontakt.io, you will be responsible for ensuring the scalability, availability, and security of our cloud-based, AI-driven healthcare platform. You will collaborate with software, data, and infrastructure teams to build highly resilient, automated systems that let hospitals and care facilities operate seamlessly and without downtime.

Your expertise in cloud infrastructure, automation, monitoring, and performance optimization will directly impact how healthcare organizations leverage real-time data to enhance patient care and operational efficiency.

If you are passionate about highly available systems, automation, and making an impact in healthcare, join Kontakt.io and help us build the future of smart care operations!

Key Responsibilities:

Design and maintain highly available, fault-tolerant, and scalable cloud infrastructure.
Implement SLOs, SLIs, and SLAs to track system reliability and optimize uptime.
Participate in a 24/7 on-call rotation.
Oversee production platform deployments.
Monitor latency, traffic, errors, and system health using modern observability tools.
Conduct root cause analysis (RCA) and post-mortems to continuously improve system resilience.
Automate infrastructure provisioning using Terraform, Ansible, or Pulumi.
Implement CI/CD pipelines to ensure seamless and safe deployments.
Enable self-healing mechanisms using Kubernetes operators, auto-scaling, and fault detection.
Ensure compliance with HIPAA, GDPR, and other healthcare data regulations.
Define and execute disaster recovery (DR) and business continuity plans.
Manage and optimize AWS environments for cost-efficiency and performance.
Deploy and manage observability tools, and build real-time alerting and response frameworks.
Establish best practices for logging, debugging, and performance monitoring.
Improve incident response automation through runbooks, AI-based anomaly detection, and predictive analytics.

What You Bring

3+ years of experience as an SRE.
Strong expertise in Kubernetes, Docker, and container orchestration.
Experience managing cloud-native environments (AWS).
Experience with event-driven architectures, Kafka, or real-time data streaming.
Knowledge of machine learning infrastructure.
Previous experience in healthcare, compliance (HIPAA), and highly regulated environments.
Proficiency in Infrastructure as Code (IaC) using Terraform.
Deep knowledge of networking, DNS, load balancing, and security best practices.
Experience with CI/CD pipelines (Jenkins, CI, or ArgoCD).
Hands-on experience with monitoring and logging tools (Prometheus, Grafana, ELK, OpenTelemetry).
Strong programming skills in Python, Golang, or Bash for automation.

We offer:

Work on a mission-driven platform that improves healthcare operations and patient outcomes.
B2B contract or an employment agreement.
Competitive salary and stock option plan.
Collaborate with top engineers, data scientists, and AI experts.
Flexible remote or hybrid work options (office in Krakow).
Collaborative and self-organized environment.
Private medical care and a cafeteria benefits system.

Ready to Build the Future of Healthcare?

Apply now and help scale the platform that care operations run on. 🚀

Top Skills

Automation, AWS, Bash, CI/CD Pipelines, Cloud Infrastructure, Docker, ELK, Event-Driven Architectures, GDPR, Grafana, HIPAA, Infrastructure as Code, Jenkins, Kafka, Kubernetes, Monitoring, Monitoring and Logging Tools, OpenTelemetry, Performance Optimization, Prometheus, Python, SLAs, SLIs, SLOs, Terraform
