Kontakt.io

Senior Site Reliability Engineer

Posted 4 Days Ago
Remote
Senior level
As a Senior Site Reliability Engineer at Kontakt.io, you will ensure the scalability, availability, and security of our AI-driven healthcare platform by designing and maintaining cloud infrastructure, implementing monitoring tools, automating processes, and collaborating with various teams to enhance operational efficiency and patient care.
The summary above was generated by AI

Kontakt.io is building the platform that care operations run on.


We reduce waste, cut costs, and increase revenue by improving throughput, asset utilization, and staff productivity. Our platform uses AI, RTLS, and EHR data to enable self-learning agents that automate workflows, adapt in real time, and orchestrate care delivery operations end to end.


Easy to deploy and scale, it gives a clear picture of spaces, equipment, and people, eliminating inefficiencies and enhancing the patient experience. With a measurable 10X ROI and more than 20 use cases, Kontakt.io is the go-to platform for better and faster care delivery operations.


As a Site Reliability Engineer (SRE) at Kontakt.io, you will be responsible for ensuring the scalability, availability, and security of our cloud-based AI-driven healthcare platform. You will collaborate with software, data, and infrastructure teams to build highly resilient and automated systems, allowing hospitals and care facilities to operate seamlessly and without downtime.

Your expertise in cloud infrastructure, automation, monitoring, and performance optimization will directly impact how healthcare organizations leverage real-time data to enhance patient care and operational efficiency.


If you are passionate about highly available systems, automation, and making an impact in healthcare, join Kontakt.io and help us build the future of smart care operations!

Key Responsibilities:

  • Design and maintain highly available, fault-tolerant, and scalable cloud infrastructure.
  • Implement SLOs, SLIs, and SLAs to track system reliability and optimize uptime.
  • Participate in a 24/7 on-call rotation.
  • Oversee production platform deployments.
  • Monitor latency, traffic, errors, and system health using modern observability tools.
  • Conduct root cause analysis (RCA) and post-mortems to continuously improve system resilience.
  • Automate infrastructure provisioning using Terraform, Ansible, or Pulumi.
  • Implement CI/CD pipelines to ensure seamless and safe deployments.
  • Enable self-healing mechanisms using Kubernetes operators, auto-scaling, and fault detection.
  • Ensure compliance with HIPAA, GDPR, and other healthcare data regulations.
  • Define and execute disaster recovery (DR) and business continuity plans.
  • Manage and optimize AWS environments for cost-efficiency and performance.
  • Deploy and manage observability tools, and build real-time alerting and response frameworks.
  • Establish best practices for logging, debugging, and performance monitoring.
  • Improve incident response automation through runbooks, AI-based anomaly detection, and predictive analytics.

What You Bring

  • 3+ years of experience as an SRE.
  • Strong expertise in Kubernetes, Docker, and container orchestration.
  • Experience managing cloud-native environments (AWS).
  • Experience with event-driven architectures, Kafka, or real-time data streaming.
  • Knowledge of machine learning infrastructure.
  • Previous experience in healthcare, compliance (HIPAA), and highly regulated environments.
  • Proficiency in Infrastructure as Code (IaC) using Terraform.
  • Deep knowledge of networking, DNS, load balancing, and security best practices.
  • Experience with CI/CD pipelines (Jenkins, CI, or ArgoCD).
  • Hands-on experience with monitoring and logging tools (Prometheus, Grafana, ELK, OpenTelemetry).
  • Strong programming skills in Python, Golang, or Bash for automation.

We offer:

  • Work on a mission-driven platform that improves healthcare operations and patient outcomes.
  • B2B contract or an employment agreement.
  • Competitive salary and stock option plan.
  • Collaborate with top engineers, data scientists, and AI experts.
  • Flexible remote or hybrid work options (office in Krakow).
  • Collaborative and self-organized environment.
  • Private medical care, cafeteria system.

Ready to Build the Future of Healthcare?

Apply now and help scale the platform that care operations run on. 🚀

Top Skills

Automation
AWS
Bash
CI/CD Pipelines
Cloud Infrastructure
Docker
ELK
Event-Driven Architectures
GDPR
Go
Grafana
HIPAA
Infrastructure as Code
Jenkins
Kafka
Kubernetes
Monitoring
Monitoring and Logging Tools
OpenTelemetry
Performance Optimization
Prometheus
Python
SLAs
SLIs
SLOs
Terraform
