Are you passionate about cutting-edge technology?
Does building next-generation Cloud Computing technology excite you?
Join our Compute Site Reliability team!
Our team is responsible for monitoring and measuring the reliability of our suite of Compute products and platform. In collaboration with Engineering and Product teams, we focus on improving the performance and reliability of the products we support.
Partner with the best
You'll apply statistical data analysis and networking knowledge to diagnose and solve some of the Internet's most difficult content delivery/cloud challenges. You will work with cross-functional teams to benchmark the performance of our key products and services and influence their evolution.
As a Senior Site Reliability Engineer, you will be responsible for:
- Investigating and troubleshooting networking problems within the Linux-based networking stack
- Monitoring the functioning and performance of the networking infrastructure via Prometheus metric systems and Grafana dashboards
- Solving complex problems in a timely and accurate manner and avoiding recurrence through proactive troubleshooting, automation, and systems programming
- Building software tools and systems to automate analytical tasks and workflows to increase efficiency and reliability
- Leveraging skills in data analysis, network diagnostics, and debugging tools to characterize performance and recommend improvements
Do what you love
To be successful in this role you will:
- Have 5+ years' experience in a Site Reliability or System Engineering role and a bachelor's degree in computer science or a related field
- Have expertise in L7 traffic management (Envoy, HAProxy, NGINX) in large-scale distributed systems.
- Be proficient in coding with Python, Perl, R, Java, or SQL, and have networking knowledge including routing, firewalls, and DNS
- Have experience with Linux systems and tools such as netstat, traceroute, and tcpdump
- Be proficient in configuration management and container technologies including Ansible, Salt Stack, Chef, Puppet, Terraform, Docker, Podman, Kubernetes, and Nomad
About us
At Akamai, we make life better for billions of people, trillions of times a day.
Whether you're streaming live events, scrolling social media, watching your favorite series, or managing your savings, we're the engine behind the scenes. We provide the world's most distributed platform from Cloud to Edge to help the giants of the digital world work faster and stay more secure, making the internet a better experience for everyone.
Our focus is simple:
Cloud and Edge: Running apps closer to users for instant performance.
Security: Neutralizing threats before they ever reach your data.
Content Delivery: Scaling the world's biggest moments without a glitch.
AI: Enabling our customers to build, secure, and scale AI apps on the world's most distributed cloud platform.
At Akamai, we don't just support the internet; we power and protect it, because behind every great digital experience is a massive hidden challenge. And we're the ones who solve it. When millions of people hit play or pay, Akamai ensures it just works.
Benefits at Akamai: We support your health, well-being, finances, and life beyond work. See our benefits.
FlexBase adapts to your job's needs
Akamai's FlexBase program is yet another way we show our commitment to providing employees with an exceptional workplace experience. It's not about telling employees where to work; it's about supporting employees to do their best work.
We trust our incredible employees to work in ways that suit them best: at home, in an office, or a combination of both.