Safe Fleet

Site Reliability Engineer Lead

Posted Yesterday
In-Office or Remote
Hiring Remotely in Coquitlam, BC
Senior level
Lead efforts to enhance cloud infrastructure resiliency through SRE practices, automation, and incident response management.
Description

Meet the Smart Safety Company

At Safe Fleet, our name says it all. We make fleet vehicles – and everyone in and around them – safer. Our fleet safety platform brings together best-in-class products, ground-breaking technology, and a 100-year history of fleet know-how and innovation to solve the world’s biggest fleet safety problems.

Our core value is safety. Without safety first, efficiency and productivity are not possible. This is true for our products, our culture, and our relationship with our community. Our vision is to reduce preventable deaths and injuries in and around fleet vehicles with a goal of ZERO accidents.

We are re-defining what safety means for fleets of every type – from school buses to waste collection trucks, firefighting to utility vehicles, police cruisers to delivery vans.

Whether you work in our Charlotte plant to build life-saving stop arms for school buses, or design advanced camera vision products in our Vancouver office, forge valves and high-quality nozzles to fight fires, or dream up new ways to protect fleet operators in our Corporate HQ in Kansas City, you’ll contribute to our goal to keep everyone safe.

We are a fast-growing manufacturing, service, and technology company with over 1,700 employees in more than 15 locations across Canada and the US. We’re looking for motivated self-starters with innovative thinking to join our team and help us achieve our growth and performance goals. Sound like you?


Job Summary



As the Site Reliability Engineer Lead at Safe Fleet, you will be a key leader in enhancing our cloud infrastructure's resilience and efficiency. This role combines deep technical expertise with leadership responsibilities, ensuring that both infrastructure and application layers are optimized to meet the demands of our service level objectives (SLOs) and agreements (SLAs).

Responsibilities

  • Lead the definition and implementation of Service Level Indicators (SLIs) and Objectives (SLOs) by workflow to enhance product resiliency and reliability.
  • Oversee and improve SRE practices, ensuring system availability, scalability, and observability.
  • Collaborate with software engineering teams to embed effective SRE practices into the development lifecycle.
  • Mentor and lead our 24/7 incident response team, fostering a culture of continuous improvement and technical excellence.
  • Drive the adoption of automation and orchestration projects that improve operational efficiencies and proactive management capabilities.
  • Manage operational incident response, change management, and root cause analysis workflows, setting best practices in these areas.
  • Engage in capacity management and proactive performance tuning to ensure that our services meet the demands of our SLAs.
  • Document and maintain operational procedures, ensuring that they align with best practices and promote knowledge sharing across teams.

Salary: $100,000 to $125,000/yr


Requirements
  • Minimum 5 years of experience in cloud or DevOps engineering, with a proven track record in a leadership or technical lead role.
  • Strong knowledge of cloud services (Azure preferred), container orchestration via Kubernetes, and infrastructure as code practices.
  • Proficiency in monitoring tools such as Grafana, Prometheus, and Elasticsearch.
  • Excellent communication and project management skills, capable of leading projects and initiatives independently.
  • Experience with automation tools and scripting languages such as PowerShell, Bash, Terraform, Kubernetes, Docker, Jenkins, and Azure CLI.
  • Strong analytical skills and the ability to engage with both technical staff and executive-level stakeholders.

Top Skills

Azure
Azure CLI
Bash
Docker
Elasticsearch
Grafana
Jenkins
Kubernetes
PowerShell
Prometheus
Terraform
