Position Summary:
As a Senior Engineer on the Monitoring and Observability team, you will be responsible for designing, implementing, and optimizing monitoring solutions that ensure the reliability and performance of Ensono's distributed systems and application services. This role requires deep expertise in real-time monitoring, alerting, anomaly detection, and automation to proactively identify and rapidly resolve incidents. You will also be responsible for designing and implementing the client solutions that Ensono manages.
What You Will Do:
- Engineer and operate a scalable monitoring and observability platform for Ensono’s Hybrid Cloud clients, using tools currently in the Ensono fleet such as BMC TrueSight, BMC Helix, Entuity, and VMware Aria
- Plan and execute the strategic roadmap for observability and monitoring tools, ensuring alignment with business and client requirements
- Define monitoring best practices, including proactive alerting, anomaly detection and performance analytics
- Operate and optimize end-to-end monitoring solutions that provide real-time visibility into networks, distributed systems, and applications
- Establish automated alerting thresholds based on Service Level Objectives (SLOs) and Service Level Agreements (SLAs)
- Establish monitoring audit standards for conformance and compliance across standard and custom monitors
- Serve as the point of escalation for day-to-day monitoring-related incidents
- Automate monitoring configurations and telemetry collection using scripting and Infrastructure as Code (IaC) tools such as Ansible and Terraform
We want all new Associates to succeed in their roles at Ensono. That's why we've outlined the job requirements below. To be considered for this role, it's important that you meet all Required Qualifications. If you do not meet all of the Preferred Qualifications, we still encourage you to apply.
Required Qualifications:
- 7+ years of experience in observability or monitoring engineering and operations roles
- 7+ years of hands-on experience with ITSM platforms such as ServiceNow and monitoring tools such as BMC, Datadog, Entuity, or others
- Strong proficiency in Python, Bash, and JavaScript for automation and scripting
- Experience with Infrastructure as Code (Ansible, Terraform, etc.) for deploying observability tools
- Strong analytical and problem-solving skills for diagnosing complex issues
- Effective communication and leadership, especially in training and cross-functional team collaboration
- Ability to think creatively and holistically about processes that affect business engagement, and to continually refine them
- Ability to thrive both independently and collaboratively in a fast-paced environment while managing priorities effectively
- Bachelor’s degree in a related field
Preferred Qualifications:
- Master’s degree in an information technology-related field
- Proficiency in cloud platforms (AWS, Azure, GCP) and in Kubernetes deployment and monitoring
- Advanced ITIL certification or training, including ITIL v3 and v4
- Experience integrating AI/ML into ITSM practices is a plus
- Flexible work schedule