Incident & Problem Manager
- - - - - - - - - - - -
KEY EXPECTED ACHIEVEMENTS:
Incident Management:
- Track and manage the status of major incidents, ensuring timely updates and communication to stakeholders.
- Minimize business impact by ensuring efficient incident resolution through coordination with the appropriate support teams.
- Monitor adherence to SLAs, ensuring incidents are resolved within agreed timelines.
- Provide clear and concise updates to senior leadership on the status and progress of major incidents.
Problem Management:
- Drive root cause analysis (RCA) quality to prevent recurrence of incidents.
- Ensure thorough documentation of problem records and RCAs, following industry best practices.
- Monitor and validate the implementation of corrective and preventive actions.
Process Improvement:
- Continuously assess and improve incident and problem management processes to enhance efficiency and effectiveness.
- Develop and implement best practices, leveraging ITIL frameworks where applicable.
- Identify trends and patterns in incidents and problems and recommend proactive solutions.
Collaboration:
- Act as the primary point of contact for major incidents, coordinating with cross-functional teams and external partners.
- Collaborate with teams across different time zones to ensure seamless resolution of incidents.
- Foster strong relationships with internal and external stakeholders, including vendors and third-party support teams.
24x7 Incident Support:
- Ensure 24x7 availability to manage critical incidents, leveraging and coordinating with dedicated support teams.
- Establish and maintain an on-call schedule to address major incident escalations promptly.
Reporting and Metrics:
- Develop and present incident and problem management performance reports, highlighting trends and areas for improvement.
- Track and report on KPIs, including mean time to resolution (MTTR) and first-time fix rates.
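For illustration only, the two KPIs named above could be computed along these lines. This is a minimal sketch, not a prescribed method: the `Incident` record, its field names, and the sample data are hypothetical, and "first-time fix" is assumed here to mean an incident that was not reopened after its first resolution. Organizations and tools define these metrics differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    opened: datetime
    resolved: datetime
    reopened: bool  # True if the incident was reopened after its first fix


def mean_time_to_resolution(incidents):
    """Average of (resolved - opened) across all incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((i.resolved - i.opened for i in incidents), timedelta())
    return total / len(incidents)


def first_time_fix_rate(incidents):
    """Share of incidents that were never reopened (assumed definition)."""
    if not incidents:
        raise ValueError("at least one incident is required")
    fixed_first_time = sum(1 for i in incidents if not i.reopened)
    return fixed_first_time / len(incidents)


# Made-up sample data: resolution times of 2, 4, and 6 hours.
start = datetime(2024, 1, 1, 9, 0)
sample = [
    Incident(start, start + timedelta(hours=2), reopened=False),
    Incident(start, start + timedelta(hours=4), reopened=True),
    Incident(start, start + timedelta(hours=6), reopened=False),
]

mttr = mean_time_to_resolution(sample)
ftfr = first_time_fix_rate(sample)
print(f"MTTR: {mttr}, first-time fix rate: {ftfr:.0%}")
```

In practice these figures would normally come from the reporting features of the incident management platform in use; the sketch only makes the arithmetic behind them explicit.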
Required Technical Skills:
- Strong knowledge of ITIL framework (certification preferred).
- Proficiency in incident and problem management tools such as ServiceNow, Remedy, or similar platforms.
- Experience with root cause analysis techniques and tools.
- Familiarity with infrastructure technologies, including networking, servers, databases, and cloud environments.
- Knowledge of monitoring and alerting tools like Splunk, Dynatrace, or SolarWinds.
- Understanding of cybersecurity principles and their impact on incident resolution.
- Ability to analyze and interpret technical data to identify trends and patterns.
Availability:
- Flexibility to work 3-4 days from the office while managing cross-country collaboration remotely.
- Availability to oversee and coordinate 24x7 support for major incidents.