Monitor global infrastructure using Datadog and SolarWinds, triage and resolve L1 incidents, escalate complex issues, participate in incident response and post-incident reviews, maintain SOPs and ServiceNow tickets, support automation with basic scripting, and provide weekend on-call coverage.
JOB DESCRIPTION
- Monitor Sysco’s global infrastructure and systems using tools such as Datadog, SolarWinds, and other enterprise monitoring platforms.
- Detect, triage, and respond to incidents proactively before customer or business impact.
- Independently resolve:
  - Server performance issues
  - Monitoring agent issues
  - Basic infrastructure and system alerts
- Escalate major incidents, complex infrastructure issues, and application-related incidents to L2/L3 teams in line with SOPs and SLAs.
- Ensure initial response and resolution targets are met for all priority levels.
- Participate in incident bridge calls and coordinate with internal and external stakeholders.
- Perform initial investigations and document findings to support faster resolution.
- Contribute to post-incident reviews and root cause analysis, including analysis via Datadog Watchdog.
- Follow and execute Standard Operating Procedures (SOPs) for known incidents.
- Maintain accurate documentation and ticket updates in ServiceNow.
- Support initiatives to improve First-Time Resolution (FTR) and reduce MTTR.
- Contribute to project-level operational improvements and initiatives tracked in Jira.
- Apply basic scripting or automation knowledge where applicable to support monitoring improvements and operational efficiency.
- Actively participate in knowledge sharing and continuous learning initiatives.
- Standard shift: Monday to Friday, from 10:30 AM to 7:30 PM CST
- Weekend on-call coverage required (one day per weekend, 10:30 AM – 7:30 PM CST; monthly shift rotation defined based on business needs, with prior notification provided by the team manager).
- Bachelor’s degree in Information Technology or equivalent experience.
- 2 years of experience in Operations Engineering, NOC, SRE, or similar roles.
- Strong understanding of:
  - Windows Server and/or UNIX/Linux environments
  - Networking fundamentals (LAN/WAN, TCP/IP, DHCP, firewalls, routing)
- Experience with an enterprise ticketing tool (e.g., ServiceNow, Jira).
- Strong communication skills in English and ability to work under pressure.
- Willingness to provide weekend on-call coverage.
- Excellent communication skills in English (B2+ or higher) and ability to collaborate across functions and geographies.
- Experience with Datadog, SolarWinds, or similar monitoring platforms.
- Exposure to AWS, Azure, or GCP.
- Familiarity with Jira for tracking initiatives and projects.
- ITIL certification or hands-on experience with ITIL practices.
- Basic scripting or automation knowledge (e.g., PowerShell, Bash, Python).
Benefits:
- This is a hybrid position based in Ultra Park II, Lagunilla (Heredia). On-site presence is required only when necessary, such as for meetings, trainings, or collaborative activities, in alignment with the company’s telework agreement, which currently requires employees to work on-site three (3) days per week.
- Private Medical Insurance
- Asociacion Solidarista
- Life Insurance
- Personal Day Off
Note: Only candidates with Costa Rican nationality or valid immigration status will be considered; applicants residing outside Costa Rica will not be considered, and relocation is not available.

