The LLM Ops Engineer monitors and optimizes AI workflows, collaborates to integrate LLM Ops platforms, and enhances AI performance metrics.
Monitor, evaluate, and optimize AI/LLM workflows in production environments. Ensure reliable, efficient, and high-quality AI system performance by building out an LLM Ops platform that is self-serve for the engineering and data science departments.
Key Responsibilities:
- Collaborate with data scientists and software engineers to integrate an LLM Ops platform (Opik by CometML) for existing AI workflows
- Identify valuable performance metrics (accuracy, quality, etc.) for AI workflows and create ongoing sampling-based evaluation processes on the LLM Ops platform that alert when metrics drop below thresholds
- Collaborate across teams to create datasets and benchmarks for new AI workflows
- Run experiments on datasets and optimize performance via model changes and prompt adjustments
- Debug and troubleshoot AI workflow issues
- Optimize inference costs and latency while maintaining accuracy and quality
- Develop automations for LLM Ops platform integration to empower data scientists and software engineers to self-serve integration with the AI workflows they build
Requirements:
- Strong Python programming skills
- Experience with generative AI models and tools (OpenAI, Anthropic, Bedrock, etc.)
- Knowledge of fundamental statistical concepts and tools in data science, such as heuristic and non-heuristic measurements in NLP (BLEU, WER, sentiment analysis, LLM-as-judge, etc.), standard deviation, and sampling rate, plus a high-level understanding of how modern AI models work (knowledge cutoffs, context windows, temperature, etc.)
- Familiarity with AWS
- Understanding of prompt engineering concepts
- People skills: you will be expected to collaborate frequently with other teams to help perfect their AI workflows
- Experience level: 4-7 years of experience in LLM/AI Ops, MLOps, Data Science, or MLE
Top Skills
Anthropic
AWS
Bedrock
OpenAI
Python

