Design and optimize LLM systems, manage scalable infrastructure, implement CI/CD and automation, and ensure system reliability and compliance.
Company Description
👋🏼 We're Nagarro
We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale across all devices and digital mediums, and our people exist everywhere in the world (17500 experts across 36 countries, to be exact). Our work culture is dynamic and non-hierarchical. We are looking for great new colleagues. That is where you come in!
Job Description
REQUIREMENTS:
- Experience: 7.5+ years
- 10-12 years in infrastructure, platform, DevOps, or MLOps roles
- Strong experience with cloud platforms (AWS/GCP/Azure) and Kubernetes
- Hands-on experience deploying and operating LLMs (OpenAI, Anthropic, open-source models)
- Proficiency with GPU infrastructure, model serving frameworks, and vector databases
- Strong programming skills in Python; experience with Bash/Go is a plus
- Experience with monitoring, logging, and performance tuning for distributed systems
Preferred Qualifications:
- Experience with LLM fine-tuning, RAG pipelines, and prompt/version management
- Familiarity with tools like Terraform, Helm, Argo, Ray, or similar
- Exposure to cost optimization strategies for large-scale AI systems
Responsibilities:
- Design and manage scalable infrastructure for training, fine-tuning, serving, and monitoring LLMs
- Build and maintain LLMOps pipelines (deployment, versioning, rollback, monitoring, evaluation)
- Optimize inference performance (latency, throughput, cost) across GPU/accelerator stacks
- Implement CI/CD, IaC, and automation for AI/ML workloads
- Ensure observability, reliability, and governance of LLM systems in production
- Collaborate with ML, platform, and product teams to operationalize AI solutions
- Manage security, compliance, and access control for model and data pipelines
Bachelor’s or master’s degree in Computer Science, Information Technology, or a related field.
Top Skills
AWS, GCP, Azure, Kubernetes, Python, Bash, Go, TensorFlow, PyTorch, Terraform, Helm, Argo, Ray
