N-iX

SRE / MLOps Engineer – Ray.io (Python)

Reposted 14 Hours Ago
India
Senior level
You will build and support ML infrastructure, automate deployment, troubleshoot issues, and collaborate with teams to enhance operational excellence.
The summary above was generated by AI

N-iX is a global software development service company that helps businesses across the world develop successful software products. Founded in 2002, N-iX has come a long way, expanding its presence across Europe, the US, and Latin America. Today, we are a strong community of 2,000+ professionals and a reliable partner for global industry leaders and Fortune 500 companies. 

Our client is a global commerce leader where you can influence how the world buys, sells, and gives. You’ll be part of a work culture that’s been genuinely committed to diversity and inclusion since its founding over twenty-five years ago. Here, you can be yourself, do your best work along with a team of professionals, and have a meaningful impact on people across the globe. We seek people with drive, ideas, and a passion for helping small businesses succeed.

We are seeking a highly motivated, experienced SRE/MLOps engineer with Python and Ray.io experience to build and maintain the next-generation AI platform. This role focuses on developing software on top of open-source libraries such as Ray, enabling internal teams to run ML workloads efficiently.

Responsibilities:

  • Build, refactor, and release software for the AI platform (feature development and bug fixes)
  • Deploy and manage applications on Ray.io, including workload management, cluster deployment, distributed task scheduling, and troubleshooting
  • Use Ray Dashboard and CLI tools to monitor and debug distributed jobs
  • Work with Ray ecosystem libraries: Ray Train, Ray Tune, Ray Serve, Ray Data
  • Integrate with tools such as Airflow, MLflow, Dask, DeepSpeed (a plus)
  • Collaborate with AI platform developers to provide CI/CD pipelines for automated deployment and configuration
  • Ensure high availability (target 99.999%) and monitor production systems
  • Develop automation for problem management and operational efficiency
  • Write documentation and provide technical support for internal users
  • Follow best practices for development: versioning, source control, branching, and merging patterns
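
To put the 99.999% availability target in concrete terms, here is a minimal sketch of the downtime budget it implies. The function name and the 365-day and 30-day periods are illustrative assumptions, not part of the client's tooling or SLA definition.

```python
def downtime_budget_seconds(availability_pct: float, period_days: float) -> float:
    """Return the maximum downtime, in seconds, allowed over the period.

    Hypothetical helper for illustration only; the measurement window
    is an assumption.
    """
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_days * 24 * 60 * 60


# Budget for the 99.999% target over a 365-day year and a 30-day month.
yearly = downtime_budget_seconds(99.999, 365)
monthly = downtime_budget_seconds(99.999, 30)

print(f"per year: {yearly / 60:.2f} min")   # about 5.26 minutes
print(f"per 30 days: {monthly:.1f} s")      # about 25.9 seconds
```

In other words, the target leaves roughly five minutes of total downtime per year, which is why the role emphasizes automation, monitoring, and alerting.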

Requirements:

  • Main coding language: Python (C++ good to have)
  • Strong experience with Ray.io, including at least two areas such as Ray Train or Ray Serve
  • Kubernetes / Docker: Proficient / Experienced
  • Hands-on experience with distributed systems, cluster management, and cloud technologies
  • Familiarity with DevOps practices, CI/CD pipelines, and test automation
  • Excellent problem-solving, debugging, and triaging skills
  • Strong communication skills for collaboration with partners, customers, and engineers
  • Ability to manage multiple projects in a fast-paced environment
  • TensorRT, DeepSpeed, PyTorch Distributed (a plus)
  • English proficiency (oral and written)

Role specifics:

  • Infra vs. coding requirements: 30% infrastructure (can be learned with guidance), 70% coding (essential for features and bug fixes)
  • The role targets engineers rather than data scientists: focus on deployment, abstractions, monitoring, and alerting of Ray applications at scale
  • Ray proficiency is critical; the second version of the platform will be built on Ray
  • Understanding of Ray Serve for real-time serving and Ray Train for model training is required

We offer*:

  • Flexible working format: remote, office-based, or flexible
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

*not applicable for freelancers

Top Skills

Docker
Jenkins
Kubernetes
PyTorch
Ray.io
TensorFlow
