Marvell Technology

Senior Staff AI/ML Scale Engineer

Reposted 12 Days Ago
In-Office
2 Locations
Senior level
The role involves simulation, modeling, performance analysis, and tooling for AI/ML workloads, focusing on hardware/software co-design in advanced computing environments.
The summary above was generated by AI

About Marvell

Marvell’s semiconductor solutions are the essential building blocks of the data infrastructure that connects our world. Across enterprise, cloud and AI, automotive, and carrier architectures, our innovative technology is enabling new possibilities. 

At Marvell, you can affect the arc of individual lives, lift the trajectory of entire industries, and fuel the transformative potential of tomorrow. For those looking to make their mark on purposeful and enduring innovation, above and beyond fleeting trends, Marvell is a place to thrive, learn, and lead. 

Your Team, Your Impact

This team at Marvell develops Murals, a next-generation AI/ML infrastructure simulation and design platform that enables in-depth analysis and optimization of large-scale training and inference workloads. Leveraging trace-driven simulation, performance modeling, and hardware/software co-design, the team helps shape scalable and resilient solutions for advanced workloads such as LLMs, DLRMs, GenAI, and GNNs.
Working closely with system architects, hardware designers, and ML practitioners, the team explores innovative ways to optimize compute, memory, and networking subsystems across complex datacenter environments.

What You Can Expect

  • Simulation & Modeling – Implement workflows to study AI/ML workloads using trace-driven and analytical models.

  • Performance Analysis – Profile and analyze system bottlenecks across compute, memory, and network layers.

  • Networking Studies – Evaluate collective communication performance (all-reduce, all-to-all, reduce-scatter) across different topologies and fabrics.

  • Tooling & Automation – Develop utilities for trace generation, merging, conversion, and visualization.

  • Prototype & Validation – Test distributed training and inference pipelines in simulated and real environments.

  • Hardware/Software Co-Design – Collaborate on emerging technologies (CXL, DPUs, NVLink, PCIe, UET/UEC, in-network compute).

  • Scaling Studies – Conduct performance projections and trade-off studies for next-gen AI infrastructure.

  • Knowledge Sharing – Document workflows, publish internal reports, and drive peer learning.

What We're Looking For

  • Bachelor’s, Master’s, or PhD in Computer Science, Electrical Engineering, or related field with 4–12 years of relevant professional experience.

  • Strong foundation in computer architecture, distributed systems, AI/ML, and operating systems.

  • Solid networking fundamentals including TCP/IP, RDMA, RoCE, UET/UEC, and switching/routing.

  • Experience with simulation frameworks (e.g., Astra-Sim, Chakra, gem5, SST, NS-3).

  • Hands-on experience with PyTorch/TensorFlow and distributed training frameworks (DDP, Horovod, DeepSpeed).

  • Strong programming skills in Python, C++, and scripting for automation.

  • Familiarity with interconnect and memory technologies (CXL, PCIe, NVLink, UAL).

  • Experience with profiling, telemetry, observability, and debugging tools.

  • Knowledge of collective communication algorithms and topology-aware scheduling.

  • Exposure to AI accelerators, memory disaggregation, DPUs, and custom silicon.

  • Familiarity with visualization tools (Perfetto, Chrome Tracing, Chakra Timeline, Flamegraphs).

  • Experience with large-scale AI training pipelines and scaling studies.

  • Interest in energy/performance trade-offs and resilience techniques.

Additional Compensation and Benefit Elements

With competitive compensation and great benefits, you will enjoy our workstyle within an environment of shared collaboration, transparency, and inclusivity. We’re dedicated to giving our people the tools and resources they need to succeed in doing work that matters, and to grow and develop with us. For additional information on what it’s like to work at Marvell, visit our Careers page.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status.

Interview Integrity
 

As part of our commitment to fair and authentic hiring practices, we ask that candidates not use AI tools (e.g., transcription apps, real-time answer generators like ChatGPT or Copilot, or note-taking bots) during interviews.
 
Our interviews are designed to assess your personal experience, thought process, and communication skills in real time. A candidate who uses such tools during an interview will be disqualified from the hiring process.

This position may require access to technology and/or software subject to U.S. export control laws and regulations, including the Export Administration Regulations (EAR). As such, applicants must be eligible to access export-controlled information as defined under applicable law. Marvell may be required to obtain export licensing approval from the U.S. Department of Commerce and/or the U.S. Department of State. Except for U.S. citizens, lawful permanent residents, or protected individuals as defined by 8 U.S.C. 1324b(a)(3), all applicants may be subject to an export license review process prior to employment.


Top Skills

AI/ML
Astra-Sim
C++
Chakra
CXL
DDP
DeepSpeed
gem5
Horovod
NS-3
NVLink
PCIe
Python
PyTorch
RDMA
RoCE
SST
TCP/IP
TensorFlow
UET
