Job Summary & Responsibilities
The HPC Engineer - Storage deploys high-performance storage systems, manages their configuration, automates installations, and runs and tracks I/O performance benchmarks in a cluster environment.
Technical Competencies
Essential Skills
High-Performance Storage:
- Parallel Filesystems: Hands-on operational experience with at least one major AI storage platform: VAST Data, Weka.io, DDN EXAScaler (Lustre), or IBM GPFS (Spectrum Scale).
- Linux I/O Stack: Deep understanding of the Linux VFS (Virtual File System), block devices, and how to debug I/O performance using tools like iostat, iotop, and strace.
- RDMA Storage: Experience configuring NVMe over Fabrics (NVMe-oF) or NFS over RDMA, including an understanding of how both depend on the underlying InfiniBand or RoCE network.
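As an illustration of the Linux I/O stack skill above, here is a minimal sketch of how iostat-style throughput can be derived from the kernel's `/proc/diskstats` counters. It assumes the standard field layout (sectors read in field 6, sectors written in field 10, counted in 512-byte units). The function names `parse_diskstats`, `throughput_mb_s` and `report` are hypothetical helpers, not part of any existing tool.

```python
import time

# /proc/diskstats reports sectors in 512-byte units regardless of the
# device's physical sector size (assumption stated in the lead-in).
SECTOR_BYTES = 512


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        # fields: major, minor, name, reads, reads merged, sectors read,
        # ms reading, writes, writes merged, sectors written, ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def throughput_mb_s(before, after, interval_s):
    """Return {device: (read_MB_s, write_MB_s)} between two parsed samples."""
    rates = {}
    for dev, (r1, w1) in after.items():
        if dev not in before:
            continue  # device appeared between samples; no baseline
        r0, w0 = before[dev]
        read = (r1 - r0) * SECTOR_BYTES / interval_s / 1e6
        write = (w1 - w0) * SECTOR_BYTES / interval_s / 1e6
        rates[dev] = (read, write)
    return rates


def report(interval_s=1.0, path="/proc/diskstats"):
    """Sample the counters twice and print per-device MB/s (Linux only)."""
    with open(path) as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_diskstats(f.read())
    for dev, (rd, wr) in sorted(throughput_mb_s(first, second, interval_s).items()):
        print(f"{dev:12s} read {rd:8.1f} MB/s  write {wr:8.1f} MB/s")
```

Calling `report()` on a Linux host prints one line per block device. Separating the parsing from the file read keeps the arithmetic testable against captured diskstats text.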
Automation & Containerisation:
- Ansible Storage: Proficiency in writing Ansible playbooks to automate the installation of storage clients and configuration of mount points.
- Kubernetes Storage: Understanding of StorageClasses and PersistentVolumeClaims (PVCs), and the ability to debug CSI driver pods (e.g., checking their logs for mount failures).
- GPUDirect: Conceptual understanding of NVIDIA GPUDirect Storage (GDS) and the ability to verify whether GDS is active.
Desirable Experience
- Vendor Specifics: Certification or in-depth experience with Pure Storage FlashBlade or NetApp ONTAP AI configurations.
- Object Storage: Experience using command-line tools to retrieve model weights from S3-compatible object stores.
- Data Migration: Experience using tools like fpsync or rclone to move petabyte-scale datasets between tiers.
Certifications
Highly Desirable:
- NVIDIA-Certified Associate: AI Infrastructure and Operations (NCA-AIIO)
- Vendor Certifications:
  - VAST Certified Administrator (VCP-AD1)
  - WEKA Technical Xpert Certification
  - Red Hat Certified Specialist in Storage Administration
Success Metrics (KPIs)
- I/O Performance: Achieving more than 95% of the theoretical line-rate throughput on IOR and fio benchmarks for provisioned clients.
- Mount Stability: Zero stale file handle errors or disconnected mounts across the cluster during the 72-hour burn-in period.
- Ticket Velocity: Consistently meeting SLAs for storage-related support tickets.
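The I/O performance KPI is a simple ratio. Here is a minimal sketch of how a benchmark result could be checked against it. It assumes that link speed is quoted in decimal Gb/s, that IOR or fio results are in decimal GB/s, and that "theoretical line rate" means raw link speed with protocol and encoding overheads ignored. `line_rate_gbytes` and `meets_kpi` are hypothetical names.

```python
# Threshold taken from the KPI: more than 95% of theoretical line rate.
KPI_FRACTION = 0.95


def line_rate_gbytes(link_gbit_s, links=1):
    """Theoretical throughput in GB/s for `links` ports of `link_gbit_s` Gb/s.

    Assumes raw link speed; protocol/encoding overheads are ignored.
    """
    return link_gbit_s * links / 8  # 8 bits per byte


def meets_kpi(measured_gbytes_s, link_gbit_s, links=1, fraction=KPI_FRACTION):
    """Return (passes, achieved_fraction) for one client's benchmark result."""
    theoretical = line_rate_gbytes(link_gbit_s, links)
    achieved = measured_gbytes_s / theoretical
    return achieved > fraction, achieved
```

For example, a client with a single 400 Gb/s port (an illustrative speed, not a requirement of this role) has a 50 GB/s ceiling. It must therefore sustain more than 47.5 GB/s, so `meets_kpi(48.0, 400)` passes at 96% and `meets_kpi(45.0, 400)` fails at 90%.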