3Pillar Global

Lead Data Engineer with AI experience

Posted 3 Hours Ago
Remote
Hiring Remotely in India
Senior level
Lead Data Engineer to design, build, and operate production data pipelines, retrieval/vector infrastructure, semantic/feature stores, and ML/LLMOps foundations. Drive CI/CD, governance, monitoring, and agent/data APIs for RAG, LLM, and predictive model workloads.
The summary above was generated by AI
3Pillar is an AI transformation partner on a mission to help enterprises build the AI-native products and intelligent agents that will define the next era of business. With teams across North America, Europe, Latin America, and Asia, we work with the most ambitious companies in financial services, healthcare, media, and technology, helping them move faster, modernize boldly, and compete on their own terms. Our HelixAI platform and Helix Pods delivery model put our engineers at the center of real agentic transformation, doing work that is open, portable, and built to last. We are building the future of enterprise AI.
 
We are looking for a Lead Data Engineer to build, operate, and continuously improve the
data pipelines, retrieval infrastructure, and ML/LLMOps foundations that power our AI
initiatives. The person in this role will turn reference architectures and data contracts
into robust, production-grade implementations that serve conversational AI assistants,
dashboard copilots, autonomous agents, RAG applications, and predictive ML models.

Key Responsibilities:

    Data Pipeline Engineering: Build, test, and maintain production pipelines (batch & real-time) on Snowflake, PySpark, Delta Lake, and Kafka.
    Implement data quality checks, schema validation, and alerting at every pipeline stage.
    Migrate legacy ETL/DWH to cloud-native AWS/Azure architectures with measurable latency and cost improvements. 
    Maintain CI/CD pipelines: automated testing, deployment, rollback, and IaC (Terraform, GitHub Actions). 
     
    RAG, Vector & Retrieval Infrastructure: Build end-to-end retrieval infrastructure, covering document ingestion, embedding pipelines, vector store management (Pinecone, FAISS, ChromaDB, OpenSearch), and hybrid retrieval layers.
    Implement chunking, metadata filtering, and reranking, tuning for precision, recall, and latency.
    Maintain data freshness and index consistency; instrument with context relevance and faithfulness metrics.
     
    Semantic Layer & Knowledge Infrastructure: Implement and maintain business entity mappings, ontologies, and knowledge graphs (Neo4j) per the Architect's design.
    Build and version the feature store and semantic data contracts serving both ML models and LLM applications.
    Manage metadata, data lineage, and audit trail instrumentation across the platform.
     
    ML/LLMOps Pipeline Support: Build ML data infrastructure, including training data curation, feature engineering, MLflow experiment tracking, and dataset versioning.
    Support LLM fine-tuning workflows: corpus curation, quality filtering, and dataset formatting.
    Implement automated evaluation pipelines: factual accuracy, hallucination detection, regression tracking.
    Maintain production monitoring dashboards for pipeline health, model metrics, and alerting.
     
    Agentic Data Infrastructure: Build and maintain data APIs, tool schemas, and memory/state stores that autonomous agents depend on.
    Implement agent observability: capture inputs, retrieved context, tool calls, reasoning traces, and outputs.
    Maintain text-to-SQL layers, semantic query interfaces, and context APIs for conversational AI consumers.
     
    Governance, Security & Data Quality: Implement RBAC, attribute-based access, PII detection/masking, data classification, and audit logging.
    Enforce data contracts and schema governance with automated breaking-change detection and versioned migrations.
    Build data quality monitoring (completeness, freshness, consistency) with automated alerting and root-cause tooling.
    Support compliance readiness: audit trails, data provenance, and regulatory documentation. 

Qualifications:

  • 7+ years of data engineering experience using cloud services.
  • 2+ years of production AI/ML or LLM-era data infrastructure experience.
  • Proven experience building production pipelines at scale, batch and streaming, on Snowflake and AWS/Azure.
  • Deep expertise in Python, PySpark, Snowflake, Delta Lake, Kafka, and Spark Structured Streaming.
  • Hands-on with vector stores, embedding pipelines, and retrieval infrastructure in production RAG environments.
  • Working knowledge of MLOps: MLflow, CI/CD for AI, automated evaluation, and production monitoring.
  • Strong grounding in data governance, quality frameworks, and compliance-aligned engineering.

Technical Skills:

  • Primary skills: Python, SQL, PySpark, Kafka, Snowflake/Databricks, Delta Lake, AWS (S3, Glue, Kinesis, EKS, Redshift), Docker, Kubernetes, GitHub Actions.
  • Secondary skills: LangChain, LlamaIndex, LLM APIs (OpenAI, Bedrock, Claude, Hugging Face), Pinecone, FAISS, ChromaDB, OpenSearch, MLflow, FastAPI, Neo4j, LangGraph, prompt engineering, RLHF dataset prep, LLM fine-tuning workflows.

Connect:

    Regards,
    Kiran Dhanak
    Talent Acquisition Manager
