
Malaria No More

Data Engineer

Remote
2 Locations
Senior level
The Data Engineer will architect data lakes, curate datasets, automate data services, and develop training labs, leveraging cloud technologies and ensuring data quality for AI applications in climate and health.
The Institute for Health Modeling and Climate Solutions (IMACS) is a global center of excellence, hosted by Malaria No More, with the mission to empower the world’s most climate-vulnerable countries with the tools, data, and expertise needed to predict, prevent, and respond to climate-sensitive health threats.  

IMACS is redefining how climate intelligence is operationalized in public health by building and scaling AI-powered digital public goods that integrate and model climate and health data. Through the application of machine learning, interoperable platforms, and next-generation early warning systems, IMACS enables real-time risk detection and proactive responses at scale. IMACS supports countries through co-designed implementation pathways: orchestrating data cooperation, strengthening national health and climate information systems with tailored innovations, training frontline actors and policymakers, and institutionalizing their use through clear SOPs and sustainability guidelines. By unlocking the value of climate and health data, IMACS helps transform fragmented information into strategic, actionable knowledge, enabling smarter decisions, better preparedness, and more resilient health systems in the era of climate disruption.

Backed by the Patrick J. McGovern Foundation, we are building a Central Data & Analytics Hub (CDAH) to advance IMACS’ climate-health AI foundation model and related digital public goods, as well as a training program to equip public health professionals with the knowledge and tools required to make data-informed decisions at the intersection of climate and health.

The CDAH will be a cloud-native, open-source “operating system” for integrated climate and health intelligence, built on five pillars:

  • AI R&D environment: Ingests multi-modal climate, environmental, epidemiological and socio-demographic data into a unified data lake & feature store; supports Kubeflow/PyTorch/TensorFlow pipelines with MLflow registry, automated benchmarking, architecture search, transfer learning and uncertainty-aware modeling (a minimal model-registry sketch follows this list).

  • Digital tool marketplace & public goods registry: User-facing portal for dashboards, mobile apps and alerting platforms; structured backend registry of pre-trained model packages, microservices, ETL scripts, governance adapters, metadata and version history. 

  • Systems integration & deployment layer: Middleware adapters and Kafka messaging to plug AI services into DHIS2, HMIS, IDSR and similar platforms; Terraform/Ansible IaC, identity management, end-to-end encryption and compliance with data-governance standards. 

  • Training environment: Web portal and virtual bootcamp infrastructure hosting open-access modules, instructor-led sessions, hands-on Jupyter labs, code templates and certification tracks on climate-health AI workflows and interoperability. 

  • Real-world evaluation sandbox: Controlled simulation environment replicating public-health workflows, climate variability and institutional constraints; structured feedback loops for piloting, validating and refining tools prior to full-scale rollout. 
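
To make the AI R&D pillar concrete, the sketch below logs a hypothetical baseline model to an MLflow tracking server and registry. It is illustrative only: the experiment name, feature columns, curated-table path, and the random-forest model are placeholder assumptions, not part of the CDAH specification.

```python
# Minimal sketch: logging an illustrative baseline model to an MLflow tracking
# server and registry. The experiment name, feature columns, parquet path, and
# the model itself are hypothetical placeholders, not part of the CDAH design.
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical curated table: weekly district-level climate features and case counts.
df = pd.read_parquet("curated/district_weekly_features.parquet")
X = df[["rainfall_mm", "temp_2m_c", "ndvi", "pop_density"]]
y = df["malaria_cases"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("malaria-risk-baseline")
with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("mae", mae)
    # Registering the run's model makes it discoverable to downstream services.
    mlflow.sklearn.log_model(model, "model", registered_model_name="malaria-risk-rf")
```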

What You’ll Do

  • Architect the data backbone: Lead design of a multi-tenant data lake & feature store; define schemas, metadata standards, and secure ETL/ELT pipelines for climate, environmental, epidemiological, and socio-demographic data (a skeletal pipeline sketch follows this list).

  • Source & curate open-source datasets: Identify, evaluate and onboard public climate, environmental, epidemiological and socio-demographic data (e.g., ERA5/Copernicus, MODIS, WHO, UN, university repositories, open-API feeds), ensuring metadata completeness and licensing compliance for downstream model training.

  • Automate data quality assurance & governance: Build unit/integration tests and data-quality checks (Great Expectations/dbt), track lineage, and enforce access controls (an example check is sketched after this list).

  • Ingest and harmonize datasets: Operationalize ingestion, cleansing, and harmonization of ERA5, Sentinel, GPM, EHR, mobility, and demographic datasets; ensure interoperability with DHIS2/HMIS (a harmonization sketch follows this list).

  • Automate data services: Develop reusable validation libraries, transformation scripts, and secure REST/GraphQL APIs to power downstream AI models and dashboards (see the endpoint sketch after this list). Manage the data-service API contract; the AI/ML Engineer manages model APIs.

  • Develop training labs: Author reference ETL scripts, notebooks, and architecture patterns for “AI-ready” datasets; validate that bootcamp exercises reflect real-world data challenges.

  • Co-lead bootcamps: Guide participants through hands-on ETL labs, troubleshoot integration issues, and refine training materials based on feedback. 

  • Publish open-source components: Package and release ETL modules, transformation libraries, and interoperability adapters to the public-goods registry under permissive licenses. 
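
For the “Architect the data backbone” responsibility, the following is a skeletal extract-transform-load pipeline, assuming the Airflow 2.x TaskFlow API (parameter naming varies slightly across 2.x releases). The DAG id, schedule, staging paths, and single-row example data are hypothetical.

```python
# Minimal sketch of a daily ingest-transform-load DAG using the Airflow 2.x
# TaskFlow API. The DAG id, schedule, paths, and transformation are illustrative
# placeholders for one of the secure ETL/ELT pipelines described above.
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["cdah"])
def climate_ingest():
    @task
    def extract() -> str:
        # In practice this step would pull from an open data API or object store.
        raw_path = "/tmp/era5_daily_raw.csv"  # assumed staging location
        pd.DataFrame({"district_id": ["D001"], "temp_2m_k": [300.55]}).to_csv(raw_path, index=False)
        return raw_path

    @task
    def transform(raw_path: str) -> str:
        df = pd.read_csv(raw_path)
        df["temp_2m_c"] = df["temp_2m_k"] - 273.15  # example harmonization step
        clean_path = "/tmp/era5_daily_clean.parquet"
        df.to_parquet(clean_path, index=False)
        return clean_path

    @task
    def load(clean_path: str) -> None:
        # Placeholder for the write into the data lake / feature store.
        print(f"would load {clean_path} into the lake")

    load(transform(extract()))


climate_ingest()
```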
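
For the data-quality responsibility, a minimal check is sketched below, assuming the classic pandas API of Great Expectations (newer releases use a context-based “fluent” API instead). The column names, value ranges, and staging path are illustrative assumptions.

```python
# Minimal data-quality sketch using the classic pandas API of Great Expectations
# (newer releases use a context/"fluent" API instead). Column names, ranges, and
# the staging path are assumptions for illustration.
import great_expectations as ge
import pandas as pd

df = pd.read_parquet("staging/district_weekly_features.parquet")  # assumed staging table
gdf = ge.from_pandas(df)

gdf.expect_column_values_to_not_be_null("district_id")
gdf.expect_column_values_to_be_between("temp_2m_c", min_value=-60, max_value=60)
gdf.expect_column_values_to_be_between("malaria_cases", min_value=0)

results = gdf.validate()
if not results.success:
    # In a pipeline this would fail the task and surface the offending expectations.
    raise ValueError(f"data-quality checks failed: {results}")
```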
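
For the ingestion and harmonization responsibility, the sketch below turns an ERA5 2m-temperature extract into a tidy district-week table. The file paths, the "t2m" variable name, and the grid-to-district lookup are assumptions, not a description of existing CDAH assets.

```python
# Minimal sketch: harmonizing an ERA5 2m-temperature NetCDF extract into a tidy
# district-week table. The file paths, variable name ("t2m"), and the grid-to-
# district lookup are assumptions about how a CDAH staging area might look.
import pandas as pd
import xarray as xr

ds = xr.open_dataset("staging/era5_t2m.nc")  # assumed hourly 2m temperature, Kelvin
daily = (ds["t2m"].resample(time="1D").mean() - 273.15).rename("temp_2m_c")  # Kelvin -> Celsius

# Flatten the grid and attach a (hypothetical) lookup keyed by grid cell.
df = daily.to_dataframe().reset_index()
lookup = pd.read_csv("reference/grid_to_district.csv")  # assumed lat/lon -> district map
df = df.merge(lookup, on=["latitude", "longitude"], how="inner")

weekly = (
    df.groupby(["district_id", pd.Grouper(key="time", freq="W-MON")])["temp_2m_c"]
      .mean()
      .reset_index()
      .rename(columns={"time": "week_start"})
)
weekly.to_parquet("curated/district_weekly_t2m.parquet", index=False)
```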
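
For the data-services responsibility, a minimal read-only endpoint is sketched with FastAPI below. The route, parameters, and underlying parquet file are illustrative and do not represent the actual data-service API contract.

```python
# Minimal sketch of a read-only data-service endpoint built with FastAPI. The
# route, query parameter, and curated parquet path are illustrative, not an
# agreed CDAH API contract.
import pandas as pd
from fastapi import FastAPI, HTTPException

app = FastAPI(title="CDAH data service (sketch)")
CURATED_PATH = "curated/district_weekly_t2m.parquet"  # assumed curated dataset


@app.get("/v1/temperature/{district_id}")
def weekly_temperature(district_id: str, limit: int = 52):
    """Return the most recent weekly mean 2m temperatures for a district."""
    df = pd.read_parquet(CURATED_PATH)
    rows = df[df["district_id"] == district_id].sort_values("week_start").tail(limit)
    if rows.empty:
        raise HTTPException(status_code=404, detail="unknown district or no data")
    return rows.to_dict(orient="records")

# Run locally with, e.g.:  uvicorn data_service:app --reload
```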

What We’re Looking For

  • Deep technical expertise: 8+ years in data engineering, with a strong track record designing and operating large-scale data lakes and pipelines. Demonstrated experience discovering, evaluating and integrating diverse open-source data streams for ML pipelines.

  • DataOps & cloud proficiency: Expertise in Python/SQL, Spark/Flink, Airflow, dbt, Kafka, Docker, Kubernetes, CI/CD (GitOps), and AWS/Azure/GCP.

  • API & microservices: Proven ability to design, implement, and secure RESTful APIs and data service micro-architectures. 

  • Consulting acumen: Exceptional stakeholder management, technical storytelling, and client-facing presentation skills, ideally honed at a top-tier consulting firm or tech organization.

  • Autonomous delivery: Demonstrated capacity to own complex projects end-to-end, navigate ambiguity, and deliver production-ready solutions with minimal oversight.

Preferred Qualifications

  • Prior engagement in global health, One Health, or climate-health data initiatives.

  • Familiarity with data-governance frameworks (e.g., GDPR, HIPAA) and cybersecurity best practices.

  • Experience designing and delivering technical training or bootcamps. 

  • Contributions to open-source digital public goods or curated registries. 

Why You’ll Love This Role

  • High-impact mission: Your work will directly strengthen early warning systems and resilience in climate-vulnerable regions. 

  • Technical leadership: Own the design and delivery of the CDAH's data backbone.

  • Innovation-friendly environment: Leverage cutting-edge big data and cloud technologies in a dynamic, open-source ecosystem.

  • Global collaboration: Engage a diverse network of public-health experts, policymakers, and open-source communities. 

Please submit your résumé, a brief cover letter outlining your most relevant data projects or consulting engagements, and links to GitHub repos or model demos.

Malaria No More is an equal opportunity employer, and all qualified applicants will be considered without regard to race, color, religion, sex, disability status, sexual orientation, gender identity, national origin, veteran status, or any other characteristic protected by law. We are committed to fostering a diverse and inclusive workplace and provide equal opportunities in all terms and conditions of employment.

Top Skills

Airflow
AWS
Azure
Dbt
Docker
Flink
GCP
Kafka
Kubernetes
Python
Spark
SQL
