
Malaria No More

Data Engineer

Remote
2 Locations
Senior level
The Data Engineer will architect data lakes, curate datasets, automate data services, and develop training labs, leveraging cloud technologies and ensuring data quality for AI applications in climate and health.
The Institute for Health Modeling and Climate Solutions (IMACS) is a global center of excellence, hosted by Malaria No More, with the mission to empower the world’s most climate-vulnerable countries with the tools, data, and expertise needed to predict, prevent, and respond to climate-sensitive health threats.  

IMACS is redefining how climate intelligence is operationalized in public health by building and scaling AI-powered digital public goods that integrate and model climate and health data. Through the application of machine learning, interoperable platforms, and next-generation early warning systems, IMACS enables real-time risk detection and proactive responses at scale. IMACS supports countries through co-designed implementation pathways: orchestrating data cooperation, strengthening national health and climate information systems with tailored innovations, training frontline actors and policymakers, and institutionalizing their use through clear SOPs and sustainability guidelines. By unlocking the value of climate and health data, IMACS helps transform fragmented information into strategic, actionable knowledge, enabling smarter decisions, better preparedness, and more resilient health systems in the era of climate disruption.

Backed by the Patrick J. McGovern Foundation, we are building a Central Data & Analytics Hub (CDAH) to advance IMACS’ climate-health AI foundation model and related digital public goods, as well as a training program to equip public health professionals with the knowledge and tools required to make data-informed decisions at the intersection of climate and health.

The CDAH will be a cloud-native, open-source “operating system” for integrated climate and health intelligence, built on five pillars:

  • AI R&D environment: Ingests multi-modal climate, environmental, epidemiological and socio-demographic data into a unified data lake & feature store; supports Kubeflow/PyTorch/TensorFlow pipelines with MLflow registry, automated benchmarking, architecture search, transfer learning and uncertainty-aware modeling (a minimal model-registry sketch follows this list).

  • Digital tool marketplace & public goods registry: User-facing portal for dashboards, mobile apps and alerting platforms; structured backend registry of pre-trained model packages, microservices, ETL scripts, governance adapters, metadata and version history. 

  • Systems integration & deployment layer: Middleware adapters and Kafka messaging to plug AI services into DHIS2, HMIS, IDSR and similar platforms; Terraform/Ansible IaC, identity management, end-to-end encryption and compliance with data-governance standards. 

  • Training environment: Web portal and virtual bootcamp infrastructure hosting open-access modules, instructor-led sessions, hands-on Jupyter labs, code templates and certification tracks on climate-health AI workflows and interoperability. 

  • Real-world evaluation sandbox: Controlled simulation environment replicating public-health workflows, climate variability and institutional constraints; structured feedback loops for piloting, validating and refining tools prior to full-scale rollout. 
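
To make the AI R&D pillar concrete, the sketch below logs a hypothetical baseline model to an MLflow tracking server and registry. It is illustrative only: the experiment name, feature columns, curated-table path, and the random-forest model are placeholder assumptions, not part of the CDAH specification.

```python
# Minimal sketch: logging an illustrative baseline model to an MLflow tracking
# server and registry. The experiment name, feature columns, parquet path, and
# the model itself are hypothetical placeholders, not part of the CDAH design.
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical curated table: weekly district-level climate features and case counts.
df = pd.read_parquet("curated/district_weekly_features.parquet")
X = df[["rainfall_mm", "temp_2m_c", "ndvi", "pop_density"]]
y = df["malaria_cases"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("malaria-risk-baseline")
with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("mae", mae)
    # Registering the run's model makes it discoverable to downstream services.
    mlflow.sklearn.log_model(model, "model", registered_model_name="malaria-risk-rf")
```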

What You’ll Do

  • Architect the data backbone: Lead design of a multi-tenant data lake & feature store; define schemas, metadata standards, and secure ETL/ELT pipelines for climate, environmental, epidemiological, and socio-demographic data (a skeletal pipeline sketch follows this list).

  • Source & curate open-source datasets: Identify, evaluate and onboard public climate, environmental, epidemiological and socio-demographic data (e.g., ERA5/Copernicus, MODIS, WHO, UN, university repositories, open-API feeds), ensuring metadata completeness and licensing compliance for downstream model training.

  • Automate data quality assurance & governance: Build unit/integration tests and data-quality checks (Great Expectations/dbt), track lineage, and enforce access controls (an example check is sketched after this list).

  • Ingest and harmonize datasets: Operationalize ingestion, cleansing, and harmonization of ERA5, Sentinel, GPM, EHR, mobility, and demographic datasets; ensure interoperability with DHIS2/HMIS (a harmonization sketch follows this list).

  • Automate data services: Develop reusable validation libraries, transformation scripts, and secure REST/GraphQL APIs to power downstream AI models and dashboards (see the endpoint sketch after this list). Manage the data-service API contract; the AI/ML Engineer manages model APIs.

  • Develop training labs: Author reference ETL scripts, notebooks, and architecture patterns for “AI-ready” datasets; validate that bootcamp exercises reflect real-world data challenges.

  • Co-lead bootcamps: Guide participants through hands-on ETL labs, troubleshoot integration issues, and refine training materials based on feedback. 

  • Publish open-source components: Package and release ETL modules, transformation libraries, and interoperability adapters to the public-goods registry under permissive licenses. 
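
For the “Architect the data backbone” responsibility, the following is a skeletal extract-transform-load pipeline, assuming the Airflow 2.x TaskFlow API (parameter naming varies slightly across 2.x releases). The DAG id, schedule, staging paths, and single-row example data are hypothetical.

```python
# Minimal sketch of a daily ingest-transform-load DAG using the Airflow 2.x
# TaskFlow API. The DAG id, schedule, paths, and transformation are illustrative
# placeholders for one of the secure ETL/ELT pipelines described above.
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["cdah"])
def climate_ingest():
    @task
    def extract() -> str:
        # In practice this step would pull from an open data API or object store.
        raw_path = "/tmp/era5_daily_raw.csv"  # assumed staging location
        pd.DataFrame({"district_id": ["D001"], "temp_2m_k": [300.55]}).to_csv(raw_path, index=False)
        return raw_path

    @task
    def transform(raw_path: str) -> str:
        df = pd.read_csv(raw_path)
        df["temp_2m_c"] = df["temp_2m_k"] - 273.15  # example harmonization step
        clean_path = "/tmp/era5_daily_clean.parquet"
        df.to_parquet(clean_path, index=False)
        return clean_path

    @task
    def load(clean_path: str) -> None:
        # Placeholder for the write into the data lake / feature store.
        print(f"would load {clean_path} into the lake")

    load(transform(extract()))


climate_ingest()
```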
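
For the data-quality responsibility, a minimal check is sketched below, assuming the classic pandas API of Great Expectations (newer releases use a context-based “fluent” API instead). The column names, value ranges, and staging path are illustrative assumptions.

```python
# Minimal data-quality sketch using the classic pandas API of Great Expectations
# (newer releases use a context/"fluent" API instead). Column names, ranges, and
# the staging path are assumptions for illustration.
import great_expectations as ge
import pandas as pd

df = pd.read_parquet("staging/district_weekly_features.parquet")  # assumed staging table
gdf = ge.from_pandas(df)

gdf.expect_column_values_to_not_be_null("district_id")
gdf.expect_column_values_to_be_between("temp_2m_c", min_value=-60, max_value=60)
gdf.expect_column_values_to_be_between("malaria_cases", min_value=0)

results = gdf.validate()
if not results.success:
    # In a pipeline this would fail the task and surface the offending expectations.
    raise ValueError(f"data-quality checks failed: {results}")
```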
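
For the ingestion and harmonization responsibility, the sketch below turns an ERA5 2m-temperature extract into a tidy district-week table. The file paths, the "t2m" variable name, and the grid-to-district lookup are assumptions, not a description of existing CDAH assets.

```python
# Minimal sketch: harmonizing an ERA5 2m-temperature NetCDF extract into a tidy
# district-week table. The file paths, variable name ("t2m"), and the grid-to-
# district lookup are assumptions about how a CDAH staging area might look.
import pandas as pd
import xarray as xr

ds = xr.open_dataset("staging/era5_t2m.nc")  # assumed hourly 2m temperature, Kelvin
daily = (ds["t2m"].resample(time="1D").mean() - 273.15).rename("temp_2m_c")  # Kelvin -> Celsius

# Flatten the grid and attach a (hypothetical) lookup keyed by grid cell.
df = daily.to_dataframe().reset_index()
lookup = pd.read_csv("reference/grid_to_district.csv")  # assumed lat/lon -> district map
df = df.merge(lookup, on=["latitude", "longitude"], how="inner")

weekly = (
    df.groupby(["district_id", pd.Grouper(key="time", freq="W-MON")])["temp_2m_c"]
      .mean()
      .reset_index()
      .rename(columns={"time": "week_start"})
)
weekly.to_parquet("curated/district_weekly_t2m.parquet", index=False)
```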
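
For the data-services responsibility, a minimal read-only endpoint is sketched with FastAPI below. The route, parameters, and underlying parquet file are illustrative and do not represent the actual data-service API contract.

```python
# Minimal sketch of a read-only data-service endpoint built with FastAPI. The
# route, query parameter, and curated parquet path are illustrative, not an
# agreed CDAH API contract.
import pandas as pd
from fastapi import FastAPI, HTTPException

app = FastAPI(title="CDAH data service (sketch)")
CURATED_PATH = "curated/district_weekly_t2m.parquet"  # assumed curated dataset


@app.get("/v1/temperature/{district_id}")
def weekly_temperature(district_id: str, limit: int = 52):
    """Return the most recent weekly mean 2m temperatures for a district."""
    df = pd.read_parquet(CURATED_PATH)
    rows = df[df["district_id"] == district_id].sort_values("week_start").tail(limit)
    if rows.empty:
        raise HTTPException(status_code=404, detail="unknown district or no data")
    return rows.to_dict(orient="records")

# Run locally with, e.g.:  uvicorn data_service:app --reload
```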

What We’re Looking For

  • Deep technical expertise: 8+ years in data engineering, with a strong track record designing and operating large-scale data lakes and pipelines. Demonstrated experience discovering, evaluating and integrating diverse open-source data streams for ML pipelines.

  • DataOps & cloud proficiency: Expertise in Python/SQL, Spark/Flink, Airflow, dbt, Kafka, Docker, Kubernetes, CI/CD (GitOps), and AWS/Azure/GCP.

  • API & microservices: Proven ability to design, implement, and secure RESTful APIs and data service micro-architectures. 

  • Consulting acumen: Exceptional stakeholder management, technical storytelling, and client-facing presentation skills, ideally honed at a top-tier consulting firm or tech organization.

  • Autonomous delivery: Demonstrated capacity to own complex projects end-to-end, navigate ambiguity, and deliver production-ready solutions with minimal oversight.

Preferred Qualifications

  • Prior engagement in global health, One Health, or climate-health data initiatives.

  • Familiarity with data-governance frameworks (e.g., GDPR, HIPAA) and cybersecurity best practices.

  • Experience designing and delivering technical training or bootcamps. 

  • Contributions to open-source digital public goods or curated registries. 

Why You’ll Love This Role

  • High-impact mission: Your work will directly strengthen early warning systems and resilience in climate-vulnerable regions. 

  • Technical leadership: Own the design and delivery of the CDAH's data backbone.

  • Innovation-friendly environment: Leverage cutting-edge big data and cloud technologies in a dynamic, open-source ecosystem.

  • Global collaboration: Engage a diverse network of public-health experts, policymakers, and open-source communities. 

Please submit your résumé, a brief cover letter outlining your most relevant data projects or consulting engagements, and links to GitHub repos or model demos.

Malaria No More is an equal opportunity employer, and all qualified applicants will be considered without regard to race, color, religion, sex, disability status, sexual orientation, gender identity, national origin, veteran status, or any other characteristic protected by law. We are committed to fostering a diverse and inclusive workplace and provide equal opportunities in all terms and conditions of employment.

Top Skills

Airflow
AWS
Azure
Dbt
Docker
Flink
GCP
Kafka
Kubernetes
Python
Spark
SQL
