The Lead Data Engineer will design and maintain pipelines for event data ingestion and validation, ensuring operational reliability and consistency for analytics.
About HighLevel:
HighLevel is an AI-powered business operating system that gives agencies, entrepreneurs and SMBs the infrastructure to build, automate and scale. Today, HighLevel supports SMBs across 150+ countries, fueling community-driven growth rooted in real customer outcomes.
To date, businesses operating on HighLevel have generated over $7 billion in ecosystem value, demonstrating the impact of shared infrastructure at scale. By centralizing conversations, automation and intelligence into one system, we help businesses move faster, reduce complexity and execute efficiently.
Behind the platform, HighLevel powers more than 4 billion API hits and 2.5 billion message events daily. With 250 terabytes of distributed data, 250+ microservices and over 1 million domain names supported, our architecture is built for performance, resilience and long-term scalability.
Our People
With over 2,000 team members across 10+ countries, HighLevel operates as a global, remote-first organization built for speed and ownership. We value initiative, clarity and execution, creating space for ambitious people to build systems that support millions of businesses worldwide. Here, innovation thrives, ideas are celebrated and people come first, no matter where they call home.
Our Impact
Every month, HighLevel enables more than 1.5 billion messages, 200 million leads and 20 million conversations for the more than 1 million businesses we support. Behind those numbers are real people building independence, expanding opportunity and creating measurable impact. We’re proud to be a part of that.
Learn more about us on our YouTube Channel or Blog Posts
About the Role:
We are looking for a Lead Data Engineer to own the event ingestion and identity layer that connects product instrumentation to downstream analytical systems.
This role focuses on the operational reliability and correctness of event and identity data as it moves through the data platform. You will design and operate pipelines, schema validation, and replay workflows that ensure product events remain consistent and safe to use for analytics and customer-facing reporting.
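To illustrate the kind of schema validation and contract enforcement this role owns, here is a minimal, stdlib-only Python sketch; the event shape and field names are hypothetical examples, not HighLevel's actual schema:

```python
# Minimal event-contract check: every event must carry the required
# fields with the expected types before it is admitted downstream.
# The contract below is a hypothetical example for illustration only.

EVENT_CONTRACT = {
    "event_name": str,
    "tenant_id": str,
    "anonymous_id": str,
    "timestamp": float,
}

def validate_event(event: dict) -> list[str]:
    """Return a list of contract violations; an empty list means valid."""
    errors = []
    for field, expected_type in EVENT_CONTRACT.items():
        if field not in event:
            errors.append(f"missing required field: {field}")
        elif not isinstance(event[field], expected_type):
            errors.append(
                f"field {field!r} has type {type(event[field]).__name__}, "
                f"expected {expected_type.__name__}"
            )
    return errors
```

In practice, checks like this usually live behind a schema registry with versioned, backward-compatible contracts rather than a hard-coded dict, so producers and consumers can evolve independently.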
You will work closely with product engineering teams on instrumentation patterns, with the CDP team on event contracts and definitions, and with platform teams to ensure event infrastructure and analytical systems scale reliably. This role builds the foundational event and identity datasets required for reliable downstream modeling. Behavioral models, canonical entities, and business analytics datasets are owned by the analytics engineering team.
Responsibilities:
- Define event schemas, required fields, and compatibility rules in collaboration with the CDP team
- Implement automated validation and contract enforcement to prevent breaking schema changes
- Maintain versioning and compatibility guarantees for event producers and downstream consumers
- Build and maintain pipelines that ingest, validate, and process high-volume product events
- Ensure event streams are deduplicated, ordered correctly, and safe for downstream consumption
- Partner with platform teams to ensure ingestion pipelines scale with product growth
- Define and maintain identity stitching logic across anonymous and authenticated users
- Handle identity merges, splits, and corrections while preserving tenant boundaries
- Ensure identity resolution remains explainable, deterministic, and safe for downstream datasets
- Design workflows that allow event datasets and identity graphs to be replayed or rebuilt safely
- Build tooling for historical corrections, schema evolution, and dataset reprocessing
- Ensure downstream models can be rebuilt without manual intervention when definitions evolve
- Provide guidance and tooling that help product teams emit events consistently
- Maintain validation checks and schema enforcement that catch instrumentation issues early
- Collaborate with engineering teams to evolve instrumentation safely over time
- Ensure deletion and suppression requests propagate correctly through event and identity pipelines
- Partner with governance and security teams to support policy requirements
- Define requirements and interfaces for event infrastructure and downstream analytical systems
- Work with platform teams to ensure pipelines remain reliable, scalable, and observable
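The identity-stitching responsibilities above can be sketched as a deterministic union-find over observed links between anonymous and authenticated identifiers. This is an illustrative simplification, assuming string identifiers, not a production design:

```python
# Deterministic identity-stitching sketch: anonymous and authenticated
# identifiers observed together are merged into one canonical identity.
# The identifier formats ("anon:*", "user:*") are hypothetical.

class IdentityGraph:
    def __init__(self):
        self._parent: dict[str, str] = {}

    def _find(self, node: str) -> str:
        # Path-halving find; unseen nodes start as their own root.
        self._parent.setdefault(node, node)
        while self._parent[node] != node:
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def link(self, a: str, b: str) -> None:
        """Record that two identifiers belong to the same person."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            # Deterministic merge: the lexicographically smaller root
            # wins, so replaying the same links rebuilds the same graph
            # regardless of the order they arrive in.
            keep, drop = sorted((ra, rb))
            self._parent[drop] = keep

    def canonical_id(self, node: str) -> str:
        return self._find(node)
```

For example, linking `anon:42` and `anon:43` to the same `user:alice` resolves both anonymous IDs to one canonical identity, and the lexicographic tie-break keeps rebuilds reproducible under replay, which is the property the replay and correction workflows above depend on.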
Requirements:
- 7+ years of experience in data engineering, platform engineering, or product data roles
- Strong experience building and operating event ingestion or streaming pipelines
- Experience implementing schema validation, data contracts, or event governance frameworks
- Strong SQL and Python, with experience building data processing or validation tooling
- Familiarity with identity resolution, entity resolution, or customer identity systems
- Experience operating analytical data systems or large-scale event datasets
EEO Statement:
The company is an Equal Opportunity Employer. As an employer subject to affirmative action regulations, we invite you to voluntarily provide the following demographic information. This information is used solely for compliance with government record-keeping, reporting, and other legal requirements. Providing this information is voluntary and refusal to do so will not affect your application status. This data will be kept separate from your application and will not be used in the hiring decision.
#LI-Remote #LI-NJ1