HEROIC

Threat Intelligence Data Engineer

Reposted 13 Days Ago
In-Office
Pune, Maharashtra, IND
Mid level
The Threat Intelligence Data Engineer designs and operates automated data collection systems for cybersecurity, focusing on discovering and indexing data from various web sources, including dark web and decentralized networks.
About the Role: HEROIC Cybersecurity (HEROIC.com) is seeking a senior-level Threat Intelligence Data Engineer - Automated Collection & Dark Web Intelligence to design, build, and operate fully automated intelligence collection systems that power our AI-driven cybersecurity and breach intelligence platforms.
This role owns the end-to-end discovery, acquisition, and ingestion pipeline, which continuously crawls, extracts, indexes, and normalizes millions of new artifacts daily. Sources include documents, chats, forums, leaked datasets, repositories, threat actor communications, hacker marketplaces, unsecured infrastructure, and decentralized networks across the surface web, deep web, dark web, and anonymized networks.


Our Threat Research Team’s mission is aggressive: achieve near-total coverage of global breach and leak data with 99%+ automation. Your work directly enables HEROIC’s ability to identify exposures before they are weaponized.

What You Will Do:
Automated Intelligence Collection & Discovery
  • Architect and operate large-scale, distributed crawling and discovery systems across:

    • Surface web, deep web, and dark web

    • Hacker forums, underground marketplaces, and breach communities

    • Chat platforms (Telegram, Discord, IRC, WhatsApp, etc.)

    • Paste sites, code repositories, and social platforms used for breach disclosure

  • Continuously discover, archive, and download newly released datasets, logs, credentials, and artifacts the moment they appear

Dark Web, Anonymized & Decentralized Networks
  • Build automated collectors and archivers for anonymized and decentralized networks including:

    • Tor (.onion), I2P, ZeroNet, Freenet, IPFS, GNUnet, Lokinet, Yggdrasil, and similar systems

  • Design resilient workflows for unreliable, adversarial, or ephemeral data sources

  • Normalize and index data from non-traditional network protocols and formats

Infrastructure & Exposure Discovery
  • Develop automated scanning systems to identify:

    • Unsecured databases (Elasticsearch, MySQL, PostgreSQL, MongoDB, etc.)

    • Exposed cloud storage (Amazon S3, Azure Blob Storage, Google Cloud Storage, DigitalOcean Spaces)

    • Open FTP servers, backups, and misconfigured archives

  • Monitor and ingest data from file hosting and distribution platforms commonly used for breach dumps

Pipeline Engineering & Operations
  • Build ETL pipelines to clean, normalize, enrich, and index structured and unstructured data

  • Integrate collected intelligence into centralized databases and search systems

  • Design APIs and internal tooling to support downstream analysis and AI/ML workflows

  • Implement advanced anti-bot, evasion, and resiliency techniques (proxy rotation, fingerprinting, CAPTCHA mitigation, session handling)

  • Automate deployment, scaling, and monitoring using Docker, Kubernetes, and cloud infrastructure

  • Continuously optimize performance, reliability, and cost efficiency of crawler clusters



Requirements
  • Minimum 4 years of hands-on experience in data engineering, intelligence collection, crawling, or distributed data pipelines

  • Strong Python expertise and experience with frameworks such as Scrapy, Playwright, Selenium, or custom async systems

  • Proven experience operating high-volume, automated data collection systems in production

  • Deep understanding of web protocols, HTTP, DOM parsing, and adversarial scraping environments

  • Experience with asynchronous, concurrent, and distributed architectures

  • Familiarity with SQL and NoSQL databases (PostgreSQL, MongoDB, Elasticsearch, Cassandra)

  • Strong proficiency in Linux/Unix, shell scripting, and Git-based workflows

  • Experience deploying and operating systems using Docker, Kubernetes, AWS, or GCP

  • Excellent analytical, debugging, and problem-solving skills

  • Strong written and verbal communication skills

Preferred / High-Value Experience
  • Direct experience with dark web intelligence, breach data, OSINT, or threat research

  • Familiarity with Tor, I2P, underground forums, stealer logs, or credential ecosystems

  • Experience processing large breach datasets or stealer logs

  • Background working in adversarial data environments

  • Exposure to AI/ML-driven intelligence platforms


Benefits
  • Position Type: Full-time
  • Location: Remote in India. Work from wherever you please! Your home, the beach, our offices, etc. 
  • Compensation: USD 1,300-2,000 per month
  • Professional Growth: Amazing upward mobility in a rapidly expanding company.
  • Innovative Culture: Be part of a team that leverages AI and cutting-edge technologies. 

About Us: HEROIC Cybersecurity (HEROIC.com) is building the future of cybersecurity. Unlike traditional cybersecurity solutions, HEROIC takes a predictive and proactive approach to intelligently secure our users before an attack or threat occurs. Our work environment is fast-paced, challenging, and exciting. At HEROIC, you’ll work with a team of passionate, engaged individuals dedicated to intelligently securing the technology of people all over the world.
