Division50
Data & AI Specialist - Data Scraping, Enrichment & Quality Assurance
The Data & AI Specialist will build and maintain data pipelines, ensure data quality, integrate AI tools, and collaborate with teams to ensure data accuracy and process optimization.
Overview
We’re looking for a data-obsessed explorer who can build and maintain pipelines that collect, clean, and enhance large volumes of data, then apply AI tools to keep it accurate, useful, and ready for analysis. This is initially a project-based role with the possibility of evolving into a full-time contract based on performance and business needs.
Key Responsibilities
- Data Acquisition & Scraping
  - Design, develop, and maintain scalable web-scraping systems and APIs to collect structured and unstructured data from diverse sources.
  - Ensure compliance with data privacy laws (GDPR, CCPA) and site-specific terms of service.
- Data Enrichment & Transformation
  - Implement pipelines to clean, normalize, and enrich raw data using third-party datasets, NLP (natural language processing), and machine learning techniques.
  - Build automated matching and deduplication processes to maintain a unified source of truth.
- Quality Assurance & Monitoring
  - Create automated QA checks to validate data accuracy, completeness, and consistency.
  - Set up monitoring and alert systems to catch anomalies or pipeline failures early.
- AI & Process Optimization
  - Integrate AI models for entity extraction, text classification, and predictive enrichment.
  - Work with the data science team to design features that feed analytics and machine learning models.
- Collaboration & Documentation
  - Partner with product, engineering, and analytics teams to define data requirements and priorities.
  - Maintain clear technical documentation and data lineage records.
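As a rough illustration of the deduplication and QA work described above, here is a minimal Python sketch. The `Record` fields, the email-based match key, and the simple email pattern are all assumptions made for illustration; they do not describe Division50's actual pipeline, and a production matcher would likely use fuzzier keys and more checks.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical schema, chosen only for this example.
    name: str
    email: str


# Deliberately loose pattern: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_key(record):
    """Build a match key from the email: trimmed and lowercased."""
    return record.email.strip().lower()


def deduplicate(records):
    """Keep the first record seen for each normalized key."""
    seen = {}
    for record in records:
        seen.setdefault(normalize_key(record), record)
    return list(seen.values())


def qa_report(records):
    """Count records that fail basic completeness and format checks."""
    missing_name = sum(1 for r in records if not r.name.strip())
    bad_email = sum(1 for r in records if not EMAIL_RE.match(r.email.strip()))
    return {
        "total": len(records),
        "missing_name": missing_name,
        "bad_email": bad_email,
    }


if __name__ == "__main__":
    raw = [
        Record("Ada Lovelace", "ada@example.com"),
        Record("Ada L.", " ADA@example.com "),  # duplicate after normalization
        Record("", "grace@example.com"),        # missing name
        Record("Bob", "not-an-email"),          # malformed email
    ]
    unique = deduplicate(raw)
    print(qa_report(unique))
```

In a scheduled pipeline, a report like this would typically feed the monitoring and alerting step, for example by failing the run when a count exceeds an agreed threshold.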
Requirements
- Strong programming skills in Python (Scrapy, BeautifulSoup, Selenium, Playwright) or equivalent languages.
- Experience with data pipelines and ETL tools (Airflow, Prefect, or similar).
- Proficiency in SQL/NoSQL databases and data warehousing (e.g., BigQuery, Snowflake).
- Familiarity with cloud platforms (AWS, GCP, or Azure) and containerization (Docker/Kubernetes).
- Knowledge of machine learning workflows and libraries (scikit-learn, spaCy, Hugging Face) is a big plus.
- Solid understanding of data privacy and ethical data collection practices.
- Experience with LLMs (large language models) for text enrichment.
- Background in data visualization or BI tools (Tableau, Looker, Power BI).
- Familiarity with real-time streaming data (Kafka, Kinesis).
- Detail-oriented with a knack for spotting hidden data issues.
- Curious problem solver who loves automation and efficiency.
- Comfortable in a fast-paced environment where requirements evolve quickly.
Benefits
Remote work.
Flexible work schedule.
Opportunity for a long-term contract.
Top Skills
Airflow
AWS
Azure
BeautifulSoup
BigQuery
Docker
GCP
Hugging Face
Kubernetes
NoSQL
Playwright
Prefect
Python
scikit-learn
Scrapy
Selenium
Snowflake
spaCy
SQL