
Fusemachines

Data Scientist

Posted 22 Days Ago
In-Office
Pune, Maharashtra
Mid level
As a Data Scientist, you will architect scalable solutions, apply advanced analytics, develop data solutions, and communicate insights to improve business strategies.

About Fusemachines
Fusemachines is a 10+ year old AI company, dedicated to delivering state-of-the-art AI products and solutions to a diverse range of industries. Founded by Sameer Maskey, Ph.D., an Adjunct Associate Professor at Columbia University, our company is on a steadfast mission to democratize AI and harness the power of global AI talent from underserved communities. With a robust presence in four countries and a dedicated team of over 400 full-time employees, we are committed to fostering AI transformation journeys for businesses worldwide. At Fusemachines, we not only bridge the gap between AI advancement and its global impact but also strive to deliver the most advanced technology solutions to the world.
About the Role:
Location: Remote | Contractual Full-time
We are seeking a well-rounded Data Scientist with hands-on Python experience and a proven ability to support software activities across an Agile development lifecycle, who will lead work on a cloud-based big data application built with a variety of technologies. The ideal candidate combines strong technical, analytical, and interpersonal skills, and will lead the developers on the team to achieve the architecture and design objectives agreed with stakeholders.

Role Description

  • Work with developers on the team to meet product deliverables.
  • Work independently and collaboratively on a multi-disciplined project team in an Agile development environment.
  • Contribute detailed design and architectural discussions as well as customer requirements sessions to support the implementation of code and procedures for our big data product.
  • Design and develop clear, maintainable code with automated open-source test functions.
  • Identify opportunities for code and design optimization and drive their implementation.
  • Learn and integrate with a variety of systems, APIs, and platforms.
  • Interact with a multi-disciplined team to clarify, analyze, and assess requirements.
  • Be actively involved in design, development, and testing activities in big data applications.

Key Responsibilities

Data Engineering & Processing:
  • Develop scalable data pipelines using PySpark for processing large datasets.
  • Work extensively in Databricks for collaborative data science workflows and model deployment.
  • Handle messy, unstructured, and semi-structured data, performing thorough Exploratory Data Analysis (EDA).
  • Apply appropriate statistical measures and hypothesis testing to derive insights and validate assumptions.
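
The hypothesis-testing bullet above typically means comparing two samples and checking whether an observed difference is real. As an illustrative, stdlib-only sketch (the data and variable names are invented; in practice you would reach for scipy.stats or Spark aggregations), here is Welch's t statistic for two independent samples:

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Standard error of the difference in means.
    se = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / se

# Hypothetical example: job latencies (seconds) before and after a pipeline change.
before = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9]
after = [11.2, 11.5, 11.0, 11.8, 11.3, 11.1]
t = welch_t(before, after)  # large |t| suggests the difference is not noise
```

A large absolute t value (compared against the t distribution's critical value for the chosen significance level) is what lets you reject the null hypothesis of equal means.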
Data Analysis & Modeling:
  • Write complex SQL queries for data extraction, transformation, and analysis.
  • Build and validate predictive models using techniques such as Gradient Boosting Machines (GBMs, e.g., XGBoost, LightGBM) and Generalized Linear Models (GLMs, e.g., logistic regression, Poisson regression).
  • Apply unsupervised learning techniques like clustering (K-Means, DBSCAN), PCA, and anomaly detection.
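
To make the clustering bullet concrete, here is a minimal pure-Python K-Means sketch (illustrative only; production work would use scikit-learn's KMeans or Spark MLlib, and the sample points are invented):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-Means: returns final centroids and a label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from the data itself
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = []
        for pt in points:
            dists = [sum((p - q) ** 2 for p, q in zip(pt, c)) for c in centroids]
            labels.append(dists.index(min(dists)))
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Two well-separated hypothetical clusters in 2-D.
pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
cents, labs = kmeans(pts, k=2)
```

The same assign/update loop underlies library implementations; they add smarter initialization (k-means++) and convergence checks.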
Automation & Optimization:
  • Automate data workflows and model training pipelines using scheduling tools (e.g., Airflow, Databricks Jobs).
  • Optimize model performance and data processing efficiency.
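
At heart, the workflow-automation bullet is about dependency-ordered execution: downstream tasks run only after their upstreams succeed. A stdlib-only sketch of that idea (task names and logic are invented; Airflow or Databricks Jobs would express the same graph with their own operators):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline steps; each reads/writes a shared state dict.
def extract(state):
    state["raw"] = [3, 1, 2]

def transform(state):
    state["clean"] = sorted(state["raw"])

def train(state):
    state["model_input"] = sum(state["clean"])

# The DAG: each task maps to the set of tasks it depends on.
dag = {"extract": set(), "transform": {"extract"}, "train": {"transform"}}
tasks = {"extract": extract, "transform": transform, "train": train}

state = {}
# static_order() yields a dependency-respecting execution order.
for name in TopologicalSorter(dag).static_order():
    tasks[name](state)
```

Schedulers layer retries, backfills, and cron-style triggers on top of exactly this topological ordering.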
Cloud & Deployment:
  • Basic experience with Azure or other cloud platforms (AWS, GCP) for data storage, compute, and model deployment.
  • Familiarity with cloud-native tools like Azure Data Lake, Azure ML, or equivalent.

Required Skills:

  • Programming Languages: Python (with PySpark), SQL
  • Tools & Platforms: Databricks, Azure (or other cloud platforms), Git
  • Libraries & Frameworks: scikit-learn, pandas, numpy, matplotlib/seaborn, XGBoost/LightGBM
  • Statistical Knowledge: Hypothesis testing, confidence intervals, correlation analysis
  • Machine Learning: Supervised and Unsupervised learning, model evaluation metrics
  • Data Handling: EDA, feature engineering, dealing with missing/outlier data
  • Automation: Experience with job scheduling and pipeline automation.
Required Experience:
  • 5+ years in Data Science or related fields.
  • Hands-on experience with Databricks.
  • Experience with data cleansing, transformation, and validation.
  • Proven technical leadership on prior development projects.
  • Hands-on experience with versioning tools such as GitHub, Azure DevOps, Bitbucket, etc.
  • Hands-on experience building pipelines in GitHub (or Azure DevOps, etc.).
  • Hands-on experience using Relational Databases, such as Oracle, SQL Server, MySQL, Postgres or similar.
  • Experience using Markdown to document code in repositories or automated documentation tools like PyDoc.
  • Strong written and verbal communication skills.
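
On the documentation requirement: Python docstrings are the raw material that pydoc (and Markdown-based docs) render. A small sketch of that round trip (the function and its docstring are hypothetical):

```python
import pydoc

def load_table(name: str) -> list:
    """Load the named table and return its rows.

    Args:
        name: table identifier in the warehouse (hypothetical).
    """
    return []

# pydoc renders the docstring as plain-text documentation.
text = pydoc.render_doc(load_table, renderer=pydoc.plaintext)
```

The same docstrings feed automated documentation tooling, so keeping them accurate pays off twice.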

Preferred Qualifications:
  • Experience with data visualization tools such as Power BI or Tableau.
  • Experience with MLOps and DevOps CI/CD tools and automation processes (e.g., Azure DevOps, GitHub, Bitbucket).
  • Experience with containers and their environments (Docker, Podman, Docker Compose, Kubernetes, Minikube, Kind, etc.).
  • Experience working in cross-functional teams and communicating insights to stakeholders.

Education
Master of Science or B.Tech degree from an accredited university
 

Fusemachines is an Equal Opportunities Employer, committed to diversity and inclusion. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or any other characteristic protected by applicable federal, state, or local laws.

Top Skills

Spark
Databricks
Git
Jupyter Notebooks
Oracle
Postgres
Pyspark
Python
SQL
SQL Server
