Shamik Basu

AI Enabled Data Scientist

2+ Years Production ML · MS Data Science @ USC · GPA 3.7 / 4.0

Projects & Impact

Production systems and research built from scratch

View all projects on GitHub →

Work History

Data Science Associate Intern (Current)

KCC Capital Partners · Los Angeles, CA · Jan 2026 – Present

  • Fine-tuning and integrating an open-source SLM into a production JavaScript/Docker chatbot service to automate client service request classification and routing, reducing handling overhead for the automation team.
  • Evaluating model outputs against baseline response quality benchmarks to guide iteration on prompt design and fine-tuning parameters.

Data Scientist

Bajaj Finserv Health · Pune, India · Nov 2023 – Dec 2024

  • Architected a real-time medical document analytics system using RAG, LangChain, and REST APIs, processing 5M+ records/month at 92% extraction accuracy and replacing a fully manual workflow.
  • Engineered LLM-based inference pipelines (GPT-3.5 Turbo) to automate high-complexity decision workflows, cutting operational costs by 72%.
  • Built modular monitoring pipelines with LangChain and Langfuse for model observability, reducing GPU compute usage by 15%.
  • Integrated ML model outputs into Power BI dashboards, cutting ad-hoc reporting turnaround time by 42% and enabling self-serve analytics for stakeholders.

Associate Data Scientist

Bajaj Finserv Health · Pune, India · Jul 2022 – Oct 2023

  • Developed a supervised ML model (Logistic Regression) for workforce performance prediction, improving efficiency outcomes by 22%.
  • Redesigned the NER-based name-matching algorithm in the fraud detection pipeline, increasing policyholder identification accuracy by 27% and reducing false positives.
  • Processed 10M+ records in Azure Synapse using SQL to deliver business intelligence reports for senior stakeholders.

Data Engineer Intern

Bajaj Finserv Health · Pune, India · Jan 2022 – Jun 2022

  • Ran A/B tests and cohort analyses to identify key user behavior patterns, improving web conversion by 37%.
  • Designed a distributed analytics system in C++, Trino, and Docker supporting 10M+ records across 200+ features.

Data Engineer Intern

Reomnify · Nov 2020 – Jan 2021

  • Engineered custom web scraping solutions for 500+ company templates with version control, reducing manual data collection overhead.

Academic Background

Master of Science, Data Science

University of Southern California

Los Angeles, CA · Jan 2025 – Dec 2026

GPA 3.7 / 4.0

Coursework: Machine Learning, Deep Learning, Data Management, Data Science

Bachelor of Technology, Computer Science

SRM Institute of Science and Technology

Chennai, India · May 2018 – Jun 2022

GPA 3.46 / 4.0

Coursework: Machine Learning, Artificial Intelligence, Data Structures & Algorithms, Probability & Queueing Theory

Technical Stack

Languages

Python, SQL, C/C++, CUDA, Bash, JavaScript, R

ML / AI

PyTorch, TensorFlow, scikit-learn, LLMs, RAG, LangChain, Langfuse, NLP, NER, BERT, spaCy, Deep Learning, Computer Vision

Data Engineering

pandas, NumPy, Apache Kafka, Spark, Trino, Informatica, A/B Testing, Predictive Modeling

Databases

MySQL, PostgreSQL, MongoDB, Azure Synapse, Snowflake, SQL Server

Visualization

Power BI, Tableau, Plotly, Matplotlib, Streamlit

Cloud & DevOps

Microsoft Azure, Docker, Kubernetes, CI/CD, Next.js, FastAPI, REST APIs, ELK Stack, GCP

Community & Mentorship

Vice President

GRIDS (Graduates Rising in Data Science), USC

Sep 2025 – Present

Lead analytics workshops, ideathons, and data-driven projects for 250+ members of USC's largest data science organization.

Graduate Student Mentor

USC Viterbi School of Engineering

Jan 2025 – Present

Mentoring 6 graduate students on data science career paths, technical communication, and professional development at USC.

Let's Work Together

Primary portfolio: www.shamik-basu.com