Projects and Industry Applications

From laboratory automation to production ML systems: systematic thinking applied to solve real problems, with documented decisions and measurable results.

R²=0.92 · Solar forecast accuracy
RMSE 18.83 · RUL prediction
96% · Lab time saved
5 ADRs · Documented decisions
Completed and In Use

Automated Laser Polarimetry Platform

Real-time Stokes polarization analysis. Reduced scan time from 2 hours to 5 minutes.

96% · Time reduction · 2 h to 5 min

Challenge

Characterizing polarization at each focus of a Twin-Foci ultrafast laser setup required manually rotating a motorized waveplate, reading power at each angle, and fitting data in a spreadsheet. The process took over 2 hours per dataset and was error-prone.

Solution

Built a real-time PyQt5 platform that drives a Newport ESP301 motion controller and an Ophir Nova II power meter simultaneously. It uses the Fourier-based Schaefer method to extract the complete set of Stokes parameters. The platform also includes a TDC synchronization module for photon-counting experiments, an offline analysis CLI, and a hardware simulation mode for testing without physical equipment.
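The Fourier step can be illustrated with a minimal sketch. It assumes the common rotating quarter-wave plate arrangement (waveplate followed by a fixed linear polarizer) and the intensity model I(θ) = ½(A + B sin 2θ + C cos 4θ + D sin 4θ), with angles sampled uniformly over a half turn. The function names are illustrative and are not taken from the platform, and the sign of S3 depends on the fast-axis and polarizer convention of the actual setup.

```python
import numpy as np

def simulate_scan(stokes, theta):
    """Ideal detector signal behind a rotating QWP and a fixed polarizer."""
    s0, s1, s2, s3 = stokes
    return 0.5 * (
        s0
        + s1 * np.cos(2 * theta) ** 2
        + s2 * np.sin(2 * theta) * np.cos(2 * theta)
        + s3 * np.sin(2 * theta)
    )

def stokes_from_qwp_scan(theta, intensity):
    """Recover (S0, S1, S2, S3) from discrete Fourier sums over the scan.

    Assumes theta is sampled uniformly over [0, pi) with more than 4 points,
    so the sin(2θ), cos(4θ) and sin(4θ) components are orthogonal.
    """
    theta = np.asarray(theta, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n = intensity.size
    a = 2.0 / n * intensity.sum()
    b = 4.0 / n * np.sum(intensity * np.sin(2 * theta))
    c = 4.0 / n * np.sum(intensity * np.cos(4 * theta))
    d = 4.0 / n * np.sum(intensity * np.sin(4 * theta))
    return np.array([a - c, 2 * c, 2 * d, b])

# Round trip on a made-up, partially polarized state.
theta = np.linspace(0.0, np.pi, 36, endpoint=False)
true_stokes = np.array([1.0, 0.3, -0.4, 0.5])
recovered = stokes_from_qwp_scan(theta, simulate_scan(true_stokes, theta))
print(recovered)
```

With real power-meter readings, noise enters each Fourier sum, which is where uncertainty propagation becomes relevant.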

Impact

  • 96% time reduction: from over 2 hours to roughly 5 minutes per dataset
  • Full Stokes vector extraction with uncertainty propagation from the covariance matrix
  • TDC polarization controller for synchronized photon-counting experiments
  • Publication-quality export at 300 DPI in PNG, PDF, and SVG formats
Python · PyQt5 · NumPy · SciPy · Matplotlib · PySerial · PyVISA
Complete · Private repo, launching soon

Probabilistic Solar Forecasting: Physics-Informed, Calibrated

Day-ahead solar generation forecasts for the German grid. Physics handles geometry; XGBoost learns only the residual.

R²=0.92 · Physics + XGBoost · vs R²=0.78 physics only
60% · MAE reduction · 1,552 vs 3,856 MW
CRPS 514.6 · Calibrated UQ · P90 coverage: 0.869

Challenge

Germany's Energiewende targets 80% renewable electricity by 2030. Solar is volatile and weather-dependent, and a 10% forecasting error at the midday peak translates directly into costs on the balancing market. Operators need calibrated probabilistic forecasts, not point estimates: 12 GW plus or minus 2 GW, with a coverage guarantee.

Solution

Built a two-layer physics-informed architecture on 3 years of public SMARD and Open-Meteo data (26,000+ hourly records in TimescaleDB). The physics layer uses pvlib to compute what solar output should be on a geometrically perfect day. XGBoost learns only the residual, that is, what the physics cannot see: clouds, curtailments, and measurement noise. The P10/P50/P90 intervals are calibrated with split conformal prediction. The work was managed as a 22-day research sprint with 5 ADRs, 3 weekly reports, 2 retrospectives, and a public Kanban board.
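The two layers and the calibration step can be sketched on synthetic data. Everything here is an assumption made for illustration: the data are random numbers rather than SMARD records, a least-squares fit stands in for XGBoost to keep the example dependency-free, and the conformity score is the absolute residual, which yields symmetric intervals. The project's actual features, model, and score may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: a "physics" estimate and a cloud factor it cannot see.
n = 4000
physics_pred = rng.uniform(0.0, 30.0, n)   # GW, hypothetical values
cloud = rng.uniform(0.0, 1.0, n)
actual = physics_pred * (1.0 - 0.6 * cloud) + rng.normal(0.0, 1.0, n)

# Disjoint index sets: fit, conformal calibration, evaluation.
train, calib, test = np.split(rng.permutation(n), [2000, 3000])

def features(idx):
    """Design matrix for the residual model (intercept + one interaction)."""
    return np.column_stack([np.ones(idx.size), physics_pred[idx] * cloud[idx]])

# 1) Residual learning: the model is fit to (actual - physics), not to actual.
residual_target = actual[train] - physics_pred[train]
coef, *_ = np.linalg.lstsq(features(train), residual_target, rcond=None)

def predict(idx):
    """Final forecast = physics baseline + learned residual."""
    return physics_pred[idx] + features(idx) @ coef

# 2) Split conformal: finite-sample quantile of absolute calibration errors.
alpha = 0.1
scores = np.sort(np.abs(actual[calib] - predict(calib)))
k = int(np.ceil((calib.size + 1) * (1 - alpha)))
q = scores[k - 1]

# 3) Intervals and empirical coverage on data not used above.
point = predict(test)
lower, upper = point - q, point + q
coverage = np.mean((actual[test] >= lower) & (actual[test] <= upper))

mae_physics = np.mean(np.abs(actual[test] - physics_pred[test]))
mae_hybrid = np.mean(np.abs(actual[test] - point))
print(f"coverage={coverage:.3f}  MAE physics={mae_physics:.2f}  hybrid={mae_hybrid:.2f}")
```

The coverage property of split conformal prediction relies on calibration and test points being exchangeable, so on time-ordered data the empirical coverage is worth checking, as the reported P90 figure does.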

Impact

  • Physics baseline alone: R²=0.78; physics plus XGBoost: R²=0.92, with MAE reduced by 60%
  • physics_pred is XGBoost's top feature by importance: the model builds on the physics rather than ignoring it
  • Split conformal prediction: P90 empirical coverage of 0.869, with a distribution-free guarantee
  • CRPS=514.6 MW, evaluated with reliability diagrams; calibration reported honestly
  • Full stack: TimescaleDB, FastAPI, and the Streamlit dashboard start with one command
  • GitHub Actions CI: ruff and pytest on every push; 5 Architecture Decision Records
Python · pvlib · XGBoost · MAPIE · properscoring · TimescaleDB · FastAPI · Streamlit · Docker · MLflow · GitHub Actions
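For readers unfamiliar with CRPS, the sketch below shows one standard way to estimate it from a set of forecast samples, using the identity CRPS = E|X − y| − ½ E|X − X′|. It is a plain NumPy illustration with a hypothetical function name. How the project derives CRPS from its quantile forecasts is not stated here, so this is not a description of its evaluation code.

```python
import numpy as np

def crps_from_samples(samples, observation):
    """Sample-based CRPS estimate for a single observation.

    Lower is better; for a one-point forecast it reduces to absolute error.
    """
    samples = np.asarray(samples, dtype=float)
    # Mean distance between forecast samples and the observed value.
    accuracy = np.mean(np.abs(samples - observation))
    # Mean distance between all pairs of samples (forecast spread).
    spread = np.mean(np.abs(samples[:, None] - samples[None, :]))
    return accuracy - 0.5 * spread

# Made-up example: two equally likely outcomes bracketing the observation.
print(crps_from_samples([0.0, 1.0], 0.5))
```

Because the score is in the units of the forecast variable, a CRPS in MW can be compared directly against the MAE of a point forecast.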
Complete · Private repo, launching soon

Intelligent Predictive Maintenance System

Predicts turbofan engine remaining useful life with SHAP explainability, served via a production-grade API and auto-generated diagnostic reports.

18.83 · RMSE (cycles) · vs 51.33 baseline
62% · Error reduction · from domain insight
2.0 · Train-test gap · vs 36.8 baseline RF

Challenge

Industrial turbofan engines degrade gradually across hundreds of sensor channels. Threshold-based monitoring catches failures too late. Engineers need to know not just when failure occurs, but why: which sensors are driving the risk and what to do about it.

Solution

Built a complete end-to-end ML pipeline on NASA's CMAPSS dataset. The core insight: capping RUL targets at 125 cycles (treating early healthy cycles as interchangeable) dropped RMSE from 51 to 19, a domain decision that outperformed any algorithm choice. The pipeline combines XGBoost with SHAP explainability, a 3-page Streamlit dashboard, a FastAPI REST endpoint, and auto-generated PDF diagnostic reports in German DIN format. Everything runs with one command via Docker Compose.
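The capped target is simple to express in code. The sketch below assumes a run-to-failure engine history, so the last recorded cycle has a remaining useful life of zero; the function name is illustrative, and the cap of 125 is the value given above.

```python
import numpy as np

RUL_CAP = 125  # cycles, as stated in the text

def capped_rul(cycles, cap=RUL_CAP):
    """Piecewise-linear RUL labels for one engine run to failure.

    cycles: the engine's cycle counter (e.g. 1, 2, ..., T).
    Early cycles all receive the same label (cap); the label then
    decreases linearly to 0 at the final cycle.
    """
    cycles = np.asarray(cycles)
    linear_rul = cycles.max() - cycles
    return np.minimum(linear_rul, cap)

# Made-up engine that fails after 200 cycles.
labels = capped_rul(np.arange(1, 201))
print(labels[:3], labels[-3:])
```

The cap removes the large target values from early, healthy cycles where sensor readings carry little degradation signal, which is why it affects RMSE so strongly.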

Impact

  • XGBoost RMSE: 18.83 cycles with a train/test gap of only 2.0 (vs. 12.6 for baseline RF)
  • SHAP layer: top sensor drivers ranked per prediction with plain-language explanation
  • PDF diagnostic report auto-generated in German DIN format from both API and dashboard
  • FastAPI: POST /predict returns RUL, anomaly score, and status label in under 100 ms
  • Full MLflow experiment tracking: 162 logged runs, fully reproducible
  • Docker Compose: TimescaleDB, API, and Dashboard start with one command
Python · XGBoost · SHAP · FastAPI · Streamlit · MLflow · fpdf2 · Docker · scikit-learn · Plotly

Sprint starts May 2026

LangGraph · Claude API · ChromaDB

Tool: RUL API · Tool: SHAP · Tool: RAG
In Development: coming May/June 2026

Sensor Intelligence Assistant

An agentic AI system that reasons across predictive maintenance tools to diagnose turbofan engine anomalies, grounded in real APIs, not hallucinated context.

P3 · Agentic AI Sprint · Starting May 2026
3 · Reasoning tools · RUL + SHAP + RAG
100% · Source-grounded · No hallucinations

Challenge

An operator types: "What is happening with engine 14?" A useful AI system should reason across multiple tools, deciding which to call and in what order based on what it finds at each step, and cite real sources rather than inventing them.

Solution

A LangGraph multi-agent system orchestrates three tools built in P1 (the predictive maintenance system above). Tool A calls the FastAPI prediction endpoint for remaining useful life. Tool B returns SHAP-grounded sensor explanations. Tool C retrieves relevant maintenance documentation from a ChromaDB vector store. The LLM synthesises a diagnostic report citing real API responses, then pauses for human confirmation before any action. The system is read-only by design, with LangSmith tracing from Day 1.
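The control flow described here can be sketched without any framework. The example below is plain Python, not LangGraph, and every name, field, and threshold in it is hypothetical: the tools are stubs with made-up responses, and the routing rule (look deeper only when RUL is low) is an assumption standing in for the LLM's decision. It shows two ideas only: each statement in the report carries the tool response it came from, and nothing is approved without an explicit confirmation callback.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source: str    # name of the tool that produced the payload
    payload: dict  # raw tool response, kept verbatim for citation

@dataclass
class Report:
    engine_id: int
    evidence: list = field(default_factory=list)
    approved: bool = False

def diagnose(engine_id, tools, confirm, rul_alert=50):
    """Read-only diagnosis: gather evidence, then ask a human to confirm."""
    report = Report(engine_id)
    rul = tools["rul"](engine_id)
    report.evidence.append(Evidence("rul_api", rul))
    # Later tool calls depend on what the first one returned.
    if rul["rul_cycles"] < rul_alert:
        report.evidence.append(Evidence("shap", tools["shap"](engine_id)))
        report.evidence.append(Evidence("rag", tools["rag"]("low RUL")))
    # Human-in-the-loop gate: the callback decides, not the agent.
    report.approved = bool(confirm(report))
    return report

# Stub tools with invented responses, standing in for the real services.
stub_tools = {
    "rul": lambda engine_id: {"engine_id": engine_id, "rul_cycles": 30},
    "shap": lambda engine_id: {"top_sensors": ["sensor_a", "sensor_b"]},
    "rag": lambda query: {"documents": ["placeholder maintenance note"]},
}

report = diagnose(14, stub_tools, confirm=lambda r: False)
print([e.source for e in report.evidence], report.approved)
```

In a graph-based implementation, each tool call and the confirmation step would become nodes, with the conditional branch expressed as a routing edge.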

Impact

  • LangGraph agent with human-in-the-loop confirmation before any action
  • Tool calls grounded in real FastAPI endpoints from P1; no hallucinated outputs
  • ChromaDB vector store for maintenance documentation retrieval
  • LangSmith tracing: every decision logged and fully auditable
LangGraph · Anthropic Claude API · ChromaDB · sentence-transformers · FastAPI · Streamlit · Docker · LangSmith
In Active Use

PEPICO Data Analysis Pipeline

High-throughput spectroscopy pipeline. 10x throughput increase.

10x · Throughput · analysis speedup

Challenge

Time-resolved PEPICO experiments produce complex binary TDC data. Manual conversion and analysis limited throughput and introduced inconsistencies across datasets.

Solution

Python and Jupyter-based pipeline that converts raw TDC binary files to calibrated time-of-flight spectra, performs electron kinetic energy and mass calibration, coincidence analysis, and statistical evaluation. Users specify data location and basic parameters; the pipeline handles the rest.
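The mass calibration step can be illustrated with a minimal sketch. It assumes the usual linear time-of-flight relation t = a·√(m/z) + t0 and fits the two constants from reference peaks with known m/z. The function names and all numbers are invented for illustration; the binary TDC format and the pipeline's own calibration routine are not shown here.

```python
import numpy as np

def fit_tof_calibration(tof, mz):
    """Least-squares fit of t = a * sqrt(m/z) + t0; returns (a, t0)."""
    sqrt_mz = np.sqrt(np.asarray(mz, dtype=float))
    a, t0 = np.polyfit(sqrt_mz, np.asarray(tof, dtype=float), 1)
    return a, t0

def tof_to_mz(tof, a, t0):
    """Invert the calibration to map flight times onto the m/z axis."""
    return ((np.asarray(tof, dtype=float) - t0) / a) ** 2

# Made-up reference peaks generated from invented constants (times in ns).
a_true, t0_true = 500.0, 120.0
ref_mz = np.array([18.0, 28.0, 44.0])
ref_tof = a_true * np.sqrt(ref_mz) + t0_true

a_fit, t0_fit = fit_tof_calibration(ref_tof, ref_mz)
unknown_tof = a_true * np.sqrt(32.0) + t0_true
print(tof_to_mz(unknown_tof, a_fit, t0_fit))
```

Using more than two reference peaks makes the fit overdetermined, so the residuals give a quick check on calibration quality for each dataset.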

Impact

  • 10x throughput increase: experiment pace now drives progress, not analysis
  • Generates publication-ready plots and statistical summaries automatically
  • Consistent, reproducible results across all datasets
Python · Pandas · NumPy · Matplotlib · Binary Parsing · Jupyter
Completed

INSPIRE Fellowship Research Program

5-year national research fellowship delivered end-to-end: budgeting, compliance, multi-institution coordination, and 12 first-author publications.

5 yrs · Program duration · INSPIRE Fellowship
12 · First-author papers · delivered

Challenge

Manage a prestigious national fellowship (one of roughly 1,000 awarded annually in India) as principal researcher while completing PhD research, coordinating across institutions in Germany and India over 5 years.

Solution

End-to-end program delivery including annual financial reporting, procurement planning, progress tracking for government review, and multi-institution collaboration. Applied the same systematic documentation discipline to project management that underpins the current ML portfolio.

Impact

  • Delivered 12 first-author publications over the fellowship period
  • Contributed to one granted Indian patent for gas-separation membranes
  • Maintained full regulatory compliance across a 5-year, multi-institution program
  • Coordinated stakeholders across institutions in India and Germany
Project Management · Budget Planning · Regulatory Compliance · Stakeholder Communication · Scientific Writing

Real Code. Real Impact.

Each project includes documented decisions, measurable results, and production-grade architecture. Not just notebooks.