Projects and Industry Applications
From laboratory automation to production ML systems: systematic thinking applied to solve real problems, with documented decisions and measurable results.
Automated Laser Polarimetry Platform
Real-time Stokes polarization analysis. Reduced scan time from 2 hours to 5 minutes.
Challenge
Characterizing polarization at each focus of a Twin-Foci ultrafast laser setup required manually rotating a motorized waveplate, reading power at each angle, and fitting data in a spreadsheet. The process took over 2 hours per dataset and was error-prone.
Solution
Built a real-time PyQt5 platform that drives a Newport ESP301 motion controller and an Ophir Nova II power meter simultaneously. It uses the Fourier-based Schaefer method to extract the complete set of Stokes parameters. Includes a TDC synchronization module for photon-counting experiments, an offline analysis CLI, and a hardware simulation mode for testing without physical equipment.
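A minimal sketch of the Fourier step, not the platform's actual code. It assumes a rotating quarter-wave plate followed by a fixed horizontal linear polarizer, with waveplate angles spaced uniformly over half a turn. The function names are hypothetical, and the sign of S3 depends on the handedness convention in use.

```python
import numpy as np

def simulate_scan(theta, stokes):
    """Ideal detector signal behind a quarter-wave plate at angle theta
    and a fixed horizontal polarizer (assumed optical layout)."""
    s0, s1, s2, s3 = stokes
    c2, s2t = np.cos(2 * theta), np.sin(2 * theta)
    return 0.5 * (s0 + s1 * c2**2 + s2 * c2 * s2t + s3 * s2t)

def stokes_from_scan(theta, intensity):
    """Recover (S0, S1, S2, S3) from I(theta) = (A + B sin2t + C cos4t + D sin4t) / 2.

    theta must be uniformly spaced over [0, pi) so the discrete sums are orthogonal.
    """
    theta = np.asarray(theta, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n = theta.size
    a = 2.0 / n * intensity.sum()
    b = 4.0 / n * np.sum(intensity * np.sin(2 * theta))
    c = 4.0 / n * np.sum(intensity * np.cos(4 * theta))
    d = 4.0 / n * np.sum(intensity * np.sin(4 * theta))
    return np.array([a - c, 2 * c, 2 * d, b])

theta = np.linspace(0, np.pi, 16, endpoint=False)   # 16 waveplate positions
true_stokes = np.array([1.0, 0.3, -0.4, 0.5])       # synthetic test state
recovered = stokes_from_scan(theta, simulate_scan(theta, true_stokes))
```

Because the four coefficients come from plain sums over the scan, they can be updated as each power reading arrives, which is what makes a real-time display feasible.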
Impact
- 96% time reduction: from over 2 hours to roughly 5 minutes per dataset
- Full Stokes vector extraction with uncertainty propagation from the covariance matrix
- TDC polarization controller for synchronized photon-counting experiments
- Publication-quality export at 300 DPI in PNG, PDF, and SVG formats
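As an illustration of covariance-based uncertainty propagation, the sketch below applies first-order (Jacobian) propagation to the degree of polarization. It is an assumption about how such a step could look, not the platform's implementation; `dop_with_uncertainty` is a hypothetical name and the covariance values are synthetic.

```python
import numpy as np

def dop_with_uncertainty(stokes, cov):
    """Degree of polarization and its 1-sigma uncertainty.

    Uses linear error propagation: var(p) = J @ cov @ J, where J is the
    gradient of p = sqrt(S1^2 + S2^2 + S3^2) / S0 with respect to the Stokes vector.
    """
    s0, s1, s2, s3 = np.asarray(stokes, dtype=float)
    r = np.sqrt(s1**2 + s2**2 + s3**2)
    if s0 <= 0 or r == 0:
        raise ValueError("propagation is undefined for S0 <= 0 or fully unpolarized light")
    jac = np.array([-r / s0**2, s1 / (r * s0), s2 / (r * s0), s3 / (r * s0)])
    variance = jac @ np.asarray(cov, dtype=float) @ jac
    return r / s0, np.sqrt(variance)

# Synthetic example: fully polarized state with uncorrelated 1% errors.
dop, sigma = dop_with_uncertainty([1.0, 0.6, 0.0, 0.8], np.eye(4) * 0.01**2)
```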
Probabilistic Solar Forecasting: Physics-Informed, Calibrated
Day-ahead solar generation forecasts for the German grid. Physics handles geometry; XGBoost learns only the residual.
Challenge
Germany's Energiewende targets 80% renewables by 2030. Solar is volatile and weather-dependent. A 10% forecasting error at midday peak costs real money on the balancing market. Operators need calibrated probabilistic forecasts, not point estimates: 12 GW plus or minus 2 GW, with a coverage guarantee.
Solution
Built a two-layer physics-informed architecture on 3 years of public SMARD and Open-Meteo data (26,000+ hourly records in TimescaleDB). The physics layer uses pvlib to compute what solar output should be on a geometrically perfect day. XGBoost learns only the residual: what physics cannot see, namely clouds, curtailments, and measurement noise. Calibrated P10/P50/P90 intervals via split conformal prediction. Managed as a 22-day research sprint with 5 ADRs, 3 weekly reports, 2 retrospectives, and a public Kanban board.
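The residual idea can be sketched on synthetic data without the real stack. In this illustration a clipped sinusoid stands in for the pvlib clear-sky estimate and a least-squares fit stands in for XGBoost; every variable name and number below is an assumption made for the example, not project data.

```python
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24 * 60)                      # 60 synthetic days, hourly

# Stand-in physics layer: output on a cloudless day (MW), zero at night.
physics_pred = np.clip(np.sin((hours % 24 - 6) / 12 * np.pi), 0, None) * 30_000

# Synthetic "truth": clouds attenuate the physics estimate, plus noise.
cloud = rng.uniform(0, 1, hours.size)
actual = physics_pred * (1 - 0.6 * cloud) + rng.normal(0, 300, hours.size)

# The learned layer sees only the residual, never the raw target.
residual = actual - physics_pred
features = np.column_stack([physics_pred, physics_pred * cloud, np.ones(hours.size)])

train = hours < 24 * 45                         # chronological split
coef, *_ = np.linalg.lstsq(features[train], residual[train], rcond=None)
combined = physics_pred + features @ coef

mae_physics = np.mean(np.abs(actual[~train] - physics_pred[~train]))
mae_combined = np.mean(np.abs(actual[~train] - combined[~train]))
```

Passing the physics estimate to the learner as a feature lets the correction scale with expected output, so the model has nothing to correct at night and the most to correct at midday.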
Impact
- Physics baseline alone: R² = 0.78; physics plus XGBoost: R² = 0.92, with MAE reduced by 60%
- physics_pred is XGBoost's top feature by importance: the model builds on the physics rather than ignoring it
- Split conformal prediction: P90 empirical coverage of 0.869, with a distribution-free guarantee
- CRPS = 514.6 MW, evaluated alongside reliability diagrams; calibration reported honestly
- Full stack (TimescaleDB, FastAPI, and Streamlit dashboard) starts with one command
- GitHub Actions CI: ruff and pytest on every push; 5 Architecture Decision Records
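For readers unfamiliar with split conformal prediction, here is a sketch of its simplest variant, which uses absolute residuals on a held-out calibration set. The project may use a different conformity score (for example one built on quantile forecasts); the function name and synthetic data are assumptions. The coverage guarantee is marginal and relies on calibration and test points being exchangeable.

```python
import numpy as np

def conformal_half_width(y_cal, pred_cal, alpha=0.1):
    """Half-width q such that [pred - q, pred + q] covers a new point with
    probability >= 1 - alpha, given exchangeable calibration data."""
    scores = np.sort(np.abs(np.asarray(y_cal, float) - np.asarray(pred_cal, float)))
    n = scores.size
    rank = int(np.ceil((n + 1) * (1 - alpha)))   # finite-sample corrected rank
    return np.inf if rank > n else scores[rank - 1]

# Synthetic check: calibrate on 1,000 points, measure coverage on 5,000.
rng = np.random.default_rng(0)
pred = rng.uniform(0, 30_000, 6000)
y = pred + rng.normal(0, 500, 6000)
q = conformal_half_width(y[:1000], pred[:1000], alpha=0.1)
coverage = np.mean(np.abs(y[1000:] - pred[1000:]) <= q)
```

Comparing `coverage` on held-out data against the nominal level is the same check that an empirical coverage figure reports.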
Intelligent Predictive Maintenance System
Predicts turbofan engine remaining useful life with SHAP explainability, served via a production-grade API and auto-generated diagnostic reports.
Challenge
Industrial turbofan engines degrade gradually across hundreds of sensor channels. Threshold-based monitoring catches failures too late. Engineers need to know not just when failure occurs, but why: which sensors are driving the risk and what to do about it.
Solution
Built a complete end-to-end ML pipeline on NASA's CMAPSS dataset. The core insight: capping RUL targets at 125 cycles (treating early healthy cycles as interchangeable) dropped RMSE from 51 to 19, a domain decision that outperformed any algorithm choice. XGBoost with SHAP explainability, a 3-page Streamlit dashboard, FastAPI REST endpoint, and auto-generated PDF diagnostic reports in German DIN format. Everything runs with one command via Docker Compose.
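The target capping described above takes only a few lines. This sketch assumes a single run-to-failure engine whose last recorded cycle has zero remaining life, as in the CMAPSS training sets; `capped_rul` is a hypothetical helper name.

```python
import numpy as np

def capped_rul(cycles, cap=125):
    """Piecewise-linear RUL target: flat at `cap` while the engine is healthy,
    then counting down to 0 at the final recorded cycle."""
    cycles = np.asarray(cycles)
    rul = cycles.max() - cycles          # linear countdown to failure
    return np.minimum(rul, cap)          # early cycles all share the cap

# One engine that ran for 200 cycles.
example = capped_rul(np.arange(1, 201))
```

With the cap in place the model is no longer penalized for failing to distinguish, say, 180 from 150 remaining cycles, a distinction the sensors do not yet reveal.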
Impact
- XGBoost RMSE: 18.83 cycles with a train/test gap of only 2.0 (vs. 12.6 for the baseline random forest)
- SHAP layer: top sensor drivers ranked per prediction, with a plain-language explanation
- PDF diagnostic report auto-generated in German DIN format from both the API and the dashboard
- FastAPI: POST /predict returns RUL, anomaly score, and status label in under 100 ms
- Full MLflow experiment tracking: 162 logged runs, fully reproducible
- Docker Compose: TimescaleDB, API, and dashboard start with one command
Sprint starts May 2026
LangGraph · Claude API · ChromaDB
Sensor Intelligence Assistant
An agentic AI system that reasons across predictive maintenance tools to diagnose turbofan engine anomalies, grounded in real APIs, not hallucinated context.
Challenge
An operator types: "What is happening with engine 14?" A useful AI system should reason across multiple tools, deciding which to call and in what order based on what it finds at each step, and cite real sources rather than inventing them.
Solution
A LangGraph multi-agent system orchestrating three tools built in P1 (the predictive maintenance system above). Tool A calls the FastAPI prediction endpoint for remaining useful life. Tool B returns SHAP-grounded sensor explanations. Tool C retrieves relevant maintenance documentation from a ChromaDB vector store. The LLM synthesises a diagnostic report citing real API responses, then pauses for human confirmation before any action. Read-only by design. LangSmith tracing from Day 1.
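The control flow can be illustrated in plain Python, deliberately without LangGraph or an LLM. Everything here is a hypothetical stand-in: the three tool functions return placeholder values where the real tools would call the FastAPI endpoint, the SHAP layer, and the ChromaDB store, and the threshold is an arbitrary example.

```python
from dataclasses import dataclass, field

@dataclass
class Diagnosis:
    engine_id: int
    findings: list = field(default_factory=list)
    sources: list = field(default_factory=list)   # every claim traces to a tool call

# Placeholder tools (hypothetical return values, not real API responses).
def tool_predict_rul(engine_id):
    return {"rul_cycles": 40}

def tool_explain(engine_id):
    return {"top_sensors": ["sensor_a", "sensor_b"]}

def tool_search_docs(query):
    return ["placeholder maintenance note"]

def diagnose(engine_id, rul_alert=50):
    """Call tools step by step; later calls depend on earlier results."""
    d = Diagnosis(engine_id)
    rul = tool_predict_rul(engine_id)
    d.findings.append(rul)
    d.sources.append("tool_predict_rul")
    if rul["rul_cycles"] < rul_alert:             # only dig deeper when at risk
        explanation = tool_explain(engine_id)
        d.findings.append(explanation)
        d.sources.append("tool_explain")
        d.findings.append(tool_search_docs(" ".join(explanation["top_sensors"])))
        d.sources.append("tool_search_docs")
    return d

def confirm(diagnosis, approve):
    """Human-in-the-loop gate: nothing proceeds unless the operator approves."""
    return "approved" if approve(diagnosis) else "held"

report = diagnose(14)
status = confirm(report, approve=lambda d: False)  # operator declines
```

In the real system the LLM would choose the next tool instead of a fixed `if`, but the two invariants shown here stay the same: each finding records its source, and the flow stops at a human gate.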
Impact
- LangGraph agent with human-in-the-loop confirmation before any action
- Tool calls grounded in real FastAPI endpoints from P1; no hallucinated outputs
- ChromaDB vector store for maintenance documentation retrieval
- LangSmith tracing: every decision logged and fully auditable
PEPICO Data Analysis Pipeline
High-throughput spectroscopy pipeline. 10x throughput increase.
Challenge
Time-resolved PEPICO experiments produce complex binary TDC data. Manual conversion and analysis limited throughput and introduced inconsistencies across datasets.
Solution
Python and Jupyter-based pipeline that converts raw TDC binary files to calibrated time-of-flight spectra, performs electron kinetic energy and mass calibration, coincidence analysis, and statistical evaluation. Users specify data location and basic parameters; the pipeline handles the rest.
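One step of such a pipeline, mass calibration, can be sketched as follows. The example assumes the standard time-of-flight relation t = t0 + k·sqrt(m/z) and fits it to reference peaks; the pipeline's actual calibration procedure is not described here, and the function names, peak masses, and timing values are illustrative.

```python
import numpy as np

def fit_tof_calibration(peak_times, peak_masses):
    """Least-squares fit of t = t0 + k * sqrt(m/z); returns (t0, k)."""
    k, t0 = np.polyfit(np.sqrt(peak_masses), peak_times, 1)
    return t0, k

def tof_to_mass(t, t0, k):
    """Convert flight times to m/z using the fitted constants."""
    return ((np.asarray(t, dtype=float) - t0) / k) ** 2

# Synthetic reference peaks at m/z 18, 28, and 44 (times in arbitrary units).
masses = np.array([18.0, 28.0, 44.0])
times = 0.25 + 1.5 * np.sqrt(masses)
t0, k = fit_tof_calibration(times, masses)
unknown_mass = tof_to_mass(0.25 + 1.5 * np.sqrt(32.0), t0, k)
```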
Impact
- 10x throughput increase: experiment pace now drives progress, not analysis
- Generates publication-ready plots and statistical summaries automatically
- Consistent, reproducible results across all datasets
INSPIRE Fellowship Research Program
5-year national research fellowship delivered end-to-end: budgeting, compliance, multi-institution coordination, and 12 first-author publications.
Challenge
Manage a prestigious national fellowship (one of roughly 1,000 awarded annually in India) as principal researcher while completing PhD research, coordinating across institutions in Germany and India over 5 years.
Solution
End-to-end program delivery including annual financial reporting, procurement planning, progress tracking for government review, and multi-institution collaboration. Applied the same systematic documentation discipline to project management that underpins the current ML portfolio.
Impact
- Delivered 12 first-author publications over the fellowship period
- Contributed to one granted Indian patent for gas-separation membranes
- Maintained full regulatory compliance across a 5-year, multi-institution program
- Coordinated stakeholders across institutions in India and Germany
Real Code. Real Impact.
Each project includes documented decisions, measurable results, and production-grade architecture. Not just notebooks.