Building AI-powered data systems — from distributed ETL pipelines processing millions of records to RAG infrastructure and production ML models. I turn raw data into decisions that scale.
Perfect for recruiter calls — real charts, live demos, code walkthroughs.
Designed and deployed a RAG pipeline on Databricks, increasing retrieval accuracy by 22% and reducing irrelevant outputs by ~28%. Implemented automated monitoring that cut model degradation detection time by ~50%.
PySpark ETL on 1M+ records with 48 engineered features, an XGBoost + LightGBM ensemble, and SHAP explainability.
Migrated an ML system to PySpark for large-scale inference, achieving 4× faster processing. Improved anomaly detection precision by ~25% and delivered real-time dashboards that reduced detection time by ~30%.
Built churn models on 100K+ engagement records using SQL-driven feature engineering. Achieved 87% accuracy, reducing attrition by 15% and saving ~$50K annually. Delivered interactive Tableau dashboards for real-time retention insights.
Centralised Tableau reporting layer over 3+ siloed hospital data sources — real-time KPI dashboards replacing manual spreadsheet reporting.
XGBoost + LightGBM ensemble predicting 30-day hospital readmission from EHR clinical features with risk stratification outputs.
8-slide deck covering: the three-part challenge (scale/precision/visibility), 6-step pipeline architecture, PySpark Before vs After comparison, anomaly detection methods (Isolation Forest + statistical controls + ensemble), real-time dashboard with mock wireframe, results (4×, ~25%, ~30%), and key takeaways. Charcoal + electric red-orange palette.
↓ Download PPTX
Actively seeking full-time Data Scientist and ML Engineer roles. Whether you have a position or just want to talk data, I respond within 24 hours.
✉ Send an email