Building AI-powered data systems — from distributed ETL pipelines to RAG infrastructure and predictive ML models. I turn messy data into decisions that scale.
End-to-end RAG pipeline with document ingestion, embedding generation, and FAISS vector indexing integrated with LLMs for enterprise semantic search.
Distributed ETL pipelines built with PySpark, processing 1M+ records. Ensemble ML models for risk prediction with full model explainability and drift monitoring.
Re-engineered a Pandas workflow into a distributed PySpark architecture. Automated anomaly detection and data quality monitoring dashboards for scalable analytical modeling.
Deep learning pipeline combining U-Net segmentation and spatial analysis on the DermNet dataset. Evaluated ResNet34, VGG16, DenseNet121, InceptionV3, and EfficientNet with a confidence scoring framework.
97.1% classification accuracy
I'm actively looking for full-time Data Scientist and ML Engineer roles. Whether you have a project, a position, or just want to talk data, I'm always down to connect.
✉ Send an email