A collection of hands-on data science and ML engineering projects focused on practical, production-ready skills.
| Project | Description | Tech Stack |
|---|---|---|
| CICD-MLflow | Enterprise CI/CD simulation for ML pipelines with real MLflow tracking | Docker, FastAPI, Next.js, MLflow, PostgreSQL, MinIO |
| ab-test | Interactive A/B testing learning lab with CUPED and DiD methods | React, TypeScript, Vite, Recharts |
An enterprise CI/CD simulation for medical claim ML pipelines with real MLflow tracking. Learn the complete lifecycle of ML engineering in an enterprise environment.
- Interactive Pipeline DAG with real-time status updates
- Real MLflow integration for experiment tracking
- Champion vs Challenger model promotion logic
- Drift monitoring and rollback capabilities
cd CICD-MLflow
# Setup environment (optional - .env is already included)
cp .env.example .env
# Start all services with Docker
docker compose up --build- Frontend UI: http://localhost:3001
- MLflow UI: http://localhost:5000
- Backend API: http://localhost:8000/docs
- MinIO Console: http://localhost:9001
- Docker Desktop (with Docker Compose)
- At least 4GB RAM available for Docker
- Ports available: 3000, 5000, 8000, 9000, 9001, 5432
An interactive web application that teaches A/B testing concepts through real-time simulated streaming events. Learn how CUPED reduces variance and Difference-in-Differences (DiD) removes confounding time effects.
- Multi-language support (English / 中文)
- Sample size calculator for pre-experiment planning
- Real-time streaming simulation of user events
- Step-by-step walkthroughs of CUPED and DiD calculations
cd ab-test
# Install dependencies
npm install
# Start development server
npm run dev
# Open http://localhost:5173- Node.js (v18+ recommended)
- npm or yarn
git clone https://github.com/YOUR_USERNAME/Data-scientist-projects.git
cd Data-scientist-projectsNavigate to the project directory you want to explore and follow the Quick Start instructions above.
If you're new to these topics, here's a suggested learning order:
- Start with ab-test - Learn fundamental A/B testing concepts, statistical methods (CUPED, DiD), and experimental design
- Continue with CICD-MLflow - Understand enterprise ML pipelines, experiment tracking, and model deployment workflows
Contributions are welcome! Please feel free to submit issues or pull requests.
MIT License - See individual project LICENSE files for details.