"From foundational principles to optimized implementation - a complete neural network journey" ๐ง
In this project, we constructed a neural network from scratch using fundamental principles, training it on the CIFAR-10 dataset. The implementation covers data preprocessing and forward/backward propagation, and highlights how much vectorization speeds up the computation.
Core Insight: This project demonstrates the pivotal role of vectorization in accelerating neural network computations, a point of real importance in modern machine learning workflows.
Under the guidance of Prof. Mohammad Mehdi Ebadzadeh
Spring 2022
- NumPy - Fundamental package for scientific computing
- Matplotlib - Comprehensive library for visualization
- Scikit-image - Image processing algorithms
- PIL (Pillow) - Image manipulation capabilities
- Glob - File path pattern matching
- OS - Operating system interface
- Time - Time access and measurement utilities
Step 1: Data Acquisition & Storage
- Read the first 4 classes from the CIFAR-10 dataset (airplane, automobile, bird, and cat); a loading sketch follows this list
- Store data in matrix format: `(n_samples, width, height, channels)`
- Encode labels using one-hot representation
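As a rough illustration of this step, the sketch below loads one CIFAR-10 python batch and keeps only the first four classes. The file path and the helper names (`load_batch`, `one_hot`) are hypothetical, not part of the project's code:

```python
import pickle
import numpy as np

def load_batch(path):
    # CIFAR-10 python batches are pickled dicts keyed by b"data" and b"labels".
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    # Each row of b"data" holds the R, G, B planes; reshape to (n, 32, 32, 3).
    images = batch[b"data"].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    return images, np.array(batch[b"labels"])

def one_hot(labels, n_classes=4):
    # One row per sample, a single 1 in the column of the true class.
    encoded = np.zeros((labels.size, n_classes))
    encoded[np.arange(labels.size), labels] = 1
    return encoded

images, labels = load_batch("cifar-10-batches-py/data_batch_1")  # hypothetical path
keep = labels < 4  # airplane=0, automobile=1, bird=2, cat=3
x_raw, y = images[keep], one_hot(labels[keep])
```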
Steps 2–5: Data Transformation Pipeline
- Grayscale Conversion - Reduce computational complexity by converting RGB to grayscale
- Normalization - Scale pixel values to the [0, 1] range by dividing by 255
- Flattening - Reshape data to `(n_samples, 1024)` for input-layer compatibility
- Shuffling - Randomize sample order while maintaining data-label correspondence (see the sketch below)
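A minimal sketch of steps 2–5, assuming the `x_raw` and `y` arrays from the loading sketch above. The luminance weights used for grayscale are the standard ITU-R 601 ones, which is one common choice:

```python
import numpy as np

def preprocess(x_raw, y, seed=0):
    # Step 2: grayscale via luminance weights, (n, 32, 32, 3) -> (n, 32, 32).
    gray = x_raw @ np.array([0.299, 0.587, 0.114])
    # Step 3: scale 8-bit pixel values into [0, 1].
    gray = gray / 255.0
    # Step 4: flatten each 32x32 image into a 1024-vector for the input layer.
    flat = gray.reshape(len(gray), 1024)
    # Step 5: shuffle samples and labels with one shared permutation,
    # so every sample stays paired with its label.
    perm = np.random.default_rng(seed).permutation(len(flat))
    return flat[perm], y[perm]

x_train, y_train = preprocess(x_raw, y)
```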
Objective: Compute network outputs using forward propagation
- Data Selection: 200-sample subset from the training data
- Parameter Initialization:
  - Random weight initialization
  - Zero bias initialization
- Output Computation: Matrix multiplication followed by the sigmoid activation σ (sketched below)
- Model Inference: Class prediction based on maximum activation
- Accuracy Assessment: ~25% baseline accuracy (random chance over 4 classes)
Implementation Note: Leveraged NumPy for efficient matrix operations
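The sketch below mirrors this step under stated assumptions: the README fixes only the 1024-unit input and the 4-class output, so the hidden sizes here (two 16-unit layers) are an illustrative guess, not the project's actual architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hidden sizes are assumptions; only 1024 inputs and 4 outputs are given.
sizes = [1024, 16, 16, 4]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]  # random init
biases = [np.zeros((m, 1)) for m in sizes[1:]]                                  # zero init

def forward(x):
    # x: (1024, 1) column vector; each layer is a matmul plus sigmoid.
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return a

# Inference: the predicted class is the index of the largest output activation.
sample = x_train[0].reshape(-1, 1)   # from the preprocessing sketch above
predicted_class = int(np.argmax(forward(sample)))
```

With random weights and zero biases, predictions over 4 classes land near the 25% chance level, which is exactly the baseline this step measures.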
- Employed backpropagation to iteratively refine model parameters and minimize prediction error
- Hyperparameter Tuning: Careful selection of batch size, learning rate, and number of epochs
- Algorithm Implementation: Followed the standard pseudo-code for parameter updates (a training sketch follows the metrics below)
Performance Metrics:
- Model accuracy on the 200-sample subset
- Execution time measurement
- Expected performance: ~30% accuracy (accounting for random initialization)
- Cost Visualization: Plotted average cost reduction per epoch
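A non-vectorized training sketch in the spirit of this step: a Python loop over every sample inside each mini-batch. It reuses `sigmoid`, `weights`, and `biases` from the forward-pass sketch above; the squared-error cost and the specific hyperparameter values are assumptions, since the README does not pin them down:

```python
import numpy as np
import matplotlib.pyplot as plt

def train(x_train, y_train, weights, biases, epochs=10, batch_size=10, lr=1.0):
    costs = []  # average cost per epoch, for the cost-reduction plot
    n = len(x_train)
    for _ in range(epochs):
        perm = np.random.permutation(n)
        x_train, y_train = x_train[perm], y_train[perm]
        epoch_cost = 0.0
        for start in range(0, n, batch_size):
            xb, yb = x_train[start:start + batch_size], y_train[start:start + batch_size]
            grad_w = [np.zeros_like(w) for w in weights]
            grad_b = [np.zeros_like(b) for b in biases]
            for x, y in zip(xb, yb):           # the per-sample loop vectorization removes
                a = x.reshape(-1, 1)
                activations = [a]              # forward pass, keeping every activation
                for w, b in zip(weights, biases):
                    a = sigmoid(w @ a + b)
                    activations.append(a)
                t = y.reshape(-1, 1)
                epoch_cost += float(np.sum((a - t) ** 2))
                delta = 2 * (a - t) * a * (1 - a)  # dC/dz at the output (sigmoid' = a(1-a))
                for l in range(len(weights) - 1, -1, -1):
                    grad_w[l] += delta @ activations[l].T
                    grad_b[l] += delta
                    if l > 0:                  # propagate delta to the previous layer
                        a_prev = activations[l]
                        delta = (weights[l].T @ delta) * a_prev * (1 - a_prev)
            for l in range(len(weights)):      # average-gradient step per mini-batch
                weights[l] -= lr * grad_w[l] / len(xb)
                biases[l] -= lr * grad_b[l] / len(xb)
        costs.append(epoch_cost / n)
    return costs

costs = train(x_train[:200], y_train[:200], weights, biases)  # 200-sample subset
plt.plot(costs)
plt.xlabel("epoch"); plt.ylabel("average cost")
plt.show()
```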
Implemented vectorized operations to dramatically improve computational efficiency
- Feedforward Vectorization: Matrix-based implementation over whole mini-batches
- Backpropagation Vectorization: Eliminated the iterative per-sample loops (see the sketch below)
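A sketch of what the vectorized passes can look like: the whole mini-batch becomes one matrix, so a single NumPy matmul per layer replaces the per-sample Python loop above. It reuses `sigmoid`, `weights`, and `biases` from the earlier sketches, under the same squared-error assumption:

```python
import numpy as np

def forward_batch(xb, weights, biases):
    # xb: (1024, m) holds a whole mini-batch; biases broadcast across columns,
    # so one matmul per layer replaces m separate per-sample passes.
    activations = [xb]
    a = xb
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
        activations.append(a)
    return activations

def backward_batch(activations, yb, weights):
    # yb: (4, m). Gradients for the whole batch come out of single matmuls.
    m = yb.shape[1]
    a_out = activations[-1]
    delta = 2 * (a_out - yb) * a_out * (1 - a_out)   # squared-error assumption again
    grad_w, grad_b = [], []
    for l in range(len(weights) - 1, -1, -1):
        grad_w.insert(0, (delta @ activations[l].T) / m)
        grad_b.insert(0, delta.mean(axis=1, keepdims=True))
        if l > 0:
            a_prev = activations[l]
            delta = (weights[l].T @ delta) * a_prev * (1 - a_prev)
    return grad_w, grad_b

# One vectorized update step over a mini-batch:
xb, yb, lr = x_train[:32].T, y_train[:32].T, 1.0
grads_w, grads_b = backward_batch(forward_batch(xb, weights, biases), yb, weights)
weights = [w - lr * g for w, g in zip(weights, grads_w)]
biases = [b - lr * g for b, g in zip(biases, grads_b)]
```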
Enhanced Evaluation:
- Increased to 20 epochs for comprehensive assessment
- Reported final model accuracy and execution time
- Multiple executions to account for performance variability (see the timing sketch below)
- Cost trajectory visualization over training
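One way to report execution time across several runs (the `time` module is already in the dependency list). `train` here refers to the non-vectorized sketch above; a vectorized variant would be timed the same way to quantify the speedup:

```python
import time

runs = []
for _ in range(5):                       # repeat to smooth out run-to-run variability
    start = time.perf_counter()
    train(x_train[:200], y_train[:200], weights, biases, epochs=20)
    runs.append(time.perf_counter() - start)
print(f"mean wall-clock time over {len(runs)} runs: {sum(runs) / len(runs):.2f}s")
```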
Comprehensive performance assessment using full dataset (4 classes, 8000 samples)
Training Configuration: Optimized hyperparameters
Evaluation Metrics:
- Training set accuracy
- Test set accuracy
- Comparative performance analysis (see the evaluation sketch below)
- Learning visualization: Average cost reduction over epochs
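A sketch of the accuracy metric under the same assumptions as the earlier snippets; `x_test` and `y_test` stand for a held-out split whose construction is not shown here:

```python
import numpy as np

def accuracy(x, y, weights, biases):
    # Predicted class = row of the largest output activation, per sample column.
    outputs = forward_batch(x.T, weights, biases)[-1]   # from the vectorized sketch
    return float(np.mean(np.argmax(outputs, axis=0) == np.argmax(y, axis=1)))

print(f"train accuracy: {accuracy(x_train, y_train, weights, biases):.2%}")
print(f"test accuracy:  {accuracy(x_test, y_test, weights, biases):.2%}")  # hypothetical split
```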
- Built a neural network from foundational principles
- Implemented an efficient data preprocessing pipeline
- Demonstrated dramatic performance improvement through vectorization
- Achieved measurable accuracy on a CIFAR-10 subset
- Visualized the learning process through cost-reduction graphs
```
CIFAR10-NeuralNetwork
├── README.md           # Project documentation
├── data/               # Dataset handling utilities
├── models/             # Neural network implementation
├── results/            # Performance metrics and visualizations
├── experiments/        # Testing and evaluation scripts
└── requirements.txt    # Project dependencies
```

1. Install dependencies: `pip install -r requirements.txt`
2. Preprocess data: `python data/preprocessing.py`
3. Train the model: `python models/train.py`
4. Evaluate performance: `python experiments/evaluate.py`
- Baseline Accuracy: ~25–30% (random initialization)
- Vectorized Speedup: 5–10x performance improvement
- Final Accuracy: Measurable improvement over baseline
- Learning Curve: Consistent cost reduction across epochs
- Additional layer architectures
- Alternative activation functions (ReLU, tanh)
- Regularization techniques (Dropout, L2)
- Hyperparameter optimization framework
- Extension to the full CIFAR-10 dataset (10 classes)