GPU-accelerated speech enhancement using a U-Net model
This repo provides a complete pipeline for removing background noise from speech using a U-Net model trained on the VoiceBank+DEMAND dataset. It includes a CLI inference tool, a Streamlit app, a training notebook, and a global path system that "just works" after cloning.
```
unet-audiofilter/
├── apps/
│ ├── __init__.py
│ ├── requirements.txt
│ └── streamlit_app.py # Web UI for enhancement
├── config/
│ ├── __init__.py
│ └── paths.py # Auto-detected project paths
├── dataset/ # Sample/test data and scp files
│ ├── clean_testset_wav/
│ ├── clean_trainset_28spk_wav/
│ ├── noisy_testset_wav/
│ ├── noisy_trainset_28spk_wav/
│ ├── test.scp
│ └── train.scp
├── models/ # Model checkpoints
│ ├── best_gpu_model.pth
│ ├── checkpoint_epoch_5.pth
│ └── checkpoint_epoch_10.pth
├── notebooks/
│ └── main_training.ipynb # End-to-end training pipeline
├── presentation/ # Assets for docs / slides
├── results/ # Training curves, analysis, samples
├── scripts/
│ ├── __init__.py
│ ├── inference.py # CLI inference entrypoint
│ ├── setup_environment.py # Validation / .env generator
│ └── train.py # Redirects to notebook
├── src/
│ ├── __init__.py
│ ├── audio_utils.py # ffmpeg-based audio I/O helpers
│ ├── unet_model.py # U-Net model + loss
│ └── utils.py # Plotting, helpers, metrics
├── tools/
│ ├── __init__.py
│ ├── test_quality.py
│ └── test_system.py
├── config.yaml # Default config (dataset/model/training)
├── requirements.txt
├── req-aur.txt # Arch/Manjaro package hints
├── run_app.sh # Start Streamlit app
├── run_inference.sh # Run CLI inference
├── run_tests.sh # Run system + quality tests
├── setup.py # Wrapper to scripts/setup_environment.py
├── LICENSE
└── README.md
```
Install ffmpeg and the Python dependencies from `requirements.txt`.
Validate paths (auto-detect the project root and create any missing directories):

```bash
python -c "from config.paths import quick_setup; quick_setup()"
```
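The idea behind the auto-detection can be pictured as follows. This is a hedged sketch, not the shipped `config/paths.py`; the marker file and the directory list are assumptions:

```python
# Hypothetical sketch of the path auto-detection in config/paths.py.
import os
from pathlib import Path

def find_root(marker: str = "config.yaml") -> Path:
    # An explicit override always wins (see Troubleshooting).
    env = os.environ.get("UNET_AUDIOFILTER_ROOT")
    if env:
        return Path(env).resolve()
    # Otherwise walk upward from this file until the marker is found.
    for parent in Path(__file__).resolve().parents:
        if (parent / marker).exists():
            return parent
    raise RuntimeError("Project root not found; set UNET_AUDIOFILTER_ROOT.")

def quick_setup() -> Path:
    root = find_root()
    for name in ("models", "results", "dataset"):  # assumed directory set
        (root / name).mkdir(exist_ok=True)
    return root
```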
Run tests:

```bash
./run_tests.sh
```

Run inference:
```bash
./run_inference.sh input.wav output.wav
```

Launch the web app:
```bash
./run_app.sh
```

Spectrograms for a representative sample (p232_010):
Listen to the audio examples:
<details>
<summary>Noisy vs Enhanced vs Clean (click to expand)</summary>
<p><strong>Noisy input</strong></p>
<audio controls>
<source src="results/comparison/p232_010_noisy.wav" type="audio/wav" />
Your browser does not support the audio element.
</audio>
<p><strong>Enhanced output</strong></p>
<audio controls>
<source src="results/comparison/p232_010_enhanced.wav" type="audio/wav" />
Your browser does not support the audio element.
</audio>
<p><strong>Clean reference</strong></p>
<audio controls>
<source src="results/comparison/p232_010_clean.wav" type="audio/wav" />
Your browser does not support the audio element.
</audio>
</details>
The CLI uses a GPU if available and expects a pretrained checkpoint at `models/best_gpu_model.pth` by default.

```bash
python scripts/inference.py noisy.wav enhanced.wav --device auto
```
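The `--device auto` behavior boils down to the usual CUDA-availability check. A sketch, assuming the `UNet` constructor arguments listed in the notes below and a plain `state_dict` checkpoint layout (both worth verifying against `scripts/inference.py`):

```python
import torch
from src.unet_model import UNet

def resolve_device(choice: str = "auto") -> torch.device:
    # "auto" picks CUDA when available, otherwise CPU.
    if choice == "auto":
        return torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return torch.device(choice)

device = resolve_device("auto")
model = UNet(base_filters=32, depth=3, dropout=0.1)  # matches the shipped checkpoint
state = torch.load("models/best_gpu_model.pth", map_location=device)
model.load_state_dict(state.get("state_dict", state))  # assumed checkpoint layout
model.to(device).eval()
```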
Notes:

- Spectrogram settings: `n_fft=1024`, `hop_length=256`.
- Processing is done in 4 s chunks with 25% overlap and crossfade (sketched after this list).
- The shipped scripts/app instantiate `UNet` with `base_filters=32, depth=3, dropout=0.1` to match the provided checkpoint.
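The chunked processing can be pictured as overlap-add with linear crossfade ramps. A minimal sketch, assuming mono float audio and a hypothetical `enhance_chunk` standing in for the STFT → U-Net → iSTFT round trip:

```python
import numpy as np

def enhance_chunk(chunk: np.ndarray) -> np.ndarray:
    # Placeholder for the real pipeline: STFT (n_fft=1024, hop_length=256),
    # U-Net magnitude masking, then inverse STFT.
    return chunk

def enhance_long_audio(audio: np.ndarray, sr: int, chunk_s: float = 4.0,
                       overlap: float = 0.25) -> np.ndarray:
    chunk = int(chunk_s * sr)
    hop = int(chunk * (1 - overlap))      # 25% overlap between consecutive chunks
    fade = chunk - hop                    # samples shared by neighboring chunks
    window = np.ones(chunk, dtype=np.float32)
    window[:fade] = np.linspace(0.0, 1.0, fade)   # fade in
    window[-fade:] = np.linspace(1.0, 0.0, fade)  # fade out
    out = np.zeros_like(audio, dtype=np.float32)
    norm = np.zeros_like(audio, dtype=np.float32)
    for start in range(0, len(audio), hop):
        seg = audio[start:start + chunk]
        w = window[:len(seg)]
        out[start:start + len(seg)] += enhance_chunk(seg) * w
        norm[start:start + len(seg)] += w
    return out / np.maximum(norm, 1e-8)   # undo the crossfade weighting
```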
Start the app with `./run_app.sh`, upload an audio file, and download the enhanced result. The app shows waveform and spectrogram comparisons and basic quality metrics.
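Internally the app follows a simple upload → enhance → download flow. A stripped-down sketch; the `enhance` helper here is a hypothetical stand-in for the model call, and the shipped `apps/streamlit_app.py` adds the plots and metrics:

```python
import streamlit as st

def enhance(wav_bytes: bytes) -> bytes:
    # Hypothetical stand-in: the real app decodes the audio, runs the U-Net,
    # and re-encodes the enhanced signal as WAV bytes.
    return wav_bytes

uploaded = st.file_uploader("Noisy audio", type=["wav", "mp3", "flac"])
if uploaded is not None:
    st.audio(uploaded)                        # preview the noisy input
    enhanced = enhance(uploaded.read())
    st.audio(enhanced, format="audio/wav")    # preview the result
    st.download_button("Download enhanced audio", enhanced,
                       file_name="enhanced.wav", mime="audio/wav")
```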
Training is provided as a Jupyter notebook for clarity and interactivity:
- Open `notebooks/main_training.ipynb`.
- Run cells in sequence; checkpoints are saved under `models/`.
The helper script `scripts/train.py` simply points you to the notebook. `config.yaml` provides a reference model configuration (`base_filters=64, depth=4, dropout=0.2`) you can adopt when retraining.
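Loading that reference configuration for retraining might look like the following; the `model:` key layout in `config.yaml` is an assumption here, so check the file before relying on it:

```python
import yaml
from src.unet_model import UNet

with open("config.yaml") as f:
    cfg = yaml.safe_load(f).get("model", {})  # assumed "model:" section

model = UNet(base_filters=cfg.get("base_filters", 64),
             depth=cfg.get("depth", 4),
             dropout=cfg.get("dropout", 0.2))
```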
- Python 3.9+
- PyTorch (CUDA optional but recommended for speed)
- ffmpeg (system package) for robust multi-format audio I/O
Hardware guidance:
- GPU with ≥4GB VRAM recommended for training (CPU works for inference)
- For Arch users, see `req-aur.txt` for curated package names
- Model not found: ensure `models/best_gpu_model.pth` exists or pass `--model /path/to/checkpoint.pth`.
- Import/path errors: set `UNET_AUDIOFILTER_ROOT` to your repo root and retry `quick_setup()`.
- ffmpeg errors: install `ffmpeg` via your OS package manager and restart the session.
- CUDA not used: pass `--device cpu` to force CPU, or ensure CUDA drivers/toolkit are installed for GPU use.
Pull requests are welcome. Please run ./run_tests.sh before submitting.
MIT License © 2025 Radhey Kalra
