
UNet Audio Filter – AI Speech Enhancement

GPU-accelerated speech enhancement using a U-Net model

Overview

This repo provides a complete pipeline for removing background noise from speech using a U-Net model trained on the VoiceBank+DEMAND dataset. It includes a CLI inference tool, a Streamlit web app, a training notebook, and a global path system that "just works" after cloning.

Repository structure

unet-audiofilter/
├── apps/
│   ├── __init__.py
│   ├── requirements.txt
│   └── streamlit_app.py          # Web UI for enhancement
├── config/
│   ├── __init__.py
│   └── paths.py                  # Auto-detected project paths
├── dataset/                      # Sample/test data and scp files
│   ├── clean_testset_wav/
│   ├── clean_trainset_28spk_wav/
│   ├── noisy_testset_wav/
│   ├── noisy_trainset_28spk_wav/
│   ├── test.scp
│   └── train.scp
├── models/                       # Model checkpoints
│   ├── best_gpu_model.pth
│   ├── checkpoint_epoch_5.pth
│   └── checkpoint_epoch_10.pth
├── notebooks/
│   └── main_training.ipynb       # End-to-end training pipeline
├── presentation/                 # Assets for docs / slides
├── results/                      # Training curves, analysis, samples
├── scripts/
│   ├── __init__.py
│   ├── inference.py              # CLI inference entrypoint
│   ├── setup_environment.py      # Validation / .env generator
│   └── train.py                  # Redirects to notebook
├── src/
│   ├── __init__.py
│   ├── audio_utils.py            # ffmpeg-based audio I/O helpers
│   ├── unet_model.py             # U-Net model + loss
│   └── utils.py                  # Plotting, helpers, metrics
├── tools/
│   ├── __init__.py
│   ├── test_quality.py
│   └── test_system.py
├── config.yaml                   # Default config (dataset/model/training)
├── requirements.txt
├── req-aur.txt                   # Arch/Manjaro package hints
├── run_app.sh                    # Start Streamlit app
├── run_inference.sh              # Run CLI inference
├── run_tests.sh                  # Run system + quality tests
├── setup.py                      # Wrapper to scripts/setup_environment.py
├── LICENSE
└── README.md

Installation

Install ffmpeg via your system package manager, then install the Python dependencies:

pip install -r requirements.txt

Quick start

Validate paths (auto-detect root and create missing dirs):

python -c "from config.paths import quick_setup; quick_setup()"

Run tests:

./run_tests.sh

Run inference:

./run_inference.sh input.wav output.wav

Launch the web app:

./run_app.sh

Audio & spectrogram comparison

Spectrograms for a representative sample (p232_010):

Spectrogram comparison – p232_010

Listen to the audio examples:

Noisy vs Enhanced vs Clean:
<p><strong>Noisy input</strong></p>
<audio controls>
	<source src="results/comparison/p232_010_noisy.wav" type="audio/wav" />
	Your browser does not support the audio element.
</audio>

<p><strong>Enhanced output</strong></p>
<audio controls>
	<source src="results/comparison/p232_010_enhanced.wav" type="audio/wav" />
	Your browser does not support the audio element.
</audio>

<p><strong>Clean reference</strong></p>
<audio controls>
	<source src="results/comparison/p232_010_clean.wav" type="audio/wav" />
	Your browser does not support the audio element.
</audio>

Inference (CLI)

The CLI uses a GPU if available and expects a pretrained checkpoint at models/best_gpu_model.pth by default.

python scripts/inference.py noisy.wav enhanced.wav --device auto

Notes:

  • Spectrogram settings: n_fft=1024, hop_length=256.
  • Processing is done in 4s chunks with 25% overlap and crossfade (see the sketch after this list).
  • The shipped scripts/app instantiate UNet with base_filters=32, depth=3, dropout=0.1 to match the provided checkpoint.
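To make these notes concrete, here is a hedged sketch of the pipeline: checkpoint loading with automatic device selection, followed by overlap-and-crossfade chunked processing. The helper names (load_model, enhance_chunk, enhance_long), the 16 kHz sample rate, the checkpoint key layout, and the magnitude-masking scheme are assumptions; the actual scripts/inference.py may differ:

# Hypothetical sketch of the inference pipeline; not the shipped implementation.
import numpy as np
import torch
from src.unet_model import UNet

SR = 16000                 # assumed sample rate for VoiceBank+DEMAND-style data
N_FFT, HOP_LENGTH = 1024, 256
CHUNK = 4 * SR             # 4-second chunks
OVERLAP = CHUNK // 4       # 25% overlap
STEP = CHUNK - OVERLAP

def load_model(path="models/best_gpu_model.pth", device="auto"):
    if device == "auto":
        device = "cuda" if torch.cuda.is_available() else "cpu"
    model = UNet(base_filters=32, depth=3, dropout=0.1)  # matches the shipped checkpoint
    state = torch.load(path, map_location=device)
    model.load_state_dict(state.get("model_state_dict", state))  # raw or wrapped state dict
    return model.to(device).eval(), device

def enhance_chunk(model, device, chunk: np.ndarray) -> np.ndarray:
    """STFT -> magnitude mask via U-Net -> iSTFT. The masking scheme is assumed."""
    x = torch.from_numpy(chunk).float().to(device)
    window = torch.hann_window(N_FFT, device=device)
    spec = torch.stft(x, N_FFT, HOP_LENGTH, window=window, return_complex=True)
    mag, phase = spec.abs(), torch.angle(spec)
    with torch.no_grad():
        mask = model(mag.unsqueeze(0).unsqueeze(0)).squeeze()
    est = torch.polar(mag * mask, phase)  # reuse the noisy phase
    y = torch.istft(est, N_FFT, HOP_LENGTH, window=window, length=len(chunk))
    return y.cpu().numpy()

def enhance_long(model, device, audio: np.ndarray) -> np.ndarray:
    out = np.zeros(len(audio), dtype=np.float32)
    weight = np.zeros_like(out)
    fade = np.ones(CHUNK, dtype=np.float32)
    fade[:OVERLAP] = np.linspace(0.0, 1.0, OVERLAP)    # fade-in
    fade[-OVERLAP:] = np.linspace(1.0, 0.0, OVERLAP)   # fade-out
    for start in range(0, len(audio), STEP):
        chunk = audio[start:start + CHUNK]
        n = len(chunk)
        if n < CHUNK:
            chunk = np.pad(chunk, (0, CHUNK - n))      # zero-pad the final chunk
        y = enhance_chunk(model, device, chunk)
        out[start:start + n] += y[:n] * fade[:n]
        weight[start:start + n] += fade[:n]
    return out / np.maximum(weight, 1e-8)              # normalize by accumulated fades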

Streamlit app

Start the app with ./run_app.sh, upload an audio file, and download the enhanced result. The app shows waveform and spectrogram comparisons and basic quality metrics.
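For reference, a stripped-down version of the upload → enhance → download flow; the shipped apps/streamlit_app.py adds the waveform/spectrogram plots and metrics on top, and enhance_file here is a placeholder to wire to the repo's inference code:

# Minimal sketch of the Streamlit flow; not the shipped app.
import streamlit as st

def enhance_file(in_path: str, out_path: str) -> str:
    """Placeholder: wire this to the repo's inference code."""
    import shutil
    shutil.copy(in_path, out_path)  # passthrough stub
    return out_path

st.title("UNet Audio Filter")
uploaded = st.file_uploader("Upload noisy audio", type=["wav", "mp3", "flac"])
if uploaded is not None:
    with open("input.wav", "wb") as f:
        f.write(uploaded.getbuffer())
    out = enhance_file("input.wav", "enhanced.wav")
    st.audio(out)
    with open(out, "rb") as f:
        st.download_button("Download enhanced audio", f, file_name="enhanced.wav")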

Training

Training is provided as a Jupyter notebook for clarity and interactivity:

  1. Open notebooks/main_training.ipynb.
  2. Run cells in sequence; checkpoints are saved under models/.

The helper script scripts/train.py simply points you to the notebook. For retraining, config.yaml provides a reference model configuration (base_filters=64, depth=4, dropout=0.2) that you can adopt, as sketched below.
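A minimal sketch of building the model from that configuration; the exact YAML key names are assumptions based on the values quoted above:

# Sketch: construct the U-Net from config.yaml. The "model" section and its
# key names (base_filters, depth, dropout) are assumed, not verified.
import yaml
from src.unet_model import UNet

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

model_cfg = cfg["model"]  # e.g. {"base_filters": 64, "depth": 4, "dropout": 0.2}
model = UNet(
    base_filters=model_cfg["base_filters"],
    depth=model_cfg["depth"],
    dropout=model_cfg["dropout"],
)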

Requirements

  • Python 3.9+
  • PyTorch (CUDA optional but recommended for speed)
  • ffmpeg (system package) for robust multi-format audio I/O

Hardware guidance:

  • GPU with ≥4GB VRAM recommended for training (CPU works for inference)
  • For Arch users, see req-aur.txt for curated package names

Troubleshooting

  • Model not found: ensure models/best_gpu_model.pth exists or pass --model /path/to/checkpoint.pth.
  • Import/path errors: set UNET_AUDIOFILTER_ROOT to your repo root and retry quick_setup() (see the snippet after this list).
  • ffmpeg errors: install ffmpeg via your OS package manager and restart the session.
  • CUDA not used: pass --device cpu to force CPU or ensure CUDA drivers/toolkit are installed for GPU.
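For the import/path case, setting the override before importing the path helpers is enough:

# Point the path system at the repo root explicitly, then re-run setup.
import os
os.environ["UNET_AUDIOFILTER_ROOT"] = "/path/to/unet-audiofilter"  # adjust to your clone

from config.paths import quick_setup
quick_setup()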

Contributing

Pull requests are welcome. Please run ./run_tests.sh before submitting.

License

MIT License © 2025 Radhey Kalra
