The aim of this project is to build a deep learning system that identifies the human voice and isolates it from the other sources of sound in a single waveform. Applications of this technology are extensive, with potential benefits including improved noise-canceling systems and higher-quality audio communications.
| Experiment | SDR (dB) |
|---|---|
| Bandpass Filter | 7.8 |
| Spectral Gating | 7.2 |
| Sepformer | 10.4 |
| Conv-TasNet | 8.53 |
| Waveform AutoEncoder | -3.1 |
| Spectrogram AutoEncoder | 13.77 |
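
For reference, SDR (signal-to-distortion ratio) compares the energy of the clean target signal to the energy of the residual error, so higher is better. Below is a minimal sketch of the plain SDR computation in NumPy; the function name and the use of NumPy are illustrative choices, and the experiments may well have used a library implementation instead.

```python
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Plain signal-to-distortion ratio in dB (higher is better)."""
    # Energy of the clean reference signal.
    signal_power = np.sum(reference ** 2)
    # Energy of everything the estimate got wrong.
    error_power = np.sum((reference - estimate) ** 2)
    # Small epsilon guards against division by zero for a perfect estimate.
    return 10.0 * np.log10(signal_power / (error_power + 1e-12))
```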
- **Install Conda:** If you haven't already, install Conda by following the instructions in the official Conda documentation.
- **Create a New Environment:** Open your terminal or command prompt and run the following command to create a new Conda environment, replacing `myenv` with your desired environment name: `conda create --name myenv`
- **Activate the Environment:** Once the environment is created, activate it with: `conda activate myenv`
- **Install Dependencies:** With the `requirements.txt` file ready, install all the dependencies listed in it using `pip`: `pip install -r requirements.txt`
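
After installation, a quick sanity check can confirm that the core packages resolved correctly. The package names below are assumptions based on the PyTorch-based models used in this project; adjust the list to match the actual `requirements.txt`.

```python
# Hypothetical sanity check: verify key packages import and report versions.
# The package names here are assumptions; edit the list to match requirements.txt.
import importlib

for pkg in ["torch", "torchaudio", "numpy"]:
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg} {getattr(mod, '__version__', '?')} OK")
    except ImportError as exc:
        print(f"{pkg} MISSING: {exc}")
```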
All of the experiments above were run in Jupyter notebooks. After completing the setup steps, open a notebook and run its cells top to bottom.
- In `/data` we have all the relevant files for the custom test dataset, our model inference results as `.csv` files, and the `.csv` files for all of our loss and accuracy curves.
- In `/notebooks` we have the notebooks for all of our experimentation.
- Our from-scratch models can be found in `notebooks/spectogram_autoencoder.ipynb` and `notebooks/autoencoder.ipynb`.
- Our non-ML methods can be found in `notebooks/Python_Noise_Reduction_Methods.ipynb` (a minimal baseline sketch appears after this list).
- Our inference experiments can be found in `notebooks/pretained_sound_separation.ipynb` and `notebooks/sepformer.ipynb`.
- Our attempt at transfer learning on Demucs can be found in `notebook/finetuning-demucs.ipynb`.
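
To give a flavor of the non-ML baselines, here is a minimal bandpass-filter sketch that keeps roughly the human-voice frequency band. The 300–3400 Hz passband, the SciPy usage, and the function name are illustrative assumptions; the actual baseline implementations live in `notebooks/Python_Noise_Reduction_Methods.ipynb`.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_voice(waveform: np.ndarray, sample_rate: int,
                   low_hz: float = 300.0, high_hz: float = 3400.0) -> np.ndarray:
    """Zero-phase Butterworth bandpass keeping a rough speech band.

    The 300-3400 Hz band is an assumption (the classic telephony speech
    range), not necessarily what the project's notebook uses.
    """
    # Design a 4th-order Butterworth bandpass in second-order sections
    # for numerical stability.
    sos = butter(4, [low_hz, high_hz], btype="bandpass",
                 fs=sample_rate, output="sos")
    # Filter forward and backward to avoid phase distortion.
    return sosfiltfilt(sos, waveform)
```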