Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
Riccardo Passoni1, Francesca Ronchini1, Luca Comanducci1, Romain Serizel2, Fabio Antonacci1
1 Dipartimento di Elettronica, Informazione e Bioingegneria - Politecnico di Milano, Milan, Italy
2 Université de Lorraine, CNRS, Inria, Loria, Nancy, France
Text-to-audio models have recently emerged as a powerful technology for generating sound from textual descriptions. However, their high computational demands raise concerns about energy consumption and environmental impact. In this paper, we conduct an analysis of the energy usage of 7 state-of-the-art text-to-audio diffusion-based generative models, evaluating to what extent variations in generation parameters affect energy consumption at inference time. We also aim to identify an optimal balance between audio quality and energy consumption by considering Pareto-optimal solutions across all selected models. Our findings provide insights into the trade-offs between performance and environmental impact, contributing to the development of more efficient generative audio models.
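As a rough illustration of what "Pareto-optimal" means here, the sketch below keeps only the configurations for which no other configuration is at least as good on both energy and quality and strictly better on one. It is not the authors' analysis code: the `Config` fields, the lower-is-better quality score, and all numbers are hypothetical placeholders.

```python
from typing import NamedTuple


class Config(NamedTuple):
    """One hypothetical (model, inference setting) measurement."""
    name: str
    energy_wh: float     # energy per generated clip, lower is better (assumed unit)
    quality_cost: float  # assumed quality score where lower is better


def pareto_front(configs):
    """Return the configurations that no other configuration dominates.

    A configuration is dominated when another one is at least as good on
    both objectives and strictly better on at least one of them.
    """
    front = []
    for c in configs:
        dominated = any(
            o.energy_wh <= c.energy_wh
            and o.quality_cost <= c.quality_cost
            and (o.energy_wh < c.energy_wh or o.quality_cost < c.quality_cost)
            for o in configs
        )
        if not dominated:
            front.append(c)
    # Sort by energy so the front reads from cheapest to most expensive.
    return sorted(front, key=lambda c: c.energy_wh)


# Illustrative numbers only, not results from the paper.
measurements = [
    Config("model_a_10_steps", 0.5, 6.0),
    Config("model_a_50_steps", 2.0, 3.0),
    Config("model_b_10_steps", 0.8, 7.0),
    Config("model_b_50_steps", 2.5, 2.5),
    Config("model_c_50_steps", 3.0, 3.5),
]

front = pareto_front(measurements)
print([c.name for c in front])
```

With these placeholder values, `model_b_10_steps` and `model_c_50_steps` are dropped because another configuration uses less energy and also scores better on quality. The quadratic scan is adequate for a handful of models and settings.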
To run the Jupyter notebooks, clone the repository, create a virtual environment, and install the required packages.
With conda, a single command creates the virtual environment and installs the required packages:
conda env create -f requirements.yml
Once everything is installed, you can run the Jupyter notebooks by following the instructions they contain and reproduce the results.
The scripts in the 'inferences' folder each require an environment specific to the model being run; further information is provided in that folder's README. The 'sanitycheck' folder contains a brief check confirming the statistical significance of the quality metrics experiment.
For more details: "Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models" (Riccardo Passoni, Francesca Ronchini, Luca Comanducci, Romain Serizel, Fabio Antonacci)
If you use code or comments from this work, please cite:
@INPROCEEDINGS{11230979,
author={Passoni, Riccardo and Ronchini, Francesca and Comanducci, Luca and Serizel, Romain and Antonacci, Fabio},
booktitle={2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
title={Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models},
year={2025},
volume={},
number={},
pages={1-5},
keywords={Energy consumption;Analytical models;Computer architecture;Signal processing;Diffusion models;Acoustics},
doi={10.1109/WASPAA66052.2025.11230979},
ISSN={1947-1629},
month={Oct}}