Description
I've been exploring performance enhancements for the audio generation model, as it currently appears to be the least optimized part of the pipeline. The LLM image-to-text stage can leverage the GPU to compute its result, but the text-to-audio model can only use CPU resources. It does not need the larger memory my CPU offers: with a footprint of under 1 GB for the length of text the LLM generates, it should run reasonably quickly even on integrated graphics.
I have been exploring pytorch-directml as an enhancement to PyTorch, which currently runs the text-to-audio model. This is proving rather difficult: the model was not designed to run on alternative device backends, and the documentation is unclear about what is needed to swap out the PyTorch version. I have made some progress on the integration, but the current version is not yet runnable; the most recent runnable version is the one from before the pytorch-directml work began.
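For reference, a minimal sketch of the device swap being attempted. This assumes the torch-directml package's documented entry point (`torch_directml.device()`); the `pick_device` helper and the fallback behavior are illustrative, not code from this repo:

```python
def pick_device():
    """Return a DirectML device if torch-directml is installed,
    otherwise fall back to the CPU device string."""
    try:
        import torch_directml  # Microsoft's DirectML backend for PyTorch
        return torch_directml.device()
    except ImportError:
        # No DirectML available: the model keeps running on CPU as today.
        return "cpu"

device = pick_device()
# The text-to-audio model and its input tensors would then be moved with
# something like: model.to(device); tensor.to(device)
```

The difficulty is that simply moving the model with `.to(device)` is not enough when the model's code internally assumes CPU tensors or hard-codes device choices, which appears to be the case here.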