Description
I've been exploring performance enhancements for the audio generation model, as it currently appears to be the least optimized part of the pipeline. The LLM image-to-text stage can leverage the GPU to compute its result, but the text-to-audio model can only use CPU resources. It does not need the larger memory my CPU offers: with a footprint of under 1 GB for the length of text the LLM generates, it should run reasonably quickly even on integrated graphics.
I have been exploring pytorch-directml as an enhancement to PyTorch, which currently runs the text-to-audio model. This is proving rather difficult: the model was not designed to run on alternative device backends, and the documentation is unclear about what is needed to swap out the PyTorch version. I have made some progress on the integration, but the current version is not yet runnable; the most recent runnable version is the one from before the pytorch-directml work began.
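For reference, a minimal sketch of the device swap being attempted. This assumes the torch-directml package's documented entry point (`torch_directml.device()`); the `pick_device` helper and the fallback behavior are illustrative, not code from this repo:

```python
def pick_device():
    """Return a DirectML device if torch-directml is installed,
    otherwise fall back to the CPU device string."""
    try:
        import torch_directml  # Microsoft's DirectML backend for PyTorch
        return torch_directml.device()
    except ImportError:
        # No DirectML available: the model keeps running on CPU as today.
        return "cpu"

device = pick_device()
# The text-to-audio model and its input tensors would then be moved with
# something like: model.to(device); tensor.to(device)
```

The difficulty is that simply moving the model with `.to(device)` is not enough when the model's code internally assumes CPU tensors or hard-codes device choices, which appears to be the case here.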