Weekly Report 11/21/2024 #4

@Jim-Hutchinson

Description

I've been exploring performance enhancements for the audio generation model, as it appears to be the least optimized part of the pipeline at the moment. The LLM image-to-text stage already leverages the GPU to compute its result, but the text-to-audio model currently runs only on the CPU. It does not need the larger memory available to the CPU: with a footprint of under 1 GB for the length of text the LLM generates, it should run reasonably quickly even on integrated graphics.

I have been exploring pytorch-directml, a DirectML-based GPU backend for PyTorch, which is currently used to run the text-to-audio model. This is proving rather difficult: the model was not designed to work this way, and the documentation is not clear on what I would need to do to swap out the PyTorch backend. I have made some progress on the integration, but the current version is not yet runnable. The most recent runnable version is the one from before the pytorch-directml work began.
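
For context, the device swap I'm aiming for follows the standard torch-directml usage pattern; a minimal sketch is below. The `Sequential` stack is just a placeholder for the actual text-to-audio model (its loading code is project-specific and not reproduced here), and the difficulty described above comes from the real model not being written with this kind of device swap in mind.

```python
import torch
import torch_directml  # installed via `pip install torch-directml`

# DirectML exposes any DX12-capable GPU (discrete or integrated) as a
# torch device, so moving work onto it follows the usual .to(device) pattern.
dml = torch_directml.device()

# Stand-in module: the real text-to-audio model is an ordinary torch.nn.Module,
# but its loading code is not shown here.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 256),
).to(dml).eval()

# Inputs must live on the same device as the model's weights.
x = torch.randn(1, 256).to(dml)

with torch.no_grad():
    y = model(x)

# Results come back to the CPU before any post-processing or file output.
print(y.cpu().shape)
```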
