Description
Hello, I am very happy that the UnitySentis demo has finally arrived; it was a pleasant surprise. I am a speech recognition algorithm engineer, and we often run into problems with model inference speed, that is, real-time performance. I would like to use today's very popular large models, such as ChatGPT and whisper-large-v2, in UnitySentis. We have tested the ONNX inference speed of Whisper in a Linux environment, and unfortunately that speed is still unbearable in practice. So we have two options. The first is to reduce the size of the model, for example by using the whisper-small or whisper-base models, but that still costs some accuracy. Happily, we found an inference library named CTranslate2 (https://github.com/OpenNMT/CTranslate2). We only need to export the model to the format CTranslate2 requires, and with its acceleration the real-time performance of whisper-large-v2 improves significantly: the RTF value drops to 0.03 (on an A30 machine), and speed on CPU also improves noticeably. Moreover, CTranslate2 supports most of today's mainstream Transformer models, so it would be great if UnitySentis could refer to or integrate CTranslate2 in a future version. :)
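For readers unfamiliar with the RTF (real-time factor) metric mentioned above, here is a minimal sketch of how it is computed; the 60-second audio duration and 1.8-second decode time are hypothetical numbers chosen only to reproduce the RTF of 0.03 reported for whisper-large-v2 under CTranslate2, not measurements from the original test.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration.

    A value below 1.0 means the system transcribes faster than real time;
    e.g. RTF 0.03 means one minute of audio is decoded in about 1.8 seconds.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# Hypothetical example: 60 s of audio decoded in 1.8 s gives RTF 0.03.
print(real_time_factor(1.8, 60.0))
```

A lower RTF directly translates into lower latency headroom, which is why the drop from the unbearable ONNX figures to 0.03 matters for interactive use.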
I hope UnitySentis keeps getting better and better :)
Here are some links you may find useful:
https://github.com/OpenNMT/CTranslate2
https://opennmt.net/CTranslate2
https://github.com/guillaumekln/faster-whisper/tree/master