🚀 Feature Description
When generating an audio stream, is it possible to also return the word (or token) for the current chunk? More precisely, is it possible to know the start time and end time in the stream for each token?
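To illustrate the kind of output being asked for, here is a rough sketch, not an existing API. It assumes a hypothetical stream that yields one `(token, samples)` pair per chunk, which is exactly the part that may not be available today. The names `TimedToken` and `annotate_stream` are made up for this example. Given such a stream, the timings follow from the running sample count divided by the sample rate:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Sequence, Tuple


@dataclass
class TimedToken:
    """Hypothetical result type: a token plus its position in the stream."""
    token: str
    start_s: float  # offset of the first sample of this token, in seconds
    end_s: float    # offset just past the last sample of this token, in seconds


def annotate_stream(
    chunks: Iterable[Tuple[str, Sequence[float]]],
    sample_rate: int,
) -> Iterator[TimedToken]:
    """Attach start/end times to each (token, samples) chunk.

    Assumes each chunk holds the audio for exactly one token, which is
    an assumption about the synthesizer, not something it is known to provide.
    """
    samples_so_far = 0
    for token, samples in chunks:
        start_s = samples_so_far / sample_rate
        samples_so_far += len(samples)
        end_s = samples_so_far / sample_rate
        yield TimedToken(token, start_s, end_s)


if __name__ == "__main__":
    # Fake stream: two tokens with silent audio at 24 kHz.
    fake_stream = [("hello", [0.0] * 12000), ("world", [0.0] * 6000)]
    for timed in annotate_stream(fake_stream, sample_rate=24000):
        print(timed)
```

If chunks cannot be aligned to tokens one-to-one, even an approximate per-chunk token label would still be useful.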
Additional context
Source code is here