edgetts is a Go module that lets you use Microsoft Edge's online text-to-speech service in your Go projects.
To install it, run the following command:
go get github.com/kolonist/edgetts@latest

package main

import (
	"context"
	"log"

	"github.com/kolonist/edgetts"
)

func main() {
	args := edgetts.Args{
		// set the voice to use in speech synthesis
		Voice: "en-US-AlloyTurboMultilingualNeural",
	}

	// generate sound in mp3 format and save it to a file
	err := edgetts.
		New(args).
		Speak("Text I need to speak now").
		SaveToFile(context.TODO(), "./sample.mp3", edgetts.OutputFormatMp3)
	if err != nil {
		log.Fatal(err)
	}
}

package main
import (
	"context"
	"fmt"
	"log"

	"github.com/kolonist/edgetts"
)

func main() {
	args := edgetts.Args{
		// set the voice to use in speech synthesis
		Voice: "en-US-AlloyTurboMultilingualNeural",

		// speak 15% faster
		Rate: "+15%",

		// reduce the volume by 10%
		Volume: "-10%",
	}

	// set synthesis args and text
	speaker := edgetts.
		New(args).
		Speak("Text I need to speak now")

	// generate sound in raw PCM format with a 22050 Hz sample rate and receive the sound chunks through an iterator
	for data, err := range speaker.GetSoundIter(context.TODO(), edgetts.OutputFormatRaw22050) {
		if err != nil {
			log.Fatal(err)
		}

		// data is `[]byte`
		// use this method to stream audio as it arrives from the Edge TTS server
		_ = data
	}

	// get the start time and duration of every word in the generated speech
	// note that you should call `GetMetadata()` after one of `GetSoundIter()`, `GetSound()` or `SaveToFile()`
	metadata, err := speaker.GetMetadata()
	if err != nil {
		log.Fatal(err)
	}

	for _, word := range metadata {
		// `Offset` and `Duration` are in milliseconds
		fmt.Printf("%d: %s - %d\n", word.Offset, word.Text, word.Duration)
	}
}

package main
import (
	"context"
	"fmt"
	"log"

	"github.com/kolonist/edgetts"
)

func main() {
	// get the list of all available voices
	voices, err := edgetts.ListVoices(context.TODO())
	if err != nil {
		log.Fatal(err)
	}

	for _, v := range voices {
		// use `ShortName` in your speech generation Args
		fmt.Printf("ShortName: %s, Gender: %s\n", v.ShortName, v.Gender)
	}
}

You can find a more complex example in the /examples folder.
Output formats:
- OutputFormatMp3: mp3, 24 kHz, 48k bitrate (default)
- OutputFormatWebm: webm, 24 kHz, 16 bit, 24k bitrate
- OutputFormatOgg: ogg, 24 kHz, 16 bit
- OutputFormatRaw22050: raw PCM, 22050 Hz, 16 bit
- OutputFormatRaw44100: raw PCM, 44100 Hz, 16 bit
Arguments passed to the edgetts.New(args edgetts.Args) function.
- Voice string: voice to use. Has the format en-US-AlloyTurboMultilingualNeural, where:
  - en-US: locale
  - AlloyTurbo: actual voice name
- Volume string: sound volume change in percent. Can increase (+10%) or decrease (-20%) the volume
- Rate string: speech rate change in percent. Can increase (+30%) or decrease (-40%) the rate
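Because Volume and Rate are signed percent strings, it may be convenient to build them from integers rather than by hand. A small sketch, assuming a hypothetical helper `formatPercent` that is not part of edgetts:

```go
package main

import "fmt"

// formatPercent renders a signed percentage string such as "+15%" or "-10%".
// The %+d verb always prints the sign, so zero becomes "+0%".
func formatPercent(delta int) string {
	return fmt.Sprintf("%+d%%", delta)
}

func main() {
	fmt.Println(formatPercent(15))  // +15%
	fmt.Println(formatPercent(-10)) // -10%
}
```

The results can be assigned directly to the Rate and Volume fields of edgetts.Args.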
Struct to manage speech synthesis.
Create an EdgeTTS struct.
Assign the text you need to synthesize.
Assign the text you need to synthesize with a specific voice. Helpful if you need multiple generations with different voices.
Used to synthesize speech.
Get sound data as byte buffers from an iterator.
Get the whole downloaded sound file as a byte buffer.
Save the generated sound to a file.
Get metadata of the generated speech. Should be called after one of GetSoundIter(), GetSound() or SaveToFile().
Contains the start time of each word and its pronunciation duration in milliseconds.
- Offset int: start time of the word in the generated sound, in milliseconds
- Duration int: duration of the word's pronunciation, in milliseconds
- Text string: the word
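Since Offset and Duration are plain millisecond counts, they map naturally onto subtitle-style cues. The following sketch uses a local stand-in struct `wordTiming` that mirrors the three fields above (the library's actual type name is not assumed), and the hypothetical helpers `stamp` and `cue` are illustrations only.

```go
package main

import (
	"fmt"
	"time"
)

// wordTiming is a local stand-in mirroring the Offset, Duration and Text fields.
type wordTiming struct {
	Offset   int // start time in milliseconds
	Duration int // pronunciation duration in milliseconds
	Text     string
}

// stamp formats a millisecond count as HH:MM:SS.mmm.
func stamp(ms int) string {
	d := time.Duration(ms) * time.Millisecond
	h := int(d / time.Hour)
	m := int(d/time.Minute) % 60
	s := int(d/time.Second) % 60
	milli := int(d/time.Millisecond) % 1000
	return fmt.Sprintf("%02d:%02d:%02d.%03d", h, m, s, milli)
}

// cue renders one word as "start --> end text".
func cue(w wordTiming) string {
	return fmt.Sprintf("%s --> %s %s", stamp(w.Offset), stamp(w.Offset+w.Duration), w.Text)
}

func main() {
	// Made-up timings for illustration.
	words := []wordTiming{
		{Offset: 500, Duration: 300, Text: "Hello"},
		{Offset: 850, Duration: 400, Text: "world"},
	}
	for _, w := range words {
		fmt.Println(cue(w))
	}
}
```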
Voice used for speech synthesis.
Get the list of all available voices.
- Name string: voice full name
- ShortName string: voice short name. Use this value when specifying a voice in this library's functions
- Gender string: speaker gender, Male or Female
- Locale string: locale, e.g. en-US
- SuggestedCodec string: always empty
- FriendlyName string: always empty
- Status string: can be GA for General Availability or Preview
- VoiceTag.ContentCategories []string: always empty
- VoiceTag.VoicePersonalities []string: vocal characteristics of the voice
For speech synthesis you need only the ShortName field.
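The voice list can be narrowed down by Locale and Gender before picking a ShortName. A minimal sketch, assuming a local stand-in struct `voice` with three of the fields listed above; `shortNamesFor` is a hypothetical helper and the sample entries are made up rather than real voice names.

```go
package main

import "fmt"

// voice is a local stand-in carrying three of the fields described above.
type voice struct {
	ShortName string
	Gender    string
	Locale    string
}

// shortNamesFor returns the ShortName of every voice matching locale and gender.
func shortNamesFor(voices []voice, locale, gender string) []string {
	var names []string
	for _, v := range voices {
		if v.Locale == locale && v.Gender == gender {
			names = append(names, v.ShortName)
		}
	}
	return names
}

func main() {
	// Made-up sample entries; a real list would come from the voice listing call.
	voices := []voice{
		{ShortName: "en-US-ExampleOneNeural", Gender: "Female", Locale: "en-US"},
		{ShortName: "en-US-ExampleTwoNeural", Gender: "Male", Locale: "en-US"},
		{ShortName: "de-DE-ExampleThreeNeural", Gender: "Female", Locale: "de-DE"},
	}
	for _, name := range shortNamesFor(voices, "en-US", "Female") {
		fmt.Println(name)
	}
}
```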
I used the following projects as sources of inspiration:
- https://github.com/rany2/edge-tts (similar library for Python)
- https://github.com/surfaceyu/edge-tts-go (Python library rewritten in Go but not working now)
@license MIT
@author Alexander Zubakov developer@xinit.ru
