Learn how to convert streaming text to speech in real time.
The TextToAudioStream class provides real-time text-to-speech (TTS) conversion by streaming text directly into audio output. This feature is particularly useful in applications that require instant feedback, such as voice assistants, live captioning systems, or interactive chatbots, where text is continuously generated and needs to be converted into speech on the fly.
This example demonstrates how to stream text from a large language model (LLM) and convert it into speech using the TextToAudioStream class, which works with both synchronous and asynchronous TTS engines.
In this example, text is generated using an LLM (Groq in this case, but any LLM works), and the generated text is then passed to a TTS system (the Smallest API) for real-time audio synthesis. The audio is saved as a .wav file. The entire process runs asynchronously to keep performance smooth, especially when dealing with large or continuous streams of text.
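The flow described above (streamed text in, raw audio chunks out, chunks written to a .wav file) can be sketched with the standard library alone. This is an illustrative sketch, not the Smallest SDK: `fake_llm_stream` and `fake_tts_stream` are hypothetical stand-ins for the LLM and the TextToAudioStream processor, and the 24000 Hz, 16-bit mono format is an assumption that should match whatever your TTS engine actually returns.

```python
import asyncio
import wave

SAMPLE_RATE = 24000  # assumed sample rate; match your TTS engine's setting


async def fake_llm_stream(prompt):
    """Hypothetical stand-in for a streaming LLM: yields text piece by piece."""
    for piece in ["Streaming text ", "arrives in ", "small pieces."]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield piece


async def fake_tts_stream(text_stream):
    """Hypothetical stand-in for a streaming TTS processor.

    Yields headerless 16-bit mono PCM: 10 ms of silence per character,
    so the sketch runs offline without any API key.
    """
    samples_per_char = SAMPLE_RATE // 100
    async for text in text_stream:
        yield b"\x00\x00" * (samples_per_char * len(text))


async def save_stream_to_wav(path, audio_stream):
    """Write raw PCM chunks into a WAV container as they arrive."""
    total_bytes = 0
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(SAMPLE_RATE)
        async for chunk in audio_stream:
            wav_file.writeframes(chunk)
            total_bytes += len(chunk)
    return total_bytes


async def main(path="llm_to_speech.wav"):
    text_stream = fake_llm_stream("Explain text to speech in one sentence.")
    return await save_stream_to_wav(path, fake_tts_stream(text_stream))
```

Calling `asyncio.run(main())` writes the file and returns the number of audio bytes written. In a real application the two stand-ins would be replaced by your LLM's streaming call and the TTS processor's audio stream, while the WAV-writing loop stays the same.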
If you are using a voice_id corresponding to a voice clone, you should explicitly set the model parameter to "lightning-large" in the Smallest client or payload.
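One way to avoid forgetting this rule is to centralize it in a small helper that builds the request payload. The sketch below is only an illustration: `build_tts_payload` and its `is_voice_clone` flag are hypothetical, and the `text` key is an assumed field name; only `voice_id`, `model`, and the "lightning-large" value come from the note above.

```python
def build_tts_payload(text, voice_id, is_voice_clone=False, model=None):
    """Build a request payload dict (hypothetical helper, assumed field names).

    For cloned voices the model is always set to "lightning-large";
    otherwise the model key is included only if one was given.
    """
    payload = {"text": text, "voice_id": voice_id}
    if is_voice_clone:
        payload["model"] = "lightning-large"
    elif model is not None:
        payload["model"] = model
    return payload
```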
tts_instance: The instance of the TTS engine (either Smallest or AsyncSmallest) used to generate speech from the text.
queue_timeout: The wait time (in seconds) for new text to be received before attempting to generate speech. Default is 5.0 seconds.
max_retries: The maximum number of retries for failed synthesis attempts. Default is 3.
The TextToAudioStream processor streams raw audio data without WAV headers for better streaming efficiency. These raw audio chunks can be, for example, saved to a file (.wav or .mp3) for later use.
This approach allows you to handle continuous streams of text and convert them into real-time speech, making it ideal for interactive applications where immediate audio feedback is crucial.
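To make the queue_timeout and max_retries parameters more concrete, here is a standard-library sketch of one plausible reading of them. It is not the library's internal implementation: `drain_text_queue`, `synthesize_with_retries`, the `None` end-of-stream sentinel, and the rule "one initial attempt plus up to max_retries retries" are all assumptions made for illustration.

```python
import queue


def synthesize_with_retries(synthesize, text, max_retries=3):
    """Call synthesize(text); on failure, retry up to max_retries more times."""
    last_error = None
    for _attempt in range(max_retries + 1):  # first attempt + retries
        try:
            return synthesize(text)
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
    raise RuntimeError(f"synthesis failed after {max_retries} retries") from last_error


def drain_text_queue(text_queue, synthesize, queue_timeout=5.0, max_retries=3):
    """Collect queued text and yield audio for it.

    Text is buffered while it keeps arriving. When no new text shows up
    within queue_timeout seconds, or a None sentinel marks the end of the
    stream, the buffered text is synthesized.
    """
    buffer = ""
    done = False
    while not done:
        try:
            item = text_queue.get(timeout=queue_timeout)
            if item is None:
                done = True  # end of stream: flush what is left, then stop
            else:
                buffer += item
                continue  # keep collecting while text is still arriving
        except queue.Empty:
            pass  # no new text within queue_timeout: flush the buffer
        if buffer:
            yield synthesize_with_retries(synthesize, buffer, max_retries)
            buffer = ""
```

A shorter timeout starts speech sooner after a pause in the text, at the cost of splitting it into more, smaller synthesis requests; a longer one does the opposite.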