POST /api/v1/lightning-large/stream
Generate speech from text (Lightning Large)
curl --request POST \
  --url https://waves-api.smallest.ai/api/v1/lightning-large/stream \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "text": "<string>",
  "voice_id": "<string>",
  "sample_rate": 24000,
  "speed": 1,
  "consistency": 0.5,
  "similarity": 0,
  "enhancement": 1,
  "language": "en",
  "output_format": "pcm",
  "pronunciation_dicts": [
    "<string>"
  ]
}'
{
  "data": "event: chunk\ndata: <WAV_DATA>\ndone: false\n"
}

Overview

The Lightning-Large SSE API provides real-time text-to-speech streaming with high-quality voice synthesis. It uses Server-Sent Events (SSE) to deliver audio chunks as they are generated, so playback can begin before the full audio has been synthesized. For an end-to-end example of how to use the Lightning-Large SSE API, see the Text to Speech (SSE) Example.

When to Use

  • Interactive Applications: Perfect for chatbots, virtual assistants, and other applications requiring immediate voice responses
  • Long-Form Content: Efficiently stream audio for articles, stories, or other long-form content without buffering delays
  • Voice User Interfaces: Create natural-sounding voice interfaces with minimal perceived latency
  • Accessibility Solutions: Provide real-time audio versions of written content for users with visual impairments

How It Works

  1. Make a POST Request: Send your text and voice settings to the API endpoint
  2. Receive Audio Chunks: The API processes your text and streams the audio back as base64-encoded chunks of 1024 bytes each
  3. Process the Stream: Handle the SSE events to decode and play audio chunks sequentially
  4. End of Stream: The API sends a completion event when all audio has been delivered
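
Steps 2 to 4 can be sketched in Python as follows. This is a minimal illustration, not the official client: the helper names `iter_audio_chunks` and `_finish` are hypothetical, and the parser assumes that every event carries the `event:`, `data:` and `done:` fields shown in the sample response and that the completion event sets `done: true`. The exact shape of the completion event is not specified on this page.

```python
import base64
from typing import Dict, Iterable, Iterator, Optional, Tuple


def _finish(fields: Dict[str, str]) -> Tuple[Optional[bytes], bool]:
    """Return (decoded audio or None, done flag) for one completed event."""
    # Assumption: the final event is marked with "done: true".
    done = fields.get("done", "false").lower() == "true"
    data = fields.get("data", "")
    # Only "chunk" events are treated as carrying base64 audio.
    is_chunk = fields.get("event") == "chunk"
    audio = base64.b64decode(data) if (is_chunk and data) else None
    return audio, done


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes, in order, from the text lines of an SSE stream."""
    fields: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        # An event ends at a blank line, or when the next "event:" line starts.
        starts_new_event = line.startswith("event:") and bool(fields)
        if (not line or starts_new_event) and fields:
            audio, done = _finish(fields)
            fields = {}
            if audio is not None:
                yield audio
            if done:
                return
        if line:
            # Split on the first colon only; base64 text contains no colons.
            name, _, value = line.partition(":")
            fields[name.strip()] = value.strip()
    # Flush a trailing event that was not followed by a blank line.
    if fields:
        audio, _ = _finish(fields)
        if audio is not None:
            yield audio
```

Joining the yielded chunks with `b"".join(...)` gives the full audio, while a player can instead consume them one at a time to start playback early.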

Authorizations

  • Authorization (string, header, required)
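
As a sketch of how the header fits into a request, the snippet below builds the same POST as the curl example using only the Python standard library. It constructs the request without sending it. The function name `build_stream_request` is hypothetical, and the `Bearer <token>` format is taken from the curl example above.

```python
import json
import urllib.request

# Endpoint URL copied from the curl example.
STREAM_URL = "https://waves-api.smallest.ai/api/v1/lightning-large/stream"


def build_stream_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the authenticated POST request."""
    return urllib.request.Request(
        STREAM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The token goes in the Authorization header as a Bearer credential.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the returned response object can then be read line by line as the stream arrives.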

Body

application/json
  • text (string, required)
  • voice_id (string, required)
  • sample_rate (integer, default: 24000). Required range: 8000 <= x <= 24000
  • speed (number, default: 1). Required range: 0.5 <= x <= 2
  • consistency (number, default: 0.5). Required range: 0 <= x <= 1
  • similarity (number, default: 0). Required range: 0 <= x <= 1
  • enhancement (number, default: 1). Required range: 0 <= x <= 2
  • language (enum<string>, default: en). Available options: en, hi
  • output_format (enum<string>, default: pcm). Available options: pcm, mp3, wav, mulaw
  • pronunciation_dicts (string[])
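
Checking a payload against these constraints on the client side can catch mistakes before a request is sent. The sketch below is one possible way to do that; `build_payload`, `RANGES` and `CHOICES` are hypothetical names, and the limits are copied from the parameter list above. The server remains the authority on what it accepts.

```python
from typing import Any

# Numeric limits (inclusive) copied from the Body parameter list.
RANGES = {
    "sample_rate": (8000, 24000),
    "speed": (0.5, 2),
    "consistency": (0, 1),
    "similarity": (0, 1),
    "enhancement": (0, 2),
}

# Allowed values for the enum parameters.
CHOICES = {
    "language": {"en", "hi"},
    "output_format": {"pcm", "mp3", "wav", "mulaw"},
}


def build_payload(text: str, voice_id: str, **options: Any) -> dict:
    """Return a request body, raising ValueError on out-of-range options."""
    if not text or not voice_id:
        raise ValueError("text and voice_id are required")
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        elif name in CHOICES:
            if value not in CHOICES[name]:
                allowed = sorted(CHOICES[name])
                raise ValueError(f"{name} must be one of {allowed}")
        elif name != "pronunciation_dicts":
            raise ValueError(f"unknown parameter: {name}")
    # Omitted options are left out so the server applies its own defaults.
    return {"text": text, "voice_id": voice_id, **options}
```

For example, `build_payload("Hello", "<voice_id>", speed=1.5, output_format="mp3")` returns a dictionary ready to be serialized as JSON, while `sample_rate=48000` raises `ValueError`.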

Response