
Generate Speech

Synthesize speech (audio) from text. Also called TTS (text-to-speech).

Usage

generateSpeech

generateSpeech API

In standard mode, you pass a text string and a SpeechGenerationModel instance to the generateSpeech function, and it returns a Uint8Array containing MPEG audio data.

import { generateSpeech, lmnt } from "modelfusion";

// Generate the complete audio for the text in a single call.
const speech = await generateSpeech({
  model: lmnt.SpeechGenerator({
    voice: "034b632b-df71-46c8-b440-86a42ffc3cf3", // Henry
  }),
  text:
    "Good evening, ladies and gentlemen! Exciting news on the airwaves tonight " +
    "as The Rolling Stones unveil 'Hackney Diamonds,' their first collection of " +
    "fresh tunes in nearly twenty years, featuring the illustrious Lady Gaga, the " +
    "magical Stevie Wonder, and the final beats from the late Charlie Watts.",
});

// speech is a Uint8Array with MPEG audio data
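Because the result is a plain Uint8Array, in Node.js you can write it straight to a file. The sketch below is not part of ModelFusion: saveMp3 and looksLikeMp3 are hypothetical helper names, and the header check is only a heuristic. It accepts data that starts with an ID3v2 tag ("ID3") or with an MPEG frame-sync pattern (11 set bits), and it does not fully validate the file.

```typescript
import { writeFile } from "node:fs/promises";

// Hypothetical helper (not a ModelFusion API): heuristic MP3 check.
// Many MP3 files begin with an ID3v2 tag ("ID3"); otherwise the first
// MPEG audio frame begins with an 11-bit frame sync (all bits set).
function looksLikeMp3(bytes: Uint8Array): boolean {
  if (bytes.length < 3) return false;
  const hasId3Tag =
    bytes[0] === 0x49 && bytes[1] === 0x44 && bytes[2] === 0x33; // "ID3"
  const hasFrameSync = bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0;
  return hasId3Tag || hasFrameSync;
}

// Hypothetical helper: write the audio returned by generateSpeech to disk,
// refusing data that does not look like MP3.
async function saveMp3(audio: Uint8Array, path: string): Promise<void> {
  if (!looksLikeMp3(audio)) {
    throw new Error("audio does not look like MP3 data");
  }
  await writeFile(path, audio);
}
```

With the speech value from the example above, you would call await saveMp3(speech, "speech.mp3"). Since writeFile accepts a Uint8Array directly, no conversion to a Buffer is needed.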

streamSpeech

streamSpeech API

In duplex streaming mode, you pass an AsyncIterable<string> and a StreamingSpeechGenerationModel instance to the streamSpeech function, and it returns an AsyncIterable<Uint8Array> of audio chunks. You can also pass a plain string instead of a text stream and still receive the audio as a stream.

import { streamSpeech, elevenlabs } from "modelfusion";

// Any AsyncIterable<string> works as input; declared here for brevity.
declare const textStream: AsyncIterable<string>;

const speechStream = await streamSpeech({
  model: elevenlabs.SpeechGenerator({
    model: "eleven_turbo_v2",
    voice: "pNInz6obpgDQGcFmaJgB", // Adam
    optimizeStreamingLatency: 1,
    voiceSettings: { stability: 1, similarityBoost: 0.35 },
    generationConfig: {
      chunkLengthSchedule: [50, 90, 120, 150, 200],
    },
  }),
  text: textStream,
});

for await (const part of speechStream) {
  // each part is a Uint8Array with MP3 audio data
}
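The example above leaves two things open: where the text stream comes from and what to do with the audio parts. The sketch below fills both gaps with hypothetical helpers that are not part of ModelFusion. fakeTextStream is an async generator standing in for any chunked text source, and collectAudio concatenates the streamed Uint8Array parts into one buffer. Joining the parts byte by byte assumes, as the comment in the loop above states, that together they form a single MP3 byte stream.

```typescript
// Hypothetical stand-in for a text source that delivers text in chunks.
// An async generator is one simple way to produce an AsyncIterable<string>.
async function* fakeTextStream(
  chunks: string[],
  delayMs = 0,
): AsyncIterable<string> {
  for (const chunk of chunks) {
    if (delayMs > 0) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
    yield chunk;
  }
}

// Hypothetical helper: gather all audio parts into one contiguous Uint8Array,
// e.g. to save the complete file once streaming has finished.
async function collectAudio(
  parts: AsyncIterable<Uint8Array>,
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for await (const part of parts) {
    chunks.push(part);
    total += part.length;
  }
  const result = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset); // copy each part after the previous one
    offset += chunk.length;
  }
  return result;
}
```

For example, you could pass text: fakeTextStream(["Hello, ", "world!"]) to streamSpeech and then call const audio = await collectAudio(speechStream) in place of the for await loop. Collecting everything gives up the low-latency benefit of streaming, so it suits saving the audio rather than live playback.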

Available Providers