Generate Speech
Synthesize speech (audio) from text. Also called TTS (text-to-speech).
Usage
generateSpeech
In standard mode, a text string is passed to the generateSpeech function, along with a SpeechGenerationModel instance, and a Uint8Array containing MPEG audio data is returned.
import { generateSpeech, lmnt } from "modelfusion";

const speech = await generateSpeech({
  model: lmnt.SpeechGenerator({
    voice: "034b632b-df71-46c8-b440-86a42ffc3cf3", // Henry
  }),
  text:
    "Good evening, ladies and gentlemen! Exciting news on the airwaves tonight " +
    "as The Rolling Stones unveil 'Hackney Diamonds,' their first collection of " +
    "fresh tunes in nearly twenty years, featuring the illustrious Lady Gaga, the " +
    "magical Stevie Wonder, and the final beats from the late Charlie Watts.",
});
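In Node.js, the Uint8Array returned by generateSpeech can be written directly to an .mp3 file. The sketch below assumes a Node.js runtime; saveSpeech is a hypothetical helper name and is not part of ModelFusion.

```typescript
import { writeFile } from "node:fs/promises";

// Hypothetical helper (not part of ModelFusion): persists the audio bytes
// returned by generateSpeech to disk, e.g. as "speech.mp3".
async function saveSpeech(path: string, speech: Uint8Array): Promise<void> {
  // writeFile accepts a Uint8Array as-is, so no Buffer conversion is needed.
  await writeFile(path, speech);
}

// Usage with the example above:
// await saveSpeech("speech.mp3", speech);
```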
streamSpeech
In duplex streaming mode, an AsyncIterable<string> is passed to the streamSpeech function, along with a StreamingSpeechGenerationModel instance, and an AsyncIterable<Uint8Array> is returned. You can also pass in a string and still get streaming audio back.
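If your text is already available as separate chunks (for example in a test) rather than arriving from a live source, an async generator can turn them into the AsyncIterable<string> that serves as textStream below. This is a minimal sketch; textChunks is a hypothetical helper, not a ModelFusion function.

```typescript
// Hypothetical helper (not part of ModelFusion): yields each text chunk in
// order, producing an AsyncIterable<string> usable as streamSpeech input.
async function* textChunks(chunks: string[]): AsyncIterable<string> {
  for (const chunk of chunks) {
    yield chunk;
  }
}

// Usage (assumption: passed to streamSpeech as the `text` argument):
// const textStream = textChunks(["Good evening, ", "ladies and gentlemen!"]);
```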
import { streamSpeech, elevenlabs } from "modelfusion";

// Your incoming text, e.g. chunks streamed from an LLM:
declare const textStream: AsyncIterable<string>;

const speechStream = await streamSpeech({
  model: elevenlabs.SpeechGenerator({
    model: "eleven_turbo_v2",
    voice: "pNInz6obpgDQGcFmaJgB", // Adam
    optimizeStreamingLatency: 1,
    voiceSettings: { stability: 1, similarityBoost: 0.35 },
    generationConfig: {
      chunkLengthSchedule: [50, 90, 120, 150, 200],
    },
  }),
  text: textStream,
});

for await (const part of speechStream) {
  // each part is a Uint8Array with MP3 audio data
}
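When you need the complete clip after streaming finishes (for example, to save it), the parts can be joined into one Uint8Array. They are consecutive pieces of a single MP3 stream, so concatenating them in order reconstructs that stream. The sketch below is an assumption-based illustration; collectAudio is a hypothetical helper, not part of ModelFusion.

```typescript
// Hypothetical helper (not part of ModelFusion): drains an audio stream,
// such as the one returned by streamSpeech, and joins the chunks into one
// contiguous Uint8Array.
async function collectAudio(
  stream: AsyncIterable<Uint8Array>
): Promise<Uint8Array> {
  const parts: Uint8Array[] = [];
  let totalLength = 0;
  for await (const part of stream) {
    parts.push(part);
    totalLength += part.length;
  }

  // Copy every chunk into a single buffer at its running offset.
  const audio = new Uint8Array(totalLength);
  let offset = 0;
  for (const part of parts) {
    audio.set(part, offset);
    offset += part.length;
  }
  return audio;
}

// Usage: const audio = await collectAudio(speechStream);
```

Collecting the whole stream gives up the low-latency benefit of streaming, so for live playback you would handle each part inside the for await loop instead.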
Available Providers
- ElevenLabs - Standard mode and duplex streaming mode
- LMNT - Standard mode
- OpenAI - Standard mode