Generate Speech

Synthesize speech (audio) from text. Also called TTS (text-to-speech).

Usage

generateSpeech

In standard mode, you pass a text string and a SpeechGenerationModel instance to the generateSpeech function. It returns a Uint8Array containing MPEG audio data.

import { generateSpeech, lmnt } from "modelfusion";

const speech = await generateSpeech({
  model: lmnt.SpeechGenerator({
    voice: "034b632b-df71-46c8-b440-86a42ffc3cf3", // Henry
  }),
  text:
    "Good evening, ladies and gentlemen! Exciting news on the airwaves tonight " +
    "as The Rolling Stones unveil 'Hackney Diamonds,' their first collection of " +
    "fresh tunes in nearly twenty years, featuring the illustrious Lady Gaga, the " +
    "magical Stevie Wonder, and the final beats from the late Charlie Watts.",
});
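
Since the result is a plain Uint8Array, it can be written straight to disk with Node's built-in fs module. The following is a minimal sketch, not part of the library: `saveSpeech` is a hypothetical helper, the output file name is an arbitrary choice, and the three placeholder bytes stand in for the `speech` value returned by generateSpeech.

```typescript
import { statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper (not a ModelFusion API): writes the audio bytes to a file
// and returns the path that was written.
function saveSpeech(speech: Uint8Array, filePath: string): string {
  writeFileSync(filePath, speech);
  return filePath;
}

// Placeholder bytes standing in for the Uint8Array returned by generateSpeech.
const placeholderSpeech = new Uint8Array([0x49, 0x44, 0x33]);

// The file name and location are assumptions made for this example.
const outputPath = saveSpeech(
  placeholderSpeech,
  join(tmpdir(), "speech-example.mp3")
);

console.log(`Wrote ${statSync(outputPath).size} bytes to ${outputPath}`);
```

In real use, you would pass the `speech` value from the example above in place of `placeholderSpeech`.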

streamSpeech

streamSpeech API

In duplex streaming mode, you pass an AsyncIterable<string> and a StreamingSpeechGenerationModel to the streamSpeech function. It returns an AsyncIterable<Uint8Array> that yields audio chunks as they are generated. You can also pass in a plain string and get streaming audio back.

import { streamSpeech, elevenlabs } from "modelfusion";

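// Placeholder: the text stream is assumed to be provided elsewhere,
// for example by a streaming text generation call.
declare const textStream: AsyncIterable<string>;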

const speechStream = await streamSpeech({
  model: elevenlabs.SpeechGenerator({
    model: "eleven_turbo_v2",
    voice: "pNInz6obpgDQGcFmaJgB", // Adam
    optimizeStreamingLatency: 1,
    voiceSettings: { stability: 1, similarityBoost: 0.35 },
    generationConfig: {
      chunkLengthSchedule: [50, 90, 120, 150, 200],
    },
  }),
  text: textStream,
});

for await (const part of speechStream) {
  // each part is a Uint8Array with MP3 audio data
}
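
If the complete audio is needed in one piece (for example, to save it once the stream ends), the parts can be gathered into a single Uint8Array. The following is a rough sketch under stated assumptions: `collectAudio` is a hypothetical helper rather than a library function, and `fakeSpeechStream` is a stand-in for the `speechStream` returned by streamSpeech.

```typescript
// Hypothetical helper (not a ModelFusion API): drains an async stream of
// audio chunks and concatenates them into one Uint8Array.
async function collectAudio(
  parts: AsyncIterable<Uint8Array>
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let totalLength = 0;

  for await (const part of parts) {
    chunks.push(part);
    totalLength += part.length;
  }

  // Copy each chunk into its position in the combined buffer.
  const audio = new Uint8Array(totalLength);
  let offset = 0;
  for (const chunk of chunks) {
    audio.set(chunk, offset);
    offset += chunk.length;
  }
  return audio;
}

// Stand-in for a speech stream; the byte values are arbitrary.
async function* fakeSpeechStream(): AsyncGenerator<Uint8Array> {
  yield new Uint8Array([1, 2]);
  yield new Uint8Array([3, 4, 5]);
}

collectAudio(fakeSpeechStream()).then((audio) => {
  console.log(`Collected ${audio.length} bytes`);
});
```

Collecting everything gives up the latency benefit of streaming, so this fits cases where the audio is stored rather than played back as it arrives.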

Available Providers

ElevenLabs - Standard mode and duplex streaming mode
LMNT - Standard mode
OpenAI - Standard mode
