Transcribe audio to text using whisper.cpp. You can run the whisper.cpp server locally or on a remote machine.


  1. Install whisper.cpp by following the instructions in the whisper.cpp repository.
  2. Start the whisper.cpp server: ./server
  3. (Optional) Download a larger model and start the server with the --model parameter.
  4. (Optional) Enable input conversion on the server with the --convert parameter.
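If you want to start the server from a Node.js script rather than a shell, the steps above can be sketched roughly as follows. This is only an illustration: the `ServerOptions` type and the `buildServerArgs` and `startServer` helpers are hypothetical names, and the sketch assumes the `./server` binary sits in the current working directory. Only the `--model` and `--convert` parameters come from the steps above.

```typescript
import { spawn, type ChildProcess } from "node:child_process";

// Hypothetical options type for this sketch.
interface ServerOptions {
  modelPath?: string; // passed to --model when set
  convert?: boolean; // adds --convert when true
}

// Build the argument list for the whisper.cpp server binary.
function buildServerArgs(options: ServerOptions = {}): string[] {
  const args: string[] = [];
  if (options.modelPath !== undefined) {
    args.push("--model", options.modelPath);
  }
  if (options.convert === true) {
    args.push("--convert");
  }
  return args;
}

// Start ./server (assumed to be in the current directory) and
// forward its output to this process's stdout and stderr.
function startServer(options: ServerOptions = {}): ChildProcess {
  return spawn("./server", buildServerArgs(options), { stdio: "inherit" });
}
```

Calling `startServer({ convert: true })` would then correspond to running `./server --convert`. The model path is whatever file you downloaded in step 3.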

Without the --convert parameter, the server expects WAV files with a 16 kHz sample rate and 16-bit PCM encoding. You can convert other formats with ffmpeg: ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
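The conversion can also be driven from Node.js. The following is a minimal sketch, not part of ModelFusion: `buildFfmpegArgs`, `convertToWav`, and `isWhisperReadyWav` are hypothetical helper names, and ffmpeg is assumed to be available on the PATH. The header check assumes the canonical 44-byte WAV layout with the `fmt ` chunk directly after the RIFF header, so files with extra chunks before `fmt ` would be reported as not ready even if their audio format is fine.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Same arguments as the ffmpeg command above:
// 16 kHz sample rate, one channel, signed 16-bit little-endian PCM.
function buildFfmpegArgs(inputPath: string, outputPath: string): string[] {
  return ["-i", inputPath, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", outputPath];
}

// Run ffmpeg (assumed to be on the PATH) and resolve once it exits successfully.
async function convertToWav(inputPath: string, outputPath: string): Promise<void> {
  await execFileAsync("ffmpeg", buildFfmpegArgs(inputPath, outputPath));
}

// Check a canonical 44-byte WAV header for PCM, 16 kHz, 16 bits per sample.
function isWhisperReadyWav(data: Uint8Array): boolean {
  if (data.length < 44) return false;
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  // Read a four-character chunk tag at the given byte offset.
  const tag = (offset: number): string =>
    String.fromCharCode(data[offset], data[offset + 1], data[offset + 2], data[offset + 3]);
  return (
    tag(0) === "RIFF" &&
    tag(8) === "WAVE" &&
    tag(12) === "fmt " &&
    view.getUint16(20, true) === 1 && // audio format 1 = PCM
    view.getUint32(24, true) === 16000 && // sample rate in Hz
    view.getUint16(34, true) === 16 // bits per sample
  );
}
```

A caller could read the file, run `isWhisperReadyWav` on the bytes, and only call `convertToWav` when the check fails. Alternatively, starting the server with --convert avoids the client-side step altogether.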

Model Functions


Generate Transcription

WhisperCppTranscriptionModel API

import fs from "node:fs";
import { whispercpp, generateTranscription } from "modelfusion";

// Read a WAV file from disk and send it to the whisper.cpp server.
const transcription = await generateTranscription({
  model: whispercpp.Transcriber(),
  mimeType: "audio/wav",
  audioData: await fs.promises.readFile("data/test.wav"),
});


API Configuration

Whisper.cpp API Configuration

const api = whispercpp.Api({
  baseUrl: {
    host: "localhost",
    port: "9000",
  },
  // ...
});

const model = whispercpp.Transcriber({
  api, // use the API configuration defined above
  // ...
});