

Transcribe audio to text using whisper.cpp. You can run the whisper.cpp server locally or remotely.


  1. Install whisper.cpp following the instructions in the whisper.cpp repository.
  2. Start the whisper.cpp server: ./server
  3. (Optional) Download a larger model and start the server with the --model parameter.
  4. (Optional) Enable input conversion on the server with the --convert parameter.
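
The optional flags from steps 3 and 4 can be combined when the server is launched from a script. The following is a minimal sketch, assuming the binary is started as ./server as in step 2. The `ServerOptions` type and the `buildServerArgs` helper are hypothetical names introduced for illustration, and the model path in the commented usage is only an example.

```typescript
// Hypothetical helper (not part of whisper.cpp or ModelFusion): builds the
// argument list for the whisper.cpp server from the two optional settings
// described in the steps above.
interface ServerOptions {
  modelPath?: string; // passed to the server via --model (step 3)
  convert?: boolean; // adds --convert when true (step 4)
}

function buildServerArgs(options: ServerOptions = {}): string[] {
  const args: string[] = [];
  if (options.modelPath !== undefined) {
    args.push("--model", options.modelPath);
  }
  if (options.convert) {
    args.push("--convert");
  }
  return args;
}

// Example launch from Node.js (assumed path and model file; uncomment to run):
// import { spawn } from "node:child_process";
// spawn(
//   "./server",
//   buildServerArgs({ modelPath: "models/ggml-medium.bin", convert: true }),
//   { stdio: "inherit" }
// );
```

Keeping the argument construction in a pure function makes it easy to check the resulting command line before starting the process.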

Without the --convert parameter, the server expects WAV files with a 16 kHz sample rate and 16-bit PCM encoding. You can use ffmpeg for conversion:

    ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
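
Before sending a file to a server that runs without --convert, it can help to check that the audio really has the expected format. The sketch below reads the "fmt " chunk of a WAV file and tests for a 16 kHz sample rate and 16-bit PCM. The names `WavFormat`, `readWavFormat`, and `isServerReadyWav` are hypothetical helpers, not part of ModelFusion or whisper.cpp, and the check assumes a standard little-endian RIFF/WAVE layout.

```typescript
// Fields read from the "fmt " chunk of a WAV file.
interface WavFormat {
  audioFormat: number; // 1 = uncompressed PCM
  channels: number;
  sampleRate: number; // in Hz
  bitsPerSample: number;
}

// Reads a four-character chunk tag (e.g. "RIFF") at the given byte offset.
function readTag(view: DataView, offset: number): string {
  let tag = "";
  for (let i = 0; i < 4; i++) {
    tag += String.fromCharCode(view.getUint8(offset + i));
  }
  return tag;
}

// Walks the RIFF chunks and returns the format fields, or null if the data
// is not a WAV file or has no complete "fmt " chunk.
function readWavFormat(data: Uint8Array): WavFormat | null {
  if (data.byteLength < 12) return null;
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  if (readTag(view, 0) !== "RIFF" || readTag(view, 8) !== "WAVE") return null;

  let offset = 12; // first chunk starts after the 12-byte RIFF header
  while (offset + 8 <= view.byteLength) {
    const tag = readTag(view, offset);
    const size = view.getUint32(offset + 4, true); // little-endian chunk size
    if (tag === "fmt ") {
      if (size < 16 || offset + 8 + 16 > view.byteLength) return null;
      return {
        audioFormat: view.getUint16(offset + 8, true),
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  return null;
}

// True if the data is 16 kHz, 16-bit PCM WAV, as described above.
function isServerReadyWav(data: Uint8Array): boolean {
  const format = readWavFormat(data);
  return (
    format !== null &&
    format.audioFormat === 1 &&
    format.sampleRate === 16000 &&
    format.bitsPerSample === 16
  );
}
```

In Node.js, the Buffer returned by fs.promises.readFile is a Uint8Array, so it can be passed to `isServerReadyWav` directly. The sketch does not check the channel count; the ffmpeg command above produces mono output with -ac 1.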

Model Functions


Generate Transcription

WhisperCppTranscriptionModel API

import fs from "node:fs";
import { whispercpp, generateTranscription } from "modelfusion";

// Send a 16 kHz, 16-bit PCM WAV file to the whisper.cpp server.
const transcription = await generateTranscription({
  model: whispercpp.Transcriber(),
  mimeType: "audio/wav",
  audioData: await fs.promises.readFile("data/test.wav"),
});


API Configuration

Whisper.cpp API Configuration

const api = whispercpp.Api({
  baseUrl: {
    host: "localhost",
    port: "9000",
  },
  // ...
});

const model = whispercpp.Transcriber({
  api, // use the custom API configuration defined above
  // ...
});
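
Because the server can run locally or remotely, the host and port are often taken from the environment instead of being hard-coded. The following is a minimal sketch under stated assumptions: `resolveBaseUrl` is a hypothetical helper, WHISPERCPP_HOST and WHISPERCPP_PORT are assumed variable names that ModelFusion does not read on its own, and the fallback values simply mirror the example above rather than any library default.

```typescript
// Shape of the baseUrl fields used in the configuration example above.
interface BaseUrlParts {
  host: string;
  port: string;
}

// Hypothetical helper: reads host and port from an environment-like object
// and falls back to the values from the example (localhost, 9000).
function resolveBaseUrl(env: Record<string, string | undefined>): BaseUrlParts {
  const host = env["WHISPERCPP_HOST"]?.trim() || "localhost";
  const port = env["WHISPERCPP_PORT"]?.trim() || "9000";

  // Reject anything that is not a valid TCP port number.
  if (!/^\d+$/.test(port) || Number(port) < 1 || Number(port) > 65535) {
    throw new Error(`Invalid port: ${port}`);
  }
  return { host, port };
}

// Usage with the configuration shown above:
// const api = whispercpp.Api({ baseUrl: resolveBaseUrl(process.env) });
// const model = whispercpp.Transcriber({ api });
```

Validating the port up front surfaces a misconfigured environment as a clear error instead of a failed request later on.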