
Class: TikTokenTokenizer

TikToken tokenizer for OpenAI language models.

See

https://github.com/openai/tiktoken

Example

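// Assumes both names are exported from the package root.
import { TikTokenTokenizer, countTokens } from "modelfusion";

const tokenizer = new TikTokenTokenizer({ model: "gpt-4" });

const text = "At first, Nox didn't know what to do with the pup.";

// Number of tokens in the text.
const tokenCount = await countTokens(tokenizer, text);
// Numeric tokens only.
const tokens = await tokenizer.tokenize(text);
// Numeric tokens plus the text behind each token.
const tokensAndTokenTexts = await tokenizer.tokenizeWithTexts(text);
// Text rebuilt from the numeric tokens.
const reconstructedText = await tokenizer.detokenize(tokens);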

Implements

FullTokenizer

Constructors

constructor

new TikTokenTokenizer(settings): TikTokenTokenizer

Get a TikToken tokenizer for a specific model or encoding.

Parameters

Name      Type
settings  TikTokenTokenizerSettings

Returns

TikTokenTokenizer

Defined in

packages/modelfusion/src/model-provider/openai/TikTokenTokenizer.ts:35

Methods

detokenize

detokenize(tokens): Promise<string>

Asynchronously convert a sequence of numeric tokens back into text. Detokenization reverses tokenization: it is needed whenever token output has to be read as text, or when a tokenize/detokenize round trip is required.

Parameters

Name    Type      Description
tokens  number[]  An array of numeric tokens to be converted back to text.

Returns

Promise<string>

A promise that resolves to the text corresponding to the sequence of input tokens.
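
The round trip that detokenize completes can be sketched without loading tiktoken. In the sketch below, RoundTripTokenizer and CodePointTokenizer are hypothetical stand-ins, not part of the library: they only mirror the tokenize and detokenize signatures documented on this page, and they map each Unicode code point to one token. A real TikTokenTokenizer would return BPE token ids instead.

```typescript
// Structural type with the two method shapes documented on this page.
interface RoundTripTokenizer {
  tokenize(text: string): Promise<number[]>;
  detokenize(tokens: number[]): Promise<string>;
}

// Hypothetical stand-in, NOT tiktoken: one token per Unicode code point.
class CodePointTokenizer implements RoundTripTokenizer {
  async tokenize(text: string): Promise<number[]> {
    return Array.from(text, (ch) => ch.codePointAt(0) as number);
  }

  async detokenize(tokens: number[]): Promise<string> {
    return String.fromCodePoint(...tokens);
  }
}

// Tokenize, detokenize, and report whether the text survived unchanged.
async function roundTrips(
  tokenizer: RoundTripTokenizer,
  text: string
): Promise<boolean> {
  const tokens = await tokenizer.tokenize(text);
  const restored = await tokenizer.detokenize(tokens);
  return restored === text;
}
```

Because roundTrips only depends on the two method signatures, any object with the same shape could be passed in place of the toy class.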

Implementation of

FullTokenizer.detokenize

Defined in

packages/modelfusion/src/model-provider/openai/TikTokenTokenizer.ts:54


tokenize

tokenize(text): Promise<number[]>

Asynchronously tokenize the given text into a sequence of numeric tokens.

Parameters

Name  Type    Description
text  string  Input text string that needs to be tokenized.

Returns

Promise<number[]>

A promise that resolves to an array of numbers, where each number is a token representing part or all of the input text.
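
A common use of the token array is enforcing a token budget. The sketch below is an illustration under assumptions, not library code: TokenizerLike, truncateToTokens, and toyTokenizer are hypothetical names, and the toy tokenizer emits one token per UTF-16 code unit rather than real BPE tokens.

```typescript
// Structural type covering the two methods this sketch needs.
interface TokenizerLike {
  tokenize(text: string): Promise<number[]>;
  detokenize(tokens: number[]): Promise<string>;
}

// Hypothetical helper: keep at most `maxTokens` tokens of `text`.
async function truncateToTokens(
  tokenizer: TokenizerLike,
  text: string,
  maxTokens: number
): Promise<string> {
  const tokens = await tokenizer.tokenize(text);
  if (tokens.length <= maxTokens) return text; // already within budget
  return tokenizer.detokenize(tokens.slice(0, maxTokens));
}

// Toy tokenizer for demonstration only: one token per UTF-16 code unit.
const toyTokenizer: TokenizerLike = {
  tokenize: async (text) =>
    Array.from({ length: text.length }, (_, i) => text.charCodeAt(i)),
  detokenize: async (tokens) => String.fromCharCode(...tokens),
};
```

With a byte-level BPE vocabulary, a cut may land inside a multi-byte character, so the truncated text is worth checking before it is shown to users.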

Implementation of

FullTokenizer.tokenize

Defined in

packages/modelfusion/src/model-provider/openai/TikTokenTokenizer.ts:41


tokenizeWithTexts

tokenizeWithTexts(text): Promise<{ tokenTexts: string[] ; tokens: number[] }>

Asynchronously tokenize the given text, providing both the numeric tokens and their corresponding text.

Parameters

Name  Type    Description
text  string  Input text string to be tokenized.

Returns

Promise<{ tokenTexts: string[] ; tokens: number[] }>

A promise that resolves to an object with two arrays:

  1. tokens - An array of numbers where each number is a token.
  2. tokenTexts - An array of strings where each string represents the original text corresponding to each token.
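
Since each entry in tokenTexts corresponds to a token, the two arrays can be walked together by index. The following is a small sketch of that pairing: TokensWithTexts and pairTokens are hypothetical names, and the token ids in sample are made up for illustration (real ids depend on the model's encoding).

```typescript
// Shape of the resolved value described above.
type TokensWithTexts = { tokens: number[]; tokenTexts: string[] };

// Pair each numeric token with its text, relying on matching indices.
function pairTokens(result: TokensWithTexts): Array<[number, string]> {
  return result.tokens.map(
    (token, i): [number, string] => [token, result.tokenTexts[i]]
  );
}

// Made-up ids for illustration only.
const sample: TokensWithTexts = {
  tokens: [101, 102, 103],
  tokenTexts: ["At", " first", ","],
};

// Print one "id <tab> text" line per token; JSON.stringify keeps
// leading spaces in the token text visible.
for (const [token, tokenText] of pairTokens(sample)) {
  console.log(`${token}\t${JSON.stringify(tokenText)}`);
}
```

A view like this is mainly useful for debugging, for example to see where a word is split into several tokens.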

Implementation of

FullTokenizer.tokenizeWithTexts

Defined in

packages/modelfusion/src/model-provider/openai/TikTokenTokenizer.ts:45