Class: TikTokenTokenizer
TikToken tokenizer for OpenAI language models.
See https://github.com/openai/tiktoken
Example
import { TikTokenTokenizer, countTokens } from "modelfusion";

const tokenizer = new TikTokenTokenizer({ model: "gpt-4" });
const text = "At first, Nox didn't know what to do with the pup.";
const tokenCount = await countTokens(tokenizer, text);
const tokens = await tokenizer.tokenize(text);
const tokensAndTokenTexts = await tokenizer.tokenizeWithTexts(text);
const reconstructedText = await tokenizer.detokenize(tokens);
Implements
FullTokenizer
Constructors
constructor
• new TikTokenTokenizer(settings): TikTokenTokenizer
Get a TikToken tokenizer for a specific model or encoding.
Parameters
Name | Type |
---|---|
settings | TikTokenTokenizerSettings |
Returns
TikTokenTokenizer
Defined in
packages/modelfusion/src/model-provider/openai/TikTokenTokenizer.ts:35
Methods
detokenize
▸ detokenize(tokens): Promise<string>
Asynchronously convert a sequence of numeric tokens back into text. Detokenization makes token sequences human-readable again and is needed whenever tokenization has to be reversible.
Parameters
Name | Type | Description |
---|---|---|
tokens | number[] | An array of numeric tokens to be converted back to text. |
Returns
Promise<string>
A promise that resolves to the text corresponding to the input tokens.
Implementation of
FullTokenizer.detokenize
Defined in
packages/modelfusion/src/model-provider/openai/TikTokenTokenizer.ts:54
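One common use of detokenize is cutting text down to a token budget: tokenize, keep the first N tokens, and convert them back to text. The following is a minimal sketch under stated assumptions, not part of the library. TokenCodec, truncateToTokenLimit, and charCodec are hypothetical names, and charCodec is a toy stand-in (one token per UTF-16 code unit) so that the block runs without TikToken. Given the method signatures documented on this page, a TikTokenTokenizer instance should satisfy the same structural type.

```typescript
// Hypothetical structural type: the two documented methods this sketch relies on.
type TokenCodec = {
  tokenize(text: string): Promise<number[]>;
  detokenize(tokens: number[]): Promise<string>;
};

// Keeps at most `maxTokens` tokens of `text` and converts them back to a string.
// Text that already fits is returned unchanged.
async function truncateToTokenLimit(
  codec: TokenCodec,
  text: string,
  maxTokens: number
): Promise<string> {
  const tokens = await codec.tokenize(text);
  if (tokens.length <= maxTokens) return text;
  return codec.detokenize(tokens.slice(0, maxTokens));
}

// Toy stand-in for demonstration only (NOT TikToken): one token per UTF-16 code unit.
const charCodec: TokenCodec = {
  tokenize: async (text) => text.split("").map((ch) => ch.charCodeAt(0)),
  detokenize: async (tokens) =>
    tokens.map((t) => String.fromCharCode(t)).join(""),
};
```

With a byte-level encoding such as those used by TikToken, cutting at an arbitrary token boundary may split a multi-byte character, so the truncated text is worth checking before it is shown to users.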
tokenize
▸ tokenize(text): Promise<number[]>
Asynchronously tokenize the given text into a sequence of numeric tokens.
Parameters
Name | Type | Description |
---|---|---|
text | string | The text to tokenize. |
Returns
Promise<number[]>
A promise that resolves to an array of numbers, where each number is a token representing part or all of the input text.
Implementation of
FullTokenizer.tokenize
Defined in
packages/modelfusion/src/model-provider/openai/TikTokenTokenizer.ts:41
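Because tokenize resolves to a plain array of numbers, the length of that array is the token count, which can be compared against a limit before sending a prompt. The sketch below illustrates this. BasicTokenizerLike, fitsWithinLimit, and wordTokenizer are hypothetical names introduced here, and wordTokenizer is a toy stand-in (one made-up token per whitespace-separated word) so the block runs on its own. It is an assumption of this sketch that a TikTokenTokenizer can be passed in its place.

```typescript
// Hypothetical structural type: only the tokenize method documented above.
type BasicTokenizerLike = {
  tokenize(text: string): Promise<number[]>;
};

// Resolves to true when `text` needs no more than `maxTokens` tokens.
async function fitsWithinLimit(
  tokenizer: BasicTokenizerLike,
  text: string,
  maxTokens: number
): Promise<boolean> {
  const tokens = await tokenizer.tokenize(text);
  return tokens.length <= maxTokens;
}

// Toy stand-in (NOT TikToken): one token per whitespace-separated word,
// using the word's length as a made-up token id.
const wordTokenizer: BasicTokenizerLike = {
  tokenize: async (text) =>
    text
      .split(/\s+/)
      .filter((word) => word.length > 0)
      .map((word) => word.length),
};
```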
tokenizeWithTexts
▸ tokenizeWithTexts(text): Promise<{ tokenTexts: string[]; tokens: number[] }>
Asynchronously tokenize the given text, providing both the numeric tokens and their corresponding text.
Parameters
Name | Type | Description |
---|---|---|
text | string | The text to tokenize. |
Returns
Promise<{ tokenTexts: string[]; tokens: number[] }>
A promise containing an object with two arrays:
- tokens: An array of numbers where each number is a token.
- tokenTexts: An array of strings where each string represents the original text corresponding to each token.
Implementation of
FullTokenizer.tokenizeWithTexts
Defined in
packages/modelfusion/src/model-provider/openai/TikTokenTokenizer.ts:45
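The two arrays returned by tokenizeWithTexts are described as corresponding element by element, so they can be zipped into pairs for inspection or debugging. This is a minimal sketch: pairTokens and TokensWithTexts are hypothetical names, the equal-length check reflects an assumption drawn from the description above, and the token ids in the usage lines are made up for illustration rather than real TikToken ids.

```typescript
// Shape of the object that tokenizeWithTexts resolves to, per the signature above.
type TokensWithTexts = { tokens: number[]; tokenTexts: string[] };

// Hypothetical helper: zips the two parallel arrays into one array of pairs.
function pairTokens(
  result: TokensWithTexts
): Array<{ token: number; text: string }> {
  // Assumption: both arrays have the same length, one text per token.
  if (result.tokens.length !== result.tokenTexts.length) {
    throw new Error("tokens and tokenTexts must have the same length");
  }
  return result.tokens.map((token, i) => ({
    token,
    text: result.tokenTexts[i],
  }));
}

// Made-up ids for illustration only; these are not real TikToken ids.
const pairs = pairTokens({
  tokens: [1, 2, 3],
  tokenTexts: ["At", " first", ","],
});
console.log(pairs);
```

In real use, the argument would be the awaited result of tokenizer.tokenizeWithTexts(text).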