Interface: FullTokenizer

Interface for a tokenizer that extends the basic tokenization capabilities of BasicTokenizer.

In addition to tokenize, this interface provides detokenize, which converts tokens back into text, and tokenizeWithTexts, which returns the text corresponding to each token. Together they make tokenization reversible and easier to inspect.
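As a rough illustration of the shape described above, the following sketch restates the documented signatures locally and implements them with a toy tokenizer. The interfaces are copied from the signatures on this page rather than imported from the library, and `encodeCodePoints`, `decodeCodePoints`, and `codePointTokenizer` are hypothetical names invented for this example. A real tokenizer would use a model-specific vocabulary instead of Unicode code points.

```typescript
// Local restatement of the signatures documented on this page
// (an assumption for illustration, not an import from the library).
interface BasicTokenizer {
  tokenize: (text: string) => PromiseLike<number[]>;
}

interface FullTokenizer extends BasicTokenizer {
  tokenizeWithTexts: (
    text: string
  ) => PromiseLike<{ tokens: number[]; tokenTexts: string[] }>;
  detokenize: (tokens: number[]) => PromiseLike<string>;
}

// Hypothetical toy encoding: one token per Unicode code point.
function encodeCodePoints(text: string): number[] {
  return Array.from(text, (ch) => ch.codePointAt(0)!);
}

// Inverse of encodeCodePoints.
function decodeCodePoints(tokens: number[]): string {
  return String.fromCodePoint(...tokens);
}

// Toy object that satisfies the FullTokenizer shape.
const codePointTokenizer: FullTokenizer = {
  tokenize: async (text) => encodeCodePoints(text),
  tokenizeWithTexts: async (text) => ({
    tokens: encodeCodePoints(text),
    tokenTexts: Array.from(text), // one string per code point
  }),
  detokenize: async (tokens) => decodeCodePoints(tokens),
};
```

With this toy encoding, `codePointTokenizer.tokenize("ab")` resolves to `[97, 98]`, and passing that array to `detokenize` resolves to `"ab"` again.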

Hierarchy

BasicTokenizer

  ↳ FullTokenizer

Properties

detokenize

detokenize: (tokens: number[]) => PromiseLike<string>

Asynchronously convert a sequence of numeric tokens back into text. Detokenization turns tokens into a human-readable string. It is needed when output must be interpretable or when tokenization has to be reversible.

Type declaration

▸ (tokens): PromiseLike<string>

Parameters

| Name | Type | Description |
| :--- | :--- | :---------- |
| tokens | number[] | An array of numeric tokens to be converted back to text. |

Returns

PromiseLike<string>

A promise that resolves to the text corresponding to the sequence of input tokens.

Defined in

packages/modelfusion/src/model-function/tokenize-text/Tokenizer.ts:44
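
One common reason to need detokenize is cutting a text down to a token budget: tokenize, keep the first N tokens, then convert them back to text. The sketch below shows that pattern under stated assumptions. `TokenCodec`, `truncateToTokenLimit`, and `toyCodec` are hypothetical names for this example, and the toy tokenizer (one token per Unicode code point) only stands in for a real implementation.

```typescript
// Minimal structural type covering only the two methods used here.
type TokenCodec = {
  tokenize: (text: string) => PromiseLike<number[]>;
  detokenize: (tokens: number[]) => PromiseLike<string>;
};

// Hypothetical helper: keep at most maxTokens tokens, then turn them back into text.
async function truncateToTokenLimit(
  codec: TokenCodec,
  text: string,
  maxTokens: number
): Promise<string> {
  const tokens = await codec.tokenize(text);
  if (tokens.length <= maxTokens) {
    return text; // already within the budget, nothing to cut
  }
  return await codec.detokenize(tokens.slice(0, maxTokens));
}

// Toy stand-in tokenizer (an assumption for illustration):
// one token per Unicode code point.
const toyCodec: TokenCodec = {
  tokenize: async (text) => Array.from(text, (ch) => ch.codePointAt(0)!),
  detokenize: async (tokens) => String.fromCodePoint(...tokens),
};
```

With the toy tokenizer, `truncateToTokenLimit(toyCodec, "hello world", 5)` resolves to `"hello"`. With a real subword tokenizer the cut would fall on token boundaries rather than character boundaries.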


tokenize

tokenize: (text: string) => PromiseLike<number[]>

Asynchronously tokenize the given text into a sequence of numeric tokens.

Type declaration

▸ (text): PromiseLike<number[]>

Parameters

| Name | Type | Description |
| :--- | :--- | :---------- |
| text | string | Input text string that needs to be tokenized. |

Returns

PromiseLike<number[]>

A promise that resolves to an array of numbers, where each number is a token representing part or all of the input text.

Inherited from

BasicTokenizer.tokenize

Defined in

packages/modelfusion/src/model-function/tokenize-text/Tokenizer.ts:13
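
A typical use of tokenize on its own is counting tokens, for example to check a prompt against a size limit. The following is a minimal sketch, assuming any object with the documented `tokenize` signature. `TextTokenizer`, `countTokens`, and `createWordTokenizer` are hypothetical names, and the whitespace-based toy tokenizer is only a stand-in for a real one.

```typescript
// Minimal structural type: only the tokenize method documented above.
type TextTokenizer = {
  tokenize: (text: string) => PromiseLike<number[]>;
};

// Hypothetical helper: number of tokens the tokenizer produces for a text.
async function countTokens(
  tokenizer: TextTokenizer,
  text: string
): Promise<number> {
  const tokens = await tokenizer.tokenize(text);
  return tokens.length;
}

// Toy stand-in tokenizer (an assumption for illustration): splits on
// whitespace and assigns each distinct word an id in order of first appearance.
function createWordTokenizer(): TextTokenizer {
  const vocabulary = new Map<string, number>();
  return {
    tokenize: async (text) =>
      text
        .split(/\s+/)
        .filter((word) => word.length > 0)
        .map((word) => {
          let id = vocabulary.get(word);
          if (id === undefined) {
            id = vocabulary.size;
            vocabulary.set(word, id);
          }
          return id;
        }),
  };
}
```

For the toy tokenizer, `"the cat saw the dog"` tokenizes to `[0, 1, 2, 0, 3]`, so `countTokens` resolves to `5`. Real tokenizers usually split text into subword units, so their counts will differ from word counts.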


tokenizeWithTexts

tokenizeWithTexts: (text: string) => PromiseLike<{ tokenTexts: string[] ; tokens: number[] }>

Asynchronously tokenize the given text, providing both the numeric tokens and their corresponding text.

Type declaration

▸ (text): PromiseLike<{ tokenTexts: string[] ; tokens: number[] }>

Parameters

| Name | Type | Description |
| :--- | :--- | :---------- |
| text | string | Input text string to be tokenized. |

Returns

PromiseLike<{ tokenTexts: string[] ; tokens: number[] }>

A promise that resolves to an object with two arrays:

  1. tokens - An array of numbers where each number is a token.
  2. tokenTexts - An array of strings where each string represents the original text corresponding to each token.

Defined in

packages/modelfusion/src/model-function/tokenize-text/Tokenizer.ts:31
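
The two arrays returned by tokenizeWithTexts are easier to inspect when paired up entry by entry. The sketch below assumes, based on the description above, that `tokens` and `tokenTexts` are parallel arrays of equal length. `TokenizationResult` and `pairTokensWithTexts` are hypothetical names, and the sample token ids are made up for illustration rather than produced by a real tokenizer.

```typescript
// Shape of the resolved value documented above.
type TokenizationResult = { tokens: number[]; tokenTexts: string[] };

// Hypothetical helper: pair each token id with its corresponding text.
// Assumes both arrays have the same length and the same order.
function pairTokensWithTexts(
  result: TokenizationResult
): Array<{ token: number; text: string }> {
  if (result.tokens.length !== result.tokenTexts.length) {
    throw new Error("tokens and tokenTexts must have the same length");
  }
  return result.tokens.map((token, index) => ({
    token,
    text: result.tokenTexts[index],
  }));
}

// Made-up sample result; real token ids depend on the tokenizer in use.
const sampleResult: TokenizationResult = {
  tokens: [101, 202],
  tokenTexts: ["Hello", " world"],
};

const pairs = pairTokensWithTexts(sampleResult);
for (const { token, text } of pairs) {
  // JSON.stringify makes leading spaces in token texts visible.
  console.log(`${token}\t${JSON.stringify(text)}`);
}
```

A view like this helps when debugging prompts, since it shows exactly where token boundaries fall, including leading whitespace.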