
Class: CohereTokenizer

Tokenizer for the Cohere models. It uses the Co.Tokenize and Co.Detokenize APIs.
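To show what each call carries, the sketch below builds (but does not send) requests for Cohere's tokenize and detokenize endpoints. The base URL `https://api.cohere.ai/v1`, the `/tokenize` and `/detokenize` paths, the bearer-token header, and the helper names `buildTokenizeRequest` / `buildDetokenizeRequest` are assumptions for illustration, not part of ModelFusion's API. Check Cohere's API reference and ModelFusion's Cohere API configuration before relying on them.

```typescript
// Hypothetical request builders. Base URL, paths, and headers are assumptions.
interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const DEFAULT_BASE_URL = "https://api.cohere.ai/v1"; // assumed base URL

function jsonHeaders(apiKey: string): Record<string, string> {
  return {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
}

// Text in, token IDs out: the request behind callTokenizeAPI (assumed shape).
function buildTokenizeRequest(
  text: string,
  model: string,
  apiKey: string,
  baseUrl: string = DEFAULT_BASE_URL,
): PreparedRequest {
  return {
    url: `${baseUrl}/tokenize`,
    method: "POST",
    headers: jsonHeaders(apiKey),
    body: JSON.stringify({ text, model }),
  };
}

// Token IDs in, text out: the request behind callDeTokenizeAPI (assumed shape).
function buildDetokenizeRequest(
  tokens: number[],
  model: string,
  apiKey: string,
  baseUrl: string = DEFAULT_BASE_URL,
): PreparedRequest {
  return {
    url: `${baseUrl}/detokenize`,
    method: "POST",
    headers: jsonHeaders(apiKey),
    body: JSON.stringify({ tokens, model }),
  };
}

// Usage: the prepared object could be passed to fetch(req.url, req).
const req = buildTokenizeRequest("Hello", "command", "YOUR_API_KEY");
```

Both requests carry the model name so that token IDs are interpreted with the same vocabulary in each direction.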


Example

import { CohereTokenizer, countTokens } from "modelfusion";

const tokenizer = new CohereTokenizer({ model: "command" });

const text = "At first, Nox didn't know what to do with the pup.";

// Each of these calls sends a request to the Cohere API.
// Top-level await requires running as an ES module.
const tokenCount = await countTokens(tokenizer, text);
const tokens = await tokenizer.tokenize(text);
const tokensAndTokenTexts = await tokenizer.tokenizeWithTexts(text);
const reconstructedText = await tokenizer.detokenize(tokens);

Implements

FullTokenizer

Constructors

constructor

new CohereTokenizer(settings): CohereTokenizer

Parameters

Name      Type
settings  CohereTokenizerSettings

Returns

CohereTokenizer

Defined in

packages/modelfusion/src/model-provider/cohere/CohereTokenizer.ts:44

Methods

callDeTokenizeAPI

callDeTokenizeAPI(tokens, callOptions?): Promise<{ meta: { api_version: { version: string } }; text: string }>

Parameters

Name          Type
tokens        number[]
callOptions?  FunctionCallOptions

Returns

Promise<{ meta: { api_version: { version: string } }; text: string }>

Defined in

packages/modelfusion/src/model-provider/cohere/CohereTokenizer.ts:80
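Judging from the two signatures, `detokenize` most likely resolves to the `text` field of this raw response and drops `meta`. A minimal sketch of that relationship, with a stub in place of the network call; `DetokenizeResponse`, `detokenizeVia`, and `stubDetokenizeAPI` are hypothetical names, not ModelFusion exports.

```typescript
// Response shape of callDeTokenizeAPI, copied from the signature above.
interface DetokenizeResponse {
  meta: { api_version: { version: string } };
  text: string;
}

// Hypothetical helper: calls a detokenize function and keeps only the text,
// discarding the API metadata.
async function detokenizeVia(
  callDeTokenizeAPI: (tokens: number[]) => Promise<DetokenizeResponse>,
  tokens: number[],
): Promise<string> {
  const response = await callDeTokenizeAPI(tokens);
  return response.text;
}

// Stub in place of the network call; returns made-up data.
const stubDetokenizeAPI = async (tokens: number[]): Promise<DetokenizeResponse> => ({
  meta: { api_version: { version: "1" } },
  text: `[${tokens.length} tokens]`,
});

// Usage: detokenizeVia(stubDetokenizeAPI, [1, 2, 3]) resolves to "[3 tokens]".
```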


callTokenizeAPI

callTokenizeAPI(text, callOptions?): Promise<{ meta: { api_version: { version: string } }; token_strings: string[]; tokens: number[] }>

Parameters

Name          Type
text          string
callOptions?  FunctionCallOptions

Returns

Promise<{ meta: { api_version: { version: string } }; token_strings: string[]; tokens: number[] }>

Defined in

packages/modelfusion/src/model-provider/cohere/CohereTokenizer.ts:48
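The raw response uses snake_case fields, while `tokenizeWithTexts` returns `tokens` and `tokenTexts` (its signature pairs `tokenTexts` with `token_strings`). The sketch below maps one shape onto the other; `TokenizeResponse`, `toTokensWithTexts`, and the sample values are hypothetical, and the token IDs are made up rather than real Cohere output.

```typescript
// Response shape of callTokenizeAPI, copied from the signature above.
interface TokenizeResponse {
  meta: { api_version: { version: string } };
  token_strings: string[];
  tokens: number[];
}

// Higher-level result: camelCase field names, metadata dropped.
interface TokensWithTexts {
  tokens: number[];
  tokenTexts: string[];
}

// Maps the raw API response onto the tokenizeWithTexts result.
function toTokensWithTexts(response: TokenizeResponse): TokensWithTexts {
  return { tokens: response.tokens, tokenTexts: response.token_strings };
}

// Hypothetical sample response (made-up IDs).
const sample: TokenizeResponse = {
  meta: { api_version: { version: "1" } },
  token_strings: ["At", " first"],
  tokens: [101, 202],
};

const { tokens, tokenTexts } = toTokensWithTexts(sample);
// A plain tokenize call would presumably return only `tokens`.
```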


detokenize

detokenize(tokens): Promise<string>

Asynchronously convert a sequence of numeric tokens back into text. Detokenization turns tokens into a human-readable form; it is needed when the output must be interpretable, or when the tokenization process has to be reversible.

Parameters

Name    Type      Description
tokens  number[]  An array of numeric tokens to convert back to text.

Returns

Promise<string>

A promise that resolves to the text corresponding to the input token sequence.

Implementation of

FullTokenizer.detokenize

Defined in

packages/modelfusion/src/model-provider/cohere/CohereTokenizer.ts:125
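Reversibility means that detokenizing the output of tokenization gives back the input. The sketch below checks that property with a toy tokenizer that assigns IDs to space-delimited pieces on the fly. `ToyTokenizer` is hypothetical and deliberately simple: Cohere's tokenizer uses a fixed subword vocabulary, so its pieces and IDs will differ.

```typescript
// Toy tokenizer: each piece is a word with its leading space attached, or a
// run of spaces, so concatenating the pieces reproduces the input exactly.
class ToyTokenizer {
  private readonly idsByPiece = new Map<string, number>();
  private readonly piecesById: string[] = [];

  // Returns the ID for a piece, assigning a new one the first time it is seen.
  private pieceToId(piece: string): number {
    let id = this.idsByPiece.get(piece);
    if (id === undefined) {
      id = this.piecesById.length;
      this.piecesById.push(piece);
      this.idsByPiece.set(piece, id);
    }
    return id;
  }

  async tokenize(text: string): Promise<number[]> {
    // "a b  c" -> ["a", " b", " ", " c"]; every character lands in some piece.
    const pieces: string[] = text.match(/ ?[^ ]+| +/g) ?? [];
    return pieces.map((piece) => this.pieceToId(piece));
  }

  async detokenize(tokens: number[]): Promise<string> {
    return tokens
      .map((id) => {
        const piece = this.piecesById[id];
        if (piece === undefined) throw new Error(`Unknown token ID: ${id}`);
        return piece;
      })
      .join("");
  }
}
```

Because every character ends up in exactly one piece, `detokenize(await tokenize(text))` returns `text` unchanged for this toy.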


tokenize

tokenize(text): Promise<number[]>

Asynchronously tokenize the given text into a sequence of numeric tokens.

Parameters

Name  Type    Description
text  string  Input text string that needs to be tokenized.

Returns

Promise<number[]>

A promise that resolves to an array of numeric tokens, each representing a part (or all) of the input text.

Implementation of

FullTokenizer.tokenize

Defined in

packages/modelfusion/src/model-provider/cohere/CohereTokenizer.ts:112
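The class example passes the tokenizer to `countTokens`, which most likely returns the length of the `tokenize` result. The sketch below illustrates that idea plus a simple budget check. `TokenizeOnly`, `countTokensSketch`, `fitsBudget`, and the per-character stand-in tokenizer are hypothetical, and `maxTokens` is whatever limit your model imposes; this is not ModelFusion's implementation.

```typescript
// Minimal structural type: anything with an async tokenize method.
interface TokenizeOnly {
  tokenize(text: string): Promise<number[]>;
}

// Hypothetical helper: the token count is the length of the token array.
async function countTokensSketch(tokenizer: TokenizeOnly, text: string): Promise<number> {
  const tokens = await tokenizer.tokenize(text);
  return tokens.length;
}

// Hypothetical helper: true if the text fits within maxTokens tokens.
async function fitsBudget(
  tokenizer: TokenizeOnly,
  text: string,
  maxTokens: number,
): Promise<boolean> {
  return (await countTokensSketch(tokenizer, text)) <= maxTokens;
}

// Offline stand-in: one fake token (the code point) per character.
const charTokenizer: TokenizeOnly = {
  tokenize: async (text) => Array.from(text, (ch) => ch.codePointAt(0) ?? 0),
};
```

Any object with a matching `tokenize` method, including a `CohereTokenizer` instance, satisfies the structural `TokenizeOnly` type.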


tokenizeWithTexts

tokenizeWithTexts(text): Promise<{ tokenTexts: string[]; tokens: number[] }>

Asynchronously tokenize the given text, providing both the numeric tokens and their corresponding text.

Parameters

Name  Type    Description
text  string  Input text string to be tokenized.

Returns

Promise<{ tokenTexts: string[]; tokens: number[] }>

A promise that resolves to an object with two parallel arrays:

  1. tokens - An array of numeric tokens (taken from the API response's tokens field).
  2. tokenTexts - An array of strings, where each string is the text fragment for the token at the same index (taken from the API response's token_strings field).

Implementation of

FullTokenizer.tokenizeWithTexts

Defined in

packages/modelfusion/src/model-provider/cohere/CohereTokenizer.ts:116
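Because `tokens` and `tokenTexts` are parallel arrays, they can be zipped to inspect how a text was split, for example when debugging prompt length. `zipTokens` is a hypothetical helper that assumes both arrays have the same length and throws otherwise; the sample values are made up, not real Cohere token IDs.

```typescript
interface TokenWithText {
  token: number;
  text: string;
}

// Hypothetical helper: pairs each token with its text fragment.
function zipTokens(result: { tokens: number[]; tokenTexts: string[] }): TokenWithText[] {
  const { tokens, tokenTexts } = result;
  if (tokens.length !== tokenTexts.length) {
    throw new Error(`Length mismatch: ${tokens.length} tokens vs ${tokenTexts.length} texts`);
  }
  return tokens.map((token, i) => ({ token, text: tokenTexts[i] }));
}

// Made-up values standing in for `await tokenizer.tokenizeWithTexts(text)`.
const pairs = zipTokens({ tokens: [11, 22, 33], tokenTexts: ["At", " first", ","] });
for (const { token, text } of pairs) {
  // JSON.stringify makes leading spaces in fragments visible.
  console.log(`${token}\t${JSON.stringify(text)}`);
}
```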

Properties

settings

Readonly settings: CohereTokenizerSettings

Defined in

packages/modelfusion/src/model-provider/cohere/CohereTokenizer.ts:42