Skip to main content

3 posts tagged with "chatbot"

View All Tags

· 8 min read
Lars Grammel

In this blog post, we'll build a Next.js chatbot that runs on your computer. We'll use Llama.cpp to serve the OpenHermes 2.5 Mistral LLM (large language model) locally, the Vercel AI SDK to handle stream forwarding and rendering, and ModelFusion to integrate Llama.cpp with the Vercel AI SDK. The chatbot will be able to generate responses to user messages in real-time.

The architecture looks like this:

You can find a full Next.js, Vercel AI SDK, Llama.cpp & ModelFusion starter with more examples here: github/com/lgrammel/modelfusion-Llamacpp-nextjs-starter

This blog post explains step by step how to build the chatbot. Let's get started!

Setup Llama.cpp

The first step to getting started with our local chatbot is to setup Llama.cpp.

Llama.cpp is an LLM (large language model) inference engine implemented in C++ that allows us to run LLMs like OpenHermes 2.5 Mistral on your machine. This is crucial for our chatbot as it forms the backbone of its AI capabilities.

Step 1: Build Llama.cpp

Llama.cpp requires you to clone the repository and build it on your machine. Please follow the instructions on the Llama.cpp README:

  1. Open your terminal or command prompt.

  2. Clone the repository:

    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
  3. Compile llama.cpp:

    1. Linux/Mac: Run make
    2. Windows or other setups: Please follow the instructions on the Llama.cpp README.

Step 2: Downloading OpenHermes 2.5 Mistral GGUF

Once Llama.cpp is ready, you'll need to pull the specific LLM we will be using for this project, OpenHermes 2.5 Mistral.

  1. Download the OpenHermes 2.5 Mistral model from HuggingFace. I'll use openhermes-2.5-mistral-7b.Q4_K_M.gguf in this tutorial.

  2. Move the model file into the models/ directory of your local Llama.cpp repository.

Llama.cpp runs LLMs in a format called GGUF (GPT-Generated Unified Format). You can find many GGUF models on HuggingFace. 4-bit quantized models that fit in your machine's memory, e.g. 7B param models on a 8GB or 16GB machine, are usually the best models to run.

info

Quantization involves reducing the precision of the numerical values representing the model's weights, often from 32-bit floating points to lower precision formats like 4-bit. This decreases the model's memory footprint and computational requirements.

Step 3: Start the Llama.cpp Server

You can now start the Llama.cpp server by running the following command in your terminal (Mac/Linux):

./server -m models/openhermes-2.5-mistral-7b.Q4_K_M.gguf

After completing these steps, your system is running a Llama.cpp server with the OpenHermes 2.5 Mistral model, ready to be integrated into our Next.js chatbot.

Creating the Next.js Project

The next step is to create the foundational structure of our chatbot using Next.js. Next.js will be used to build our chatbot application's frontend and API routes.

Here are the steps to create the Next.js project:

  1. Execute the following command in your terminal to create a new Next.js project:

    npx create-next-app@latest llamacpp-nextjs-chatbot
  2. You will be prompted to configure various aspects of your Next.js application. Here are the settings for our chatbot project:

    Would you like to use TypeScript? Yes
    Would you like to use ESLint? Yes
    Would you like to use Tailwind CSS? Yes
    Would you like to use `src/` directory? Yes
    Would you like to use App Router? (recommended) Yes
    Would you like to customize the default import alias? No

    These settings enable TypeScript for robust type-checking, ESLint for code quality, and Tailwind CSS for styling. Using the src/ directory and App Router enhances the project structure and routing capabilities.

  3. Once the project is initialized, navigate to the project directory:

    cd llamacpp-nextjs-chatbot

By following these steps, you have successfully created and configured your Next.js project. This forms the base of our chatbot application, where we will later integrate the AI functionalities using Llama.cpp and ModelFusion. The next part of the tutorial will guide you through installing additional libraries and setting up the backend logic for the chatbot.

tip

You can verify your setup by running npm run dev in your terminal and navigating to http://localhost:3000 in your browser. You should see the default Next.js page.

Installing the Required Libraries

We will use several libraries to build our chatbot. Here is an overview of the libraries we will use:

  • Vercel AI SDK: The Vercel AI SDK provides React hooks for creating chats (useChat) as well as streams that forward AI responses to the frontend (StreamingTextResponse).
  • ModelFusion: ModelFusion is a library for building multi-modal AI applications that I've been working on. It provides a streamText function that calls AI models and returns a streaming response. ModelFusion also contains a Llama.cpp integration that we will use to access the OpenHermes 2.5 Mistral model.
  • ModelFusion Vercel AI SDK Integration: The @modelfusion/vercel-ai integration provides a ModelFusionTextStream that adapts ModelFusion's text streaming to the Vercel AI SDK's streaming response.

You can run the following command in the chatbot project directory to install all libraries:

npm install --save ai modelfusion @modelfusion/vercel-ai

You have now installed all the libraries required for building the chatbot. The next section of the tutorial will guide you through creating an API route for handling chat interactions.

Creating an API Route for the Chatbot

Creating the API route for the Next.js app router is the next step in building our chatbot. The API route will handle the chat interactions between the user and the AI.

Create the api/chat/ directory in src/app/ directory of your project and create a new file named route.ts to serve as our API route file.

The API route requires several important imports from the ai, modelfusion, and @modelfusion/vercel-ai libraries. These imports bring in necessary classes and functions for streaming AI responses and processing chat messages.

import { ModelFusionTextStream, asChatMessages } from "@modelfusion/vercel-ai";
import { Message, StreamingTextResponse } from "ai";
import { llamacpp, streamText } from "modelfusion";

We will use the edge runtime:

export const runtime = "edge";

The route itself is a POST request that takes a list of messages as input:

export async function POST(req: Request) {
// useChat will send a JSON with a messages property:
const { messages }: { messages: Message[] } = await req.json();

// ...
}

We initialize a ModelFusion text generation model for calling the Llama.cpp chat API with the OpenHermes 2.5 Mistral model. The .withChatPrompt() method creates an adapted model for chat prompts:

const model = llamacpp
.CompletionTextGenerator({
promptTemplate: llamacpp.prompt.ChatML, // OpenHermes uses the ChatML prompt format
temperature: 0,
cachePrompt: true, // Cache previous processing for fast responses
maxGenerationTokens: 1024, // Room for answer
})
.withChatPrompt();

Next, we create a ModelFusion chat prompt from the AI SDK messages:

const prompt = {
system: "You are an AI chatbot. Follow the user's instructions carefully.",

// map Vercel AI SDK Message to ModelFusion ChatMessage:
messages: asChatMessages(messages),
};

The asChatMessages helper converts the messages from the Vercel AI SDK to ModelFusion chat messages.

With the prompt and the model, you can then use ModelFusion to call Llama.cpp and generate a streaming response:

const textStream = await streamText({ model, prompt });

Finally you can return the streaming text response with the Vercel AI SDK. The ModelFusionTextStream adapts ModelFusion's streaming response to the Vercel AI SDK's streaming response:

// Return the result using the Vercel AI SDK:
return new StreamingTextResponse(ModelFusionTextStream(textStream));

Adding the Chat Interface

We need to create a dedicated chat page to bring our chatbot to life on the frontend. This page will be located at src/app/page.tsx and will leverage the useChat hook from the Vercel AI SDK. The useChat hook calls the /api/chat route and processes the streaming response as an array of messages, rendering each token as it arrives.

// src/app/page.tsx
"use client";

import { useChat } from "ai/react";

export default function Chat() {
const { messages, input, handleInputChange, handleSubmit } = useChat();

return (
<div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
{messages.map((message) => (
<div
key={message.id}
className="whitespace-pre-wrap"
style={{ color: message.role === "user" ? "black" : "green" }}
>
<strong>{`${message.role}: `}</strong>
{message.content}
<br />
<br />
</div>
))}

<form onSubmit={handleSubmit}>
<input
className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
value={input}
placeholder="Say something..."
onChange={handleInputChange}
/>
</form>
</div>
);
}

It's important to clean up the global styles for a more visually appealing chat interface. By default, the Next.js page is dark. We clean up src/app/globals.css to make it readable:

@tailwind base;
@tailwind components;
@tailwind utilities;

Running the Chatbot Application

With the chat page in place, it's time to run our chatbot app and see the result of our hard work.

You can launch the development server by running the following command in your terminal:

npm run dev

You can now navigate to http://localhost:3000 in your browser to see the chat page. You can interact with the chatbot by typing messages into the input field. The chatbot will respond to your messages in real-time.

Below is a screenshot of what you can expect your chatbot interface to look like when you run the application:

Conclusion

And there you have it—a fully functional local chatbot built with Next.js, Llama.cpp, and ModelFusion at your fingertips. We've traversed the path from setting up our development environment, integrating a robust language model, and spinning up a user-friendly chat interface.

The code is intended as a starting point for your projects. Have fun exploring!

· 8 min read
Lars Grammel

In this blog post, we'll build a Next.js chatbot that runs on your computer. We'll use Ollama to serve the OpenHermes 2.5 Mistral LLM (large language model) locally, the Vercel AI SDK to handle stream forwarding and rendering, and ModelFusion to integrate Ollama with the Vercel AI SDK. The chatbot will be able to generate responses to user messages in real-time.

The architecture looks like this:

You can find a full Next.js, Vercel AI SDK, Ollama & ModelFusion starter with more examples here: github/com/lgrammel/modelfusion-ollama-nextjs-starter

This blog post explains step by step how to build the chatbot. Let's get started!

Installing Ollama

The first step to getting started with our local chatbot is installing Ollama. Ollama is a versatile platform that allows us to run LLMs like OpenHermes 2.5 Mistral on your machine. This is crucial for our chatbot as it forms the backbone of its AI capabilities.

Step 1: Download Ollama

  1. Visit the official Ollama website.
  2. Follow the instructions provided on the site to download and install Ollama on your machine.

Step 2: Pulling OpenHermes 2.5 Mistral

Once Ollama is installed, you'll need to pull the specific LLM we will be using for this project, OpenHermes 2.5 Mistral. As of November 2023, it is one of the best open-source LLMs in the 7B parameter class. You need at least a MacBook M1 with 8GB of RAM or a similarly compatible computer to run it.

  1. Open your terminal or command prompt.
  2. Run the following command:
    ollama pull openhermes2.5-mistral

This command will download the LLM and store it on your machine. You can now use it to generate text.

tip

You can find the best-performing open-source LLMs on the HuggingFace Open LLM Leaderboard. They are ranked using a mix of benchmarks and grouped into different parameter classes so you can choose the best LLM for your machine. Many of the LLMs on the leaderboard are available on Ollama.

After completing these steps, your system is equipped with Ollama and the OpenHermes 2.5 Mistral model, ready to be integrated into our Next.js chatbot.

Creating the Next.js Project

The next step is to create the foundational structure of our chatbot using Next.js. Next.js will be used to build our chatbot application's frontend and API routes.

Here are the steps to create the Next.js project:

  1. Execute the following command in your terminal to create a new Next.js project:

    npx create-next-app@latest ollama-nextjs-chatbot
  2. You will be prompted to configure various aspects of your Next.js application. Here are the settings for our chatbot project:

    Would you like to use TypeScript? Yes
    Would you like to use ESLint? Yes
    Would you like to use Tailwind CSS? Yes
    Would you like to use `src/` directory? Yes
    Would you like to use App Router? (recommended) Yes
    Would you like to customize the default import alias? No

    These settings enable TypeScript for robust type-checking, ESLint for code quality, and Tailwind CSS for styling. Using the src/ directory and App Router enhances the project structure and routing capabilities.

  3. Once the project is initialized, navigate to the project directory:

    cd ollama-nextjs-chatbot

By following these steps, you have successfully created and configured your Next.js project. This forms the base of our chatbot application, where we will later integrate the AI functionalities using Ollama and ModelFusion. The next part of the tutorial will guide you through installing additional libraries and setting up the backend logic for the chatbot.

tip

You can verify your setup by running npm run dev in your terminal and navigating to http://localhost:3000 in your browser. You should see the default Next.js page.

Installing the Required Libraries

We will use several libraries to build our chatbot. Here is an overview of the libraries we will use:

  • Vercel AI SDK: The Vercel AI SDK provides React hooks for creating chats (useChat) as well as streams that forward AI responses to the frontend (StreamingTextResponse).
  • ModelFusion: ModelFusion is a library for building multi-modal AI applications that I've been working on. It provides a streamText function that calls AI models and returns a streaming response. ModelFusion also contains an Ollama integration that we will use to access the OpenHermes 2.5 Mistral model.
  • ModelFusion Vercel AI SDK Integration: The @modelfusion/vercel-ai integration provides a ModelFusionTextStream that adapts ModelFusion's text streaming to the Vercel AI SDK's streaming response.

You can run the following command in the chatbot project directory to install all libraries:

npm install --save ai modelfusion @modelfusion/vercel-ai

You have now installed all the libraries required for building the chatbot. The next section of the tutorial will guide you through creating an API route for handling chat interactions.

Creating an API Route for the Chatbot

Creating the API route for the Next.js app router is the next step in building our chatbot. The API route will handle the chat interactions between the user and the AI.

Create the api/chat/ directory in src/app/ directory of your project and create a new file named route.ts to serve as our API route file.

The API route requires several important imports from the ai, modelfusion, and @modelfusion/vercel-ai libraries. These imports bring in necessary classes and functions for streaming AI responses and processing chat messages.

import { ModelFusionTextStream, asChatMessages } from "@modelfusion/vercel-ai";
import { Message, StreamingTextResponse } from "ai";
import { ollama, streamText } from "modelfusion";

We will use the edge runtime:

export const runtime = "edge";

The route itself is a POST request that takes a list of messages as input:

export async function POST(req: Request) {
// useChat will send a JSON with a messages property:
const { messages }: { messages: Message[] } = await req.json();

// ...
}

We initialize a ModelFusion text generation model for calling the Ollama chat API with the OpenHermes 2.5 Mistral model. The .withChatPrompt() method creates an adapted model for chat prompts:

const model = ollama
.ChatTextGenerator({ model: "openhermes2.5-mistral" })
.withChatPrompt();

Next, we create a ModelFusion chat prompt from the AI SDK messages:

const prompt = {
system: "You are an AI chatbot. Follow the user's instructions carefully.",

// map Vercel AI SDK Message to ModelFusion ChatMessage:
messages: asChatMessages(messages),
};

The asChatMessages helper converts the messages from the Vercel AI SDK to ModelFusion chat messages.

With the prompt and the model, you can then use ModelFusion to call Ollama and generate a streaming response:

const textStream = await streamText({ model, prompt });

Finally you can return the streaming text response with the Vercel AI SDK. The ModelFusionTextStream adapts ModelFusion's streaming response to the Vercel AI SDK's streaming response:

// Return the result using the Vercel AI SDK:
return new StreamingTextResponse(ModelFusionTextStream(textStream));

Adding the Chat Interface

We need to create a dedicated chat page to bring our chatbot to life on the frontend. This page will be located at src/app/page.tsx and will leverage the useChat hook from the Vercel AI SDK. The useChat hook calls the /api/chat route and processes the streaming response as an array of messages, rendering each token as it arrives.

// src/app/page.tsx
"use client";

import { useChat } from "ai/react";

export default function Chat() {
const { messages, input, handleInputChange, handleSubmit } = useChat();

return (
<div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
{messages.map((message) => (
<div
key={message.id}
className="whitespace-pre-wrap"
style={{ color: message.role === "user" ? "black" : "green" }}
>
<strong>{`${message.role}: `}</strong>
{message.content}
<br />
<br />
</div>
))}

<form onSubmit={handleSubmit}>
<input
className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
value={input}
placeholder="Say something..."
onChange={handleInputChange}
/>
</form>
</div>
);
}

It's important to clean up the global styles for a more visually appealing chat interface. By default, the Next.js page is dark. We clean up src/app/globals.css to make it readable:

@tailwind base;
@tailwind components;
@tailwind utilities;

Running the Chatbot Application

With the chat page in place, it's time to run our chatbot app and see the result of our hard work.

You can launch the development server by running the following command in your terminal:

npm run dev

You can now navigate to http://localhost:3000 in your browser to see the chat page. You can interact with the chatbot by typing messages into the input field. The chatbot will respond to your messages in real-time.

Below is a screenshot of what you can expect your chatbot interface to look like when you run the application:

Conclusion

And there you have it—a fully functional local chatbot built with Next.js, Ollama, and ModelFusion at your fingertips. We've traversed the path from setting up our development environment, integrating a robust language model, and spinning up a user-friendly chat interface.

The code is intended as a starting point for your projects. Have fun exploring!

· 10 min read
Lars Grammel

Have you ever wondered how a chatbot that can answer questions about a PDF works?

In this blog post, we'll build a console app capable of searching and understanding PDF content to answer questions using Node.js, OpenAI, and ModelFusion. You'll learn how to read and index PDFs for efficient search and deliver precise responses by retrieving relevant content from the PDFs.

You can find the complete code for the chatbot here: github/com/lgrammel/modelfusion/examples/pdf-chat-terminal

This blog post explains the essential parts in detail. Let's get started!

Loading Pages from PDFs

We use Mozilla's PDF.js via the pdfjs-dist NPM module to load pages from a PDF file. The loadPdfPages function reads the PDF file and extracts its content. It returns an array where each object contains the page number and the text of that page.

import fs from "fs/promises";
import * as PdfJs from "pdfjs-dist/legacy/build/pdf";

async function loadPdfPages(path: string) {
const pdfData = await fs.readFile(path);

const pdf = await PdfJs.getDocument({
data: new Uint8Array(
pdfData.buffer,
pdfData.byteOffset,
pdfData.byteLength
),
useSystemFonts: true,
}).promise;

const pageTexts: Array<{
pageNumber: number;
text: string;
}> = [];

for (let i = 0; i < pdf.numPages; i++) {
const page = await pdf.getPage(i + 1);
const pageContent = await page.getTextContent();

pageTexts.push({
pageNumber: i + 1,
text: pageContent.items
.filter((item) => (item as any).str != null)
.map((item) => (item as any).str as string)
.join(" ")
.replace(/\s+/g, " "),
});
}

return pageTexts;
}

Let's explore the primary tasks: "Load & Parse PDF" and "Extract Page Numbers and Text."

Load & parse the PDF

Before working with the PDF content, we need to read the file from the disk and parse it into a format our code can understand.

const pdfData = await fs.readFile(path);

const pdf = await PdfJs.getDocument({
data: new Uint8Array(pdfData.buffer, pdfData.byteOffset, pdfData.byteLength),
useSystemFonts: true,
}).promise;

In this code snippet, the fs.readFile function reads the PDF file from the disk and stores the data in pdfData. We then use the PdfJs.getDocument function to parse this data. The flag useSystemFonts is set to true to avoid issues when system fonts are used in the PDF.

Extract page numbers and text

After successfully loading and parsing the PDF, the next step is to extract the text content from each page along with its page number.

const pageTexts: Array<{
pageNumber: number;
text: string;
}> = [];

for (let i = 0; i < pdf.numPages; i++) {
const page = await pdf.getPage(i + 1);
const pageContent = await page.getTextContent();

pageTexts.push({
pageNumber: i + 1,
text: pageContent.items
.filter((item) => (item as any).str != null)
.map((item) => (item as any).str as string)
.join(" ")
.replace(/\s+/g, " "),
}

The code defines an array named pageTexts to hold objects that contain the page number and the extracted text from each page. We then loop through each page of the PDF by using pdf.numPages to determine the total number of pages.

Within the loop, pdf.getPage(i + 1) fetches each page, starting from page number 1. We extract the text content with page.getTextContent().

Finally, the extracted text from each page is cleaned up by joining all text items and reducing multiple whitespaces to a single space. This cleaned-up text and the page number are stored in pageTexts.

Indexing Pages

Now that the PDF pages are available as text, we'll delve into the mechanism for indexing the PDF text we've loaded. Indexing is crucial as it allows for quick and semantic-based retrieval of information later. Here's how the magic happens:

const pages = await loadPdfPages(file);

const embeddingModel = openai.TextEmbedder({
model: "text-embedding-ada-002",
throttle: throttleMaxConcurrency({ maxConcurrentCalls: 5 }),
});

const chunks = await splitTextChunks(
splitAtToken({
maxTokensPerChunk: 256,
tokenizer: embeddingModel.tokenizer,
}),
pages
);

const vectorIndex = new MemoryVectorIndex<{
pageNumber: number;
text: string;
}>();

await upsertIntoVectorIndex({
vectorIndex,
embeddingModel,
objects: chunks,
getValueToEmbed: (chunk) => chunk.text,
});

Let's look at each step:

Initialize the text embedding model

The first step is to initialize a text embedding model. This model will be responsible for converting our text data into a format that can be compared for similarity.

const embeddingModel = openai.TextEmbedder({
model: "text-embedding-ada-002",
});

Text embedding models work by converting chunks of text into vectors in a multi-dimensional space such that text with similar meaning will have vectors that are close to each other. These vectors will be stored in a vector index.

Tokenization and text chunking

We need to prepare the text data before we convert our text into vectors. This preparation involves splitting the text into smaller pieces, known as "chunks," that are manageable for the model.

const chunks = await splitTextChunks(
splitAtToken({
maxTokensPerChunk: 256,
tokenizer: embeddingModel.tokenizer,
}),
pages
);

We limit each chunk to 256 tokens and use the tokenizer from our embedding model. The splitTextChunks function recursively splits the text until the chunks fit the specified maximum size.

You can play with the chunk size and see how it affects the results. When chunks are too small, they might contain only some of the necessary information to answer a question. When chunks are too large, their embedding vector may not be similar enough to the hypothetical answer we generate later.

Token: A token is the smallest unit that a machine-learning model reads. In language models, a token can be as small as a character or as long as a word (e.g., 'a', 'apple').

Tokenizer: A tool that breaks down text into tokens. ModelFusion provides the tokenizer for most text generation and embedding models.

Creating a memory vector index

The next step is to create an empty memory vector index to store our embedded text vectors.

const vectorIndex = new MemoryVectorIndex<{
pageNumber: number;
text: string;
}>();

A vector store is like a specialized database for vectors. It allows us to perform quick searches to find similar vectors to a given query vector.

In ModelFusion, a vector index is a searchable interface to access a vector store for a specific table or metadata. In our app, each vector in the index is associated with the page number and the text chunk it originated from.

The ModelFusion MemoryVectorIndex is a simple in-memory implementation of a vector index that uses cosine similarity to find similar vectors. It's a good choice for small datasets, such as a single PDF file loaded on-demand.

Inserting text chunks into the vector index

Finally, we populate our memory vector index with the text vectors generated from our chunks.

await upsertIntoVectorIndex({
vectorIndex,
embeddingModel,
objects: chunks,
getValueToEmbed: (chunk) => chunk.text,
});

The function upsertIntoVectorIndex performs the following:

  • It uses the embeddingModel to convert the text of each text chunk into a vector.
  • It then inserts this vector into vectorIndex, along with the metadata (page number and text).

At this point, our vector index is fully populated and ready for fast, semantic-based searches. This is essential for our chatbot to provide relevant and accurate answers.

In summary, indexing involves converting text chunks into a vectorized, searchable format. It the stage for semantic-based text retrieval, enabling our chatbot to understand and respond in a context-aware manner.

The Chat Loop

The chat loop is the central part of our "Chat with PDF" application. It continuously awaits user questions, generates hypothetical answers, searches for similar text chunks from a pre-processed PDF, and responds to the user.

const chat = readline.createInterface({
input: process.stdin,
output: process.stdout,
});

while (true) {
const question = await chat.question("You: ");

const hypotheticalAnswer = await generateText({
model: openai.ChatTextGenerator({ model: "gpt-3.5-turbo", temperature: 0 }),
prompt: [
openai.ChatMessage.system(`Answer the user's question.`),
openai.ChatMessage.user(question),
],
});

const information = await retrieve(
new VectorIndexRetriever({
vectorIndex,
embeddingModel,
maxResults: 5,
similarityThreshold: 0.75,
}),
hypotheticalAnswer
);

const textStream = await streamText({
model: openai.ChatTextGenerator({ model: "gpt-4", temperature: 0 }),
prompt: [
openai.ChatMessage.system(
`Answer the user's question using only the provided information.\n` +
`Include the page number of the information that you are using.\n` +
`If the user's question cannot be answered using the provided information, ` +
`respond with "I don't know".`
),
openai.ChatMessage.user(question),
openai.ChatMessage.fn({
fnName: "getInformation",
content: JSON.stringify(information),
}),
],
});

process.stdout.write("\nAI : ");
for await (const textPart of textStream) {
process.stdout.write(textPart);
}
process.stdout.write("\n\n");
}

Let's break down the major components of the code within the chat loop.

Looping and waiting for user input

const chat = readline.createInterface({
input: process.stdin,
output: process.stdout,
});

while (true) {
const question = await chat.question("You: ");
// ...
}

The chat loop runs indefinitely to keep the chat interaction alive. We use the Node.js readline package for collecting user input from the terminal on each iteration.

Generate a hypothetical answer

const hypotheticalAnswer = await generateText({
model: openai.ChatTextGenerator({ model: "gpt-3.5-turbo", temperature: 0 }),
prompt: [
openai.ChatMessage.system(`Answer the user's question.`),
openai.ChatMessage.user(question),
],
});

We use the gpt-3.5-turbo model from OpenAI to create a hypothetical answer first.

The idea (hypothetical document embeddings) is that the hypothetical answer will be closer to the chunks we seek in the embedding vector space than the user's question. This approach will help us to find better results when searching for similar text chunks later.

Retrieve relevant text chunks

const information = await retrieve(
new VectorIndexRetriever({
vectorIndex,
embeddingModel,
maxResults: 5,
similarityThreshold: 0.75,
}),
hypotheticalAnswer
);

The retrieve() function searches for text chunks similar to the hypothetical answer from the pre-processed PDF.

We limit the results to 5 and set a similarity threshold of 0.75. You can play with these parameters (in combination with the earlier chunk size setting) to see how they affect the results. When you e.g., make the chunks smaller, you might want to increase the number of results to get more information.

Generate an answer using text chunks

const textStream = await streamText({
model: openai.ChatTextGenerator({ model: "gpt-4", temperature: 0 }),
prompt: [
openai.ChatMessage.system(
`Answer the user's question using only the provided information.\n` +
`Include the page number of the information that you are using.\n` +
`If the user's question cannot be answered using the provided information, ` +
`respond with "I don't know".`
),
openai.ChatMessage.user(question),
openai.ChatMessage.functionResult(
"getInformation",
JSON.stringify(information)
),
],
});

We use gpt-4 to generate a final answer based on the retrieved text chunks. The temperature is set to 0 to remove as much randomness as possible from the response.

In the system prompt, we specify that:

  • The answer should be based solely on the retrieved text chunks.
  • The page number of the information should be included.
  • The answer should be "I don't know" if the user's question cannot be answered using the provided information. This instruction steers the LLM towards using this answer if it cannot find the answer in the text chunks.

The chunks are inserted as fake function results (using the OpenAI function calling API) to indicate that they are separate from the user's question.

The answer is streamed to show information to the user as soon as it is available.

Stream the answer to the console

process.stdout.write("\nAI : ");
for await (const textPart of textStream) {
process.stdout.write(textPart);
}
process.stdout.write("\n\n");

Finally, we display the generated answer to the user using stdout.write() to print the text parts collected from textStream.

Conclusion

That wraps up our journey into building a chatbot capable of answering questions based on PDF content. With the help of OpenAI and ModelFusion, you've seen how to read, index, and retrieve information from PDF files.

The code is intended as a starting point for your projects. Have fun exploring!

P.S.: You can find the complete code for the application here: github.com/lgrammel/modelfusion/examples/pdf-chat-terminal