Re-imagine images using GPT-4 Vision and Dall-E 3

You can use a combination of multi-modal text generation models and image generation models to recreate an image in a new style.

This tutorial demonstrates how to recreate an image in a cyberpunk style using OpenAI GPT-4 Vision and Dall-E 3. GPT-4 Vision first writes a text prompt that captures the essence of the original image, and Dall-E 3 then generates a new image in the desired style from that prompt.

Note: To use OpenAI GPT-4 Vision with ModelFusion, you need an OpenAI API key and access to the gpt-4-vision-preview model.

First, we need to obtain the image that we want to recreate. Here we fetch it from a URL; imageUrl is assumed to hold the address of the source image.

import fs from "node:fs";
import { generateImage, generateText, openai } from "modelfusion";

const imageResponse = await fetch(imageUrl);
const originalImage = Buffer.from(await imageResponse.arrayBuffer());
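
The fetch call above does not check whether the request succeeded. The following is a minimal sketch of a guard, assuming you want to fail early on HTTP errors or non-image responses. The names assertImageResponse and ResponseLike are hypothetical and are not part of ModelFusion.

```typescript
// Hypothetical helper (not part of ModelFusion or the original tutorial).
// Structural type covering the parts of a fetch Response that the check needs.
interface ResponseLike {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
}

// Throws if the response is an HTTP error or does not declare an image type.
function assertImageResponse(response: ResponseLike): void {
  if (!response.ok) {
    throw new Error(`Failed to fetch image: HTTP ${response.status}`);
  }
  const contentType = response.headers.get("content-type") ?? "";
  if (!contentType.toLowerCase().startsWith("image/")) {
    throw new Error(`Expected an image but received "${contentType}"`);
  }
}
```

Calling assertImageResponse(imageResponse) before reading arrayBuffer() would surface a broken URL immediately instead of sending invalid bytes to the model.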

After obtaining the base image, the next step is to create an image generation prompt using GPT-4 Vision. The instruction asks GPT-4 Vision to describe, in 1-2 sentences, a cyberpunk-style version of the original image. That description becomes the prompt for Dall-E 3.

const imageGenerationPrompt = await generateText({
  model: openai
    .ChatTextGenerator({
      model: "gpt-4-vision-preview",
      maxGenerationTokens: 128,
    })
    .withInstructionPrompt(),

  prompt: {
    instruction: [
      {
        type: "text",
        text:
          "Generate an image generation prompt for creating a cyberpunk-style image " +
          "that resembles the attached image. " +
          "Capture the essence of the image in 1-2 sentences.",
      },
      { type: "image", image: originalImage },
    ],
  },
});

console.log(`Image generation prompt:`);
console.log(imageGenerationPrompt);
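
Chat models sometimes wrap their answer in quotation marks or spread it over several lines, so a small cleanup step before image generation can help. The following is a sketch only: cleanImagePrompt is a hypothetical helper, and the 4000-character default is an assumed limit that should be checked against the current OpenAI documentation.

```typescript
// Hypothetical helper (not part of ModelFusion): tidy a generated prompt
// before handing it to an image model.
function cleanImagePrompt(text: string, maxLength = 4000): string {
  const cleaned = text
    .trim()
    // remove one pair of matching surrounding quotes, if present
    .replace(/^(["'])([\s\S]*)\1$/, "$2")
    // collapse line breaks and repeated whitespace into single spaces
    .replace(/\s+/g, " ")
    .trim();

  // maxLength is an assumed upper bound; truncate rather than fail
  return cleaned.slice(0, maxLength);
}
```

The cleaned text could then be used wherever the raw generated prompt would otherwise be passed on, for example cleanImagePrompt(imageGenerationPrompt).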

Next, pass the generated prompt to Dall-E 3, which interprets it and creates a new image in the specified cyberpunk style.

const image = await generateImage({
  model: openai.ImageGenerator({
    model: "dall-e-3",
    quality: "hd",
    size: "1024x1024",
  }),
  prompt: imageGenerationPrompt,
});

Once the recreated image has been generated, the last step is to save it to disk.

const path = `./enhanced-image-example.png`;
fs.writeFileSync(path, image);

console.log(`Image saved to ${path}`);

In summary, this tutorial shows how to transform an image into a new style by chaining GPT-4 Vision and Dall-E 3 with ModelFusion. You can use the same approach for other styles, such as watercolor paintings or pencil sketches, by changing the style named in the instruction.
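
One way to reuse the approach for other styles is to build the instruction text from a style name. The following is a sketch that reuses the wording of the instruction from this tutorial; buildStyleInstruction is a hypothetical helper, not a ModelFusion function.

```typescript
// Hypothetical helper: builds the text part of the GPT-4 Vision instruction
// for an arbitrary target style, reusing the tutorial's wording.
function buildStyleInstruction(style: string): string {
  const trimmedStyle = style.trim();
  if (trimmedStyle.length === 0) {
    throw new Error("style must not be empty");
  }
  return (
    `Generate an image generation prompt for creating a ${trimmedStyle}-style image ` +
    "that resembles the attached image. " +
    "Capture the essence of the image in 1-2 sentences."
  );
}

// Example styles taken from the tutorial text.
const styles = ["cyberpunk", "watercolor painting", "pencil sketch"];
for (const style of styles) {
  console.log(buildStyleInstruction(style));
}
```

The returned string would take the place of the text value in the instruction array, while the image part stays the same.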
