Multimodal Module

Image, video, and audio generation with vision/captioning capabilities.

Overview

The multimodal module connects to a BotModels server for AI-powered media generation and analysis.

BASIC Keywords

| Keyword | Purpose |
|---------|---------|
| IMAGE | Generate image from text prompt |
| VIDEO | Generate video from text prompt |
| AUDIO | Generate speech audio from text |
| SEE | Describe/caption an image or video |

IMAGE

Generate an image from a text prompt:

url = IMAGE "A sunset over mountains with a lake"
TALK "Here's your image: " + url

Timeout: 300 seconds (5 minutes)

VIDEO

Generate a video from a text prompt:

url = VIDEO "A cat playing with a ball of yarn"
TALK "Here's your video: " + url

Timeout: 600 seconds (10 minutes)
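
For code that calls the generation endpoints from Rust, the two timeouts stated above (300 seconds for IMAGE, 600 seconds for VIDEO) could be kept in one place. The sketch below is illustrative only: `GenerationKind` and `timeout` are hypothetical names, and this page does not say whether the Rust client applies the same limits.

```rust
use std::time::Duration;

/// Hypothetical enum naming the two generation kinds with a documented timeout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GenerationKind {
    Image,
    Video,
}

impl GenerationKind {
    /// Timeout values taken from the IMAGE and VIDEO sections of this page.
    fn timeout(self) -> Duration {
        match self {
            GenerationKind::Image => Duration::from_secs(300), // 5 minutes
            GenerationKind::Video => Duration::from_secs(600), // 10 minutes
        }
    }
}
```

No timeout is listed for AUDIO or SEE, so the sketch leaves them out rather than guessing a value.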

AUDIO

Generate speech audio from text:

url = AUDIO "Welcome to our service. How can I help you today?"
PLAY url

SEE

Get a description of an image or video:

description = SEE "path/to/image.jpg"
TALK "I see: " + description

Configuration

Add to config.csv:

botmodels-enabled,true
botmodels-host,localhost
botmodels-port,5000
botmodels-api-key,your-api-key
botmodels-use-https,false
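
As a rough illustration of how these five settings fit together, the sketch below parses `name,value` lines and derives a server base URL. It is a sketch under assumptions, not the module's actual loader: `parse_config` and `botmodels_base_url` are hypothetical names, and the rule that the module is active only when `botmodels-enabled` is exactly `true` is assumed.

```rust
use std::collections::HashMap;

/// Parse `name,value` lines (the shape used in config.csv above) into a map.
/// Splits on the first comma only; lines without a comma are ignored.
fn parse_config(csv: &str) -> HashMap<String, String> {
    csv.lines()
        .filter_map(|line| line.split_once(','))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Build a base URL from the botmodels-* settings.
/// Returns None when the module is disabled or a required key is missing or invalid.
fn botmodels_base_url(cfg: &HashMap<String, String>) -> Option<String> {
    // Assumption: anything other than the literal "true" means disabled.
    if cfg.get("botmodels-enabled").map(String::as_str) != Some("true") {
        return None;
    }
    let host = cfg.get("botmodels-host")?;
    let port: u16 = cfg.get("botmodels-port")?.parse().ok()?;
    let scheme = if cfg.get("botmodels-use-https").map(String::as_str) == Some("true") {
        "https"
    } else {
        "http"
    };
    Some(format!("{scheme}://{host}:{port}"))
}
```

With the values listed above, this yields `http://localhost:5000`. The `botmodels-api-key` value is left out of the URL on purpose; how the key is sent to the server is not described on this page.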

Image Generation Config

botmodels-image-model,stable-diffusion
botmodels-image-steps,20
botmodels-image-width,512
botmodels-image-height,512

Video Generation Config

botmodels-video-model,text2video
botmodels-video-frames,16
botmodels-video-fps,8

BotModels Client

Rust API for direct integration:

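// This fragment runs inside an async function that returns a Result,
// since it uses `.await` and `?`.
let client = BotModelsClient::from_state(&state, &bot_id);

// Only call the server when the module is enabled.
if client.is_enabled() {
    // Generate an image from a text prompt.
    let image_url = client.generate_image("A beautiful garden").await?;
    // Caption an existing image.
    let description = client.describe_image("path/to/photo.jpg").await?;
}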

Available Methods

| Method | Description |
|--------|-------------|
| generate_image(prompt) | Create image from text |
| generate_video(prompt) | Create video from text |
| generate_audio(text) | Create speech audio |
| describe_image(path) | Get image caption |
| describe_video(path) | Get video description |
| speech_to_text(audio_path) | Transcribe audio |
| health_check() | Check BotModels server status |

Response Structures

GenerationResponse

{
    "status": "success",
    "file_path": "/path/to/generated/file.png",
    "generation_time": 12.5,
    "error": null
}
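
To show how a caller might consume this shape, here is a minimal Rust sketch of a struct with the same four fields, plus a helper that turns it into a `Result`. The field types and the `into_result` helper are assumptions made for illustration: only `error` is shown as `null` above, so treating `file_path` and `generation_time` as optional is a guess, and the crate's real type may differ.

```rust
/// Mirrors the fields of the GenerationResponse JSON shown above.
/// Which fields may be absent is assumed, not confirmed by the source.
#[derive(Debug, Clone, PartialEq)]
struct GenerationResponse {
    status: String,
    file_path: Option<String>,
    generation_time: Option<f64>,
    error: Option<String>,
}

impl GenerationResponse {
    /// Hypothetical helper: Ok(file_path) on success, Err(message) otherwise.
    fn into_result(self) -> Result<String, String> {
        match (self.status.as_str(), self.file_path, self.error) {
            // Success requires both the "success" status and a file path.
            ("success", Some(path), _) => Ok(path),
            // Otherwise prefer the server-provided error message.
            (_, _, Some(message)) => Err(message),
            // Fall back to reporting the unexpected status.
            (status, _, _) => Err(format!("generation failed with status {status:?}")),
        }
    }
}
```

Converting early to a `Result` lets calling code use `?` instead of checking the `status` string at every call site.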

DescribeResponse

{
    "description": "A golden retriever playing fetch in a park",
    "confidence": 0.92
}
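
A caller may want to ignore low-confidence captions. The sketch below models the two fields above and applies a caller-chosen threshold. The struct layout, the `confident_description` helper, and the reading of `confidence` as a score between 0 and 1 are all assumptions made for illustration.

```rust
/// Mirrors the fields of the DescribeResponse JSON shown above.
#[derive(Debug, Clone, PartialEq)]
struct DescribeResponse {
    description: String,
    confidence: f64,
}

/// Hypothetical helper: return the caption only when its confidence
/// meets the caller-chosen minimum.
fn confident_description(resp: &DescribeResponse, min_confidence: f64) -> Option<&str> {
    if resp.confidence >= min_confidence {
        Some(resp.description.as_str())
    } else {
        None
    }
}
```

With the sample response above (confidence 0.92), a threshold of 0.8 keeps the caption and a threshold of 0.95 discards it.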

Requirements

  • BotModels server running (separate service)
  • GPU recommended for generation tasks
  • Sufficient disk space for generated media

See Also