GeneralBots/botbook

Rodrigo Rodriguez (Pragmatismo) 0cf9736944 Update: General project updates

2025-12-06 11:09:12 -03:00

Multimodal Module

Generate images, video, and speech audio from text, and describe (caption) existing images and videos.

Overview

The multimodal module connects to a BotModels server, a separate service, for AI-powered media generation and analysis.

BASIC Keywords

| Keyword | Purpose |
|---------|---------|
| `IMAGE` | Generate an image from a text prompt |
| `VIDEO` | Generate a video from a text prompt |
| `AUDIO` | Generate speech audio from text |
| `SEE` | Describe (caption) an image or video |

IMAGE

Generate an image from a text prompt:

url = IMAGE "A sunset over mountains with a lake"
TALK "Here's your image: " + url

Timeout: 300 seconds (5 minutes)

VIDEO

Generate a video from a text prompt:

url = VIDEO "A cat playing with a ball of yarn"
TALK "Here's your video: " + url

Timeout: 600 seconds (10 minutes)
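
The timeouts stated for `IMAGE` (300 seconds) and `VIDEO` (600 seconds) can be written down as constants when calling the server from Rust. The following is a minimal sketch: the `Generation` enum and `timeout_for` function are hypothetical names for illustration and are not part of the BotModels client. No timeout is listed for `AUDIO`, so none is assumed here.

```rust
use std::time::Duration;

/// Hypothetical tag for the two generation kinds with a documented timeout.
enum Generation {
    Image,
    Video,
}

/// Return the timeout documented for each keyword.
fn timeout_for(kind: Generation) -> Duration {
    match kind {
        Generation::Image => Duration::from_secs(300), // 5 minutes
        Generation::Video => Duration::from_secs(600), // 10 minutes
    }
}
```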

AUDIO

Generate speech audio from text:

url = AUDIO "Welcome to our service. How can I help you today?"
PLAY url

SEE

Get a description of an image or video:

description = SEE "path/to/image.jpg"
TALK "I see: " + description

Configuration

Add to config.csv:

botmodels-enabled,true
botmodels-host,localhost
botmodels-port,5000
botmodels-api-key,your-api-key
botmodels-use-https,false
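
To illustrate how these keys fit together, here is a minimal sketch that reads `key,value` lines and assembles a base URL from them. The functions `parse_config` and `base_url` are hypothetical helpers, and the URL shape (`scheme://host:port`) is an assumption; the actual client may build its address differently.

```rust
use std::collections::HashMap;

/// Parse `key,value` lines into a map. Lines without a comma are skipped.
fn parse_config(text: &str) -> HashMap<String, String> {
    text.lines()
        .filter_map(|line| line.split_once(','))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}

/// Assemble a base URL from the botmodels-* keys.
/// Returns None when the module is disabled or host/port are missing.
fn base_url(cfg: &HashMap<String, String>) -> Option<String> {
    if cfg.get("botmodels-enabled").map(String::as_str) != Some("true") {
        return None;
    }
    // Assumption: anything other than "true" means plain HTTP.
    let scheme = if cfg.get("botmodels-use-https").map(String::as_str) == Some("true") {
        "https"
    } else {
        "http"
    };
    let host = cfg.get("botmodels-host")?;
    let port = cfg.get("botmodels-port")?;
    Some(format!("{}://{}:{}", scheme, host, port))
}
```

With the values shown above, `base_url` yields `http://localhost:5000`. With `botmodels-enabled,false` it returns `None`.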

Image Generation Config

botmodels-image-model,stable-diffusion
botmodels-image-steps,20
botmodels-image-width,512
botmodels-image-height,512
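
The image settings are plain strings in `config.csv`, so a caller that wants numbers has to parse and validate them. A minimal sketch of that step follows. The `ImageConfig` struct and its `from_map` constructor are hypothetical, and treating `steps`, `width`, and `height` as unsigned integers is an assumption based on the example values.

```rust
use std::collections::HashMap;

/// Hypothetical typed view of the botmodels-image-* settings.
#[derive(Debug, PartialEq)]
struct ImageConfig {
    model: String,
    steps: u32,
    width: u32,
    height: u32,
}

impl ImageConfig {
    /// Build the config from raw key/value pairs, reporting the first
    /// missing or non-numeric value as an error string.
    fn from_map(cfg: &HashMap<&str, &str>) -> Result<Self, String> {
        let num = |key: &str| -> Result<u32, String> {
            cfg.get(key)
                .ok_or_else(|| format!("missing {}", key))?
                .parse::<u32>()
                .map_err(|e| format!("{}: {}", key, e))
        };
        let model = cfg
            .get("botmodels-image-model")
            .ok_or("missing botmodels-image-model")?
            .to_string();
        Ok(Self {
            model,
            steps: num("botmodels-image-steps")?,
            width: num("botmodels-image-width")?,
            height: num("botmodels-image-height")?,
        })
    }
}
```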

Video Generation Config

botmodels-video-model,text2video
botmodels-video-frames,16
botmodels-video-fps,8
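
Assuming `botmodels-video-frames` is the total number of generated frames and `botmodels-video-fps` is the playback rate, the two values together determine the clip length. The sketch below shows that arithmetic. `clip_seconds` is a hypothetical helper, not part of the module.

```rust
/// Approximate clip length in seconds: frame count divided by frames per second.
/// Returns None for an fps of zero, which would otherwise divide by zero.
fn clip_seconds(frames: u32, fps: u32) -> Option<f64> {
    if fps == 0 {
        return None;
    }
    Some(f64::from(frames) / f64::from(fps))
}
```

Under that assumption, the example values above (16 frames at 8 fps) produce a clip of about 2 seconds.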

BotModels Client

Rust API for direct integration:

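// Build a client for this bot from the shared application state.
let client = BotModelsClient::from_state(&state, &bot_id);

// Skip the calls when the module is disabled.
if client.is_enabled() {
    // Both calls are async and propagate errors to the caller with `?`.
    let image_url = client.generate_image("A beautiful garden").await?;
    let description = client.describe_image("path/to/photo.jpg").await?;
}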

Available Methods

| Method | Description |
|--------|-------------|
| `generate_image(prompt)` | Create an image from text |
| `generate_video(prompt)` | Create a video from text |
| `generate_audio(text)` | Create speech audio |
| `describe_image(path)` | Get an image caption |
| `describe_video(path)` | Get a video description |
| `speech_to_text(audio_path)` | Transcribe audio |
| `health_check()` | Check BotModels server status |
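
Because `SEE` accepts either an image or a video while the client exposes separate `describe_image` and `describe_video` methods, a caller needs some way to choose between them. One possible approach, sketched below, is to look at the file extension. This is an assumption for illustration only: the `DescribeTarget` enum, the `describe_target` function, and the extension lists are hypothetical, and the module may decide differently.

```rust
use std::path::Path;

/// Hypothetical classification of a path passed to SEE.
#[derive(Debug, PartialEq)]
enum DescribeTarget {
    Image,
    Video,
    Unknown,
}

/// Pick a target from the file extension, ignoring case.
fn describe_target(path: &str) -> DescribeTarget {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        // Illustrative extension lists, not taken from the module.
        Some("jpg") | Some("jpeg") | Some("png") | Some("gif") | Some("webp") => {
            DescribeTarget::Image
        }
        Some("mp4") | Some("webm") | Some("mov") => DescribeTarget::Video,
        _ => DescribeTarget::Unknown,
    }
}
```

A caller could then route `DescribeTarget::Image` to `describe_image` and `DescribeTarget::Video` to `describe_video`, and report an error for anything else.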

Response Structures

GenerationResponse

{
    "status": "success",
    "file_path": "/path/to/generated/file.png",
    "generation_time": 12.5,
    "error": null
}
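
A caller usually wants either the generated file path or an error message, not the raw fields. The sketch below mirrors the fields shown above in a plain Rust struct and collapses them into a `Result`. The field types, the optionality of `file_path` and `generation_time`, and the `into_result` method are assumptions for illustration. The real client type may differ, and decoding from JSON is omitted to keep the example dependency-free.

```rust
/// Mirrors the GenerationResponse fields from the JSON example.
#[derive(Debug, Clone, PartialEq)]
struct GenerationResponse {
    status: String,
    file_path: Option<String>,
    generation_time: Option<f64>,
    error: Option<String>,
}

impl GenerationResponse {
    /// Return the file path on success, or an error message otherwise.
    fn into_result(self) -> Result<String, String> {
        let GenerationResponse { status, file_path, error, .. } = self;
        if status == "success" {
            file_path.ok_or_else(|| "success response without file_path".to_string())
        } else {
            // Fall back to a generic message when the server sent no error text.
            Err(error.unwrap_or_else(|| format!("generation failed with status {:?}", status)))
        }
    }
}
```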

DescribeResponse

{
    "description": "A golden retriever playing fetch in a park",
    "confidence": 0.92
}
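
The `confidence` field lets a dialog decide how firmly to present a caption. Below is a minimal sketch of one way to use it, assuming `confidence` is a value between 0 and 1. The `DescribeResponse` struct layout, the `caption_text` helper, and the threshold parameter are hypothetical.

```rust
/// Mirrors the DescribeResponse fields from the JSON example.
#[derive(Debug, Clone, PartialEq)]
struct DescribeResponse {
    description: String,
    confidence: f64,
}

/// Phrase the caption more cautiously when confidence is below the threshold.
fn caption_text(resp: &DescribeResponse, min_confidence: f64) -> String {
    if resp.confidence >= min_confidence {
        format!("I see: {}", resp.description)
    } else {
        format!("I think I see: {} (low confidence)", resp.description)
    }
}
```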

Requirements

- A running BotModels server (separate service)
- A GPU, recommended for generation tasks
- Sufficient disk space for generated media
