Multimodal Configuration

General Bots integrates with botmodels—a Python service for multimodal AI tasks—to enable image generation, video creation, audio synthesis, and vision capabilities directly from BASIC scripts.

Architecture

┌─────────────┐    HTTP(S)     ┌─────────────┐
│  botserver  │ ────────────▶  │  botmodels  │
│   (Rust)    │                │  (Python)   │
└─────────────┘                └─────────────┘
      │                              │
      │ BASIC Keywords               │ AI Models
      │ - IMAGE                      │ - Stable Diffusion
      │ - VIDEO                      │ - Zeroscope
      │ - AUDIO                      │ - TTS/Whisper
      │ - SEE                        │ - BLIP2

When a BASIC script calls a multimodal keyword, botserver forwards the request to botmodels, which runs the appropriate AI model and returns the generated content.

Configuration

Add these settings to your bot's config.csv file to enable multimodal capabilities.

BotModels Service

| Key | Default | Description |
|-----|---------|-------------|
| botmodels-enabled | false | Enable botmodels integration |
| botmodels-host | 0.0.0.0 | Host address for botmodels service |
| botmodels-port | 8085 | Port for botmodels service |
| botmodels-api-key | | API key for authentication |
| botmodels-https | false | Use HTTPS for connection |

Image Generation

| Key | Default | Description |
|-----|---------|-------------|
| image-generator-model | | Path to image generation model |
| image-generator-steps | 4 | Inference steps (more = higher quality, slower) |
| image-generator-width | 512 | Output image width in pixels |
| image-generator-height | 512 | Output image height in pixels |
| image-generator-gpu-layers | 20 | Layers to offload to GPU |
| image-generator-batch-size | 1 | Batch size for generation |

Video Generation

| Key | Default | Description |
|-----|---------|-------------|
| video-generator-model | | Path to video generation model |
| video-generator-frames | 24 | Number of frames to generate |
| video-generator-fps | 8 | Output frames per second |
| video-generator-width | 320 | Output video width in pixels |
| video-generator-height | 576 | Output video height in pixels |
| video-generator-gpu-layers | 15 | Layers to offload to GPU |
| video-generator-batch-size | 1 | Batch size for generation |
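
The frame count and frame rate together determine how long a generated clip plays: duration equals frames divided by fps. The small sketch below works this out for the defaults in the table; the function name is illustrative and not part of General Bots.

```python
def clip_duration_seconds(frames: int, fps: int) -> float:
    """Return the playback length of a clip with `frames` frames at `fps`."""
    if frames <= 0 or fps <= 0:
        raise ValueError("frames and fps must be positive")
    return frames / fps


# Defaults from the table: 24 frames at 8 frames per second.
print(clip_duration_seconds(24, 8))  # 3.0
```

With the defaults, a clip lasts three seconds. Doubling video-generator-frames to 48 at the same fps yields a six-second clip.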

Example Configuration

key,value
botmodels-enabled,true
botmodels-host,0.0.0.0
botmodels-port,8085
botmodels-api-key,your-secret-key
botmodels-https,false
image-generator-model,../../../../data/diffusion/sd_turbo_f16.gguf
image-generator-steps,4
image-generator-width,512
image-generator-height,512
image-generator-gpu-layers,20
video-generator-model,../../../../data/diffusion/zeroscope_v2_576w
video-generator-frames,24
video-generator-fps,8
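
To see how these settings fit together, the following Python sketch parses a key,value file like the one above and derives the base URL a client would use. It only illustrates the settings; it is not how botserver (which is written in Rust) reads config.csv, and the helper names `load_config` and `botmodels_base_url` are hypothetical.

```python
import csv
import io
from typing import Dict, Optional

EXAMPLE_CONFIG = """key,value
botmodels-enabled,true
botmodels-host,0.0.0.0
botmodels-port,8085
botmodels-https,false
"""


def load_config(text: str) -> Dict[str, str]:
    """Parse key,value CSV text into a plain dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["key"].strip(): row["value"].strip() for row in reader}


def botmodels_base_url(config: Dict[str, str]) -> Optional[str]:
    """Return the service base URL, or None when the integration is disabled."""
    if config.get("botmodels-enabled", "false").lower() != "true":
        return None
    use_https = config.get("botmodels-https", "false").lower() == "true"
    scheme = "https" if use_https else "http"
    # Fall back to the documented defaults when a key is absent.
    host = config.get("botmodels-host", "0.0.0.0")
    port = int(config.get("botmodels-port", "8085"))
    return f"{scheme}://{host}:{port}"


config = load_config(EXAMPLE_CONFIG)
print(botmodels_base_url(config))  # http://0.0.0.0:8085
```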

BASIC Keywords

Once configured, these keywords become available in your scripts.

IMAGE

Generate an image from a text prompt:

file = IMAGE "a sunset over mountains with purple clouds"
SEND FILE TO user, file

The keyword returns a path to the generated image file.

VIDEO

Generate a video from a text prompt:

file = VIDEO "a rocket launching into space"
SEND FILE TO user, file

Video generation is more resource-intensive than image generation. Expect longer processing times.

AUDIO

Generate speech audio from text:

file = AUDIO "Hello, welcome to our service!"
SEND FILE TO user, file

SEE

Analyze an image or video and get a description:

' Describe an image
caption = SEE "/path/to/image.jpg"
TALK caption

' Describe a video
description = SEE "/path/to/video.mp4"
TALK description

The SEE keyword uses vision models to understand visual content and return natural language descriptions.

Starting BotModels

Before using multimodal features, start the botmodels service:

cd botmodels
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085

For production with HTTPS:

python -m uvicorn src.main:app \
    --host 0.0.0.0 \
    --port 8085 \
    --ssl-keyfile key.pem \
    --ssl-certfile cert.pem

BotModels API Endpoints

The botmodels service exposes these REST endpoints:

| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/image/generate | POST | Generate image from prompt |
| /api/video/generate | POST | Generate video from prompt |
| /api/speech/generate | POST | Generate speech from text |
| /api/speech/totext | POST | Transcribe audio to text |
| /api/vision/describe | POST | Describe an image |
| /api/vision/describe_video | POST | Describe a video |
| /api/vision/vqa | POST | Visual question answering |
| /api/health | GET | Health check |

All endpoints except /api/health require the X-API-Key header for authentication.
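
As a rough illustration of an authenticated call, the sketch below builds (but does not send) a POST request to /api/image/generate with the X-API-Key header. The request body shape `{"prompt": ...}` is an assumption made for the example; this page does not document the payload schema, so check the botmodels source before relying on it.

```python
import json
import urllib.request


def build_image_request(base_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the image generation endpoint."""
    # ASSUMPTION: the JSON field name "prompt" is hypothetical.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/image/generate",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # required on every endpoint except /api/health
        },
    )


req = build_image_request(
    "http://localhost:8085", "your-secret-key", "a sunset over mountains"
)
print(req.get_method(), req.full_url)
# POST http://localhost:8085/api/image/generate
```

Sending the request is then a matter of passing `req` to `urllib.request.urlopen` with a generous timeout, since generation can take a while.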

Model Paths

Configure model paths relative to the botmodels service directory. Typical layout:

data/
├── diffusion/
│   ├── sd_turbo_f16.gguf          # Stable Diffusion
│   └── zeroscope_v2_576w/         # Zeroscope video
├── tts/
│   └── model.onnx                 # Text-to-speech
├── whisper/
│   └── model.bin                  # Speech-to-text
└── vision/
    └── blip2/                     # Vision model

GPU Acceleration

Both image and video generation benefit significantly from GPU acceleration. Configure GPU layers based on your hardware:

| GPU VRAM | Recommended GPU Layers |
|----------|------------------------|
| 4 GB | 8-12 |
| 8 GB | 15-20 |
| 12 GB+ | 25-35 |

If you hit out-of-memory errors, lower image-generator-gpu-layers or video-generator-gpu-layers.
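
The table can be encoded as a small lookup. The sketch below picks the highest tier whose VRAM requirement is met; treating in-between sizes (for example 10 GB) as the next lower tier is an assumption of this example, and the function name is illustrative.

```python
from typing import Optional, Tuple

# (minimum VRAM in GB, (low, high) recommended layers), taken from the table.
TIERS = [
    (12, (25, 35)),
    (8, (15, 20)),
    (4, (8, 12)),
]


def recommended_gpu_layers(vram_gb: float) -> Optional[Tuple[int, int]]:
    """Return the recommended (low, high) layer range for the given VRAM."""
    for min_vram, layer_range in TIERS:
        if vram_gb >= min_vram:
            return layer_range
    return None  # the table gives no guidance below 4 GB


print(recommended_gpu_layers(8))   # (15, 20)
print(recommended_gpu_layers(24))  # (25, 35)
```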

Troubleshooting

"BotModels is not enabled"

Set botmodels-enabled=true in your config.csv.

Connection refused

Verify that the botmodels service is running and that botmodels-host and botmodels-port point to it. Test connectivity:

curl http://localhost:8085/api/health
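
The same check can be scripted, for example in a startup or monitoring job. This is a minimal sketch using only the Python standard library; it assumes the health endpoint answers a healthy service with HTTP 200, and the function name is hypothetical.

```python
import urllib.error
import urllib.request


def botmodels_is_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET {base_url}/api/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, DNS failures, timeouts and HTTP error codes.
        return False


if __name__ == "__main__":
    print(botmodels_is_healthy("http://localhost:8085"))
```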

Authentication failed

Ensure botmodels-api-key in config.csv matches the API_KEY environment variable in botmodels.

Model not found

Verify model paths are correct and models are downloaded to the expected locations.
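
A quick way to check this is to resolve each configured model path against the directory it is relative to and report the ones that do not exist. The sketch below does that for the two model keys documented on this page; `missing_model_paths` is a hypothetical helper, and the base directory is passed in explicitly because it depends on your deployment.

```python
from pathlib import Path
from typing import Dict, List

MODEL_KEYS = ("image-generator-model", "video-generator-model")


def missing_model_paths(config: Dict[str, str], base_dir: Path) -> List[str]:
    """Return the config keys whose model path does not exist under base_dir."""
    missing = []
    for key in MODEL_KEYS:
        value = config.get(key, "")
        if not value:
            continue  # key not configured, nothing to check
        if not (base_dir / value).exists():
            missing.append(key)
    return missing


config = {
    "image-generator-model": "data/diffusion/sd_turbo_f16.gguf",
    "video-generator-model": "data/diffusion/zeroscope_v2_576w",
}
for key in missing_model_paths(config, Path(".")):
    print(f"{key}: path not found")
```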

Out of memory

Reduce the relevant gpu-layers or batch-size setting (for example, video-generator-gpu-layers or video-generator-batch-size). Video generation is particularly memory-intensive.

Security Considerations

Use HTTPS in production. Set botmodels-https=true and configure SSL certificates on the botmodels service.

Use strong API keys. Generate cryptographically random keys for the botmodels-api-key setting.

Restrict network access. Limit botmodels service access to trusted hosts only.

Consider GPU isolation. Run botmodels on a dedicated GPU server if sharing resources with other services.
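
For the API key recommendation above, one straightforward option is Python's standard `secrets` module, which draws from the operating system's cryptographically secure random source. This is a minimal sketch; the function name and the 32-byte length are choices made for the example.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random token built from num_bytes of randomness."""
    return secrets.token_urlsafe(num_bytes)


# Paste the printed value into botmodels-api-key in config.csv.
print(generate_api_key())
```

Use the same value on both sides: in config.csv for botserver and in the botmodels environment.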

Performance Tips

Image generation runs fastest with SD Turbo models and 4-8 inference steps. More steps improve quality but increase generation time linearly.

Video generation is the most resource-intensive operation. Keep frame counts low (24-48) for reasonable response times.

Batch processing improves throughput when generating multiple items. Increase image-generator-batch-size or video-generator-batch-size if you have sufficient GPU memory.

Cache generated content when appropriate. If multiple users request similar content, consider storing and reusing the results.
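
One possible shape for such a cache is sketched below: prompts are normalized (case and whitespace) and hashed, and the generator is only called on a miss. `PromptCache` and the stand-in generator are hypothetical; a real deployment would also need eviction and persistent storage, which this example leaves out.

```python
import hashlib
from typing import Callable, Dict


class PromptCache:
    """In-memory cache mapping a normalized prompt to a generated file path."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._paths: Dict[str, str] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        # Collapse whitespace and ignore case so near-identical prompts match.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> str:
        key = self._key(prompt)
        if key not in self._paths:
            self._paths[key] = self._generate(prompt)
        return self._paths[key]


calls = []


def fake_generate(prompt: str) -> str:
    """Stand-in for a real generation call; records each invocation."""
    calls.append(prompt)
    return f"/tmp/image_{len(calls)}.png"


cache = PromptCache(fake_generate)
cache.get("A sunset over mountains")
cache.get("a  sunset over MOUNTAINS")
print(len(calls))  # 1
```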
