Rodrigo Rodriguez (Pragmatismo), commit 06c75cb690, 2025-11-30 10:53:59 -03:00

KB and TOOL System Documentation

Overview

The General Bots system provides four keywords for managing Knowledge Bases (KBs) and tools dynamically during a conversation session:

USE KB - Load and embed files from a .gbkb folder into the vector database
CLEAR KB - Remove all loaded KBs from the current session
USE TOOL - Make a tool available for the LLM to call
CLEAR TOOLS - Remove all tools from the current session

Knowledge Base (KB) System

What is a KB?

A Knowledge Base (KB) is a folder of documents inside a .gbkb folder. The documents are split into chunks, embedded, and stored in a vector database. When the user asks something, the vector database retrieves the most relevant chunks, which are injected into the prompt so the LLM can give context-aware responses.
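Here is a minimal Python sketch of the injection half of retrieve-then-inject. It takes chunks that the vector database has already returned, assumed to be sorted by relevance, and prepends them to the user's question. The `Chunk` type, the `build_prompt` function, the prompt wording, and the 4000-character budget are all hypothetical and are not General Bots APIs.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    kb_name: str  # KB folder the excerpt came from, e.g. "circular"
    source: str   # originating file name
    text: str     # the excerpt itself


def build_prompt(question: str, chunks: list[Chunk], max_chars: int = 4000) -> str:
    """Prepend retrieved excerpts to the question without exceeding max_chars of context."""
    parts: list[str] = []
    used = 0
    for chunk in chunks:  # assumed to be ordered from most to least relevant
        entry = f"[{chunk.kb_name}/{chunk.source}]\n{chunk.text}"
        if used + len(entry) > max_chars:
            break  # stop before the context budget is exceeded
        parts.append(entry)
        used += len(entry)
    context = "\n\n".join(parts)
    return (
        "Answer using the context below when it is relevant.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Each excerpt is labeled with its KB and file, so the LLM (and a developer reading logs) can tell where an answer came from.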

Folder Structure

work/
  {bot_name}/
    {bot_name}.gbkb/          # Knowledge Base root
      circular/               # KB folder 1
        document1.pdf
        document2.md
        document3.txt
      comunicado/             # KB folder 2
        info.docx
        data.csv
      docs/                   # KB folder 3
        README.md
        guide.pdf
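The sketch below assumes the layout above. It shows how a KB name such as `"circular"` could be resolved to its folder and how the KB folders available to a bot could be listed. `kb_folder` and `list_kbs` are hypothetical helpers, not part of General Bots.

```python
from pathlib import Path


def kb_folder(work_dir: Path, bot_name: str, kb_name: str) -> Path:
    """Resolve work/{bot_name}/{bot_name}.gbkb/{kb_name}."""
    return work_dir / bot_name / f"{bot_name}.gbkb" / kb_name


def list_kbs(work_dir: Path, bot_name: str) -> list[str]:
    """Names of the KB folders, i.e. the direct subfolders of the .gbkb root."""
    root = work_dir / bot_name / f"{bot_name}.gbkb"
    if not root.is_dir():
        return []  # bot has no knowledge base yet
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```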

KB Loading Process

Scan folder - System scans .gbkb folder for documents
Process files - Extracts text from PDF, DOCX, TXT, MD, CSV files
Chunk text - Splits into ~1000 character chunks with overlap
Generate embeddings - Creates vector representations
Store in VectorDB - Saves to vector database for similarity search
Ready for queries - KB available for semantic search

Supported File Types

PDF - Full text extraction with pdf-extract
DOCX/DOC - Microsoft Word documents
TXT - Plain text files
MD - Markdown documents
CSV - Structured data (each row as entry)
HTML - Web pages (text only)
JSON - Structured data
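The next sketch shows one way a loader could choose an extraction strategy for each file and turn CSV rows into separate entries. Several details are assumptions made for illustration: the extension-to-strategy table, the recursive scan, and the `column: value` entry format. The actual extractors, such as pdf-extract for PDFs, are not reproduced here.

```python
import csv
import io
from pathlib import Path

# Hypothetical mapping from file extension to extraction strategy name.
EXTRACTORS = {
    ".pdf": "pdf",
    ".docx": "word",
    ".doc": "word",
    ".txt": "plain_text",
    ".md": "markdown",
    ".csv": "csv_rows",    # each row becomes its own entry
    ".html": "html_text",  # text only, markup dropped
    ".json": "json",
}


def supported_documents(folder: Path) -> list[tuple[Path, str]]:
    """Recursively find files the loader can process, paired with their strategy."""
    found = []
    for path in sorted(folder.rglob("*")):
        kind = EXTRACTORS.get(path.suffix.lower())
        if kind is not None and path.is_file():
            found.append((path, kind))
    return found


def csv_entries(text: str) -> list[str]:
    """Turn each CSV data row into one 'column: value; ...' entry."""
    reader = csv.DictReader(io.StringIO(text))
    return ["; ".join(f"{key}: {value}" for key, value in row.items()) for row in reader]
```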

USE KB Keyword

USE KB "circular"
' Loads the 'circular' KB folder into the session
' All documents in that folder are now searchable

USE KB "comunicado"
' Adds another KB to the session
' Both 'circular' and 'comunicado' are now active

CLEAR KB Keyword

CLEAR KB
' Removes all loaded KBs from the current session
' Frees up memory and context space

Tool System

What are Tools?

Tools are callable functions that the LLM can invoke to perform specific actions:

Query databases
Call APIs
Process data
Execute workflows
Integrate with external systems

Tool Definition

Tools are defined in .bas files, from which the system generates MCP- and OpenAI-compatible tool definitions:

' weather.bas - becomes a tool automatically
PARAM location AS string
PARAM units AS string DEFAULT "celsius"
DESCRIPTION "Get current weather for a location"

' Tool implementation
weather_data = GET "https://api.weather.com/v1/current?location=" + location
' System AI will format and present the data naturally
SET CONTEXT "weather_data", weather_data
TALK "Here's the current weather for " + location
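Here is a Python sketch of how a header like the one in weather.bas could become an OpenAI-style function tool definition. The regex parsing, the type mapping, and the rule that a parameter without DEFAULT is required are all assumptions; the real General Bots compiler may work differently. Only the output shape follows the OpenAI tools format: `type: "function"` with a JSON Schema `parameters` object.

```python
import re

PARAM_RE = re.compile(
    r'^PARAM\s+(\w+)\s+AS\s+(\w+)(?:\s+DEFAULT\s+"([^"]*)")?\s*$', re.IGNORECASE
)
DESC_RE = re.compile(r'^DESCRIPTION\s+"([^"]*)"\s*$', re.IGNORECASE)

# Assumed mapping from BASIC parameter types to JSON Schema types.
TYPE_MAP = {"string": "string", "integer": "integer", "number": "number", "boolean": "boolean"}


def tool_definition(name: str, source: str) -> dict:
    """Build an OpenAI-style function tool definition from a .bas header."""
    properties: dict[str, dict] = {}
    required: list[str] = []
    description = ""
    for line in source.splitlines():
        line = line.strip()
        if m := PARAM_RE.match(line):
            param, bas_type, default = m.groups()
            prop = {"type": TYPE_MAP.get(bas_type.lower(), "string")}
            if default is None:
                required.append(param)  # no DEFAULT: the LLM must supply it
            else:
                prop["default"] = default
            properties[param] = prop
        elif m := DESC_RE.match(line):
            description = m.group(1)
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }
```

Comment lines and implementation lines simply do not match either pattern, so only the header contributes to the schema.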

Tool Registration

Tools are registered in two ways:

Auto-discovery - All .bas files in the .gbdialog folder (except start.bas) become tools
Dynamic loading - Via the USE TOOL keyword, for external tools
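Auto-discovery can be sketched in a few lines. The sketch assumes that a tool's name is its file name without the `.bas` extension and that only the top level of the .gbdialog folder is scanned. `discover_tools` is a hypothetical helper.

```python
from pathlib import Path


def discover_tools(gbdialog: Path) -> list[str]:
    """Tool names for every .bas file in a .gbdialog folder, except start.bas."""
    return sorted(
        p.stem  # file name without the .bas extension
        for p in gbdialog.glob("*.bas")
        if p.is_file() and p.name.lower() != "start.bas"
    )
```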

USE TOOL Keyword

USE TOOL "weather"
' Makes the weather tool available to the LLM

USE TOOL "database_query"
' Adds the database query tool to the session

USE TOOL "email_sender"
' Enables email sending capability

CLEAR TOOLS Keyword

CLEAR TOOLS
' Removes all tools from the current session
' The LLM can no longer call external functions

Session Management

Context Lifecycle

Session Start - Clean slate, no KB or tools
Load Resources - USE KB and USE TOOL as needed
Active Use - LLM uses loaded resources
Clear Resources - CLEAR KB/CLEAR TOOLS when done
Session End - Automatic cleanup
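The lifecycle can be modeled as a small state object. This is a hypothetical sketch, not the botserver implementation, and it rests on three assumptions:

- USE KB and USE TOOL are idempotent.
- Unknown names raise errors carrying the codes from the Common Errors table.
- CLEAR KB and CLEAR TOOLS each drop everything of their kind.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-session KB and tool state."""
    available_kbs: set[str]
    available_tools: set[str]
    active_kbs: list[str] = field(default_factory=list)    # load order is kept
    active_tools: list[str] = field(default_factory=list)

    def use_kb(self, name: str) -> None:
        if name not in self.available_kbs:
            raise LookupError(f"KB_NOT_FOUND: {name}")
        if name not in self.active_kbs:  # loading the same KB twice is a no-op
            self.active_kbs.append(name)

    def clear_kb(self) -> None:
        self.active_kbs.clear()  # CLEAR KB removes every loaded KB

    def use_tool(self, name: str) -> None:
        if name not in self.available_tools:
            raise LookupError(f"TOOL_NOT_FOUND: {name}")
        if name not in self.active_tools:
            self.active_tools.append(name)

    def clear_tools(self) -> None:
        self.active_tools.clear()

    def end(self) -> None:
        """Session end: automatic cleanup of everything that was loaded."""
        self.clear_kb()
        self.clear_tools()
```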

Best Practices

KB Management

Load relevant KBs only - Don't overload context
Clear when switching topics - Keep context focused
Update KBs regularly - Keep information current
Monitor token usage - Vector search adds tokens

Tool Management

Enable minimal tools - Only what's needed
Validate tool responses - Check for errors
Log tool usage - For audit and debugging
Set rate limits - Prevent abuse

Performance Considerations

Memory Usage

Each KB uses roughly 100-500 MB of RAM, depending on its size
Tools use minimal memory (under 1 MB each)
Vector search adds 10-50 ms of latency
Clear unused resources to free memory

Token Optimization

KB chunks add 500-2000 tokens per query
Tool descriptions use 50-200 tokens each
Clear unused resources to reduce token usage
Load specific KB folders rather than the entire database
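The ranges above can be combined into a rough estimate of per-query overhead. This helper is a back-of-the-envelope sketch. It assumes the 500-2000 token KB range applies once per query whenever any KB is loaded, plus 50-200 tokens for each enabled tool.

```python
def extra_prompt_tokens(num_tools: int, kb_active: bool) -> tuple[int, int]:
    """(low, high) estimate of tokens added to each query, from the ranges above."""
    kb_low, kb_high = (500, 2000) if kb_active else (0, 0)
    return kb_low + 50 * num_tools, kb_high + 200 * num_tools
```

For example, a bot with a KB loaded and three tools enabled would add roughly 650-2600 tokens per query.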

Implementation Details

Vector Database

Configuration:

Collection: Per bot instance
Embedding Model: text-embedding-ada-002
Dimension: 1536
Distance: Cosine similarity
Index: HNSW with M=16, ef=100
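Cosine similarity, the distance metric listed above, measures how closely two embeddings point in the same direction. The sketch below ranks chunks by exact brute-force search over toy 3-dimensional vectors. A real deployment instead uses 1536-dimensional embeddings and an approximate HNSW index, so treat this only as a reference for what the index approximates. `cosine_similarity` and `top_k` are illustrative names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Exact nearest chunk ids by cosine similarity (what HNSW approximates)."""
    ranked = sorted(index, key=lambda cid: cosine_similarity(query, index[cid]), reverse=True)
    return ranked[:k]
```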

File Processing Pipeline

File Processing Steps (use_kb):

Scan directory for files
Extract text based on file type
Clean and normalize text
Split into chunks (1000 chars, 200 overlap)
Generate embeddings via OpenAI
Store in vector database with metadata
Update session context
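The chunking step (1000 characters with 200 characters of overlap) can be sketched as a sliding window. The sketch assumes plain character windows; the real pipeline may also respect sentence or paragraph boundaries. `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` chars, each overlapping the previous by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With these defaults, each window starts 800 characters after the previous one. A 2500-character document therefore yields three chunks covering characters 0-1000, 800-1800, and 1600-2500.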

Tool Execution Engine

Tool Execution Steps (use_tool):

Parse tool definition (JSON schema)
Register with LLM context
Listen for tool invocation
Validate parameters
Execute tool (HTTP/function call)
Return results to LLM
Log execution for audit
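Steps 4 to 7 (validate, execute, return, log) might look like the following Python sketch. It checks arguments against a JSON-Schema-style `parameters` object. It calls a local handler where the real engine would make an HTTP or function call, and it logs each call for audit. Several details are assumptions for illustration: `validate_args`, `execute_tool`, the result payload shape, and the small subset of supported types.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("tool_engine")

JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_args(schema: dict, args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the arguments are valid."""
    problems = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    for name, value in args.items():
        if name not in props:
            problems.append(f"unknown parameter: {name}")
            continue
        declared = props[name].get("type", "string")
        expected = JSON_TYPES.get(declared, object)
        # bool is a subclass of int in Python, so reject it unless boolean was declared
        if isinstance(value, bool) and expected is not bool:
            problems.append(f"{name} should be {declared}")
        elif not isinstance(value, expected):
            problems.append(f"{name} should be {declared}")
    return problems


def execute_tool(name: str, schema: dict, handler: Callable[..., Any], args: dict[str, Any]) -> dict:
    """Validate, run the handler, log the outcome, and return a payload for the LLM."""
    problems = validate_args(schema, args)
    if problems:
        log.warning("tool %s rejected: %s", name, problems)
        return {"ok": False, "error": "; ".join(problems)}
    try:
        result = handler(**args)
    except Exception as exc:  # reported back to the LLM instead of crashing the session
        log.exception("tool %s failed", name)
        return {"ok": False, "error": f"TOOL_EXECUTION_ERROR: {exc}"}
    log.info("tool %s succeeded with args %s", name, args)  # audit trail
    return {"ok": True, "result": result}
```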

Error Handling

Common Errors

| Error | Cause | Solution |
|---|---|---|
| `KB_NOT_FOUND` | KB folder doesn't exist | Check folder name and path |
| `VECTORDB_ERROR` | Vector database connection issue | Check vector database service |
| `EMBEDDING_FAILED` | Embedding API error | Check API key and limits |
| `TOOL_NOT_FOUND` | Tool not registered | Verify tool name |
| `TOOL_EXECUTION_ERROR` | Tool failed to execute | Check tool endpoint/logic |
| `MEMORY_LIMIT` | Too many KBs loaded | Clear unused KBs |

Check logs for:

KB loading progress
Embedding generation
Vector search queries
Tool invocations
Error details

Examples

Customer Support Bot

' Load product documentation
USE KB "product_docs"
USE KB "faqs"

' Enable support tools
USE TOOL "ticket_system"
USE TOOL "knowledge_search"

' System AI now has access to docs and can work with tickets
TALK "How can I help you with your support needs today?"
' System AI automatically searches KB and uses tools when responding

' Clean up after session
CLEAR KB
CLEAR TOOLS

Research Assistant

' Load research papers
USE KB "papers_2024"
USE KB "citations"

' Enable research tools
USE TOOL "arxiv_search"
USE TOOL "citation_formatter"

' System AI can now search papers and format citations
TALK "What research topic would you like to explore?"

' Switch to a different topic
CLEAR KB
USE KB "papers_biology"

Enterprise Integration

' Load company policies
USE KB "hr_policies"
USE KB "it_procedures"

' Enable enterprise tools
USE TOOL "active_directory"
USE TOOL "jira_integration"
USE TOOL "slack_notifier"

' Bot can now query AD, work with Jira, and send Slack messages
' ... handle employee request ...

' End-of-shift cleanup
CLEAR KB
CLEAR TOOLS

Security Considerations

KB Security

Access Control - KBs require authorization
Encryption - Files encrypted at rest
Audit Logging - All KB access logged
Data Isolation - Per-session KB separation

Tool Security

Authentication - Tools require valid session
Rate Limiting - Prevent tool abuse
Parameter Validation - Input sanitization
Execution Sandboxing - Tools run isolated
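The documentation does not say which rate-limiting algorithm is used. One common option is a token bucket, sketched below in Python. The class name, its parameters, and the idea of keeping one bucket per session and tool are assumptions. The clock is injectable so the behavior can be tested without waiting.

```python
import time


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```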

Best Practices

Principle of Least Privilege - Only load needed resources
Regular Audits - Review KB and tool usage
Secure Storage - Encrypt sensitive KBs
API Key Management - Rotate tool API keys
Session Isolation - Clear resources between users

Configuration

Troubleshooting

KB Issues

Problem: KB not loading

Check that the folder exists in work/{bot_name}/{bot_name}.gbkb/
Verify file permissions
Check vector database connection
Review logs for embedding errors

Problem: Poor search results

Increase chunk overlap
Adjust chunk size
Update embedding model
Clean/preprocess documents better

Tool Issues

Problem: Tool not executing

Verify tool registration
Check parameter validation
Test endpoint directly
Review execution logs

Problem: Tool timeout

Increase timeout setting
Check network connectivity
Optimize tool endpoint
Add retry logic
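The timeout and retry advice can be illustrated with a minimal sketch that retries a tool call using exponential backoff. It assumes the call accepts a timeout in seconds and signals failure with `TimeoutError` or `ConnectionError`. `call_with_retry` and its defaults are hypothetical, not a General Bots setting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[float], T],
    timeout: float = 10.0,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(timeout), retrying on timeouts and connection errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn(timeout)
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of retries: let the caller report the failure
            sleep(base_delay * 2 ** attempt)  # waits 0.5 s, 1 s, 2 s, ...
    raise ValueError("attempts must be at least 1")
```

Retries are only safe for idempotent tools; a tool that sends email or creates tickets may perform its action twice.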

Migration Guide

From File-based to Vector Search

Export existing files
Organize into .gbkb folders
Run embedding pipeline
Test vector search
Update bot logic

From Static to Dynamic Tools

Convert function to tool definition
Create .gbtool file
Implement endpoint/handler
Test with USE TOOL
Remove static registration
