KB and TOOL System Documentation
Overview
The General Bots system provides four keywords for managing Knowledge Bases (KB) and tools dynamically during conversation sessions:
- USE KB - Load and embed files from .gbkb folders into vector database
- CLEAR KB - Remove KB from current session
- USE TOOL - Make a tool available for LLM to call
- CLEAR TOOLS - Remove all tools from current session
Knowledge Base (KB) System
What is a KB?
A Knowledge Base (KB) is a folder of documents (inside the .gbkb folder structure) that are embedded and stored in a vector database. At query time, the vector database retrieves the most relevant chunks and injects them into the prompt, so the LLM can give context-aware responses.
Folder Structure
work/
{bot_name}/
{bot_name}.gbkb/ # Knowledge Base root
circular/ # KB folder 1
document1.pdf
document2.md
document3.txt
comunicado/ # KB folder 2
info.docx
data.csv
docs/ # KB folder 3
README.md
guide.pdf
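As a rough illustration of this layout, the Python sketch below lists the KB folders under a bot's .gbkb root together with the files in each. The helper name `list_kbs` and the choice of Python are assumptions made for illustration only; the actual scanner in General Bots may behave differently (for example, it may descend into nested folders).

```python
from pathlib import Path


def list_kbs(work_dir: str, bot_name: str) -> dict[str, list[str]]:
    """Map each KB folder name to the sorted file names it contains.

    Hypothetical helper: assumes the work/{bot_name}/{bot_name}.gbkb/
    layout shown above and ignores nested subfolders.
    """
    root = Path(work_dir) / bot_name / f"{bot_name}.gbkb"
    if not root.is_dir():
        return {}  # no .gbkb root means no KBs are available
    return {
        kb.name: sorted(f.name for f in kb.iterdir() if f.is_file())
        for kb in sorted(root.iterdir())
        if kb.is_dir()
    }
```

Each key of the returned dictionary corresponds to a name that could be passed to USE KB.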
KB Loading Process
- Scan folder - System scans .gbkb folder for documents
- Process files - Extracts text from PDF, DOCX, TXT, MD, CSV files
- Chunk text - Splits into ~1000 character chunks with overlap
- Generate embeddings - Creates vector representations
- Store in VectorDB - Saves to vector database for similarity search
- Ready for queries - KB available for semantic search
Supported File Types
- PDF - Full text extraction with pdf-extract
- DOCX/DOC - Microsoft Word documents
- TXT - Plain text files
- MD - Markdown documents
- CSV - Structured data (each row as entry)
- HTML - Web pages (text only)
- JSON - Structured data
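To make "each row as entry" concrete for CSV files, here is a minimal Python sketch that turns every data row into one text entry ready for embedding. The function name and the `column: value` formatting are assumptions for illustration; the source does not specify how rows are serialized.

```python
import csv
import io


def csv_rows_to_entries(csv_text: str) -> list[str]:
    """Turn each data row into one 'column: value; ...' text entry.

    Assumption: the first row holds the column names.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    entries = []
    for row in reader:
        entries.append("; ".join(f"{col}: {val}" for col, val in row.items()))
    return entries


sample = "name,price\nWidget,10\nGadget,25\n"
for entry in csv_rows_to_entries(sample):
    print(entry)
```

Keeping the column name next to each value lets a single entry remain meaningful when it is retrieved on its own.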
USE KB Keyword
USE KB "circular"
# Loads the 'circular' KB folder into session
# All documents in that folder are now searchable
USE KB "comunicado"
# Adds another KB to the session
# Both 'circular' and 'comunicado' are now active
CLEAR KB Keyword
CLEAR KB
# Removes all loaded KBs from current session
# Frees up memory and context space
Tool System
What are Tools?
Tools are callable functions that the LLM can invoke to perform specific actions:
- Query databases
- Call APIs
- Process data
- Execute workflows
- Integrate with external systems
Tool Definition
Tools are defined in .bas files that generate MCP and OpenAI-compatible tool definitions:
' weather.bas - becomes a tool automatically
PARAM location AS string
PARAM units AS string DEFAULT "celsius"
DESCRIPTION "Get current weather for a location"
' Tool implementation
weather_data = GET "https://api.weather.com/v1/current?location=" + location
' System AI will format and present the data naturally
SET CONTEXT "weather_data", weather_data
TALK "Here's the current weather for " + location
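One way to picture how a .bas file could become an OpenAI-compatible tool definition is the Python sketch below. It reads only the PARAM and DESCRIPTION lines and builds the widely used `{"type": "function", ...}` structure. The parser, its regular expressions, and the rule that a parameter without DEFAULT is required are all assumptions for illustration, not the General Bots compiler.

```python
import re

# Hypothetical patterns for the two declaration forms shown above.
PARAM_RE = re.compile(
    r'^PARAM\s+(\w+)\s+AS\s+(\w+)(?:\s+DEFAULT\s+"([^"]*)")?\s*$', re.IGNORECASE
)
DESC_RE = re.compile(r'^DESCRIPTION\s+"([^"]*)"\s*$', re.IGNORECASE)


def bas_to_tool(name: str, source: str) -> dict:
    """Build an OpenAI-style function tool definition from a .bas source."""
    properties, required, description = {}, [], ""
    for line in source.splitlines():
        line = line.strip()
        if m := PARAM_RE.match(line):
            param, type_name, default = m.groups()
            # Assumption: BASIC type names map directly to JSON Schema types.
            properties[param] = {"type": type_name.lower()}
            if default is None:
                required.append(param)  # no DEFAULT: the LLM must supply it
            else:
                properties[param]["default"] = default
        elif m := DESC_RE.match(line):
            description = m.group(1)
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }
```

Applied to the weather example, this sketch would mark `location` as required and give `units` a default of "celsius".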
Tool Registration
Tools are registered in two ways:
- Auto-discovery - All .bas files in .gbdialog folder (except start.bas) become tools
- Dynamic Loading - Via USE TOOL keyword for external tools
USE TOOL Keyword
USE TOOL "weather"
# Makes the weather tool available to LLM
USE TOOL "database_query"
# Adds database query tool to session
USE TOOL "email_sender"
# Enables email sending capability
CLEAR TOOLS Keyword
CLEAR TOOLS
# Removes all tools from current session
# LLM can no longer call external functions
Session Management
Context Lifecycle
- Session Start - Clean slate, no KB or tools
- Load Resources - USE KB and USE TOOL as needed
- Active Use - LLM uses loaded resources
- Clear Resources - CLEAR KB/CLEAR TOOLS when done
- Session End - Automatic cleanup
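The lifecycle can be modeled as a small piece of per-session state. The Python sketch below is a hypothetical in-memory model (the class and method names are invented for illustration) that mirrors the four keywords and the automatic cleanup at session end.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical model of one conversation session's loaded resources."""

    kbs: list[str] = field(default_factory=list)    # starts as a clean slate
    tools: list[str] = field(default_factory=list)

    def use_kb(self, name: str) -> None:
        if name not in self.kbs:  # assumption: loading the same KB twice is a no-op
            self.kbs.append(name)

    def clear_kb(self) -> None:
        self.kbs.clear()

    def use_tool(self, name: str) -> None:
        if name not in self.tools:
            self.tools.append(name)

    def clear_tools(self) -> None:
        self.tools.clear()

    def end(self) -> None:
        """Session end: release everything that is still loaded."""
        self.clear_kb()
        self.clear_tools()


session = Session()
session.use_kb("circular")
session.use_kb("comunicado")
session.use_tool("weather")
session.clear_kb()   # tools stay loaded; only KBs are removed
session.end()        # automatic cleanup removes the rest
```

Keeping KBs and tools in separate collections reflects the fact that CLEAR KB and CLEAR TOOLS act independently.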
Best Practices
KB Management
- Load relevant KBs only - Don't overload context
- Clear when switching topics - Keep context focused
- Update KBs regularly - Keep information current
- Monitor token usage - Vector search adds tokens
Tool Management
- Enable minimal tools - Only what's needed
- Validate tool responses - Check for errors
- Log tool usage - For audit and debugging
- Set rate limits - Prevent abuse
Performance Considerations
Memory Usage
- Each KB uses ~100-500MB RAM (depends on size)
- Tools use minimal memory (<1MB each)
- Vector search adds 10-50ms latency
- Clear unused resources to free memory
Token Optimization
- KB chunks add 500-2000 tokens per query
- Tool descriptions use 50-200 tokens each
- Clear resources to reduce token usage
- Use specific KB folders vs entire database
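For a back-of-the-envelope budget, the ranges above can be combined arithmetically. The Python sketch below assumes the overheads simply add up and that the KB figure applies once per query regardless of how many KBs are loaded; both are simplifying assumptions, and the function name is hypothetical.

```python
def context_overhead(kb_query: bool, tool_count: int) -> tuple[int, int]:
    """Return (low, high) extra tokens per request.

    Uses the ranges quoted above: 500-2000 tokens for KB chunks per query
    and 50-200 tokens for each registered tool description.
    """
    kb_low, kb_high = (500, 2000) if kb_query else (0, 0)
    return kb_low + 50 * tool_count, kb_high + 200 * tool_count


low, high = context_overhead(kb_query=True, tool_count=3)
print(f"{low}-{high} extra tokens")  # 650-2600 extra tokens
```

With three tools and KB retrieval active, the estimate is 650 to 2600 extra tokens per request, which is why clearing unused tools and KBs matters on small context windows.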
Implementation Details
Vector Database
Configuration:
- Collection: Per bot instance
- Embedding Model: text-embedding-ada-002
- Dimension: 1536
- Distance: Cosine similarity
- Index: HNSW with M=16, ef=100
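To show what the cosine distance setting means in practice, here is a small pure-Python sketch of cosine similarity with a brute-force top-k ranking. It is an illustration only: a real deployment would use 1536-dimensional embeddings and let the HNSW index find approximate neighbors instead of scanning every vector. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Brute-force ranking of (text, vector) pairs by similarity to the query."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

Because cosine similarity ignores vector length, two chunks that point in the same direction score the same regardless of magnitude.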
File Processing Pipeline
File Processing Steps (use_kb):
- Scan directory for files
- Extract text based on file type
- Clean and normalize text
- Split into chunks (1000 chars, 200 overlap)
- Generate embeddings via OpenAI
- Store in vector database with metadata
- Update session context
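The chunking step (1000 characters with a 200-character overlap) can be sketched as a sliding window. The Python below is a minimal illustration under the assumption that chunks are cut at fixed character offsets; the real pipeline may split on sentence or paragraph boundaries instead. The function name is hypothetical.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks where neighbors share `overlap` chars."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each new chunk starts 800 chars after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing and embedding some text twice.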
Tool Execution Engine
Tool Execution Steps (use_tool):
- Parse tool definition (JSON schema)
- Register with LLM context
- Listen for tool invocation
- Validate parameters
- Execute tool (HTTP/function call)
- Return results to LLM
- Log execution for audit
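The validate, execute, and log steps can be sketched as a small dispatcher. The Python below is a hypothetical illustration: the schema shape follows the JSON Schema style used in tool definitions, while `validate_args`, `execute_tool`, and the audit-log format are invented names and structures, not the engine's actual API.

```python
from typing import Any, Callable

# Minimal mapping from JSON Schema type names to Python types (assumption).
_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments are valid."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required parameter: {name}")
    for name, value in args.items():
        if name not in props:
            errors.append(f"unknown parameter: {name}")
            continue
        expected = _TYPES.get(props[name].get("type"))
        if expected is not None and not isinstance(value, expected):
            errors.append(f"wrong type for {name}")
    return errors


def execute_tool(
    schema: dict, handler: Callable[..., Any], args: dict, audit_log: list
) -> dict:
    """Validate parameters, run the handler, and record the call for audit."""
    errors = validate_args(schema, args)
    if errors:
        result = {"ok": False, "errors": errors}
    else:
        try:
            result = {"ok": True, "result": handler(**args)}
        except Exception as exc:  # report tool failures instead of crashing
            result = {"ok": False, "errors": [str(exc)]}
    audit_log.append({"args": args, "ok": result["ok"]})
    return result
```

Returning a structured error instead of raising lets the LLM see what went wrong and retry with corrected parameters.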
Error Handling
Common Errors
| Error | Cause | Solution |
|---|---|---|
| KB_NOT_FOUND | KB folder doesn't exist | Check folder name and path |
| VECTORDB_ERROR | Vector database connection issue | Check vector database service |
| EMBEDDING_FAILED | Embedding API error | Check API key and limits |
| TOOL_NOT_FOUND | Tool not registered | Verify tool name |
| TOOL_EXECUTION_ERROR | Tool failed to execute | Check tool endpoint/logic |
| MEMORY_LIMIT | Too many KBs loaded | Clear unused KBs |
Check logs for:
- KB loading progress
- Embedding generation
- Vector search queries
- Tool invocations
- Error details
Examples
Customer Support Bot
# Load product documentation
USE KB "product_docs"
USE KB "faqs"
# Enable support tools
USE TOOL "ticket_system"
USE TOOL "knowledge_search"
# System AI now has access to docs and can work with tickets
TALK "How can I help you with your support needs today?"
# System AI automatically searches KB and uses tools when responding
# Clean up after session
CLEAR KB
CLEAR TOOLS
Research Assistant
# Load research papers
USE KB "papers_2024"
USE KB "citations"
# Enable research tools
USE TOOL "arxiv_search"
USE TOOL "citation_formatter"
# System AI can now search papers and format citations
TALK "What research topic would you like to explore?"
# Switch to different topic
CLEAR KB
USE KB "papers_biology"
Enterprise Integration
# Load company policies
USE KB "hr_policies"
USE KB "it_procedures"
# Enable enterprise tools
USE TOOL "active_directory"
USE TOOL "jira_integration"
USE TOOL "slack_notifier"
# Bot can now query AD, work with Jira, send Slack messages
# ... handle employee request ...
# End of shift cleanup
CLEAR KB
CLEAR TOOLS
Security Considerations
KB Security
- Access Control - KBs require authorization
- Encryption - Files encrypted at rest
- Audit Logging - All KB access logged
- Data Isolation - Per-session KB separation
Tool Security
- Authentication - Tools require valid session
- Rate Limiting - Prevent tool abuse
- Parameter Validation - Input sanitization
- Execution Sandboxing - Tools run isolated
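Rate limiting for tool calls is often implemented as a sliding window. The Python sketch below is one possible approach, not the mechanism General Bots necessarily uses; the class name, the limits, and the injectable clock are assumptions for illustration.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` tool invocations per `window_seconds`."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self.calls = deque()    # timestamps of recently allowed calls

    def allow(self) -> bool:
        now = self.clock()
        # Forget calls that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False
```

Keeping one limiter per session and per tool would prevent a single conversation from exhausting a shared external API quota.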
Best Practices
- Principle of Least Privilege - Only load needed resources
- Regular Audits - Review KB and tool usage
- Secure Storage - Encrypt sensitive KBs
- API Key Management - Rotate tool API keys
- Session Isolation - Clear resources between users
Configuration
Troubleshooting
KB Issues
Problem: KB not loading
- Check folder exists in work/{bot_name}/{bot_name}.gbkb/
- Verify file permissions
- Check vector database connection
- Review logs for embedding errors
Problem: Poor search results
- Increase chunk overlap
- Adjust chunk size
- Update embedding model
- Clean/preprocess documents better
Tool Issues
Problem: Tool not executing
- Verify tool registration
- Check parameter validation
- Test endpoint directly
- Review execution logs
Problem: Tool timeout
- Increase timeout setting
- Check network connectivity
- Optimize tool endpoint
- Add retry logic
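Retry logic for a slow tool endpoint can be sketched with exponential backoff. The Python below is a generic illustration, assuming the tool call raises `TimeoutError` when it times out; the function name, attempt count, and delays are hypothetical defaults rather than General Bots settings.

```python
import time


def call_with_retry(fn, attempts: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """Call fn(), retrying on TimeoutError with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the timeout to the caller
            sleep(base_delay * 2 ** attempt)  # waits 0.5s, then 1.0s, ...
```

Only timeouts are retried here; other exceptions propagate immediately, since retrying a call that failed validation or authorization would not help.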
Migration Guide
From File-based to Vector Search
- Export existing files
- Organize into .gbkb folders
- Run embedding pipeline
- Test vector search
- Update bot logic
From Static to Dynamic Tools
- Convert function to tool definition
- Create .gbtool file
- Implement endpoint/handler
- Test with USE TOOL
- Remove static registration
See Also
Documentation
- Vector Collections - How vector search works
- Document Indexing - Automatic document processing
- Semantic Search - Meaning-based retrieval
- Context Compaction - Managing conversation context
- Caching - Performance optimization
- Chapter 6: BASIC Reference - Dialog scripting
- Chapter 9: API and Tools - Tool integration
Further Reading - Blog Posts
- BASIC LLM Tools - Extending LLMs with tools
- MCP is the new API - Modern tool integration
- Beyond Chatbots - Using knowledge bases effectively
Next Chapter
Continue to Chapter 4: User Interface to learn about creating bot interfaces.