Vector Collections
A vector collection is automatically generated from each folder in .gbkb. Each folder becomes a searchable collection that the LLM can use during conversations.
How Collections Work
Each folder inside a .gbkb package is processed automatically:
- The folder is scanned for documents (PDF, DOCX, TXT, HTML, MD)
- Text is extracted from every file
- The text is split into chunks for processing
- Each chunk is converted to a vector embedding using the BGE model (replaceable)
- The embeddings are made available for semantic search
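The chunking step can be pictured with a minimal Python sketch. The function name `chunk_text`, the character-based windows, and the size and overlap values are illustrative assumptions; the documentation does not state how General Bots actually splits text.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    chunk_size and overlap are hypothetical values chosen for illustration.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlapping windows are a common choice because a sentence cut at a chunk boundary still appears whole in one of the neighbouring chunks.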
Folder Structure
botname.gbkb/
├── policies/ # Becomes "policies" collection
├── procedures/ # Becomes "procedures" collection
└── faqs/ # Becomes "faqs" collection
Using Collections
Simply activate a collection with USE KB:
USE KB "policies"
' The LLM now has access to all documents in the policies folder
' No need to explicitly search - happens automatically during responses
Multiple Collections
Load multiple collections for comprehensive knowledge:
USE KB "policies"
USE KB "procedures"
USE KB "faqs"
' All three collections are now active
' LLM searches across all when generating responses
Automatic Document Indexing
Documents are indexed automatically when:
- Files are added to .gbkb folders
- USE KB is called for the first time
- The system detects new or modified files
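One common way to detect new or modified files is to compare content hashes against the last indexed state. The Python sketch below shows that idea only; the function names and the hashing approach are assumptions, not a description of how General Bots tracks changes.

```python
import hashlib
from pathlib import Path

# Extensions taken from the supported document list above
SUPPORTED = {".pdf", ".docx", ".txt", ".html", ".md"}

def file_digest(path: Path) -> str:
    """Return a SHA-256 digest of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def files_to_reindex(folder: Path, seen: dict[str, str]) -> list[Path]:
    """Return new or modified files and record their digests in `seen`."""
    changed = []
    for path in sorted(folder.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED:
            continue
        digest = file_digest(path)
        # A missing or different digest means the file is new or modified
        if seen.get(str(path)) != digest:
            changed.append(path)
            seen[str(path)] = digest
    return changed
```

Calling `files_to_reindex` twice in a row returns an empty list the second time, which mirrors the idea that unchanged documents are not indexed again.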
Website Indexing
To keep web content updated, schedule regular crawls:
' In update-content.bas
SET SCHEDULE "0 3 * * *" ' Run daily at 3 AM
ADD WEBSITE "https://example.com/docs"
' Website content is crawled and added to the collection
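The schedule string reads as a standard five-field cron expression: minute 0, hour 3, any day of month, any month, any weekday. The Python sketch below checks a timestamp against such an expression. It assumes SET SCHEDULE follows standard cron field order, handles only plain numbers and `*`, and `cron_matches` is a hypothetical helper, not part of General Bots.

```python
from datetime import datetime

def cron_matches(expr: str, when: datetime) -> bool:
    """Check a five-field cron expression that uses only numbers and '*'."""
    parts = expr.split()
    if len(parts) != 5:
        raise ValueError("expected five fields: minute hour day month weekday")
    # cron counts Sunday as 0; isoweekday() counts Sunday as 7, hence the modulo
    actual = (when.minute, when.hour, when.day, when.month, when.isoweekday() % 7)
    return all(field == "*" or int(field) == value
               for field, value in zip(parts, actual))
```

For example, `cron_matches("0 3 * * *", datetime(2024, 1, 1, 3, 0))` is true, while the same call at 3:01 is false.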
How Search Works
When USE KB is active:
- User asks a question
- System automatically searches relevant collections
- Finds semantically similar content
- Injects relevant chunks into LLM context
- LLM generates response using the knowledge
Important: Search happens automatically - you don't need to call any search function. Just activate the KB with USE KB and ask questions naturally.
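The search flow can be sketched in Python with toy data. The three-dimensional vectors, the `search` and `build_prompt` helpers, and the prompt layout are all made up for illustration; real embeddings come from the configured model, and the actual ranking and prompt format are not described in this document.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, store, active, top_k=2):
    """Rank chunks from the active collections by similarity to the query."""
    scored = [
        (cosine(query_vec, vec), name, text)
        for name in active
        for text, vec in store.get(name, [])
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]

def build_prompt(question: str, hits) -> str:
    """Inject the retrieved chunks ahead of the user's question."""
    context = "\n".join(f"[{name}] {text}" for _, name, text in hits)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Toy store: collection name -> list of (chunk text, made-up embedding)
store = {
    "policies": [
        ("Refunds are issued within 14 days.", [0.9, 0.1, 0.0]),
        ("Office hours are 9 to 5.", [0.0, 0.2, 0.9]),
    ],
    "faqs": [("How do I request a refund?", [0.8, 0.3, 0.1])],
}
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedded user question
hits = search(query_vec, store, active=["policies", "faqs"])
prompt = build_prompt("Can I get a refund?", hits)
```

Only collections listed in `active` are searched, which corresponds to activating them with USE KB.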
Embeddings Configuration
The system uses BGE embeddings by default:
embedding-url,http://localhost:8082
embedding-model,../../../../data/llm/bge-small-en-v1.5-f32.gguf
You can replace BGE with any compatible embedding model by changing the model path in config.csv.
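To show how these settings read as name,value pairs, here is a small Python sketch that parses the two lines shown above. It assumes config.csv is a plain two-column CSV; whether the real file has a header row or additional columns is not stated here, and `load_config` is a hypothetical helper.

```python
import csv
import io

def load_config(text: str) -> dict[str, str]:
    """Parse name,value rows into a dict; rows with fewer than two cells are skipped."""
    config = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 2:
            config[row[0].strip()] = row[1].strip()
    return config

# The two settings exactly as they appear in this section
SAMPLE = """embedding-url,http://localhost:8082
embedding-model,../../../../data/llm/bge-small-en-v1.5-f32.gguf
"""

settings = load_config(SAMPLE)
```

Swapping the embedding model then amounts to changing the value stored under `embedding-model`.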
Collection Management
- USE KB "name" - Activates a collection for the session
- CLEAR KB - Removes all active collections
- CLEAR KB "name" - Removes a specific collection
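The session semantics of these keywords can be modelled with a toy Python class. This is only a mental model: the `Session` class is hypothetical, and treating a repeated USE KB on the same collection as a no-op is an assumption.

```python
from typing import Optional

class Session:
    """Toy model of which collections are active in one conversation."""

    def __init__(self) -> None:
        self.active: list[str] = []

    def use_kb(self, name: str) -> None:
        # Mirrors USE KB "name"; assumed to be idempotent
        if name not in self.active:
            self.active.append(name)

    def clear_kb(self, name: Optional[str] = None) -> None:
        # Mirrors CLEAR KB (no argument) and CLEAR KB "name"
        if name is None:
            self.active.clear()
        elif name in self.active:
            self.active.remove(name)
```

After `use_kb("policies")` and `use_kb("faqs")`, calling `clear_kb("policies")` leaves only `faqs` active, and `clear_kb()` empties the list.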
Best Practices
- Organize by topic - One folder per subject area
- Name clearly - Use descriptive folder names
- Update regularly - Schedule website crawls if using web content
- Keep files current - System auto-indexes changes
- Don't overload - Use only necessary collections per session
Example: Customer Support Bot
support.gbkb/
├── products/ # Product documentation
├── policies/ # Company policies
├── troubleshooting/ # Common issues and solutions
└── contact/ # Contact information
In your dialog:
' Activate the product and troubleshooting collections
USE KB "products"
USE KB "troubleshooting"
' Bot can now answer product questions and solve issues
Performance Notes
- Collections are cached for fast access
- Only active collections consume memory
- Embeddings are generated once and reused
- Changes trigger automatic re-indexing
No manual configuration is needed: organize your documents in folders and activate them with USE KB.