Storage and Data
BotServer uses multiple storage layers to handle different types of data, from structured user information to unstructured documents and vector embeddings.
Overview
Storage in BotServer is organized into:
- PostgreSQL - Structured data and metadata
- Drive - S3-compatible object storage for files and documents
- Cache (Valkey) - Session cache and temporary data
- Qdrant - Vector embeddings for semantic search
- Local filesystem - Working directories and cache
Storage Architecture
Data Flow
User Upload → Drive Storage → Processing → Database Metadata
                                  ↓               ↓
                          Vector Database   PostgreSQL Tables
                                  ↓               ↓
                          Semantic Search   Structured Queries
PostgreSQL Database
Primary Data Store
PostgreSQL stores all structured data:
- User accounts and sessions
- Bot configurations
- Message history
- System automations
- Knowledge base metadata
Schema Management
- Migrations in migrations/ directory
- Diesel ORM for type-safe queries
- Automatic migration on bootstrap
- Version tracking in database
Connection Pooling
DATABASE_URL=postgres://gbuser:password@localhost:5432/botserver
DB_POOL_SIZE=10
Connection pool managed by Diesel:
- Default 10 connections
- Automatic retry on failure
- Connection recycling
- Timeout protection
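As an illustration of how the two settings above fit together, here is a minimal Python sketch that reads them and applies the documented default of 10 connections. The `DbSettings` type and `load_db_settings` function are hypothetical names used only for this example; BotServer itself manages the pool through Diesel.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class DbSettings:
    host: str
    port: int
    database: str
    user: str
    pool_size: int


def load_db_settings(env=None) -> DbSettings:
    """Read DATABASE_URL and DB_POOL_SIZE; the pool size falls back to 10."""
    env = os.environ if env is None else env
    url = urlparse(env["DATABASE_URL"])
    if url.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unsupported scheme: {url.scheme!r}")
    pool_size = int(env.get("DB_POOL_SIZE", "10"))
    if pool_size < 1:
        raise ValueError("DB_POOL_SIZE must be at least 1")
    return DbSettings(
        host=url.hostname or "localhost",
        port=url.port or 5432,  # 5432 is the standard PostgreSQL port
        database=url.path.lstrip("/"),
        user=url.username or "",
        pool_size=pool_size,
    )
```

Passing a plain dictionary instead of the process environment keeps the function easy to check in isolation.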
Drive (S3-Compatible) Object Storage
File Organization
Drive stores unstructured data:
drive/
├── bot-name.gbai/ # Bot-specific bucket
│ ├── bot-name.gbdialog/ # BASIC scripts
│ ├── bot-name.gbkb/ # Knowledge base documents
│ └── bot-name.gbot/ # Configuration files
└── botserver-media/ # Shared media files
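The layout above can be turned into bucket names and object keys mechanically. The following Python sketch assumes that the `.gbai` folder is the bucket and that everything beneath it forms the object key; the function names and the short package labels (`dialog`, `kb`, `config`) are hypothetical, chosen for this example only.

```python
# Package label -> folder suffix, taken from the tree above.
PACKAGE_SUFFIXES = {
    "dialog": "gbdialog",  # BASIC scripts
    "kb": "gbkb",          # knowledge base documents
    "config": "gbot",      # configuration files
}


def bot_bucket(bot_name: str) -> str:
    """Name of the bot-specific bucket."""
    return f"{bot_name}.gbai"


def object_key(bot_name: str, package: str, relative_path: str) -> str:
    """Key of an object inside the bot's bucket."""
    if package not in PACKAGE_SUFFIXES:
        raise ValueError(f"unknown package: {package!r}")
    folder = f"{bot_name}.{PACKAGE_SUFFIXES[package]}"
    return f"{folder}/{relative_path.lstrip('/')}"
```

For a bot named `support`, a knowledge base file `manuals/guide.pdf` would live in bucket `support.gbai` under the key `support.gbkb/manuals/guide.pdf`.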
Storage Operations
- Upload: Files uploaded via PUT operations
- Retrieval: GET operations with bucket/key
- Listing: Browse bucket contents
- Deletion: Remove objects (rarely used)
Configuration
DRIVE_SERVER=http://localhost:9000
DRIVE_ACCESSKEY=minioadmin
DRIVE_SECRET=minioadmin
Cache (Valkey)
Cached Data
The cache holds short-lived data:
- Session tokens
- Temporary conversation state
- API response cache
- Rate limiting counters
- Lock mechanisms
Cache Patterns
# Session cache
session:{session_id} → session_data (TTL: 24 hours)
# Rate limiting
rate:{user_id}:{endpoint} → request_count (TTL: 1 hour)
# Temporary data
temp:{key} → data (TTL: varies)
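To make the key patterns and TTLs above concrete, here is a small in-memory Python sketch. It is not a Valkey client: `TtlStore` is a hypothetical stand-in that mimics key expiry, and the counter logic assumes a fixed one-hour window that starts when the key is first created.

```python
import time

SESSION_TTL = 24 * 60 * 60  # 24 hours, as in the session pattern
RATE_TTL = 60 * 60          # 1 hour, as in the rate limiting pattern


def session_key(session_id: str) -> str:
    return f"session:{session_id}"


def rate_key(user_id: str, endpoint: str) -> str:
    return f"rate:{user_id}:{endpoint}"


class TtlStore:
    """In-memory key/value store whose entries expire after a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return default
        return value

    def incr(self, key, ttl):
        """Increment a counter; the TTL is set only when the key is created."""
        now = self._clock()
        entry = self._data.get(key)
        if entry is None or now >= entry[1]:
            self._data[key] = (1, now + ttl)
            return 1
        count = entry[0] + 1
        self._data[key] = (count, entry[1])  # keep the original expiry
        return count
```

A request handler could call `store.incr(rate_key(user_id, endpoint), RATE_TTL)` and reject the request once the returned count exceeds its limit.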
Configuration
CACHE_URL=redis://localhost:6379
CACHE_POOL_SIZE=5
CACHE_TTL_SECONDS=86400
Qdrant Vector Database
Vector Storage
Qdrant stores embedding vectors:
- Document embeddings
- Search indices
- Semantic relationships
- Similarity scores
Collection Structure
Collections:
├── {bot_id}_documents # Document embeddings
├── {bot_id}_conversations # Conversation embeddings
└── {bot_id}_cache # Cached query results
Vector Operations
- Insert: Add new embeddings
- Search: Find similar vectors
- Update: Modify metadata
- Delete: Remove outdated vectors
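The collection naming scheme and the search operation can be illustrated without a running Qdrant instance. The Python sketch below builds collection names from the structure above and ranks vectors by cosine similarity with a brute-force scan. This is only a conceptual model: Qdrant performs the search server-side with its own index, and the function names here are hypothetical.

```python
import math

COLLECTION_KINDS = ("documents", "conversations", "cache")


def collection_name(bot_id: str, kind: str) -> str:
    """Build a per-bot collection name such as 'bot42_documents'."""
    if kind not in COLLECTION_KINDS:
        raise ValueError(f"unknown collection kind: {kind!r}")
    return f"{bot_id}_{kind}"


def cosine_similarity(a, b) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(points, query, limit=3):
    """Rank points (id -> vector) by similarity to query, best first."""
    scored = [(pid, cosine_similarity(vec, query)) for pid, vec in points.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```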
Local Storage
Working Directories
botserver/
├── work/ # Temporary processing
│ └── bot.gbai/ # Bot working files
├── logs/ # Application logs
├── cache/ # Local cache files
└── uploads/ # Temporary uploads
File Management
- Automatic cleanup of old files
- Size limits on uploads
- Temp file rotation
- Log file management
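Automatic cleanup of this kind usually comes down to deleting files whose modification time is older than a cutoff. The following Python sketch shows one way to do that; `remove_old_files` is a hypothetical helper and does not describe BotServer's actual cleanup job.

```python
import os
import time


def remove_old_files(directory, max_age_seconds, now=None):
    """Delete files under directory older than max_age_seconds.

    Returns the sorted list of removed paths. Directories are left in place.
    """
    now = time.time() if now is None else now
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if now - os.path.getmtime(path) > max_age_seconds:
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

For example, `remove_old_files("uploads", 24 * 60 * 60)` would clear temporary uploads older than one day.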
Data Persistence
Backup Strategy
- Database Backups
  - Daily PostgreSQL dumps
  - Point-in-time recovery
  - Automated backup scripts
- Object Storage
  - Drive replication
  - Versioning enabled
  - Cross-region backup
- Configuration
  - Version controlled
  - Environment-specific
  - Encrypted secrets
Data Retention
- Message history: 90 days default
- Session data: 30 days
- Temporary files: 24 hours
- Logs: 7 days rolling
- Backups: 30 days
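The retention periods above can be expressed as a lookup table and a single expiry check. This Python sketch is illustrative only; the dictionary keys and the `is_expired` function are hypothetical names, and the durations are copied from the list above.

```python
from datetime import datetime, timedelta, timezone

RETENTION = {
    "message_history": timedelta(days=90),
    "session_data": timedelta(days=30),
    "temporary_files": timedelta(hours=24),
    "logs": timedelta(days=7),
    "backups": timedelta(days=30),
}


def is_expired(kind: str, created_at: datetime, now: datetime = None) -> bool:
    """Return True when an item of the given kind has outlived its retention."""
    now = datetime.now(timezone.utc) if now is None else now
    return now - created_at > RETENTION[kind]
```

A cleanup job could iterate over stored items and delete or archive those for which `is_expired` returns `True`.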
BASIC Script Storage Operations
Saving Data
# Save to CSV file
SAVE "data/results.csv", column1, column2, column3
# Save with timestamp
let filename = "backup_" + FORMAT(NOW(), "YYYYMMDD") + ".txt"
SAVE filename, data
Reading Data
# Read from storage
let content = GET "documents/report.pdf"
# Read configuration
let config = GET "settings/config.json"
Storage Optimization
Performance Tips
- Use appropriate storage
  - PostgreSQL for structured data
  - Drive for files
  - Cache (Valkey) for sessions
  - Qdrant for vectors
- Implement caching
  - Cache frequent queries
  - Use cache for sessions
  - Local cache for static files
- Batch operations
  - Bulk inserts
  - Batch file uploads
  - Grouped queries
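Batching generally means splitting a long sequence of items into fixed-size groups and issuing one insert or upload per group. A minimal Python sketch of that splitting step follows; `in_batches` is a hypothetical helper, and the batch size is something to tune per workload.

```python
from itertools import islice


def in_batches(items, size):
    """Yield successive lists of at most size items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Seven rows with a batch size of three would be sent as groups of three, three, and one.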
Resource Management
- Monitor disk usage
- Set storage quotas
- Implement cleanup jobs
- Compress old data
Security
Data Encryption
- At Rest: Database encryption
- In Transit: TLS/SSL connections
- Sensitive Data: AES-GCM encryption
- Passwords: Never stored by BotServer (authentication is handled by Zitadel)
Access Control
- Role-based permissions
- Bot isolation
- User data segregation
- Audit logging
Monitoring
Storage Metrics
Monitor these metrics:
- Database size and growth
- Drive bucket usage
- Cache memory usage
- Qdrant index size
- Disk space available
Health Checks
- Database connectivity
- Drive availability
- Cache response time
- Qdrant query performance
- Disk space warnings
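As one example of such a check, the disk space warning can be sketched in a few lines of Python. The 10% threshold and both function names are assumptions made for this illustration, not BotServer defaults.

```python
import shutil


def classify_free_space(total_bytes, free_bytes, warn_ratio=0.10):
    """Return 'warning' when the free fraction drops below warn_ratio."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return "warning" if free_bytes / total_bytes < warn_ratio else "ok"


def check_disk(path="."):
    """Classify the free space of the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    return classify_free_space(usage.total, usage.free)
```

Keeping the threshold logic separate from the `shutil.disk_usage` call makes it easy to reuse for bucket quotas or other capacity figures.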
Troubleshooting
Common Issues
- Out of Space
  - Clean temporary files
  - Archive old data
  - Increase storage allocation
- Slow Queries
  - Add database indexes
  - Optimize query patterns
  - Increase cache size
- Connection Failures
  - Check service status
  - Verify credentials
  - Review network configuration
Best Practices
- Regular Maintenance
  - Vacuum PostgreSQL
  - Clean drive buckets
  - Flush cache
  - Reindex Qdrant
- Monitor Growth
  - Track storage trends
  - Plan capacity
  - Set up alerts
- Data Hygiene
  - Remove orphaned data
  - Archive old records
  - Validate integrity
Configuration Reference
Storage Limits
# Database
MAX_CONNECTIONS=100
STATEMENT_TIMEOUT=30s
# Drive
MAX_OBJECT_SIZE=5GB
BUCKET_QUOTA=100GB
# Cache
MAX_MEMORY=2GB
EVICTION_POLICY=allkeys-lru
# Filesystem
UPLOAD_SIZE_LIMIT=100MB
TEMP_DIR_SIZE=10GB
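Size limits written as `100MB` or `5GB` need to be converted to bytes before they can be compared with an actual file size. The Python sketch below shows one way to do that. It assumes binary multiples (1 MB = 1024 × 1024 bytes), which the reference above does not specify, and `parse_size` and `within_limit` are hypothetical helpers.

```python
import re

# Assumption: units are binary multiples; the source does not say which.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
_SIZE_RE = re.compile(r"\s*(\d+)\s*([KMGT]?B)\s*", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Convert a value such as '100MB' into a number of bytes."""
    match = _SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


def within_limit(size_bytes: int, limit: str) -> bool:
    """Check a byte count against a limit such as UPLOAD_SIZE_LIMIT."""
    return size_bytes <= parse_size(limit)
```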
Summary
BotServer's multi-layered storage architecture provides flexibility, performance, and reliability. By using the right storage solution for each data type and implementing proper caching and optimization strategies, the system can handle large-scale deployments while maintaining responsiveness.