Production Environment Guide
⚠️ CRITICAL PRODUCTION RULES
READ THIS FIRST:
🚫 NEVER Start Services Directly
In production, NEVER start botserver or botui directly. Always use systemctl:
# ❌ NEVER DO THIS IN PRODUCTION:
/opt/gbo/bin/botserver # Wrong
./botserver # Wrong
/opt/gbo/bin/botserver & # Wrong
# ✅ ALWAYS USE THIS:
sudo incus exec system -- systemctl start botserver
sudo incus exec system -- systemctl restart botserver
sudo incus exec system -- systemctl stop botserver
sudo incus exec system -- systemctl status botserver
Why:
- systemctl loads /opt/gbo/bin/.env (Vault credentials, paths, etc.)
- Direct execution skips these environment variables, so services fail
- systemctl manages auto-restart, logging, and dependencies
🔐 Security Rules
- NEVER push secrets to git (API keys, passwords, tokens)
- NEVER commit init.json (Vault unseal keys)
- ALWAYS use Vault for secrets (see Vault Security Architecture)
- ONLY VAULT_* variables may hold secrets in .env
🚢 Deployment Rules
- NEVER deploy manually (scp, ssh copy); use CI/CD only
- NEVER push to ALM without asking first
- ALWAYS push ALL submodules (botserver, botui, botlib) when pushing the main repo
- ALWAYS use systemctl to restart services after deployment
Infrastructure
Servers
| Host | IP | Purpose |
|---|---|---|
| system | <main-server-ip> | Main botserver + botui container |
| alm-ci | <ci-runner-ip> | CI/CD runner (Forgejo Actions) |
| drive | <storage-server-ip> | Object storage |
| monitor | <monitor-server-ip> | Monitoring service |
Port Mapping (system container)
| Service | Internal Port | External URL |
|---|---|---|
| botserver | 5858 | https://system.example.com |
| botui | 5859 | https://chat.example.com |
Access
# SSH to host
ssh admin@<host-ip>
# Execute inside system container
sudo incus exec system -- bash -c 'command'
# SSH from host to container (used by CI)
ssh -o StrictHostKeyChecking=no system "command"
Services
botserver.service
- Binary: /opt/gbo/bin/botserver
- Port: 5858
- User: gbuser
- Logs: /opt/gbo/logs/out.log, /opt/gbo/logs/err.log
- Config: /etc/systemd/system/botserver.service
- Env: PORT=5858
ui.service
- Binary: /opt/gbo/bin/botui
- Port: 5859
- Config: /etc/systemd/system/ui.service
- Env: BOTSERVER_URL=http://localhost:5858
  - ⚠️ MUST be http://localhost:5858, NOT https://system.example.com
  - The Rust proxy runs server-side and needs direct localhost access
  - The JS client uses relative URLs through chat.example.com
Data Directory
- Path: /opt/gbo/data/
- Structure: <botname>.gbai/<botname>.gbdialog/*.bas
- Work dir: /opt/gbo/work/ (compiled .ast cache)
Stack Services (managed by botserver bootstrap)
- Vault: Secrets management
- PostgreSQL: Database (port 5432)
- Valkey: Cache (port 6379, password auth)
- MinIO: Object storage
- Zitadel: Identity provider
- LLM: llama.cpp
CI/CD Pipeline
Repositories
| Repo | ALM URL | GitHub URL |
|---|---|---|
| gb | https://alm.example.com/organization/gb.git | git@github.com:organization/gb.git |
| botserver | https://alm.example.com/organization/BotServer.git | git@github.com:organization/botserver.git |
| botui | https://alm.example.com/organization/BotUI.git | git@github.com:organization/botui.git |
| botlib | https://alm.example.com/organization/botlib.git | git@github.com:organization/botlib.git |
Push Order
# 1. Push submodules first
cd botserver && git push alm main && git push origin main && cd ..
cd botui && git push alm main && git push origin main && cd ..
cd botlib && git push alm main && git push origin main && cd ..
# 2. Update root workspace references
git add botserver botui botlib
git commit -m "Update submodules: <description>"
git push alm main && git push origin main
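The push order can also be scripted so that none of the three submodules named in the Deployment Rules is skipped. This is a minimal sketch, assuming the remotes are named alm and origin and the branch is main, as in the commands above. push_all, run and the DRY_RUN switch are hypothetical helpers, not part of the repository. Run the script from the root workspace.

```shell
#!/bin/sh
# Hypothetical helper: push every submodule, then the root workspace,
# to both remotes in the order described above.
# Set DRY_RUN=1 to print the commands instead of running them.
SUBMODULES="botserver botui botlib"
REMOTES="alm origin"

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

push_all() {
    msg="$1"
    # 1. Push each submodule to every remote first
    for m in $SUBMODULES; do
        for r in $REMOTES; do
            run git -C "$m" push "$r" main || return 1
        done
    done
    # 2. Record the new submodule commits in the root workspace, then push it
    run git add $SUBMODULES || return 1
    run git commit -m "Update submodules: $msg" || return 1
    for r in $REMOTES; do
        run git push "$r" main || return 1
    done
}
```

Running DRY_RUN=1 push_all "fix config sync" only prints the ten commands. This is a cheap way to review the order before anything is pushed.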
Build Environment
- CI runner: ci-runner container (Debian Trixie, glibc 2.41)
- Target: system container (Debian 12 Bookworm, glibc 2.36)
- ⚠️ GLIBC MISMATCH: binaries built on the CI runner are incompatible with the system container
- Solution: the CI workflow transfers the source to the system container and builds there via SSH
Workflow File
- Location: botserver/.forgejo/workflows/botserver.yaml
- Triggers: push to the main branch
- Steps:
  1. Set up the workspace on the CI runner (clone repos)
  2. Transfer the source to the system container via tar | ssh
  3. Build inside the system container (matches glibc 2.36)
  4. Deploy the binary inside the container
  5. Verify botserver is running
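Step 2 avoids per-file copies by streaming one tar archive over SSH. Below is a minimal sketch of that transfer, assuming a runner command such as ssh -o StrictHostKeyChecking=no system (the pattern from the Access section). transfer_src is a hypothetical helper, and the destination path is your choice. Excluding ./target is a design choice of this sketch; botserver.yaml is not known to do it.

```shell
#!/bin/sh
# Hypothetical sketch of the "transfer source via tar | ssh" step.
# $1 = local source dir, $2 = destination dir on the target,
# remaining args = command that runs a shell string on the target
# (e.g. ssh -o StrictHostKeyChecking=no system).
transfer_src() {
    src="$1"; dst="$2"; shift 2
    # Stream the tree without build output; the target unpacks from stdin
    tar -C "$src" --exclude=./target -cf - . \
        | "$@" "mkdir -p '$dst' && tar -C '$dst' -xf -"
}
```

Passing sh -c as the runner unpacks locally, which is a quick way to check what would be sent before involving SSH.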
Common Operations
Check Service Status
# From host
sudo incus exec system -- systemctl status botserver --no-pager
sudo incus exec system -- systemctl status ui --no-pager
# Check if running
sudo incus exec system -- pgrep -f botserver
sudo incus exec system -- pgrep -f botui
View Logs
# Systemd journal
sudo incus exec system -- journalctl -u botserver --no-pager -n 50
sudo incus exec system -- journalctl -u ui --no-pager -n 50
# Application logs
sudo incus exec system -- tail -50 /opt/gbo/logs/out.log
sudo incus exec system -- tail -50 /opt/gbo/logs/err.log
# Live tail
sudo incus exec system -- tail -f /opt/gbo/logs/out.log
Restart Services
CRITICAL PRODUCTION RULE: In production, NEVER start botserver or botui directly. Always use systemctl to ensure proper initialization, environment loading, and logging.
sudo incus exec system -- systemctl restart botserver
sudo incus exec system -- systemctl restart ui
PROHIBITED in production:
# ❌ NEVER DO THIS IN PRODUCTION:
sudo incus exec system -- /opt/gbo/bin/botserver # Wrong - no systemd integration
sudo incus exec system -- /opt/gbo/bin/botserver & # Wrong - no service management
sudo incus exec system -- cd /opt/gbo/bin && ./botserver # Wrong - missing env vars
# ✅ CORRECT - Always use systemctl:
sudo incus exec system -- systemctl start botserver
sudo incus exec system -- systemctl restart botserver
sudo incus exec system -- systemctl stop botserver
sudo incus exec system -- systemctl status botserver
Why:
- systemctl loads /opt/gbo/bin/.env (via EnvironmentFile in the service definition)
- systemctl manages the process lifecycle, auto-restart, and dependencies
- systemctl sends logs to /opt/gbo/logs/out.log and /opt/gbo/logs/err.log
- Direct execution skips the environment variables and the systemd service configuration
Manual Deploy (emergency)
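# Stop the service with systemctl (not killall), so an auto-restart policy cannot relaunch the old binary mid-copy
sudo incus exec system -- systemctl stop botserver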
# Copy binary (from host CI workspace or local)
sudo incus exec system -- cp /opt/gbo/ci/botserver/target/debug/botserver /opt/gbo/bin/botserver
sudo incus exec system -- chmod +x /opt/gbo/bin/botserver
sudo incus exec system -- chown gbuser:gbuser /opt/gbo/bin/botserver
# Start service
sudo incus exec system -- systemctl start botserver
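Overwriting a binary that a process still has mapped can fail with "Text file busy". A common pattern, sketched below, is to copy the new binary to a temporary name in the same directory and rename it into place. This sketch assumes the copy, chmod and chown steps above are what you want to combine. install_binary is a hypothetical helper, and the chown step only runs when an owner is given.

```shell
#!/bin/sh
# Hypothetical helper: install a new binary atomically.
# $1 = new binary, $2 = destination path, $3 = owner (optional, e.g. gbuser:gbuser)
install_binary() {
    new="$1"; dest="$2"; owner="${3:-}"
    staged="$dest.new.$$"          # same directory, so the rename stays on one filesystem
    cp "$new" "$staged" || return 1
    chmod 755 "$staged" || { rm -f "$staged"; return 1; }
    if [ -n "$owner" ]; then
        chown "$owner" "$staged" || { rm -f "$staged"; return 1; }
    fi
    # rename(2) replaces the old file atomically; a running process keeps its old inode
    mv -f "$staged" "$dest"
}
```

Inside the container this would be install_binary /opt/gbo/ci/botserver/target/debug/botserver /opt/gbo/bin/botserver gbuser:gbuser, followed by the systemctl start shown above.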
Transfer Bot Files to Production
# From local to prod host
tar czf /tmp/bots.tar.gz -C /opt/gbo/data <botname>.gbai
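scp /tmp/bots.tar.gz admin@<host-ip>:/tmp/
# From host to container (the tarball is in the host's /tmp, not the container's)
sudo incus file push /tmp/bots.tar.gz system/tmp/bots.tar.gz
sudo incus exec system -- bash -c 'tar xzf /tmp/bots.tar.gz -C /opt/gbo/data/'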
# Clear compiled cache
sudo incus exec system -- find /opt/gbo/data -name "*.ast" -delete
sudo incus exec system -- find /opt/gbo/work -name "*.ast" -delete
Snapshots
# List snapshots
sudo incus snapshot list system
# Restore snapshot
sudo incus snapshot restore system <snapshot-name>
DriveMonitor & Bot Configuration Sync
DriveMonitor Architecture
DriveMonitor is a background service that synchronizes bot files from MinIO (S3-compatible storage) to the local filesystem and database. It monitors three directories per bot:
| Directory | Purpose | Sync Behavior |
|---|---|---|
| {bot}.gbai/{bot}.gbdialog/ | BASIC scripts (.bas) | Downloads and compiles on change |
| {bot}.gbai/{bot}.gbot/ | Configuration files | Syncs to bot_configuration table |
| {bot}.gbkb/ | Knowledge base documents | Downloads and indexes for vector search |
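The sketch below makes the table concrete by mapping an object key in the bucket to the sync behavior described above. It only illustrates the directory conventions and is not the DriveMonitor code. classify_object is a hypothetical helper that assumes keys of the form {bot}.gbai/... or {bot}.gbkb/....

```shell
#!/bin/sh
# Hypothetical helper: classify a bucket object key by the
# DriveMonitor directory conventions in the table above.
classify_object() {
    case "$1" in
        *.gbdialog/*.bas) echo "compile" ;;   # BASIC script: download + compile
        *.gbot/*)         echo "config"  ;;   # config files: sync to bot_configuration
        *.gbkb/*)         echo "index"   ;;   # knowledge base: download + vector index
        *)                echo "ignore"  ;;
    esac
}
```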
Bot Configuration Database Tables
bot_configuration (main config table)
-- Location: botserver database
SELECT * FROM bot_configuration WHERE bot_id = '<bot_uuid>';
-- Key columns:
-- - bot_id: Bot UUID (link to bots table)
-- - config_key: Configuration key (e.g., "llm-provider", "system-prompt")
-- - config_value: Configuration value
-- - config_type: Type (string, boolean, number)
-- - is_encrypted: Whether value is encrypted
-- - updated_at: Last modification timestamp
gbot_config_sync (sync tracking table)
-- Location: botserver database
-- Tracks config.csv sync status from bucket
SELECT * FROM gbot_config_sync g
JOIN bots b ON g.bot_id = b.id
WHERE b.name = 'salesianos';
-- Key columns:
-- - bot_id: Bot UUID
-- - config_file_path: Path to config.csv in bucket
-- - last_sync_at: Timestamp of last successful sync
-- - file_hash: ETag/MD5 of synced file
-- - sync_count: Number of times synced
config.csv Sync Process
File Locations:
- Source: {bot}.gbai/{bot}.gbot/config.csv in the MinIO bucket
- Sync method: DriveMonitor → ConfigManager → bot_configuration table
- Sync frequency: every 10 seconds (DriveMonitor periodic check)
Sync Trigger Conditions:
- File ETag changes in MinIO
- Initial DriveMonitor startup
- Manual botserver restart
CSV Format:
llm-provider,groq
llm-api-key,sk-xxx
llm-url,http://localhost:8085
system-prompt-file,PROMPT.md
theme-color1,#cc0000
theme-title,MyBot
whatsapp-id,botname
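A small awk helper is enough to read one value from a downloaded config.csv, for example to compare it with the database. This is a minimal sketch, assuming the two-column key,value format shown above. csv_get is hypothetical. It treats everything after the first comma as the value and returns the first occurrence if a key repeats; how ConfigManager resolves duplicates is not documented here.

```shell
#!/bin/sh
# Hypothetical helper: print the value for a key in a key,value config.csv.
# Values may contain commas; CRLF line endings are tolerated.
# Exit status is 1 when the key is absent.
csv_get() {
    key="$1"; file="$2"
    awk -v k="$key" '
        { sub(/\r$/, "") }                 # strip a trailing carriage return
        index($0, k ",") == 1 {            # line starts with "key,"
            print substr($0, length(k) + 2)
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    ' "$file"
}
# Example:
#   sudo incus exec drive -- /opt/gbo/bin/mc cat local/salesianos.gbai/salesianos.gbot/config.csv > /tmp/config.csv
#   csv_get llm-provider /tmp/config.csv
```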
Checking Bot Configuration Status
Method 1: Query bot_configuration table
# Get all config for a bot
sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "
SELECT b.name, bc.config_key, bc.config_value, bc.updated_at
FROM bot_configuration bc
JOIN bots b ON bc.bot_id = b.id
WHERE b.name = 'salesianos'
ORDER BY bc.config_key;
"
# Get specific LLM provider config
sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "
SELECT config_key, config_value, updated_at
FROM bot_configuration
WHERE bot_id = (
SELECT id FROM bots WHERE name = 'salesianos'
)
AND config_key LIKE 'llm-%'
ORDER BY config_key;
"
Method 2: Check DriveMonitor sync status
# Check if config.csv has been synced
sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "
SELECT b.name, gcs.last_sync_at, gcs.sync_count, gcs.config_file_path
FROM gbot_config_sync gcs
JOIN bots b ON gcs.bot_id = b.id
WHERE b.name IN ('salesianos', 'default');
"
# Empty result = DriveMonitor hasn't synced config.csv yet
# If sync_count = 0, config.csv exists but hasn't been processed
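To turn last_sync_at into a yes/no answer, compare its age against a threshold. This is a minimal sketch, assuming the timestamp is fetched as Unix epoch seconds. The default of 600 s mirrors the 10-minute guideline used in Issue 2 below. is_sync_stale is hypothetical, and the EXTRACT(EPOCH ...) query in the comment assumes last_sync_at is a PostgreSQL timestamp column.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the last config sync is stale.
# $1 = last_sync_at as Unix epoch seconds
# $2 = threshold in seconds (default 600)
# $3 = current time in epoch seconds (optional, defaults to now; useful for testing)
is_sync_stale() {
    last="$1"; threshold="${2:-600}"; now="${3:-$(date +%s)}"
    [ $((now - last)) -gt "$threshold" ]
}
# Example:
#   LAST=$(sudo incus exec tables -- psql -h localhost -U postgres -d botserver -tA -c "
#     SELECT EXTRACT(EPOCH FROM last_sync_at)::bigint FROM gbot_config_sync
#     WHERE bot_id = (SELECT id FROM bots WHERE name = 'salesianos');")
#   is_sync_stale "$LAST" && echo "config.csv sync is stale"
```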
Method 3: Direct MinIO inspection
# Check if config.csv exists in bucket
sudo incus exec drive -- /opt/gbo/bin/mc ls local/salesianos.gbai/salesianos.gbot/
# View config.csv contents
sudo incus exec drive -- /opt/gbo/bin/mc cat local/salesianos.gbai/salesianos.gbot/config.csv
# Check file ETag (for sync comparison)
sudo incus exec drive -- /opt/gbo/bin/mc stat local/salesianos.gbai/salesianos.gbot/config.csv
DriveMonitor Debugging Logs
Key log patterns to monitor
# Monitor DriveMonitor activity in real-time
sudo incus exec system -- tail -f /opt/gbo/logs/out.log | grep -E "(DRIVE_MONITOR|check_gbot|config)"
# Check for config.csv sync attempts
sudo incus exec system -- grep "check_gbot" /opt/gbo/logs/out.log | tail -20
# Check for config synchronization
sudo incus exec system -- grep "sync_gbot_config" /opt/gbo/logs/out.log | tail -20
# Check for DriveMonitor errors
sudo incus exec system -- grep -i "drive.*error" /opt/gbo/logs/err.log | tail -20
Expected successful sync logs
check_gbot: Checking bucket salesianos.gbai for config.csv changes
check_gbot: Found config.csv at path: salesianos.gbai/salesianos.gbot/config.csv
info config:Synced config.csv for bot <uuid> - updated 3 keys
Error patterns and meanings
# Config.csv not found in bucket
check_gbot: Config file not found or inaccessible: path/to/config.csv
# Sync to database failed
error config:Failed to sync_gbot_config: <database error>
# DriveMonitor not running
(no check_gbot lines in /opt/gbo/logs/out.log)
# MinIO connection failed
error drive_monitor:S3/MinIO unavailable for bucket <bucket>
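The message patterns above can be turned into labels so that a long log reduces to a handful of counts. This is a minimal sketch that matches only the substrings listed in this section; other DriveMonitor messages fall into "other". classify_drive_log and summarize_drive_log are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: label a botserver log line using the message
# patterns listed above. Unknown lines print "other".
classify_drive_log() {
    case "$1" in
        *"Config file not found or inaccessible"*) echo "config-missing" ;;
        *"Failed to sync_gbot_config"*)            echo "db-sync-failed" ;;
        *"S3/MinIO unavailable"*)                  echo "minio-unavailable" ;;
        *"Synced config.csv"*)                     echo "synced" ;;
        *)                                         echo "other" ;;
    esac
}

# Count each label in a log file (trim large logs with tail first).
summarize_drive_log() {
    while IFS= read -r line; do
        classify_drive_log "$line"
    done < "$1" | sort | uniq -c
}
```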
Common Issues and Fixes
Issue 1: config.csv not syncing to database
Symptoms:
- gbot_config_sync table empty (0 rows)
- LLM provider changes in bucket not reflected in bot behavior
- Database shows old configuration values
Diagnosis:
# 1. Check if config.csv exists in bucket
sudo incus exec drive -- /opt/gbo/bin/mc ls local/salesianos.gbai/salesianos.gbot/
# 2. Check DriveMonitor logs for sync attempts
sudo incus exec system -- grep "check_gbot" /opt/gbo/logs/out.log | tail -10
# 3. Check if DriveMonitor is running for the bot
sudo incus exec system -- ps aux | grep botserver
Root Causes:
- config.csv missing from the {bot}.gbai/{bot}.gbot/ folder
- DriveMonitor not started for the bot
- MinIO connection issues
- Database write permissions
Fixes:
# Case 1: Create missing config.csv
sudo incus exec drive -- bash -c '
cat > /tmp/config.csv << EOF
llm-provider,groq
llm-api-key,your-api-key
llm-url,http://localhost:8085
system-prompt-file,PROMPT.md
theme-color1,#cc0000
theme-title,Salesianos
EOF
/opt/gbo/bin/mc cp /tmp/config.csv local/salesianos.gbai/salesianos.gbot/config.csv
'
# Case 2: Restart botserver to reinitialize DriveMonitor
sudo incus exec system -- systemctl restart botserver
# Case 3: Force immediate sync by touching config.csv
sudo incus exec drive -- /opt/gbo/bin/mc cp local/salesianos.gbai/salesianos.gbot/config.csv local/salesianos.gbai/salesianos.gbot/config.csv
Issue 2: LLM provider changes not taking effect
Symptoms:
- config.csv shows correct provider (e.g., groq)
- Bot still uses old provider
- Database shows old value
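Comparing the bucket and database values by eye is error-prone for two reasons. psql -t (without -A) pads its output with spaces, and a config.csv edited on Windows can carry a trailing carriage return. The minimal sketch below normalizes both values before comparing them, and is meant for the BUCKET_PROVIDER and DB_PROVIDER variables set in the diagnosis. config_drift and trim are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helpers: strip CRs and surrounding whitespace, then compare.
trim() {
    printf '%s' "$1" | tr -d '\r' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Prints MATCH or DRIFT; exit status 1 on drift.
config_drift() {
    b=$(trim "$1"); d=$(trim "$2")
    if [ "$b" = "$d" ]; then
        echo "MATCH ($b)"
    else
        echo "DRIFT bucket='$b' database='$d'"
        return 1
    fi
}
# Example: config_drift "$BUCKET_PROVIDER" "$DB_PROVIDER"
```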
Diagnosis:
# Compare bucket vs database
BUCKET_PROVIDER=$(sudo incus exec drive -- /opt/gbo/bin/mc cat local/salesianos.gbai/salesianos.gbot/config.csv | grep "^llm-provider" | cut -d',' -f2)
DB_PROVIDER=$(sudo incus exec tables -- psql -h localhost -U postgres -d botserver -t -c "
SELECT config_value FROM bot_configuration
WHERE bot_id = (SELECT id FROM bots WHERE name = 'salesianos')
AND config_key = 'llm-provider';
")
echo "Bucket: $BUCKET_PROVIDER"
echo "Database: $DB_PROVIDER"
# Check last sync time
sudo incus exec tables -- psql -h localhost -U postgres -d botserver -t -c "
SELECT last_sync_at FROM gbot_config_sync
WHERE bot_id = (SELECT id FROM bots WHERE name = 'salesianos');
"
Fix:
# If sync is stale (> 10 minutes), restart botserver (this restarts DriveMonitor)
sudo incus exec system -- systemctl restart botserver
# Or manually update config value in database (temporary fix)
sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "
UPDATE bot_configuration
SET config_value = 'groq', updated_at = NOW()
WHERE bot_id = (SELECT id FROM bots WHERE name = 'salesianos')
AND config_key = 'llm-provider';
"
Issue 3: DriveMonitor not checking for changes
Symptoms:
- No new log entries after 30 seconds
- File changes in bucket not detected
- Bot compilation not happening after .bas file updates
Diagnosis:
# Check DriveMonitor loop logs
sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep "DRIVE_MONITOR.*Inside monitoring loop"
# Check if is_processing flag is stuck
sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep -E "(is_processing|monitoring loop)"
Fix:
# Restart botserver to clear stuck state
sudo incus exec system -- systemctl restart botserver
# Monitor startup logs to verify DriveMonitor started
sudo incus exec system -- tail -50 /opt/gbo/logs/out.log | grep "Drive Monitor"
Database Schema Reference
List all bot databases
sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "\l" | grep bot_
List tables in a specific bot database
sudo incus exec tables -- psql -h localhost -U postgres -d bot_salesianos -c "\dt"
List botserver management tables
sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "\dt" | grep -E "(bot|config|sync)"
Connection Methods Summary
| Method | Use Case | Command Pattern |
|---|---|---|
| SSH to host | Initial access, file transfer | ssh admin@<host-ip> |
| incus exec | Execute inside container | sudo incus exec system -- command |
| psql direct | Database queries from container | sudo incus exec tables -- psql ... |
| mc (MinIO CLI) | Inspect buckets, copy files | sudo incus exec drive -- /opt/gbo/bin/mc ... |
| HTTP/curl | Service health checks | curl http://<ip>:5858/health |
| journalctl | Systemd service logs | sudo incus exec system -- journalctl -u botserver |
Vault Security Architecture
Overview
The production environment uses HashiCorp Vault as the centralized secrets management system. All sensitive credentials (database passwords, API keys, tokens) are stored in Vault, NEVER in code or environment files.
Vault Connection Flow
1. botserver starts
↓
2. Reads VAULT_ADDR, VAULT_TOKEN from .env
↓
3. Initializes VaultClient with TLS/mTLS
↓
4. Reads secrets from Vault paths (gbo/tables, gbo/drive, etc.)
↓
5. Falls back to defaults if Vault unavailable
Environment Variables (Allowed)
File Location: /opt/gbo/bin/.env (system container)
# Vault Connection (MANDATORY for production)
VAULT_ADDR=https://<vault-ip>:8200
# Use a service token with the botserver policy here, not the root token
VAULT_TOKEN=<botserver-token>
VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt
# Optional: Skip TLS verification (NOT recommended for production)
VAULT_SKIP_VERIFY=false
# Optional: Use mTLS certificates
VAULT_CLIENT_CERT=/opt/gbo/conf/system/certificates/botserver/client.crt
VAULT_CLIENT_KEY=/opt/gbo/conf/system/certificates/botserver/client.key
# Optional: Cache TTL in seconds (default: 300)
VAULT_CACHE_TTL=300
# Server Configuration
PORT=5858
DATA_DIR=/opt/gbo/data/
WORK_DIR=/opt/gbo/work/
LOAD_ONLY=default,salesianos
Security Rule:
- ONLY VAULT_* variables may carry secrets in .env; non-secret settings such as PORT, DATA_DIR, WORK_DIR and LOAD_ONLY are also allowed
- All other secrets MUST come from Vault
- Hardcoded secrets in code are FORBIDDEN (see AGENTS.md)
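The rule above can be checked mechanically before a deploy. Below is a minimal sketch of a linter that flags any .env variable whose name looks like a credential but is not a VAULT_* setting. lint_env is hypothetical, and its name patterns (KEY, SECRET, PASS, TOKEN) are a heuristic assumption you should adapt to your own keys.

```shell
#!/bin/sh
# Hypothetical linter for /opt/gbo/bin/.env: report credential-like
# variables that are not VAULT_* connection settings.
# Exit status 1 if anything is reported.
lint_env() {
    status=0
    while IFS= read -r line; do
        case "$line" in ''|'#'*) continue ;; esac   # skip blanks and comments
        name=${line%%=*}
        case "$name" in
            VAULT_*) ;;                                 # allowed
            *KEY*|*SECRET*|*PASS*|*TOKEN*)
                echo "forbidden secret in .env: $name"
                status=1 ;;
        esac
    done < "$1"
    return "$status"
}
# Example: sudo incus exec system -- cat /opt/gbo/bin/.env > /tmp/env.check && lint_env /tmp/env.check
```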
Vault Secret Paths Structure
System-Wide Paths (Global)
| Path | Purpose | Example Keys |
|---|---|---|
| gbo/tables | Database (PostgreSQL) | host, port, database, username, password |
| gbo/drive | MinIO (Object Storage) | host, accesskey, secret |
| gbo/cache | Valkey (Redis) | host, port, password |
| gbo/directory | Zitadel (Auth) | url, project_id, client_id, client_secret |
| gbo/email | SMTP Email | smtp_host, smtp_port, smtp_user, smtp_password |
| gbo/llm | LLM Configuration | url, model, openai_key, anthropic_key |
| gbo/vectordb | Qdrant (Vector DB) | url, api_key |
| gbo/jwt | JWT Signing | secret |
| gbo/meet | Jitsi Meet | url, app_id, app_secret |
| gbo/alm | ALM Repository | url, token |
| gbo/encryption | Encryption Keys | master_key |
| gbo/system/observability | Monitoring | url, org, bucket, token |
| gbo/system/security | Security Policies | require_auth, anonymous_paths |
| gbo/system/cloud | Cloud Config | region, access_key, secret_key |
| gbo/system/app | Application Settings | url, environment |
| gbo/system/models | BotModels API | url |
Organization-Specific Paths
| Path Pattern | Purpose |
|---|---|
| gbo/orgs/{org_id}/config | Organization configuration |
| gbo/orgs/{org_id}/bots/{bot_id} | Bot-specific secrets |
| gbo/orgs/{org_id}/users/{user_id} | User-specific secrets |
| gbo/tenants/{tenant_id}/infrastructure | Tenant database/cache/drive |
| gbo/tenants/{tenant_id}/config | Tenant configuration |
Credential Resolution Hierarchy
For bot email configuration (example):
1. Check gbo/orgs/{org_id}/bots/{bot_id}/email
2. Fallback: gbo/bots/default/email
3. Fallback: gbo/email
4. Fallback: Environment variables (development only)
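The fallback order can be reproduced from the shell to find out which path a bot would actually read. This is a minimal sketch of the first three steps; the fourth (environment variables) is reported as "env". resolve_secret_path is hypothetical. The existence check is passed in as a command so that the same logic can run against a real Vault, for example through a wrapper such as kv_exists() { vault kv get "secret/$1"; }, which is also an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical sketch of the credential resolution order above.
# $1 = org_id, $2 = bot_id, $3 = secret name (e.g. email),
# remaining args = command that succeeds when a Vault path exists.
# Prints the first path that resolves, or "env" for the dev-only fallback.
resolve_secret_path() {
    org="$1"; bot="$2"; name="$3"; shift 3
    for p in "gbo/orgs/$org/bots/$bot/$name" "gbo/bots/default/$name" "gbo/$name"; do
        if "$@" "$p" >/dev/null 2>&1; then
            echo "$p"
            return 0
        fi
    done
    echo "env"   # 4. environment variables (development only)
}
# Example: resolve_secret_path <org_id> <bot_id> email kv_exists
```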
Vault Client Initialization (Code Reference)
File: botserver/src/core/secrets/mod.rs
// SecretsManager::from_env() reads:
// - VAULT_ADDR (required)
// - VAULT_TOKEN (required)
// - VAULT_CACERT (optional, has default)
// - VAULT_SKIP_VERIFY (optional, default: false)
// - VAULT_CLIENT_CERT (optional, mTLS)
// - VAULT_CLIENT_KEY (optional, mTLS)
// - VAULT_CACHE_TTL (optional, default: 300s)
impl SecretsManager {
pub fn from_env() -> Result<Self> {
let addr = env::var("VAULT_ADDR").unwrap_or_default();
let token = env::var("VAULT_TOKEN").unwrap_or_default();
if token.is_empty() || addr.is_empty() {
// Vault not configured - use environment variables directly
warn!("Vault not configured. Using environment variables directly.");
return Ok(Self { client: None, enabled: false, ... });
}
// Initialize VaultClient with TLS
let client = VaultClient::new(settings)?;
Ok(Self { client: Some(client), enabled: true, ... })
}
}
Vault Operations - Production Usage
Read Secrets from Vault
# From system container (using vault CLI)
sudo incus exec system -- bash -c '
export VAULT_ADDR=https://10.157.134.250:8200
export VAULT_TOKEN=<vault-token>
export VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt
# Read database secrets
vault kv get -field=password secret/gbo/tables
vault kv get secret/gbo/tables
# Read drive secrets
vault kv get secret/gbo/drive
# Read LLM configuration
vault kv get secret/gbo/llm
'
Read Secrets via HTTP API (from any container)
sudo incus exec system -- curl -sf \
--cacert /opt/gbo/conf/system/certificates/ca/ca.crt \
-H "X-Vault-Token: <vault-token>" \
https://10.157.134.250:8200/v1/secret/data/gbo/drive | jq
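The CLI path secret/gbo/drive and the HTTP path /v1/secret/data/gbo/drive differ because this mount is a KV version 2 engine. Its API inserts data/ after the mount name, or metadata/ for version metadata, and the response nests the key/value pairs under .data.data, so jq .data.data prints just the fields. The minimal sketch below shows the path mapping. kv2_url is hypothetical and assumes KV v2 is mounted at secret/, as the URL above implies.

```shell
#!/bin/sh
# Hypothetical helper: build the HTTP API URL for a KV v2 secret.
# $1 = Vault address, $2 = mount (e.g. secret), $3 = path (e.g. gbo/drive),
# $4 = "data" (default) or "metadata".
kv2_url() {
    addr="${1%/}"; mount="$2"; path="$3"; kind="${4:-data}"
    echo "$addr/v1/$mount/$kind/$path"
}
```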
Verify Vault Health
sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health
# Expected output:
# {"initialized":true,"sealed":false,"standby":false,"performance_standby":false,"replication_performance_mode":"disabled","replication_dr_mode":"disabled","server_time_utc":<unix-timestamp>,"version":"<vault-version>",...}
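By default, /v1/sys/health encodes Vault's state in the HTTP status code. With curl -f, any non-2xx answer (a sealed Vault returns 503) produces no output and only a non-zero exit. The minimal sketch below reads the status code instead and names the state, using Vault's default codes. vault_health_state is a hypothetical helper.

```shell
#!/bin/sh
# Map the HTTP status of GET /v1/sys/health to Vault's state
# (Vault's default status codes; curl prints 000 when nothing answered).
vault_health_state() {
    case "$1" in
        200) echo "active" ;;
        429) echo "standby" ;;
        472) echo "dr-secondary" ;;
        473) echo "performance-standby" ;;
        501) echo "not-initialized" ;;
        503) echo "sealed" ;;
        *)   echo "unreachable-or-unknown" ;;
    esac
}
# Example (no -f, so non-2xx codes are still reported):
#   code=$(sudo incus exec vault -- curl -k -s -o /dev/null -w '%{http_code}' \
#          https://localhost:8200/v1/sys/health)
#   vault_health_state "$code"
```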
init.json (Vault Initialization Data)
Location: /opt/gbo/bin/botserver-stack/conf/vault/vault-conf/init.json
Purpose: Stores Vault unseal keys and root token (created during Vault initialization)
Contents:
{
"recovery_keys_b64": [],
"recovery_keys_hex": [],
"recovery_keys_shares": 0,
"recovery_keys_threshold": 0,
"root_token": "<vault-token>",
"unseal_keys_b64": ["<5 unseal keys base64-encoded>"],
"unseal_keys_hex": ["<5 unseal keys hex-encoded>"],
"unseal_shares": 5,
"unseal_threshold": 3
}
Security Notes:
- root_token: used to authenticate to Vault as admin
- unseal_keys: required to unseal Vault after a restart (5 keys; any 3 unseal it)
- CRITICAL: store init.json in a secure, encrypted location
- Never commit init.json to git or store it in the repo
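A cheap safeguard for any copy of init.json is to confirm that no one but its owner can read it. This is a minimal sketch, assuming GNU or BusyBox stat (for -c). check_private is a hypothetical helper, and treating 600 and 400 as the only acceptable modes is a policy choice of this sketch.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the file is private to its owner.
check_private() {
    mode=$(stat -c '%a' "$1") || return 1
    case "$mode" in
        600|400) return 0 ;;
        *) echo "$1 has mode $mode; run: chmod 600 $1"; return 1 ;;
    esac
}
# Example: check_private /opt/gbo/bin/botserver-stack/conf/vault/vault-conf/init.json
```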
Troubleshooting Vault Connection
Issue 1: Botserver cannot connect to Vault
Symptoms:
- Logs show "Vault connection failed"
- Secrets fall back to defaults
- Bot cannot authenticate to database
Diagnosis:
# Check Vault is running
sudo incus exec vault -- systemctl status vault
# Check Vault health
sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health
# Check .env has Vault credentials
sudo incus exec system -- grep "^VAULT_" /opt/gbo/bin/.env
# Test Vault connection from system container
sudo incus exec system -- bash -c '
curl -sf --cacert /opt/gbo/conf/system/certificates/ca/ca.crt \
-H "X-Vault-Token: $(grep VAULT_TOKEN /opt/gbo/bin/.env | cut -d= -f2)" \
https://10.157.134.250:8200/v1/secret/data/gbo/tables
'
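The grep VAULT_TOKEN | cut -d= -f2 idiom has three weaknesses: it also matches names such as VAULT_TOKEN_FILE, it truncates values that contain "=", and it keeps surrounding quotes. The minimal sketch below reads exactly one variable from a KEY=value file. env_get is hypothetical and must run where the file lives (for example inside the system container). It assumes the file has no "export" prefixes, which matches the EnvironmentFile format.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one variable from a KEY=value file.
# Matches the exact name, keeps '=' inside the value, ignores comments,
# strips one pair of surrounding quotes, and uses the last definition.
env_get() {
    sed -n "s/^$1=//p" "$2" | tail -n 1 \
        | sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'$/\1/"
}
# Example: -H "X-Vault-Token: $(env_get VAULT_TOKEN /opt/gbo/bin/.env)"
```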
Common Causes:
- Vault service not running (vault container stopped)
- VAULT_TOKEN expired or invalid
- TLS certificate path incorrect or CA certificate missing
- Network connectivity between the system and vault containers
Fix:
# 1. Restart Vault if stopped
sudo incus exec vault -- systemctl restart vault
# 2. Generate new token if expired
sudo incus exec vault -- bash -c '
export VAULT_ADDR=https://localhost:8200
export VAULT_TOKEN=<root-token-from-init.json>
vault token create -policy="botserver" -ttl="8760h" -format=json | jq -r .auth.client_token
'
# 3. Update .env with new token
sudo incus exec system -- sed -i "s|VAULT_TOKEN=.*|VAULT_TOKEN=<new-token>|" /opt/gbo/bin/.env
# 4. Restart botserver
sudo incus exec system -- systemctl restart botserver
Issue 2: Secrets not being read from Vault
Symptoms:
- Logs show "Vault read failed for 'gbo/drive'"
- Services use default credentials
- DriveMonitor cannot access MinIO
Diagnosis:
# Check if Vault has secrets configured
sudo incus exec system -- bash -c '
export VAULT_ADDR=https://10.157.134.250:8200
export VAULT_TOKEN=$(grep VAULT_TOKEN /opt/gbo/bin/.env | cut -d= -f2)
export VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt
echo "=== Database Secrets ==="
vault kv get secret/gbo/tables || echo "NOT FOUND"
echo "=== Drive Secrets ==="
vault kv get secret/gbo/drive || echo "NOT FOUND"
echo "=== LLM Secrets ==="
vault kv get secret/gbo/llm || echo "NOT FOUND"
'
Fix - Adding Secrets to Vault:
sudo incus exec vault -- bash -c '
export VAULT_ADDR=https://localhost:8200
export VAULT_TOKEN=<root-token>
# Add database secrets
vault kv put secret/gbo/tables \
host=<tables-ip> \
port=5432 \
database=botserver \
username=gbuser \
password=<secure-password>
# Add drive (MinIO) secrets
vault kv put secret/gbo/drive \
host=<drive-ip> \
port=9100 \
accesskey=<minio-access-key> \
secret=<minio-secret>
# Add LLM secrets
vault kv put secret/gbo/llm \
url=http://localhost:8085 \
model=gpt-4 \
openai_key=<openai-api-key> \
anthropic_key=<anthropic-api-key>
'
Issue 3: Vault sealed after restart
Symptoms:
- All Vault operations fail
- botserver cannot read secrets
- Logs show "Vault is sealed"
Diagnosis:
# Omit -f: a sealed Vault answers with HTTP 503, and -f would suppress the JSON body
sudo incus exec vault -- curl -k -s https://localhost:8200/v1/sys/health | jq .sealed
Fix - Unseal Vault:
sudo incus exec vault -- bash -c '
export VAULT_ADDR=https://localhost:8200
# Need 3 of 5 unseal keys from init.json
vault operator unseal <key1>
vault operator unseal <key2>
vault operator unseal <key3>
# Verify unsealed
vault status
'
Issue 4: TLS certificate errors
Symptoms:
- "certificate verify failed" errors
- TLS handshake failures
- curl: (60) SSL certificate problem
Diagnosis:
sudo incus exec system -- bash -c '
# Check CA certificate exists
ls -la /opt/gbo/conf/system/certificates/ca/ca.crt
# Test certificate
openssl x509 -in /opt/gbo/conf/system/certificates/ca/ca.crt -text -noout
'
Fix:
# If the CA cert is missing, copy it from the vault container.
# Run incus file pull/push from the host, where the incus CLI is installed.
sudo incus exec system -- mkdir -p /opt/gbo/conf/system/certificates/ca/
sudo incus file pull vault/opt/gbo/conf/vault/ca.crt /tmp/ca.crt
sudo incus file push /tmp/ca.crt system/opt/gbo/conf/system/certificates/ca/ca.crt
sudo incus exec system -- chmod 644 /opt/gbo/conf/system/certificates/ca/ca.crt
Security Best Practices
- Never commit secrets to git
  - No API keys, passwords, or tokens in code
  - Use Vault for ALL sensitive data
  - Load secrets through SecretsManager::from_env()
- Use Vault for all service credentials
  - Database passwords: gbo/tables
  - MinIO keys: gbo/drive
  - LLM API keys: gbo/llm
  - Email passwords: gbo/email
- Rotate credentials regularly
  - Generate new tokens/keys periodically
  - Update Vault using vault kv put
  - No restart needed: services pick up the new values on their next Vault read, once the cached copy expires (VAULT_CACHE_TTL, default 300 s)
- Enable TLS/mTLS in production
  - Always set VAULT_CACERT
  - Enable mTLS for critical services: VAULT_CLIENT_CERT + VAULT_CLIENT_KEY
  - Never use VAULT_SKIP_VERIFY=true in production
- Limit token lifetimes
  - Root token: single use or very short TTL
  - Service tokens: limited to the time needed (e.g., 8760h = 1 year)
  - Generate new tokens when old ones expire
- Audit Vault access
  sudo incus exec vault -- bash -c '
  export VAULT_ADDR=https://localhost:8200
  export VAULT_TOKEN=<root-token>
  # List enabled audit devices
  vault audit list
  # Enable a file audit device if none is listed
  vault audit enable file file_path=/var/log/vault_audit.log
  '
Vault Backup & Recovery
Backup Vault Data
# Snapshot vault container (includes all secrets)
sudo incus snapshot create vault backup-$(date +%Y%m%d-%H%M)
# Export Vault init data (init.json holds the unseal keys and root token; keep the copy private)
(umask 077; sudo incus exec vault -- cat /opt/gbo/bin/botserver-stack/conf/vault/vault-conf/init.json > /tmp/vault-init.json)
# Backup all secrets (JSON format)
sudo incus exec vault -- bash -c '
export VAULT_ADDR=https://localhost:8200
export VAULT_TOKEN=<root-token>
umask 077
# Backup each path; slashes become dashes, so gbo/tables is saved as /tmp/vault-gbo-tables.json
for path in gbo/tables gbo/drive gbo/cache gbo/llm; do
vault kv get -format=json secret/$path > /tmp/vault-${path//\//-}.json
done
'
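Vault paths contain slashes, so each backup file name must be flattened (gbo/tables becomes gbo-tables), or the redirect would need a sub-directory that does not exist. The files also hold plaintext secrets and should not be world-readable. The minimal sketch below handles both points. backup_paths and the BACKUP_PATHS variable are hypothetical. The reader command is passed in, for example a wrapper such as kv_json() { vault kv get -format=json "secret/$1"; }, which is also an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of the per-path backup loop with safe file names.
# $1 = output directory, remaining args = command that prints a secret
# as JSON when given a path.
backup_paths() {
    out="$1"; shift
    (umask 077; mkdir -p "$out") || return 1
    for path in ${BACKUP_PATHS:-gbo/tables gbo/drive gbo/cache gbo/llm}; do
        # "gbo/tables" -> "vault-gbo-tables.json"
        file="$out/vault-$(printf '%s' "$path" | tr '/' '-').json"
        # umask 077 in the subshell makes each file mode 600
        (umask 077; "$@" "$path" > "$file") || { echo "failed: $path" >&2; return 1; }
    done
}
# Example (inside the vault container): backup_paths /root/vault-backup kv_json
```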
Restore from Snapshot
# Stop vault
sudo incus exec vault -- systemctl stop vault
# Restore snapshot
sudo incus snapshot restore vault <snapshot-name>
# Start vault
sudo incus exec vault -- systemctl start vault
# Wait for Vault to be ready
sleep 10
# Verify health
sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health
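A fixed sleep 10 is either too long or too short. Polling until the service answers is more reliable. This is a minimal sketch, in which wait_until is a hypothetical helper. After a restore, Vault usually comes back sealed (see Issue 3: Vault sealed after restart), so the example waits for any HTTP answer (curl without -f) rather than a healthy one.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or the limit expires.
# $1 = limit in seconds, $2 = poll interval in seconds, remaining args = command.
wait_until() {
    limit="$1"; interval="$2"; shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$limit" ]; then
            echo "timed out after ${limit}s: $*" >&2
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}
# Example: wait until Vault answers at all (it may still be sealed)
#   wait_until 60 2 sudo incus exec vault -- curl -k -s -o /dev/null https://localhost:8200/v1/sys/health
```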
Troubleshooting
GLIBC Version Mismatch
Symptom: GLIBC_2.39 not found or GLIBC_2.38 not found
Cause: Binary compiled on CI runner (glibc 2.41) but runs in system container (glibc 2.36)
Fix: CI workflow must build inside the system container. Check botserver.yaml uses SSH to build in container.
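To confirm a mismatch before deploying, compare the newest GLIBC_x.y symbol version a binary references with the glibc version of the target (ldd --version | head -n 1 inside the container prints it). This is a minimal sketch, assuming binutils (objdump) and GNU sort -V are available. max_glibc and glibc_fits are hypothetical helpers.

```shell
#!/bin/sh
# Print the highest GLIBC_x.y version referenced by a binary's dynamic symbols.
max_glibc() {
    objdump -T "$1" | grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/GLIBC_//' | sort -uV | tail -n 1
}

# Succeed when required ($1) <= available ($2), comparing as versions (2.4 < 2.36).
glibc_fits() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$2" ]
}
# Example:
#   glibc_fits "$(max_glibc /opt/gbo/bin/botserver)" 2.36 || echo "binary needs a newer glibc than 2.36"
```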
botserver Not Starting
# Check binary
sudo incus exec system -- ldd /opt/gbo/bin/botserver | grep "not found"
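# Diagnostic only: run the binary in the foreground for 10 s to see startup errors.
# It does not load /opt/gbo/bin/.env and may clash with the running service on port 5858, so stop the service first.
sudo incus exec system -- timeout 10 /opt/gbo/bin/botserver 2>&1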
# Check data directory
sudo incus exec system -- ls -la /opt/gbo/data/
botui Can't Reach botserver
# Check BOTSERVER_URL
sudo incus exec system -- grep BOTSERVER_URL /etc/systemd/system/ui.service
# Must be http://localhost:5858, NOT https://system.example.com
# Fix:
sudo incus exec system -- sed -i 's|BOTSERVER_URL=[^" ]*|BOTSERVER_URL=http://localhost:5858|' /etc/systemd/system/ui.service
sudo incus exec system -- systemctl daemon-reload
sudo incus exec system -- systemctl restart ui
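Before and after editing the unit file, the setting can be checked without reading it by eye. This is a minimal sketch, assuming the variable appears as BOTSERVER_URL=... on an Environment= line, quoted or not. check_botserver_url is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical check: confirm ui.service points botui at the local botserver.
# $1 = unit file path, $2 = expected URL (default http://localhost:5858).
check_botserver_url() {
    unit="$1"; want="${2:-http://localhost:5858}"
    got=$(grep -o 'BOTSERVER_URL=[^" ]*' "$unit" | tail -n 1 | cut -d= -f2-)
    if [ "$got" = "$want" ]; then
        echo "ok: $got"
    else
        echo "wrong BOTSERVER_URL: '${got:-<unset>}' (expected $want)"
        return 1
    fi
}
# Example:
#   sudo incus exec system -- cat /etc/systemd/system/ui.service > /tmp/ui.service
#   check_botserver_url /tmp/ui.service
```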
Suggestions Not Showing
# Check bot files exist
sudo incus exec system -- ls -la /opt/gbo/data/<bot>.gbai/<bot>.gbdialog/
# Check for compilation errors
sudo incus exec system -- tail -50 /opt/gbo/logs/out.log | grep -i "error\|fail\|compile"
# Clear cache and restart
sudo incus exec system -- find /opt/gbo/work -name "*.ast" -delete
sudo incus exec system -- systemctl restart botserver
IPv6 DNS Issues
Symptom: External API calls (Groq, Cloudflare) timeout
Cause: Container DNS returns AAAA records but no IPv6 connectivity
Fix: Container has IPV6=no in network config and gai.conf labels. If issues persist, check RES_OPTIONS=inet4 in botserver.service.
Vault Connection & Service Discovery Issues
Symptom: Logs show Failed to read data directory <development-path> or Config scan failed
Cause: Botserver is using hardcoded development paths instead of production paths
Fix:
1. Check current configuration:
   # Check .env file
   sudo incus exec system -- cat /opt/gbo/bin/.env
   # Check data directory
   sudo incus exec system -- ls -la /opt/gbo/data/
   sudo incus exec system -- ls -la /opt/gbo/work/
2. Verify Vault connection:
   # Test Vault from system container
   sudo incus exec system -- curl -k -sf https://<vault-ip>:8200/v1/sys/health
   # Check Vault token
   sudo incus exec system -- grep VAULT_TOKEN /opt/gbo/bin/.env
3. Check service discovery:
   # Check if botserver is reading Vault secrets
   sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep -i vault
   # Check for service configuration errors
   sudo incus exec system -- tail -100 /opt/gbo/logs/err.log | grep -i "config\|service"
4. Fix data directory paths:
   - Ensure botserver uses /opt/gbo/data/ instead of development paths
   - Update configuration if hardcoded paths exist
   - Restart botserver after fixing
5. Verify all services are accessible:
   # Check PostgreSQL
   sudo incus exec system -- pg_isready -h <database-ip> -p 5432
   # Check Valkey
   sudo incus exec system -- redis-cli -h <cache-ip> -a <password> ping
   # Check MinIO
   sudo incus exec system -- curl -sf http://<storage-ip>:9100/minio/health/live
6. Update botserver configuration:
   - Ensure botserver reads /opt/gbo/bin/.env for Vault configuration
   - Verify that service discovery uses Vault to get service endpoints
   - Check that the data directory is set to /opt/gbo/data/ in configuration
   - Update the systemd service if needed:
     sudo incus exec system -- cat /etc/systemd/system/botserver.service
     # Ensure EnvironmentFile=/opt/gbo/bin/.env is present
7. Test after fixes:
   # Restart botserver
   sudo incus exec system -- systemctl restart botserver
   # Wait for startup
   sleep 10
   # Check logs for errors
   sudo incus exec system -- tail -50 /opt/gbo/logs/err.log
   # Verify health endpoint
   curl -sf http://<main-server-ip>:5858/health
Vault Connection Errors
Symptom: Vault connection failed or Vault token invalid
Fix:
```bash
# Check that Vault is running
sudo incus exec vault -- systemctl status vault
# Check Vault health
sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health
# Verify that the token is valid
sudo incus exec system -- bash -c '
export VAULT_ADDR=https://<vault-ip>:8200
export VAULT_TOKEN=<vault_token>
export VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt
vault token lookup
'
# If the token is invalid, create a new one (8760h = 1 year)
sudo incus exec vault -- bash -c '
export VAULT_ADDR=https://localhost:8200
export VAULT_TOKEN=<root_token>
vault token create -policy="botserver" -ttl="8760h"
'
# Update .env with the new token (anchored so only the VAULT_TOKEN= line matches)
sudo incus exec system -- sed -i 's|^VAULT_TOKEN=.*|VAULT_TOKEN=<new_token>|' /opt/gbo/bin/.env
sudo incus exec system -- systemctl restart botserver
```
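The `sed` replacement above does nothing when `.env` does not yet contain a `VAULT_TOKEN=` line. The sketch below is an idempotent alternative that assumes bash and a POSIX awk are available in the container. `set_env_var` is a hypothetical helper. The key and value are passed through the environment, which means characters such as `|` or `&` in the value cannot break the substitution. The file is also overwritten in place, so its owner and mode are preserved:

```bash
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in an env file, replacing existing
# KEY= lines or appending one if the key is absent.
# Usage: set_env_var <file> <key> <value>
set_env_var() {
  local file="$1" key="$2" value="$3" tmp rc
  tmp="$(mktemp)" || return 1
  # ENVIRON avoids awk -v escape processing and sed delimiter issues.
  K="$key" V="$value" awk '
    index($0, ENVIRON["K"] "=") == 1 { print ENVIRON["K"] "=" ENVIRON["V"]; found = 1; next }
    { print }
    END { if (!found) print ENVIRON["K"] "=" ENVIRON["V"] }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner/mode, unlike mv
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Inside the container (for example from `sudo incus exec system -- bash`), run `set_env_var /opt/gbo/bin/.env VAULT_TOKEN <new_token>`, then `systemctl restart botserver`.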
Service Discovery Failures
Symptom: `Service not found` or `Failed to connect to service`
Fix:
```bash
# Check if service is running
sudo incus exec tables -- systemctl status postgresql
sudo incus exec cache -- systemctl status valkey
sudo incus exec drive -- systemctl status minio
# Check if service is accessible from system container
sudo incus exec system -- nc -zv <database-ip> 5432  # PostgreSQL
sudo incus exec system -- nc -zv <cache-ip> 6379     # Valkey
sudo incus exec system -- nc -zv <storage-ip> 9100   # MinIO
# Check Vault has service configuration
sudo incus exec system -- bash -c '
export VAULT_ADDR=https://<vault-ip>:8200
export VAULT_TOKEN=<vault_token>
export VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt
vault kv list secret/botserver
'
# If service config is missing, add it (see Vault Configuration section)
```
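If `nc` is not installed in the `system` container, bash's built-in `/dev/tcp` redirection can perform a similar reachability check. The sketch below assumes bash and GNU `timeout` inside the container. `check_endpoint` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: test whether a TCP port accepts connections, using
# bash's /dev/tcp pseudo-device instead of nc.
# Usage: check_endpoint <name> <host> <port>
check_endpoint() {
  local name="$1" host="$2" port="$3"
  # A child bash opens fd 3 to host:port; timeout kills it after 3 seconds.
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "UP   $name $host:$port"
  else
    echo "DOWN $name $host:$port"
    return 1
  fi
}
```

Inside the container (for example from `sudo incus exec system -- bash`), run `check_endpoint postgresql <database-ip> 5432`. Do the same for Valkey on 6379 and MinIO on 9100.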
Monitoring & Verification
Check that botserver is working correctly:
```bash
# Health check
curl -sf http://<main-server-ip>:5858/health
# Check logs for errors
sudo incus exec system -- tail -100 /opt/gbo/logs/err.log | grep -i "error\|fail"
# Check logs for successful service connections
sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep -i "connected\|service\|vault"
# Verify data directory is correct
sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep -i "data\|work"
# Should show /opt/gbo/data/ and /opt/gbo/work/, not development paths
```
Expected log output:
```text
info vault:Connected to Vault at https://<vault-ip>:8200
info service_discovery:Loaded service configuration from Vault
info database:Connected to PostgreSQL at <database-ip>:5432
info cache:Connected to Valkey at <cache-ip>:6379
info storage:Connected to MinIO at http://<storage-ip>:9100
info watcher:Watching data directory /opt/gbo/data
info botserver:BotServer started successfully on port 5858
```
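You can script the comparison against the expected output. The sketch below assumes the messages keep exactly the wording shown above, so adjust the patterns if your build logs differently. `verify_startup_log` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a botserver log contains each expected
# startup message and that the watched data directory is the production one.
# Usage: verify_startup_log <logfile>
verify_startup_log() {
  local log="$1" pattern dir missing=0
  local patterns=(
    'vault:Connected to Vault'
    'service_discovery:Loaded service configuration from Vault'
    'database:Connected to PostgreSQL'
    'cache:Connected to Valkey'
    'storage:Connected to MinIO'
    'botserver:BotServer started successfully'
  )
  for pattern in "${patterns[@]}"; do
    if ! grep -qF "$pattern" "$log"; then
      echo "missing: $pattern"
      missing=1
    fi
  done
  # Use the most recent "Watching data directory <path>" line.
  dir="$(grep -oE 'Watching data directory [^ ]+' "$log" | tail -n 1 | awk '{print $4}')"
  case "$dir" in
    /opt/gbo/data|/opt/gbo/data/) ;;
    *) echo "unexpected data directory: ${dir:-<none>}"; missing=1 ;;
  esac
  return "$missing"
}
```

`out.log` is appended to across restarts, so check only the recent lines. For example, run `sudo incus exec system -- tail -200 /opt/gbo/logs/out.log > /tmp/out.log`, then `verify_startup_log /tmp/out.log`. Otherwise, messages from an earlier successful start can mask a failing one.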
If logs show errors:
- Check Vault connection (see Vault Connection Errors section)
- Check service accessibility (see Service Discovery Failures section)
- Fix data directory paths (see Fix Development Paths in Production section)
- Restart botserver and verify again
Vault Backup & Restore
Create Vault snapshot:
```bash
# Stop Vault
sudo incus exec vault -- systemctl stop vault
# Create snapshot
sudo incus snapshot create vault manual-$(date +%Y-%m-%d-%H%M)
# Start Vault
sudo incus exec vault -- systemctl start vault
# Verify
sudo incus snapshot list vault
```
Restore Vault from snapshot:
```bash
# Stop Vault
sudo incus exec vault -- systemctl stop vault
# List snapshots
sudo incus snapshot list vault
# Restore the chosen snapshot (pick a name from the list above)
sudo incus snapshot restore vault <snapshot-name>
# Start Vault
sudo incus exec vault -- systemctl start vault
# Verify Vault is running
sudo incus exec vault -- systemctl status vault
sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health
```
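When `curl -sf` fails against `/v1/sys/health`, it does not say why. By default, Vault's health endpoint reports its state through the HTTP status code:

- 200: active
- 429: unsealed standby
- 472: DR secondary
- 473: performance standby
- 501: not initialized
- 503: sealed

Vault starts sealed after a restart unless auto-unseal is configured. A 503 right after restoring a snapshot therefore most likely means Vault needs unsealing, not that the restore failed. The sketch below maps the code to a label. `vault_health_state` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate the HTTP status returned by Vault's
# /v1/sys/health endpoint (default status codes) into a readable state.
vault_health_state() {
  case "$1" in
    200) echo "initialized, unsealed, active" ;;
    429) echo "unsealed, standby" ;;
    472) echo "disaster recovery secondary" ;;
    473) echo "performance standby" ;;
    501) echo "not initialized" ;;
    503) echo "sealed" ;;
    000) echo "unreachable (no HTTP response)" ;;   # curl prints 000 on connection failure
    *)   echo "unexpected status $1" ;;
  esac
}
```

Usage:

1. Capture the code with `code=$(sudo incus exec vault -- curl -k -s -o /dev/null -w '%{http_code}' https://localhost:8200/v1/sys/health)`.
2. Run `vault_health_state "$code"`.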
Automated snapshots:
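```bash
# Create the cron job on the host: the incus CLI normally is not available
# inside the vault container, so the job cannot live there.
# Quoting 'EOF' keeps $(date ...) from being expanded when the file is written,
# so each run gets its own date.
sudo tee /etc/cron.daily/vault-snapshot > /dev/null << 'EOF'
#!/bin/bash
incus exec vault -- systemctl stop vault
incus snapshot create vault daily-$(date +%Y%m%d)
# Vault comes back sealed unless auto-unseal is configured
incus exec vault -- systemctl start vault
EOF
sudo chmod +x /etc/cron.daily/vault-snapshot
```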
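The daily job never deletes old snapshots, so they accumulate. The sketch below selects snapshots to prune and assumes the `daily-YYYYMMDD` naming used above. `snapshots_to_prune` is a hypothetical helper. It reads names on stdin and prints the daily snapshots older than the newest N. It never selects `manual-*` snapshots and leaves the deletion itself to you:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read snapshot names on stdin and print the
# daily-YYYYMMDD snapshots beyond the newest <keep>. Other names
# (e.g. manual-* snapshots) are never selected.
# Usage: snapshots_to_prune <keep>
snapshots_to_prune() {
  local keep="$1"
  # YYYYMMDD sorts lexically in date order, so sort -r puts newest first.
  grep -E '^daily-[0-9]{8}$' \
    | sort -r \
    | tail -n +"$((keep + 1))"
}
```

To use it:

1. Get the snapshot names from `incus snapshot list vault`. The output layout depends on your Incus version, so extract the name column accordingly.
2. Feed those names to `snapshots_to_prune`.
3. Delete each result with `sudo incus snapshot delete vault <name>`.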
Update Botserver for Production
Required changes in botserver code:
Read configuration from Vault:
- Add Vault client initialization
- Read service endpoints from Vault
- Read secrets from Vault
- Fall back to environment variables if Vault is unavailable
Use production paths:
- Remove hardcoded development paths
- Use environment variables for data directory
- Default to `/opt/gbo/data/` for production
Update .env file:
```bash
# /opt/gbo/bin/.env
VAULT_ADDR=https://<vault-ip>:8200
VAULT_TOKEN=<vault_token>
VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt
DATA_DIR=/opt/gbo/data/
WORK_DIR=/opt/gbo/work/
PORT=5858
```
Update systemd service:
```bash
# Write the unit file inside the container. The redirection must run in the
# container (via bash -c); a bare "> file" after incus exec writes on the host.
sudo incus exec system -- bash -c 'cat > /etc/systemd/system/botserver.service' << 'EOF'
[Unit]
Description=BotServer Service
After=network.target

[Service]
User=root
Group=root
WorkingDirectory=/opt/gbo/bin
EnvironmentFile=/opt/gbo/bin/.env
ExecStart=/opt/gbo/bin/botserver --noconsole
Restart=always
RestartSec=5
StandardOutput=append:/opt/gbo/logs/out.log
StandardError=append:/opt/gbo/logs/err.log

[Install]
WantedBy=multi-user.target
EOF
sudo incus exec system -- systemctl daemon-reload
sudo incus exec system -- systemctl restart botserver
```
Deploy updated botserver:
```bash
# Push changes to ALM (ALM is production: ask before pushing)
cd botserver && git push alm main && git push origin main
# CI will build and deploy automatically
# Or manually deploy (see Manual Deploy section)
```
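Before restarting, it helps to confirm that `.env` defines every variable from the example above with a non-empty value. `check_env_keys` is a hypothetical helper. The key list in the usage line is copied from the example `.env`, so adjust it if your deployment differs:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report required keys that are missing or empty in
# an env file. Returns non-zero if any key is missing or empty.
# Usage: check_env_keys <file> <key>...
check_env_keys() {
  local file="$1" key missing=0
  shift
  for key in "$@"; do
    # Require a "KEY=value" line with a non-empty value at line start.
    if ! grep -qE "^${key}=.+" "$file"; then
      echo "missing or empty: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Inside the container (for example from `sudo incus exec system -- bash`), run `check_env_keys /opt/gbo/bin/.env VAULT_ADDR VAULT_TOKEN VAULT_CACERT DATA_DIR WORK_DIR PORT`.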
Security
- NEVER push secrets to git
- NEVER commit files to root with credentials
- Vault is single source of truth for secrets
- CI/CD is the only deployment method — never manually scp binaries
- ALM is production — ask before pushing