# Production Environment Guide

## ⚠️ CRITICAL PRODUCTION RULES

**READ THIS FIRST:**

### 🚫 NEVER Start Services Directly

In production, **NEVER** start botserver or botui directly. Always use `systemctl`:

```bash
# ❌ NEVER DO THIS IN PRODUCTION:
/opt/gbo/bin/botserver           # Wrong
./botserver                      # Wrong
/opt/gbo/bin/botserver &         # Wrong

# ✅ ALWAYS USE THIS:
sudo incus exec system -- systemctl start botserver
sudo incus exec system -- systemctl restart botserver
sudo incus exec system -- systemctl stop botserver
sudo incus exec system -- systemctl status botserver
```

**Why:**
- `systemctl` loads `/opt/gbo/bin/.env` (Vault credentials, paths, etc.)
- Direct execution skips environment variables → services fail
- `systemctl` manages auto-restart, logging, and dependencies

### 🔐 Security Rules

- **NEVER** push secrets to git (API keys, passwords, tokens)
- **NEVER** commit `init.json` (Vault unseal keys)
- **ALWAYS** use Vault for secrets (see [Vault Security Architecture](#vault-security-architecture))
- **ONLY** `VAULT_*` environment variables allowed in `.env`

### 🚢 Deployment Rules

- **NEVER** deploy manually (scp, ssh copy); use CI/CD only
- **NEVER** push to ALM without asking first
- **ALWAYS** push ALL submodules (botserver, botui, botlib) when pushing main repo
- **ALWAYS** use `systemctl` to restart services after deployment

---

## Infrastructure

### Servers

| Host | IP | Purpose |
|------|-----|---------|
| `system` | `<main-server-ip>` | Main botserver + botui container |
| `alm-ci` | `<ci-runner-ip>` | CI/CD runner (Forgejo Actions) |
| `drive` | `<storage-server-ip>` | Object storage |
| `monitor` | `<monitor-server-ip>` | Monitoring service |

### Port Mapping (system container)

| Service | Internal Port | External URL |
|---------|--------------|--------------|
| botserver | `5858` | `https://system.example.com` |
| botui | `5859` | `https://chat.example.com` |

### Access

```bash
# SSH to host
ssh admin@<host-ip>

# Execute inside system container
sudo incus exec system -- bash -c 'command'

# SSH from host to container (used by CI)
ssh -o StrictHostKeyChecking=no system "command"
```

## Services

### botserver.service

- **Binary**: `/opt/gbo/bin/botserver`
- **Port**: `5858`
- **User**: `gbuser`
- **Logs**: `/opt/gbo/logs/out.log`, `/opt/gbo/logs/err.log`
- **Config**: `/etc/systemd/system/botserver.service`
- **Env**: `PORT=5858`

### ui.service

- **Binary**: `/opt/gbo/bin/botui`
- **Port**: `5859`
- **Config**: `/etc/systemd/system/ui.service`
- **Env**: `BOTSERVER_URL=http://localhost:5858`
  - ⚠️ MUST be `http://localhost:5858`, NOT `https://system.example.com`
  - Rust proxy runs server-side, needs direct localhost access
  - JS client uses relative URLs through `chat.example.com`

### Data Directory

- **Path**: `/opt/gbo/data/`
- **Structure**: `<botname>.gbai/<botname>.gbdialog/*.bas`
- **Work dir**: `/opt/gbo/work/` (compiled .ast cache)

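The directory layout above can be checked with a small script before or after a transfer. This is only a sketch: `bot_dialog_dir` and `list_bot_scripts` are hypothetical helper names, not part of botserver, and the functions assume nothing beyond the `<botname>.gbai/<botname>.gbdialog/*.bas` structure stated above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a bot package on disk.

# bot_dialog_dir DATA_DIR BOTNAME
# Prints the expected dialog directory for a bot, e.g.
# /opt/gbo/data/<botname>.gbai/<botname>.gbdialog
bot_dialog_dir() {
  # ${1%/} drops one trailing slash so "/opt/gbo/data/" and
  # "/opt/gbo/data" produce the same path.
  printf '%s/%s.gbai/%s.gbdialog\n' "${1%/}" "$2" "$2"
}

# list_bot_scripts DATA_DIR BOTNAME
# Lists the .bas scripts of a bot (sorted); fails if the dialog
# directory does not exist.
list_bot_scripts() {
  local dir
  dir=$(bot_dialog_dir "$1" "$2")
  [ -d "$dir" ] || return 1
  find "$dir" -maxdepth 1 -type f -name '*.bas' | sort
}
```

For example, `list_bot_scripts /opt/gbo/data salesianos` run inside the system container would list the BASIC scripts currently deployed for that bot, assuming the bot follows the layout above.
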
### Stack Services (managed by botserver bootstrap)

- **Vault**: Secrets management
- **PostgreSQL**: Database (port 5432)
- **Valkey**: Cache (port 6379, password auth)
- **MinIO**: Object storage
- **Zitadel**: Identity provider
- **LLM**: llama.cpp

## CI/CD Pipeline

### Repositories

| Repo | ALM URL | GitHub URL |
|------|---------|------------|
| gb | `https://alm.example.com/organization/gb.git` | `git@github.com:organization/gb.git` |
| botserver | `https://alm.example.com/organization/BotServer.git` | `git@github.com:organization/botserver.git` |
| botui | `https://alm.example.com/organization/BotUI.git` | `git@github.com:organization/botui.git` |
| botlib | `https://alm.example.com/organization/botlib.git` | `git@github.com:organization/botlib.git` |

### Push Order

```bash
# 1. Push submodules first (all of them, per the Deployment Rules)
cd botserver && git push alm main && git push origin main && cd ..
cd botui && git push alm main && git push origin main && cd ..
cd botlib && git push alm main && git push origin main && cd ..

# 2. Update root workspace references
git add botserver botui botlib
git commit -m "Update submodules: <description>"
git push alm main && git push origin main
```

### Build Environment

- **CI runner**: `ci-runner` container (Debian Trixie, glibc 2.41)
- **Target**: `system` container (Debian 12 Bookworm, glibc 2.36)
- **⚠️ GLIBC MISMATCH**: Building on CI runner produces binaries incompatible with system container
- **Solution**: CI workflow transfers source to system container and builds there via SSH

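The mismatch rule above amounts to a version comparison. The sketch below is illustrative only: `glibc_compatible` is a hypothetical helper, not part of the CI workflow, and it assumes GNU `sort -V` is available. It treats a build as safe when the build host's glibc is not newer than the target's, which is a conservative approximation, since a binary only fails when it actually links a symbol version the target lacks.

```bash
#!/usr/bin/env bash
# glibc_compatible BUILD_VERSION TARGET_VERSION
# Succeeds when a binary built against BUILD_VERSION can be expected to
# run on a host with TARGET_VERSION, i.e. when build <= target.
glibc_compatible() {
  local build=$1 target=$2 newest
  # sort -V orders version strings numerically (2.9 < 2.36 < 2.41);
  # the last line is therefore the newer of the two versions.
  newest=$(printf '%s\n%s\n' "$build" "$target" | sort -V | tail -n 1)
  [ "$newest" = "$target" ]
}
```

With the versions listed above, `glibc_compatible 2.41 2.36` fails, which matches the reason the workflow builds inside the system container. On a glibc-based host the installed version can usually be read from the first line of `ldd --version`.
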
### Workflow File

- **Location**: `botserver/.forgejo/workflows/botserver.yaml`
- **Triggers**: Push to `main` branch
- **Steps**:
  1. Setup workspace on CI runner (clone repos)
  2. Transfer source to system container via `tar | ssh`
  3. Build inside system container (matches glibc 2.36)
  4. Deploy binary inside container
  5. Verify botserver is running

## Common Operations

### Check Service Status

```bash
# From host
sudo incus exec system -- systemctl status botserver --no-pager
sudo incus exec system -- systemctl status ui --no-pager

# Check if running
sudo incus exec system -- pgrep -f botserver
sudo incus exec system -- pgrep -f botui
```

### View Logs

```bash
# Systemd journal
sudo incus exec system -- journalctl -u botserver --no-pager -n 50
sudo incus exec system -- journalctl -u ui --no-pager -n 50

# Application logs
sudo incus exec system -- tail -50 /opt/gbo/logs/out.log
sudo incus exec system -- tail -50 /opt/gbo/logs/err.log

# Live tail
sudo incus exec system -- tail -f /opt/gbo/logs/out.log
```

### Restart Services

**CRITICAL PRODUCTION RULE:** In production, NEVER start botserver or botui directly. Always use `systemctl` to ensure proper initialization, environment loading, and logging.

```bash
sudo incus exec system -- systemctl restart botserver
sudo incus exec system -- systemctl restart ui
```

**PROHIBITED in production:**
```bash
# ❌ NEVER DO THIS IN PRODUCTION:
sudo incus exec system -- /opt/gbo/bin/botserver              # Wrong - no systemd integration
sudo incus exec system -- /opt/gbo/bin/botserver &            # Wrong - no service management
sudo incus exec system -- cd /opt/gbo/bin && ./botserver      # Wrong - missing env vars

# ✅ CORRECT - Always use systemctl:
sudo incus exec system -- systemctl start botserver
sudo incus exec system -- systemctl restart botserver
sudo incus exec system -- systemctl stop botserver
sudo incus exec system -- systemctl status botserver
```

**Why:**
- `systemctl` loads `/opt/gbo/bin/.env` (via `EnvironmentFile` in service definition)
- `systemctl` manages process lifecycle, auto-restart, and dependencies
- `systemctl` sends logs to `/opt/gbo/logs/out.log` and `/opt/gbo/logs/err.log`
- Direct execution skips environment variables and systemd service configuration

### Manual Deploy (emergency)

```bash
# Stop the service through systemd (a bare kill can race with auto-restart)
sudo incus exec system -- systemctl stop botserver

# Copy binary (from host CI workspace or local)
sudo incus exec system -- cp /opt/gbo/ci/botserver/target/debug/botserver /opt/gbo/bin/botserver
sudo incus exec system -- chmod +x /opt/gbo/bin/botserver
sudo incus exec system -- chown gbuser:gbuser /opt/gbo/bin/botserver

# Start service
sudo incus exec system -- systemctl start botserver
```

### Transfer Bot Files to Production

```bash
# From local to prod host
tar czf /tmp/bots.tar.gz -C /opt/gbo/data <botname>.gbai
scp /tmp/bots.tar.gz admin@<host-ip>:/tmp/

# From host to container
sudo incus exec system -- bash -c 'tar xzf /tmp/bots.tar.gz -C /opt/gbo/data/'

# Clear compiled cache
sudo incus exec system -- find /opt/gbo/data -name "*.ast" -delete
sudo incus exec system -- find /opt/gbo/work -name "*.ast" -delete
```

### Snapshots

```bash
# List snapshots
sudo incus snapshot list system

# Restore snapshot
sudo incus snapshot restore system <snapshot-name>
```

## DriveMonitor & Bot Configuration Sync

### DriveMonitor Architecture

DriveMonitor is a background service that synchronizes bot files from MinIO (S3-compatible storage) to the local filesystem and database. It monitors three directories per bot:

| Directory | Purpose | Sync Behavior |
|-----------|---------|---------------|
| `{bot}.gbai/{bot}.gbdialog/` | BASIC scripts (.bas) | Downloads and compiles on change |
| `{bot}.gbai/{bot}.gbot/` | Configuration files | Syncs to `bot_configuration` table |
| `{bot}.gbkb/` | Knowledge base documents | Downloads and indexes for vector search |

### Bot Configuration Database Tables

#### `bot_configuration` (main config table)
```sql
-- Location: botserver database
SELECT * FROM bot_configuration WHERE bot_id = '<bot_uuid>';

-- Key columns:
-- - bot_id: Bot UUID (link to bots table)
-- - config_key: Configuration key (e.g., "llm-provider", "system-prompt")
-- - config_value: Configuration value
-- - config_type: Type (string, boolean, number)
-- - is_encrypted: Whether value is encrypted
-- - updated_at: Last modification timestamp
```

#### `gbot_config_sync` (sync tracking table)
```sql
-- Location: botserver database
-- Tracks config.csv sync status from bucket
SELECT * FROM gbot_config_sync g
JOIN bots b ON g.bot_id = b.id
WHERE b.name = 'salesianos';

-- Key columns:
-- - bot_id: Bot UUID
-- - config_file_path: Path to config.csv in bucket
-- - last_sync_at: Timestamp of last successful sync
-- - file_hash: ETag/MD5 of synced file
-- - sync_count: Number of times synced
```

### config.csv Sync Process

**File Locations:**
- Source: `{bot}.gbai/{bot}.gbot/config.csv` in MinIO bucket
- Sync method: DriveMonitor → ConfigManager → `bot_configuration` table
- Sync frequency: Every 10 seconds (DriveMonitor periodic check)

**Sync Trigger Conditions:**
1. File ETag changes in MinIO
2. Initial DriveMonitor startup
3. Manual botserver restart

**CSV Format:**
```csv
llm-provider,groq
llm-api-key,sk-xxx
llm-url,http://localhost:8085
system-prompt-file,PROMPT.md
theme-color1,#cc0000
theme-title,MyBot
whatsapp-id,botname
```

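When checking a `config.csv` by hand, it helps to read one key at a time. The function below is a minimal sketch of such a lookup; `config_get` is a hypothetical name, and splitting on the first comma only is an assumption about the file format, not a description of how ConfigManager parses it.

```bash
#!/usr/bin/env bash
# config_get FILE KEY
# Prints the value stored for KEY in a two-column "key,value" file and
# returns non-zero when the key is absent.
# Assumption: only the first comma on a line separates key from value,
# so values that contain commas are returned intact.
config_get() {
  local file=$1 key=$2
  awk -v key="$key" '
    {
      sep = index($0, ",")
      if (sep == 0) next                    # skip lines without a separator
      if (substr($0, 1, sep - 1) == key) {  # exact match on the key column
        print substr($0, sep + 1)
        found = 1
        exit                                # first match wins
      }
    }
    END { exit found ? 0 : 1 }
  ' "$file"
}
```

For instance, after saving the bucket copy locally, `config_get config.csv llm-provider` prints the configured provider, which can then be compared with the `bot_configuration` row for the same key.
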
### Checking Bot Configuration Status

#### Method 1: Query bot_configuration table
```bash
# Get all config for a bot
sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "
  SELECT b.name, bc.config_key, bc.config_value, bc.updated_at
  FROM bot_configuration bc
  JOIN bots b ON bc.bot_id = b.id
  WHERE b.name = 'salesianos'
  ORDER BY bc.config_key;
"

# Get specific LLM provider config
sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "
  SELECT config_key, config_value, updated_at
  FROM bot_configuration
  WHERE bot_id = (
    SELECT id FROM bots WHERE name = 'salesianos'
  )
  AND config_key LIKE 'llm-%'
  ORDER BY config_key;
"
```

#### Method 2: Check DriveMonitor sync status
```bash
# Check if config.csv has been synced
sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "
  SELECT b.name, gcs.last_sync_at, gcs.sync_count, gcs.config_file_path
  FROM gbot_config_sync gcs
  JOIN bots b ON gcs.bot_id = b.id
  WHERE b.name IN ('salesianos', 'default');
"

# Empty result = DriveMonitor hasn't synced config.csv yet
# If sync_count = 0, config.csv exists but hasn't been processed
```

#### Method 3: Direct MinIO inspection
```bash
# Check if config.csv exists in bucket
sudo incus exec drive -- /opt/gbo/bin/mc ls local/salesianos.gbai/salesianos.gbot/

# View config.csv contents
sudo incus exec drive -- /opt/gbo/bin/mc cat local/salesianos.gbai/salesianos.gbot/config.csv

# Check file ETag (for sync comparison)
sudo incus exec drive -- /opt/gbo/bin/mc stat local/salesianos.gbai/salesianos.gbot/config.csv
```

### DriveMonitor Debugging Logs

#### Key log patterns to monitor
```bash
# Monitor DriveMonitor activity in real-time
sudo incus exec system -- tail -f /opt/gbo/logs/out.log | grep -E "(DRIVE_MONITOR|check_gbot|config)"

# Check for config.csv sync attempts
sudo incus exec system -- grep "check_gbot" /opt/gbo/logs/out.log | tail -20

# Check for config synchronization
sudo incus exec system -- grep "sync_gbot_config" /opt/gbo/logs/out.log | tail -20

# Check for DriveMonitor errors
sudo incus exec system -- grep -i "drive.*error" /opt/gbo/logs/err.log | tail -20
```

#### Expected successful sync logs
```
check_gbot: Checking bucket salesianos.gbai for config.csv changes
check_gbot: Found config.csv at path: salesianos.gbai/salesianos.gbot/config.csv
info config:Synced config.csv for bot <uuid> - updated 3 keys
```

#### Error patterns and meanings
```
# Config.csv not found in bucket
check_gbot: Config file not found or inaccessible: path/to/config.csv

# Sync to database failed
error config:Failed to sync_gbot_config: <database error>

# DriveMonitor not running
(no check_gbot logs in out.log)

# MinIO connection failed
error drive_monitor:S3/MinIO unavailable for bucket <bucket>
```

### Common Issues and Fixes

#### Issue 1: config.csv not syncing to database

**Symptoms:**
- `gbot_config_sync` table empty (0 rows)
- LLM provider changes in bucket not reflected in bot behavior
- Database shows old configuration values

**Diagnosis:**
```bash
# 1. Check if config.csv exists in bucket
sudo incus exec drive -- /opt/gbo/bin/mc ls local/salesianos.gbai/salesianos.gbot/

# 2. Check DriveMonitor logs for sync attempts
sudo incus exec system -- grep "check_gbot" /opt/gbo/logs/out.log | tail -10

# 3. Check if DriveMonitor is running for the bot
sudo incus exec system -- ps aux | grep botserver
```

**Root Causes:**
1. config.csv missing from `{bot}.gbai/{bot}.gbot/` folder
2. DriveMonitor not started for the bot
3. MinIO connection issues
4. Database write permissions

**Fixes:**
```bash
# Case 1: Create missing config.csv
sudo incus exec drive -- bash -c '
cat > /tmp/config.csv << EOF
llm-provider,groq
llm-api-key,your-api-key
llm-url,http://localhost:8085
system-prompt-file,PROMPT.md
theme-color1,#cc0000
theme-title,Salesianos
EOF
/opt/gbo/bin/mc cp /tmp/config.csv local/salesianos.gbai/salesianos.gbot/config.csv
'

# Case 2: Restart botserver to reinitialize DriveMonitor
sudo incus exec system -- systemctl restart botserver

# Case 3: Force immediate sync by touching config.csv
sudo incus exec drive -- /opt/gbo/bin/mc cp local/salesianos.gbai/salesianos.gbot/config.csv local/salesianos.gbai/salesianos.gbot/config.csv
```

#### Issue 2: LLM provider changes not taking effect

**Symptoms:**
- config.csv shows correct provider (e.g., groq)
- Bot still uses old provider
- Database shows old value

**Diagnosis:**
```bash
# Compare bucket vs database
BUCKET_PROVIDER=$(sudo incus exec drive -- /opt/gbo/bin/mc cat local/salesianos.gbai/salesianos.gbot/config.csv | grep "^llm-provider" | cut -d',' -f2)
DB_PROVIDER=$(sudo incus exec tables -- psql -h localhost -U postgres -d botserver -t -c "
  SELECT config_value FROM bot_configuration
  WHERE bot_id = (SELECT id FROM bots WHERE name = 'salesianos')
  AND config_key = 'llm-provider';
")

echo "Bucket: $BUCKET_PROVIDER"
echo "Database: $DB_PROVIDER"

# Check last sync time
sudo incus exec tables -- psql -h localhost -U postgres -d botserver -t -c "
  SELECT last_sync_at FROM gbot_config_sync
  WHERE bot_id = (SELECT id FROM bots WHERE name = 'salesianos');
"
```

**Fix:**
```bash
# If sync is stale (> 10 minutes), restart DriveMonitor
sudo incus exec system -- systemctl restart botserver

# Or manually update config value in database (temporary fix)
sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "
  UPDATE bot_configuration
  SET config_value = 'groq', updated_at = NOW()
  WHERE bot_id = (SELECT id FROM bots WHERE name = 'salesianos')
  AND config_key = 'llm-provider';
"
```

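The bucket and database values are easier to compare in a script than by eye, because `psql -t` pads its output with whitespace, so a plain string comparison can report a false mismatch. The helpers below are a sketch under that assumption; `trim`, `config_in_sync`, and `sync_is_stale` are hypothetical names, and the 600-second default simply mirrors the 10-minute threshold mentioned in the fix.

```bash
#!/usr/bin/env bash
# trim VALUE
# Strips leading and trailing whitespace using parameter expansion only.
trim() {
  local s=$1
  s="${s#"${s%%[![:space:]]*}"}"   # drop leading whitespace
  s="${s%"${s##*[![:space:]]}"}"   # drop trailing whitespace
  printf '%s' "$s"
}

# config_in_sync BUCKET_VALUE DB_VALUE
# Succeeds when both values are equal after trimming.
config_in_sync() {
  [ "$(trim "$1")" = "$(trim "$2")" ]
}

# sync_is_stale LAST_SYNC_EPOCH NOW_EPOCH [MAX_AGE_SECONDS]
# Succeeds when the last sync is older than MAX_AGE_SECONDS (default 600).
sync_is_stale() {
  local last=$1 now=$2 max_age=${3:-600}
  [ $(( now - last )) -gt "$max_age" ]
}
```

With the variables from the diagnosis block, `config_in_sync "$BUCKET_PROVIDER" "$DB_PROVIDER" || echo "out of sync"` reports a real difference only. An epoch value for `last_sync_at` can be obtained in PostgreSQL with `EXTRACT(EPOCH FROM last_sync_at)`.
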
#### Issue 3: DriveMonitor not checking for changes

**Symptoms:**
- No new log entries after 30 seconds
- File changes in bucket not detected
- Bot compilation not happening after .bas file updates

**Diagnosis:**
```bash
# Check DriveMonitor loop logs
sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep "DRIVE_MONITOR.*Inside monitoring loop"

# Check if is_processing flag is stuck
sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep -E "(is_processing|monitoring loop)"
```

**Fix:**
```bash
# Restart botserver to clear stuck state
sudo incus exec system -- systemctl restart botserver

# Monitor startup logs to verify DriveMonitor started
sudo incus exec system -- tail -50 /opt/gbo/logs/out.log | grep "Drive Monitor"
```

### Database Schema Reference

#### List all bot databases
```bash
sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "\l" | grep bot_
```

#### List tables in a specific bot database
```bash
sudo incus exec tables -- psql -h localhost -U postgres -d bot_salesianos -c "\dt"
```

#### List botserver management tables
```bash
sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "\dt" | grep -E "(bot|config|sync)"
```

### Connection Methods Summary

| Method | Use Case | Command Pattern |
|--------|-----------|-----------------|
| **SSH to host** | Initial access, file transfer | `ssh admin@<host-ip>` |
| **incus exec** | Execute inside container | `sudo incus exec system -- command` |
| **psql direct** | Database queries from container | `sudo incus exec tables -- psql ...` |
| **mc (MinIO CLI)** | Inspect buckets, copy files | `sudo incus exec drive -- /opt/gbo/bin/mc ...` |
| **HTTP/curl** | Service health checks | `curl http://<ip>:5858/health` |
| **journalctl** | Systemd service logs | `sudo incus exec system -- journalctl -u botserver` |

## Vault Security Architecture

### Overview

The production environment uses **HashiCorp Vault** as the centralized secrets management system. All sensitive credentials (database passwords, API keys, tokens) are stored in Vault, NEVER in code or environment files.

### Vault Connection Flow

```
1. botserver starts
   ↓
2. Reads VAULT_ADDR, VAULT_TOKEN from .env
   ↓
3. Initializes VaultClient with TLS/mTLS
   ↓
4. Reads secrets from Vault paths (gbo/tables, gbo/drive, etc.)
   ↓
5. Falls back to defaults if Vault unavailable
```

### Environment Variables (Allowed)

**File Location:** `/opt/gbo/bin/.env` (system container)

```bash
# Vault Connection (MANDATORY for production)
VAULT_ADDR=https://<vault-ip>:8200
VAULT_TOKEN=<vault-token>
VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt

# Optional: Skip TLS verification (NOT recommended for production)
VAULT_SKIP_VERIFY=false

# Optional: Use mTLS certificates
VAULT_CLIENT_CERT=/opt/gbo/conf/system/certificates/botserver/client.crt
VAULT_CLIENT_KEY=/opt/gbo/conf/system/certificates/botserver/client.key

# Optional: Cache TTL in seconds (default: 300)
VAULT_CACHE_TTL=300

# Server Configuration (non-secret settings)
PORT=5858
DATA_DIR=/opt/gbo/data/
WORK_DIR=/opt/gbo/work/
LOAD_ONLY=default,salesianos
```

**Security Rule:**
- **ONLY** `VAULT_*` variables may carry credentials in `.env`; the remaining entries shown above (`PORT`, `DATA_DIR`, `WORK_DIR`, `LOAD_ONLY`) are non-secret server settings
- All other secrets MUST come from Vault
- Hardcoded secrets in code are FORBIDDEN (see AGENTS.md)

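The rule above can be checked mechanically. The function below is a sketch of such a check; `env_unexpected_keys` is a hypothetical helper, and its allowlist (the `VAULT_*` prefix plus the four server settings shown in the example file) is an assumption drawn from this section rather than an enforced policy.

```bash
#!/usr/bin/env bash
# env_unexpected_keys FILE
# Prints every variable name in FILE that is neither VAULT_* nor one of
# the non-secret server settings. Prints nothing when the file is clean.
env_unexpected_keys() {
  local file=$1
  # Keep only KEY=VALUE lines (comments and blanks are ignored), reduce
  # them to the key, then drop the allowed names. "|| true" keeps the
  # exit status at 0 when nothing unexpected is found.
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$file" \
    | cut -d= -f1 \
    | grep -Ev '^(VAULT_[A-Z0-9_]+|PORT|DATA_DIR|WORK_DIR|LOAD_ONLY)$' \
    || true
}
```

Run against a copy of the production file, any output (for example a stray `DB_PASSWORD`) points at a value that should be moved into Vault.
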
### Vault Secret Paths Structure

#### System-Wide Paths (Global)

| Path | Purpose | Example Keys |
|------|---------|---------------|
| `gbo/tables` | Database (PostgreSQL) | host, port, database, username, password |
| `gbo/drive` | MinIO (Object Storage) | host, accesskey, secret |
| `gbo/cache` | Valkey (Redis) | host, port, password |
| `gbo/directory` | Zitadel (Auth) | url, project_id, client_id, client_secret |
| `gbo/email` | SMTP Email | smtp_host, smtp_port, smtp_user, smtp_password |
| `gbo/llm` | LLM Configuration | url, model, openai_key, anthropic_key |
| `gbo/vectordb` | Qdrant (Vector DB) | url, api_key |
| `gbo/jwt` | JWT Signing | secret |
| `gbo/meet` | Jitsi Meet | url, app_id, app_secret |
| `gbo/alm` | ALM Repository | url, token |
| `gbo/encryption` | Encryption Keys | master_key |
| `gbo/system/observability` | Monitoring | url, org, bucket, token |
| `gbo/system/security` | Security Policies | require_auth, anonymous_paths |
| `gbo/system/cloud` | Cloud Config | region, access_key, secret_key |
| `gbo/system/app` | Application Settings | url, environment |
| `gbo/system/models` | BotModels API | url |

#### Organization-Specific Paths

| Path Pattern | Purpose |
|--------------|---------|
| `gbo/orgs/{org_id}/config` | Organization configuration |
| `gbo/orgs/{org_id}/bots/{bot_id}` | Bot-specific secrets |
| `gbo/orgs/{org_id}/users/{user_id}` | User-specific secrets |
| `gbo/tenants/{tenant_id}/infrastructure` | Tenant database/cache/drive |
| `gbo/tenants/{tenant_id}/config` | Tenant configuration |

### Credential Resolution Hierarchy

For bot email configuration (example):
```
1. Check gbo/orgs/{org_id}/bots/{bot_id}/email
2. Fallback: gbo/bots/default/email
3. Fallback: gbo/email
4. Fallback: Environment variables (development only)
```

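The Vault part of this lookup order (steps 1 to 3) can be sketched as a loop over candidate paths. This is an illustration, not botserver's implementation: `email_secret_candidates` and `first_existing_secret` are hypothetical names, the existence check is passed in as a command so the logic can be exercised without a Vault server, and the development-only environment fallback is left out.

```bash
#!/usr/bin/env bash
# email_secret_candidates ORG_ID BOT_ID
# Prints the Vault paths to try for a bot's email settings, most
# specific first.
email_secret_candidates() {
  local org_id=$1 bot_id=$2
  printf '%s\n' \
    "gbo/orgs/${org_id}/bots/${bot_id}/email" \
    "gbo/bots/default/email" \
    "gbo/email"
}

# first_existing_secret EXISTS_CMD ORG_ID BOT_ID
# Prints the first candidate path for which "EXISTS_CMD PATH" succeeds;
# returns non-zero when none does.
# Assumption: IDs contain no whitespace or glob characters, so plain
# word splitting over the candidate list is safe.
first_existing_secret() {
  local exists_cmd=$1 path
  for path in $(email_secret_candidates "$2" "$3"); do
    if "$exists_cmd" "$path"; then
      printf '%s\n' "$path"
      return 0
    fi
  done
  return 1
}
```

In practice `EXISTS_CMD` could be a small wrapper around `vault kv get` that discards the output and only reports success or failure.
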
### Vault Client Initialization (Code Reference)

**File:** `botserver/src/core/secrets/mod.rs`

```rust
// SecretsManager::from_env() reads:
// - VAULT_ADDR (required)
// - VAULT_TOKEN (required)
// - VAULT_CACERT (optional, has default)
// - VAULT_SKIP_VERIFY (optional, default: false)
// - VAULT_CLIENT_CERT (optional, mTLS)
// - VAULT_CLIENT_KEY (optional, mTLS)
// - VAULT_CACHE_TTL (optional, default: 300s)

impl SecretsManager {
    pub fn from_env() -> Result<Self> {
        let addr = env::var("VAULT_ADDR").unwrap_or_default();
        let token = env::var("VAULT_TOKEN").unwrap_or_default();

        if token.is_empty() || addr.is_empty() {
            // Vault not configured - use environment variables directly
            warn!("Vault not configured. Using environment variables directly.");
            return Ok(Self { client: None, enabled: false, ... });
        }

        // Initialize VaultClient with TLS
        let client = VaultClient::new(settings)?;
        Ok(Self { client: Some(client), enabled: true, ... })
    }
}
```

### Vault Operations - Production Usage

#### Read Secrets from Vault

```bash
# From system container (using vault CLI)
sudo incus exec system -- bash -c '
  export VAULT_ADDR=https://10.157.134.250:8200
  export VAULT_TOKEN=<vault-token>
  export VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt

  # Read database secrets
  vault kv get -field=password secret/gbo/tables
  vault kv get secret/gbo/tables

  # Read drive secrets
  vault kv get secret/gbo/drive

  # Read LLM configuration
  vault kv get secret/gbo/llm
'
```

#### Read Secrets via HTTP API (from any container)

```bash
sudo incus exec system -- curl -sf \
  --cacert /opt/gbo/conf/system/certificates/ca/ca.crt \
  -H "X-Vault-Token: <vault-token>" \
  https://10.157.134.250:8200/v1/secret/data/gbo/drive | jq
```

#### Verify Vault Health

```bash
sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health

# Expected output:
# {"initialized":true,"sealed":false,"standby":false,"performance_standby":false,"replication_performance_mode":"disabled","replication_dr_mode":"disabled","server_time_utc":"2026-04-10T13:55:00.123Z"}
```

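The health response can be reduced to a single word for use in scripts. The function below is a sketch: `vault_state` is a hypothetical helper that pattern-matches the compact JSON shown in the expected output, so it assumes no whitespace inside the JSON. Note that Vault answers `/v1/sys/health` with a non-200 status when it is sealed (503) or uninitialized (501), so the body must be fetched without curl's `-f` flag for those states to be visible.

```bash
#!/usr/bin/env bash
# vault_state
# Reads a /v1/sys/health JSON body on stdin and prints one of:
#   uninitialized | sealed | unsealed | unknown
vault_state() {
  local body
  body=$(cat)
  case $body in
    *'"initialized":false'*) echo "uninitialized" ;;  # checked first
    *'"sealed":true'*)       echo "sealed" ;;
    *'"sealed":false'*)      echo "unsealed" ;;
    *)                       echo "unknown" ;;        # empty or unexpected body
  esac
}
```

A possible invocation is `sudo incus exec vault -- curl -k -s https://localhost:8200/v1/sys/health | vault_state`, where `-s` without `-f` keeps the body of a 503 response.
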
### init.json (Vault Initialization Data)

**Location:** `/opt/gbo/bin/botserver-stack/conf/vault/vault-conf/init.json`

**Purpose:** Stores Vault unseal keys and root token (created during Vault initialization)

**Contents:**
```json
{
  "recovery_keys_b64": [],
  "recovery_keys_hex": [],
  "recovery_keys_shares": 0,
  "recovery_keys_threshold": 0,
  "root_token": "<vault-token>",
  "unseal_keys_b64": ["<5 unseal keys base64-encoded>"],
  "unseal_keys_hex": ["<5 unseal keys hex-encoded>"],
  "unseal_shares": 5,
  "unseal_threshold": 3
}
```

**Security Notes:**
- `root_token`: Used to authenticate to Vault as admin
- `unseal_keys`: Required to unseal Vault after restart (5 keys, need 3 to unseal)
- **CRITICAL:** Store `init.json` in a secure, encrypted location
- Never commit `init.json` to git or store in repo

### Troubleshooting Vault Connection

#### Issue 1: Botserver cannot connect to Vault

**Symptoms:**
- Logs show "Vault connection failed"
- Secrets fall back to defaults
- Bot cannot authenticate to database

**Diagnosis:**
```bash
# Check Vault is running
sudo incus exec vault -- systemctl status vault

# Check Vault health
sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health

# Check .env has Vault credentials
sudo incus exec system -- grep "^VAULT_" /opt/gbo/bin/.env

# Test Vault connection from system container
# (no -k here: --cacert verifies the server certificate against the CA)
sudo incus exec system -- bash -c '
  curl -sf --cacert /opt/gbo/conf/system/certificates/ca/ca.crt \
    -H "X-Vault-Token: $(grep VAULT_TOKEN /opt/gbo/bin/.env | cut -d= -f2)" \
    https://10.157.134.250:8200/v1/secret/data/gbo/tables
'
```

**Common Causes:**
1. Vault service not running (vault container stopped)
2. `VAULT_TOKEN` expired or invalid
3. TLS certificate path incorrect or CA certificate missing
4. Network connectivity between system and vault containers

**Fix:**
```bash
# 1. Restart Vault if stopped
sudo incus exec vault -- systemctl restart vault

# 2. Generate new token if expired
sudo incus exec vault -- bash -c '
  export VAULT_ADDR=https://localhost:8200
  export VAULT_TOKEN=<root-token-from-init.json>
  vault token create -policy="botserver" -ttl="8760h" -format=json | jq -r .auth.client_token
'

# 3. Update .env with new token
sudo incus exec system -- sed -i "s|VAULT_TOKEN=.*|VAULT_TOKEN=<new-token>|" /opt/gbo/bin/.env

# 4. Restart botserver
sudo incus exec system -- systemctl restart botserver
```

#### Issue 2: Secrets not being read from Vault

**Symptoms:**
- Logs show "Vault read failed for 'gbo/drive'"
- Services use default credentials
- DriveMonitor cannot access MinIO

**Diagnosis:**
```bash
# Check if Vault has secrets configured
sudo incus exec system -- bash -c '
  export VAULT_ADDR=https://10.157.134.250:8200
  export VAULT_TOKEN=$(grep VAULT_TOKEN /opt/gbo/bin/.env | cut -d= -f2)
  export VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt

  echo "=== Database Secrets ==="
  vault kv get secret/gbo/tables || echo "NOT FOUND"

  echo "=== Drive Secrets ==="
  vault kv get secret/gbo/drive || echo "NOT FOUND"

  echo "=== LLM Secrets ==="
  vault kv get secret/gbo/llm || echo "NOT FOUND"
'
```

**Fix - Adding Secrets to Vault:**
```bash
sudo incus exec vault -- bash -c '
  export VAULT_ADDR=https://localhost:8200
  export VAULT_TOKEN=<root-token>

  # Add database secrets
  vault kv put secret/gbo/tables \
    host=<tables-ip> \
    port=5432 \
    database=botserver \
    username=gbuser \
    password=<secure-password>

  # Add drive (MinIO) secrets
  vault kv put secret/gbo/drive \
    host=<drive-ip> \
    port=9100 \
    accesskey=<minio-access-key> \
    secret=<minio-secret>

  # Add LLM secrets
  vault kv put secret/gbo/llm \
    url=http://localhost:8085 \
    model=gpt-4 \
    openai_key=<openai-api-key> \
    anthropic_key=<anthropic-api-key>
'
```

#### Issue 3: Vault sealed after restart

**Symptoms:**
- All Vault operations fail
- botserver cannot read secrets
- Logs show "Vault is sealed"

**Diagnosis:**
```bash
# No -f here: a sealed Vault answers with HTTP 503, and -f would suppress the body
sudo incus exec vault -- curl -k -s https://localhost:8200/v1/sys/health | jq .sealed
```

**Fix - Unseal Vault:**
```bash
sudo incus exec vault -- bash -c '
  export VAULT_ADDR=https://localhost:8200

  # Need 3 of 5 unseal keys from init.json
  vault operator unseal <key1>
  vault operator unseal <key2>
  vault operator unseal <key3>

  # Verify unsealed
  vault status
'
```

#### Issue 4: TLS certificate errors

**Symptoms:**
- "certificate verify failed" errors
- TLS handshake failures
- curl: (60) SSL certificate problem

**Diagnosis:**
```bash
sudo incus exec system -- bash -c '
  # Check CA certificate exists
  ls -la /opt/gbo/conf/system/certificates/ca/ca.crt

  # Test certificate
  openssl x509 -in /opt/gbo/conf/system/certificates/ca/ca.crt -text -noout
'
```

**Fix:**
```bash
# If the CA cert is missing, copy it from the vault container.
# `incus file` runs on the host, not inside a container.
sudo incus file pull vault/opt/gbo/conf/vault/ca.crt /tmp/ca.crt

sudo incus exec system -- mkdir -p /opt/gbo/conf/system/certificates/ca/
sudo incus file push /tmp/ca.crt system/opt/gbo/conf/system/certificates/ca/ca.crt
sudo incus exec system -- chmod 644 /opt/gbo/conf/system/certificates/ca/ca.crt
```

### Security Best Practices

1. **Never commit secrets to git**
   - No API keys, passwords, tokens in code
   - Use Vault for ALL sensitive data
   - Init secrets from `SecretsManager::from_env()`

2. **Use Vault for all service credentials**
   - Database passwords: `gbo/tables`
   - MinIO keys: `gbo/drive`
   - LLM API keys: `gbo/llm`
   - Email passwords: `gbo/email`

3. **Rotate credentials regularly**
   - Generate new tokens/keys periodically
   - Update Vault using `vault kv put`
   - No need to restart services (the next read gets the new values; cached values may persist until `VAULT_CACHE_TTL` expires, 300 s by default)

4. **Enable TLS/mTLS in production**
   - Always use `VAULT_CACERT`
   - Enable mTLS for critical services: `VAULT_CLIENT_CERT` + `VAULT_CLIENT_KEY`
   - Never use `VAULT_SKIP_VERIFY=true` in production

5. **Limit token lifetimes**
   - Root token: single use or very short TTL
   - Service tokens: limited to needed time (e.g., 8760h = 1 year)
   - Generate new tokens when old ones expire

6. **Audit Vault access**
   ```bash
   # List the enabled audit devices
   sudo incus exec vault -- vault audit list
   # Enable a file audit device if none is listed
   sudo incus exec vault -- vault audit enable file file_path=/var/log/vault_audit.log
   ```

### Vault Backup & Recovery
|
|
|
|
#### Backup Vault Data
|
|
|
|
```bash
|
|
# Snapshot vault container (includes all secrets)
|
|
sudo incus snapshot create vault backup-$(date +%Y%m%d-%H%M)
|
|
|
|
# Export Vault config (init.json with unseal keys)
|
|
sudo incus exec vault -- cat /opt/gbo/bin/botserver-stack/conf/vault/vault-conf/init.json > /tmp/vault-init.json
|
|
|
|
# Backup all secrets (JSON format)
|
|
sudo incus exec vault -- bash -c '
|
|
export VAULT_ADDR=https://localhost:8200
|
|
export VAULT_TOKEN=<root-token>
|
|
|
|
# Backup each path
|
|
for path in gbo/tables gbo/drive gbo/cache gbo/llm; do
  # Replace "/" with "-" so the file lands directly in /tmp (e.g. /tmp/vault-gbo-tables.json)
  vault kv get -format=json "secret/$path" > "/tmp/vault-${path//\//-}.json"
done
|
|
'
|
|
```
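
A backup is only useful if the exported files actually contain data. As a minimal sketch, the helper below reports any expected backup file that is missing or empty. The name `verify_backups` and the file names in the usage comment are assumptions for illustration; adjust them to whatever names the export step really produced.

```bash
#!/usr/bin/env bash
# Print every expected backup file that is missing or empty.
# Returns non-zero if at least one file fails the check.
# Usage: verify_backups <directory> <file> [file...]
verify_backups() {
  local dir="$1"; shift
  local rc=0 f
  for f in "$@"; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, run where the backup files were written (hypothetical file names):
#   verify_backups /tmp vault-init.json vault-gbo-tables.json
```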
|
|
|
|
#### Restore from Snapshot
|
|
|
|
```bash
|
|
# Stop vault
|
|
sudo incus exec vault -- systemctl stop vault
|
|
|
|
# Restore snapshot
|
|
sudo incus snapshot restore vault <snapshot-name>
|
|
|
|
# Start vault
|
|
sudo incus exec vault -- systemctl start vault
|
|
|
|
# Wait for Vault to be ready
|
|
sleep 10
|
|
|
|
# Verify health
|
|
sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health
|
|
```
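
A fixed `sleep 10` is either too long or too short depending on the host. One alternative, sketched here under the assumption that a simple retry loop is acceptable, is to poll until the check succeeds. The helper name `wait_for` is hypothetical.

```bash
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Success ends the loop immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage: wait up to about 60 seconds for the Vault service to become active
#   wait_for 30 2 sudo incus exec vault -- systemctl is-active --quiet vault
```

The same helper can wrap the health check, keeping in mind that a sealed Vault answers with HTTP 503, so `curl -f` keeps failing until Vault is unsealed.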
|
|
|
|
## Troubleshooting
|
|
|
|
### GLIBC Version Mismatch
|
|
|
|
**Symptom**: `GLIBC_2.39 not found` or `GLIBC_2.38 not found`
|
|
|
|
**Cause**: Binary compiled on CI runner (glibc 2.41) but runs in system container (glibc 2.36)
|
|
|
|
**Fix**: The CI workflow must build inside the system container. Check that `botserver.yaml` uses SSH to run the build in the container.
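
To confirm a mismatch before changing the workflow, compare the highest glibc version the binary requires with the version installed in the container. The helpers below are a sketch; the names `max_glibc_required` and `version_le` are hypothetical, and the usage assumes `objdump` (binutils) is available in the container.

```bash
#!/usr/bin/env bash
# Read symbol-version strings on stdin and print the highest GLIBC_x.y found.
max_glibc_required() {
  grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -uV | tail -n 1
}

# Succeed when version $1 is less than or equal to version $2.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage:
#   need=$(sudo incus exec system -- objdump -T /opt/gbo/bin/botserver | max_glibc_required)
#   version_le "$need" 2.36 || echo "binary needs glibc $need, container has 2.36"
```

`sort -V` compares version numbers component by component, so 2.4 sorts before 2.17, which a plain string comparison would get wrong.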
|
|
|
|
### botserver Not Starting
|
|
|
|
```bash
|
|
# Check binary
|
|
sudo incus exec system -- ldd /opt/gbo/bin/botserver | grep "not found"
|
|
|
|
# Check direct execution (diagnostic only; this bypasses systemd and its environment file)
|
|
sudo incus exec system -- timeout 10 /opt/gbo/bin/botserver 2>&1
|
|
|
|
# Check data directory
|
|
sudo incus exec system -- ls -la /opt/gbo/data/
|
|
```
|
|
|
|
### botui Can't Reach botserver
|
|
|
|
```bash
|
|
# Check BOTSERVER_URL
|
|
sudo incus exec system -- grep BOTSERVER_URL /etc/systemd/system/ui.service
|
|
|
|
# Must be http://localhost:5858, NOT https://system.example.com
|
|
# Fix:
|
|
sudo incus exec system -- sed -i 's|BOTSERVER_URL=.*|BOTSERVER_URL=http://localhost:5858|' /etc/systemd/system/ui.service
|
|
sudo incus exec system -- systemctl daemon-reload
|
|
sudo incus exec system -- systemctl restart ui
|
|
```
|
|
|
|
### Suggestions Not Showing
|
|
|
|
```bash
|
|
# Check bot files exist
|
|
sudo incus exec system -- ls -la /opt/gbo/data/<bot>.gbai/<bot>.gbdialog/
|
|
|
|
# Check for compilation errors
|
|
sudo incus exec system -- tail -50 /opt/gbo/logs/out.log | grep -i "error\|fail\|compile"
|
|
|
|
# Clear cache and restart
|
|
sudo incus exec system -- find /opt/gbo/work -name "*.ast" -delete
|
|
sudo incus exec system -- systemctl restart botserver
|
|
```
|
|
|
|
### IPv6 DNS Issues
|
|
|
|
**Symptom**: External API calls (Groq, Cloudflare) time out

**Cause**: Container DNS returns AAAA records, but the container has no IPv6 connectivity

**Fix**: The container is set up with `IPV6=no` in its network config and with `gai.conf` labels; confirm both are still in place. If timeouts persist, check that `RES_OPTIONS=inet4` is set in `botserver.service`.
|
|
|
|
### Vault Connection & Service Discovery Issues
|
|
|
|
**Symptom**: Logs show `Failed to read data directory <development-path>` or `Config scan failed`
|
|
|
|
**Cause**: Botserver is using hardcoded development paths instead of production paths
|
|
|
|
**Fix**:
|
|
|
|
1. **Check current configuration**:
|
|
```bash
|
|
# Check .env file
|
|
sudo incus exec system -- cat /opt/gbo/bin/.env
|
|
|
|
# Check data directory
|
|
sudo incus exec system -- ls -la /opt/gbo/data/
|
|
sudo incus exec system -- ls -la /opt/gbo/work/
|
|
```
|
|
|
|
2. **Verify Vault connection**:
|
|
```bash
|
|
# Test Vault from system container
|
|
sudo incus exec system -- curl -k -sf https://<vault-ip>:8200/v1/sys/health
|
|
|
|
# Check Vault token
|
|
sudo incus exec system -- grep VAULT_TOKEN /opt/gbo/bin/.env
|
|
```
|
|
|
|
3. **Check service discovery**:
|
|
```bash
|
|
# Check if botserver is reading Vault secrets
|
|
sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep -i vault
|
|
|
|
# Check for service configuration errors
|
|
sudo incus exec system -- tail -100 /opt/gbo/logs/err.log | grep -i "config\|service"
|
|
```
|
|
|
|
4. **Fix data directory paths**:
|
|
- Ensure botserver uses `/opt/gbo/data/` instead of development paths
|
|
- Update configuration if hardcoded paths exist
|
|
- Restart botserver after fixing
|
|
|
|
5. **Verify all services are accessible**:
|
|
```bash
|
|
# Check PostgreSQL
|
|
sudo incus exec system -- pg_isready -h <database-ip> -p 5432
|
|
|
|
# Check Valkey
|
|
sudo incus exec system -- redis-cli -h <cache-ip> -a <password> ping
|
|
|
|
# Check MinIO
|
|
sudo incus exec system -- curl -sf http://<storage-ip>:9100/minio/health/live
|
|
```
|
|
|
|
6. **Update botserver configuration**:
|
|
- Ensure botserver reads from `/opt/gbo/bin/.env` for Vault configuration
|
|
- Verify service discovery uses Vault to get service endpoints
|
|
- Check that data directory is set to `/opt/gbo/data/` in configuration
|
|
- Update systemd service if needed:
|
|
```bash
|
|
sudo incus exec system -- cat /etc/systemd/system/botserver.service
|
|
# Ensure EnvironmentFile=/opt/gbo/bin/.env is present
|
|
```
|
|
|
|
7. **Test after fixes**:
|
|
```bash
|
|
# Restart botserver
|
|
sudo incus exec system -- systemctl restart botserver
|
|
|
|
# Wait for startup
|
|
sleep 10
|
|
|
|
# Check logs for errors
|
|
sudo incus exec system -- tail -50 /opt/gbo/logs/err.log
|
|
|
|
# Verify health endpoint
|
|
curl -sf http://<main-server-ip>:5858/health
|
|
```
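
Several of the steps above depend on `/opt/gbo/bin/.env` containing the Vault settings. As a sketch, the helper below lists required keys that are absent or empty in an env file. The name `missing_env_keys` is hypothetical, and the list of required keys is an assumption based on the variables this guide mentions.

```bash
#!/usr/bin/env bash
# Print each required key that has no non-empty KEY=value line in the env file.
# Returns non-zero if any key is missing.
# Usage: missing_env_keys <file> <KEY> [KEY...]
missing_env_keys() {
  local file="$1"; shift
  local key rc=0
  for key in "$@"; do
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "$key"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, on a copy of the file pulled to the host (never commit it):
#   sudo incus file pull system/opt/gbo/bin/.env /tmp/botserver.env
#   missing_env_keys /tmp/botserver.env VAULT_ADDR VAULT_TOKEN VAULT_CACERT
```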
|
|
|
|
### Vault Connection Errors
|
|
|
|
**Symptom**: `Vault connection failed` or `Vault token invalid`
|
|
|
|
**Fix**:
|
|
```bash
|
|
# Check Vault is running
|
|
sudo incus exec vault -- systemctl status vault
|
|
|
|
# Check Vault health
|
|
sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health
|
|
|
|
# Verify token is valid
|
|
sudo incus exec system -- bash -c '
|
|
export VAULT_ADDR=https://<vault-ip>:8200
|
|
export VAULT_TOKEN=<vault_token>
|
|
export VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt
|
|
vault token lookup
|
|
'
|
|
|
|
# If token is invalid, generate new one
|
|
sudo incus exec vault -- bash -c '
|
|
export VAULT_ADDR=https://localhost:8200
|
|
export VAULT_TOKEN=<root_token>
|
|
vault token create -policy="botserver" -ttl="8760h"
|
|
'
|
|
|
|
# Update .env with new token
|
|
sudo incus exec system -- sed -i 's|VAULT_TOKEN=.*|VAULT_TOKEN=<new_token>|' /opt/gbo/bin/.env
|
|
sudo incus exec system -- systemctl restart botserver
|
|
```
|
|
|
|
### Service Discovery Failures
|
|
|
|
**Symptom**: `Service not found` or `Failed to connect to service`
|
|
|
|
**Fix**:
|
|
```bash
|
|
# Check if service is running
|
|
sudo incus exec tables -- systemctl status postgresql
|
|
sudo incus exec cache -- systemctl status valkey
|
|
sudo incus exec drive -- systemctl status minio
|
|
|
|
# Check if service is accessible from system container
|
|
sudo incus exec system -- nc -zv <database-ip> 5432 # PostgreSQL
|
|
sudo incus exec system -- nc -zv <cache-ip> 6379 # Valkey
|
|
sudo incus exec system -- nc -zv <storage-ip> 9100 # MinIO
|
|
|
|
# Check Vault has service configuration
|
|
sudo incus exec system -- bash -c '
|
|
export VAULT_ADDR=https://<vault-ip>:8200
|
|
export VAULT_TOKEN=<vault_token>
|
|
export VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt
|
|
vault kv list secret/gbo
|
|
'
|
|
|
|
# If service config is missing, add it (see Vault Configuration section)
|
|
```
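
If `nc` is not installed in the container, bash's built-in `/dev/tcp` redirection can perform the same reachability check. The sketch below assumes bash and coreutils `timeout` are present; the function names `port_open` and `check_endpoints` are hypothetical.

```bash
#!/usr/bin/env bash
# Succeed if a TCP connection to host:port opens within the timeout (default 3 s).
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  # $0 and $1 inside the inner script are the host and port passed after it.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Print one status line per "host:port" argument.
# Returns non-zero if any endpoint is unreachable.
check_endpoints() {
  local ep rc=0
  for ep in "$@"; do
    if port_open "${ep%:*}" "${ep##*:}"; then
      echo "ok   $ep"
    else
      echo "FAIL $ep"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, inside the system container after defining the functions there:
#   check_endpoints <database-ip>:5432 <cache-ip>:6379 <storage-ip>:9100
```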
|
|
|
|
### Monitoring & Verification
|
|
|
|
**Check botserver is working correctly**:
|
|
```bash
|
|
# Health check
|
|
curl -sf http://<main-server-ip>:5858/health
|
|
|
|
# Check logs for errors
|
|
sudo incus exec system -- tail -100 /opt/gbo/logs/err.log | grep -i "error\|fail"
|
|
|
|
# Check logs for successful service connections
|
|
sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep -i "connected\|service\|vault"
|
|
|
|
# Verify data directory is correct
|
|
sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep -i "data\|work"
|
|
|
|
# Should show /opt/gbo/data/ and /opt/gbo/work/, not development paths
|
|
```
|
|
|
|
**Expected log output**:
```
info vault:Connected to Vault at https://<vault-ip>:8200
info service_discovery:Loaded service configuration from Vault
info database:Connected to PostgreSQL at <database-ip>:5432
info cache:Connected to Valkey at <cache-ip>:6379
info storage:Connected to MinIO at http://<storage-ip>:9100
info watcher:Watching data directory /opt/gbo/data
info botserver:BotServer started successfully on port 5858
```
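
Scanning the log by eye is error-prone, so the expected lines can be checked mechanically. The sketch below looks for a fixed phrase from each expected line and prints the ones that are absent. The name `missing_log_markers` is hypothetical, and the phrases are taken from the expected output above; real log wording may differ, so treat the list as an assumption.

```bash
#!/usr/bin/env bash
# Print every expected startup marker that does not appear in the log file.
# Returns non-zero if any marker is missing.
missing_log_markers() {
  local log="$1" marker rc=0
  local markers=(
    "Connected to Vault"
    "Loaded service configuration from Vault"
    "Connected to PostgreSQL"
    "Connected to Valkey"
    "Connected to MinIO"
    "Watching data directory /opt/gbo/data"
    "BotServer started successfully"
  )
  for marker in "${markers[@]}"; do
    # -F matches the marker as a fixed string, not a regular expression.
    if ! grep -Fq "$marker" "$log"; then
      echo "$marker"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, on a copy of the log pulled to the host:
#   sudo incus file pull system/opt/gbo/logs/out.log /tmp/out.log
#   missing_log_markers /tmp/out.log
```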
|
|
|
|
**If logs show errors**:
1. Check the Vault connection (see the Vault Connection Errors section)
2. Check service accessibility (see the Service Discovery Failures section)
3. Fix data directory paths (see step 4 of the Vault Connection & Service Discovery Issues section)
4. Restart botserver and verify again
|
|
|
|
### Vault Backup & Restore
|
|
|
|
**Create Vault snapshot**:
|
|
```bash
|
|
# Stop Vault
|
|
sudo incus exec vault -- systemctl stop vault
|
|
|
|
# Create snapshot
|
|
sudo incus snapshot create vault manual-$(date +%Y-%m-%d-%H%M)
|
|
|
|
# Start Vault
|
|
sudo incus exec vault -- systemctl start vault
|
|
|
|
# Verify
|
|
sudo incus snapshot list vault
|
|
```
|
|
|
|
**Restore Vault from snapshot**:
|
|
```bash
|
|
# Stop Vault
|
|
sudo incus exec vault -- systemctl stop vault
|
|
|
|
# List snapshots
|
|
sudo incus snapshot list vault
|
|
|
|
# Restore from latest snapshot
|
|
sudo incus snapshot restore vault <snapshot-name>
|
|
|
|
# Start Vault
|
|
sudo incus exec vault -- systemctl start vault
|
|
|
|
# Verify Vault is running
|
|
sudo incus exec vault -- systemctl status vault
|
|
sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health
|
|
```
|
|
|
|
**Automated snapshots**:
|
|
```bash
|
|
# Create cron job for daily snapshots
|
|
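# Install the script on the host, where the incus CLI runs
sudo tee /etc/cron.daily/vault-snapshot > /dev/null << 'EOF'
#!/bin/bash
incus exec vault -- systemctl stop vault
incus snapshot create vault "daily-$(date +%Y%m%d)"
incus exec vault -- systemctl start vault
# Vault comes back sealed after a restart unless auto-unseal is configured
EOF
sudo chmod +x /etc/cron.daily/vault-snapshot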
|
|
```
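
Daily snapshots accumulate, so some form of pruning is usually needed. The sketch below only selects which `daily-YYYYMMDD` snapshot names are older than a cutoff date; it deletes nothing. The name `snapshots_older_than` is hypothetical, and the deletion command in the usage comment is an assumption to verify against the installed incus version.

```bash
#!/usr/bin/env bash
# Read snapshot names on stdin (one per line) and print the daily-YYYYMMDD
# names dated strictly before the cutoff (also YYYYMMDD). Other names are ignored.
snapshots_older_than() {
  local cutoff="$1" name stamp
  while IFS= read -r name; do
    case "$name" in
      daily-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
        stamp="${name#daily-}"
        if [ "$stamp" -lt "$cutoff" ]; then
          echo "$name"
        fi
        ;;
    esac
  done
}

# Usage (GNU date; names.txt holds snapshot names taken from `incus snapshot list vault`):
#   cutoff=$(date -d '14 days ago' +%Y%m%d)
#   snapshots_older_than "$cutoff" < names.txt
#   # then, after reviewing the list: sudo incus snapshot delete vault <name>
```

Manual snapshots such as `manual-2024-01-01-1200` do not match the pattern and are never selected.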
|
|
|
|
### Update Botserver for Production
|
|
|
|
**Required changes in botserver code**:

1. **Read configuration from Vault**:
   - Add Vault client initialization
   - Read service endpoints from Vault
   - Read secrets from Vault
   - Fallback to environment variables if Vault is unavailable

2. **Use production paths**:
   - Remove hardcoded development paths
   - Use environment variables for data directory
   - Default to `/opt/gbo/data/` for production
|
|
|
|
3. **Update .env file**:
|
|
```bash
|
|
# /opt/gbo/bin/.env
|
|
VAULT_ADDR=https://<vault-ip>:8200
|
|
VAULT_TOKEN=<vault_token>
|
|
VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt
|
|
DATA_DIR=/opt/gbo/data/
|
|
WORK_DIR=/opt/gbo/work/
|
|
PORT=5858
|
|
```
|
|
|
|
4. **Update systemd service**:
|
|
```bash
|
|
sudo incus exec system -- tee /etc/systemd/system/botserver.service > /dev/null << 'EOF'
|
|
[Unit]
|
|
Description=BotServer Service
|
|
After=network.target
|
|
|
|
[Service]
|
|
User=root
|
|
Group=root
|
|
WorkingDirectory=/opt/gbo/bin
|
|
EnvironmentFile=/opt/gbo/bin/.env
|
|
ExecStart=/opt/gbo/bin/botserver --noconsole
|
|
Restart=always
|
|
RestartSec=5
|
|
StandardOutput=append:/opt/gbo/logs/out.log
|
|
StandardError=append:/opt/gbo/logs/err.log
|
|
|
|
[Install]
|
|
WantedBy=multi-user.target
|
|
EOF
|
|
|
|
sudo incus exec system -- systemctl daemon-reload
|
|
sudo incus exec system -- systemctl restart botserver
|
|
```
|
|
|
|
5. **Deploy updated botserver**:
|
|
```bash
|
|
# Push changes to ALM
|
|
cd botserver && git push alm main && git push origin main
|
|
|
|
# CI will build and deploy automatically
|
|
# Or manually deploy (see Manual Deploy section)
|
|
```
|
|
|
|
## Security

- **NEVER** push secrets to git
- **NEVER** commit files containing credentials to the repository root
- **Vault** is the single source of truth for secrets
- **CI/CD** is the only deployment method; never manually scp binaries
- **ALM** is production; ask before pushing