diff --git a/.goutputstream-0HCON3 b/.goutputstream-0HCON3 new file mode 100644 index 0000000..5aaec64 Binary files /dev/null and b/.goutputstream-0HCON3 differ diff --git a/AGENTS.md b/AGENTS.md index e716dd4..fab794d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -6,6 +6,39 @@ - If in trouble with a tool, go to the official website for install instructions. - See `botserver/src/drive/local_file_monitor.rs` to load bots from `/opt/gbo/data`. +## 🚨 CRITICAL PRODUCTION RULES + +### NEVER Start Services Directly in Production + +When working with the production environment (63.141.255.9), **NEVER** start botserver or botui directly: + +```bash +# ❌ NEVER DO THIS IN PRODUCTION: +sudo incus exec system -- /opt/gbo/bin/botserver # Wrong +sudo incus exec system -- ./botserver # Wrong +sudo incus exec system -- /opt/gbo/bin/botserver & # Wrong + +# ✅ ALWAYS USE THIS: +sudo incus exec system -- systemctl start botserver +sudo incus exec system -- systemctl restart botserver +sudo incus exec system -- systemctl stop botserver +sudo incus exec system -- systemctl status botserver +``` + +**Why:** +- `systemctl` loads `/opt/gbo/bin/.env` (Vault credentials, paths, etc.) 
+- Direct execution skips environment variables → Vault connection fails → services break +- `systemctl` manages auto-restart, logging, and process lifecycle + +### Development vs Production + +| Environment | Start Method | Env File | +|-------------|--------------|-----------| +| **Development** | `cargo run` or `./target/debug/botserver` | `botserver/.env` | +| **Production** | `systemctl start botserver` | `/opt/gbo/bin/.env` | + +**Only use `cargo run` or direct execution in development!** + --- ## 📁 Workspace Structure diff --git a/PROD.md b/PROD.md new file mode 100644 index 0000000..5f7174a --- /dev/null +++ b/PROD.md @@ -0,0 +1,1336 @@ +# Production Environment Guide + +## ⚠️ CRITICAL PRODUCTION RULES + +**READ THIS FIRST:** + +### 🚫 NEVER Start Services Directly + +In production, **NEVER** start botserver or botui directly. Always use `systemctl`: + +```bash +# ❌ NEVER DO THIS IN PRODUCTION: +/opt/gbo/bin/botserver # Wrong +./botserver # Wrong +/opt/gbo/bin/botserver & # Wrong + +# ✅ ALWAYS USE THIS: +sudo incus exec system -- systemctl start botserver +sudo incus exec system -- systemctl restart botserver +sudo incus exec system -- systemctl stop botserver +sudo incus exec system -- systemctl status botserver +``` + +**Why:** +- `systemctl` loads `/opt/gbo/bin/.env` (Vault credentials, paths, etc.) 
+- Direct execution skips environment variables → services fail +- `systemctl` manages auto-restart, logging, and dependencies + +### 🔐 Security Rules + +- **NEVER** push secrets to git (API keys, passwords, tokens) +- **NEVER** commit `init.json` (Vault unseal keys) +- **ALWAYS** use Vault for secrets (see [Vault Security Architecture](#vault-security-architecture)) +- **ONLY** `VAULT_*` environment variables allowed in `.env` + +### 🚢 Deployment Rules + +- **NEVER** deploy manually (scp, ssh copy) — use CI/CD only +- **NEVER** push to ALM without asking first +- **ALWAYS** push ALL submodules (botserver, botui, botlib) when pushing main repo +- **ALWAYS** use `systemctl` to restart services after deployment + +--- + +## Infrastructure + +### Servers + +| Host | IP | Purpose | +|------|-----|---------| +| `system` | `` | Main botserver + botui container | +| `alm-ci` | `` | CI/CD runner (Forgejo Actions) | +| `drive` | `` | Object storage | +| `monitor` | `` | Monitoring service | + +### Port Mapping (system container) + +| Service | Internal Port | External URL | +|---------|--------------|--------------| +| botserver | `5858` | `https://system.example.com` | +| botui | `5859` | `https://chat.example.com` | + +### Access + +```bash +# SSH to host +ssh admin@ + +# Execute inside system container +sudo incus exec system -- bash -c 'command' + +# SSH from host to container (used by CI) +ssh -o StrictHostKeyChecking=no system "command" +``` + +## Services + +### botserver.service + +- **Binary**: `/opt/gbo/bin/botserver` +- **Port**: `5858` +- **User**: `gbuser` +- **Logs**: `/opt/gbo/logs/out.log`, `/opt/gbo/logs/err.log` +- **Config**: `/etc/systemd/system/botserver.service` +- **Env**: `PORT=5858` + +### ui.service + +- **Binary**: `/opt/gbo/bin/botui` +- **Port**: `5859` +- **Config**: `/etc/systemd/system/ui.service` +- **Env**: `BOTSERVER_URL=http://localhost:5858` + - ⚠️ MUST be `http://localhost:5858` — NOT `https://system.example.com` + - Rust proxy runs 
server-side, needs direct localhost access + - JS client uses relative URLs through `chat.example.com` + +### Data Directory + +- **Path**: `/opt/gbo/data/` +- **Structure**: `.gbai/.gbdialog/*.bas` +- **Work dir**: `/opt/gbo/work/` (compiled .ast cache) + +### Stack Services (managed by botserver bootstrap) + +- **Vault**: Secrets management +- **PostgreSQL**: Database (port 5432) +- **Valkey**: Cache (port 6379, password auth) +- **MinIO**: Object storage +- **Zitadel**: Identity provider +- **LLM**: llama.cpp + +## CI/CD Pipeline + +### Repositories + +| Repo | ALM URL | GitHub URL | +|------|---------|------------| +| gb | `https://alm.example.com/organization/gb.git` | `git@github.com:organization/gb.git` | +| botserver | `https://alm.example.com/organization/BotServer.git` | `git@github.com:organization/botserver.git` | +| botui | `https://alm.example.com/organization/BotUI.git` | `git@github.com:organization/botui.git` | +| botlib | `https://alm.example.com/organization/botlib.git` | `git@github.com:organization/botlib.git` | + +### Push Order + +```bash +# 1. Push submodules first +cd botserver && git push alm main && git push origin main && cd .. +cd botui && git push alm main && git push origin main && cd .. + +# 2. Update root workspace references +git add botserver botui botlib +git commit -m "Update submodules: " +git push alm main && git push origin main +``` + +### Build Environment + +- **CI runner**: `ci-runner` container (Debian Trixie, glibc 2.41) +- **Target**: `system` container (Debian 12 Bookworm, glibc 2.36) +- **⚠️ GLIBC MISMATCH**: Building on CI runner produces binaries incompatible with system container +- **Solution**: CI workflow transfers source to system container and builds there via SSH + +### Workflow File + +- **Location**: `botserver/.forgejo/workflows/botserver.yaml` +- **Triggers**: Push to `main` branch +- **Steps**: + 1. Setup workspace on CI runner (clone repos) + 2. Transfer source to system container via `tar | ssh` + 3. 
Build inside system container (matches glibc 2.36) + 4. Deploy binary inside container + 5. Verify botserver is running + +## Common Operations + +### Check Service Status + +```bash +# From host +sudo incus exec system -- systemctl status botserver --no-pager +sudo incus exec system -- systemctl status ui --no-pager + +# Check if running +sudo incus exec system -- pgrep -f botserver +sudo incus exec system -- pgrep -f botui +``` + +### View Logs + +```bash +# Systemd journal +sudo incus exec system -- journalctl -u botserver --no-pager -n 50 +sudo incus exec system -- journalctl -u ui --no-pager -n 50 + +# Application logs +sudo incus exec system -- tail -50 /opt/gbo/logs/out.log +sudo incus exec system -- tail -50 /opt/gbo/logs/err.log + +# Live tail +sudo incus exec system -- tail -f /opt/gbo/logs/out.log +``` + +### Restart Services + +**CRITICAL PRODUCTION RULE:** In production, NEVER start botserver or botui directly. Always use `systemctl` to ensure proper initialization, environment loading, and logging. 
+ +```bash +sudo incus exec system -- systemctl restart botserver +sudo incus exec system -- systemctl restart ui +``` + +**PROHIBITED in production:** +```bash +# ❌ NEVER DO THIS IN PRODUCTION: +sudo incus exec system -- /opt/gbo/bin/botserver # Wrong - no systemd integration +sudo incus exec system -- /opt/gbo/bin/botserver & # Wrong - no service management +sudo incus exec system -- cd /opt/gbo/bin && ./botserver # Wrong - missing env vars + +# ✅ CORRECT - Always use systemctl: +sudo incus exec system -- systemctl start botserver +sudo incus exec system -- systemctl restart botserver +sudo incus exec system -- systemctl stop botserver +sudo incus exec system -- systemctl status botserver +``` + +**Why:** +- `systemctl` loads `/opt/gbo/bin/.env` (via `EnvironmentFile` in service definition) +- `systemctl` manages process lifecycle, auto-restart, and dependencies +- `systemctl` sends logs to `/opt/gbo/logs/out.log` and `/opt/gbo/logs/err.log` +- Direct execution skips environment variables and systemd service configuration + +### Manual Deploy (emergency) + +```bash +# Kill old process +sudo incus exec system -- killall botserver + +# Copy binary (from host CI workspace or local) +sudo incus exec system -- cp /opt/gbo/ci/botserver/target/debug/botserver /opt/gbo/bin/botserver +sudo incus exec system -- chmod +x /opt/gbo/bin/botserver +sudo incus exec system -- chown gbuser:gbuser /opt/gbo/bin/botserver + +# Start service +sudo incus exec system -- systemctl start botserver +``` + +### Transfer Bot Files to Production + +```bash +# From local to prod host +tar czf /tmp/bots.tar.gz -C /opt/gbo/data .gbai +scp /tmp/bots.tar.gz admin@:/tmp/ + +# From host to container +sudo incus exec system -- bash -c 'tar xzf /tmp/bots.tar.gz -C /opt/gbo/data/' + +# Clear compiled cache +sudo incus exec system -- find /opt/gbo/data -name "*.ast" -delete +sudo incus exec system -- find /opt/gbo/work -name "*.ast" -delete +``` + +### Snapshots + +```bash +# List snapshots +sudo incus 
snapshot list system + +# Restore snapshot +sudo incus snapshot restore system +``` + +## DriveMonitor & Bot Configuration Sync + +### DriveMonitor Architecture + +DriveMonitor is a background service that synchronizes bot files from MinIO (S3-compatible storage) to the local filesystem and database. It monitors three directories per bot: + +| Directory | Purpose | Sync Behavior | +|-----------|---------|---------------| +| `{bot}.gbai/{bot}.gbdialog/` | BASIC scripts (.bas) | Downloads and compiles on change | +| `{bot}.gbai/{bot}.gbot/` | Configuration files | Syncs to `bot_configuration` table | +| `{bot}.gbkb/` | Knowledge base documents | Downloads and indexes for vector search | + +### Bot Configuration Database Tables + +#### `bot_configuration` (main config table) +```sql +-- Location: botserver database +SELECT * FROM bot_configuration WHERE bot_id = ''; + +-- Key columns: +-- - bot_id: Bot UUID (link to bots table) +-- - config_key: Configuration key (e.g., "llm-provider", "system-prompt") +-- - config_value: Configuration value +-- - config_type: Type (string, boolean, number) +-- - is_encrypted: Whether value is encrypted +-- - updated_at: Last modification timestamp +``` + +#### `gbot_config_sync` (sync tracking table) +```sql +-- Location: botserver database +-- Tracks config.csv sync status from bucket +SELECT * FROM gbot_config_sync g + JOIN bots b ON g.bot_id = b.id + WHERE b.name = 'salesianos'; + +-- Key columns: +-- - bot_id: Bot UUID +-- - config_file_path: Path to config.csv in bucket +-- - last_sync_at: Timestamp of last successful sync +-- - file_hash: ETag/MD5 of synced file +-- - sync_count: Number of times synced +``` + +### config.csv Sync Process + +**File Locations:** +- Source: `{bot}.gbai/{bot}.gbot/config.csv` in MinIO bucket +- Sync method: DriveMonitor → ConfigManager → `bot_configuration` table +- Sync frequency: Every 10 seconds (DriveMonitor periodic check) + +**Sync Trigger Conditions:** +1. File ETag changes in MinIO +2. 
Initial DriveMonitor startup +3. Manual botserver restart + +**CSV Format:** +```csv +llm-provider,groq +llm-api-key,sk-xxx +llm-url,http://localhost:8085 +system-prompt-file,PROMPT.md +theme-color1,#cc0000 +theme-title,MyBot +whatsapp-id,botname +``` + +### Checking Bot Configuration Status + +#### Method 1: Query bot_configuration table +```bash +# Get all config for a bot +sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c " + SELECT b.name, bc.config_key, bc.config_value, bc.updated_at + FROM bot_configuration bc + JOIN bots b ON bc.bot_id = b.id + WHERE b.name = 'salesianos' + ORDER BY bc.config_key; +" + +# Get specific LLM provider config +sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c " + SELECT config_key, config_value, updated_at + FROM bot_configuration + WHERE bot_id = ( + SELECT id FROM bots WHERE name = 'salesianos' + ) + AND config_key LIKE 'llm-%' + ORDER BY config_key; +" +``` + +#### Method 2: Check DriveMonitor sync status +```bash +# Check if config.csv has been synced +sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c " + SELECT b.name, gcs.last_sync_at, gcs.sync_count, gcs.config_file_path + FROM gbot_config_sync gcs + JOIN bots b ON gcs.bot_id = b.id + WHERE b.name IN ('salesianos', 'default'); +" + +# Empty result = DriveMonitor hasn't synced config.csv yet +# If sync_count = 0, config.csv exists but hasn't been processed +``` + +#### Method 3: Direct MinIO inspection +```bash +# Check if config.csv exists in bucket +sudo incus exec drive -- /opt/gbo/bin/mc ls local/salesianos.gbai/salesianos.gbot/ + +# View config.csv contents +sudo incus exec drive -- /opt/gbo/bin/mc cat local/salesianos.gbai/salesianos.gbot/config.csv + +# Check file ETag (for sync comparison) +sudo incus exec drive -- /opt/gbo/bin/mc stat local/salesianos.gbai/salesianos.gbot/config.csv +``` + +### DriveMonitor Debugging Logs + +#### Key log patterns to monitor +```bash +# Monitor DriveMonitor activity 
in real-time +sudo incus exec system -- tail -f /opt/gbo/logs/out.log | grep -E "(DRIVE_MONITOR|check_gbot|config)" + +# Check for config.csv sync attempts +sudo incus exec system -- grep "check_gbot" /opt/gbo/logs/out.log | tail -20 + +# Check for config synchronization +sudo incus exec system -- grep "sync_gbot_config" /opt/gbo/logs/out.log | tail -20 + +# Check for DriveMonitor errors +sudo incus exec system -- grep -i "drive.*error" /opt/gbo/logs/err.log | tail -20 +``` + +#### Expected successful sync logs +``` +check_gbot: Checking bucket salesianos.gbai for config.csv changes +check_gbot: Found config.csv at path: salesianos.gbai/salesianos.gbot/config.csv +info config:Synced config.csv for bot - updated 3 keys +``` + +#### Error patterns and meanings +``` +# Config.csv not found in bucket +check_gbot: Config file not found or inaccessible: path/to/config.csv + +# Sync to database failed +error config:Failed to sync_gbot_config: + +# DriveMonitor not running +(no check_gbot logs in out.log) + +# MinIO connection failed +error drive_monitor:S3/MinIO unavailable for bucket +``` + +### Common Issues and Fixes + +#### Issue 1: config.csv not syncing to database + +**Symptoms:** +- `gbot_config_sync` table empty (0 rows) +- LLM provider changes in bucket not reflected in bot behavior +- Database shows old configuration values + +**Diagnosis:** +```bash +# 1. Check if config.csv exists in bucket +sudo incus exec drive -- /opt/gbo/bin/mc ls local/salesianos.gbai/salesianos.gbot/ + +# 2. Check DriveMonitor logs for sync attempts +sudo incus exec system -- grep "check_gbot" /opt/gbo/logs/out.log | tail -10 + +# 3. Check if DriveMonitor is running for the bot +sudo incus exec system -- ps aux | grep botserver +``` + +**Root Causes:** +1. config.csv missing from `{bot}.gbai/{bot}.gbot/` folder +2. DriveMonitor not started for the bot +3. MinIO connection issues +4. 
Database write permissions + +**Fixes:** +```bash +# Case 1: Create missing config.csv +sudo incus exec drive -- bash -c ' +cat > /tmp/config.csv << EOF +llm-provider,groq +llm-api-key,your-api-key +llm-url,http://localhost:8085 +system-prompt-file,PROMPT.md +theme-color1,#cc0000 +theme-title,Salesianos +EOF +/opt/gbo/bin/mc cp /tmp/config.csv local/salesianos.gbai/salesianos.gbot/config.csv +' + +# Case 2: Restart botserver to reinitialize DriveMonitor +sudo incus exec system -- systemctl restart botserver + +# Case 3: Force immediate sync by touching config.csv +sudo incus exec drive -- /opt/gbo/bin/mc cp local/salesianos.gbai/salesianos.gbot/config.csv local/salesianos.gbai/salesianos.gbot/config.csv +``` + +#### Issue 2: LLM provider changes not taking effect + +**Symptoms:** +- config.csv shows correct provider (e.g., groq) +- Bot still uses old provider +- Database shows old value + +**Diagnosis:** +```bash +# Compare bucket vs database +BUCKET_PROVIDER=$(sudo incus exec drive -- /opt/gbo/bin/mc cat local/salesianos.gbai/salesianos.gbot/config.csv | grep "^llm-provider" | cut -d',' -f2) +DB_PROVIDER=$(sudo incus exec tables -- psql -h localhost -U postgres -d botserver -t -c " + SELECT config_value FROM bot_configuration + WHERE bot_id = (SELECT id FROM bots WHERE name = 'salesianos') + AND config_key = 'llm-provider'; +") + +echo "Bucket: $BUCKET_PROVIDER" +echo "Database: $DB_PROVIDER" + +# Check last sync time +sudo incus exec tables -- psql -h localhost -U postgres -d botserver -t -c " + SELECT last_sync_at FROM gbot_config_sync + WHERE bot_id = (SELECT id FROM bots WHERE name = 'salesianos'); +" +``` + +**Fix:** +```bash +# If sync is stale (> 10 minutes), restart DriveMonitor +sudo incus exec system -- systemctl restart botserver + +# Or manually update config value in database (temporary fix) +sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c " + UPDATE bot_configuration + SET config_value = 'groq', updated_at = NOW() + WHERE 
bot_id = (SELECT id FROM bots WHERE name = 'salesianos') + AND config_key = 'llm-provider'; +" +``` + +#### Issue 3: DriveMonitor not checking for changes + +**Symptoms:** +- No new log entries after 30 seconds +- File changes in bucket not detected +- Bot compilation not happening after .bas file updates + +**Diagnosis:** +```bash +# Check DriveMonitor loop logs +sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep "DRIVE_MONITOR.*Inside monitoring loop" + +# Check if is_processing flag is stuck +sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep -E "(is_processing|monitoring loop)" +``` + +**Fix:** +```bash +# Restart botserver to clear stuck state +sudo incus exec system -- systemctl restart botserver + +# Monitor startup logs to verify DriveMonitor started +sudo incus exec system -- tail -50 /opt/gbo/logs/out.log | grep "Drive Monitor" +``` + +### Database Schema Reference + +#### List all bot databases +```bash +sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "\l" | grep bot_ +``` + +#### List tables in a specific bot database +```bash +sudo incus exec tables -- psql -h localhost -U postgres -d bot_salesianos -c "\dt" +``` + +#### List botserver management tables +```bash +sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "\dt" | grep -E "(bot|config|sync)" +``` + +### Connection Methods Summary + +| Method | Use Case | Command Pattern | +|--------|-----------|-----------------| +| **SSH to host** | Initial access, file transfer | `ssh admin@63.141.255.9` | +| **incus exec** | Execute inside container | `sudo incus exec system -- command` | +| **psql direct** | Database queries from container | `sudo incus exec tables -- psql ...` | +| **mc (MinIO CLI)** | Inspect buckets, copy files | `sudo incus exec drive -- /opt/gbo/bin/mc ...` | +| **HTTP/curl** | Service health checks | `curl http://:5858/health` | +| **journalctl** | Systemd service logs | `sudo incus exec system -- journalctl -u 
botserver` | + +## Vault Security Architecture + +### Overview + +The production environment uses **HashiCorp Vault** as the centralized secrets management system. All sensitive credentials (database passwords, API keys, tokens) are stored in Vault, NEVER in code or environment files. + +### Vault Connection Flow + +``` +1. botserver starts + ↓ +2. Reads VAULT_ADDR, VAULT_TOKEN from .env + ↓ +3. Initializes VaultClient with TLS/mTLS + ↓ +4. Reads secrets from Vault paths (gbo/tables, gbo/drive, etc.) + ↓ +5. Falls back to defaults if Vault unavailable +``` + +### Environment Variables (Allowed) + +**File Location:** `/opt/gbo/bin/.env` (system container) + +```bash +# Vault Connection (MANDATORY for production) +VAULT_ADDR=https://:8200 +VAULT_TOKEN= +VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt + +# Optional: Skip TLS verification (NOT recommended for production) +VAULT_SKIP_VERIFY=false + +# Optional: Use mTLS certificates +VAULT_CLIENT_CERT=/opt/gbo/conf/system/certificates/botserver/client.crt +VAULT_CLIENT_KEY=/opt/gbo/conf/system/certificates/botserver/client.key + +# Optional: Cache TTL in seconds (default: 300) +VAULT_CACHE_TTL=300 + +# Server Configuration +PORT=5858 +DATA_DIR=/opt/gbo/data/ +WORK_DIR=/opt/gbo/work/ +LOAD_ONLY=default,salesianos +``` + +**Security Rule:** +- **ONLY** `VAULT_*` environment variables are allowed in `.env` +- All other secrets MUST come from Vault +- Hardcoded secrets in code are FORBIDDEN (see AGENTS.md) + +### Vault Secret Paths Structure + +#### System-Wide Paths (Global) + +| Path | Purpose | Example Keys | +|------|---------|---------------| +| `gbo/tables` | Database (PostgreSQL) | host, port, database, username, password | +| `gbo/drive` | MinIO (Object Storage) | host, accesskey, secret | +| `gbo/cache` | Valkey (Redis) | host, port, password | +| `gbo/directory` | Zitadel (Auth) | url, project_id, client_id, client_secret | +| `gbo/email` | SMTP Email | smtp_host, smtp_port, smtp_user, smtp_password | +| 
`gbo/llm` | LLM Configuration | url, model, openai_key, anthropic_key | +| `gbo/vectordb` | Qdrant (Vector DB) | url, api_key | +| `gbo/jwt` | JWT Signing | secret | +| `gbo/meet` | Jitsi Meet | url, app_id, app_secret | +| `gbo/alm` | ALM Repository | url, token | +| `gbo/encryption` | Encryption Keys | master_key | +| `gbo/system/observability` | Monitoring | url, org, bucket, token | +| `gbo/system/security` | Security Policies | require_auth, anonymous_paths | +| `gbo/system/cloud` | Cloud Config | region, access_key, secret_key | +| `gbo/system/app` | Application Settings | url, environment | +| `gbo/system/models` | BotModels API | url | + +#### Organization-Specific Paths + +| Path Pattern | Purpose | +|--------------|---------| +| `gbo/orgs/{org_id}/config` | Organization configuration | +| `gbo/orgs/{org_id}/bots/{bot_id}` | Bot-specific secrets | +| `gbo/orgs/{org_id}/users/{user_id}` | User-specific secrets | +| `gbo/tenants/{tenant_id}/infrastructure` | Tenant database/cache/drive | +| `gbo/tenants/{tenant_id}/config` | Tenant configuration | + +### Credential Resolution Hierarchy + +For bot email configuration (example): +``` +1. Check gbo/orgs/{org_id}/bots/{bot_id}/email +2. Fallback: gbo/bots/default/email +3. Fallback: gbo/email +4. 
Fallback: Environment variables (development only) +``` + +### Vault Client Initialization (Code Reference) + +**File:** `botserver/src/core/secrets/mod.rs` + +```rust +// SecretsManager::from_env() reads: +// - VAULT_ADDR (required) +// - VAULT_TOKEN (required) +// - VAULT_CACERT (optional, has default) +// - VAULT_SKIP_VERIFY (optional, default: false) +// - VAULT_CLIENT_CERT (optional, mTLS) +// - VAULT_CLIENT_KEY (optional, mTLS) +// - VAULT_CACHE_TTL (optional, default: 300s) + +impl SecretsManager { + pub fn from_env() -> Result { + let addr = env::var("VAULT_ADDR").unwrap_or_default(); + let token = env::var("VAULT_TOKEN").unwrap_or_default(); + + if token.is_empty() || addr.is_empty() { + // Vault not configured - use environment variables directly + warn!("Vault not configured. Using environment variables directly."); + return Ok(Self { client: None, enabled: false, ... }); + } + + // Initialize VaultClient with TLS + let client = VaultClient::new(settings)?; + Ok(Self { client: Some(client), enabled: true, ... 
}) + } +} +``` + +### Vault Operations - Production Usage + +#### Read Secrets from Vault + +```bash +# From system container (using vault CLI) +sudo incus exec system -- bash -c ' + export VAULT_ADDR=https://10.157.134.250:8200 + export VAULT_TOKEN= + export VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt + + # Read database secrets + vault kv get -field=password secret/gbo/tables + vault kv get secret/gbo/tables + + # Read drive secrets + vault kv get secret/gbo/drive + + # Read LLM configuration + vault kv get secret/gbo/llm +' +``` + +#### Read Secrets via HTTP API (from any container) + +```bash +sudo incus exec system -- curl -sf \ + --cacert /opt/gbo/conf/system/certificates/ca/ca.crt \ + -H "X-Vault-Token: " \ + https://10.157.134.250:8200/v1/secret/data/gbo/drive | jq +``` + +#### Verify Vault Health + +```bash +sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health + +# Expected output: +# {"initialized":true,"sealed":false,"standby":false,"performance_standby":false,"replication_performance_mode":"disabled","replication_dr_mode":"disabled","server_time_utc":"2026-04-10T13:55:00.123Z"} +``` + +### init.json (Vault Initialization Data) + +**Location:** `/opt/gbo/bin/botserver-stack/conf/vault/vault-conf/init.json` + +**Purpose:** Stores Vault unseal keys and root token (created during Vault initialization) + +**Contents:** +```json +{ + "recovery_keys_b64": [], + "recovery_keys_hex": [], + "recovery_keys_shares": 0, + "recovery_keys_threshold": 0, + "root_token": "", + "unseal_keys_b64": ["<5 unseal keys base64-encoded>"], + "unseal_keys_hex": ["<5 unseal keys hex-encoded>"], + "unseal_shares": 5, + "unseal_threshold": 3 +} +``` + +**Security Notes:** +- `root_token`: Used to authenticate to Vault as admin +- `unseal_keys`: Required to unseal Vault after restart (5 keys, need 3 to unseal) +- **CRITICAL:** Store `init.json` in a secure, encrypted location +- Never commit `init.json` to git or store in repo + +### Troubleshooting 
Vault Connection + +#### Issue 1: Botserver cannot connect to Vault + +**Symptoms:** +- Logs show "Vault connection failed" +- Secrets fall back to defaults +- Bot cannot authenticate to database + +**Diagnosis:** +```bash +# Check Vault is running +sudo incus exec vault -- systemctl status vault + +# Check Vault health +sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health + +# Check .env has Vault credentials +sudo incus exec system -- grep "^VAULT_" /opt/gbo/bin/.env + +# Test Vault connection from system container +sudo incus exec system -- bash -c ' + curl -k -sf --cacert /opt/gbo/conf/system/certificates/ca/ca.crt \ + -H "X-Vault-Token: $(grep VAULT_TOKEN /opt/gbo/bin/.env | cut -d= -f2)" \ + https://10.157.134.250:8200/v1/secret/data/gbo/tables +' +``` + +**Common Causes:** +1. Vault service not running (vault container stopped) +2. `VAULT_TOKEN` expired or invalid +3. TLS certificate path incorrect or CA certificate missing +4. Network connectivity between system and vault containers + +**Fix:** +```bash +# 1. Restart Vault if stopped +sudo incus exec vault -- systemctl restart vault + +# 2. Generate new token if expired +sudo incus exec vault -- bash -c ' + export VAULT_ADDR=https://localhost:8200 + export VAULT_TOKEN= + vault token create -policy="botserver" -ttl="8760h" -format=json | jq -r .auth.client_token +' + +# 3. Update .env with new token +sudo incus exec system -- sed -i "s|VAULT_TOKEN=.*|VAULT_TOKEN=|" /opt/gbo/bin/.env + +# 4. 
Restart botserver +sudo incus exec system -- systemctl restart botserver +``` + +#### Issue 2: Secrets not being read from Vault + +**Symptoms:** +- Logs show "Vault read failed for 'gbo/drive'" +- Services use default credentials +- DriveMonitor cannot access MinIO + +**Diagnosis:** +```bash +# Check if Vault has secrets configured +sudo incus exec system -- bash -c ' + export VAULT_ADDR=https://10.157.134.250:8200 + export VAULT_TOKEN=$(grep VAULT_TOKEN /opt/gbo/bin/.env | cut -d= -f2) + export VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt + + echo "=== Database Secrets ===" + vault kv get secret/gbo/tables || echo "NOT FOUND" + + echo "=== Drive Secrets ===" + vault kv get secret/gbo/drive || echo "NOT FOUND" + + echo "=== LLM Secrets ===" + vault kv get secret/gbo/llm || echo "NOT FOUND" +' +``` + +**Fix - Adding Secrets to Vault:** +```bash +sudo incus exec vault -- bash -c ' + export VAULT_ADDR=https://localhost:8200 + export VAULT_TOKEN= + + # Add database secrets + vault kv put secret/gbo/tables \ + host= \ + port=5432 \ + database=botserver \ + username=gbuser \ + password= + + # Add drive (MinIO) secrets + vault kv put secret/gbo/drive \ + host= \ + port=9100 \ + accesskey= \ + secret= + + # Add LLM secrets + vault kv put secret/gbo/llm \ + url=http://localhost:8085 \ + model=gpt-4 \ + openai_key= \ + anthropic_key= +' +``` + +#### Issue 3: Vault sealed after restart + +**Symptoms:** +- All Vault operations fail +- botserver cannot read secrets +- Logs show "Vault is sealed" + +**Diagnosis:** +```bash +sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health | jq .sealed +``` + +**Fix - Unseal Vault:** +```bash +sudo incus exec vault -- bash -c ' + # Need 3 of 5 unseal keys from init.json + vault operator unseal + vault operator unseal + vault operator unseal + + # Verify unsealed + vault status +' +``` + +#### Issue 4: TLS certificate errors + +**Symptoms:** +- "certificate verify failed" errors +- TLS handshake failures +- 
curl: (60) SSL certificate problem + +**Diagnosis:** +```bash +sudo incus exec system -- bash -c ' + # Check CA certificate exists + ls -la /opt/gbo/conf/system/certificates/ca/ca.crt + + # Test certificate + openssl x509 -in /opt/gbo/conf/system/certificates/ca/ca.crt -text -noout +' +``` + +**Fix:** +```bash +# If CA cert is missing, copy from vault container +sudo incus exec vault -- cp /opt/gbo/conf/vault/ca.crt /tmp/ + +sudo incus exec system -- mkdir -p /opt/gbo/conf/system/certificates/ca/ +sudo incus exec system -- bash -c ' + # Copy certificate from vault container + incus file pull vault/opt/gbo/conf/vault/ca.crt /tmp/ca.crt + cp /tmp/ca.crt /opt/gbo/conf/system/certificates/ca/ + chmod 644 /opt/gbo/conf/system/certificates/ca/ca.crt +' +``` + +### Security Best Practices + +1. **Never commit secrets to git** + - No API keys, passwords, tokens in code + - Use Vault for ALL sensitive data + - Init secrets from `SecretsManager::from_env()` + +2. **Use Vault for all service credentials** + - Database passwords: `gbo/tables` + - MinIO keys: `gbo/drive` + - LLM API keys: `gbo/llm` + - Email passwords: `gbo/email` + +3. **Rotate credentials regularly** + - Generate new tokens/keys periodically + - Update Vault using `vault kv put` + - No need to restart services (next read gets new values) + +4. **Enable TLS/mTLS in production** + - Always use `VAULT_CACERT` + - Enable mTLS for critical services: `VAULT_CLIENT_CERT` + `VAULT_CLIENT_KEY` + - Never use `VAULT_SKIP_VERIFY=true` in production + +5. **Limit token lifetimes** + - Root token: single use or very short TTL + - Service tokens: limited to needed time (e.g., 8760h = 1 year) + - Generate new tokens when old ones expire + +6. 
**Audit Vault access** + ```bash + # Check recent Vault operations + sudo incus exec vault -- vault audit list + sudo incus exec vault -- vault audit file /var/log/vault_audit.log + ``` + +### Vault Backup & Recovery + +#### Backup Vault Data + +```bash +# Snapshot vault container (includes all secrets) +sudo incus snapshot create vault backup-$(date +%Y%m%d-%H%M) + +# Export Vault config (init.json with unseal keys) +sudo incus exec vault -- cat /opt/gbo/bin/botserver-stack/conf/vault/vault-conf/init.json > /tmp/vault-init.json + +# Backup all secrets (JSON format) +sudo incus exec vault -- bash -c ' + export VAULT_ADDR=https://localhost:8200 + export VAULT_TOKEN= + + # Backup each path + for path in gbo/tables gbo/drive gbo/cache gbo/llm; do + vault kv get -format=json secret/$path > /tmp/vault-$path.json + done +' +``` + +#### Restore from Snapshot + +```bash +# Stop vault +sudo incus exec vault -- systemctl stop vault + +# Restore snapshot +sudo incus snapshot restore vault + +# Start vault +sudo incus exec vault -- systemctl start vault + +# Wait for Vault to be ready +sleep 10 + +# Verify health +sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health +``` + +## Troubleshooting + +### GLIBC Version Mismatch + +**Symptom**: `GLIBC_2.39 not found` or `GLIBC_2.38 not found` + +**Cause**: Binary compiled on CI runner (glibc 2.41) but runs in system container (glibc 2.36) + +**Fix**: CI workflow must build inside the system container. Check `botserver.yaml` uses SSH to build in container. 
+ +### botserver Not Starting + +```bash +# Check binary +sudo incus exec system -- ldd /opt/gbo/bin/botserver | grep "not found" + +# Check direct execution +sudo incus exec system -- timeout 10 /opt/gbo/bin/botserver 2>&1 + +# Check data directory +sudo incus exec system -- ls -la /opt/gbo/data/ +``` + +### botui Can't Reach botserver + +```bash +# Check BOTSERVER_URL +sudo incus exec system -- grep BOTSERVER_URL /etc/systemd/system/ui.service + +# Must be http://localhost:5858, NOT https://system.example.com +# Fix: +sudo incus exec system -- sed -i 's|BOTSERVER_URL=.*|BOTSERVER_URL=http://localhost:5858|' /etc/systemd/system/ui.service +sudo incus exec system -- systemctl daemon-reload +sudo incus exec system -- systemctl restart ui +``` + +### Suggestions Not Showing + +```bash +# Check bot files exist +sudo incus exec system -- ls -la /opt/gbo/data/.gbai/.gbdialog/ + +# Check for compilation errors +sudo incus exec system -- tail -50 /opt/gbo/logs/out.log | grep -i "error\|fail\|compile" + +# Clear cache and restart +sudo incus exec system -- find /opt/gbo/work -name "*.ast" -delete +sudo incus exec system -- systemctl restart botserver +``` + +### IPv6 DNS Issues + +**Symptom**: External API calls (Groq, Cloudflare) timeout + +**Cause**: Container DNS returns AAAA records but no IPv6 connectivity + +**Fix**: Container has `IPV6=no` in network config and `gai.conf` labels. If issues persist, check `RES_OPTIONS=inet4` in botserver.service. + +### Vault Connection & Service Discovery Issues + +**Symptom**: Logs show `Failed to read data directory ` or `Config scan failed` + +**Cause**: Botserver is using hardcoded development paths instead of production paths + +**Fix**: + +1. **Check current configuration**: + ```bash + # Check .env file + sudo incus exec system -- cat /opt/gbo/bin/.env + + # Check data directory + sudo incus exec system -- ls -la /opt/gbo/data/ + sudo incus exec system -- ls -la /opt/gbo/work/ + ``` + +2. 
**Verify Vault connection**: + ```bash + # Test Vault from system container + sudo incus exec system -- curl -k -sf https://:8200/v1/sys/health + + # Check Vault token + sudo incus exec system -- grep VAULT_TOKEN /opt/gbo/bin/.env + ``` + +3. **Check service discovery**: + ```bash + # Check if botserver is reading Vault secrets + sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep -i vault + + # Check for service configuration errors + sudo incus exec system -- tail -100 /opt/gbo/logs/err.log | grep -i "config\|service" + ``` + +4. **Fix data directory paths**: + - Ensure botserver uses `/opt/gbo/data/` instead of development paths + - Update configuration if hardcoded paths exist + - Restart botserver after fixing + +5. **Verify all services are accessible**: + ```bash + # Check PostgreSQL + sudo incus exec system -- pg_isready -h -p 5432 + + # Check Valkey + sudo incus exec system -- redis-cli -h -a ping + + # Check MinIO + sudo incus exec system -- curl -sf http://:9100/minio/health/live + ``` + +6. **Update botserver configuration**: + - Ensure botserver reads from `/opt/gbo/bin/.env` for Vault configuration + - Verify service discovery uses Vault to get service endpoints + - Check that data directory is set to `/opt/gbo/data/` in configuration + - Update systemd service if needed: + ```bash + sudo incus exec system -- cat /etc/systemd/system/botserver.service + # Ensure EnvironmentFile=/opt/gbo/bin/.env is present + ``` + +7. 
**Test after fixes**: + ```bash + # Restart botserver + sudo incus exec system -- systemctl restart botserver + + # Wait for startup + sleep 10 + + # Check logs for errors + sudo incus exec system -- tail -50 /opt/gbo/logs/err.log + + # Verify health endpoint + curl -sf http://:5858/health + ``` + +### Vault Connection Errors + +**Symptom**: `Vault connection failed` or `Vault token invalid` + +**Fix**: +```bash +# Check Vault is running +sudo incus exec vault -- systemctl status vault + +# Check Vault health +sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health + +# Verify token is valid +sudo incus exec system -- bash -c ' + export VAULT_ADDR=https://:8200 + export VAULT_TOKEN= + export VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt + vault token lookup +' + +# If token is invalid, generate new one +sudo incus exec vault -- bash -c ' + export VAULT_ADDR=https://localhost:8200 + export VAULT_TOKEN= + vault token create -policy="botserver" -ttl="8760h" +' + +# Update .env with new token +sudo incus exec system -- sed -i 's|VAULT_TOKEN=.*|VAULT_TOKEN=|' /opt/gbo/bin/.env +sudo incus exec system -- systemctl restart botserver +``` + +### Service Discovery Failures + +**Symptom**: `Service not found` or `Failed to connect to service` + +**Fix**: +```bash +# Check if service is running +sudo incus exec tables -- systemctl status postgresql +sudo incus exec cache -- systemctl status valkey +sudo incus exec drive -- systemctl status minio + +# Check if service is accessible from system container +sudo incus exec system -- nc -zv 5432 # PostgreSQL +sudo incus exec system -- nc -zv 6379 # Valkey +sudo incus exec system -- nc -zv 9100 # MinIO + +# Check Vault has service configuration +sudo incus exec system -- bash -c ' + export VAULT_ADDR=https://:8200 + export VAULT_TOKEN= + export VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt + vault kv list secret/botserver +' + +# If service config is missing, add it (see Vault Configuration 
section) +``` + +### Monitoring & Verification + +**Check botserver is working correctly**: +```bash +# Health check +curl -sf http://:5858/health + +# Check logs for errors +sudo incus exec system -- tail -100 /opt/gbo/logs/err.log | grep -i "error\|fail" + +# Check logs for successful service connections +sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep -i "connected\|service\|vault" + +# Verify data directory is correct +sudo incus exec system -- tail -100 /opt/gbo/logs/out.log | grep -i "data\|work" + +# Should show /opt/gbo/data/ and /opt/gbo/work/, not development paths +``` + +**Expected log output**: +``` +info vault:Connected to Vault at https://:8200 +info service_discovery:Loaded service configuration from Vault +info database:Connected to PostgreSQL at :5432 +info cache:Connected to Valkey at :6379 +info storage:Connected to MinIO at http://:9100 +info watcher:Watching data directory /opt/gbo/data +info botserver:BotServer started successfully on port 5858 +``` + +**If logs show errors**: +1. Check Vault connection (see Vault Connection Errors section) +2. Check service accessibility (see Service Discovery Failures section) +3. Fix data directory paths (see Fix Development Paths in Production section) +4. 
Restart botserver and verify again + +### Vault Backup & Restore + +**Create Vault snapshot**: +```bash +# Stop Vault +sudo incus exec vault -- systemctl stop vault + +# Create snapshot +sudo incus snapshot create vault manual-$(date +%Y-%m-%d-%H%M) + +# Start Vault +sudo incus exec vault -- systemctl start vault + +# Verify +sudo incus snapshot list vault +``` + +**Restore Vault from snapshot**: +```bash +# Stop Vault +sudo incus exec vault -- systemctl stop vault + +# List snapshots +sudo incus snapshot list vault + +# Restore from latest snapshot +sudo incus snapshot restore vault + +# Start Vault +sudo incus exec vault -- systemctl start vault + +# Verify Vault is running +sudo incus exec vault -- systemctl status vault +sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health +``` + +**Automated snapshots**: +```bash +# Create a daily cron job on the host (incus runs on the host, not inside the vault container) +sudo tee /etc/cron.daily/vault-snapshot > /dev/null << 'EOF' +#!/bin/bash +incus exec vault -- systemctl stop vault +incus snapshot create vault "daily-$(date +%Y%m%d)" +incus exec vault -- systemctl start vault +EOF +sudo chmod +x /etc/cron.daily/vault-snapshot +``` + +### Update Botserver for Production + +**Required changes in botserver code**: + +1. **Read configuration from Vault**: + - Add Vault client initialization + - Read service endpoints from Vault + - Read secrets from Vault + - Fall back to environment variables if Vault is unavailable + +2. **Use production paths**: + - Remove hardcoded development paths + - Use environment variables for data directory + - Default to `/opt/gbo/data/` for production + +3. **Update .env file**: + ```bash + # /opt/gbo/bin/.env + VAULT_ADDR=https://:8200 + VAULT_TOKEN= + VAULT_CACERT=/opt/gbo/conf/system/certificates/ca/ca.crt + DATA_DIR=/opt/gbo/data/ + WORK_DIR=/opt/gbo/work/ + PORT=5858 + ``` + +4. 
**Update systemd service**: + ```bash + sudo incus exec system -- tee /etc/systemd/system/botserver.service > /dev/null << 'EOF' + [Unit] + Description=BotServer Service + After=network.target + + [Service] + User=root + Group=root + WorkingDirectory=/opt/gbo/bin + EnvironmentFile=/opt/gbo/bin/.env + ExecStart=/opt/gbo/bin/botserver --noconsole + Restart=always + RestartSec=5 + StandardOutput=append:/opt/gbo/logs/out.log + StandardError=append:/opt/gbo/logs/err.log + + [Install] + WantedBy=multi-user.target + EOF + + sudo incus exec system -- systemctl daemon-reload + sudo incus exec system -- systemctl restart botserver + ``` + +5. **Deploy updated botserver**: + ```bash + # Push changes to ALM + cd botserver && git push alm main && git push origin main + + # CI will build and deploy automatically + # Or manually deploy (see Manual Deploy section) + ``` + +## Security + +- **NEVER** push secrets to git +- **NEVER** commit files to root with credentials +- **Vault** is single source of truth for secrets +- **CI/CD** is the only deployment method — never manually scp binaries +- **ALM** is production — ask before pushing diff --git a/botserver b/botserver index dc933c2..918cb62 160000 --- a/botserver +++ b/botserver @@ -1 +1 @@ -Subproject commit dc933c22e4a14dd1ee2acfe342e6b5e7f54c802a +Subproject commit 918cb623a17db5f35aaf56ff49945f1052b1e73a diff --git a/prompts/htmlview.md b/prompts/htmlview.md new file mode 100644 index 0000000..de5dc48 --- /dev/null +++ b/prompts/htmlview.md @@ -0,0 +1 @@ +AYVRWxru3Ciwlw7E GXmWnXQYXjn1OoK4kWnY3579FJVYTGBT \ No newline at end of file diff --git a/prompts/prod.md b/prompts/prod.md deleted file mode 100644 index d7a7c37..0000000 --- a/prompts/prod.md +++ /dev/null @@ -1,259 +0,0 @@ -# Production Environment Guide - -## Infrastructure - -### Servers - -| Host | IP | Purpose | -|------|-----|---------| -| `system` | `10.157.134.196` | Main botserver + botui container | -| `alm-ci` | `10.157.134.200` | CI/CD runner (Forgejo Actions) | -| 
`alm` | `10.157.134.34` | Forgejo git server | -| `dns` | `10.157.134.214` | DNS container | -| `drive` | `10.157.134.206` | Drive storage | -| `email` | `10.157.134.40` | Email service | -| `proxy` | `10.157.134.241` | Reverse proxy | -| `tables` | `10.157.134.174` | PostgreSQL | -| `table-editor` | `10.157.134.184` | Table editor | -| `webmail` | `10.157.134.86` | Webmail | - -### Port Mapping (system container) - -| Service | Internal Port | External URL | -|---------|--------------|--------------| -| botserver | `5858` | `https://system.pragmatismo.com.br` | -| botui | `5859` | `https://chat.pragmatismo.com.br` | - -### Access - -```bash -# SSH to host -ssh administrator@63.141.255.9 - -# Execute inside system container -sudo incus exec system -- bash -c 'command' - -# SSH from host to container (used by CI) -ssh -o StrictHostKeyChecking=no system "command" -``` - -## Services - -### botserver.service - -- **Binary**: `/opt/gbo/bin/botserver` -- **Port**: `5858` -- **User**: `gbuser` -- **Logs**: `/opt/gbo/logs/out.log`, `/opt/gbo/logs/err.log` -- **Config**: `/etc/systemd/system/botserver.service` -- **Env**: `PORT=5858` - -### ui.service - -- **Binary**: `/opt/gbo/bin/botui` -- **Port**: `5859` -- **Config**: `/etc/systemd/system/ui.service` -- **Env**: `BOTSERVER_URL=http://localhost:5858` - - ⚠️ MUST be `http://localhost:5858` — NOT `https://system.pragmatismo.com.br` - - Rust proxy runs server-side, needs direct localhost access - - JS client uses relative URLs through `chat.pragmatismo.com.br` - -### Data Directory - -- **Path**: `/opt/gbo/data/` -- **Structure**: `.gbai/.gbdialog/*.bas` -- **Bots**: cristo, fema, jucees, oerlabs, poupatempo, pragmatismogb, salesianos, sentient, seplagse -- **Work dir**: `/opt/gbo/work/` (compiled .ast cache) - -### Stack Services (managed by botserver bootstrap) - -- **Vault**: Secrets management -- **PostgreSQL**: Database (port 5432) -- **Valkey**: Cache (port 6379, password auth) -- **MinIO**: Object storage -- 
**Zitadel**: Identity provider -- **LLM**: llama.cpp - -## CI/CD Pipeline - -### Repositories - -| Repo | ALM URL | GitHub URL | -|------|---------|------------| -| gb | `https://alm.pragmatismo.com.br/GeneralBots/gb.git` | `git@github.com:GeneralBots/gb.git` | -| botserver | `https://alm.pragmatismo.com.br/GeneralBots/BotServer.git` | `git@github.com:GeneralBots/botserver.git` | -| botui | `https://alm.pragmatismo.com.br/GeneralBots/BotUI.git` | `git@github.com:GeneralBots/botui.git` | -| botlib | `https://alm.pragmatismo.com.br/GeneralBots/botlib.git` | `git@github.com:GeneralBots/botlib.git` | - -### Push Order - -```bash -# 1. Push submodules first -cd botserver && git push alm main && git push origin main && cd .. -cd botui && git push alm main && git push origin main && cd .. - -# 2. Update root workspace references -git add botserver botui botlib -git commit -m "Update submodules: " -git push alm main && git push origin main -``` - -### Build Environment - -- **CI runner**: `alm-ci` container (Debian Trixie, glibc 2.41) -- **Target**: `system` container (Debian 12 Bookworm, glibc 2.36) -- **⚠️ GLIBC MISMATCH**: Building on CI runner produces binaries incompatible with system container -- **Solution**: CI workflow transfers source to system container and builds there via SSH - -### Workflow File - -- **Location**: `botserver/.forgejo/workflows/botserver.yaml` -- **Triggers**: Push to `main` branch -- **Steps**: - 1. Setup workspace on CI runner (clone repos) - 2. Transfer source to system container via `tar | ssh` - 3. Build inside system container (matches glibc 2.36) - 4. Deploy binary inside container - 5. 
Verify botserver is running - -## Common Operations - -### Check Service Status - -```bash -# From host -sudo incus exec system -- systemctl status botserver --no-pager -sudo incus exec system -- systemctl status ui --no-pager - -# Check if running -sudo incus exec system -- pgrep -f botserver -sudo incus exec system -- pgrep -f botui -``` - -### View Logs - -```bash -# Systemd journal -sudo incus exec system -- journalctl -u botserver --no-pager -n 50 -sudo incus exec system -- journalctl -u ui --no-pager -n 50 - -# Application logs -sudo incus exec system -- tail -50 /opt/gbo/logs/out.log -sudo incus exec system -- tail -50 /opt/gbo/logs/err.log - -# Live tail -sudo incus exec system -- tail -f /opt/gbo/logs/out.log -``` - -### Restart Services - -```bash -sudo incus exec system -- systemctl restart botserver -sudo incus exec system -- systemctl restart ui -``` - -### Manual Deploy (emergency) - -```bash -# Kill old process -sudo incus exec system -- killall botserver - -# Copy binary (from host CI workspace or local) -sudo incus exec system -- cp /opt/gbo/ci/botserver/target/debug/botserver /opt/gbo/bin/botserver -sudo incus exec system -- chmod +x /opt/gbo/bin/botserver -sudo incus exec system -- chown gbuser:gbuser /opt/gbo/bin/botserver - -# Start service -sudo incus exec system -- systemctl start botserver -``` - -### Transfer Bot Files to Production - -```bash -# From local to prod host -tar czf /tmp/bots.tar.gz -C /opt/gbo/data .gbai -scp /tmp/bots.tar.gz administrator@63.141.255.9:/tmp/ - -# From host to container -sudo incus exec system -- bash -c 'tar xzf /tmp/bots.tar.gz -C /opt/gbo/data/' - -# Clear compiled cache -sudo incus exec system -- find /opt/gbo/data -name "*.ast" -delete -sudo incus exec system -- find /opt/gbo/work -name "*.ast" -delete -``` - -### Snapshots - -```bash -# List snapshots -sudo incus snapshot list system - -# Restore snapshot -sudo incus snapshot restore system -``` - -## Troubleshooting - -### GLIBC Version Mismatch - 
-**Symptom**: `GLIBC_2.39 not found` or `GLIBC_2.38 not found` - -**Cause**: Binary compiled on CI runner (glibc 2.41) but runs in system container (glibc 2.36) - -**Fix**: CI workflow must build inside the system container. Check `botserver.yaml` uses SSH to build in container. - -### botserver Not Starting - -```bash -# Check binary -sudo incus exec system -- ldd /opt/gbo/bin/botserver | grep "not found" - -# Check direct execution -sudo incus exec system -- timeout 10 /opt/gbo/bin/botserver 2>&1 - -# Check data directory -sudo incus exec system -- ls -la /opt/gbo/data/ -``` - -### botui Can't Reach botserver - -```bash -# Check BOTSERVER_URL -sudo incus exec system -- grep BOTSERVER_URL /etc/systemd/system/ui.service - -# Must be http://localhost:5858, NOT https://system.pragmatismo.com.br -# Fix: -sudo incus exec system -- sed -i 's|BOTSERVER_URL=.*|BOTSERVER_URL=http://localhost:5858|' /etc/systemd/system/ui.service -sudo incus exec system -- systemctl daemon-reload -sudo incus exec system -- systemctl restart ui -``` - -### Suggestions Not Showing - -```bash -# Check bot files exist -sudo incus exec system -- ls -la /opt/gbo/data/.gbai/.gbdialog/ - -# Check for compilation errors -sudo incus exec system -- tail -50 /opt/gbo/logs/out.log | grep -i "error\|fail\|compile" - -# Clear cache and restart -sudo incus exec system -- find /opt/gbo/work -name "*.ast" -delete -sudo incus exec system -- systemctl restart botserver -``` - -### IPv6 DNS Issues - -**Symptom**: External API calls (Groq, Cloudflare) timeout - -**Cause**: Container DNS returns AAAA records but no IPv6 connectivity - -**Fix**: Container has `IPV6=no` in network config and `gai.conf` labels. If issues persist, check `RES_OPTIONS=inet4` in botserver.service. 
- -## Security - -- **NEVER** push secrets to git -- **NEVER** commit files to root with credentials -- **Vault** is single source of truth for secrets -- **CI/CD** is the only deployment method — never manually scp binaries -- **ALM** is production — ask before pushing diff --git a/restart-fast.sh b/restart-fast.sh deleted file mode 100644 index 2e5f0e1..0000000 --- a/restart-fast.sh +++ /dev/null @@ -1,49 +0,0 @@ -#!/bin/bash - -echo "=== Fast Restart: botserver + botmodels only ===" - -# Kill only the app services, keep infra running -pkill -f "botserver --noconsole" || true -pkill -f "botmodels" || true - -# Clean logs -rm -f botserver.log botmodels.log - -# Build only botserver (botui likely already built) -cargo build -p botserver - -# Start botmodels -cd botmodels -source venv/bin/activate -uvicorn src.main:app --host 0.0.0.0 --port 8085 > ../botmodels.log 2>&1 & -echo " botmodels PID: $!" -cd .. - -# Wait for botmodels -for i in $(seq 1 20); do - if curl -s http://localhost:8085/api/health > /dev/null 2>&1; then - echo " botmodels ready" - break - fi - sleep 1 -done - -# Start botserver (keep botui running if already up) -if ! pgrep -f "botui" > /dev/null; then - echo "Starting botui..." - cargo build -p botui - cd botui - BOTSERVER_URL="http://localhost:8080" ./target/debug/botui > ../botui.log 2>&1 & - echo " botui PID: $!" - cd .. -fi - -# Start botserver -BOTMODELS_HOST="http://localhost:8085" BOTMODELS_API_KEY="starter" RUST_LOG=info ./target/debug/botserver --noconsole > botserver.log 2>&1 & -echo " botserver PID: $!" - -# Quick health check -sleep 2 -curl -s http://localhost:8080/health > /dev/null 2>&1 && echo "✅ botserver ready" || echo "❌ botserver failed" - -echo "Done. 
botserver $(pgrep -f 'botserver --noconsole') botui $(pgrep -f botui) botmodels $(pgrep -f botmodels)" diff --git a/restart.sh b/restart.sh index 4dd8efd..2e5f0e1 100755 --- a/restart.sh +++ b/restart.sh @@ -1,55 +1,49 @@ #!/bin/bash -set -e -echo "=== Fast Restart: botserver only (keeps infra running) ===" +echo "=== Fast Restart: botserver + botmodels only ===" -# Only kill the app services, keep infra (postgres, valkey, minio, vault, zitadel) running +# Kill only the app services, keep infra running pkill -f "botserver --noconsole" || true pkill -f "botmodels" || true -# Clean app logs only +# Clean logs rm -f botserver.log botmodels.log -# Build botserver (incremental, should be fast) +# Build only botserver (botui likely already built) cargo build -p botserver -# Start botmodels if not running -if ! pgrep -f "botmodels" > /dev/null; then - echo "Starting botmodels..." - cd botmodels - source venv/bin/activate - uvicorn src.main:app --host 0.0.0.0 --port 8085 > ../botmodels.log 2>&1 & - echo " botmodels PID: $!" - cd .. - - # Wait for botmodels - for i in $(seq 1 15); do - if curl -s http://localhost:8085/api/health > /dev/null 2>&1; then - echo " botmodels ready" - break - fi - sleep 1 - done -else - echo " botmodels already running" -fi +# Start botmodels +cd botmodels +source venv/bin/activate +uvicorn src.main:app --host 0.0.0.0 --port 8085 > ../botmodels.log 2>&1 & +echo " botmodels PID: $!" +cd .. -# Start botserver -echo "Starting botserver..." -BOTMODELS_HOST="http://localhost:8085" BOTMODELS_API_KEY="starter" RUST_LOG=info \ - ./target/debug/botserver --noconsole > botserver.log 2>&1 & -echo " botserver PID: $!" - -# Wait for botserver health with timeout -echo "Waiting for botserver..." 
-for i in $(seq 1 10); do - if curl -sf http://localhost:8080/health > /dev/null 2>&1; then - echo "✅ botserver ready" - exit 0 +# Wait for botmodels +for i in $(seq 1 20); do + if curl -s http://localhost:8085/api/health > /dev/null 2>&1; then + echo " botmodels ready" + break fi sleep 1 done -echo "❌ botserver failed to start - check botserver.log" -tail -20 botserver.log -exit 1 +# Start botserver (keep botui running if already up) +if ! pgrep -f "botui" > /dev/null; then + echo "Starting botui..." + cargo build -p botui + cd botui + BOTSERVER_URL="http://localhost:8080" ./target/debug/botui > ../botui.log 2>&1 & + echo " botui PID: $!" + cd .. +fi + +# Start botserver +BOTMODELS_HOST="http://localhost:8085" BOTMODELS_API_KEY="starter" RUST_LOG=info ./target/debug/botserver --noconsole > botserver.log 2>&1 & +echo " botserver PID: $!" + +# Quick health check +sleep 2 +curl -s http://localhost:8080/health > /dev/null 2>&1 && echo "✅ botserver ready" || echo "❌ botserver failed" + +echo "Done. botserver $(pgrep -f 'botserver --noconsole') botui $(pgrep -f botui) botmodels $(pgrep -f botmodels)" diff --git a/stop.sh b/stop.sh deleted file mode 100755 index cf7d0d3..0000000 --- a/stop.sh +++ /dev/null @@ -1,4 +0,0 @@ -pkill botui -pkill botserver -9 - -