gb/PROD.md
2026-04-12 14:16:15 -03:00


Production Environment Guide (Compact)

CRITICAL RULES — READ FIRST

NEVER INCLUDE CREDENTIALS OR COMPANY-SPECIFIC INFORMATION HERE; THIS DOCUMENT IS COMPANY-AGNOSTIC. Always manage services with systemctl inside the system Incus container. Never run /opt/gbo/bin/botserver or /opt/gbo/bin/botui directly — they will fail because they won't load the .env file containing Vault credentials and paths. The correct commands are sudo incus exec system -- systemctl start|stop|restart|status botserver, and the same with ui in place of botserver. Systemctl handles environment loading, auto-restart, logging, and dependencies.

Never push secrets (API keys, passwords, tokens) to git. Never commit init.json (it contains Vault unseal keys). All secrets must come from Vault — only VAULT_* variables are allowed in .env. Never deploy manually via scp or ssh; always use CI/CD. Always push all submodules (botserver, botui, botlib) before or alongside the main repo. Always ask before pushing to ALM.


Infrastructure Overview

The host machine is PROD-GBO1, accessed via ssh user@<hostname>, running Incus (an LXD fork) as hypervisor. All services run inside named Incus containers. You enter containers with sudo incus exec <container> -- <command> and list them with sudo incus list.

The containers and their roles are: system runs botserver on port 5858 and botui on port 5859; alm-ci runs the Forgejo Actions CI runner; alm hosts the Forgejo git server; tables runs PostgreSQL on port 5432; cache runs Valkey/Redis on port 6379; drive runs MinIO object storage on port 9100; vault runs HashiCorp Vault on port 8200; vector runs Qdrant on port 6333.

Externally, botserver is reachable at https://system.example.com and botui at https://chat.example.com. Internally, botui's BOTSERVER_URL must be http://localhost:5858 — never the external HTTPS URL, because the Rust proxy runs server-side and needs direct localhost access.


Services Detail

Botserver runs as user gbuser, binary at /opt/gbo/bin/botserver, logs at /opt/gbo/logs/out.log and /opt/gbo/logs/err.log, systemd unit at /etc/systemd/system/botserver.service, env loaded from /opt/gbo/bin/.env. Bot BASIC scripts live under /opt/gbo/data/<botname>.gbai/<botname>.gbdialog/*.bas; compiled AST cache goes to /opt/gbo/work/.

The directory service runs Zitadel as user root, binary at /opt/gbo/bin/zitadel, logs at /opt/gbo/logs/zitadel.log, systemd unit at /etc/systemd/system/directory.service, and loads environment from the service configuration. Zitadel provides identity management and OAuth2 services for the platform.

Internally, Zitadel listens on port 8080 within the directory container. For external access:

  • Via public domain (HTTPS): https://login.example.com (configured through proxy container)
  • Via host IP (HTTP): http://<host-ip>:9000 (direct container port forwarding)
  • Via container IP (HTTP): http://<directory-container-ip>:9000 (direct container access)

Access the Zitadel console at https://login.example.com/ui/console with admin credentials. Zitadel implements the v1 Management API (deprecated) and the v2 Organization/User services. Always use the v2 endpoints under /v2/organizations and /v2/users for all operations.

The botserver bootstrap also manages: Vault (secrets), PostgreSQL (database), Valkey (cache, password auth), MinIO (object storage), Zitadel (identity provider), and llama.cpp (LLM). To obtain a PAT for Zitadel API access, check /opt/gbo/conf/directory/admin-pat.txt in the directory container. Use it with curl by setting the Authorization header: Authorization: Bearer $(cat /opt/gbo/conf/directory/admin-pat.txt), and include -H "Host: <IP>" (replace <IP> with your directory container IP) for correct host resolution.
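The $(cat ...) idiom above can be rehearsed locally. This sketch uses a throwaway PAT file and a sample token; in production the PAT lives at /opt/gbo/conf/directory/admin-pat.txt inside the directory container, and the commented curl line shows the shape of a v2 call:

```shell
# Build the Zitadel Authorization header from a PAT file.
# The temp file and sample token are placeholders for illustration.
PAT_FILE=$(mktemp)
printf 'sample-pat-token' > "$PAT_FILE"

# The $(cat ...) idiom from the guide, applied to the placeholder file:
AUTH_HEADER="Authorization: Bearer $(cat "$PAT_FILE")"
echo "$AUTH_HEADER"

# A v2 API call would then look like this (not executed here):
# curl -s -H "$AUTH_HEADER" -H "Host: <directory-container-ip>" \
#      http://<directory-container-ip>:9000/v2/organizations

rm -f "$PAT_FILE"
```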


Common Operations

Check status: sudo incus exec system -- systemctl status botserver --no-pager (same for ui). To check process existence: sudo incus exec system -- pgrep -f botserver.

View logs: For systemd journal: sudo incus exec system -- journalctl -u botserver --no-pager -n 50. For application logs: sudo incus exec system -- tail -50 /opt/gbo/logs/out.log or err.log. For live tail: sudo incus exec system -- tail -f /opt/gbo/logs/out.log.

Restart: sudo incus exec system -- systemctl restart botserver and same for ui. Never run the binary directly.

Emergency manual deploy: Kill the old process with sudo incus exec system -- killall botserver, copy the new binary from /opt/gbo/ci/botserver/target/debug/botserver to /opt/gbo/bin/botserver, set permissions with chmod +x and chown gbuser:gbuser, then start with systemctl start botserver.

Transfer bot files: Archive locally with tar czf /tmp/bots.tar.gz -C /opt/gbo/data <botname>.gbai, copy to the host with scp, push it into the container with sudo incus file push /tmp/bots.tar.gz system/tmp/, then extract inside the container with sudo incus exec system -- bash -c 'tar xzf /tmp/bots.tar.gz -C /opt/gbo/data/'. Clear the compiled cache with find /opt/gbo/data -name "*.ast" -delete and the same for /opt/gbo/work.
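The archive, extract, and cache-clear steps can be rehearsed locally before touching production. In this sketch, temp dirs stand in for /opt/gbo/data on the source and destination, and the bot name demo is made up:

```shell
# Rehearse the bot-file transfer with temp dirs standing in for /opt/gbo/data.
SRC=$(mktemp -d); DST=$(mktemp -d)
mkdir -p "$SRC/demo.gbai/demo.gbdialog"
echo 'TALK "hi"' > "$SRC/demo.gbai/demo.gbdialog/start.bas"
touch "$SRC/demo.gbai/demo.gbdialog/start.bas.ast"   # fake compiled cache

# Archive relative to the data dir so paths extract cleanly:
tar czf /tmp/bots-demo.tar.gz -C "$SRC" demo.gbai
tar xzf /tmp/bots-demo.tar.gz -C "$DST"

# Clear the compiled AST cache after transfer, as the guide recommends:
find "$DST" -name "*.ast" -delete

ls "$DST/demo.gbai/demo.gbdialog/"
```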

Snapshots: sudo incus snapshot list system to list, sudo incus snapshot restore system <name> to restore.


CI/CD Pipeline

Repositories exist on both GitHub and the internal ALM (Forgejo). The four repos are gb (main workspace), botserver, botui, and botlib. Always push submodules first (cd botserver && git push alm main && git push origin main), then update submodule references in the root repo and push that too.

The CI runner container (alm-ci) runs Debian Trixie with glibc 2.41, but the system container runs Debian 12 Bookworm with glibc 2.36. Binaries compiled on the CI runner are incompatible with the system container. The CI workflow (botserver/.forgejo/workflows/botserver.yaml) solves this by transferring source to the system container via tar | ssh and building there. The workflow triggers on pushes to main, clones repos, transfers source, builds inside system container, deploys the binary, and verifies botserver is running.
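The tar-over-ssh transfer the workflow uses can be sketched with a local pipe standing in for the ssh hop (the commented line shows the production shape; host and paths there are illustrative):

```shell
# The CI pattern: stream a source tree through a tar pipe instead of scp.
SRCDIR=$(mktemp -d); DSTDIR=$(mktemp -d)
echo 'fn main() {}' > "$SRCDIR/main.rs"

# Production shape (sketch, not run here):
# tar czf - -C "$SRCDIR" . | ssh user@system "tar xzf - -C /opt/gbo/ci/botserver"

# Local equivalent, with the pipe replacing the ssh hop:
tar czf - -C "$SRCDIR" . | tar xzf - -C "$DSTDIR"

ls "$DSTDIR"
```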


DriveMonitor & Bot Configuration

DriveMonitor is a background service inside botserver that watches MinIO buckets and syncs changes to the local filesystem and database every 10 seconds. It monitors three directory types per bot: the .gbdialog/ folder for BASIC scripts (downloads and recompiles on change), the .gbot/ folder for config.csv (syncs to the bot_configuration database table), and the .gbkb/ folder for knowledge base documents (downloads and indexes for vector search).

Bot configuration is stored in two PostgreSQL tables inside the botserver database. The bot_configuration table holds key-value pairs with columns bot_id, config_key, config_value, config_type, is_encrypted, and updated_at. The gbot_config_sync table tracks sync state with columns bot_id, config_file_path, last_sync_at, file_hash, and sync_count.

The config.csv format is a plain CSV with no header: each line is key,value, for example llm-provider,groq or theme-color1,#cc0000. DriveMonitor syncs it when the file ETag changes in MinIO, on botserver startup, or after a restart.
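The headerless key,value format is easy to inspect from a shell. This sketch writes a sample config.csv and reads one key back with awk (the keys are the examples from above; the awk one-liner is a suggested quick check, not part of DriveMonitor):

```shell
# Write a sample headerless config.csv and look up one key.
CSV=$(mktemp)
cat > "$CSV" <<'EOF'
llm-provider,groq
theme-color1,#cc0000
EOF

# Split on commas and print the value whose key matches:
LLM_PROVIDER=$(awk -F, '$1 == "llm-provider" { print $2 }' "$CSV")
echo "$LLM_PROVIDER"
```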

Check config status: Query bot_configuration via sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "SELECT config_key, config_value FROM bot_configuration WHERE bot_id = (SELECT id FROM bots WHERE name = 'salesianos') ORDER BY config_key;". Check sync state via the gbot_config_sync table. Inspect the bucket directly with sudo incus exec drive -- /opt/gbo/bin/mc cat local/salesianos.gbai/salesianos.gbot/config.csv.

Debug DriveMonitor: Monitor live logs with sudo incus exec system -- tail -f /opt/gbo/logs/out.log | grep -E "(DRIVE_MONITOR|check_gbot|config)". An empty gbot_config_sync table means DriveMonitor has not synced yet. If no new log entries appear after 30 seconds, the loop may be stuck — restart botserver with systemctl to clear the state.
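The grep filter above can be exercised on sample input to confirm what it will and won't match. The log lines here are hand-written stand-ins, not botserver's exact output format:

```shell
# Exercise the DriveMonitor log filter on illustrative sample lines.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
info DRIVE_MONITOR: scanning buckets
info watcher: unrelated event
info check_gbot: config.csv ETag unchanged
EOF

# -c counts matching lines; the middle line matches none of the patterns:
MATCHES=$(grep -Ec "(DRIVE_MONITOR|check_gbot|config)" "$LOG")
echo "$MATCHES"
```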

Common config issues: If config.csv is missing from the bucket, create and upload it with mc cp. If the database shows stale values, restart botserver to force a fresh sync, or as a temporary fix update the database directly with UPDATE bot_configuration SET config_value = 'groq', updated_at = NOW() WHERE .... To force a re-sync without restarting, copy config.csv over itself with mc cp local/... local/... to change the ETag.


MinIO (Drive) Operations

All bot files live in MinIO buckets. Use the mc CLI at /opt/gbo/bin/mc from inside the drive container. The bucket structure per bot is: {bot}.gbai/ as root, {bot}.gbai/{bot}.gbdialog/ for BASIC scripts, {bot}.gbai/{bot}.gbot/ for config.csv, and {bot}.gbai/{bot}.gbkb/ for knowledge base folders.

Common mc commands: mc ls local/ lists all buckets; mc ls local/salesianos.gbai/ lists a bucket; mc cat local/.../start.bas prints a file; mc cp local/.../file /tmp/file downloads; mc cp /tmp/file local/.../file uploads (this triggers DriveMonitor recompile); mc stat local/.../config.csv shows ETag and metadata; mc mb local/newbot.gbai creates a bucket; mc rb local/oldbot.gbai removes an empty bucket.

If mc is not found, use the full path /opt/gbo/bin/mc. If the local alias is not configured, check with mc config host list (or mc alias list on newer mc releases). If MinIO is not running, check with sudo incus exec drive -- systemctl status minio.


Vault Security Architecture

HashiCorp Vault is the single source of truth for all secrets. Botserver reads VAULT_ADDR and VAULT_TOKEN from /opt/gbo/bin/.env at startup, initializes a TLS/mTLS client, then reads credentials from Vault paths. If Vault is unavailable, it falls back to defaults. The .env file must only contain VAULT_* variables plus PORT, DATA_DIR, WORK_DIR, and LOAD_ONLY.

Global Vault paths: gbo/tables holds PostgreSQL credentials; gbo/drive holds MinIO access key and secret; gbo/cache holds Valkey password; gbo/llm holds LLM URL and API keys; gbo/directory holds Zitadel config; gbo/email holds SMTP credentials; gbo/vectordb holds Qdrant config; gbo/jwt holds JWT signing secret; gbo/encryption holds the master encryption key. Organization-scoped secrets follow patterns like gbo/orgs/{org_id}/bots/{bot_id} and tenant infrastructure uses gbo/tenants/{tenant_id}/infrastructure.

Credential resolution: For any service, botserver checks the most specific Vault path first (org+bot level), falls back to a default bot path, then falls back to the global path, and only uses environment variables as a last resort in development.
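The most-specific-first fallback can be simulated with files standing in for Vault KV paths. The org/bot IDs and the resolve_secret helper below are illustrative, not botserver internals:

```shell
# Simulate credential resolution: org+bot path, then default bot path,
# then global path. Temp files stand in for Vault KV entries.
VROOT=$(mktemp -d)
mkdir -p "$VROOT/gbo/orgs/org1/bots/bot1" "$VROOT/gbo"
echo 'password=global-pw' > "$VROOT/gbo/tables"

resolve_secret() {
  # Return the first path that exists, mirroring most-specific-first lookup.
  for p in "$VROOT/gbo/orgs/org1/bots/bot1/tables" \
           "$VROOT/gbo/bots/default/tables" \
           "$VROOT/gbo/tables"; do
    if [ -f "$p" ]; then cat "$p"; return 0; fi
  done
  return 1
}

resolve_secret   # only the global path exists yet, so it wins

# Add an org+bot-level secret and it takes precedence on the next lookup:
echo 'password=bot-pw' > "$VROOT/gbo/orgs/org1/bots/bot1/tables"
RESOLVED=$(resolve_secret)
echo "$RESOLVED"
```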

Verify Vault health: sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health should return JSON with "sealed":false. To read a secret: set VAULT_ADDR, VAULT_TOKEN, and VAULT_CACERT then run vault kv get secret/gbo/tables. To test from the system container, use curl with --cacert /opt/gbo/conf/system/certificates/ca/ca.crt and -H "X-Vault-Token: <token>".
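A quick shell check for the "sealed":false field can be sketched as follows; the JSON here is a hand-written stand-in for the real sys/health response, and in production it would come from the curl command above:

```shell
# Parse a sample health response; in production, HEALTH would be
# captured from: curl -k -sf https://localhost:8200/v1/sys/health
HEALTH='{"initialized":true,"sealed":false,"standby":false}'

if echo "$HEALTH" | grep -q '"sealed":false'; then
  STATUS=unsealed
else
  STATUS=sealed
fi
echo "$STATUS"
```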

init.json is stored at /opt/gbo/bin/botserver-stack/conf/vault/vault-conf/init.json and contains the root token and 5 unseal keys (3 needed to unseal). Never commit this file to git. Store it encrypted in a secure location.

Vault troubleshooting — cannot connect: Check that the vault container's systemd unit is running, verify the token in .env is not expired with vault token lookup, confirm the CA cert path in .env matches the actual file location, and test network connectivity from system to vault container. To generate a new token: vault token create -policy="botserver" -ttl="8760h" -format=json then update .env and restart botserver.

Vault troubleshooting — secrets missing: Run vault kv get secret/gbo/tables (and other paths) to check if secrets exist. If a path returns NOT FOUND, add secrets with vault kv put secret/gbo/tables host=<ip> port=5432 database=botserver username=gbuser password=<pw> and similar for other paths.

Vault sealed after restart: Run vault operator unseal <key1>, repeat with key2 and key3 (3 of 5 keys from init.json), then verify with vault status.

TLS certificate errors: Confirm /opt/gbo/conf/system/certificates/ca/ca.crt exists in the system container. If missing, copy it from the vault container using incus file pull vault/opt/gbo/conf/vault/ca.crt /tmp/ca.crt then place it at the expected path.

Vault snapshots: Stop vault, run sudo incus snapshot create vault backup-$(date +%Y%m%d-%H%M), start vault. Restore with sudo incus snapshot restore vault <name> while vault is stopped.


Troubleshooting Quick Reference

GLIBC mismatch (GLIBC_2.39 not found): The binary was compiled on the CI runner (glibc 2.41) not inside the system container (glibc 2.36). The CI workflow must SSH into the system container to build. Check botserver.yaml to confirm this.

botserver won't start: Run sudo incus exec system -- ldd /opt/gbo/bin/botserver | grep "not found" to check for missing libraries. Run sudo incus exec system -- timeout 10 /opt/gbo/bin/botserver 2>&1 to see startup errors. Confirm /opt/gbo/data/ exists and is accessible.

botui can't reach botserver: Check that the ui.service systemd file has BOTSERVER_URL=http://localhost:5858 — not the external HTTPS URL. Fix with sed -i 's|BOTSERVER_URL=.*|BOTSERVER_URL=http://localhost:5858|' on the service file, then systemctl daemon-reload and systemctl restart ui.
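The sed fix can be rehearsed on a throwaway copy of the unit file before editing the real one inside the system container (the unit contents below are a minimal stand-in, not the full ui.service):

```shell
# Rehearse the BOTSERVER_URL fix on a throwaway unit file.
UNIT=$(mktemp)
cat > "$UNIT" <<'EOF'
[Service]
Environment=BOTSERVER_URL=https://system.example.com
EOF

# The substitution from the guide: force the internal localhost URL.
sed -i 's|BOTSERVER_URL=.*|BOTSERVER_URL=http://localhost:5858|' "$UNIT"
grep BOTSERVER_URL "$UNIT"
```

On the real file, follow up with systemctl daemon-reload and systemctl restart ui as described above.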

Suggestions not showing: Confirm bot .bas files exist under /opt/gbo/data/<bot>.gbai/<bot>.gbdialog/. Check logs for compilation errors. Clear the AST cache in /opt/gbo/work/ and restart botserver.

IPv6 DNS timeouts on external APIs (Groq, Cloudflare): The container's DNS may return AAAA records without working IPv6 connectivity. The container should have IPV6=no in its network config and /etc/gai.conf set to prefer IPv4 (e.g., precedence ::ffff:0:0/96 100). Check for RES_OPTIONS=inet4 in botserver.service if issues persist.

Logs show development paths instead of /opt/gbo/data/: Botserver is using hardcoded dev paths. Check .env has DATA_DIR=/opt/gbo/data/ and WORK_DIR=/opt/gbo/work/, verify the systemd unit has EnvironmentFile=/opt/gbo/bin/.env, and confirm Vault is reachable so service discovery works. Expected startup log lines include info watcher:Watching data directory /opt/gbo/data and info botserver:BotServer started successfully on port 5858.

Migrations not running after push: If stat /opt/gbo/bin/botserver shows old timestamp and __diesel_schema_migrations table has no new entries, CI did not rebuild. Make a trivial code change (e.g., add a comment) in botserver and push again to force rebuild.