127 lines
No EOL
14 KiB
Markdown
127 lines
No EOL
14 KiB
Markdown
# Production Environment Guide (Compact)
|
|
|
|
## CRITICAL RULES — READ FIRST
|
|
|
|
NEVER INCLUDE HERE CREDENTIALS OR COMPANY INFORMATION, THIS IS COMPANY AGNOSTIC.
|
|
Always manage services with `systemctl` inside the `system` Incus container. Never run `/opt/gbo/bin/botserver` or `/opt/gbo/bin/botui` directly — they will fail because they won't load the `.env` file containing Vault credentials and paths. The correct commands are `sudo incus exec system -- systemctl start|stop|restart|status botserver` and the same for `ui`. Systemctl handles environment loading, auto-restart, logging, and dependencies.
|
|
|
|
Never push secrets (API keys, passwords, tokens) to git. Never commit `init.json` (it contains Vault unseal keys). All secrets must come from Vault — only `VAULT_*` variables are allowed in `.env`. Never deploy manually via scp or ssh; always use CI/CD. Always push all submodules (botserver, botui, botlib) before or alongside the main repo. Always ask before pushing to ALM.
|
|
|
|
---
|
|
|
|
## Infrastructure Overview
|
|
|
|
The host machine is `PROD-GBO1`, accessed via `ssh user@<hostname>`, running Incus (an LXD fork) as hypervisor. All services run inside named Incus containers. You enter containers with `sudo incus exec <container> -- <command>` and list them with `sudo incus list`.
|
|
|
|
The containers and their roles are: `system` runs botserver on port 5858 and botui on port 5859; `alm-ci` runs the Forgejo Actions CI runner; `alm` hosts the Forgejo git server; `tables` runs PostgreSQL on port 5432; `cache` runs Valkey/Redis on port 6379; `drive` runs MinIO object storage on port 9100; `vault` runs HashiCorp Vault on port 8200; `vector` runs Qdrant on port 6333.
|
|
|
|
Externally, botserver is reachable at `https://system.example.com` and botui at `https://chat.example.com`. Internally, botui's `BOTSERVER_URL` must be `http://localhost:5858` — never the external HTTPS URL, because the Rust proxy runs server-side and needs direct localhost access.
|
|
|
|
---
|
|
|
|
## Services Detail
|
|
|
|
Botserver runs as user `gbuser`, binary at `/opt/gbo/bin/botserver`, logs at `/opt/gbo/logs/out.log` and `/opt/gbo/logs/err.log`, systemd unit at `/etc/systemd/system/botserver.service`, env loaded from `/opt/gbo/bin/.env`. Bot BASIC scripts live under `/opt/gbo/data/<botname>.gbai/<botname>.gbdialog/*.bas`; compiled AST cache goes to `/opt/gbo/work/`.
|
|
|
|
The directory service runs Zitadel as user `root`, binary at `/opt/gbo/bin/zitadel`, logs at `/opt/gbo/logs/zitadel.log`, systemd unit at `/etc/systemd/system/directory.service`, and loads environment from the service configuration. Zitadel provides identity management and OAuth2 services for the platform.
|
|
|
|
Internally, Zitadel listens on port 8080 within the directory container. For external access:
|
|
- Via public domain (HTTPS): `https://login.example.com` (configured through proxy container)
|
|
- Via host IP (HTTP): `http://<host-ip>:9000` (direct container port forwarding)
|
|
- Via container IP (HTTP): `http://<directory-container-ip>:9000` (direct container access)
|
|
Access the Zitadel console at `https://login.example.com/ui/console` with admin credentials. Zitadel implements v1 Management API (deprecated) and v2 Organization/User services. Always use the v2 endpoints under `/v2/organizations` and `/v2/users` for all operations.
|
|
|
|
The botserver bootstrap also manages: Vault (secrets), PostgreSQL (database), Valkey (cache, password auth), MinIO (object storage), Zitadel (identity provider), and llama.cpp (LLM).
|
|
To obtain a PAT for Zitadel API access, check /opt/gbo/conf/directory/admin-pat.txt in the directory container. Use it with curl by setting the Authorization header: `Authorization: Bearer $(cat /opt/gbo/conf/directory/admin-pat.txt)` and include `-H \"Host: <IP> \"` for correct host resolution (replace with your directory container IP).
|
|
|
|
---
|
|
|
|
## Common Operations
|
|
|
|
**Check status:** `sudo incus exec system -- systemctl status botserver --no-pager` (same for `ui`). To check process existence: `sudo incus exec system -- pgrep -f botserver`.
|
|
|
|
**View logs:** For systemd journal: `sudo incus exec system -- journalctl -u botserver --no-pager -n 50`. For application logs: `sudo incus exec system -- tail -50 /opt/gbo/logs/out.log` or `err.log`. For live tail: `sudo incus exec system -- tail -f /opt/gbo/logs/out.log`.
|
|
|
|
**Restart:** `sudo incus exec system -- systemctl restart botserver` and same for `ui`. Never run the binary directly.
|
|
|
|
**Emergency manual deploy:** Kill the old process with `sudo incus exec system -- killall botserver`, copy the new binary from `/opt/gbo/ci/botserver/target/debug/botserver` to `/opt/gbo/bin/botserver`, set permissions with `chmod +x` and `chown gbuser:gbuser`, then start with `systemctl start botserver`.
|
|
|
|
**Transfer bot files:** Archive locally with `tar czf /tmp/bots.tar.gz -C /opt/gbo/data <botname>.gbai`, copy to host with `scp`, then extract inside container with `sudo incus exec system -- bash -c 'tar xzf /tmp/bots.tar.gz -C /opt/gbo/data/'`. Clear compiled cache with `find /opt/gbo/data -name "*.ast" -delete` and same for `/opt/gbo/work`.
|
|
|
|
**Snapshots:** `sudo incus snapshot list system` to list, `sudo incus snapshot restore system <name>` to restore.
|
|
|
|
---
|
|
|
|
## CI/CD Pipeline
|
|
|
|
Repositories exist on both GitHub and the internal ALM (Forgejo). The four repos are `gb` (main workspace), `botserver`, `botui`, and `botlib`. Always push submodules first (`cd botserver && git push alm main && git push origin main`), then update submodule references in the root repo and push that too.
|
|
|
|
The CI runner container (`alm-ci`) runs Debian Trixie with glibc 2.41, but the `system` container runs Debian 12 Bookworm with glibc 2.36. Binaries compiled on the CI runner are incompatible with the system container. The CI workflow (`botserver/.forgejo/workflows/botserver.yaml`) solves this by transferring source to the system container via `tar | ssh` and building there. The workflow triggers on pushes to `main`, clones repos, transfers source, builds inside system container, deploys the binary, and verifies botserver is running.
|
|
|
|
---
|
|
|
|
## DriveMonitor & Bot Configuration
|
|
|
|
DriveMonitor is a background service inside botserver that watches MinIO buckets and syncs changes to the local filesystem and database every 10 seconds. It monitors three directory types per bot: the `.gbdialog/` folder for BASIC scripts (downloads and recompiles on change), the `.gbot/` folder for `config.csv` (syncs to the `bot_configuration` database table), and the `.gbkb/` folder for knowledge base documents (downloads and indexes for vector search).
|
|
|
|
Bot configuration is stored in two PostgreSQL tables inside the `botserver` database. The `bot_configuration` table holds key-value pairs with columns `bot_id`, `config_key`, `config_value`, `config_type`, `is_encrypted`, and `updated_at`. The `gbot_config_sync` table tracks sync state with columns `bot_id`, `config_file_path`, `last_sync_at`, `file_hash`, and `sync_count`.
|
|
|
|
The `config.csv` format is a plain CSV with no header: each line is `key,value`, for example `llm-provider,groq` or `theme-color1,#cc0000`. DriveMonitor syncs it when the file ETag changes in MinIO, on botserver startup, or after a restart.
|
|
|
|
**Check config status:** Query `bot_configuration` via `sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "SELECT config_key, config_value FROM bot_configuration WHERE bot_id = (SELECT id FROM bots WHERE name = 'salesianos') ORDER BY config_key;"`. Check sync state via the `gbot_config_sync` table. Inspect the bucket directly with `sudo incus exec drive -- /opt/gbo/bin/mc cat local/salesianos.gbai/salesianos.gbot/config.csv`.
|
|
|
|
**Debug DriveMonitor:** Monitor live logs with `sudo incus exec system -- tail -f /opt/gbo/logs/out.log | grep -E "(DRIVE_MONITOR|check_gbot|config)"`. An empty `gbot_config_sync` table means DriveMonitor has not synced yet. If no new log entries appear after 30 seconds, the loop may be stuck — restart botserver with systemctl to clear the state.
|
|
|
|
**Common config issues:** If config.csv is missing from the bucket, create and upload it with `mc cp`. If the database shows stale values, restart botserver to force a fresh sync, or as a temporary fix update the database directly with `UPDATE bot_configuration SET config_value = 'groq', updated_at = NOW() WHERE ...`. To force a re-sync without restarting, copy config.csv over itself with `mc cp local/... local/...` to change the ETag.
|
|
|
|
---
|
|
|
|
## MinIO (Drive) Operations
|
|
|
|
All bot files live in MinIO buckets. Use the `mc` CLI at `/opt/gbo/bin/mc` from inside the `drive` container. The bucket structure per bot is: `{bot}.gbai/` as root, `{bot}.gbai/{bot}.gbdialog/` for BASIC scripts, `{bot}.gbai/{bot}.gbot/` for config.csv, and `{bot}.gbai/{bot}.gbkb/` for knowledge base folders.
|
|
|
|
Common mc commands: `mc ls local/` lists all buckets; `mc ls local/salesianos.gbai/` lists a bucket; `mc cat local/.../start.bas` prints a file; `mc cp local/.../file /tmp/file` downloads; `mc cp /tmp/file local/.../file` uploads (this triggers DriveMonitor recompile); `mc stat local/.../config.csv` shows ETag and metadata; `mc mb local/newbot.gbai` creates a bucket; `mc rb local/oldbot.gbai` removes an empty bucket.
|
|
|
|
If mc is not found, use the full path `/opt/gbo/bin/mc`. If alias `local` is not configured, check with `mc config host list`. If MinIO is not running, check with `sudo incus exec drive -- systemctl status minio`.
|
|
|
|
---
|
|
|
|
## Vault Security Architecture
|
|
|
|
HashiCorp Vault is the single source of truth for all secrets. Botserver reads `VAULT_ADDR` and `VAULT_TOKEN` from `/opt/gbo/bin/.env` at startup, initializes a TLS/mTLS client, then reads credentials from Vault paths. If Vault is unavailable, it falls back to defaults. The `.env` file must only contain `VAULT_*` variables plus `PORT`, `DATA_DIR`, `WORK_DIR`, and `LOAD_ONLY`.
|
|
|
|
**Global Vault paths:** `gbo/tables` holds PostgreSQL credentials; `gbo/drive` holds MinIO access key and secret; `gbo/cache` holds Valkey password; `gbo/llm` holds LLM URL and API keys; `gbo/directory` holds Zitadel config; `gbo/email` holds SMTP credentials; `gbo/vectordb` holds Qdrant config; `gbo/jwt` holds JWT signing secret; `gbo/encryption` holds the master encryption key. Organization-scoped secrets follow patterns like `gbo/orgs/{org_id}/bots/{bot_id}` and tenant infrastructure uses `gbo/tenants/{tenant_id}/infrastructure`.
|
|
|
|
**Credential resolution:** For any service, botserver checks the most specific Vault path first (org+bot level), falls back to a default bot path, then falls back to the global path, and only uses environment variables as a last resort in development.
|
|
|
|
**Verify Vault health:** `sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health` should return JSON with `"sealed":false`. To read a secret: set `VAULT_ADDR`, `VAULT_TOKEN`, and `VAULT_CACERT` then run `vault kv get secret/gbo/tables`. To test from the system container, use curl with `--cacert /opt/gbo/conf/system/certificates/ca/ca.crt` and `-H "X-Vault-Token: <token>"`.
|
|
|
|
**init.json** is stored at `/opt/gbo/bin/botserver-stack/conf/vault/vault-conf/init.json` and contains the root token and 5 unseal keys (3 needed to unseal). Never commit this file to git. Store it encrypted in a secure location.
|
|
|
|
**Vault troubleshooting — cannot connect:** Check that the vault container's systemd unit is running, verify the token in `.env` is not expired with `vault token lookup`, confirm the CA cert path in `.env` matches the actual file location, and test network connectivity from system to vault container. To generate a new token: `vault token create -policy="botserver" -ttl="8760h" -format=json` then update `.env` and restart botserver.
|
|
|
|
**Vault troubleshooting — secrets missing:** Run `vault kv get secret/gbo/tables` (and other paths) to check if secrets exist. If a path returns NOT FOUND, add secrets with `vault kv put secret/gbo/tables host=<ip> port=5432 database=botserver username=gbuser password=<pw>` and similar for other paths.
|
|
|
|
**Vault sealed after restart:** Run `vault operator unseal <key1>`, repeat with key2 and key3 (3 of 5 keys from init.json), then verify with `vault status`.
|
|
|
|
**TLS certificate errors:** Confirm `/opt/gbo/conf/system/certificates/ca/ca.crt` exists in the system container. If missing, copy it from the vault container using `incus file pull vault/opt/gbo/conf/vault/ca.crt /tmp/ca.crt` then place it at the expected path.
|
|
|
|
**Vault snapshots:** Stop vault, run `sudo incus snapshot create vault backup-$(date +%Y%m%d-%H%M)`, start vault. Restore with `sudo incus snapshot restore vault <name>` while vault is stopped.
|
|
|
|
---
|
|
|
|
## Troubleshooting Quick Reference
|
|
|
|
**GLIBC mismatch (`GLIBC_2.39 not found`):** The binary was compiled on the CI runner (glibc 2.41) not inside the system container (glibc 2.36). The CI workflow must SSH into the system container to build. Check `botserver.yaml` to confirm this.
|
|
|
|
**botserver won't start:** Run `sudo incus exec system -- ldd /opt/gbo/bin/botserver | grep "not found"` to check for missing libraries. Run `sudo incus exec system -- timeout 10 /opt/gbo/bin/botserver 2>&1` to see startup errors. Confirm `/opt/gbo/data/` exists and is accessible.
|
|
|
|
**botui can't reach botserver:** Check that the `ui.service` systemd file has `BOTSERVER_URL=http://localhost:5858` — not the external HTTPS URL. Fix with `sed -i 's|BOTSERVER_URL=.*|BOTSERVER_URL=http://localhost:5858|'` on the service file, then `systemctl daemon-reload` and `systemctl restart ui`.
|
|
|
|
**Suggestions not showing:** Confirm bot `.bas` files exist under `/opt/gbo/data/<bot>.gbai/<bot>.gbdialog/`. Check logs for compilation errors. Clear the AST cache in `/opt/gbo/work/` and restart botserver.
|
|
|
|
**IPv6 DNS timeouts on external APIs (Groq, Cloudflare):** The container's DNS may return AAAA records without IPv6 connectivity. The container should have `IPV6=no` in its network config and `gai.conf` set appropriately. Check for `RES_OPTIONS=inet4` in `botserver.service` if issues persist.
|
|
|
|
**Logs show development paths instead of `/opt/gbo/data/`:** Botserver is using hardcoded dev paths. Check `.env` has `DATA_DIR=/opt/gbo/data/` and `WORK_DIR=/opt/gbo/work/`, verify the systemd unit has `EnvironmentFile=/opt/gbo/bin/.env`, and confirm Vault is reachable so service discovery works. Expected startup log lines include `info watcher:Watching data directory /opt/gbo/data` and `info botserver:BotServer started successfully on port 5858`.
|
|
|
|
**Migrations not running after push:** If `stat /opt/gbo/bin/botserver` shows old timestamp and `__diesel_schema_migrations` table has no new entries, CI did not rebuild. Make a trivial code change (e.g., add a comment) in botserver and push again to force rebuild. |