# Production Environment Guide (Compact)
|
|
|
|
## CRITICAL RULES — READ FIRST
|
|
|
|
NEVER INCLUDE HERE CREDENTIALS OR COMPANY INFORMATION, THIS IS COMPANY AGNOSTIC.
|
|
Always manage services with `systemctl` inside the `system` Incus container. Never run `/opt/gbo/bin/botserver` or `/opt/gbo/bin/botui` directly — they will fail because they won't load the `.env` file containing Vault credentials and paths. The correct commands are `sudo incus exec system -- systemctl start|stop|restart|status botserver` and the same for `ui`. Systemctl handles environment loading, auto-restart, logging, and dependencies.
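
The rule above can be wrapped in a small shell helper so the direct binary path is never typed by hand. This is a minimal sketch: the function name `gbo_service` and the `DRY_RUN` variable are illustrative assumptions, not part of the platform.

```bash
# Hypothetical helper: wraps the systemctl invocation described above.
# Set DRY_RUN=1 to print the command instead of running it.
gbo_service() {
  local action="$1" service="$2"
  case "$action" in
    start|stop|restart|status) ;;
    *) echo "usage: gbo_service start|stop|restart|status botserver|ui" >&2; return 2 ;;
  esac
  case "$service" in
    botserver|ui) ;;
    *) echo "unknown service: $service" >&2; return 2 ;;
  esac
  # Always go through systemd inside the system container.
  local cmd=(sudo incus exec system -- systemctl "$action" "$service")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 gbo_service restart ui` prints the command that would run, and `gbo_service status botserver` executes it.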
|
|
|
|
Never push secrets (API keys, passwords, tokens) to git. Never commit `init.json` (it contains Vault unseal keys). All secrets must come from Vault — only `VAULT_*` variables are allowed in `.env`. Never deploy manually via scp or ssh; always use CI/CD. Always push all submodules (botserver, botui, botlib) before or alongside the main repo. Always ask before pushing to ALM.
|
|
|
|
---
|
|
|
|
## Infrastructure Overview
|
|
|
|
The host machine is `<hostname>`, accessed via `ssh user@<hostname>`, running Incus (an LXD fork) as hypervisor. All services run inside named Incus containers. You enter containers with `sudo incus exec <container> -- <command>` and list them with `sudo incus list`.
|
|
|
|
The containers and their roles are: `system` runs botserver on port 5858 and botui on port 5859; `alm-ci` runs the Forgejo Actions CI runner; `alm` hosts the Forgejo git server; `tables` runs PostgreSQL on port 5432; `cache` runs Valkey/Redis on port 6379; `drive` runs MinIO object storage on port 9100; `vault` runs HashiCorp Vault on port 8200; `vector` runs Qdrant on port 6333.
|
|
|
|
Externally, botserver is reachable at `https://<system-domain>` and botui at `https://<chat-domain>`. Internally, botui's `BOTSERVER_URL` must be `http://localhost:5858` — never the external HTTPS URL, because the Rust proxy runs server-side and needs direct localhost access.
|
|
|
|
---
|
|
|
|
## Services Detail
|
|
|
|
Botserver runs as user `gbuser`, binary at `/opt/gbo/bin/botserver`, logs at `/opt/gbo/logs/out.log` and `/opt/gbo/logs/err.log`, systemd unit at `/etc/systemd/system/botserver.service`, env loaded from `/opt/gbo/bin/.env`. Bot BASIC scripts live under `/opt/gbo/data/<botname>.gbai/<botname>.gbdialog/*.bas`; compiled AST cache goes to `/opt/gbo/work/`.
|
|
|
|
The directory service runs Zitadel as user `root`, binary at `/opt/gbo/bin/zitadel`, logs at `/opt/gbo/logs/zitadel.log`, systemd unit at `/etc/systemd/system/directory.service`, and loads environment from the service configuration. Zitadel provides identity management and OAuth2 services for the platform.
|
|
|
|
Internally, Zitadel listens on port 8080 within the directory container. For external access:
|
|
- Via public domain (HTTPS): `https://<login-domain>` (configured through proxy container)
|
|
- Via host IP (HTTP): `http://<host-ip>:9000` (direct container port forwarding)
|
|
- Via container IP (HTTP): `http://<directory-container-ip>:9000` (direct container access)
|
|
Access the Zitadel console at `https://<login-domain>/ui/console` with admin credentials. Zitadel implements v1 Management API (deprecated) and v2 Organization/User services. Always use the v2 endpoints under `/v2/organizations` and `/v2/users` for all operations.
|
|
|
|
The botserver bootstrap also manages: Vault (secrets), PostgreSQL (database), Valkey (cache, password auth), MinIO (object storage), Zitadel (identity provider), and llama.cpp (LLM).
|
|
To obtain a PAT for Zitadel API access, read `/opt/gbo/conf/directory/admin-pat.txt` in the directory container. Pass it to curl in the Authorization header, `Authorization: Bearer $(cat /opt/gbo/conf/directory/admin-pat.txt)`, and add `-H "Host: <directory-ip>"` so the request resolves to the correct host (replace `<directory-ip>` with your directory container IP).
|
|
|
|
---
|
|
|
|
## Directory Management (Zitadel)
|
|
|
|
### Getting Admin PAT (Personal Access Token)
|
|
|
|
```bash
# Get the admin PAT from the directory container
PAT=$(ssh administrator@<hostname> "sudo incus exec directory -- cat /opt/gbo/conf/directory/admin-pat.txt")
```
|
|
|
|
### User Management via API (v2)
|
|
|
|
**Create a Human User:**
|
|
```bash
curl -X POST "http://<directory-ip>:8080/v2/users/human" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAT" \
  -H "Host: <directory-ip>" \
  -d '{
    "username": "testuser",
    "profile": {"givenName": "Test", "familyName": "User"},
    "email": {"email": "test@example.com", "isVerified": true},
    "password": {"password": "SecurePass123!", "changeRequired": false}
  }'
```
|
|
|
|
**List Users:**
|
|
```bash
curl -X POST "http://<directory-ip>:8080/v2/users" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAT" \
  -H "Host: <directory-ip>" \
  -d '{"query": {"offset": 0, "limit": 100}}'
```
|
|
|
|
**Update User Password:**
|
|
```bash
curl -X POST "http://<directory-ip>:8080/v2/users/<user-id>/password" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAT" \
  -H "Host: <directory-ip>" \
  -d '{
    "newPassword": {"password": "NewPass123!", "changeRequired": false}
  }'
```
|
|
|
|
**Delete User:**
|
|
```bash
curl -X DELETE "http://<directory-ip>:8080/v2/users/<user-id>" \
  -H "Authorization: Bearer $PAT" \
  -H "Host: <directory-ip>"
```
|
|
|
|
### Directory Quick Reference
|
|
|
|
| Task | Command |
|------|---------|
| Get PAT | `sudo incus exec directory -- cat /opt/gbo/conf/directory/admin-pat.txt` |
| Check health | `curl -sf http://<directory-ip>:8080/debug/healthz` |
| Console UI | `http://<host-ip>:9000/ui/console` |
| Create user | `POST /v2/users/human` |
| List users | `POST /v2/users` |
| Update password | `POST /v2/users/{id}/password` |
|
|
|
|
### Zitadel API v2 Usage with PAT
|
|
|
|
**Important:** Zitadel API v2 requires a valid Personal Access Token (PAT) for authentication. The PAT must have the appropriate scopes for the operations you want to perform.
|
|
|
|
**Using PAT with curl:**
|
|
|
|
```bash
|
|
# Set your PAT as an environment variable
|
|
PAT="<your-pat-token>"
|
|
|
|
# Include the required headers in all API calls
|
|
curl -X POST "http://<directory-ip>:8080/v2/organizations" \
|
|
-H "Content-Type: application/json" \
|
|
-H "Authorization: Bearer $PAT" \
|
|
-H "Host: <directory-ip>" \
|
|
-d '{
|
|
"name": "pragmatismo"
|
|
}'
|
|
```
|
|
|
|
**Critical Headers:**
|
|
- `Authorization: Bearer $PAT` - Your PAT token
|
|
- `Host: <directory-ip>` - Required for gRPC-gateway routing
|
|
- `Content-Type: application/json` - For POST/PUT/PATCH requests
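
Because the three headers above repeat in every call, they can be generated once and reused. The following is a sketch only; `zitadel_headers` is a hypothetical helper name, and the argument order (PAT first, then directory IP) is an assumption made for this example.

```bash
# Hypothetical helper: prints the curl header arguments, one per line.
# $1 = PAT, $2 = directory container IP (sent as the Host header)
zitadel_headers() {
  local pat="$1" host="$2"
  printf '%s\n' \
    "-H" "Content-Type: application/json" \
    "-H" "Authorization: Bearer ${pat}" \
    "-H" "Host: ${host}"
}

# Usage (bash 4+), shown as a comment so nothing runs on load:
#   mapfile -t hdrs < <(zitadel_headers "$PAT" "<directory-ip>")
#   curl -X POST "http://<directory-ip>:8080/v2/users" "${hdrs[@]}" \
#     -d '{"query": {"offset": 0, "limit": 1}}'
```

Reading the lines into an array with `mapfile` keeps each header as a single argument even though it contains spaces.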
|
|
|
|
**Common API v2 Endpoints:**
|
|
|
|
Create Organization:
|
|
```bash
|
|
curl -X POST "http://10.157.134.240:8080/v2/organizations" \
|
|
-H "Content-Type: application/json" \
|
|
-H "Authorization: Bearer $PAT" \
|
|
-H "Host: 10.157.134.240" \
|
|
-d '{
|
|
"name": "organization-name"
|
|
}'
|
|
```
|
|
|
|
List Organizations (requires body with query):
|
|
```bash
# Listing uses the _search endpoint; POST /v2/organizations without it creates an organization
curl -X POST "http://10.157.134.240:8080/v2/organizations/_search" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAT" \
  -H "Host: 10.157.134.240" \
  -d '{
    "query": {
      "offset": 0,
      "limit": 100
    }
  }'
```
|
|
|
|
Create Human User:
|
|
```bash
|
|
curl -X POST "http://10.157.134.240:8080/v2/users/human" \
|
|
-H "Content-Type: application/json" \
|
|
-H "Authorization: Bearer $PAT" \
|
|
-H "Host: 10.157.134.240" \
|
|
-d '{
|
|
"username": "johndoe",
|
|
"profile": {
|
|
"givenName": "John",
|
|
"familyName": "Doe"
|
|
},
|
|
"email": {
|
|
"email": "john@example.com",
|
|
"isVerified": true
|
|
},
|
|
"password": {
|
|
"password": "SecurePass123!",
|
|
"changeRequired": false
|
|
}
|
|
}'
|
|
```
|
|
|
|
**Testing PAT Validity:**
|
|
```bash
|
|
# Test if PAT is valid by calling users endpoint
|
|
curl -X POST "http://10.157.134.240:8080/v2/users" \
|
|
-H "Content-Type: application/json" \
|
|
-H "Authorization: Bearer $PAT" \
|
|
-H "Host: 10.157.134.240" \
|
|
-d '{"query": {"offset": 0, "limit": 1}}'
|
|
|
|
# If you get {"code":16,"message":"Errors.Token.Invalid (AUTH-7fs1e)"}, the PAT is invalid
|
|
```
|
|
|
|
**Generating a New PAT via Web Console:**
|
|
1. Access: `http://<directory-ip>:9000/ui/console`
|
|
2. Login with admin credentials
|
|
3. Navigate to your profile (top right corner)
|
|
4. Go to "Personal Access Tokens"
|
|
5. Click "Create"
|
|
6. Name the token and select expiration
|
|
7. Copy the token (you won't see it again!)
|
|
8. Update `/opt/gbo/conf/directory/admin-pat.txt` with the new token
|
|
|
|
### Production Credentials
|
|
|
|
**Admin Account:**
|
|
- Username: `admin`
|
|
- Password: `Admin123!`
|
|
- Access: `https://<login-domain>/ui/console`
|
|
|
|
**Test User Account (created via API):**
|
|
- Username: `rodriguez`
|
|
- Password: `SecurePass2026!`
|
|
- User ID: `368981346720188144`
|
|
- Access: Use with any bot login page
|
|
|
|
---
|
|
|
|
### Zitadel Setup & Initialization
|
|
|
|
**Database Configuration:**
|
|
Zitadel connects to PostgreSQL with these credentials (set in `directory.service`):
|
|
- Database: `PROD-DIRECTORY`
|
|
- Host: `10.157.134.174` (tables container)
|
|
- Port: `5432`
|
|
- User: `postgres`
|
|
- Password: `67a690df` (from Vault: `secret/gbo/tables`)
|
|
|
|
**Current Production Settings:**
|
|
- Container IP: `10.157.134.240`
|
|
- Internal port: `8080`
|
|
- External port: `9000`
|
|
- Masterkey: `MasterkeyNeedsToHave32Characters` (CHANGE THIS IN PRODUCTION!)
|
|
- TLS mode: `disabled`
|
|
- External domain: `10.157.134.240`
|
|
|
|
**Initialization File:**
|
|
Location: `/opt/gbo/conf/directory/zitadel-init-steps.yaml`
|
|
```yaml
FirstInstance:
  InstanceName: "BotServer"
  DefaultLanguage: "en"
  PatPath: "/opt/gbo/conf/directory/admin-pat.txt"
  Org:
    Name: "BotServer"
    Machine:
      Machine:
        Username: "admin-sa"
        Name: "Admin Service Account"
      Pat:
        ExpirationDate: "2099-01-01T00:00:00Z"
    Human:
      UserName: "admin"
      FirstName: "Admin"
      LastName: "User"
      Email:
        Address: "admin@localhost"
        Verified: true
      Password: "Admin123!"
      PasswordChangeRequired: false
```
|
|
|
|
**To Reinitialize Zitadel (if database is empty or corrupted):**
|
|
```bash
# 1. Stop the service
sudo incus exec directory -- systemctl stop directory

# 2. Drop and recreate the database
sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "DROP DATABASE IF EXISTS \"PROD-DIRECTORY\";"
sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "CREATE DATABASE \"PROD-DIRECTORY\";"

# 3. Run initialization
sudo incus exec directory -- bash -c '
  export ZITADEL_DATABASE_POSTGRES_HOST=10.157.134.174
  export ZITADEL_DATABASE_POSTGRES_PORT=5432
  export ZITADEL_DATABASE_POSTGRES_DATABASE=PROD-DIRECTORY
  export ZITADEL_DATABASE_POSTGRES_USER_USERNAME=postgres
  export ZITADEL_DATABASE_POSTGRES_USER_PASSWORD=67a690df
  export ZITADEL_DATABASE_POSTGRES_USER_SSL_MODE=disable
  /opt/gbo/bin/zitadel setup init \
    --config /opt/gbo/conf/directory/zitadel-init-steps.yaml \
    --masterkey MasterkeyNeedsToHave32Characters \
    --tlsMode disabled
'

# 4. Start the service
sudo incus exec directory -- systemctl start directory

# 5. Verify health
curl -sf http://10.157.134.240:8080/debug/healthz
```
|
|
|
|
**Zitadel Database Schema:**
|
|
The database uses multiple schemas:
|
|
- `system` - System tables and configuration
|
|
- `projections` - Read-only projection tables (orgs, users, sessions, etc.)
|
|
- `eventstore` - Event sourcing tables
|
|
- `adminapi`, `auth`, `logstore`, `cache`, `queue` - Specialized schemas
|
|
|
|
To query organizations:
|
|
```bash
|
|
sudo incus exec tables -- psql -h localhost -U postgres -d PROD-DIRECTORY -c \
|
|
"SELECT id, name FROM projections.orgs1;"
|
|
```
|
|
|
|
---
|
|
|
|
### Zitadel Troubleshooting
|
|
|
|
**Database Connection Errors:**
|
|
If logs show `failed SASL auth: FATAL: password authentication failed for user "postgres"`:
|
|
```bash
|
|
# Check systemd unit has correct credentials
|
|
sudo incus exec directory -- cat /etc/systemd/system/directory.service
|
|
|
|
# Verify Vault has the correct credentials
|
|
TOKEN="${VAULT_TOKEN}"
|
|
sudo incus exec system -- curl -s --cacert /opt/gbo/conf/system/certificates/ca/ca.crt \
|
|
-H "X-Vault-Token: $TOKEN" \
|
|
https://10.157.134.250:8200/v1/secret/data/gbo/tables
|
|
|
|
# If credentials changed, update systemd unit and restart
|
|
sudo incus exec directory -- systemctl daemon-reload
|
|
sudo incus exec directory -- systemctl restart directory
|
|
```
|
|
|
|
**Empty Database (No Organizations):**
|
|
If the database was initialized but tables are missing:
|
|
```bash
|
|
# Check if tables exist
|
|
sudo incus exec tables -- psql -h localhost -U postgres -d PROD-DIRECTORY -c \
|
|
"SELECT tablename FROM pg_tables WHERE schemaname = 'projections' LIMIT 5;"
|
|
|
|
# If no tables, reinitialize using the steps above
|
|
```
|
|
|
|
**PAT Token Invalid:**
|
|
If API calls return `Errors.Token.Invalid (AUTH-7fs1e)`:
|
|
```bash
|
|
# Check if PAT file exists
|
|
sudo incus exec directory -- cat /opt/gbo/conf/directory/admin-pat.txt
|
|
|
|
# If missing or expired, regenerate via console or API:
|
|
# 1. Login to console: http://<directory-ip>:9000/ui/console
|
|
# 2. Go to Profile → Personal Access Tokens → Create
|
|
# 3. Save the new token to admin-pat.txt
|
|
```
|
|
|
|
**Health Check Fails:**
|
|
```bash
|
|
# Check service status
|
|
sudo incus exec directory -- systemctl status directory
|
|
|
|
# Check logs
|
|
sudo incus exec directory -- tail -50 /opt/gbo/logs/stderr.log
|
|
sudo incus exec directory -- tail -50 /opt/gbo/logs/stdout.log
|
|
|
|
# Verify database connectivity
|
|
sudo incus exec directory -- pg_isready -h 10.157.134.174 -p 5432 -U postgres
|
|
```
|
|
|
|
**Migration Errors:**
|
|
If migrations fail or database is in bad state:
|
|
```bash
|
|
# Stop service
|
|
sudo incus exec directory -- systemctl stop directory
|
|
|
|
# Drop and recreate database
|
|
sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "DROP DATABASE IF EXISTS \"PROD-DIRECTORY\";"
|
|
sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "CREATE DATABASE \"PROD-DIRECTORY\";"
|
|
|
|
# Reinitialize (see initialization steps above)
|
|
```
|
|
|
|
**Systemd Unit Configuration:**
|
|
The `directory.service` unit contains all environment variables:
|
|
```ini
|
|
[Unit]
|
|
Description=Directory (Zitadel)
|
|
After=network.target
|
|
|
|
[Service]
|
|
User=root
|
|
Group=root
|
|
WorkingDirectory=/opt/gbo
|
|
Environment=ZITADEL_DATABASE_POSTGRES_HOST=10.157.134.174
|
|
Environment=ZITADEL_DATABASE_POSTGRES_PORT=5432
|
|
Environment=ZITADEL_DATABASE_POSTGRES_DATABASE=PROD-DIRECTORY
|
|
Environment=ZITADEL_DATABASE_POSTGRES_USER_USERNAME=postgres
|
|
Environment=ZITADEL_DATABASE_POSTGRES_USER_PASSWORD=67a690df
|
|
Environment=ZITADEL_DATABASE_POSTGRES_USER_SSL_MODE=disable
|
|
Environment=ZITADEL_EXTERNALSECURE=false
|
|
Environment=ZITADEL_EXTERNALDOMAIN=10.157.134.240
|
|
Environment=ZITADEL_EXTERNALPORT=9000
|
|
Environment=ZITADEL_TLS_ENABLED=false
|
|
ExecStart=/opt/gbo/bin/zitadel start --masterkey MasterkeyNeedsToHave32Characters --tlsMode disabled --externalDomain 10.157.134.240 --externalPort 9000
|
|
Restart=always
|
|
RestartSec=5
|
|
StandardOutput=append:/opt/gbo/logs/stdout.log
|
|
StandardError=append:/opt/gbo/logs/stderr.log
|
|
|
|
[Install]
|
|
WantedBy=multi-user.target
|
|
```
|
|
|
|
---
|
|
|
|
## Common Operations
|
|
|
|
**Check status:** `sudo incus exec system -- systemctl status botserver --no-pager` (same for `ui`). To check process existence: `sudo incus exec system -- pgrep -f botserver`.
|
|
|
|
**View logs:** For systemd journal: `sudo incus exec system -- journalctl -u botserver --no-pager -n 50`. For application logs: `sudo incus exec system -- tail -50 /opt/gbo/logs/out.log` or `err.log`. For live tail: `sudo incus exec system -- tail -f /opt/gbo/logs/out.log`.
|
|
|
|
**Restart:** `sudo incus exec system -- systemctl restart botserver` and same for `ui`. Never run the binary directly.
|
|
|
|
**Emergency manual deploy:** Kill the old process with `sudo incus exec system -- killall botserver`, copy the new binary from `/opt/gbo/ci/botserver/target/debug/botserver` to `/opt/gbo/bin/botserver`, set permissions with `chmod +x` and `chown gbuser:gbuser`, then start with `systemctl start botserver`.
|
|
|
|
**Transfer bot files:** Archive locally with `tar czf /tmp/bots.tar.gz -C /opt/gbo/data <botname>.gbai`, copy to host with `scp`, then extract inside container with `sudo incus exec system -- bash -c 'tar xzf /tmp/bots.tar.gz -C /opt/gbo/data/'`. Clear compiled cache with `find /opt/gbo/data -name "*.ast" -delete` and same for `/opt/gbo/work`.
|
|
|
|
**Snapshots:** `sudo incus snapshot list system` to list, `sudo incus snapshot restore system <name>` to restore.
|
|
|
|
---
|
|
|
|
## CI/CD Pipeline
|
|
|
|
Repositories exist on both GitHub and the internal ALM (Forgejo). The four repos are `gb` (main workspace), `botserver`, `botui`, and `botlib`. Always push submodules first (`cd botserver && git push alm main && git push origin main`), then update submodule references in the root repo and push that too.
|
|
|
|
The CI runner container (`alm-ci`) runs Debian 12 Bookworm with glibc 2.36, same as the `system` container. Binaries compiled on the CI runner are compatible with the system container. The CI workflow (`botserver/.forgejo/workflows/botserver.yaml`) builds in alm-ci (which has Rust toolchain) and deploys binary to system container. The workflow triggers on pushes to `main`, clones repos, builds in alm-ci, transfers binary via scp, and verifies botserver is running.
|
|
|
|
### ALM/CI Debugging & Monitoring
|
|
|
|
**Access ALM/CI containers:**
|
|
```bash
|
|
ssh administrator@<hostname>
|
|
sudo incus exec alm-ci -- bash # CI runner container
|
|
sudo incus exec tables -- bash # PostgreSQL (ALM database)
|
|
sudo incus exec system -- bash # botserver container
|
|
```
|
|
|
|
**Check CI runner status:**
|
|
```bash
|
|
# Runner process
|
|
sudo incus exec alm-ci -- ps aux | grep forgejo
|
|
|
|
# Runner logs
|
|
sudo incus exec alm-ci -- cat /opt/gbo/logs/forgejo-runner.log
|
|
|
|
# If runner is down, restart:
|
|
sudo incus exec alm-ci -- bash -c 'pkill -9 forgejo; sleep 2; cd /opt/gbo/bin && nohup ./forgejo-runner daemon --config config.yaml >> /opt/gbo/logs/forgejo-runner.log 2>&1 &'
|
|
```
|
|
|
|
**Monitor CI runs in database:**
|
|
```bash
|
|
# Status codes: 0=pending, 1=success, 2=failure, 3=cancelled, 6=running
|
|
sudo incus exec tables -- bash -c 'export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status, commit_sha, created FROM action_run ORDER BY id DESC LIMIT 5;"'
|
|
|
|
# Check specific run jobs
|
|
sudo incus exec tables -- bash -c 'export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status, name FROM action_run_job WHERE run_id = <ID>;"'
|
|
|
|
# Check tasks
|
|
sudo incus exec tables -- bash -c 'export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status FROM action_task WHERE repo_id = 3 ORDER BY id DESC LIMIT 3;"'
|
|
|
|
# Reset stuck run to re-trigger
|
|
sudo incus exec tables -- bash -c 'export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c "UPDATE action_task SET status = 0 WHERE id = <TASK_ID>; UPDATE action_run_job SET status = 0 WHERE run_id = <RUN_ID>; UPDATE action_run SET status = 0 WHERE id = <RUN_ID>;"'
|
|
```
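
When reading query results, the numeric status codes listed in the block above can be translated into names. A minimal sketch; `ci_status_name` is a hypothetical helper, and the mapping is taken only from the comment in that block.

```bash
# Hypothetical helper: map an action_run status code to its name.
# Codes: 0=pending, 1=success, 2=failure, 3=cancelled, 6=running
ci_status_name() {
  case "$1" in
    0) echo pending ;;
    1) echo success ;;
    2) echo failure ;;
    3) echo cancelled ;;
    6) echo running ;;
    *) echo "unknown($1)" ;;
  esac
}
```

For example, `ci_status_name 6` prints `running`; codes not listed in this guide are reported as `unknown(<code>)` rather than guessed.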
|
|
|
|
**Fix common CI issues:**
|
|
```bash
|
|
# /tmp permission denied for build.log
|
|
sudo incus exec alm-ci -- chmod 1777 /tmp
|
|
sudo incus exec alm-ci -- bash -c 'touch /tmp/build.log && chmod 666 /tmp/build.log'
|
|
|
|
# Clean old CI runs (keep recent)
|
|
sudo incus exec tables -- bash -c 'export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c "DELETE FROM action_run WHERE id < <RECENT_ID>;"'
|
|
sudo incus exec tables -- bash -c 'export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c "DELETE FROM action_run_job WHERE run_id < <RECENT_ID>;"'
|
|
|
|
# Check deploy.log missing error - fix workflow step
|
|
# The Save deploy log step expects /tmp/deploy.log which the workflow doesn't create
|
|
# Fix: ensure deploy step outputs to /tmp/deploy.log
|
|
```
|
|
|
|
**Watch CI in real-time:**
|
|
```bash
|
|
# Tail runner logs
|
|
sudo incus exec alm-ci -- tail -f /opt/gbo/logs/forgejo-runner.log
|
|
|
|
# Check if new builds appear
|
|
watch -n 5 'sudo incus exec tables -- bash -c "export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c \"SELECT id, status, created FROM action_run ORDER BY id DESC LIMIT 3;\""'
|
|
|
|
# Verify botserver deployed correctly
|
|
sudo incus exec system -- /opt/gbo/bin/botserver --version 2>&1 | head -3
|
|
sudo incus exec system -- tail -5 /opt/gbo/logs/err.log
|
|
```
|
|
|
|
**CI Workflow Structure:**
|
|
1. Setup Git (disable SSL verify, add safe directories)
|
|
2. Setup Workspace (clone/merge gb workspace Cargo.toml)
|
|
3. Install system dependencies
|
|
4. Clean up workspaces
|
|
5. Build BotServer (output to /tmp/build.log)
|
|
6. Save build log
|
|
7. Deploy via ssh tar gzip
|
|
8. Verify botserver started
|
|
9. Save deploy log
|
|
|
|
---
|
|
|
|
## DriveMonitor & Bot Configuration
|
|
|
|
DriveMonitor is a background service inside botserver that watches MinIO buckets and syncs changes to the local filesystem and database every 10 seconds. It monitors three directory types per bot: the `.gbdialog/` folder for BASIC scripts (downloads and recompiles on change), the `.gbot/` folder for `config.csv` (syncs to the `bot_configuration` database table), and the `.gbkb/` folder for knowledge base documents (downloads and indexes for vector search).
|
|
|
|
Bot configuration is stored in two PostgreSQL tables inside the `botserver` database. The `bot_configuration` table holds key-value pairs with columns `bot_id`, `config_key`, `config_value`, `config_type`, `is_encrypted`, and `updated_at`. The `gbot_config_sync` table tracks sync state with columns `bot_id`, `config_file_path`, `last_sync_at`, `file_hash`, and `sync_count`.
|
|
|
|
The `config.csv` format is a plain CSV with no header: each line is `key,value`, for example `llm-provider,groq` or `theme-color1,#cc0000`. DriveMonitor syncs it when the file ETag changes in MinIO, on botserver startup, or after a restart.
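
To check a value in a downloaded `config.csv` before uploading it, a lookup can be sketched as below. `config_get` is a hypothetical helper; it assumes the headerless `key,value` layout described above and splits on the first comma only, so values containing commas survive.

```bash
# Hypothetical helper: print the value for a key in a headerless key,value file.
# $1 = path to config.csv, $2 = key to look up
config_get() {
  local file="$1" key="$2"
  awk -F, -v k="$key" '
    $1 == k {
      sub(/^[^,]*,/, "")   # drop the key and the first comma, keep the rest
      print
      exit
    }
  ' "$file"
}
```

For example, `config_get /tmp/config.csv llm-provider` prints the configured provider, or nothing if the key is absent.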
|
|
|
|
**Check config status:** Query `bot_configuration` via `sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "SELECT config_key, config_value FROM bot_configuration WHERE bot_id = (SELECT id FROM bots WHERE name = '<botname>') ORDER BY config_key;"`. Check sync state via the `gbot_config_sync` table. Inspect the bucket directly with `sudo incus exec drive -- /opt/gbo/bin/mc cat local/<botname>.gbai/<botname>.gbot/config.csv`.
|
|
|
|
**Debug DriveMonitor:** Monitor live logs with `sudo incus exec system -- tail -f /opt/gbo/logs/out.log | grep -E "(DRIVE_MONITOR|check_gbot|config)"`. An empty `gbot_config_sync` table means DriveMonitor has not synced yet. If no new log entries appear after 30 seconds, the loop may be stuck — restart botserver with systemctl to clear the state.
|
|
|
|
**Common config issues:** If config.csv is missing from the bucket, create and upload it with `mc cp`. If the database shows stale values, restart botserver to force a fresh sync, or as a temporary fix update the database directly with `UPDATE bot_configuration SET config_value = 'groq', updated_at = NOW() WHERE ...`. To force a re-sync without restarting, copy config.csv over itself with `mc cp local/... local/...` to change the ETag.
|
|
|
|
---
|
|
|
|
## MinIO (Drive) Operations
|
|
|
|
All bot files live in MinIO buckets. Use the `mc` CLI at `/opt/gbo/bin/mc` from inside the `drive` container. The bucket structure per bot is: `{bot}.gbai/` as root, `{bot}.gbai/{bot}.gbdialog/` for BASIC scripts, `{bot}.gbai/{bot}.gbot/` for config.csv, and `{bot}.gbai/{bot}.gbkb/` for knowledge base folders.
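
The bucket layout above is regular enough to compute. The sketch below builds the `mc` path for a bot folder; `bot_drive_path` is a hypothetical name, and the `local` alias prefix is assumed to match the alias used elsewhere in this guide.

```bash
# Hypothetical helper: build the mc path for a bot bucket or one of its folders.
# $1 = bot name, $2 = optional folder type: gbdialog, gbot, or gbkb
bot_drive_path() {
  local bot="$1" kind="${2:-}"
  case "$kind" in
    "") echo "local/${bot}.gbai/" ;;
    gbdialog|gbot|gbkb) echo "local/${bot}.gbai/${bot}.${kind}/" ;;
    *) echo "unknown folder type: $kind" >&2; return 2 ;;
  esac
}
```

For example, `mc ls "$(bot_drive_path salesianos gbdialog)"` lists the BASIC scripts of the `salesianos` bot.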
|
|
|
|
Common mc commands: `mc ls local/` lists all buckets; `mc ls local/salesianos.gbai/` lists a bucket; `mc cat local/.../start.bas` prints a file; `mc cp local/.../file /tmp/file` downloads; `mc cp /tmp/file local/.../file` uploads (this triggers DriveMonitor recompile); `mc stat local/.../config.csv` shows ETag and metadata; `mc mb local/newbot.gbai` creates a bucket; `mc rb local/oldbot.gbai` removes an empty bucket.
|
|
|
|
If mc is not found, use the full path `/opt/gbo/bin/mc`. If the `local` alias is not configured, check with `mc alias list` (`mc config host list` on older mc releases). If MinIO is not running, check with `sudo incus exec drive -- systemctl status minio`.
|
|
|
|
---
|
|
|
|
## Vault Security Architecture
|
|
|
|
HashiCorp Vault is the single source of truth for all secrets. Botserver reads `VAULT_ADDR` and `VAULT_TOKEN` from `/opt/gbo/bin/.env` at startup, initializes a TLS/mTLS client, then reads credentials from Vault paths. If Vault is unavailable, it falls back to defaults. The `.env` file must only contain `VAULT_*` variables plus `PORT`, `DATA_DIR`, `WORK_DIR`, and `LOAD_ONLY`.
|
|
|
|
**Global Vault paths:** `gbo/tables` holds PostgreSQL credentials; `gbo/drive` holds MinIO access key and secret; `gbo/cache` holds Valkey password; `gbo/llm` holds LLM URL and API keys; `gbo/directory` holds Zitadel config; `gbo/email` holds SMTP credentials; `gbo/vectordb` holds Qdrant config; `gbo/jwt` holds JWT signing secret; `gbo/encryption` holds the master encryption key. Organization-scoped secrets follow patterns like `gbo/orgs/{org_id}/bots/{bot_id}` and tenant infrastructure uses `gbo/tenants/{tenant_id}/infrastructure`.
|
|
|
|
**Credential resolution:** For any service, botserver checks the most specific Vault path first (org+bot level), falls back to a default bot path, then falls back to the global path, and only uses environment variables as a last resort in development.
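
The fallback order described above amounts to "first non-empty value wins". The sketch below illustrates only that ordering, not botserver's actual implementation; `resolve_credential` is a hypothetical function, and the caller is assumed to have already fetched each candidate (empty string when a path has no value).

```bash
# Hypothetical illustration of the lookup order: pass candidates from most
# specific to least specific; the first non-empty one is printed.
# Order: org+bot path, default bot path, global path, environment variable.
resolve_credential() {
  local candidate
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # nothing was set at any level
}
```

For example, `resolve_credential "$org_bot_pw" "$default_bot_pw" "$global_pw" "$ENV_PW"` prints the global value when the two bot-level lookups came back empty.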
|
|
|
|
**Verify Vault health:** `sudo incus exec vault -- curl -k -sf https://localhost:8200/v1/sys/health` should return JSON with `"sealed":false`. To read a secret: set `VAULT_ADDR`, `VAULT_TOKEN`, and `VAULT_CACERT` then run `vault kv get secret/gbo/tables`. To test from the system container, use curl with `--cacert /opt/gbo/conf/system/certificates/ca/ca.crt` and `-H "X-Vault-Token: <token>"`.
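
The health check above can be turned into a pass/fail test for scripts. A minimal sketch, assuming the health endpoint's JSON is piped on stdin; `vault_is_unsealed` is a hypothetical name, and the match is a plain pattern check rather than a full JSON parse.

```bash
# Hypothetical helper: succeed if the health JSON on stdin reports "sealed": false.
vault_is_unsealed() {
  grep -Eq '"sealed"[[:space:]]*:[[:space:]]*false'
}

# Usage, shown as a comment so nothing runs on load:
#   sudo incus exec vault -- curl -k -s https://localhost:8200/v1/sys/health | vault_is_unsealed \
#     && echo "vault is unsealed" || echo "vault is sealed or unreachable"
```

The usage line drops `-f` from curl on the assumption that a sealed Vault answers with a non-2xx status and the body should still be inspected.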
|
|
|
|
**init.json** is stored at `/opt/gbo/bin/botserver-stack/conf/vault/vault-conf/init.json` and contains the root token and 5 unseal keys (3 needed to unseal). Never commit this file to git. Store it encrypted in a secure location.
|
|
|
|
**Vault troubleshooting — cannot connect:** Check that the vault container's systemd unit is running, verify the token in `.env` is not expired with `vault token lookup`, confirm the CA cert path in `.env` matches the actual file location, and test network connectivity from system to vault container. To generate a new token: `vault token create -policy="botserver" -ttl="8760h" -format=json` then update `.env` and restart botserver.
|
|
|
|
**Get database credentials from the Vault KV v2 API:** `ssh user@<hostname> "sudo incus exec system -- curl -s --cacert /opt/gbo/conf/system/certificates/ca/ca.crt -H 'X-Vault-Token: <vault-token>' https://<vault-host>:8200/v1/secret/data/gbo/tables 2>/dev/null"`
|
|
|
|
**Vault troubleshooting — secrets missing:** Run `vault kv get secret/gbo/tables` (and other paths) to check if secrets exist. If a path returns NOT FOUND, add secrets with `vault kv put secret/gbo/tables host=<ip> port=5432 database=botserver username=gbuser password=<pw>` and similar for other paths.
|
|
|
|
**Vault sealed after restart:** Run `vault operator unseal <key1>`, repeat with key2 and key3 (3 of 5 keys from init.json), then verify with `vault status`.
|
|
|
|
**TLS certificate errors:** Confirm `/opt/gbo/conf/system/certificates/ca/ca.crt` exists in the system container. If missing, copy it from the vault container using `incus file pull vault/opt/gbo/conf/vault/ca.crt /tmp/ca.crt` then place it at the expected path.
|
|
|
|
**Vault snapshots:** Stop vault, run `sudo incus snapshot create vault backup-$(date +%Y%m%d-%H%M)`, start vault. Restore with `sudo incus snapshot restore vault <name>` while vault is stopped.
|
|
|
|
---
|
|
|
|
## Incus Container Network Configuration
|
|
|
|
### Static IPv4 Address Assignment
|
|
|
|
When creating new containers, they may not receive IPv4 addresses automatically. To assign permanent static IPs:
|
|
|
|
**Step 1: Set static IP on the container device**
|
|
```bash
|
|
# Choose an unused IP in the 10.157.134.x range
|
|
sudo incus config device set <container> eth0 ipv4.address 10.157.134.<last_octet>
|
|
```
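
Before picking an address in Step 1, it helps to confirm that no container already holds it. The sketch below checks a candidate IP against container listing output on stdin; `ip_in_use` is a hypothetical helper, and feeding it from `incus list -c n4 --format csv` is an assumption about how the listing is obtained.

```bash
# Hypothetical helper: succeed if the exact IPv4 address appears in the text on stdin.
# $1 = candidate IPv4 address
ip_in_use() {
  local ip_re="${1//./\\.}"   # escape dots so they match literally
  # Require a non-digit, non-dot boundary on both sides so that
  # 10.157.134.19 does not match 10.157.134.196.
  grep -Eq "(^|[^0-9.])${ip_re}([^0-9.]|\$)"
}

# Usage, shown as a comment so nothing runs on load:
#   sudo incus list -c n4 --format csv | ip_in_use 10.157.134.42 \
#     && echo "taken" || echo "free"
```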
|
|
|
|
**Step 2: Configure network inside the container**
|
|
```bash
|
|
sudo incus exec <container> -- bash -c 'cat > /etc/network/interfaces << EOF
|
|
auto lo
|
|
iface lo inet loopback
|
|
|
|
auto eth0
|
|
iface eth0 inet static
|
|
address 10.157.134.<last_octet>
|
|
netmask 255.255.255.0
|
|
gateway 10.157.134.1
|
|
dns-nameservers 8.8.8.8 8.8.4.4
|
|
EOF'
|
|
```
|
|
|
|
**Step 3: Restart the container**
|
|
```bash
|
|
sudo incus restart <container>
|
|
```
|
|
|
|
**Step 4: Verify IPv4 assignment**
|
|
```bash
|
|
sudo incus list <container> -c n4
|
|
sudo incus exec <container> -- ip addr show eth0
|
|
```
|
|
|
|
### Common Network Issues
|
|
|
|
| Problem | Symptom | Fix |
|---------|---------|-----|
| No IPv4 | Container shows empty IPV4 column | Set static IP via `incus config device set` |
| IP conflict | "IP address already defined on another NIC" | Choose different IP, check `incus list` |
| Can't reach internet | DNS fails inside container | Configure DNS in `/etc/network/interfaces` |
| IPv6 only | Has IPv6 but no IPv4 | Add static IPv4 config as above |
| DHCP not working | dhclient fails or returns 169.254.x.x | Use static IP assignment instead |
|
|
|
|
### Container IP Reference
|
|
|
|
Standard IP assignments (10.157.134.x range):
|
|
- `system`: 10.157.134.196
|
|
- `tables`: 10.157.134.174
|
|
- `vault`: 10.157.134.250
|
|
- `cache`: 10.157.134.230
|
|
- `drive`: 10.157.134.206
|
|
- `directory`: 10.157.134.240
|
|
- `llm`: 10.157.134.205
|
|
- `vectordb`: 10.157.134.210
|
|
- `models`: 10.157.134.251 (reserved)
|
|
- `dns`: 10.157.134.214
|
|
- `proxy`: 10.157.134.241
|
|
- `email`: 10.157.134.40
|
|
- `meet`: 10.157.134.220
|
|
|
|
### Creating a New Container with Static IP
|
|
|
|
```bash
|
|
# Create container
|
|
sudo incus launch images:debian/12 <new-container>
|
|
|
|
# Set static IP (applied by the restart below; to set it before first boot, create the container with "incus init" instead of "incus launch")
|
|
sudo incus config device set <new-container> eth0 ipv4.address 10.157.134.<unused>
|
|
|
|
# Configure networking inside container
|
|
sudo incus exec <new-container> -- bash -c 'cat > /etc/network/interfaces << EOF
|
|
auto lo
|
|
iface lo inet loopback
|
|
|
|
auto eth0
|
|
iface eth0 inet static
|
|
address 10.157.134.<unused>
|
|
netmask 255.255.255.0
|
|
gateway 10.157.134.1
|
|
dns-nameservers 8.8.8.8
|
|
EOF'
|
|
|
|
# Restart to apply
|
|
sudo incus restart <new-container>
|
|
|
|
# Verify
|
|
sudo incus list <new-container>
|
|
```
|
|
|
|
---
|
|
|
|
## Troubleshooting Quick Reference
|
|
|
|
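**GLIBC mismatch (`GLIBC_2.39 not found`):** The binary was built against a newer glibc than the system container provides (glibc 2.36), for example on a CI runner image with glibc 2.41. The build must link against glibc 2.36 or older, either by building inside the system container over SSH or on a runner image with a matching glibc. Check `botserver.yaml` to confirm where the build runs.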
|
|
|
|
**botserver won't start:** Run `sudo incus exec system -- ldd /opt/gbo/bin/botserver | grep "not found"` to check for missing libraries. Run `sudo incus exec system -- timeout 10 /opt/gbo/bin/botserver 2>&1` to see startup errors. Confirm `/opt/gbo/data/` exists and is accessible.
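
For the GLIBC mismatch case above, the highest glibc symbol version a binary requires can be read from its dynamic symbol table. A sketch under assumptions: `max_glibc_required` is a hypothetical helper, the input is expected from `objdump -T`, and `sort -V` requires GNU coreutils.

```bash
# Hypothetical helper: print the highest GLIBC_x.y version tag found on stdin.
max_glibc_required() {
  grep -oE 'GLIBC_[0-9]+\.[0-9]+(\.[0-9]+)?' | sort -uV | tail -n 1
}

# Usage, shown as a comment so nothing runs on load:
#   sudo incus exec system -- objdump -T /opt/gbo/bin/botserver | max_glibc_required
#   sudo incus exec system -- ldd --version | head -1   # glibc the container provides
```

If the printed version is higher than the one reported by `ldd --version` inside the system container, the binary will fail to start with the "not found" error.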
|
|
|
|
**botui can't reach botserver:** Check that the `ui.service` systemd file has `BOTSERVER_URL=http://localhost:5858` — not the external HTTPS URL. Fix with `sed -i 's|BOTSERVER_URL=.*|BOTSERVER_URL=http://localhost:5858|'` on the service file, then `systemctl daemon-reload` and `systemctl restart ui`.
|
|
|
|
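The `BOTSERVER_URL` fix can be wrapped in a small function that takes the unit file path as an argument. This is a sketch: the function name is hypothetical, and the pattern is slightly narrower than the `sed` shown above (it stops at a quote or space) so that a quoted `Environment="..."` line keeps its closing quote.

```bash
#!/usr/bin/env bash
# Rewrite BOTSERVER_URL in the given unit file to the internal address,
# then print the resulting line so the change can be checked by eye.
fix_botserver_url() {
  local unit_file="$1"
  sed -i 's|BOTSERVER_URL=[^" ]*|BOTSERVER_URL=http://localhost:5858|' "$unit_file"
  grep 'BOTSERVER_URL=' "$unit_file"
}
```

It would be run inside the `system` container against the actual `ui.service` path, followed by `systemctl daemon-reload` and `systemctl restart ui`.
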
**Suggestions not showing:** Confirm bot `.bas` files exist under `/opt/gbo/data/<bot>.gbai/<bot>.gbdialog/`. Check logs for compilation errors. Clear the AST cache in `/opt/gbo/work/` and restart botserver.

**IPv6 DNS timeouts on external APIs (Groq, Cloudflare):** DNS may return AAAA records even though the container has no IPv6 connectivity, so connections stall until they time out. The container should have `IPV6=no` in its network config and `gai.conf` set appropriately. Check for `RES_OPTIONS=inet4` in `botserver.service` if issues persist.

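The guide does not spell out what "set appropriately" means for `gai.conf`. A common glibc setting for preferring IPv4 is an uncommented `precedence ::ffff:0:0/96  100` line in `/etc/gai.conf`; assuming that is the intended setting, a check could look like this (the function name is hypothetical):

```bash
#!/usr/bin/env bash
# Return 0 if the given gai.conf has an active (uncommented) rule that
# ranks IPv4-mapped addresses (::ffff:0:0/96) at precedence 100.
# Assumption: this is the setting meant by "set appropriately".
prefers_ipv4() {
  local conf="${1:-/etc/gai.conf}"
  grep -Eq '^[[:space:]]*precedence[[:space:]]+::ffff:0:0/96[[:space:]]+100([[:space:]]|$)' "$conf"
}
```

Inside the affected container, `prefers_ipv4 || echo "gai.conf does not prefer IPv4"` gives a quick yes/no answer.
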
**Logs show development paths instead of `/opt/gbo/data/`:** Botserver is using hardcoded dev paths. Check `.env` has `DATA_DIR=/opt/gbo/data/` and `WORK_DIR=/opt/gbo/work/`, verify the systemd unit has `EnvironmentFile=/opt/gbo/bin/.env`, and confirm Vault is reachable so service discovery works. Expected startup log lines include `info watcher:Watching data directory /opt/gbo/data` and `info botserver:BotServer started successfully on port 5858`.

**Migrations not running after push:** If `stat /opt/gbo/bin/botserver` shows an old timestamp and the `__diesel_schema_migrations` table has no new entries, CI did not rebuild. Make a trivial code change (e.g., add a comment) in botserver and push again to force a rebuild.

---

## Drive (MinIO) File Operations Cheatsheet

All `mc` commands run inside the `drive` container with `PATH` set: `sudo incus exec drive -- bash -c 'export PATH=/opt/gbo/bin:$PATH && mc <command>'`. If the `local` alias is missing, create it with credentials from Vault path `gbo/drive`.

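To avoid retyping the prefix, the invocation above can be wrapped in a shell function on the host. This is a sketch, not existing tooling: the name `drive_mc` is hypothetical, and it uses `env PATH=...` with a fixed system path list (an assumption) in place of the `export PATH=... && mc` form, so arguments pass through without a second layer of shell quoting.

```bash
#!/usr/bin/env bash
# Run an mc subcommand inside the `drive` container.
# The PATH value after /opt/gbo/bin is an assumed default system path.
drive_mc() {
  sudo incus exec drive -- env "PATH=/opt/gbo/bin:/usr/sbin:/usr/bin:/sbin:/bin" mc "$@"
}
```

With this defined, `drive_mc ls local/<bot>.gbai/ --recursive` stands in for the longer form.
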
**List bucket contents recursively:** `mc ls local/<bot>.gbai/ --recursive`

**Read a file from Drive:** `mc cat local/<bot>.gbai/<bot>.gbdialog/start.bas`

**Download a file:** `mc cp local/<bot>.gbai/<bot>.gbdialog/start.bas /tmp/start.bas`

**Upload a file to Drive (triggers DriveMonitor recompile):** Transfer the file to the host via `scp`, push it into the drive container with `sudo incus file push /tmp/file drive/tmp/file`, then run `mc put /tmp/file local/<bot>.gbai/<bot>.gbdialog/start.bas`.

**Full upload workflow example (updating config.csv):**
```bash
# 1. Download current config from Drive
ssh user@host "sudo incus exec drive -- bash -c 'export PATH=/opt/gbo/bin:\$PATH && mc cat local/salesianos.gbai/salesianos.gbot/config.csv'" > /tmp/config.csv

# 2. Edit locally (change model, keys, etc.)
sed -i 's/llm-model,old-model/llm-model,new-model/' /tmp/config.csv

# 3. Push edited file back to Drive
scp /tmp/config.csv user@host:/tmp/config.csv
ssh user@host "sudo incus file push /tmp/config.csv drive/tmp/config.csv"
ssh user@host "sudo incus exec drive -- bash -c 'export PATH=/opt/gbo/bin:\$PATH && mc put /tmp/config.csv local/salesianos.gbai/salesianos.gbot/config.csv'"

# 4. Wait ~15 seconds, then verify DriveMonitor picked up the change
ssh user@host "sudo incus exec system -- bash -c 'grep -i \"Model:\" /opt/gbo/logs/err.log | tail -3'"
```

**Force re-sync of config.csv** (change ETag without content change): `mc cp local/<bot>.gbai/<bot>.gbot/config.csv local/<bot>.gbai/<bot>.gbot/config.csv`

**Create a new bot bucket:** `mc mb local/newbot.gbai`

**Check MinIO health:** `sudo incus exec drive -- bash -c '/opt/gbo/bin/mc admin info local'`

---

## Logging Quick Reference

**Application logs** (searchable, timestamped, most useful): `sudo incus exec system -- tail -f /opt/gbo/logs/err.log` (errors and debug) or `/opt/gbo/logs/out.log` (stdout). The systemd journal only captures process lifecycle events, not application output.

**Search logs for specific bot activity:** `grep -i "salesianos\|llm\|Model:\|KB\|USE_KB\|drive_monitor" /opt/gbo/logs/err.log | tail -30`

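The search pattern above tends to be retyped with different terms. A small function, meant to be sourced in a shell inside the `system` container, can build the alternation from its arguments. The name `log_grep` is hypothetical, and the terms are treated as extended regular expressions.

```bash
#!/usr/bin/env bash
# Print the last <count> lines of <log-file> matching any of the given terms.
# Usage: log_grep <log-file> <count> <term> [term ...]
log_grep() {
  local log_file="$1" count="$2"
  shift 2
  local IFS='|'   # "$*" joins the remaining terms as term1|term2|...
  grep -Ei -- "$*" "$log_file" | tail -n "$count"
}
```

For example, `log_grep /opt/gbo/logs/err.log 30 salesianos llm 'Model:' drive_monitor` covers most of the search shown above.
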
**Check which LLM model a bot is using:** `grep "Model:" /opt/gbo/logs/err.log | tail -5`

**Check DriveMonitor config sync:** `grep "check_gbot\|config.csv\|should_sync" /opt/gbo/logs/err.log | tail -20`

**Check KB/vector operations:** `grep -i "gbkb\|qdrant\|embedding\|index" /opt/gbo/logs/err.log | tail -20`

**Live tail with filter:** `sudo incus exec system -- bash -c 'tail -f /opt/gbo/logs/err.log | grep --line-buffered -i "salesianos\|error\|KB"'`

---

## Program Access Cheatsheet

| Program | Container | Path | Notes |
|---------|-----------|------|-------|
| botserver | system | `/opt/gbo/bin/botserver` | Run via systemctl only |
| botui | system | `/opt/gbo/bin/botui` | Run via systemctl only |
| mc (MinIO Client) | drive | `/opt/gbo/bin/mc` | Must set `PATH=/opt/gbo/bin:$PATH` |
| psql | tables | `/usr/bin/psql` | `psql -h localhost -U postgres -d botserver` |
| vault | vault | `/opt/gbo/bin/vault` | Needs `VAULT_ADDR`, `VAULT_TOKEN`, `VAULT_CACERT` |
| zitadel | directory | `/opt/gbo/bin/zitadel` | Runs as root on port 8080 internally |

**Quick psql query (bot config):** `sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "SELECT config_key, config_value FROM bot_configuration WHERE bot_id = (SELECT id FROM bots WHERE name = 'salesianos') ORDER BY config_key;"`

**Quick psql query (active KBs for session):** `sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "SELECT * FROM session_kb_associations WHERE session_id = '<uuid>' AND is_active = true;"`

---

## BASIC Compilation Architecture

Compilation and runtime are now strictly separated. **Compilation** happens only in `BasicCompiler` inside DriveMonitor when it detects `.bas` file changes. The output is a fully preprocessed `.ast` file written to `work/<bot>.gbai/<bot>.gbdialog/<tool>.ast`. **Runtime** (start.bas, TOOL_EXEC, automation, schedule) loads only `.ast` files and calls `ScriptService::run()`, which does `engine.compile() + eval_ast_with_scope()` on the already-preprocessed Rhai source; no preprocessing happens at runtime.

The `.ast` file has all transforms applied: `USE KB "cartas"` → `USE_KB("cartas")`, `IF/END IF` → `if/{ }`, `WHILE/WEND` → `while/{ }`, `BEGIN TALK/END TALK` → function calls, plus the `SAVE`, `FOR EACH/NEXT`, `SELECT CASE`, `SET SCHEDULE`, `WEBHOOK`, and `USE WEBSITE` transforms, `LLM` keyword expansion, variable predeclaration, and keyword lowercasing. Runtime never calls `compile()`, `compile_tool_script()`, or `compile_preprocessed()`; those methods no longer exist.

**Tools (TOOL_EXEC) load `.ast` only:** there is no `.bas` fallback. If an `.ast` file is missing, the tool fails with "Failed to read tool .ast file". DriveMonitor must have compiled it first.

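Because a missing `.ast` makes a tool fail outright, it can help to list which `.bas` sources have no compiled counterpart. The sketch below assumes the sources are mirrored under the data directory (`/opt/gbo/data/<bot>.gbai/<bot>.gbdialog/`) and the compiled output sits under the work directory with the layout described above; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print the name of every <tool>.bas under <data-dir> that has no matching
# <tool>.ast under <work-dir> for the given bot.
# Usage: missing_ast <data-dir> <work-dir> <bot>
missing_ast() {
  local data_dir="$1" work_dir="$2" bot="$3"
  local src="$data_dir/$bot.gbai/$bot.gbdialog"
  local out="$work_dir/$bot.gbai/$bot.gbdialog"
  local bas name
  for bas in "$src"/*.bas; do
    [ -e "$bas" ] || continue   # glob matched nothing
    name=$(basename "$bas" .bas)
    [ -f "$out/$name.ast" ] || echo "$name"
  done
}
```

Inside the `system` container, `missing_ast /opt/gbo/data /opt/gbo/work <bot>` would print the tools DriveMonitor has not compiled yet; empty output means every source has an `.ast`.
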
**Suggestion deduplication** uses Redis `SADD` (set) instead of `RPUSH` (list). This prevents duplicate suggestion buttons when `start.bas` runs multiple times per session. The key format is `suggestions:{bot_id}:{session_id}` and `get_suggestions` uses `SMEMBERS` to read it.

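To inspect a session's suggestions by hand, the key can be built from the format above. The sketch also models the set behaviour locally, so the effect of `SADD` versus `RPUSH` can be seen without touching the cache. Both function names are hypothetical, and the `awk` filter only imitates deduplication (a real Redis set is unordered).

```bash
#!/usr/bin/env bash
# Build the suggestion key: suggestions:{bot_id}:{session_id}
suggestions_key() {
  printf 'suggestions:%s:%s\n' "$1" "$2"
}

# Local model of SADD semantics: keep only the first occurrence of each
# input line. An RPUSH-style list would keep every line, duplicates included.
dedupe_members() {
  awk '!seen[$0]++'
}
```

Inside the `cache` container, something like `redis-cli SMEMBERS "$(suggestions_key <bot_id> <session_id>)"` should list the stored suggestions, assuming `redis-cli` (or the equivalent Valkey client) is installed there.
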