Update salesianos to GPT-5.4-mini

parent 3da2574a00
commit 0b1876461e

2 changed files with 16 additions and 16 deletions

PROD.md | 30
@@ -11,11 +11,11 @@ Never push secrets (API keys, passwords, tokens) to git. Never commit `init.json

## Infrastructure Overview

-The host machine is `PROD-GBO1`, accessed via `ssh user@<hostname>`, running Incus (an LXD fork) as hypervisor. All services run inside named Incus containers. You enter containers with `sudo incus exec <container> -- <command>` and list them with `sudo incus list`.
+The host machine is `<hostname>`, accessed via `ssh user@<hostname>`, running Incus (an LXD fork) as hypervisor. All services run inside named Incus containers. You enter containers with `sudo incus exec <container> -- <command>` and list them with `sudo incus list`.

The containers and their roles are: `system` runs botserver on port 5858 and botui on port 5859; `alm-ci` runs the Forgejo Actions CI runner; `alm` hosts the Forgejo git server; `tables` runs PostgreSQL on port 5432; `cache` runs Valkey/Redis on port 6379; `drive` runs MinIO object storage on port 9100; `vault` runs HashiCorp Vault on port 8200; `vector` runs Qdrant on port 6333.

-Externally, botserver is reachable at `https://system.example.com` and botui at `https://chat.example.com`. Internally, botui's `BOTSERVER_URL` must be `http://localhost:5858` — never the external HTTPS URL, because the Rust proxy runs server-side and needs direct localhost access.
+Externally, botserver is reachable at `https://<system-domain>` and botui at `https://<chat-domain>`. Internally, botui's `BOTSERVER_URL` must be `http://localhost:5858` — never the external HTTPS URL, because the Rust proxy runs server-side and needs direct localhost access.

---
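The container-to-port map above lends itself to a quick smoke test. A minimal Bash sketch, assuming `ss` is available inside each container; the `health_cmds` helper is hypothetical, not part of the deployment:

```shell
# Hypothetical helper: emit one port-check command per service container,
# using the container:port pairs documented above.
health_cmds() {
  local entry container port
  for entry in system:5858 tables:5432 cache:6379 drive:9100 vault:8200 vector:6333; do
    container="${entry%%:*}"
    port="${entry##*:}"
    # ss -tln lists listening TCP sockets inside the container
    echo "sudo incus exec ${container} -- ss -tln | grep :${port}"
  done
}
health_cmds
```

Running the printed commands on the host shows at a glance which service is not listening where the document says it should be.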
@@ -26,13 +26,13 @@ Botserver runs as user `gbuser`, binary at `/opt/gbo/bin/botserver`, logs at `/o

The directory service runs Zitadel as user `root`, binary at `/opt/gbo/bin/zitadel`, logs at `/opt/gbo/logs/zitadel.log`, systemd unit at `/etc/systemd/system/directory.service`, and loads environment from the service configuration. Zitadel provides identity management and OAuth2 services for the platform.

Internally, Zitadel listens on port 8080 within the directory container. For external access:

-- Via public domain (HTTPS): `https://login.example.com` (configured through proxy container)
+- Via public domain (HTTPS): `https://<login-domain>` (configured through proxy container)
- Via host IP (HTTP): `http://<host-ip>:9000` (direct container port forwarding)
- Via container IP (HTTP): `http://<directory-container-ip>:9000` (direct container access)

-Access the Zitadel console at `https://login.example.com/ui/console` with admin credentials. Zitadel implements v1 Management API (deprecated) and v2 Organization/User services. Always use the v2 endpoints under `/v2/organizations` and `/v2/users` for all operations.
+Access the Zitadel console at `https://<login-domain>/ui/console` with admin credentials. Zitadel implements v1 Management API (deprecated) and v2 Organization/User services. Always use the v2 endpoints under `/v2/organizations` and `/v2/users` for all operations.

The botserver bootstrap also manages: Vault (secrets), PostgreSQL (database), Valkey (cache, password auth), MinIO (object storage), Zitadel (identity provider), and llama.cpp (LLM).

-To obtain a PAT for Zitadel API access, check `/opt/gbo/conf/directory/admin-pat.txt` in the directory container. Use it with curl by setting the Authorization header: `Authorization: Bearer $(cat /opt/gbo/conf/directory/admin-pat.txt)` and include `-H "Host: <IP>"` for correct host resolution (replace with your directory container IP).
+To obtain a PAT for Zitadel API access, check `/opt/gbo/conf/directory/admin-pat.txt` in the directory container. Use it with curl by setting the Authorization header: `Authorization: Bearer $(cat /opt/gbo/conf/directory/admin-pat.txt)` and include `-H "Host: <directory-ip>"` for correct host resolution (replace with your directory container IP).
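The PAT-plus-Host-header recipe above can be wrapped once rather than retyped. A hedged Bash sketch: `zitadel_api` is a hypothetical helper, `<directory-ip>` is the placeholder from the text, and the concrete route under `/v2/` must come from the Zitadel v2 API reference:

```shell
# Hypothetical wrapper: attach the admin PAT and Host header to a Zitadel
# API call, meant to run inside the directory container (Zitadel listens
# on 8080 there). The path argument is not pinned down here; pick the
# exact /v2/users or /v2/organizations route from the Zitadel API docs.
zitadel_api() {
  local path="$1"
  curl -s \
    -H "Authorization: Bearer $(cat /opt/gbo/conf/directory/admin-pat.txt)" \
    -H "Host: <directory-ip>" \
    "http://localhost:8080${path}"
}
# Example invocation (fill in the concrete route):
#   zitadel_api /v2/...
```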
---
@@ -62,7 +62,7 @@ The CI runner container (`alm-ci`) runs Debian Trixie with glibc 2.41, but the `

**Access ALM/CI containers:**
```bash
-ssh administrator@HOST
+ssh administrator@<hostname>
sudo incus exec alm-ci -- bash    # CI runner container
sudo incus exec tables -- bash    # PostgreSQL (ALM database)
sudo incus exec system -- bash    # botserver container
```
@@ -83,16 +83,16 @@ sudo incus exec alm-ci -- pkill -9 forgejo; sleep 2; cd /opt/gbo/bin && nohup ./

**Monitor CI runs in database:**
```bash
# List recent runs (status: 0=pending, 1=running, 2=failure, 3=cancelled, 6=success)
-sudo incus exec tables -- bash -c 'export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status, commit_sha, created FROM action_run ORDER BY id DESC LIMIT 5;"'
+sudo incus exec tables -- bash -c 'export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status, commit_sha, created FROM action_run ORDER BY id DESC LIMIT 5;"'

# Check specific run jobs
-sudo incus exec tables -- bash -c 'export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status, name FROM action_run_job WHERE run_id = <ID>;"'
+sudo incus exec tables -- bash -c 'export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status, name FROM action_run_job WHERE run_id = <ID>;"'

# Check tasks
-sudo incus exec tables -- bash -c 'export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status FROM action_task WHERE repo_id = 3 ORDER BY id DESC LIMIT 3;"'
+sudo incus exec tables -- bash -c 'export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status FROM action_task WHERE repo_id = 3 ORDER BY id DESC LIMIT 3;"'

# Reset stuck run to re-trigger
-sudo incus exec tables -- bash -c 'export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c "UPDATE action_task SET status = 0 WHERE id = <ID>; UPDATE action_run_job SET status = 0 WHERE id = <RUN_ID>; UPDATE action_run SET status = 0 WHERE id = <RUN_ID>;"'
+sudo incus exec tables -- bash -c 'export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c "UPDATE action_task SET status = 0 WHERE id = <ID>; UPDATE action_run_job SET status = 0 WHERE id = <RUN_ID>; UPDATE action_run SET status = 0 WHERE id = <RUN_ID>;"'
```
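The numeric status legend in the first query's comment is easy to forget when scanning psql output. A small Bash helper can capture it once; `status_name` is illustrative (my name, not part of the tooling), and codes outside the documented set fall through to `unknown`:

```shell
# Translate Forgejo action_run/action_run_job/action_task status codes
# (legend from the query comment above) into readable names.
status_name() {
  case "$1" in
    0) echo pending ;;
    1) echo running ;;
    2) echo failure ;;
    3) echo cancelled ;;
    6) echo success ;;
    *) echo "unknown($1)" ;;  # codes not listed in this document
  esac
}
status_name 1   # running
```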
**Fix common CI issues:**
@@ -102,8 +102,8 @@ sudo incus exec alm-ci -- chmod 1777 /tmp
sudo incus exec alm-ci -- touch /tmp/build.log && chmod 666 /tmp/build.log

# Clean old CI runs (keep recent)
-sudo incus exec tables -- bash -c 'export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c "DELETE FROM action_run WHERE id < <RECENT_ID>;"'
-sudo incus exec tables -- bash -c 'export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c "DELETE FROM action_run_job WHERE run_id < <RECENT_ID>;"'
+sudo incus exec tables -- bash -c 'export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c "DELETE FROM action_run WHERE id < <RECENT_ID>;"'
+sudo incus exec tables -- bash -c 'export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c "DELETE FROM action_run_job WHERE run_id < <RECENT_ID>;"'

# Check deploy.log missing error - fix workflow step
# The "Save deploy log" step expects /tmp/deploy.log, which the workflow doesn't create
@@ -116,7 +116,7 @@ sudo incus exec tables -- bash -c 'export PGPASSWORD=postgres; psql -h localhost
sudo incus exec alm-ci -- tail -f /opt/gbo/logs/forgejo-runner.log

# Check if new builds appear
-watch -n 5 'sudo incus exec tables -- bash -c "export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c \"SELECT id, status, created FROM action_run ORDER BY id DESC LIMIT 3;\""'
+watch -n 5 'sudo incus exec tables -- bash -c "export PGPASSWORD=<postgres-password>; psql -h localhost -U postgres -d PROD-ALM -c \"SELECT id, status, created FROM action_run ORDER BY id DESC LIMIT 3;\""'

# Verify botserver deployed correctly
sudo incus exec system -- /opt/gbo/bin/botserver --version 2>&1 | head -3
@@ -144,7 +144,7 @@ Bot configuration is stored in two PostgreSQL tables inside the `botserver` data

The `config.csv` format is a plain CSV with no header: each line is `key,value`, for example `llm-provider,groq` or `theme-color1,#cc0000`. DriveMonitor syncs it when the file ETag changes in MinIO, on botserver startup, or after a restart.
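Because `config.csv` is headerless `key,value` lines, one malformed row silently breaks a key. This sketch validates a file locally before upload; the sample values are the ones quoted above, and the `/tmp` path is illustrative:

```shell
# Write a sample config.csv (values from the examples above) and verify
# every line splits into exactly two comma-separated fields.
cat > /tmp/config.csv <<'EOF'
llm-provider,groq
theme-color1,#cc0000
EOF
awk -F, 'NF != 2 { print "bad line " NR ": " $0; bad = 1; next }
         { print $1 " = " $2 }
         END { exit bad }' /tmp/config.csv
```

From there the file would be pushed to the bot's `.gbot` folder in MinIO (e.g. with `mc cp`) for DriveMonitor to pick up.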

-**Check config status:** Query `bot_configuration` via `sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "SELECT config_key, config_value FROM bot_configuration WHERE bot_id = (SELECT id FROM bots WHERE name = 'salesianos') ORDER BY config_key;"`. Check sync state via the `gbot_config_sync` table. Inspect the bucket directly with `sudo incus exec drive -- /opt/gbo/bin/mc cat local/salesianos.gbai/salesianos.gbot/config.csv`.
+**Check config status:** Query `bot_configuration` via `sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "SELECT config_key, config_value FROM bot_configuration WHERE bot_id = (SELECT id FROM bots WHERE name = '<botname>') ORDER BY config_key;"`. Check sync state via the `gbot_config_sync` table. Inspect the bucket directly with `sudo incus exec drive -- /opt/gbo/bin/mc cat local/<botname>.gbai/<botname>.gbot/config.csv`.

**Debug DriveMonitor:** Monitor live logs with `sudo incus exec system -- tail -f /opt/gbo/logs/out.log | grep -E "(DRIVE_MONITOR|check_gbot|config)"`. An empty `gbot_config_sync` table means DriveMonitor has not synced yet. If no new log entries appear after 30 seconds, the loop may be stuck — restart botserver with systemctl to clear the state.
@@ -177,7 +177,7 @@ HashiCorp Vault is the single source of truth for all secrets. Botserver reads `

**Vault troubleshooting — cannot connect:** Check that the vault container's systemd unit is running, verify the token in `.env` is not expired with `vault token lookup`, confirm the CA cert path in `.env` matches the actual file location, and test network connectivity from system to vault container. To generate a new token: `vault token create -policy="botserver" -ttl="8760h" -format=json` then update `.env` and restart botserver.

# Get database credentials from Vault v2 API
-$ ssh user@ip "sudo incus exec system -- curl -s --cacert /opt/gbo/conf/system/certificates/ca/ca.crt -H 'X-Vault-Token: hvs...' https://ip:8200/v1/secret/data/gbo/tables 2>/dev/null"
+$ ssh user@<hostname> "sudo incus exec system -- curl -s --cacert /opt/gbo/conf/system/certificates/ca/ca.crt -H 'X-Vault-Token: <vault-token>' https://<vault-host>:8200/v1/secret/data/gbo/tables 2>/dev/null"

**Vault troubleshooting — secrets missing:** Run `vault kv get secret/gbo/tables` (and other paths) to check if secrets exist. If a path returns NOT FOUND, add secrets with `vault kv put secret/gbo/tables host=<ip> port=5432 database=botserver username=gbuser password=<pw>` and similar for other paths.
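One gotcha when scripting against the curl call above: Vault's KV v2 HTTP API nests the stored fields under `.data.data`, not `.data`. A self-contained Bash/Python sketch; the sample response here is fabricated for illustration:

```shell
# KV v2 responses wrap the secret one level deeper than KV v1: the
# actual key/value pairs live at .data.data. Sample response is made up.
response='{"data":{"data":{"host":"<ip>","port":"5432","username":"gbuser"}}}'
username=$(printf '%s' "$response" \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["data"]["data"]["username"])')
echo "$username"   # gbuser
```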
@@ -1 +1 @@
-Subproject commit 57accc0deb7cd97869a20477f15f7ab28144c275
+Subproject commit c60361886578e5e7eda08d34c305718aaf068b18