From 2ff5c43531ae9e0ba36b0b076c4f2716d7ca871f Mon Sep 17 00:00:00 2001
From: "Rodrigo Rodriguez (Pragmatismo)"
Date: Thu, 16 Apr 2026 08:54:35 -0300
Subject: [PATCH] docs: Add ALM/CI debugging and monitoring section to PROD.md

---
 PROD.md   | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 botserver |  2 +-
 2 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/PROD.md b/PROD.md
index abc1151..f74f0ca 100644
--- a/PROD.md
+++ b/PROD.md
@@ -58,6 +58,82 @@ Repositories exist on both GitHub and the internal ALM (Forgejo). The four repos
 
 The CI runner container (`alm-ci`) runs Debian Trixie with glibc 2.41, but the `system` container runs Debian 12 Bookworm with glibc 2.36. Binaries compiled on the CI runner are incompatible with the system container. The CI workflow (`botserver/.forgejo/workflows/botserver.yaml`) solves this by transferring source to the system container via `tar | ssh` and building there. The workflow triggers on pushes to `main`, clones repos, transfers source, builds inside system container, deploys the binary, and verifies botserver is running.
 
+### ALM/CI Debugging & Monitoring
+
+**Access ALM/CI containers:**
+```bash
+ssh administrator@HOST
+sudo incus exec alm-ci -- bash   # CI runner container
+sudo incus exec tables -- bash   # PostgreSQL (ALM database)
+sudo incus exec system -- bash   # botserver container
+```
+
+**Check CI runner status:**
+```bash
+# Runner process
+sudo incus exec alm-ci -- ps aux | grep forgejo
+
+# Runner logs
+sudo incus exec alm-ci -- cat /opt/gbo/logs/forgejo-runner.log
+
+# If the runner is down, restart it (bash -c keeps the whole command line inside the container):
+sudo incus exec alm-ci -- bash -c 'pkill -9 forgejo; sleep 2; cd /opt/gbo/bin && nohup ./forgejo-runner daemon --config config.yaml >> /opt/gbo/logs/forgejo-runner.log 2>&1 &'
+```
+
+**Monitor CI runs in database:**
+```bash
+# List recent runs (status: 0=unknown, 1=success, 2=failure, 3=cancelled, 4=skipped, 5=waiting, 6=running, 7=blocked)
+sudo incus exec tables -- bash -c 'export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status, commit_sha, created FROM action_run ORDER BY id DESC LIMIT 5;"'
+
+# Check the jobs of a specific run (replace RUN_ID)
+sudo incus exec tables -- bash -c 'export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status, name FROM action_run_job WHERE run_id = RUN_ID;"'
+
+# Check tasks
+sudo incus exec tables -- bash -c 'export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status FROM action_task WHERE repo_id = 3 ORDER BY id DESC LIMIT 3;"'
+
+# Reset a stuck run to re-trigger it (replace TASK_ID, JOB_ID, and RUN_ID)
+sudo incus exec tables -- bash -c 'export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c "UPDATE action_task SET status = 0 WHERE id = TASK_ID; UPDATE action_run_job SET status = 0 WHERE id = JOB_ID; UPDATE action_run SET status = 0 WHERE id = RUN_ID;"'
+```
+
+**Fix common CI issues:**
+```bash
+# /tmp permission denied for build.log
+sudo incus exec alm-ci -- chmod 1777 /tmp
+sudo incus exec alm-ci -- bash -c 'touch /tmp/build.log && chmod 666 /tmp/build.log'
+
+# Delete old CI runs (keeps runs with id >= RUN_ID; replace RUN_ID)
+sudo incus exec tables -- bash -c 'export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c "DELETE FROM action_run WHERE id < RUN_ID;"'
+sudo incus exec tables -- bash -c 'export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c "DELETE FROM action_run_job WHERE run_id < RUN_ID;"'
+
+# Missing deploy.log error: fix the workflow step
+# The "Save deploy log" step expects /tmp/deploy.log, which the workflow does not create
+# Fix: make the deploy step write its output to /tmp/deploy.log
+```
+
+**Watch CI in real time:**
+```bash
+# Tail runner logs
+sudo incus exec alm-ci -- tail -f /opt/gbo/logs/forgejo-runner.log
+
+# Check if new builds appear
+watch -n 5 'sudo incus exec tables -- bash -c "export PGPASSWORD=postgres; psql -h localhost -U postgres -d PROD-ALM -c \"SELECT id, status, created FROM action_run ORDER BY id DESC LIMIT 3;\""'
+
+# Verify botserver deployed correctly
+sudo incus exec system -- /opt/gbo/bin/botserver --version 2>&1 | head -3
+sudo incus exec system -- tail -5 /opt/gbo/logs/err.log
+```
+
+**CI Workflow Structure:**
+1. Setup Git (disable SSL verify, add safe directories)
+2. Setup Workspace (clone/merge gb workspace Cargo.toml)
+3. Install system dependencies
+4. Clean up workspaces
+5. Build BotServer (output to /tmp/build.log)
+6. Save build log
+7. Deploy via ssh tar gzip
+8. Verify botserver started
+9. Save deploy log
+
 ---
 
 ## DriveMonitor & Bot Configuration
diff --git a/botserver b/botserver
index e63c187..04bfd66 160000
--- a/botserver
+++ b/botserver
@@ -1 +1 @@
-Subproject commit e63c187f322e13a6d750d783888bf47c4a01b37f
+Subproject commit 04bfd668a42fda91c90e8b6b6a346edcc1288111
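
The database queries in the "Monitor CI runs in database" block all repeat the same `incus exec ... psql` wrapper and take a run ID that has to be filled in by hand. A minimal sketch of how that wrapper could be factored out is shown here. The names `alm_sql`, `alm_run_jobs`, and `ALM_DRY_RUN` are hypothetical and not part of this patch or the repository, and the sketch assumes `incus exec --env` is available to pass `PGPASSWORD` instead of the nested `bash -c 'export ...'`.

```bash
# Hypothetical helpers (not part of this patch): wrap the psql invocation
# used throughout the ALM/CI section.

# Run one SQL statement against the PROD-ALM database in the `tables` container.
# With ALM_DRY_RUN=1 the command is printed instead of executed.
alm_sql() {
  if [ "${ALM_DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "sudo incus exec tables --env PGPASSWORD=postgres -- psql -h localhost -U postgres -d PROD-ALM -c \"$1\""
  else
    sudo incus exec tables --env PGPASSWORD=postgres -- \
      psql -h localhost -U postgres -d PROD-ALM -c "$1"
  fi
}

# List the jobs of one run. The run ID is accepted only if it consists of
# digits, so an empty or malformed argument never reaches the SQL statement.
alm_run_jobs() {
  case $1 in
    ''|*[!0-9]*)
      echo "run id must contain digits only" >&2
      return 1
      ;;
  esac
  alm_sql "SELECT id, status, name FROM action_run_job WHERE run_id = $1;"
}
```

Running `ALM_DRY_RUN=1 alm_run_jobs 42` prints the full command without touching the database, which makes it possible to check the quoting before running it against production.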