General Bots Cloud — Production Operations Guide
Infrastructure Overview
- Host OS: Ubuntu 24.04 LTS, Incus
- SSH: Key auth only
- Container engine: Incus with ZFS storage pool
- Tenant: pragmatismo (migrated from LXD 82.29.59.188 to Incus 63.141.255.9)
Container Migration: pragmatismo (COMPLETED)
Summary
| Item | Detail |
|---|---|
| Source | LXD 5.21 on Ubuntu 22.04 @ 82.29.59.188 |
| Destination | Incus 6.x on Ubuntu 24.04 @ 63.141.255.9 |
| Migration method | incus copy --instance-only lxd-source:<name> |
| Data transfer | rsync over SSH (destination pulls from source:/opt/gbo) |
| Total downtime | ~4 hours |
| Containers migrated | 10 |
| Data transferred | ~44 GB |
Migrated Containers (destination names)
proxy → proxy (Caddy reverse proxy)
tables → tables (PostgreSQL)
system → system (botserver + botui, privileged)
drive → drive (MinIO S3)
dns → dns (CoreDNS)
email → email (Stalwart mail)
webmail → webmail (Roundcube)
alm → alm (Forgejo ALM)
alm-ci → alm-ci (Forgejo CI runner)
table-editor → table-editor (NocoDB)
Data Paths
- Source data: root@82.29.59.188:/opt/gbo/ (44 GB, tenant data + binaries)
- Destination data: /home/administrator/gbo/tenants/pragmatismo/ (rsync in progress)
- Final path: /opt/gbo/tenants/pragmatismo/ (symlink or mount)
Key Decisions Made
- No pragmatismo- prefix on destination (unlike source)
- iptables NAT instead of Incus proxy devices (proxy devices conflicted with NAT rules)
- Incus proxy devices removed from all containers after NAT configured
- Disk devices removed from source containers before migration (Incus can't resolve LXD paths)
Port Forwarding (iptables NAT)
| Port | Service |
|---|---|
| 80, 443 | Caddy (HTTP/HTTPS) |
| 25, 465, 587 | SMTP |
| 993, 995, 143, 110, 4190 | IMAP/POP/Sieve |
| 53 | DNS |
Post-Migration Tasks (✓ = completed)
- rsync transfer: Source /opt/gbo → destination ~/gbo ✓
- Merge data: rsync to /opt/gbo/tenants/pragmatismo/ ✓
- Configure NAT: iptables PREROUTING rules ✓
- Update Caddy: Replace old IPs with new 10.107.115.x IPs ✓
- Copy data to containers: tar.gz method for proxy, tables, email, webmail, alm-ci, table-editor ✓
- Fix directory structure: system, dns, alm ✓
- Caddy installed and running ✓
- SSL certificates: Let's Encrypt rate limited - need to wait or use existing certs
- botserver binary missing in system container
- DNS cutover: Update NS/A records to point to 63.141.255.9
- Source cleanup: Delete /opt/gbo/ on source after verification
Current Container Status (2026-03-22 17:50 UTC)
| Container | /opt/gbo/ contents | Status |
|---|---|---|
| proxy | conf, data, logs, Caddy running | ✓ OK (SSL pending) |
| tables | conf, data, logs, pgconf, pgdata | ✓ OK |
| email | conf, data, logs | ✓ OK |
| webmail | conf, data, logs | ✓ OK |
| alm-ci | conf, data, logs | ✓ OK |
| table-editor | conf, data, logs | ✓ OK |
| system | bin, botserver-stack, conf, data, logs | ✓ OK |
| drive | data, logs | ✓ OK |
| dns | bin, conf, data, logs | ✓ OK |
| alm | alm/, conf, data, logs | ✓ OK |
Known Issues
- Let's Encrypt rate limiting - Too many cert requests from old server. Certificates will auto-renew after rate limit clears (~1 hour)
- botserver database connection - PostgreSQL is in tables container (10.107.115.33), need to update DATABASE_URL in system container
- SSL certificates - Caddy will retry obtaining certs after rate limit clears
Final Status (2026-03-22 18:30 UTC)
Container Services Status
| Container | Service | Port | Status |
|---|---|---|---|
| system | Vault | 8200 | ✓ Running |
| system | Valkey | 6379 | ✓ Running |
| system | MinIO | 9100 | ✓ Running |
| system | Qdrant | 6333 | ✓ Running |
| system | botserver | - | ⚠️ Not listening |
| tables | PostgreSQL | 5432 | ✓ Running |
| proxy | Caddy | 80, 443 | ✓ Running |
| dns | CoreDNS | 53 | ❌ Not running |
| email | Stalwart | 25, 143, 465, 993, 995 | ❌ Not running |
| webmail | Roundcube | - | ❌ Not running |
| alm | Forgejo | 3000 | ❌ Not running |
| alm-ci | Forgejo-runner | - | ❌ Not running |
| table-editor | NocoDB | - | ❌ Not running |
| drive | MinIO | - | ❌ (in system container) |
Issues Found
- botserver not listening - needs DATABASE_URL pointing to tables container
- dns, email, webmail, alm, alm-ci, table-editor - services not started
- SSL certificates - Let's Encrypt rate limited
Data Structure
Host path: /opt/gbo/tenants/pragmatismo/<containername>/
Container path: /opt/gbo/ (conf, data, logs, bin, etc.)
| Container | Host Path | Container /opt/gbo/ |
|---|---|---|
| system | .../system/ | bin, botserver-stack, conf, data, logs |
| proxy | .../proxy/ | conf, data, logs |
| tables | .../tables/ | conf, data, logs |
| drive | .../drive/ | data, logs |
| dns | .../dns/ | bin, conf, data, logs |
| email | .../email/ | conf, data, logs |
| webmail | .../webmail/ | conf, data, logs |
| alm | .../alm/ | conf, data, logs |
| alm-ci | .../alm-ci/ | conf, data, logs |
| table-editor | .../table-editor/ | conf, data, logs |
Attach Data Devices (after moving data)
# Move data to final location
ssh administrator@63.141.255.9 "sudo mv /home/administrator/gbo /opt/gbo/tenants/pragmatismo"
# Attach per-container disk device
for container in system proxy tables drive dns email webmail alm alm-ci table-editor; do
incus config device add $container gbo disk \
source=/opt/gbo/tenants/pragmatismo/$container \
path=/opt/gbo
done
# Fix permissions (each container)
for container in system proxy tables drive dns email webmail alm alm-ci table-editor; do
incus exec $container -- chown -R gbuser:gbuser /opt/gbo/ 2>/dev/null || \
incus exec $container -- chown -R root:root /opt/gbo/
done
Container IPs (for Caddy configuration)
system: 10.107.115.229
proxy: 10.107.115.189
tables: 10.107.115.33
drive: 10.107.115.114
dns: 10.107.115.155
email: 10.107.115.200
webmail: 10.107.115.208
alm: 10.107.115.4
alm-ci: 10.107.115.190
table-editor: (no IP - start container)
LXC Container Architecture (destination)
| Container | Purpose | Exposed Ports |
|---|---|---|
| proxy | Caddy reverse proxy | 80, 443 |
| system | botserver + botui (privileged!) | internal only |
| alm | Forgejo (ALM/Git) | internal only |
| alm-ci | Forgejo CI runner | none |
| email | Stalwart mail server | 25, 465, 587, 993, 995, 143, 110 |
| dns | CoreDNS | 53 |
| drive | MinIO S3 | internal only |
| tables | PostgreSQL | internal only |
| table-editor | NocoDB | internal only |
| webmail | Roundcube | internal only |
Key Rules
- system must be privileged (security.privileged: true) — required for botserver to own /opt/gbo/ mounts
- All containers use iptables NAT for port forwarding — NEVER use Incus proxy devices (they conflict with NAT)
- Data copied into each container at /opt/gbo/ — NOT disk devices. Each container has its own copy of data.
- CI runner (alm-ci) must NOT have cross-container disk device mounts — deploy via SSH only
- Caddy config must have correct upstream IPs for each backend container
Container Migration (LXD to Incus) — COMPLETED
Migration Workflow (for future tenants)
Best Method: incus copy --instance-only — transfers containers directly between LXD and Incus.
Prerequisites
# 1. Open port 8443 on both servers
ssh root@<source-host> "iptables -I INPUT -p tcp --dport 8443 -j ACCEPT"
ssh administrator@<dest-host> "sudo iptables -I INPUT -p tcp --dport 8443 -j ACCEPT"
# 2. Exchange SSH keys (for rsync data transfer)
ssh administrator@<dest-host> "cat ~/.ssh/id_rsa.pub"
ssh root@<source-host> "echo '<dest-pubkey>' >> /root/.ssh/authorized_keys"
# 3. Add source LXD as Incus remote
ssh administrator@<dest-host> "incus remote add lxd-source <source-ip> --protocol=incus --accept-certificate"
# 4. Add destination cert to source LXD trust
ssh <dest-user>@<dest-host> "cat ~/.config/incus/client.crt" | \
  ssh root@<source-host> "lxc config trust add -"
Migration Steps
# 1. On SOURCE: Remove disk devices (Incus won't have source paths)
for c in $(lxc list --format csv -c n); do
lxc stop $c
for d in $(lxc config device list $c); do
lxc config device remove $c $d
done
done
# 2. On DESTINATION: Copy each container
incus copy --instance-only lxd-source:<source-container> <dest-name>
incus start <dest-name>
# 3. On DESTINATION: Add eth0 network to each container
incus config device add <container> eth0 nic name=eth0 network=incusbr0
# 4. On DESTINATION: Configure iptables NAT (not proxy devices!)
# See iptables NAT Setup above
# 5. On DESTINATION: Pull data via rsync (from destination to source)
ssh administrator@<dest-host> "rsync -avz --progress root@<source-ip>:/opt/gbo/ /home/administrator/gbo/"
# 6. On DESTINATION: Organize data per container
# Data is structured as: /home/administrator/gbo/<containername>/
# Each container gets its own folder with {conf,data,logs,bin}/
# 7. On DESTINATION: Move to final location
ssh administrator@<dest-host> "sudo mkdir -p /opt/gbo/tenants/"
ssh administrator@<dest-host> "sudo mv /home/administrator/gbo /opt/gbo/tenants/<tenant>/"
# 8. On DESTINATION: Copy data into each container
for container in system proxy tables drive dns email webmail alm alm-ci table-editor; do
incus exec $container -- mkdir -p /opt/gbo
incus file push --recursive /opt/gbo/tenants/<tenant>/$container/. $container/opt/gbo/
done
# 9. On DESTINATION: Fix permissions
for container in system proxy tables drive dns email webmail alm alm-ci table-editor; do
incus exec $container -- chown -R gbuser:gbuser /opt/gbo/ 2>/dev/null || \
incus exec $container -- chown -R root:root /opt/gbo/
done
# 10. On DESTINATION: Update Caddy config with new container IPs
# sed -i 's/10.16.164.x/10.107.115.x/g' /opt/gbo/conf/config
incus file push /tmp/new_caddy_config proxy/opt/gbo/conf/config
# 11. Reload Caddy
incus exec proxy -- /opt/gbo/bin/caddy reload --config /opt/gbo/conf/config --adapter caddyfile
iptables NAT Setup (on destination host)
# Enable IP forwarding
sudo sysctl -w net.ipv4.ip_forward=1
# NAT rules — proxy container (ports 80, 443)
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 10.107.115.189:80
sudo iptables -t nat -A PREROUTING -p tcp --dport 443 -j DNAT --to-destination 10.107.115.189:443
# NAT rules — email container (SMTP/IMAP)
sudo iptables -t nat -A PREROUTING -p tcp --dport 25 -j DNAT --to-destination 10.107.115.200:25
sudo iptables -t nat -A PREROUTING -p tcp --dport 465 -j DNAT --to-destination 10.107.115.200:465
sudo iptables -t nat -A PREROUTING -p tcp --dport 587 -j DNAT --to-destination 10.107.115.200:587
sudo iptables -t nat -A PREROUTING -p tcp --dport 993 -j DNAT --to-destination 10.107.115.200:993
sudo iptables -t nat -A PREROUTING -p tcp --dport 995 -j DNAT --to-destination 10.107.115.200:995
sudo iptables -t nat -A PREROUTING -p tcp --dport 143 -j DNAT --to-destination 10.107.115.200:143
sudo iptables -t nat -A PREROUTING -p tcp --dport 110 -j DNAT --to-destination 10.107.115.200:110
sudo iptables -t nat -A PREROUTING -p tcp --dport 4190 -j DNAT --to-destination 10.107.115.200:4190
# NAT rules — dns container (DNS)
sudo iptables -t nat -A PREROUTING -p udp --dport 53 -j DNAT --to-destination 10.107.115.155:53
sudo iptables -t nat -A PREROUTING -p tcp --dport 53 -j DNAT --to-destination 10.107.115.155:53
# Masquerade outgoing traffic
sudo iptables -t nat -A POSTROUTING -s 10.107.115.0/24 -j MASQUERADE
# Save rules
sudo netfilter-persistent save
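The email-container rules above follow one pattern per port; a small loop can generate them for review before applying (a sketch using the same port list and container IP as above):

```shell
# Print the email-container DNAT rules from a port list; review the output,
# then pipe to `sudo bash` instead of hand-typing each rule.
EMAIL_IP=10.107.115.200
for port in 25 465 587 993 995 143 110 4190; do
  echo "iptables -t nat -A PREROUTING -p tcp --dport $port -j DNAT --to-destination $EMAIL_IP:$port"
done
```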
Remove Incus Proxy Devices (after NAT is working)
for c in $(incus list --format csv -c n); do
for d in $(incus config device list $c | grep proxy); do
incus config device remove $c $d
done
done
pragmatismo Migration Notes
- Source server: root@82.29.59.188 (LXD 5.21, Ubuntu 22.04)
- Destination: administrator@63.141.255.9 (Incus 6.x, Ubuntu 24.04)
- Container naming: No prefix on destination (proxy, not pragmatismo-proxy)
- Data: rsync pull from destination (not push from source)
Firewall (host)
⚠️ CRITICAL: NEVER Block SSH Port 22
When installing ANY firewall (UFW, iptables, etc.), ALWAYS allow SSH (port 22) FIRST, before enabling the firewall.
Wrong order (will lock you out!):
ufw enable # BLOCKS SSH!
Correct order:
ufw allow 22/tcp # FIRST: Allow SSH
ufw allow 80/tcp # Allow HTTP
ufw allow 443/tcp # Allow HTTPS
ufw enable # THEN enable firewall
Firewall Setup Steps
1. Always allow SSH before enabling the firewall:
   sudo ufw allow 22/tcp
2. Install UFW:
   sudo apt-get install -y ufw
3. Configure UFW with SSH allowed:
   sudo ufw default allow routed   # sets DEFAULT_FORWARD_POLICY=ACCEPT
   sudo ufw allow 22/tcp
   sudo ufw allow 80/tcp
   sudo ufw allow 443/tcp
   sudo ufw enable
4. Persist iptables NAT rules (containers). Create /etc/systemd/system/iptables-restore.service:
   [Unit]
   Description=Restore iptables rules on boot
   After=network-pre.target
   Before=network.target
   DefaultDependencies=no
   [Service]
   Type=oneshot
   ExecStart=/bin/bash -c "/sbin/iptables-restore < /etc/iptables/rules.v4"
   RemainAfterExit=yes
   [Install]
   WantedBy=multi-user.target
   Then save rules and enable:
   sudo iptables-save > /etc/iptables/rules.v4
   sudo systemctl enable iptables-restore.service
5. Install fail2ban:
   # Download fail2ban deb from http://ftp.us.debian.org/debian/pool/main/f/fail2ban/
   sudo dpkg -i fail2ban_*.deb
   sudo touch /var/log/auth.log
   sudo systemctl enable fail2ban
   sudo systemctl start fail2ban
6. Verify the fail2ban SSH jail:
   sudo fail2ban-client status   # Should show sshd jail
Requirements
- ufw with DEFAULT_FORWARD_POLICY=ACCEPT (needed for container internet)
- fail2ban on host (SSH jail) and in email container (mail jail)
- iptables NAT rules must persist via systemd service
🔧 Common Production Issues & Fixes
Issue: Valkey/Redis Connection Timeout
Symptom: botserver logs show Connection timed out (os error 110) when connecting to cache at localhost:6379
Root Cause: iptables DROP rule for port 6379 blocks loopback traffic because no ACCEPT rule for lo interface exists before the DROP rules.
Fix:
# Insert loopback ACCEPT at top of INPUT chain
incus exec system -- iptables -I INPUT 1 -i lo -j ACCEPT
# Persist the rule
incus exec system -- bash -c 'iptables-save > /etc/iptables/rules.v4'
# Verify Valkey responds
incus exec system -- /opt/gbo/bin/botserver-stack/bin/cache/bin/valkey-cli ping
# Should return: PONG
# Restart botserver to pick up working cache
incus exec system -- systemctl restart system.service ui.service
Prevention: Always ensure loopback ACCEPT rule is at the top of iptables INPUT chain before any DROP rules.
Issue: Suggestions Not Showing in Frontend
Symptom: Bot's start.bas has ADD_SUGGESTION_TOOL calls but suggestions don't appear in the UI.
Diagnosis:
# Get bot ID
incus exec system -- /opt/gbo/bin/botserver-stack/bin/tables/bin/psql -h localhost -U gbuser -d botserver -t -c "SELECT id, name FROM bots WHERE name = 'botname';"
# Check if suggestions exist in cache with correct bot_id
incus exec system -- /opt/gbo/bin/botserver-stack/bin/cache/bin/valkey-cli --scan --pattern "suggestions:<bot_id>:*"
# If no keys found, check logs for wrong bot_id being used
incus exec system -- grep "Adding suggestion to Redis key" /opt/gbo/logs/error.log | tail -5
Fix: This was a code bug where suggestions were stored with user_id instead of bot_id. After deploying the fix:
- Wait for CI/CD to build and deploy new binary (~10 minutes)
- Service auto-restarts on binary update
- Test by opening a new session (old sessions may have stale keys)
Deployment & Testing Workflow
# 1. Fix code in dev environment
# 2. Push to ALM (both submodules AND root)
cd botserver && git push alm main
cd .. && git add botserver && git commit -m "Update submodule" && git push alm main
# 3. Wait ~4 minutes for CI/CD build
# Build time: ~3-4 minutes on CI runner
# 4. Verify deployment
ssh root@pragmatismo.com.br "lxc exec pragmatismo-system -- stat /opt/gbo/bin/botserver | grep Modify"
# 5. Test with Playwright
# Use Playwright MCP to open https://chat.pragmatismo.com.br/<botname>
# Verify suggestions appear, TALK executes, no errors in console
Testing with Playwright:
# Open bot in browser via Playwright MCP
Navigate to: https://chat.pragmatismo.com.br/<botname>
# Verify:
# - start.bas executes quickly (< 5 seconds)
# - Suggestions appear in UI
# - No errors in browser console
On destination (Incus):
# Verify botserver binary
incus exec system -- stat /opt/gbo/bin/botserver | grep Modify
# Restart services
incus exec system -- systemctl restart system.service ui.service
⚠️ Caddy Config — CRITICAL RULES
NEVER replace the Caddyfile with a minimal/partial config. The full config has ~25 vhosts. If you only see 1-2 vhosts, you are looking at a broken/partial config.
Before ANY change:
- Backup: cp /opt/gbo/conf/config /opt/gbo/conf/config.bak-$(date +%Y%m%d%H%M)
- Validate: caddy validate --config /opt/gbo/conf/config --adapter caddyfile
- Reload (not restart): caddy reload --config /opt/gbo/conf/config --adapter caddyfile
Caddy storage must be explicitly set in the global block, otherwise Caddy uses ~/.local/share/caddy and loses existing certificates on restart:
{
storage file_system {
root /opt/gbo/data/caddy
}
}
Dead domains cause ERR_SSL_PROTOCOL_ERROR — if a domain in the Caddyfile has no DNS record, Caddy loops trying to get a certificate and pollutes TLS state. Remove dead domains immediately.
After removing domains from config, restart Caddy (not just reload) to clear in-memory ACME state from old domains.
botserver / botui
- botserver: /opt/gbo/bin/botserver (system.service, port 5858)
- botui: /opt/gbo/bin/botui (ui.service, port 5859)
- BOTSERVER_URL in ui.service must point to http://localhost:5858 (not the HTTPS external URL) — using the external URL causes a WebSocket disconnect before TALK executes
- Valkey/Redis bound to 127.0.0.1:6379 — iptables rules must allow loopback on this port or suggestions/cache won't work
- Vault unseal keys stored in /opt/gbo/vault-unseal-keys (production only, never commit to git)
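A sketch of a ui.service unit illustrating the BOTSERVER_URL rule (unit layout and user are assumptions; only the environment value is prescribed above):

```ini
[Unit]
Description=botui (General Bots UI)
After=network.target

[Service]
User=gbuser
# Internal address only: the external HTTPS URL drops the WebSocket before TALK runs
Environment=BOTSERVER_URL=http://localhost:5858
ExecStart=/opt/gbo/bin/botui
Restart=on-failure

[Install]
WantedBy=multi-user.target
```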
Caddy in Proxy Container
- Binary: /usr/bin/caddy (system container) or caddy in PATH
- Config: /opt/gbo/conf/config
- Reload: incus exec proxy -- caddy reload --config /opt/gbo/conf/config --adapter caddyfile
- Storage: /opt/gbo/data/caddy
Upstream IPs (after migration):
| Backend | IP |
|---|---|
| system (botserver) | 10.107.115.229:5858 |
| system (botui) | 10.107.115.229:5859 |
| tables (PostgreSQL) | 10.107.115.33:5432 |
| drive (MinIO S3) | 10.107.115.114:9000 |
| webmail | 10.107.115.208 |
| alm | 10.107.115.4 |
| table-editor | 10.107.115.x (assign IP first) |
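Each backend gets a vhost in roughly this shape (a sketch; chat.pragmatismo.com.br is taken from the testing section, and the real Caddyfile has ~25 such blocks):

```caddyfile
chat.pragmatismo.com.br {
    # botui in the system container
    reverse_proxy 10.107.115.229:5859
}
```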
Log Locations
botserver/botui logs:
# Main application logs (in pragmatismo-system container)
/opt/gbo/logs/error.log # botserver logs
/opt/gbo/logs/botui-error.log # botui logs
/opt/gbo/logs/output.log # stdout/stderr output
Component logs (in /opt/gbo/bin/botserver-stack/logs/):
cache/ # Valkey/Redis logs
directory/ # Zitadel logs
drive/ # MinIO S3 logs
llm/ # LLM (llama.cpp) logs
tables/ # PostgreSQL logs
vault/ # Vault secrets logs
vector_db/ # Qdrant vector DB logs
Checking component logs:
# Valkey
incus exec system -- tail -f /opt/gbo/bin/botserver-stack/logs/cache/valkey.log
# PostgreSQL
incus exec system -- tail -f /opt/gbo/bin/botserver-stack/logs/tables/postgres.log
# Qdrant
incus exec system -- tail -f /opt/gbo/bin/botserver-stack/logs/vector_db/qdrant.log
iptables loopback rule (required)
Internal services (Valkey, MinIO) are protected by DROP rules. Loopback must be explicitly allowed before the DROP rules:
iptables -I INPUT -i lo -j ACCEPT
iptables -A INPUT -p tcp --dport 6379 -j DROP # external only
CoreDNS Hardening
Corefile must include acl plugin to prevent DNS amplification attacks:
zone.example.com:53 {
file /opt/gbo/data/zone.example.com.zone
acl {
allow type ANY net 10.0.0.0/8 127.0.0.0/8
allow type A net 0.0.0.0/0
allow type AAAA net 0.0.0.0/0
allow type MX net 0.0.0.0/0
block
}
cache
errors
}
Reload with SIGHUP: pkill -HUP coredns
fail2ban in Proxy Container
Proxy container needs its own fail2ban for HTTP flood protection:
- Filter: match 4xx errors from Caddy JSON access log
- Jail: caddy-http-flood — 100 errors/60s → ban 1h
- Disable default sshd jail (no SSH in proxy container) via jail.d/defaults-debian.conf
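A sketch of the filter and jail files described above (the failregex and JSON field names are assumptions; verify them against the actual Caddy access log format before enabling):

```ini
# /etc/fail2ban/filter.d/caddy-http-flood.conf
[Definition]
failregex = "remote_ip":"<HOST>".*"status":4\d\d

# /etc/fail2ban/jail.d/caddy-http-flood.conf
[caddy-http-flood]
enabled  = true
port     = http,https
logpath  = /opt/gbo/logs/access.log
maxretry = 100
findtime = 60
bantime  = 3600
```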
CI/CD (Forgejo Runner)
- ALWAYS use CI for deployment — NEVER manually scp binaries. CI ensures consistent, auditable deployments.
- Runner container must have no cross-container disk mounts
- Deploy via SSH: scp binary <system-container>:/opt/gbo/bin/botserver (only from CI, not manually)
- SSH key from runner → system container must be pre-authorized
- sccache + cargo registry cache accumulates — daily cleanup cron required
- ZFS snapshots of CI container can be huge if taken while cross-mounts were active — delete stale snapshots after removing mounts
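The daily cache cleanup mentioned above could be a cron fragment on the alm-ci container (the paths and 7-day threshold are assumptions):

```
# /etc/cron.d/ci-cache-cleanup: trim sccache and cargo registry caches daily
0 3 * * * root find /root/.cache/sccache -type f -mtime +7 -delete; find /root/.cargo/registry/cache -type f -mtime +7 -delete
```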
Forgejo Workflow Location
Each submodule has its own workflow at .forgejo/workflows/<name>.yaml.
botserver workflow: botserver/.forgejo/workflows/botserver.yaml
CI Deployment Flow
- Push code to ALM → triggers CI workflow automatically
- CI builds binary on pragmatismo-alm-ci runner
- CI deploys to pragmatismo-system container via SSH
- CI verifies botserver process is running after deploy
- If CI fails → check logs at /tmp/deploy-*.log on CI runner
To trigger CI manually:
# Push to ALM
cd botserver && git push alm main
# Or via API
curl -X POST "http://alm.pragmatismo.com.br/api/v1/repos/GeneralBots/BotServer/actions/workflows/botserver.yaml/runs"
SSH Hostname Setup (CI Runner)
The CI runner must resolve system hostname. Add to /etc/hosts once (manual step on host):
incus exec alm-ci -- bash -c 'echo "10.16.164.33 system" >> /etc/hosts'
Deploy Step — CRITICAL
The deploy step must kill the running botserver process before scp, otherwise scp fails with dest open: Failure (binary is locked by running process):
- name: Deploy via SSH
run: |
ssh pragmatismo-system "pkill -f /opt/gbo/bin/botserver || true; sleep 2"
scp target/debug/botserver pragmatismo-system:/opt/gbo/bin/botserver
ssh pragmatismo-system "chmod +x /opt/gbo/bin/botserver && cd /opt/gbo/bin && nohup sudo -u gbuser ./botserver --noconsole >> /opt/gbo/logs/error.log 2>&1 &"
Never use systemctl stop system.service — botserver is not managed by systemd, it runs as a process under gbuser.
Binary Ownership
The binary at /opt/gbo/bin/botserver must be owned by gbuser, not root:
incus exec system -- chown gbuser:gbuser /opt/gbo/bin/botserver
If owned by root, scp as gbuser will fail even after killing the process.
ZFS Disk Space
- Check snapshots: zfs list -t snapshot -o name,used | sort -k2 -rh
- Snapshots retain data from device mounts at time of snapshot — removing mounts doesn't free space until the snapshot is deleted
- Delete snapshot: zfs destroy <pool>/containers/<name>@<snapshot>
- Daily rolling snapshots (7-day retention) via cron
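The rolling scheme can be generated rather than hand-maintained; this sketch prints weekday-named snapshot commands (each name overwrites itself after 7 days) for review before piping to bash. The incus snapshot create/delete subcommand names are assumed from Incus 6.x:

```shell
# Print per-container snapshot commands; weekday-based names give 7-day retention
day=$(date +%a)   # Mon..Sun: each name is reused one week later
for c in system proxy tables drive dns email webmail alm alm-ci table-editor; do
  echo "incus snapshot delete $c auto-$day 2>/dev/null"
  echo "incus snapshot create $c auto-$day"
done
```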
Git Workflow
Push to both remotes after every change:
cd <submodule>
git push origin main
git push alm main
cd ..
git add <submodule>
git commit -m "Update submodule"
git push alm main
Failure to push the root gb repo will not trigger CI/CD pipelines.
Useful Commands
# Check all containers (Incus)
incus list
# Check disk device mounts per container
for c in $(incus list --format csv -c n); do
devices=$(incus config device show $c | grep 'type: disk' | grep -v 'pool:' | wc -l)
[ $devices -gt 0 ] && echo "=== $c ===" && incus config device show $c | grep -E 'source:|path:' | grep -v pool
done
# Tail Caddy errors
incus exec proxy -- tail -f /opt/gbo/logs/access.log
# Restart botserver + botui
incus exec system -- systemctl restart system.service ui.service
# Check iptables in system container
incus exec system -- iptables -L -n | grep -E 'DROP|ACCEPT.*lo'
# ZFS snapshot usage
zfs list -t snapshot -o name,used | sort -k2 -rh | head -20
# Unseal Vault (use actual unseal key from init.json)
incus exec system -- bash -c "
export VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=true
/opt/gbo/bin/botserver-stack/bin/vault/vault operator unseal \$UNSEAL_KEY
"
# Check rsync transfer progress (on destination)
du -sh /home/administrator/gbo
CI/CD Debugging
Check CI Runner Container
# From production host, SSH to CI runner
ssh root@alm-ci
# Check CI workspace for cloned repos
ls /root/workspace/
# Test SSH to system container
ssh -o ConnectTimeout=5 system 'hostname'
Query CI Runs via Forgejo API
# List recent workflow runs for a repo
curl -s "http://alm.pragmatismo.com.br/api/v1/repos/GeneralBots/<repo>/actions/runs?limit=5"
# Trigger workflow manually (if token available)
curl -X POST "http://alm.pragmatismo.com.br/api/v1/repos/GeneralBots/<repo>/actions/workflows/<workflow>.yaml/runs"
Check Binary Deployed
# From production host
incus exec system -- stat /opt/gbo/bin/<binary> | grep Modify
incus exec system -- strings /opt/gbo/bin/<binary> | grep '<expected_code_string>'
CI Build Logs Location
# On CI runner (alm-ci)
# Logs saved via: sudo cp /tmp/build.log /opt/gbo/logs/
# Access from production host
ssh root@alm-ci -- cat /opt/gbo/logs/*.log 2>/dev/null
Common CI Issues
SSH Connection Refused:
- CI runner must have system in /root/.ssh/config with the correct IP
- Check: ssh -o ConnectTimeout=5 system 'hostname'
Binary Not Updated After Deploy:
- Verify binary modification time matches CI run time
- Check CI build source: Clone on CI runner and verify code
- Ensure the embed-ui feature includes the file (RustEmbed embeds at compile time)
# Rebuild with correct features
cargo build --release -p botui --features embed-ui