docs: add prod.md

Rodrigo Rodriguez (Pragmatismo) 2026-04-02 15:46:28 -03:00
parent 82f5f24b44
commit f9aa5dacd0

prompts/prod.md Normal file

@@ -0,0 +1,259 @@
# Production Environment Guide
## Infrastructure
### Servers
| Host | IP | Purpose |
|------|-----|---------|
| `system` | `10.157.134.196` | Main botserver + botui container |
| `alm-ci` | `10.157.134.200` | CI/CD runner (Forgejo Actions) |
| `alm` | `10.157.134.34` | Forgejo git server |
| `dns` | `10.157.134.214` | DNS container |
| `drive` | `10.157.134.206` | Drive storage |
| `email` | `10.157.134.40` | Email service |
| `proxy` | `10.157.134.241` | Reverse proxy |
| `tables` | `10.157.134.174` | PostgreSQL |
| `table-editor` | `10.157.134.184` | Table editor |
| `webmail` | `10.157.134.86` | Webmail |
### Port Mapping (system container)
| Service | Internal Port | External URL |
|---------|--------------|--------------|
| botserver | `5858` | `https://system.pragmatismo.com.br` |
| botui | `5859` | `https://chat.pragmatismo.com.br` |
### Access
```bash
# SSH to host
ssh administrator@63.141.255.9
# Execute inside system container
sudo incus exec system -- bash -c 'command'
# SSH from host to container (used by CI)
ssh -o StrictHostKeyChecking=no system "command"
```
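When a command contains pipes or redirects, it matters whether they run on the host or inside the container. A minimal sketch of a wrapper that always evaluates the whole string inside `system`; the names `sys_exec` and `INCUS_CMD` are hypothetical helpers, not part of the existing tooling:

```bash
# Hypothetical helper: run a command string inside the `system` container.
# INCUS_CMD is an assumption that allows a dry run (for example INCUS_CMD=echo).
INCUS_CMD="${INCUS_CMD:-sudo incus}"

sys_exec() {
  # The whole command is passed as one string, so pipes and redirects
  # are evaluated by bash inside the container, not on the host.
  $INCUS_CMD exec system -- bash -c "$1"
}
```

For example, `sys_exec 'tail -50 /opt/gbo/logs/err.log | grep -i error'` would run both `tail` and `grep` inside the container.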
## Services
### botserver.service
- **Binary**: `/opt/gbo/bin/botserver`
- **Port**: `5858`
- **User**: `gbuser`
- **Logs**: `/opt/gbo/logs/out.log`, `/opt/gbo/logs/err.log`
- **Config**: `/etc/systemd/system/botserver.service`
- **Env**: `PORT=5858`
### ui.service
- **Binary**: `/opt/gbo/bin/botui`
- **Port**: `5859`
- **Config**: `/etc/systemd/system/ui.service`
- **Env**: `BOTSERVER_URL=http://localhost:5858`
  - ⚠️ MUST be `http://localhost:5858`, NOT `https://system.pragmatismo.com.br`
  - The Rust proxy runs server-side and needs direct localhost access
  - The JS client uses relative URLs through `chat.pragmatismo.com.br`
### Data Directory
- **Path**: `/opt/gbo/data/`
- **Structure**: `<botname>.gbai/<botname>.gbdialog/*.bas`
- **Bots**: cristo, fema, jucees, oerlabs, poupatempo, pragmatismogb, salesianos, sentient, seplagse
- **Work dir**: `/opt/gbo/work/` (compiled .ast cache)
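To see how the `<botname>.gbai/<botname>.gbdialog/*.bas` layout looks on disk, the following sketch lists each bot package and counts its dialog scripts. The function name `list_bots` is hypothetical, and it assumes `.bas` files sit directly inside the `.gbdialog` directory:

```bash
# List every bot package under a data directory and count its .bas scripts.
# The default path is the documented data directory; pass another for testing.
list_bots() {
  local data_dir="${1:-/opt/gbo/data}"
  local pkg name dialog_dir count
  for pkg in "$data_dir"/*.gbai; do
    [ -d "$pkg" ] || continue              # skip when the glob matched nothing
    name="$(basename "$pkg" .gbai)"
    dialog_dir="$pkg/$name.gbdialog"
    count=0
    if [ -d "$dialog_dir" ]; then
      count="$(find "$dialog_dir" -maxdepth 1 -type f -name '*.bas' | wc -l | tr -d ' ')"
    fi
    printf '%s %s\n' "$name" "$count"      # one "name count" line per bot
  done
}
```

A bot that prints a count of `0` has no dialog directory or no scripts in it, which is worth checking before looking at compilation errors.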
### Stack Services (managed by botserver bootstrap)
- **Vault**: Secrets management
- **PostgreSQL**: Database (port 5432)
- **Valkey**: Cache (port 6379, password auth)
- **MinIO**: Object storage
- **Zitadel**: Identity provider
- **LLM**: llama.cpp
## CI/CD Pipeline
### Repositories
| Repo | ALM URL | GitHub URL |
|------|---------|------------|
| gb | `https://alm.pragmatismo.com.br/GeneralBots/gb.git` | `git@github.com:GeneralBots/gb.git` |
| botserver | `https://alm.pragmatismo.com.br/GeneralBots/BotServer.git` | `git@github.com:GeneralBots/botserver.git` |
| botui | `https://alm.pragmatismo.com.br/GeneralBots/BotUI.git` | `git@github.com:GeneralBots/botui.git` |
| botlib | `https://alm.pragmatismo.com.br/GeneralBots/botlib.git` | `git@github.com:GeneralBots/botlib.git` |
### Push Order
```bash
# 1. Push submodules first
cd botserver && git push alm main && git push origin main && cd ..
cd botui && git push alm main && git push origin main && cd ..
# 2. Update root workspace references
git add botserver botui botlib
git commit -m "Update submodules: <description>"
git push alm main && git push origin main
```
### Build Environment
- **CI runner**: `alm-ci` container (Debian Trixie, glibc 2.41)
- **Target**: `system` container (Debian 12 Bookworm, glibc 2.36)
- **⚠️ GLIBC MISMATCH**: Binaries built on the CI runner can require glibc symbol versions newer than 2.36, so they fail to load in the system container
- **Solution**: CI workflow transfers source to system container and builds there via SSH
### Workflow File
- **Location**: `botserver/.forgejo/workflows/botserver.yaml`
- **Triggers**: Push to `main` branch
- **Steps**:
  1. Set up workspace on CI runner (clone repos)
  2. Transfer source to system container via `tar | ssh`
  3. Build inside system container (matches glibc 2.36)
  4. Deploy binary inside container
  5. Verify botserver is running
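Steps 2 to 5 can be sketched as a shell function. This is not the content of `botserver.yaml`; it is an illustration under stated assumptions: the remote build directory is `/opt/gbo/ci/botserver`, the build tool is `cargo`, the SSH alias `system` reaches the container with enough privileges to manage the service, and the helper names `run` and `deploy_steps` are hypothetical.

```bash
# Assumed locations; adjust to match the real workflow file.
REMOTE_DIR="/opt/gbo/ci/botserver"   # build dir inside the container (assumption)
BIN_PATH="/opt/gbo/bin/botserver"    # documented binary path

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

deploy_steps() {
  # Step 2: stream the source tree into the container.
  run sh -c "tar czf - -C botserver . | ssh system 'mkdir -p $REMOTE_DIR && tar xzf - -C $REMOTE_DIR'"
  # Step 3: build in the container so the binary links against glibc 2.36.
  run ssh system "cd $REMOTE_DIR && cargo build"
  # Step 4: swap the binary while the service is stopped.
  run ssh system "systemctl stop botserver && cp $REMOTE_DIR/target/debug/botserver $BIN_PATH && systemctl start botserver"
  # Step 5: a non-zero exit here means the service did not come up.
  run ssh system "systemctl is-active botserver"
}
```

Setting `DRY_RUN=1` before calling `deploy_steps` prints the four commands without touching the container, which is a cheap way to review a change to the pipeline.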
## Common Operations
### Check Service Status
```bash
# From host
sudo incus exec system -- systemctl status botserver --no-pager
sudo incus exec system -- systemctl status ui --no-pager
# Check if running
sudo incus exec system -- pgrep -f botserver
sudo incus exec system -- pgrep -f botui
```
### View Logs
```bash
# Systemd journal
sudo incus exec system -- journalctl -u botserver --no-pager -n 50
sudo incus exec system -- journalctl -u ui --no-pager -n 50
# Application logs
sudo incus exec system -- tail -50 /opt/gbo/logs/out.log
sudo incus exec system -- tail -50 /opt/gbo/logs/err.log
# Live tail
sudo incus exec system -- tail -f /opt/gbo/logs/out.log
```
### Restart Services
```bash
sudo incus exec system -- systemctl restart botserver
sudo incus exec system -- systemctl restart ui
```
### Manual Deploy (emergency)
```bash
# Stop the service (a plain killall may race with a systemd restart policy)
sudo incus exec system -- systemctl stop botserver
# Copy binary from the CI build directory inside the container
sudo incus exec system -- cp /opt/gbo/ci/botserver/target/debug/botserver /opt/gbo/bin/botserver
sudo incus exec system -- chmod +x /opt/gbo/bin/botserver
sudo incus exec system -- chown gbuser:gbuser /opt/gbo/bin/botserver
# Start service
sudo incus exec system -- systemctl start botserver
```
### Transfer Bot Files to Production
```bash
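# On the local machine: pack the bot and copy it to the prod host
tar czf /tmp/bots.tar.gz -C /opt/gbo/data <botname>.gbai
scp /tmp/bots.tar.gz administrator@63.141.255.9:/tmp/
# On the host: push the archive into the container (skip if /tmp is already shared)
sudo incus file push /tmp/bots.tar.gz system/tmp/bots.tar.gz
# Unpack inside the container
sudo incus exec system -- tar xzf /tmp/bots.tar.gz -C /opt/gbo/data/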
# Clear compiled cache
sudo incus exec system -- find /opt/gbo/data -name "*.ast" -delete
sudo incus exec system -- find /opt/gbo/work -name "*.ast" -delete
```
### Snapshots
```bash
# List snapshots
sudo incus snapshot list system
# Restore snapshot
sudo incus snapshot restore system <snapshot-name>
```
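Restoring is only useful if a recent snapshot exists, so it can help to take one before a risky change. A small sketch, assuming a timestamped naming scheme; the `snapshot_name` helper and the `pre-deploy` prefix are hypothetical conventions, not something the current setup enforces:

```bash
# Build a timestamped snapshot name such as pre-deploy-20260402-154628.
# $1 is an optional prefix; it defaults to "pre-deploy".
snapshot_name() {
  printf '%s-%s\n' "${1:-pre-deploy}" "$(date +%Y%m%d-%H%M%S)"
}

# Usage on the host (not executed by this snippet):
#   sudo incus snapshot create system "$(snapshot_name)"
```

Sortable names make it easy to pick the right entry from `incus snapshot list system` later.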
## Troubleshooting
### GLIBC Version Mismatch
**Symptom**: `GLIBC_2.39 not found` or `GLIBC_2.38 not found`
**Cause**: Binary compiled on CI runner (glibc 2.41) but runs in system container (glibc 2.36)
**Fix**: The CI workflow must build inside the system container. Check that `botserver.yaml` uses SSH to build in the container.
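To confirm the mismatch before redeploying, one option is to read the highest glibc symbol version a binary requires and compare it with the target's 2.36. A minimal sketch, assuming `objdump` (binutils) and GNU `sort -V` are available; the helper names `max_glibc` and `version_le` are hypothetical:

```bash
# Print the highest GLIBC_x.y symbol version found on stdin.
max_glibc() {
  grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -V | tail -n 1
}

# Succeed if version $1 is less than or equal to version $2.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage (run wherever the binary is readable):
#   required="$(objdump -T /opt/gbo/bin/botserver | max_glibc)"
#   version_le "$required" 2.36 || echo "needs glibc $required, target has 2.36"
```

Version sort is used because a plain string or numeric comparison would rank `2.4` above `2.39`.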
### botserver Not Starting
```bash
# Check binary
sudo incus exec system -- ldd /opt/gbo/bin/botserver | grep "not found"
# Check direct execution
sudo incus exec system -- timeout 10 /opt/gbo/bin/botserver 2>&1
# Check data directory
sudo incus exec system -- ls -la /opt/gbo/data/
```
### botui Can't Reach botserver
```bash
# Check BOTSERVER_URL
sudo incus exec system -- grep BOTSERVER_URL /etc/systemd/system/ui.service
# Must be http://localhost:5858, NOT https://system.pragmatismo.com.br
# Fix:
sudo incus exec system -- sed -i 's|BOTSERVER_URL=.*|BOTSERVER_URL=http://localhost:5858|' /etc/systemd/system/ui.service
sudo incus exec system -- systemctl daemon-reload
sudo incus exec system -- systemctl restart ui
```
### Suggestions Not Showing
```bash
# Check bot files exist
sudo incus exec system -- ls -la /opt/gbo/data/<botname>.gbai/<botname>.gbdialog/
# Check for compilation errors
sudo incus exec system -- tail -50 /opt/gbo/logs/out.log | grep -i "error\|fail\|compile"
# Clear cache and restart
sudo incus exec system -- find /opt/gbo/work -name "*.ast" -delete
sudo incus exec system -- systemctl restart botserver
```
### IPv6 DNS Issues
**Symptom**: External API calls (Groq, Cloudflare) time out
**Cause**: Container DNS returns AAAA records, but the container has no IPv6 connectivity
**Fix**: The container sets `IPV6=no` in its network config and uses `gai.conf` label entries. If issues persist, check that `RES_OPTIONS=inet4` is set in `botserver.service`.
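A quick way to tell whether the timeouts really come from IPv6 is to probe the same URL over each address family. The sketch below assumes `curl` is installed in the container; the helper names `probe` and `diagnose` and the example URL are illustrative, not part of the existing setup:

```bash
# Probe a URL over one IP family. $1 is 4 or 6, $2 is the URL.
# Prints "ok" if curl gets any response within 5 seconds, otherwise "fail".
probe() {
  if curl "-$1" -sS -o /dev/null --max-time 5 "$2" 2>/dev/null; then
    echo ok
  else
    echo fail
  fi
}

# Turn the IPv4 result ($1) and IPv6 result ($2) into a short diagnosis.
diagnose() {
  case "$1/$2" in
    ok/ok)   echo "both-work" ;;
    ok/fail) echo "ipv6-broken" ;;
    fail/ok) echo "ipv4-broken" ;;
    *)       echo "no-connectivity" ;;
  esac
}

# Usage inside the container (the URL is only an example target):
#   url="https://www.cloudflare.com"
#   diagnose "$(probe 4 "$url")" "$(probe 6 "$url")"
```

A result of `ipv6-broken` matches the cause described above; `no-connectivity` points to a different problem, such as DNS or routing.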
## Security
- **NEVER** push secrets to git
- **NEVER** commit files containing credentials to the repository root
- **Vault** is single source of truth for secrets
- **CI/CD** is the only routine deployment method; never manually scp binaries (the manual deploy above is for emergencies only)
- **ALM** is production — ask before pushing