Update PROD.md: fix wrong glibc version info and caching strategy

Rodrigo Rodriguez (Pragmatismo) 2026-04-17 19:45:27 -03:00
parent ce4c676501
commit 28445fdb7e
4 changed files with 290 additions and 4 deletions

PROD.md

@@ -99,6 +99,103 @@ curl -X DELETE "http://<directory-ip>:8080/v2/users/<user-id>" \
| List users | `POST /v2/users` |
| Update password | `POST /v2/users/{id}/password` |
### Zitadel API v2 Usage with PAT
**Important:** Zitadel API v2 requires a valid Personal Access Token (PAT) for authentication. The PAT must have the appropriate scopes for the operations you want to perform.
**Using PAT with curl:**
```bash
# Set your PAT as an environment variable
PAT="<your-pat-token>"
# Include the required headers in all API calls
curl -X POST "http://<directory-ip>:8080/v2/organizations" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $PAT" \
-H "Host: <directory-ip>" \
-d '{
"name": "pragmatismo"
}'
```
**Critical Headers:**
- `Authorization: Bearer $PAT` - Your PAT token
- `Host: <directory-ip>` - Required for gRPC-gateway routing
- `Content-Type: application/json` - For POST/PUT/PATCH requests
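The three headers above can be wrapped in one small shell function so they are not retyped on every call. This is a minimal sketch, not part of the deployment: `zitadel_post`, `ZITADEL_HOST`, and `ZITADEL_PORT` are hypothetical names introduced here, and `PAT` is the variable set in the example above.

```bash
# Hypothetical helper (not shipped with the deployment): POST a JSON body to a
# Zitadel v2 path, sending the three headers listed above.
# Assumed variables: ZITADEL_HOST (directory IP), ZITADEL_PORT (defaults to 8080), PAT.
zitadel_post() {
  local path="$1" body="$2"
  curl -sS -X POST "http://${ZITADEL_HOST}:${ZITADEL_PORT:-8080}${path}" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${PAT}" \
    -H "Host: ${ZITADEL_HOST}" \
    -d "${body}"
}
```

With `ZITADEL_HOST` and `PAT` exported, a call would look like `zitadel_post /v2/users '{"query": {"offset": 0, "limit": 1}}'`.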
**Common API v2 Endpoints:**
Create Organization:
```bash
curl -X POST "http://10.157.134.240:8080/v2/organizations" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $PAT" \
-H "Host: 10.157.134.240" \
-d '{
"name": "organization-name"
}'
```
List Organizations (requires body with query):
```bash
curl -X POST "http://10.157.134.240:8080/v2/organizations/_search" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $PAT" \
-H "Host: 10.157.134.240" \
-d '{
"query": {
"offset": 0,
"limit": 100
}
}'
```
Create Human User:
```bash
curl -X POST "http://10.157.134.240:8080/v2/users/human" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $PAT" \
-H "Host: 10.157.134.240" \
-d '{
"username": "johndoe",
"profile": {
"givenName": "John",
"familyName": "Doe"
},
"email": {
"email": "john@example.com",
"isVerified": true
},
"password": {
"password": "SecurePass123!",
"changeRequired": false
}
}'
```
**Testing PAT Validity:**
```bash
# Test if PAT is valid by calling users endpoint
curl -X POST "http://10.157.134.240:8080/v2/users" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $PAT" \
-H "Host: 10.157.134.240" \
-d '{"query": {"offset": 0, "limit": 1}}'
# If you get {"code":16,"message":"Errors.Token.Invalid (AUTH-7fs1e)"}, the PAT is invalid
```
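For scripted checks, the response body can be inspected for the error string shown in the comment above. The function below is a sketch under that single assumption: `pat_response_ok` is a hypothetical name, and it only detects the invalid-token error, so other failures (for example missing permissions) would still pass.

```bash
# Hypothetical helper: succeed (exit 0) only when the response body is non-empty
# and does not contain the invalid-token error quoted above.
pat_response_ok() {
  local response="$1"
  case "$response" in
    '') return 1 ;;                        # no body at all: treat as failure
    *'Errors.Token.Invalid'*) return 1 ;;  # PAT rejected by Zitadel
    *) return 0 ;;
  esac
}
```

It would be used by capturing the `curl` output into a variable and passing that variable as the only argument.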
**Generating a New PAT via Web Console:**
1. Access: `http://<directory-ip>:9000/ui/console`
2. Login with admin credentials
3. Navigate to your profile (top right corner)
4. Go to "Personal Access Tokens"
5. Click "Create"
6. Name the token and select expiration
7. Copy the token (you won't see it again!)
8. Update `/opt/gbo/conf/directory/admin-pat.txt` with the new token
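Step 8 can be done with a short function that writes the token with owner-only permissions. This is a sketch: `save_pat` is a hypothetical name, and the token is written without a trailing newline on the assumption that consumers read the file verbatim.

```bash
# Hypothetical helper: write a token to a file that only its owner can read.
# $1 = token, $2 = target path (e.g. /opt/gbo/conf/directory/admin-pat.txt from step 8).
save_pat() {
  local token="$1" file="$2"
  # The subshell keeps the umask change from leaking into the caller's shell;
  # chmod covers the case where the file already existed with wider permissions.
  ( umask 077 && printf '%s' "$token" > "$file" ) && chmod 600 "$file"
}
```

Since the file lives in the `directory` container, the function would be run from a shell inside it, for example one opened with `sudo incus exec directory -- bash`.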
### Production Credentials
**Admin Account:**
@@ -114,6 +211,195 @@ curl -X DELETE "http://<directory-ip>:8080/v2/users/<user-id>" \
---
### Zitadel Setup & Initialization
**Database Configuration:**
Zitadel connects to PostgreSQL with these credentials (set in `directory.service`):
- Database: `PROD-DIRECTORY`
- Host: `10.157.134.174` (tables container)
- Port: `5432`
- User: `postgres`
- Password: `67a690df` (from Vault: `secret/gbo/tables`)
**Current Production Settings:**
- Container IP: `10.157.134.240`
- Internal port: `8080`
- External port: `9000`
- Masterkey: `MasterkeyNeedsToHave32Characters` (CHANGE THIS IN PRODUCTION!)
- TLS mode: `disabled`
- External domain: `10.157.134.240`
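The masterkey listed above is a placeholder that is flagged for replacement. One way to produce a random 32-character alphanumeric value is sketched below; `gen_masterkey` is a hypothetical name, and the approach is only an illustration, not the procedure used for this deployment.

```bash
# Hypothetical helper: print a random 32-character alphanumeric string.
gen_masterkey() {
  # 64 random bytes encode to roughly 86 base64 characters; after dropping
  # everything that is not a letter or digit, the first 32 are kept.
  head -c 64 /dev/urandom | base64 | tr -dc 'A-Za-z0-9' | cut -c1-32
}
```

The same value has to be passed wherever `--masterkey` appears in this document (the initialization command and the `ExecStart` line of the unit).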
**Initialization File:**
Location: `/opt/gbo/conf/directory/zitadel-init-steps.yaml`
```yaml
FirstInstance:
InstanceName: "BotServer"
DefaultLanguage: "en"
PatPath: "/opt/gbo/conf/directory/admin-pat.txt"
Org:
Name: "BotServer"
Machine:
Machine:
Username: "admin-sa"
Name: "Admin Service Account"
Pat:
ExpirationDate: "2099-01-01T00:00:00Z"
Human:
UserName: "admin"
FirstName: "Admin"
LastName: "User"
Email:
Address: "admin@localhost"
Verified: true
Password: "Admin123!"
PasswordChangeRequired: false
```
**To Reinitialize Zitadel (if database is empty or corrupted):**
```bash
# 1. Stop the service
sudo incus exec directory -- systemctl stop directory
# 2. Drop and recreate the database
sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "DROP DATABASE IF EXISTS \"PROD-DIRECTORY\";"
sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "CREATE DATABASE \"PROD-DIRECTORY\";"
# 3. Run initialization
sudo incus exec directory -- bash -c '
export ZITADEL_DATABASE_POSTGRES_HOST=10.157.134.174
export ZITADEL_DATABASE_POSTGRES_PORT=5432
export ZITADEL_DATABASE_POSTGRES_DATABASE=PROD-DIRECTORY
export ZITADEL_DATABASE_POSTGRES_USER_USERNAME=postgres
export ZITADEL_DATABASE_POSTGRES_USER_PASSWORD=67a690df
export ZITADEL_DATABASE_POSTGRES_USER_SSL_MODE=disable
/opt/gbo/bin/zitadel setup init \
--config /opt/gbo/conf/directory/zitadel-init-steps.yaml \
--masterkey MasterkeyNeedsToHave32Characters \
--tlsMode disabled
'
# 4. Start the service
sudo incus exec directory -- systemctl start directory
# 5. Verify health
curl -sf http://10.157.134.240:8080/debug/healthz
```
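Step 5 runs the health check once, right after the service starts, so it can fail simply because Zitadel is not listening yet. A polling variant is sketched below; `wait_for_health` is a hypothetical name, and the attempt count and delay are arbitrary defaults.

```bash
# Hypothetical helper: poll a health URL until it answers successfully.
# $1 = URL, $2 = maximum attempts (default 30), $3 = seconds between attempts (default 2).
wait_for_health() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors, -s keeps it quiet.
    if curl -sf "$url" >/dev/null; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1  # never became healthy within the allowed attempts
}
```

For the setup above this would be called as `wait_for_health http://10.157.134.240:8080/debug/healthz`.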
**Zitadel Database Schema:**
The database uses multiple schemas:
- `system` - System tables and configuration
- `projections` - Read-only projection tables (orgs, users, sessions, etc.)
- `eventstore` - Event sourcing tables
- `adminapi`, `auth`, `logstore`, `cache`, `queue` - Specialized schemas
To query organizations:
```bash
sudo incus exec tables -- psql -h localhost -U postgres -d PROD-DIRECTORY -c \
"SELECT id, name FROM projections.orgs1;"
```
---
### Zitadel Troubleshooting
**Database Connection Errors:**
If logs show `failed SASL auth: FATAL: password authentication failed for user "postgres"`:
```bash
# Check systemd unit has correct credentials
sudo incus exec directory -- cat /etc/systemd/system/directory.service
# Verify Vault has the correct credentials
TOKEN="${VAULT_TOKEN}"
sudo incus exec system -- curl -s --cacert /opt/gbo/conf/system/certificates/ca/ca.crt \
-H "X-Vault-Token: $TOKEN" \
https://10.157.134.250:8200/v1/secret/data/gbo/tables
# If credentials changed, update systemd unit and restart
sudo incus exec directory -- systemctl daemon-reload
sudo incus exec directory -- systemctl restart directory
```
**Empty Database (No Organizations):**
If the database was initialized but tables are missing:
```bash
# Check if tables exist
sudo incus exec tables -- psql -h localhost -U postgres -d PROD-DIRECTORY -c \
"SELECT tablename FROM pg_tables WHERE schemaname = 'projections' LIMIT 5;"
# If no tables, reinitialize using the steps above
```
**PAT Token Invalid:**
If API calls return `Errors.Token.Invalid (AUTH-7fs1e)`:
```bash
# Check if PAT file exists
sudo incus exec directory -- cat /opt/gbo/conf/directory/admin-pat.txt
# If missing or expired, regenerate via console or API:
# 1. Login to console: http://<directory-ip>:9000/ui/console
# 2. Go to Profile → Personal Access Tokens → Create
# 3. Save the new token to admin-pat.txt
```
**Health Check Fails:**
```bash
# Check service status
sudo incus exec directory -- systemctl status directory
# Check logs
sudo incus exec directory -- tail -50 /opt/gbo/logs/stderr.log
sudo incus exec directory -- tail -50 /opt/gbo/logs/stdout.log
# Verify database connectivity
sudo incus exec directory -- pg_isready -h 10.157.134.174 -p 5432 -U postgres
```
**Migration Errors:**
If migrations fail or database is in bad state:
```bash
# Stop service
sudo incus exec directory -- systemctl stop directory
# Drop and recreate database
sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "DROP DATABASE IF EXISTS \"PROD-DIRECTORY\";"
sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "CREATE DATABASE \"PROD-DIRECTORY\";"
# Reinitialize (see initialization steps above)
```
**Systemd Unit Configuration:**
The `directory.service` unit contains all environment variables:
```ini
[Unit]
Description=Directory (Zitadel)
After=network.target
[Service]
User=root
Group=root
WorkingDirectory=/opt/gbo
Environment=ZITADEL_DATABASE_POSTGRES_HOST=10.157.134.174
Environment=ZITADEL_DATABASE_POSTGRES_PORT=5432
Environment=ZITADEL_DATABASE_POSTGRES_DATABASE=PROD-DIRECTORY
Environment=ZITADEL_DATABASE_POSTGRES_USER_USERNAME=postgres
Environment=ZITADEL_DATABASE_POSTGRES_USER_PASSWORD=67a690df
Environment=ZITADEL_DATABASE_POSTGRES_USER_SSL_MODE=disable
Environment=ZITADEL_EXTERNALSECURE=false
Environment=ZITADEL_EXTERNALDOMAIN=10.157.134.240
Environment=ZITADEL_EXTERNALPORT=9000
Environment=ZITADEL_TLS_ENABLED=false
ExecStart=/opt/gbo/bin/zitadel start --masterkey MasterkeyNeedsToHave32Characters --tlsMode disabled --externalDomain 10.157.134.240 --externalPort 9000
Restart=always
RestartSec=5
StandardOutput=append:/opt/gbo/logs/stdout.log
StandardError=append:/opt/gbo/logs/stderr.log
[Install]
WantedBy=multi-user.target
```
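When comparing the unit against Vault, it helps to pull a single value out of the unit file instead of reading the whole thing. A minimal sketch, assuming the `Environment=NAME=value` layout shown above; `unit_env_value` is a hypothetical name.

```bash
# Hypothetical helper: print the value of one Environment= variable from a unit file.
# $1 = path to the unit file, $2 = variable name.
unit_env_value() {
  local unit_file="$1" name="$2"
  # systemd lets a later assignment override an earlier one, so keep the last match.
  sed -n "s/^Environment=${name}=//p" "$unit_file" | tail -n 1
}
```

Run inside the `directory` container, `unit_env_value /etc/systemd/system/directory.service ZITADEL_DATABASE_POSTGRES_HOST` would print the configured database host.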
---
## Common Operations
**Check status:** `sudo incus exec system -- systemctl status botserver --no-pager` (same for `ui`). To check process existence: `sudo incus exec system -- pgrep -f botserver`.
@@ -134,7 +420,7 @@ curl -X DELETE "http://<directory-ip>:8080/v2/users/<user-id>" \
Repositories exist on both GitHub and the internal ALM (Forgejo). The four repos are `gb` (main workspace), `botserver`, `botui`, and `botlib`. Always push submodules first (`cd botserver && git push alm main && git push origin main`), then update submodule references in the root repo and push that too.
-The CI runner container (`alm-ci`) runs Debian Trixie with glibc 2.41, but the `system` container runs Debian 12 Bookworm with glibc 2.36. Binaries compiled on the CI runner are incompatible with the system container. The CI workflow (`botserver/.forgejo/workflows/botserver.yaml`) solves this by transferring source to the system container via `tar | ssh` and building there. The workflow triggers on pushes to `main`, clones repos, transfers source, builds inside system container, deploys the binary, and verifies botserver is running.
+The CI runner container (`alm-ci`) runs Debian 12 Bookworm with glibc 2.36, the same as the `system` container, so binaries compiled on the CI runner are compatible with the `system` container. The CI workflow (`botserver/.forgejo/workflows/botserver.yaml`) builds in `alm-ci` (which has the Rust toolchain) and deploys the binary to the `system` container. The workflow triggers on pushes to `main`, clones the repos, builds in `alm-ci`, transfers the binary via `scp`, and verifies that botserver is running.
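Whether a binary built in one container runs in another comes down to the glibc versions: as a rule of thumb, a binary linked against a given glibc runs on hosts with the same or a newer glibc, not an older one. The sketch below shows how the two versions could be compared; `parse_glibc_version` and `version_le` are hypothetical names, the sample `ldd` line in the comment is illustrative, and `sort -V` assumes GNU coreutils.

```bash
# Extract the version from the first line of `ldd --version`,
# e.g. "ldd (Debian GLIBC 2.36-9+deb12u4) 2.36" -> "2.36" (illustrative input).
parse_glibc_version() {
  printf '%s\n' "$1" | awk '{ print $NF }'
}

# Succeed when version $1 is less than or equal to version $2.
# sort -C only checks whether the two lines are already in version order.
version_le() {
  printf '%s\n%s\n' "$1" "$2" | sort -V -C
}
```

A build is expected to be deployable when `version_le "$build_glibc" "$runtime_glibc"` succeeds, where each value comes from passing the first line of `ldd --version` in the respective container to `parse_glibc_version`.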
### ALM/CI Debugging & Monitoring

@@ -1 +1 @@
-Subproject commit 82a236f369e58fe0eda4df704b9ee74a725874e8
+Subproject commit b50aa7d13be57109ae1a50fdf76b627e1a2d2163

@@ -1 +1 @@
-Subproject commit e8e1b7d65b8d1c37e2ea73cdcfec9a0e1f040027
+Subproject commit c6d2f19458a7a07f827dbac3278ee568e3f1ec41


@@ -3,7 +3,7 @@
> **Pattern:** Every suite app exposes its own `PROMPT.md` + internal tools.
> The shared chat bar activates app-specific context when the user is inside that app.
> WhatsApp campaigns is the first full example.
A common chat window stays fixed on the right as a collapsible pane. Except for the chat itself, all other UI must be controlled by chat, via a common API/prompt mechanism.
---
## Architecture