Commit graph

4093 commits

Author SHA1 Message Date
55366b1f16 fix: revert to original workflow
All checks were successful
BotServer CI/CD / build (push) Successful in 29s
2026-04-02 15:31:53 -03:00
bd81fb34ec chore: revert to original workflow
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
2026-04-02 15:28:33 -03:00
b4acc1f228 fix: Use musl target for static linking (glibc 2.36 compatible)
Some checks failed
BotServer CI/CD / build (push) Failing after 1m18s
2026-04-02 15:25:19 -03:00
a8a7951b19 fix: Build inside system container to match glibc 2.36
The CI runner host has glibc 2.41 but the system container has glibc 2.36.
Build on the host produces binaries that fail with GLIBC_2.38/2.39 not found.

This workflow transfers source from host to container via tar+ssh, then
builds inside the container where the binary will run, ensuring compatibility.
2026-04-02 14:33:11 -03:00
3b8acde6fb fix: Build botserver inside container to match glibc target
Some checks failed
BotServer CI/CD / build (push) Failing after 1s
The CI runner (host) has glibc 2.41 but the prod container has glibc 2.36.
Building on the host causes GLIBC_2.38/2.39 not found errors when running
inside the container.

This change makes the CI build inside the container where the binary will
actually run, ensuring glibc compatibility.
2026-04-02 13:46:12 -03:00
7b4753af0d fix: init_redis tries both no-password and password URLs for Valkey
All checks were successful
BotServer CI/CD / build (push) Successful in 27s
- Root cause: Valkey in prod runs without password but Vault stores one
- Previous code only tried password URL, got AUTH failed
- Fix: try no-password URL first, then password URL as fallback
- Also removed unused cache_url variable and cleaned up retry logic
2026-04-02 07:36:16 -03:00
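The fallback order described in this commit (no-password URL first, then the password URL) can be sketched roughly as below. This is an illustration only, not the actual init_redis code: `candidate_urls`, `connect_first`, and the `try_connect` closure are hypothetical names, and the closure stands in for whatever client call the real code makes.

```rust
/// Candidate URLs in the order the commit describes: no-password first,
/// then the password URL as a fallback.
/// Assumption: the password needs no URL escaping.
fn candidate_urls(host: &str, port: u16, password: Option<&str>) -> Vec<String> {
    let mut urls = vec![format!("redis://{}:{}", host, port)];
    if let Some(pw) = password.filter(|p| !p.is_empty()) {
        urls.push(format!("redis://:{}@{}:{}", pw, host, port));
    }
    urls
}

/// Try each URL in order; return the first that connects, or the last error.
/// `try_connect` is a hypothetical stand-in for the real client call.
fn connect_first<F>(urls: &[String], mut try_connect: F) -> Result<String, String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut last_err = String::from("no candidate URLs");
    for url in urls {
        match try_connect(url.as_str()) {
            Ok(()) => return Ok(url.clone()),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

A caller would pass the Vault-stored password as `password` and a closure that opens a connection; the error deliberately omits the URL so that a password never ends up in logs.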
dae0feb6a5 fix: SecretPaths match Vault seeding paths (gbo/cache not gbo/system/cache)
All checks were successful
BotServer CI/CD / build (push) Successful in 3m49s
- Root cause: Vault seeding writes to secret/gbo/cache but code reads gbo/system/cache
- kv2::read prepends secret/ so it looks for secret/gbo/system/cache (wrong)
- Fix: update SecretPaths to match seeding paths (gbo/cache, gbo/drive, etc.)
- Testing: compiles clean, paths now match vault kv list output
2026-04-02 07:16:32 -03:00
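The path mismatch in this commit can be illustrated with a small check that resolves each logical path the way the commit says the read does (prepending `secret/`) and compares it with what was seeded. The helper names `resolved` and `unseeded` are hypothetical and do not come from the codebase.

```rust
use std::collections::HashSet;

/// Mount prefix that the read path prepends, per the commit message.
const MOUNT: &str = "secret";

/// Resolve a logical path the way the commit describes the read doing it.
fn resolved(logical: &str) -> String {
    format!("{}/{}", MOUNT, logical.trim_start_matches('/'))
}

/// Return the logical paths whose resolved form was never seeded.
fn unseeded<'a>(read_paths: &[&'a str], seeded: &HashSet<String>) -> Vec<&'a str> {
    read_paths
        .iter()
        .copied()
        .filter(|p| !seeded.contains(&resolved(p)))
        .collect()
}
```

With `secret/gbo/cache` seeded, a read of `gbo/system/cache` is reported as unseeded, while `gbo/cache` is not.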
f118c74cf1 fix: init_redis uses async Vault call instead of sync block_on (fixes panic)
All checks were successful
BotServer CI/CD / build (push) Successful in 5m40s
- Root cause: get_cache_config() uses runtime.block_on() which panics
  when called from within an async runtime
- Fix: call SecretsManager::get_secret() directly with .await
- Testing: compiles clean, no runtime nesting issues
2026-04-02 06:59:21 -03:00
b3edf21d21 fix: init_redis fetches cache password from Vault (fixes connection timeout)
All checks were successful
BotServer CI/CD / build (push) Successful in 4m59s
- Root cause: init_redis() used redis://localhost:6379 without password
- Valkey requires authentication, causing connection timeouts
- Fix: use get_cache_config() from SecretsManager to build URL with password
- Falls back to env vars (CACHE_URL/REDIS_URL/VALKEY_URL) if set
2026-04-01 20:17:37 -03:00
3c9e4ba6e7 fix: cache_health_check uses ss instead of nc (nc missing in prod container)
All checks were successful
BotServer CI/CD / build (push) Successful in 4m42s
- Root cause: prod container lacks nc (netcat), causing fallback to valkey-cli ping
- valkey-cli ping hangs indefinitely when Valkey requires password auth
- Fix: use ss -tlnp as primary check (always available), nc as fallback
- Testing: verified ss is available in prod, nc is not
2026-04-01 20:06:13 -03:00
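A port check based on `ss -tlnp` output, as this commit describes, might look roughly like the sketch below. It is not the project's cache_health_check: `port_is_listening` and `cache_is_up` are hypothetical names, and the parser assumes the usual `ss` column layout in which the fourth column is `Local Address:Port`.

```rust
use std::process::Command;

/// True if any LISTEN row in `ss` output has the given local port.
/// Assumes column 0 is the state and column 3 is "Local Address:Port".
fn port_is_listening(ss_output: &str, port: u16) -> bool {
    ss_output.lines().any(|line| {
        let cols: Vec<&str> = line.split_whitespace().collect();
        cols.first() == Some(&"LISTEN")
            && cols
                .get(3)
                .and_then(|local| local.rsplit_once(':'))
                .and_then(|(_, p)| p.parse::<u16>().ok())
                == Some(port)
    })
}

/// Run `ss -tlnp` and report whether something listens on `port`.
/// Returns false if `ss` is missing or exits with an error.
fn cache_is_up(port: u16) -> bool {
    match Command::new("ss").arg("-tlnp").output() {
        Ok(out) if out.status.success() => {
            port_is_listening(&String::from_utf8_lossy(&out.stdout), port)
        }
        _ => false,
    }
}
```

Because the check only inspects listening sockets, it needs no credentials and cannot hang waiting for an AUTH exchange.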
d098961142 fix: Bootstrap checks stack/.env path in addition to ./.env
All checks were successful
BotServer CI/CD / build (push) Successful in 4m39s
- Production has .env in botserver-stack/.env not ./.env
- Checks both locations to detect completed bootstrap
- Fixes E0716: use let bindings for Path borrows
2026-04-01 19:30:08 -03:00
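Checking both .env locations could be sketched as follows. The function name `bootstrap_env_file` and the lookup order are assumptions; the commit only states that both ./.env and botserver-stack/.env are checked. Holding the joined paths in owned bindings is also the usual way to avoid E0716, which fires when a reference outlives a temporary `PathBuf`.

```rust
use std::path::{Path, PathBuf};

/// Return the first existing .env file among the two known locations.
/// Assumption: ./.env is checked before botserver-stack/.env.
fn bootstrap_env_file(base: &Path) -> Option<PathBuf> {
    // Owned bindings keep the joined paths alive while they are inspected.
    let local_env = base.join(".env");
    let stack_env = base.join("botserver-stack").join(".env");
    vec![local_env, stack_env].into_iter().find(|p| p.is_file())
}
```

A `Some(..)` result would indicate that bootstrap already completed, so a restart can skip the full re-bootstrap.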
8fd3254334 fix: Bootstrap checks stack/.env path in addition to ./.env
Some checks failed
BotServer CI/CD / build (push) Failing after 1m25s
- Production has .env in botserver-stack/.env not ./.env
- Checks both locations to detect completed bootstrap
- Prevents full re-bootstrap on restart in production
2026-04-01 19:26:32 -03:00
318367d439 fix: Valkey health check uses nc first (avoids password hang)
All checks were successful
BotServer CI/CD / build (push) Successful in 3m58s
- nc -z checks port connectivity instantly (no auth needed)
- valkey-cli ping as fallback (hangs when password required)
- Fixes bootstrap hang on production where Valkey has Vault password
2026-04-01 18:52:04 -03:00
c26e483cc9 fix: All services check health before starting (idempotent bootstrap)
All checks were successful
BotServer CI/CD / build (push) Successful in 4m9s
- Tables (PostgreSQL): pg_isready health check before start
- Drive (MinIO): /minio/health/live check before start
- ALM (Forgejo): HTTP health check before start
- ALM CI (Forgejo Runner): pgrep check before start
- Valkey: health check uses absolute path to valkey-cli
- Vault, Qdrant, Zitadel: already had health checks
- Result: no duplicate starts, no hangs on restart
2026-04-01 18:28:54 -03:00
b47d928608 fix: CI system dependencies check before apt-get update
All checks were successful
BotServer CI/CD / build (push) Successful in 3m47s
- Check if packages already installed with dpkg before running apt-get
- Skip apt-get update/install if all dependencies present
- Reduces CI time from ~30s to instant on subsequent runs
2026-04-01 18:19:10 -03:00
ba7f1ba5eb fix: Valkey health check uses absolute path to valkey-cli
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Use BOTSERVER_STACK_PATH/bin/cache/bin/valkey-cli instead of relying on PATH
- Remove bash /dev/tcp fallback (unreliable in restricted environments)
- Falls back to redis-cli and nc if valkey-cli unavailable
2026-04-01 18:11:26 -03:00
68ef554132 fix: Vault as single source of truth - credentials + location for all services
All checks were successful
BotServer CI/CD / build (push) Successful in 4m53s
- Qdrant health check: recognize 'healthz check passed' response (fixes 45s timeout)
- seed_vault_defaults: add host/port/url/grpc_port for ALL 10 services
- fetch_vault_credentials: fetch ALL services via generic loop (drive, cache, tables, vectordb, directory, llm, meet, alm, encryption)
- vectordb URL: fix https://localhost:6334 -> http://localhost:6333 in all config getters
- get_from_env: add host/port/grpc_port for vectordb fallback
- Tested: .reset (fresh install) + .restart (idempotent) - zero errors
2026-04-01 16:46:16 -03:00
fb2e5242da fix: Vault seeding, service health checks, and restart idempotency
All checks were successful
BotServer CI/CD / build (push) Successful in 55m52s
- Replace hardcoded passwords with generate_random_string() for all Vault-seeded services
- Add valkey-cli, nc to SafeCommand allowlist; fix PATH in all 4 execution methods
- Fix empty Vault KV values ('none' placeholder) preventing 'Failed to parse K=V' errors
- Fix special chars in generated passwords triggering shell injection false positives
- Add ALM app.ini creation with absolute paths for Forgejo CLI
- Increase Qdrant timeout 15s→45s, ALM wait 5s→20s
- Persist file_states and kb_states to disk for .bas/KB idempotency across restarts
- Add duplicate check to use_website registration (debug log for existing)
- Remove dead code (SERVER_START_EPOCH, server_epoch)
- Add generate_random_string() to shared mod.rs, remove duplicates
2026-04-01 12:22:57 -03:00
3e46a16469 fix: Seed default credentials into Vault after initialization
Some checks failed
BotServer CI/CD / build (push) Failing after 3h13m28s
- Add seed_vault_defaults() to write default creds for all components
  (drive, cache, tables, directory, email, llm, encryption, meet, vectordb, alm)
- Call seed_vault_defaults() after KV2 enable in initialize_vault_local()
- Call seed_vault_defaults() in recover_existing_vault() for recovery path
- Rewrite fetch_vault_credentials() to use SafeCommand directly instead of
  safe_sh_command, avoiding '//' shell injection false positive on URLs
- Components like Drive now get credentials from Vault instead of 403 errors
2026-03-31 22:19:09 -03:00
9919a8321c fix: Use SafeCommand directly for vault health check to avoid shell injection false positive
All checks were successful
BotServer CI/CD / build (push) Successful in 6m46s
- Replace safe_sh_command with SafeCommand::new("curl").args() in vault_health_check()
- The URL contains https:// which triggered '//' pattern detection in shell command
- Direct SafeCommand bypasses shell parsing, URL passed as single argument
- Add vault data directory existence check before recovery attempt
- Prevents 'Dangerous pattern // detected' errors during bootstrap
2026-03-31 21:34:04 -03:00
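The false positive described in this commit can be illustrated with the standard library alone. In the sketch below, `shell_string_rejected` is a hypothetical stand-in for the project's pattern check, `std::process::Command` stands in for SafeCommand (whose exact API is not shown in this log), and the URL in the usage note is an assumed example.

```rust
use std::process::Command;

/// Hypothetical stand-in for a pattern check applied to shell command strings.
fn shell_string_rejected(cmd: &str) -> bool {
    ["//", ";", "&&", "`"].iter().any(|pat| cmd.contains(pat))
}

/// Build the health check without a shell: the URL is a single argv entry,
/// so nothing re-parses it and the "//" in "https://" is harmless.
fn health_check_command(url: &str) -> Command {
    let mut cmd = Command::new("curl");
    cmd.arg("-s").arg(url);
    cmd
}
```

For an assumed URL such as `https://localhost:8200/v1/sys/health`, the string form `curl -s <url>` trips the `//` pattern, while the argv form carries the URL unchanged as its last argument.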
07a6c1edb3 Merge commit '582ea634'
All checks were successful
BotServer CI/CD / build (push) Successful in 7m38s
2026-03-31 21:10:25 -03:00
582ea634e7 fix: Vault bootstrap recovery for sealed but initialized instances
- Fix vault_health_check() stub that always returned false
- Add recover_existing_vault() to handle Vault with existing data but no init.json
- Add unseal_vault() helper to unseal with existing vault-unseal-keys
- Detect initialized Vault via health endpoint or data directory presence
- Prevents bootstrap failure when reset.sh deletes init.json but Vault data persists

Root cause: vault_health_check() was a stub returning false, causing bootstrap
to always try vault operator init on already-initialized (but sealed) Vault,
which failed with connection refused. This cascaded to all services failing
to fetch credentials from Vault.
2026-03-31 20:49:29 -03:00
4ae16017ff Merge commit '644dfe2d'
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
2026-03-31 19:57:57 -03:00
644dfe2d19 fix: Improve .gbdialog file detection for nested paths
2026-03-31 19:57:33 -03:00
2fa59057fa fix: Resolve migration error, Vault 403, cache timeout, and shell injection false positives
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Fix migration 6.2.5: Create lost_reason column before VIEW that references it
- Fix Vault 403: Enable KV2 secrets engine after initialization
- Fix cache timeout: Increase Valkey readiness wait from 12s to 30s
- Fix command_guard: Remove () from forbidden chars (safe in std::process::Command)
2026-03-31 19:55:16 -03:00
26684e2db3 fix(ci): Separate deploy steps, use killall instead of pkill, add numbered traces
All checks were successful
BotServer CI/CD / build (push) Successful in 3m24s
2026-03-31 19:09:01 -03:00
dfa78890ab fix(ci): Use tar+zstd over ssh for binary deploy
Some checks failed
BotServer CI/CD / build (push) Failing after 7s
2026-03-31 19:07:31 -03:00
ab620aa729 fix(ci): Separate deploy steps with timeouts
Some checks failed
BotServer CI/CD / build (push) Failing after 8s
2026-03-31 19:02:21 -03:00
1fdb0df634 fix(ci): Add verbose traces to rsync deploy step
Some checks failed
BotServer CI/CD / build (push) Failing after 8s
2026-03-31 18:58:24 -03:00
1e14bd8b3b fix(ci): Use rsync for deploy instead of incus
Some checks failed
BotServer CI/CD / build (push) Failing after 14s
2026-03-31 18:35:16 -03:00
64fea8ea4e fix(ci): Direct incus file push between containers
Some checks failed
BotServer CI/CD / build (push) Failing after 7s
2026-03-31 18:15:33 -03:00
dca097f63b fix(ci): Use incus file transfer for large binary deploy
Some checks failed
BotServer CI/CD / build (push) Failing after 8s
2026-03-31 18:14:26 -03:00
307aa3180f fix(ci): Add /root/.cargo/bin to PATH for alm-ci runner
All checks were successful
BotServer CI/CD / build (push) Successful in 20s
2026-03-31 15:32:37 -03:00
fa8701d69e fix(ci): Remove hardcoded hosts entry
Some checks failed
BotServer CI/CD / build (push) Failing after 7s
2026-03-31 15:28:53 -03:00
a5dca2216b fix(ci): Update alm IP to 63.141.255.9 and simplify setup step
2026-03-31 15:28:43 -03:00
4c0cfb72eb fix(ci): Simplify SSL setup step, remove fragile git config parsing
2026-03-31 15:28:11 -03:00
e3faf7711c fix(ci): Remove internet-dependent rust/sccache installation steps
Some checks failed
BotServer CI/CD / build (push) Failing after 0s
- alm-ci has Rust and tools pre-installed offline
- Remove sccache download from GitHub releases
- Remove sccache server startup and stats display
2026-03-31 15:21:35 -03:00
3da26f497c fix: Use pre-installed Rust tools instead of downloading
- Remove Install Rust step (pre-installed in /root/.cargo/bin)
- Remove Install sccache step (pre-installed)
- Use pre-installed tools path instead
2026-03-31 12:55:21 -03:00
071f66d96e revert: Remove Google DNS workaround, fixed at DNS root
2026-03-31 12:35:46 -03:00
492e3c9dc8 fix: Add Google DNS for external domains
2026-03-31 12:26:27 -03:00
bc16a00e4d fix: Update container name and remove bad git url replacements
2026-03-31 12:15:49 -03:00
885b20a626 fix: Update container name and remove bad git url replacements
2026-03-31 12:15:25 -03:00
291b7e2e65 ci: Trigger build
Some checks failed
BotServer CI/CD / build (push) Failing after 13s
2026-03-21 21:09:49 -03:00
a30aeada83 revert: Remove IP hardcode from workflow
2026-03-21 21:08:50 -03:00
4893a01375 ci: Add hostname setup step
2026-03-21 21:08:11 -03:00
43f2eb7e5c ci: Add health check after botserver deploy
2026-03-21 20:47:13 -03:00
b83b4ffc4d fix: Remove server_epoch() from start_bas_executed Redis key
The epoch caused a new key to be created every second, bypassing
the 'already executed' check and running start.bas multiple times,
resulting in triplicated suggestions.
2026-03-21 20:40:25 -03:00
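Why an epoch in the key defeats the "already executed" check can be shown with a small model. The key shapes below are hypothetical (the log gives only the key name start_bas_executed), and a `HashSet` stands in for the Redis key space.

```rust
use std::collections::HashSet;

/// Hypothetical key shape WITH the epoch: changes whenever the epoch changes.
fn key_with_epoch(bot_id: &str, epoch_secs: u64) -> String {
    format!("start_bas_executed:{}:{}", bot_id, epoch_secs)
}

/// Hypothetical key shape WITHOUT the epoch: stable for a given bot.
fn key_stable(bot_id: &str) -> String {
    format!("start_bas_executed:{}", bot_id)
}

/// Returns true when the key was not seen before, i.e. start.bas should run.
/// The HashSet stands in for the Redis key space.
fn should_run(seen: &mut HashSet<String>, key: String) -> bool {
    seen.insert(key)
}
```

With the epoch in the key, two checks one second apart both report "not yet executed"; with the stable key, only the first check does.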
059ecb7187 feat(salesianos): add gbdialog tools with AST and tool.json
- inscricao, consultar_inscricao, agendamento_visita, informacoes_curso
- documentos_necessarios, contato_secretaria, calendario_letivo
- start.bas with USE_KB carta, proc
2026-03-21 19:21:46 -03:00
1132983064 feat(kb): add with_bot_config to load embedding from bot config
- Adds KnowledgeBaseManager::with_default_config() as alias to new()
- Adds KnowledgeBaseManager::with_bot_config() to load embedding_url,
  embedding_model, and qdrant config from bot's config.csv
- Updates bootstrap to use with_bot_config with default_bot_id
- Enables per-bot embedding configuration instead of global env vars
2026-03-21 18:55:36 -03:00
9641842e8d fix(migration): add WHERE org_id IS NOT NULL to SLA policy inserts to prevent crash on bots with no org
2026-03-21 17:00:46 -03:00