Commit graph

4148 commits

Author SHA1 Message Date
f6a864aa67 fix: replace nested runtime block_on with new_current_thread to prevent panic
All checks were successful
BotServer CI/CD / build (push) Successful in 5m32s
Root cause: Handle::current().block_on() panics inside multi_thread runtime
with 'Cannot start a runtime from within a runtime' error.

Fix: All sync-to-async bridges now use tokio::runtime::Builder::new_current_thread()
instead of Handle::current().block_on(). Also changed SECRETS_MANAGER from
tokio::sync::RwLock to std::sync::RwLock to eliminate unnecessary async overhead.

Files: 14 files across keywords, secrets, utils, state, calendar, analytics, email
Impact: Fixes production crash during bot loading phase
2026-04-03 09:17:23 -03:00
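The lock change in this commit can be illustrated with the standard library alone. The sketch below is not the project's code: the `SecretsManager` fields and the `init_secrets` / `read_secret` helpers are hypothetical stand-ins. It only shows that a `std::sync::RwLock` can be read from synchronous code without a runtime handle or `block_on`.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the real secrets manager type.
#[derive(Debug, Default)]
pub struct SecretsManager {
    values: HashMap<String, String>,
}

impl SecretsManager {
    pub fn get(&self, key: &str) -> Option<String> {
        self.values.get(key).cloned()
    }
}

// A std::sync::RwLock is locked with plain blocking calls, so sync
// callers never need to enter an async runtime just to read a secret.
static SECRETS_MANAGER: RwLock<Option<SecretsManager>> = RwLock::new(None);

/// Hypothetical initializer: stores the loaded secrets in the global.
pub fn init_secrets(values: HashMap<String, String>) {
    let mut guard = SECRETS_MANAGER.write().expect("secrets lock poisoned");
    *guard = Some(SecretsManager { values });
}

/// Hypothetical reader: returns a copy of the value, if present.
pub fn read_secret(key: &str) -> Option<String> {
    let guard = SECRETS_MANAGER.read().expect("secrets lock poisoned");
    guard.as_ref().and_then(|manager| manager.get(key))
}
```

When a lock like this is used from async code, the guard should be dropped before any `.await`. Returning an owned `String`, as above, keeps the critical section short.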
eece6831b4 Fix: initialize secrets manager when remote Vault detected, even without init.json
All checks were successful
BotServer CI/CD / build (push) Successful in 5m9s
- main.rs: Skip init.json check when VAULT_ADDR points to remote server
- This allows botserver to read database credentials from Vault in production
- Without this fix, database URL falls back to localhost and connection fails
2026-04-03 08:22:06 -03:00
65e7db5acd Skip local service install/start when remote Vault detected
All checks were successful
BotServer CI/CD / build (push) Successful in 5m48s
- install_all() returns early if VAULT_ADDR is remote
- start_all() returns early if VAULT_ADDR is remote
- bootstrap.rs treats remote VAULT_ADDR as bootstrap_completed=true
- Prevents botserver from trying to install/start local services
  when all services are running in separate containers
2026-04-03 07:36:15 -03:00
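Both this commit and eece6831b4 depend on deciding whether VAULT_ADDR points at a remote server. The log does not show how that test is written, so the following is only a sketch under assumptions: the function name `is_remote_vault` and the list of local hosts are hypothetical.

```rust
/// Returns true when `vault_addr` names a host other than this machine.
/// The set of "local" hosts below is an assumption for illustration.
pub fn is_remote_vault(vault_addr: &str) -> bool {
    // Strip the scheme (http://, https://) if one is present.
    let rest = vault_addr
        .split_once("://")
        .map(|(_, r)| r)
        .unwrap_or(vault_addr);
    // Keep only the authority part, before any path.
    let authority = rest.split('/').next().unwrap_or("");
    // Drop the port; handle bracketed IPv6 literals such as [::1]:8200.
    let host = if let Some(stripped) = authority.strip_prefix('[') {
        stripped.split(']').next().unwrap_or("")
    } else {
        authority.split(':').next().unwrap_or("")
    };
    if host.is_empty() {
        // An unset or malformed address is treated as "not remote".
        return false;
    }
    !matches!(
        host.to_ascii_lowercase().as_str(),
        "localhost" | "127.0.0.1" | "::1" | "0.0.0.0"
    )
}
```

With a helper of this shape, `install_all()`, `start_all()` and the bootstrap check could each return early when the result is true.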
e992ed3b39 Enforce Vault-only secrets: remove env var fallbacks, all secrets from Vault
Some checks are pending
BotServer CI/CD / build (push) Waiting to run
- Remove all std::env::var calls except VAULT_* and PORT
- get_from_env returns hardcoded defaults only (no env var reading)
- Auth config, rate limits, email, analytics, calendar all use Vault
- WORK_PATH replaced with get_work_path() helper reading from Vault
- .env on production cleaned to only VAULT_ADDR, VAULT_TOKEN, VAULT_CACERT, PORT
- All service IPs/credentials stored in Vault secret/gbo/*
2026-04-03 07:11:40 -03:00
5d88013ee3 Fix get_from_env: read actual env vars instead of hardcoded localhost values
All checks were successful
BotServer CI/CD / build (push) Successful in 4m3s
2026-04-02 21:17:19 -03:00
98b204b12e Fix health checks: replace nc with ss -tln for non-root environments
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
2026-04-02 18:15:07 -03:00
11c161fc1d Update botserver
All checks were successful
BotServer CI/CD / build (push) Successful in 27s
2026-04-02 17:03:12 -03:00
521b9b7da4 Update forgejo workflow
Some checks failed
BotServer CI/CD / build (push) Failing after 19s
2026-04-02 16:15:01 -03:00
00e5a3a5ff ci: add Step 7 to restart botserver service after deploy
Some checks failed
BotServer CI/CD / build (push) Failing after 17s
2026-04-02 16:01:15 -03:00
c041fe9cd3 ci: trigger build
All checks were successful
BotServer CI/CD / build (push) Successful in 28s
2026-04-02 15:48:36 -03:00
7b4753af0d fix: init_redis tries both no-password and password URLs for Valkey
All checks were successful
BotServer CI/CD / build (push) Successful in 27s
- Root cause: Valkey in prod runs without password but Vault stores one
- Previous code only tried password URL, got AUTH failed
- Fix: try no-password URL first, then password URL as fallback
- Also removed unused cache_url variable and cleaned up retry logic
2026-04-02 07:36:16 -03:00
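The ordering described in this commit (no-password URL first, password URL as fallback) can be sketched as follows. The function names are hypothetical, and `connect` is a placeholder for whatever client call the real `init_redis` makes. A real password may also need percent-encoding, which this sketch omits.

```rust
/// Builds connection URLs in the order the commit describes:
/// the no-password URL first, then the password URL as a fallback.
pub fn candidate_urls(host: &str, port: u16, password: Option<&str>) -> Vec<String> {
    let mut urls = vec![format!("redis://{host}:{port}")];
    if let Some(pw) = password.filter(|p| !p.is_empty()) {
        urls.push(format!("redis://:{pw}@{host}:{port}"));
    }
    urls
}

/// Tries each URL in order and returns the first successful connection.
/// If every attempt fails, all errors are returned for logging.
pub fn connect_first<C, E>(
    urls: &[String],
    mut connect: impl FnMut(&str) -> Result<C, E>,
) -> Result<C, Vec<E>> {
    let mut errors = Vec::new();
    for url in urls {
        match connect(url) {
            Ok(conn) => return Ok(conn),
            Err(e) => errors.push(e),
        }
    }
    Err(errors)
}
```

Collecting every error rather than only the last one keeps the "AUTH failed" detail visible even when a later attempt fails for a different reason.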
dae0feb6a5 fix: SecretPaths match Vault seeding paths (gbo/cache not gbo/system/cache)
All checks were successful
BotServer CI/CD / build (push) Successful in 3m49s
- Root cause: Vault seeding writes to secret/gbo/cache but code reads gbo/system/cache
- kv2::read prepends secret/ so it looks for secret/gbo/system/cache (wrong)
- Fix: update SecretPaths to match seeding paths (gbo/cache, gbo/drive, etc.)
- Testing: compiles clean, paths now match vault kv list output
2026-04-02 07:16:32 -03:00
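The path mismatch in this commit is easy to reproduce on paper. In the sketch below, only the `gbo/cache` and `gbo/drive` values come from the commit message. The constant names, `logical_path` and `unseeded_paths` are hypothetical helpers that show how a mount prefix combines with a relative path and how a mismatch against the seeded paths could be detected.

```rust
/// Relative secret paths, matching what Vault seeding writes.
pub struct SecretPaths;

impl SecretPaths {
    pub const CACHE: &'static str = "gbo/cache";
    pub const DRIVE: &'static str = "gbo/drive";
}

/// Joins the KV mount (for example "secret") with a relative path,
/// mirroring the prefixing the commit attributes to kv2::read.
pub fn logical_path(mount: &str, relative: &str) -> String {
    format!("{}/{}", mount.trim_matches('/'), relative.trim_matches('/'))
}

/// Returns the paths the code reads that were never seeded.
pub fn unseeded_paths<'a>(read_paths: &[&'a str], seeded_paths: &[&str]) -> Vec<&'a str> {
    read_paths
        .iter()
        .copied()
        .filter(|p| !seeded_paths.iter().any(|s| *s == *p))
        .collect()
}
```

Running a check like `unseeded_paths` against the output of `vault kv list` would flag `gbo/system/cache` before it caused a failed read at runtime.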
f118c74cf1 fix: init_redis uses async Vault call instead of sync block_on (fixes panic)
All checks were successful
BotServer CI/CD / build (push) Successful in 5m40s
- Root cause: get_cache_config() uses runtime.block_on() which panics
  when called from within an async runtime
- Fix: call SecretsManager::get_secret() directly with .await
- Testing: compiles clean, no runtime nesting issues
2026-04-02 06:59:21 -03:00
b3edf21d21 fix: init_redis fetches cache password from Vault (fixes connection timeout)
All checks were successful
BotServer CI/CD / build (push) Successful in 4m59s
- Root cause: init_redis() used redis://localhost:6379 without password
- Valkey requires authentication, causing connection timeouts
- Fix: use get_cache_config() from SecretsManager to build URL with password
- Falls back to env vars (CACHE_URL/REDIS_URL/VALKEY_URL) if set
2026-04-01 20:17:37 -03:00
3c9e4ba6e7 fix: cache_health_check uses ss instead of nc (nc missing in prod container)
All checks were successful
BotServer CI/CD / build (push) Successful in 4m42s
- Root cause: prod container lacks nc (netcat), causing fallback to valkey-cli ping
- valkey-cli ping hangs indefinitely when Valkey requires password auth
- Fix: use ss -tlnp as primary check (always available), nc as fallback
- Testing: verified ss is available in prod, nc is not
2026-04-01 20:06:13 -03:00
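A port check based on `ss` needs no authentication, which is why it cannot hang the way `valkey-cli ping` does. The sketch below is an assumption about how such a check might look, not the project's implementation: `port_is_listening` and `cache_port_open` are hypothetical names, and the parser assumes the usual column layout of `ss -tln`, with the local address in the fourth column.

```rust
use std::process::Command;

/// Returns true if any LISTEN row in `ss -tln` output uses `port`
/// as its local port. Assumes the local address is the 4th column.
pub fn port_is_listening(ss_output: &str, port: u16) -> bool {
    ss_output
        .lines()
        .filter(|line| line.starts_with("LISTEN"))
        .filter_map(|line| line.split_whitespace().nth(3))
        .filter_map(|local| local.rsplit_once(':'))
        .any(|(_, p)| p.parse::<u16>().ok() == Some(port))
}

/// Runs `ss -tln` and checks the given port. Returns false when
/// `ss` cannot be executed, so callers can try another probe.
pub fn cache_port_open(port: u16) -> bool {
    match Command::new("ss").arg("-tln").output() {
        Ok(out) if out.status.success() => {
            port_is_listening(&String::from_utf8_lossy(&out.stdout), port)
        }
        _ => false,
    }
}
```

Splitting on the last `:` keeps bracketed IPv6 listeners such as `[::]:6379` working.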
d098961142 fix: Bootstrap checks stack/.env path in addition to ./.env
All checks were successful
BotServer CI/CD / build (push) Successful in 4m39s
- Production has .env in botserver-stack/.env not ./.env
- Checks both locations to detect completed bootstrap
- Fixes E0716: use let bindings for Path borrows
2026-04-01 19:30:08 -03:00
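The E0716 fix mentioned here ("use let bindings for Path borrows") follows a common pattern, sketched below under assumptions. The function name and parameters are hypothetical. The point is that each joined `PathBuf` is bound to a local before it is borrowed as a `&Path`.

```rust
use std::path::{Path, PathBuf};

/// Returns true if a .env file exists in either location.
pub fn bootstrap_completed(working_dir: &Path, stack_dir: &Path) -> bool {
    // Writing `let p = working_dir.join(".env").as_path();` and using `p`
    // afterwards fails with E0716: the PathBuf temporary is dropped at the
    // end of the statement while `p` still borrows it. Binding the PathBuf
    // to a local first keeps it alive for the whole function.
    let local_env: PathBuf = working_dir.join(".env");
    let stack_env: PathBuf = stack_dir.join(".env");

    let candidates: [&Path; 2] = [local_env.as_path(), stack_env.as_path()];
    candidates.iter().any(|p| p.is_file())
}
```

In production the second argument would correspond to the `botserver-stack` directory named in the commit message.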
8fd3254334 fix: Bootstrap checks stack/.env path in addition to ./.env
Some checks failed
BotServer CI/CD / build (push) Failing after 1m25s
- Production has .env in botserver-stack/.env not ./.env
- Checks both locations to detect completed bootstrap
- Prevents full re-bootstrap on restart in production
2026-04-01 19:26:32 -03:00
318367d439 fix: Valkey health check uses nc first (avoids password hang)
All checks were successful
BotServer CI/CD / build (push) Successful in 3m58s
- nc -z checks port connectivity instantly (no auth needed)
- valkey-cli ping as fallback (hangs when password required)
- Fixes bootstrap hang on production where Valkey has Vault password
2026-04-01 18:52:04 -03:00
c26e483cc9 fix: All services check health before starting (idempotent bootstrap)
All checks were successful
BotServer CI/CD / build (push) Successful in 4m9s
- Tables (PostgreSQL): pg_isready health check before start
- Drive (MinIO): /minio/health/live check before start
- ALM (Forgejo): HTTP health check before start
- ALM CI (Forgejo Runner): pgrep check before start
- Valkey: health check uses absolute path to valkey-cli
- Vault, Qdrant, Zitadel: already had health checks
- Result: no duplicate starts, no hangs on restart
2026-04-01 18:28:54 -03:00
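The "check health before starting" rule applied to every service above reduces to one small guard. This is a generic sketch, not the project's code: `ensure_started` and `StartOutcome` are hypothetical names, and the closures stand in for probes such as `pg_isready` or an HTTP health request.

```rust
/// What the guard did for a given service.
#[derive(Debug, PartialEq, Eq)]
pub enum StartOutcome {
    AlreadyHealthy,
    Started,
}

/// Starts a service only when its health probe fails.
/// Calling this repeatedly is safe: a healthy service is left alone.
pub fn ensure_started<H, S>(is_healthy: H, start: S) -> Result<StartOutcome, String>
where
    H: Fn() -> bool,
    S: FnOnce() -> Result<(), String>,
{
    if is_healthy() {
        return Ok(StartOutcome::AlreadyHealthy);
    }
    start()?;
    Ok(StartOutcome::Started)
}
```

Because the start action runs at most once per call and never when the probe passes, a restart cannot spawn a duplicate process.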
b47d928608 fix: CI system dependencies check before apt-get update
All checks were successful
BotServer CI/CD / build (push) Successful in 3m47s
- Check if packages already installed with dpkg before running apt-get
- Skip apt-get update/install if all dependencies present
- Reduces CI time from ~30s to instant on subsequent runs
2026-04-01 18:19:10 -03:00
ba7f1ba5eb fix: Valkey health check uses absolute path to valkey-cli
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Use BOTSERVER_STACK_PATH/bin/cache/bin/valkey-cli instead of relying on PATH
- Remove bash /dev/tcp fallback (unreliable in restricted environments)
- Falls back to redis-cli and nc if valkey-cli unavailable
2026-04-01 18:11:26 -03:00
68ef554132 fix: Vault as single source of truth - credentials + location for all services
All checks were successful
BotServer CI/CD / build (push) Successful in 4m53s
- Qdrant health check: recognize 'healthz check passed' response (fixes 45s timeout)
- seed_vault_defaults: add host/port/url/grpc_port for ALL 10 services
- fetch_vault_credentials: fetch ALL services via generic loop (drive, cache, tables, vectordb, directory, llm, meet, alm, encryption)
- vectordb URL: fix https://localhost:6334 -> http://localhost:6333 in all config getters
- get_from_env: add host/port/grpc_port for vectordb fallback
- Tested: .reset (fresh install) + .restart (idempotent) - zero errors
2026-04-01 16:46:16 -03:00
fb2e5242da fix: Vault seeding, service health checks, and restart idempotency
All checks were successful
BotServer CI/CD / build (push) Successful in 55m52s
- Replace hardcoded passwords with generate_random_string() for all Vault-seeded services
- Add valkey-cli, nc to SafeCommand allowlist; fix PATH in all 4 execution methods
- Use a 'none' placeholder for empty Vault KV values to prevent 'Failed to parse K=V' errors
- Fix special chars in generated passwords triggering shell injection false positives
- Add ALM app.ini creation with absolute paths for Forgejo CLI
- Increase Qdrant timeout 15s→45s, ALM wait 5s→20s
- Persist file_states and kb_states to disk for .bas/KB idempotency across restarts
- Add duplicate check to use_website registration (debug log for existing)
- Remove dead code (SERVER_START_EPOCH, server_epoch)
- Add generate_random_string() to shared mod.rs, remove duplicates
2026-04-01 12:22:57 -03:00
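The 'none' placeholder for empty KV values mentioned in this commit can be pictured as a pair of small helpers. The names `kv_args` and `read_value` are hypothetical, and whether the real code maps the placeholder back to "no value" on read is an assumption.

```rust
/// Placeholder written in place of an empty value.
pub const EMPTY_PLACEHOLDER: &str = "none";

/// Builds `key=value` arguments, substituting the placeholder for
/// empty values so that every argument has a non-empty right side.
pub fn kv_args(pairs: &[(&str, &str)]) -> Vec<String> {
    pairs
        .iter()
        .map(|&(key, value)| {
            let value = if value.trim().is_empty() {
                EMPTY_PLACEHOLDER
            } else {
                value
            };
            format!("{key}={value}")
        })
        .collect()
}

/// Maps the placeholder back to "no value" when reading.
pub fn read_value(raw: &str) -> Option<&str> {
    if raw.is_empty() || raw == EMPTY_PLACEHOLDER {
        None
    } else {
        Some(raw)
    }
}
```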
3e46a16469 fix: Seed default credentials into Vault after initialization
Some checks failed
BotServer CI/CD / build (push) Failing after 3h13m28s
- Add seed_vault_defaults() to write default creds for all components
  (drive, cache, tables, directory, email, llm, encryption, meet, vectordb, alm)
- Call seed_vault_defaults() after KV2 enable in initialize_vault_local()
- Call seed_vault_defaults() in recover_existing_vault() for recovery path
- Rewrite fetch_vault_credentials() to use SafeCommand directly instead of
  safe_sh_command, avoiding '//' shell injection false positive on URLs
- Components like Drive now get credentials from Vault instead of 403 errors
2026-03-31 22:19:09 -03:00
9919a8321c fix: Use SafeCommand directly for vault health check to avoid shell injection false positive
All checks were successful
BotServer CI/CD / build (push) Successful in 6m46s
- Replace safe_sh_command with SafeCommand::new("curl").args() in vault_health_check()
- The URL contains https:// which triggered '//' pattern detection in shell command
- Direct SafeCommand bypasses shell parsing, URL passed as single argument
- Add vault data directory existence check before recovery attempt
- Prevents 'Dangerous pattern // detected' errors during bootstrap
2026-03-31 21:34:04 -03:00
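The reason a direct command avoids the '//' false positive is that no shell ever parses the URL. The sketch below uses `std::process::Command` as a stand-in for the project's `SafeCommand`, whose API is not shown in this log. The function name and the particular curl flags are assumptions. `/v1/sys/health` is Vault's health endpoint.

```rust
use std::process::Command;

/// Builds a curl invocation for the Vault health endpoint.
/// Each argument is passed to the process as-is, so characters such
/// as `//` in the URL are never interpreted by a shell.
pub fn vault_health_command(vault_addr: &str) -> Command {
    let url = format!("{}/v1/sys/health", vault_addr.trim_end_matches('/'));
    let mut cmd = Command::new("curl");
    // -s: silent, -o /dev/null: discard body, -w: print only the status code.
    cmd.args(["-s", "-o", "/dev/null", "-w", "%{http_code}", "--max-time", "5"]);
    // The whole URL is a single argv entry.
    cmd.arg(url);
    cmd
}
```

The command is only constructed here. A caller would run it with `.output()` and inspect the printed status code.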
07a6c1edb3 Merge commit '582ea634'
All checks were successful
BotServer CI/CD / build (push) Successful in 7m38s
2026-03-31 21:10:25 -03:00
582ea634e7 fix: Vault bootstrap recovery for sealed but initialized instances
- Fix vault_health_check() stub that always returned false
- Add recover_existing_vault() to handle Vault with existing data but no init.json
- Add unseal_vault() helper to unseal with existing vault-unseal-keys
- Detect initialized Vault via health endpoint or data directory presence
- Prevents bootstrap failure when reset.sh deletes init.json but Vault data persists

Root cause: vault_health_check() was a stub returning false, causing bootstrap
to always try vault operator init on already-initialized (but sealed) Vault,
which failed with connection refused. This cascaded to all services failing
to fetch credentials from Vault.
2026-03-31 20:49:29 -03:00
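The recovery logic described in this commit amounts to a decision over two inputs: what the health endpoint reports and whether the data directory exists. The sketch below is one plausible reading of that decision, not the project's code. The type and function names are hypothetical. The `initialized` and `sealed` flags mirror fields that Vault's health endpoint reports.

```rust
/// Subset of the health endpoint response used for the decision.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VaultHealth {
    pub initialized: bool,
    pub sealed: bool,
}

/// Next bootstrap step for Vault.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VaultAction {
    Ready,
    Unseal,
    RecoverExisting,
    Initialize,
}

/// `health` is None when the endpoint could not be reached.
pub fn plan_vault_bootstrap(health: Option<VaultHealth>, data_dir_exists: bool) -> VaultAction {
    match health {
        // Initialized and unsealed: nothing to do.
        Some(h) if h.initialized && !h.sealed => VaultAction::Ready,
        // Initialized but sealed: unseal with the existing keys.
        Some(h) if h.initialized => VaultAction::Unseal,
        // No usable health report, but data is on disk: do not re-init.
        _ if data_dir_exists => VaultAction::RecoverExisting,
        // Nothing on disk and not initialized: fresh install.
        _ => VaultAction::Initialize,
    }
}
```

A stub health check that always returns false collapses every case into `Initialize`, which matches the failure mode the root-cause note describes.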
4ae16017ff Merge commit '644dfe2d'
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
2026-03-31 19:57:57 -03:00
644dfe2d19 fix: Improve .gbdialog file detection for nested paths
2026-03-31 19:57:33 -03:00
2fa59057fa fix: Resolve migration error, Vault 403, cache timeout, and shell injection false positives
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Fix migration 6.2.5: Create lost_reason column before VIEW that references it
- Fix Vault 403: Enable KV2 secrets engine after initialization
- Fix cache timeout: Increase Valkey readiness wait from 12s to 30s
- Fix command_guard: Remove () from forbidden chars (safe in std::process::Command)
2026-03-31 19:55:16 -03:00
26684e2db3 fix(ci): Separate deploy steps, use killall instead of pkill, add numbered traces
All checks were successful
BotServer CI/CD / build (push) Successful in 3m24s
2026-03-31 19:09:01 -03:00
dfa78890ab fix(ci): Use tar+zstd over ssh for binary deploy
Some checks failed
BotServer CI/CD / build (push) Failing after 7s
2026-03-31 19:07:31 -03:00
ab620aa729 fix(ci): Separate deploy steps with timeouts
Some checks failed
BotServer CI/CD / build (push) Failing after 8s
2026-03-31 19:02:21 -03:00
1fdb0df634 fix(ci): Add verbose traces to rsync deploy step
Some checks failed
BotServer CI/CD / build (push) Failing after 8s
2026-03-31 18:58:24 -03:00
1e14bd8b3b fix(ci): Use rsync for deploy instead of incus
Some checks failed
BotServer CI/CD / build (push) Failing after 14s
2026-03-31 18:35:16 -03:00
64fea8ea4e fix(ci): Direct incus file push between containers
Some checks failed
BotServer CI/CD / build (push) Failing after 7s
2026-03-31 18:15:33 -03:00
dca097f63b fix(ci): Use incus file transfer for large binary deploy
Some checks failed
BotServer CI/CD / build (push) Failing after 8s
2026-03-31 18:14:26 -03:00
307aa3180f fix(ci): Add /root/.cargo/bin to PATH for alm-ci runner
All checks were successful
BotServer CI/CD / build (push) Successful in 20s
2026-03-31 15:32:37 -03:00
fa8701d69e fix(ci): Remove hardcoded hosts entry
Some checks failed
BotServer CI/CD / build (push) Failing after 7s
2026-03-31 15:28:53 -03:00
a5dca2216b fix(ci): Update alm IP to 63.141.255.9 and simplify setup step
2026-03-31 15:28:43 -03:00
4c0cfb72eb fix(ci): Simplify SSL setup step, remove fragile git config parsing
2026-03-31 15:28:11 -03:00
e3faf7711c fix(ci): Remove internet-dependent rust/sccache installation steps
Some checks failed
BotServer CI/CD / build (push) Failing after 0s
- alm-ci has Rust and tools pre-installed offline
- Remove sccache download from GitHub releases
- Remove sccache server startup and stats display
2026-03-31 15:21:35 -03:00
3da26f497c fix: Use pre-installed Rust tools instead of downloading
- Remove Install Rust step (pre-installed in /root/.cargo/bin)
- Remove Install sccache step (pre-installed)
- Use pre-installed tools path instead
2026-03-31 12:55:21 -03:00
071f66d96e revert: Remove Google DNS workaround, fixed at DNS root
2026-03-31 12:35:46 -03:00
492e3c9dc8 fix: Add Google DNS for external domains
2026-03-31 12:26:27 -03:00
bc16a00e4d fix: Update container name and remove bad git url replacements
2026-03-31 12:15:49 -03:00
885b20a626 fix: Update container name and remove bad git url replacements
2026-03-31 12:15:25 -03:00
291b7e2e65 ci: Trigger build
Some checks failed
BotServer CI/CD / build (push) Failing after 13s
2026-03-21 21:09:49 -03:00
a30aeada83 revert: Remove IP hardcode from workflow
2026-03-21 21:08:50 -03:00
4893a01375 ci: Add hostname setup step
2026-03-21 21:08:11 -03:00