Root cause: AuthConfig::from_env() was creating a new tokio runtime
with Runtime::new() inside an existing runtime during initialization.
Impact: Botserver crashed with "Cannot start a runtime from within a
runtime" panic right after CORS layer initialization.
Fix: Use new_current_thread() + std:🧵:spawn pattern (same as
get_database_url_sync fix) to create an isolated thread for async operations.
Files: src/security/auth_api/config.rs
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Root cause: block_in_place + new_current_thread().block_on() panics when
called from within tokio runtime (including spawn_blocking). Tokio doesn't
allow nested block_on() calls.
Fix: Replace ALL block_in_place patterns with std:🧵:spawn + mpsc channel.
This creates a completely separate OS thread with its own runtime, avoiding
any nesting issues. Works from any context: async, spawn_blocking, or sync.
Files: 14 files across secrets, utils, state, calendar, analytics, email,
and all keyword handlers (universal_messaging, search, book, create_draft,
create_site, hearing/syntax, use_tool, find, admin_email, goals)
Root cause: new_current_thread().block_on() panics when called from within
an existing tokio runtime (including from spawn_blocking). Tokio doesn't
allow nested block_on() calls.
Fix: Use std:🧵:spawn to create a completely separate OS thread
with its own runtime, communicating via mpsc channel. This works from
any context: async, spawn_blocking, or sync.
Root cause: block_in_place + new_current_thread().block_on() panics when
called from within tokio::task::spawn_blocking because block_in_place is
designed for async worker threads, not blocking threads.
Fix: Remove all block_in_place wrappers and use new_current_thread().build().block_on()
directly. This works from both async contexts and spawn_blocking contexts.
Affected: utils.rs (get_database_url_sync, get_work_path)
- Add SSH_KEY variable with -i flag for gbuser identity
- Fix all ssh commands in deploy and verify steps
- Job 902 proved build works with sccache (106s)
- Deploy was failing because gbuser had no SSH key auth to system container
- Change runner service from root to gbuser
- Add sccache for build caching (RUSTC_WRAPPER=sccache)
- Rename /opt/gbo/ci to /opt/gbo/data for consistency
- Persist gb-ws clone instead of re-cloning every build
- Add sccache --show-stats to build output for monitoring
Root cause: Handle::current().block_on() panics inside multi_thread runtime
with 'Cannot start a runtime from within a runtime' error.
Fix: All sync-to-async bridges now use tokio::runtime::Builder::new_current_thread()
instead of Handle::current().block_on(). Also changed SECRETS_MANAGER from
tokio::sync::RwLock to std::sync::RwLock to eliminate unnecessary async overhead.
Files: 14 files across keywords, secrets, utils, state, calendar, analytics, email
Impact: Fixes production crash during bot loading phase
- main.rs: Skip init.json check when VAULT_ADDR points to remote server
- This allows botserver to read database credentials from Vault in production
- Without this fix, database URL falls back to localhost and connection fails
- install_all() returns early if VAULT_ADDR is remote
- start_all() returns early if VAULT_ADDR is remote
- bootstrap.rs treats remote VAULT_ADDR as bootstrap_completed=true
- Prevents botserver from trying to install/start local services
when all services are running in separate containers
- Remove all std::env::var calls except VAULT_* and PORT
- get_from_env returns hardcoded defaults only (no env var reading)
- Auth config, rate limits, email, analytics, calendar all use Vault
- WORK_PATH replaced with get_work_path() helper reading from Vault
- .env on production cleaned to only VAULT_ADDR, VAULT_TOKEN, VAULT_CACERT, PORT
- All service IPs/credentials stored in Vault secret/gbo/*
- Root cause: Valkey in prod runs without password but Vault stores one
- Previous code only tried password URL, got AUTH failed
- Fix: try no-password URL first, then password URL as fallback
- Also removed unused cache_url variable and cleaned up retry logic
- Root cause: Vault seeding writes to secret/gbo/cache but code reads gbo/system/cache
- kv2::read prepends secret/ so it looks for secret/gbo/system/cache (wrong)
- Fix: update SecretPaths to match seeding paths (gbo/cache, gbo/drive, etc.)
- Testing: compiles clean, paths now match vault kv list output
- Root cause: get_cache_config() uses runtime.block_on() which panics
when called from within an async runtime
- Fix: call SecretsManager::get_secret() directly with .await
- Testing: compiles clean, no runtime nesting issues
- Root cause: init_redis() used redis://localhost:6379 without password
- Valkey requires authentication, causing connection timeouts
- Fix: use get_cache_config() from SecretsManager to build URL with password
- Falls back to env vars (CACHE_URL/REDIS_URL/VALKEY_URL) if set
- Root cause: prod container lacks nc (netcat), causing fallback to valkey-cli ping
- valkey-cli ping hangs indefinitely when Valkey requires password auth
- Fix: use ss -tlnp as primary check (always available), nc as fallback
- Testing: verified ss is available in prod, nc is not
- Production has .env in botserver-stack/.env not ./.env
- Checks both locations to detect completed bootstrap
- Fixes E0716: use let bindings for Path borrows
- Production has .env in botserver-stack/.env not ./.env
- Checks both locations to detect completed bootstrap
- Prevents full re-bootstrap on restart in production
- nc -z checks port connectivity instantly (no auth needed)
- valkey-cli ping as fallback (hangs when password required)
- Fixes bootstrap hang on production where Valkey has Vault password
- Tables (PostgreSQL): pg_isready health check before start
- Drive (MinIO): /minio/health/live check before start
- ALM (Forgejo): HTTP health check before start
- ALM CI (Forgejo Runner): pgrep check before start
- Valkey: health check uses absolute path to valkey-cli
- Vault, Qdrant, Zitadel: already had health checks
- Result: no duplicate starts, no hangs on restart
- Check if packages already installed with dpkg before running apt-get
- Skip apt-get update/install if all dependencies present
- Reduces CI time from ~30s to instant on subsequent runs
- Use BOTSERVER_STACK_PATH/bin/cache/bin/valkey-cli instead of relying on PATH
- Remove bash /dev/tcp fallback (unreliable in restricted environments)
- Falls back to redis-cli and nc if valkey-cli unavailable