Commit graph

22 commits

Author SHA1 Message Date
552d58376f fix: Fix compilation errors from path refactoring
Some checks failed
BotServer CI/CD / build (push) Failing after 1m27s
- bootstrap_utils.rs: Change Vec<(&'static str,...)> to Vec<(String,...)> to avoid dangling references
- bootstrap_manager.rs: Use name.as_str() for safe_pkill
- setup.rs: Use PathBuf instead of Path::new with format!
- directory/bootstrap.rs: Use PathBuf for pat_dir
- main.rs: Use PathBuf for vault_init_path_early
2026-04-04 10:04:00 -03:00
7d8f141fc2 refactor: Replace all hardcoded ./botserver-stack paths with get_stack_path()/get_work_path()
Some checks failed
BotServer CI/CD / build (push) Failing after 1m28s
- Adds get_stack_path() helper: returns /opt/gbo in production (.env without botserver-stack), ./botserver-stack in dev
- Adds get_work_path() helper: returns /opt/gbo/work in production, ./botserver-stack/data/system/work in dev
- Updated 35+ files to use dynamic path resolution
- Production system container no longer needs botserver-stack directory
- Work files go to /opt/gbo/work instead of /opt/gbo/bin/botserver-stack
2026-04-04 09:24:44 -03:00
6f183c63d2 feat: dual-mode service configs - Vault first, fallback to DB/localhost
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
All services now try Vault first (remote/distributed mode), then fall back
to database config, then localhost defaults (local/dev mode).

Services fixed:
- Qdrant/VectorDB: kb_indexer.rs, kb_statistics.rs, bootstrap_utils.rs, kb_context.rs
- LLM/Embedding: email/vectordb.rs (was hardcoded localhost:8082)
- All services: security/integration.rs (postgres, cache, drive, directory, qdrant, llm)

Pattern: SecretsManager::get_X_config_sync() → DB config → localhost default
2026-04-03 15:01:37 -03:00
98b204b12e Fix health checks: replace nc with ss -tln for non-root environments
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
2026-04-02 18:15:07 -03:00
3c9e4ba6e7 fix: cache_health_check uses ss instead of nc (nc missing in prod container)
All checks were successful
BotServer CI/CD / build (push) Successful in 4m42s
- Root cause: prod container lacks nc (netcat), causing fallback to valkey-cli ping
- valkey-cli ping hangs indefinitely when Valkey requires password auth
- Fix: use ss -tlnp as primary check (always available), nc as fallback
- Testing: verified ss is available in prod, nc is not
2026-04-01 20:06:13 -03:00
318367d439 fix: Valkey health check uses nc first (avoids password hang)
All checks were successful
BotServer CI/CD / build (push) Successful in 3m58s
- nc -z checks port connectivity instantly (no auth needed)
- valkey-cli ping as fallback (hangs when password required)
- Fixes bootstrap hang on production where Valkey has Vault password
2026-04-01 18:52:04 -03:00
c26e483cc9 fix: All services check health before starting (idempotent bootstrap)
All checks were successful
BotServer CI/CD / build (push) Successful in 4m9s
- Tables (PostgreSQL): pg_isready health check before start
- Drive (MinIO): /minio/health/live check before start
- ALM (Forgejo): HTTP health check before start
- ALM CI (Forgejo Runner): pgrep check before start
- Valkey: health check uses absolute path to valkey-cli
- Vault, Qdrant, Zitadel: already had health checks
- Result: no duplicate starts, no hangs on restart
2026-04-01 18:28:54 -03:00
ba7f1ba5eb fix: Valkey health check uses absolute path to valkey-cli
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Use BOTSERVER_STACK_PATH/bin/cache/bin/valkey-cli instead of relying on PATH
- Remove bash /dev/tcp fallback (unreliable in restricted environments)
- Falls back to redis-cli and nc if valkey-cli unavailable
2026-04-01 18:11:26 -03:00
68ef554132 fix: Vault as single source of truth - credentials + location for all services
All checks were successful
BotServer CI/CD / build (push) Successful in 4m53s
- Qdrant health check: recognize 'healthz check passed' response (fixes 45s timeout)
- seed_vault_defaults: add host/port/url/grpc_port for ALL 10 services
- fetch_vault_credentials: fetch ALL services via generic loop (drive, cache, tables, vectordb, directory, llm, meet, alm, encryption)
- vectordb URL: fix https://localhost:6334 -> http://localhost:6333 in all config getters
- get_from_env: add host/port/grpc_port for vectordb fallback
- Tested: .reset (fresh install) + .restart (idempotent) - zero errors
2026-04-01 16:46:16 -03:00
9919a8321c fix: Use SafeCommand directly for vault health check to avoid shell injection false positive
All checks were successful
BotServer CI/CD / build (push) Successful in 6m46s
- Replace safe_sh_command with SafeCommand::new("curl").args() in vault_health_check()
- The URL contains https:// which triggered '//' pattern detection in shell command
- Direct SafeCommand bypasses shell parsing, URL passed as single argument
- Add vault data directory existence check before recovery attempt
- Prevents 'Dangerous pattern // detected' errors during bootstrap
2026-03-31 21:34:04 -03:00
582ea634e7 fix: Vault bootstrap recovery for sealed but initialized instances
- Fix vault_health_check() stub that always returned false
- Add recover_existing_vault() to handle Vault with existing data but no init.json
- Add unseal_vault() helper to unseal with existing vault-unseal-keys
- Detect initialized Vault via health endpoint or data directory presence
- Prevents bootstrap failure when reset.sh deletes init.json but Vault data persists

Root cause: vault_health_check() was a stub returning false, causing bootstrap
to always try vault operator init on already-initialized (but sealed) Vault,
which failed with connection refused. This cascaded to all services failing
to fetch credentials from Vault.
2026-03-31 20:49:29 -03:00
ef426b7a50 LXD proxy and container improvements
Some checks failed
BotServer CI / build (push) Failing after 7m5s
2026-03-15 15:50:02 -03:00
82bfd0a443 Fix Bedrock config for OpenAI GPT-OSS models
All checks were successful
BotServer CI / build (push) Successful in 12m35s
2026-03-10 12:36:24 -03:00
c326581a9e fix(zitadel): resolve OAuth client initialization timing issue
- Fix PAT extraction timing with retry loop (waits up to 60s for PAT in logs)
- Add sync command to flush filesystem buffers before extraction
- Improve logging with progress messages and PAT verification
- Refactor setup code into consolidated setup.rs module
- Fix YAML indentation for PatPath and MachineKeyPath
- Change Zitadel init parameter from --config to --steps

The timing issue occurred because:
1. Zitadel writes PAT to logs at startup (~18:08:59)
2. Post-install extraction ran too early (~18:09:35)
3. PAT file wasn't created until ~18:10:38 (63s after installation)
4. OAuth client creation failed because PAT file didn't exist yet

With the retry loop:
- Waits for PAT to appear in logs with sync+grep check
- Extracts PAT immediately when found
- OAuth client creation succeeds
- directory_config.json saved with valid credentials
- Login flow works end-to-end

Tested: Full reset.sh and login verification successful
2026-03-01 19:06:09 -03:00
eb5c12c466 fix(directory): add .await to ensure_admin_token() calls
Some checks failed
BotServer CI / build (push) Failing after 5m28s
Fixed compilation errors by adding .await to all ensure_admin_token() calls:
- create_organization()
- create_user()
- save_config()

The method was made async but the calls weren't updated.
2026-03-01 09:52:31 -03:00
8f495c75ec WIP: Local changes before merging master into main 2026-03-01 07:40:11 -03:00
0c9665dd8b fix: Enable vector_db by default with health check and fallback to local LLM
- Add vector_db_health_check() function to verify Qdrant availability
- Add wait loop for vector_db startup in bootstrap (15 seconds)
- Fallback to local LLM when external URL configured but no API key provided
- Prevent external LLM (api.z.ai) usage without authentication key

This fixes the production issues:
- Qdrant vector database not available at https://localhost:6333
- External LLM being used instead of local when no key is configured
- Ensures vector_db is properly started and ready before use

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 14:54:17 +00:00
fc34461b2f fix: Improve cache_health_check TCP test to eliminate false positives
All checks were successful
BotServer CI / build (push) Successful in 8m28s
The previous /dev/tcp test was giving false positives, reporting that
Valkey was running when it was actually down. This caused bootstrap to
skip starting Valkey, leading to botserver hanging on cache connection.

Changes:
- Use nc (netcat) with -z flag for reliable port checking
- Final fallback: /dev/tcp with actual PING/PONG verification
- Only returns true if port is open AND responds correctly

This ensures cache_health_check() accurately reports Valkey status.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 12:02:23 +00:00
d243666cc2 fix: Support valkey-cli in cache_health_check for Valkey-only setups
All checks were successful
BotServer CI / build (push) Successful in 8m19s
- Try valkey-cli first (preferred for Valkey installations)
- Fall back to redis-cli (for Redis installations)
- Fall back to TCP connection test (works for both)

This fixes environments that only have Valkey installed without
Redis symlinks or redis-cli.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 11:48:02 +00:00
ba70bd97bc fix: Bootstrap should check Valkey is running and use correct component name
All checks were successful
BotServer CI / build (push) Successful in 8m56s
- Fix component name mismatch: "redis" -> "cache" in bootstrap_manager
- Add cache_health_check() function to verify Valkey is responding
- Add health check loop after starting cache (12s wait with PING test)
- Ensures cache is ready before proceeding with bootstrap

This fixes the issue where botserver would hang waiting for cache
connection because the cache component was never started.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 11:26:27 +00:00
17cb4ef147 Fix: Update source files and fix notify dependency reference
All checks were successful
BotServer CI / build (push) Successful in 11m35s
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-13 22:30:57 +00:00
5ea171d126 Refactor: Split large files into modular subdirectories
Some checks failed
BotServer CI / build (push) Failing after 1m34s
Split 20+ files over 1000 lines into focused subdirectories for better
maintainability and code organization. All changes maintain backward
compatibility through re-export wrappers.

Major splits:
- attendance/llm_assist.rs (2074→7 modules)
- basic/keywords/face_api.rs → face_api/ (7 modules)
- basic/keywords/file_operations.rs → file_ops/ (8 modules)
- basic/keywords/hear_talk.rs → hearing/ (6 modules)
- channels/wechat.rs → wechat/ (10 modules)
- channels/youtube.rs → youtube/ (5 modules)
- contacts/mod.rs → contacts_api/ (6 modules)
- core/bootstrap/mod.rs → bootstrap/ (5 modules)
- core/shared/admin.rs → admin_*.rs (5 modules)
- designer/canvas.rs → canvas_api/ (6 modules)
- designer/mod.rs → designer_api/ (6 modules)
- docs/handlers.rs → handlers_api/ (11 modules)
- drive/mod.rs → drive_handlers.rs, drive_types.rs
- learn/mod.rs → types.rs
- main.rs → main_module/ (7 modules)
- meet/webinar.rs → webinar_api/ (8 modules)
- paper/mod.rs → (10 modules)
- security/auth.rs → auth_api/ (7 modules)
- security/passkey.rs → (4 modules)
- sources/mod.rs → sources_api/ (5 modules)
- tasks/mod.rs → task_api/ (5 modules)

Stats: 38,040 deletions, 1,315 additions across 318 files

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-12 21:09:30 +00:00