Commit graph

111 commits

Author SHA1 Message Date
dd4c780c4d Fix Zitadel health check and add ss command to allowed commands
All checks were successful
BotServer CI/CD / build (push) Successful in 3m20s
- Add 'ss' to ALLOWED_COMMANDS for port checking
- Fix Zitadel health check URL to include full address
2026-04-11 07:49:11 -03:00
5576378b3f Update botserver: Multiple improvements across core modules
All checks were successful
BotServer CI/CD / build (push) Successful in 10m41s
2026-04-11 07:33:32 -03:00
90c14bcd09 Fix DETECT: use bot-specific DB pool, add anonymous auth when directory disabled
All checks were successful
BotServer CI/CD / build (push) Successful in 12m42s
2026-04-06 13:37:23 -03:00
552d58376f fix: Fix compilation errors from path refactoring
Some checks failed
BotServer CI/CD / build (push) Failing after 1m27s
- bootstrap_utils.rs: Change Vec<(&'static str,...)> to Vec<(String,...)> to avoid dangling references
- bootstrap_manager.rs: Use name.as_str() for safe_pkill
- setup.rs: Use PathBuf instead of Path::new with format!
- directory/bootstrap.rs: Use PathBuf for pat_dir
- main.rs: Use PathBuf for vault_init_path_early
2026-04-04 10:04:00 -03:00
7d8f141fc2 refactor: Replace all hardcoded ./botserver-stack paths with get_stack_path()/get_work_path()
Some checks failed
BotServer CI/CD / build (push) Failing after 1m28s
- Adds get_stack_path() helper: returns /opt/gbo in production (.env without botserver-stack), ./botserver-stack in dev
- Adds get_work_path() helper: returns /opt/gbo/work in production, ./botserver-stack/data/system/work in dev
- Updated 35+ files to use dynamic path resolution
- Production system container no longer needs botserver-stack directory
- Work files go to /opt/gbo/work instead of /opt/gbo/bin/botserver-stack
2026-04-04 09:24:44 -03:00
6f183c63d2 feat: dual-mode service configs - Vault first, fallback to DB/localhost
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
All services now try Vault first (remote/distributed mode), then fall back
to database config, then localhost defaults (local/dev mode).

Services fixed:
- Qdrant/VectorDB: kb_indexer.rs, kb_statistics.rs, bootstrap_utils.rs, kb_context.rs
- LLM/Embedding: email/vectordb.rs (was hardcoded localhost:8082)
- All services: security/integration.rs (postgres, cache, drive, directory, qdrant, llm)

Pattern: SecretsManager::get_X_config_sync() → DB config → localhost default
2026-04-03 15:01:37 -03:00
65e7db5acd Skip local service install/start when remote Vault detected
All checks were successful
BotServer CI/CD / build (push) Successful in 5m48s
- install_all() returns early if VAULT_ADDR is remote
- start_all() returns early if VAULT_ADDR is remote
- bootstrap.rs treats remote VAULT_ADDR as bootstrap_completed=true
- Prevents botserver from trying to install/start local services
  when all services are running in separate containers
2026-04-03 07:36:15 -03:00
98b204b12e Fix health checks: replace nc with ss -tln for non-root environments
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
2026-04-02 18:15:07 -03:00
3c9e4ba6e7 fix: cache_health_check uses ss instead of nc (nc missing in prod container)
All checks were successful
BotServer CI/CD / build (push) Successful in 4m42s
- Root cause: prod container lacks nc (netcat), causing fallback to valkey-cli ping
- valkey-cli ping hangs indefinitely when Valkey requires password auth
- Fix: use ss -tlnp as primary check (always available), nc as fallback
- Testing: verified ss is available in prod, nc is not
2026-04-01 20:06:13 -03:00
318367d439 fix: Valkey health check uses nc first (avoids password hang)
All checks were successful
BotServer CI/CD / build (push) Successful in 3m58s
- nc -z checks port connectivity instantly (no auth needed)
- valkey-cli ping as fallback (hangs when password required)
- Fixes bootstrap hang on production where Valkey has Vault password
2026-04-01 18:52:04 -03:00
c26e483cc9 fix: All services check health before starting (idempotent bootstrap)
All checks were successful
BotServer CI/CD / build (push) Successful in 4m9s
- Tables (PostgreSQL): pg_isready health check before start
- Drive (MinIO): /minio/health/live check before start
- ALM (Forgejo): HTTP health check before start
- ALM CI (Forgejo Runner): pgrep check before start
- Valkey: health check uses absolute path to valkey-cli
- Vault, Qdrant, Zitadel: already had health checks
- Result: no duplicate starts, no hangs on restart
2026-04-01 18:28:54 -03:00
ba7f1ba5eb fix: Valkey health check uses absolute path to valkey-cli
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Use BOTSERVER_STACK_PATH/bin/cache/bin/valkey-cli instead of relying on PATH
- Remove bash /dev/tcp fallback (unreliable in restricted environments)
- Falls back to redis-cli and nc if valkey-cli unavailable
2026-04-01 18:11:26 -03:00
68ef554132 fix: Vault as single source of truth - credentials + location for all services
All checks were successful
BotServer CI/CD / build (push) Successful in 4m53s
- Qdrant health check: recognize 'healthz check passed' response (fixes 45s timeout)
- seed_vault_defaults: add host/port/url/grpc_port for ALL 10 services
- fetch_vault_credentials: fetch ALL services via generic loop (drive, cache, tables, vectordb, directory, llm, meet, alm, encryption)
- vectordb URL: fix https://localhost:6334 -> http://localhost:6333 in all config getters
- get_from_env: add host/port/grpc_port for vectordb fallback
- Tested: .reset (fresh install) + .restart (idempotent) - zero errors
2026-04-01 16:46:16 -03:00
fb2e5242da fix: Vault seeding, service health checks, and restart idempotency
All checks were successful
BotServer CI/CD / build (push) Successful in 55m52s
- Replace hardcoded passwords with generate_random_string() for all Vault-seeded services
- Add valkey-cli, nc to SafeCommand allowlist; fix PATH in all 4 execution methods
- Fix empty Vault KV values ('none' placeholder) preventing 'Failed to parse K=V' errors
- Fix special chars in generated passwords triggering shell injection false positives
- Add ALM app.ini creation with absolute paths for Forgejo CLI
- Increase Qdrant timeout 15s→45s, ALM wait 5s→20s
- Persist file_states and kb_states to disk for .bas/KB idempotency across restarts
- Add duplicate check to use_website registration (debug log for existing)
- Remove dead code (SERVER_START_EPOCH, server_epoch)
- Add generate_random_string() to shared mod.rs, remove duplicates
2026-04-01 12:22:57 -03:00
9919a8321c fix: Use SafeCommand directly for vault health check to avoid shell injection false positive
All checks were successful
BotServer CI/CD / build (push) Successful in 6m46s
- Replace safe_sh_command with SafeCommand::new("curl").args() in vault_health_check()
- The URL contains https:// which triggered '//' pattern detection in shell command
- Direct SafeCommand bypasses shell parsing, URL passed as single argument
- Add vault data directory existence check before recovery attempt
- Prevents 'Dangerous pattern // detected' errors during bootstrap
2026-03-31 21:34:04 -03:00
07a6c1edb3 Merge commit '582ea634'
All checks were successful
BotServer CI/CD / build (push) Successful in 7m38s
2026-03-31 21:10:25 -03:00
582ea634e7 fix: Vault bootstrap recovery for sealed but initialized instances
- Fix vault_health_check() stub that always returned false
- Add recover_existing_vault() to handle Vault with existing data but no init.json
- Add unseal_vault() helper to unseal with existing vault-unseal-keys
- Detect initialized Vault via health endpoint or data directory presence
- Prevents bootstrap failure when reset.sh deletes init.json but Vault data persists

Root cause: vault_health_check() was a stub returning false, causing bootstrap
to always try vault operator init on already-initialized (but sealed) Vault,
which failed with connection refused. This cascaded to all services failing
to fetch credentials from Vault.
2026-03-31 20:49:29 -03:00
2fa59057fa fix: Resolve migration error, Vault 403, cache timeout, and shell injection false positives
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Fix migration 6.2.5: Create lost_reason column before VIEW that references it
- Fix Vault 403: Enable KV2 secrets engine after initialization
- Fix cache timeout: Increase Valkey readiness wait from 12s to 30s
- Fix command_guard: Remove () from forbidden chars (safe in std::process::Command)
2026-03-31 19:55:16 -03:00
ef426b7a50 LXD proxy and container improvements
Some checks failed
BotServer CI / build (push) Failing after 7m5s
2026-03-15 15:50:02 -03:00
13892b3157 Fix tenant-org-bot relationship and CRM lead form 2026-03-12 18:19:18 -03:00
260a13e77d refactor: apply various fixes across botserver
Some checks failed
BotServer CI / build (push) Has been cancelled
2026-03-10 15:15:21 -03:00
82bfd0a443 Fix Bedrock config for OpenAI GPT-OSS models
All checks were successful
BotServer CI / build (push) Successful in 12m35s
2026-03-10 12:36:24 -03:00
c072fb936e fix(llm): load system-prompt from config.csv correctly
All checks were successful
BotServer CI / build (push) Successful in 17m27s
- Move system_prompt retrieval inside spawn_blocking closure
- Include system_prompt in the return tuple to fix scope issue
- Add trace logging for debugging system-prompt loading
- GLM-5 and other LLM providers now correctly receive custom system prompts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-09 11:55:05 -03:00
5404e3e7ba feat: Enhance KB context, embedding generator, and website crawler
- Improved kb_context with better context management
- Enhanced embedding_generator with extended functionality (+231 lines)
- Updated kb_indexer with improved indexing logic
- Expanded website_crawler_service capabilities (+230 lines)
- Updated use_website keyword implementation
- Refined bootstrap_manager and utils
- Improved drive monitoring and local file monitor
- Added server enhancements
2026-03-04 15:43:37 -03:00
1e71c9be09 feat: Add comprehensive stage progress logging
Some checks failed
BotServer CI / build (push) Failing after 2m39s
- Add detailed logging for all 5 pipeline stages (PLAN, BUILD, REVIEW, DEPLOY, MONITOR)
- Log stage start/complete events with agent IDs and progress details
- Add resource creation/deletion logging in drive_handlers
- Improve pipeline summary logging with task ID, nodes, resources, and URL

This addresses the requirement for textual progress in console logs.
2026-03-01 22:36:14 -03:00
c326581a9e fix(zitadel): resolve OAuth client initialization timing issue
- Fix PAT extraction timing with retry loop (waits up to 60s for PAT in logs)
- Add sync command to flush filesystem buffers before extraction
- Improve logging with progress messages and PAT verification
- Refactor setup code into consolidated setup.rs module
- Fix YAML indentation for PatPath and MachineKeyPath
- Change Zitadel init parameter from --config to --steps

The timing issue occurred because:
1. Zitadel writes PAT to logs at startup (~18:08:59)
2. Post-install extraction ran too early (~18:09:35)
3. PAT file wasn't created until ~18:10:38 (63s after installation)
4. OAuth client creation failed because PAT file didn't exist yet

With the retry loop:
- Waits for PAT to appear in logs with sync+grep check
- Extracts PAT immediately when found
- OAuth client creation succeeds
- directory_config.json saved with valid credentials
- Login flow works end-to-end

Tested: Full reset.sh and login verification successful
2026-03-01 19:06:09 -03:00
8016caa790 fix(bootstrap): create OAuth client when Zitadel is already running
Some checks failed
BotServer CI / build (push) Failing after 3m20s
- Added OAuth client creation to the 'already running' branch
- Previously, OAuth client was only created when Zitadel was started
- This fixes the issue where OAuth client wasn't created on subsequent runs

Fixes the OAuth client creation issue discovered during testing.
2026-03-01 10:11:42 -03:00
eb5c12c466 fix(directory): add .await to ensure_admin_token() calls
Some checks failed
BotServer CI / build (push) Failing after 5m28s
Fixed compilation errors by adding .await to all ensure_admin_token() calls:
- create_organization()
- create_user()
- save_config()

The method was made async but the calls weren't updated.
2026-03-01 09:52:31 -03:00
8f495c75ec WIP: Local changes before merging master into main 2026-03-01 07:40:11 -03:00
bbdf243c86 fix: OAuth client creation during bootstrap
Some checks failed
BotServer CI / build (push) Failing after 6m2s
- Add password grant authentication support in DirectorySetup
- Extract initial admin credentials from Zitadel log file
- Fix race condition in Zitadel startup (wait for health check before starting)
- Create parent directories before saving config
- Add retry logic for OAuth client creation
- Improve error handling with detailed messages

Fixes authentication service not configured error after bootstrap.
2026-03-01 07:31:17 -03:00
b1118f977d fix: Correct parameter names in tool .bas files to match database schema
- Tool 06: Change tipoExibicao to tipoDescricao (matches pedidos_uso_imagem table)
- Tool 07: Change tipoExibicao to categoriaDescricao (matches licenciamentos table)
- Both tools now compile and execute successfully with database inserts

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-18 17:51:47 +00:00
0c9665dd8b fix: Enable vector_db by default with health check and fallback to local LLM
- Add vector_db_health_check() function to verify Qdrant availability
- Add wait loop for vector_db startup in bootstrap (15 seconds)
- Fallback to local LLM when external URL configured but no API key provided
- Prevent external LLM (api.z.ai) usage without authentication key

This fixes the production issues:
- Qdrant vector database not available at https://localhost:6333
- External LLM being used instead of local when no key is configured
- Ensures vector_db is properly started and ready before use

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 14:54:17 +00:00
fc34461b2f fix: Improve cache_health_check TCP test to eliminate false positives
All checks were successful
BotServer CI / build (push) Successful in 8m28s
The previous /dev/tcp test was giving false positives, reporting that
Valkey was running when it was actually down. This caused bootstrap to
skip starting Valkey, leading to botserver hanging on cache connection.

Changes:
- Use nc (netcat) with -z flag for reliable port checking
- Final fallback: /dev/tcp with actual PING/PONG verification
- Only returns true if port is open AND responds correctly

This ensures cache_health_check() accurately reports Valkey status.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 12:02:23 +00:00
d243666cc2 fix: Support valkey-cli in cache_health_check for Valkey-only setups
All checks were successful
BotServer CI / build (push) Successful in 8m19s
- Try valkey-cli first (preferred for Valkey installations)
- Fall back to redis-cli (for Redis installations)
- Fall back to TCP connection test (works for both)

This fixes environments that only have Valkey installed without
Redis symlinks or redis-cli.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 11:48:02 +00:00
ba70bd97bc fix: Bootstrap should check Valkey is running and use correct component name
All checks were successful
BotServer CI / build (push) Successful in 8m56s
- Fix component name mismatch: "redis" -> "cache" in bootstrap_manager
- Add cache_health_check() function to verify Valkey is responding
- Add health check loop after starting cache (12s wait with PING test)
- Ensures cache is ready before proceeding with bootstrap

This fixes the issue where botserver would hang waiting for cache
connection because the cache component was never started.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 11:26:27 +00:00
4b44602d39 Update bootstrap and core modules
All checks were successful
BotServer CI / build (push) Successful in 10m27s
2026-02-14 09:54:14 +00:00
17cb4ef147 Fix: Update source files and fix notify dependency reference
All checks were successful
BotServer CI / build (push) Successful in 11m35s
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-13 22:30:57 +00:00
5ea171d126 Refactor: Split large files into modular subdirectories
Some checks failed
BotServer CI / build (push) Failing after 1m34s
Split 20+ files over 1000 lines into focused subdirectories for better
maintainability and code organization. All changes maintain backward
compatibility through re-export wrappers.

Major splits:
- attendance/llm_assist.rs (2074→7 modules)
- basic/keywords/face_api.rs → face_api/ (7 modules)
- basic/keywords/file_operations.rs → file_ops/ (8 modules)
- basic/keywords/hear_talk.rs → hearing/ (6 modules)
- channels/wechat.rs → wechat/ (10 modules)
- channels/youtube.rs → youtube/ (5 modules)
- contacts/mod.rs → contacts_api/ (6 modules)
- core/bootstrap/mod.rs → bootstrap/ (5 modules)
- core/shared/admin.rs → admin_*.rs (5 modules)
- designer/canvas.rs → canvas_api/ (6 modules)
- designer/mod.rs → designer_api/ (6 modules)
- docs/handlers.rs → handlers_api/ (11 modules)
- drive/mod.rs → drive_handlers.rs, drive_types.rs
- learn/mod.rs → types.rs
- main.rs → main_module/ (7 modules)
- meet/webinar.rs → webinar_api/ (8 modules)
- paper/mod.rs → (10 modules)
- security/auth.rs → auth_api/ (7 modules)
- security/passkey.rs → (4 modules)
- sources/mod.rs → sources_api/ (5 modules)
- tasks/mod.rs → task_api/ (5 modules)

Stats: 38,040 deletions, 1,315 additions across 318 files

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-12 21:09:30 +00:00
7da1442c91 Fix PostgreSQL readiness check timeout in bootstrap
All checks were successful
BotServer CI / build (push) Successful in 13m13s
- Use local pg_isready path when available (./botserver-stack/bin/tables/bin/pg_isready)
- Fall back to system pg_isready if local binary not found
- Prevents 30-second timeout during bootstrap when PostgreSQL is actually running
- Applied to both readiness checks in start_all() method
2026-02-05 22:23:04 -03:00
07ff7a6297 Fix PostgreSQL health check - remove database-specific check
All checks were successful
BotServer CI / build (push) Successful in 12m18s
- Removed -d 'postgres' parameter from pg_isready health checks
- Health check now only verifies server connection on port 5432
- Fixes false positive failures when PostgreSQL is running but specific database has issues
- PostgreSQL logs showed 'database system is ready' but health check was failing
2026-02-05 21:55:47 -03:00
de3e5c4fec Fix PostgreSQL ready check for already-running instances
All checks were successful
BotServer CI / build (push) Successful in 14m23s
Add pg_isready health check to the 'already running' branch to ensure
PostgreSQL is properly detected as ready, even when running as a
non-interactive user (sudo -u gbuser).

This complements the previous fix for fresh PostgreSQL starts.
2026-02-05 21:09:23 -03:00
3258f3ef85 Fix PostgreSQL startup failure for non-interactive users
All checks were successful
BotServer CI / build (push) Successful in 11m58s
Changed pg_isready checks from '-U gbuser' to '-d postgres' to properly
detect PostgreSQL readiness during bootstrap. The gbuser database doesn't
exist yet during startup, causing pg_isready to fail and bootstrap to timeout.

This fixes the issue when running botserver as a non-interactive user
(e.g., sudo -u gbuser).
2026-02-05 20:40:52 -03:00
717ca8c35b Fix: Remove undefined db_password variable in database creation
Some checks failed
BotServer CI / build (push) Failing after 11m39s
2026-02-05 19:06:59 -03:00
309c5bc55a Fix: Ensure botserver database exists before running migrations
Some checks are pending
BotServer CI / build (push) Waiting to run
2026-02-05 19:02:01 -03:00
35af28a041 Fix compilation errors: duplicate imports, feature-gated functions, type mismatch
Some checks failed
BotServer CI / build (push) Failing after 7m28s
2026-02-05 18:31:14 -03:00
24b516406a Add automatic log dump when component fails to start 2026-02-05 14:53:37 -03:00
dfb18f75e6 Fix bootstrap database creation - use BOOTSTRAP_DB_PASSWORD env var 2026-02-05 14:42:13 -03:00
025e6fe823 Improve error messages: show PostgreSQL log contents on startup failure 2026-02-05 14:13:31 -03:00
355215c2a2 Update: refactor migrations, update source files, and add new features 2026-02-04 13:29:29 -03:00
39c4dba838 feat: Add template validation system with .valid file
- Modify bootstrap to read .valid file and validate templates before loading
- Templates not in .valid file are skipped during bootstrap
- Backward compatible: if .valid file missing, all templates are loaded
- Enables controlled template loading during bootstrap
2026-02-01 14:20:35 -03:00