- Show thinking indicator while LLM is in reasoning mode
- Exclude reasoning content (thinking text) from the user-facing response
- Only show actual HTML content after thinking ends
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- GLM4.7 and Kimi K2.5 send the response in the 'reasoning_content' field, leaving 'content' null
- Prefer 'content' for normal models; fall back to 'reasoning_content' for reasoning models (sketched below)
- Fixes blank white screen when using z-ai/glm4.7 model
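A minimal sketch of the field-preference logic, assuming an OpenAI-compatible delta/message object parsed as serde_json::Value (the helper name is illustrative):

```rust
use serde_json::Value;

/// Prefer `content`; fall back to `reasoning_content` only when `content`
/// is null or absent, so reasoning-style models still produce visible output.
fn extract_text(delta: &Value) -> Option<String> {
    delta
        .get("content")
        .and_then(Value::as_str)
        .or_else(|| delta.get("reasoning_content").and_then(Value::as_str))
        .map(str::to_owned)
}
```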
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Move all preprocessing transforms (convert_multiword_keywords, preprocess_llm_keyword,
convert_while_wend_syntax, predeclare_variables) into BasicCompiler::preprocess_basic
so .ast files are fully preprocessed by Drive Monitor
- Replace ScriptService compile/compile_preprocessed/compile_tool_script with a
single run(ast_content) that does engine.compile() + eval_ast_with_scope() (see the sketch below)
- Remove .bas fallback in tool_executor and start.bas paths - .ast only
- Remove dead code: preprocess_basic_script, normalize_variables_to_lowercase,
convert_save_for_tools, parse_save_parts, normalize_word
- Fix: USE KB 'cartas' in tool .ast now correctly converted to USE_KB('cartas')
during compilation, ensuring KB context injection works after tool execution
- Fix: add trace import in llm/mod.rs
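A plausible shape of the consolidated run() entry point, assuming the engine is rhai (whose API matches the engine.compile() / eval_ast_with_scope() names above); the scope and error handling are illustrative:

```rust
use rhai::{Dynamic, Engine, Scope};

/// Single entry point replacing compile/compile_preprocessed/compile_tool_script.
/// The .ast content is assumed to be fully preprocessed by Drive Monitor.
fn run(engine: &Engine, ast_content: &str) -> anyhow::Result<Dynamic> {
    let ast = engine.compile(ast_content)?;
    let mut scope = Scope::new();
    engine
        .eval_ast_with_scope::<Dynamic>(&mut scope, &ast)
        .map_err(|e| anyhow::anyhow!(e.to_string()))
}
```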
- Add tokio timeout to SSE stream reads in OpenAI client (60s)
- Prevents indefinite hang when Kimi/Nvidia stops responding
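A minimal sketch of the timeout wrapper, assuming the response body is consumed as a futures Stream (names are illustrative, not the actual client code):

```rust
use std::time::Duration;
use futures_util::{Stream, StreamExt};
use tokio::time::timeout;

const SSE_READ_TIMEOUT: Duration = Duration::from_secs(60);

/// Read the next SSE chunk, erroring out if the provider goes silent
/// for 60s instead of hanging forever.
async fn next_chunk<S>(stream: &mut S) -> anyhow::Result<Option<S::Item>>
where
    S: Stream + Unpin,
{
    timeout(SSE_READ_TIMEOUT, stream.next())
        .await
        .map_err(|_| anyhow::anyhow!("SSE read timed out after 60s"))
}
```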
- Add scanning AtomicBool to prevent concurrent check_gbkb_changes calls
- Skip GBKB scan entirely when all KBs already indexed in Qdrant
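A sketch of the guard, with the flag in a static for brevity (in the real code it would live on the monitor struct):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static SCANNING: AtomicBool = AtomicBool::new(false);

async fn check_gbkb_changes() {
    // compare_exchange flips false -> true atomically; a second caller
    // arriving mid-scan sees Err and returns immediately.
    if SCANNING
        .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
        .is_err()
    {
        return;
    }
    // ... scan GBKB for changes ...
    SCANNING.store(false, Ordering::SeqCst);
}
```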
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add BOTSERVER_BUILD_DATE env var to /api/health response
- Set build date during CI compilation via environment variable
- Enables checking deployed binary age without SSH access
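A sketch of the mechanism, assuming the date is baked in at compile time with option_env! (the CI export line is illustrative):

```rust
// CI: export BOTSERVER_BUILD_DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ) before cargo build
fn build_date() -> &'static str {
    // Compile-time capture: reflects when the binary was built, not when it runs.
    option_env!("BOTSERVER_BUILD_DATE").unwrap_or("unknown")
}
// /api/health can then include {"build_date": build_date(), ...}
```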
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- drive_monitor: replace hardcoded salesianos.gbot with dynamic bot_name
- llm/mod.rs: stop falling back to reasoning_content as content
- llm/claude.rs: same fix for Claude handler
- deepseek_r3: export strip_think_tags for reuse
- gpt_oss_20b: use strip_think_tags so all models strip tags
- gpt_oss_120b: use strip_think_tags so all models strip tags
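A plausible sketch of strip_think_tags, assuming <think>...</think> blocks like those emitted by DeepSeek-R1-style models (the actual implementation in deepseek_r3 may differ):

```rust
/// Remove every <think>...</think> block so only the final answer remains.
pub fn strip_think_tags(text: &str) -> String {
    let mut out = text.to_string();
    while let (Some(start), Some(end)) = (out.find("<think>"), out.find("</think>")) {
        if end < start {
            break; // malformed: closing tag before opening tag
        }
        out.replace_range(start..end + "</think>".len(), "");
    }
    out.trim().to_string()
}
```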
Previously, ensure_llama_servers_running() would return early
when both LLM and embedding servers were already running, without
calling set_embedding_server_ready(true). This caused DriveMonitor
to skip KB indexing with 'Embedding server not yet marked ready'.
Fix: call set_embedding_server_ready(true) before returning early
when servers are already running.
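A sketch of the fix, with the server-status helpers as hypothetical stand-ins for the real checks:

```rust
async fn ensure_llama_servers_running() -> anyhow::Result<()> {
    // llm_server_running / embedding_server_running are hypothetical helpers.
    if llm_server_running().await && embedding_server_running().await {
        // The previously missing call: DriveMonitor gates KB indexing on this flag.
        set_embedding_server_ready(true);
        return Ok(());
    }
    // ... start the servers as before ...
    Ok(())
}
```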
1. Fix model.starts_with("") always evaluating to true, which limited ALL models
to 768 tokens (the local llama limit) and truncated system prompts and KB
context. The cap now applies only when model == "local" or is empty; the
default is 32k tokens (see the sketch after this list).
2. Fix create_bot_from_drive missing NOT NULL columns (llm_provider,
context_provider) - bots auto-created from S3 buckets failed to persist.
3. Fix S3 endpoint URL construction missing port 9100.
4. Fix Vault seed: vectordb.url was empty string, now defaults to
http://localhost:6333.
5. Fix Vault credential regeneration on recovery - added vault_seeds_exist().
6. Fix CA cert path for Vault TLS (botserver-stack vs botserver-stack).
7. Add bot verification after insert in create_bot_from_drive.
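A sketch of item 1: every string starts with the empty string, so the old check matched all models; the rewritten guard and the 768/32k values follow the description above:

```rust
fn max_tokens(model: &str) -> usize {
    debug_assert!("any-model".starts_with("")); // the original bug: always true
    if model == "local" || model.is_empty() {
        768 // local llama limit
    } else {
        32_000 // default context for remote models
    }
}
```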
- Adds get_stack_path() helper: returns /opt/gbo in production (.env without botserver-stack), ./botserver-stack in dev
- Adds get_work_path() helper: returns /opt/gbo/work in production, ./botserver-stack/data/system/work in dev
- Updated 35+ files to use dynamic path resolution
- Production system container no longer needs botserver-stack directory
- Work files go to /opt/gbo/work instead of /opt/gbo/bin/botserver-stack
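A sketch of the two helpers, with the production check reading .env as described above (the detection details are illustrative):

```rust
use std::path::PathBuf;

fn is_production() -> bool {
    // Per the commit: production is a .env without a botserver-stack entry.
    std::fs::read_to_string(".env")
        .map(|env| !env.contains("botserver-stack"))
        .unwrap_or(true)
}

fn get_stack_path() -> PathBuf {
    if is_production() { "/opt/gbo".into() } else { "./botserver-stack".into() }
}

fn get_work_path() -> PathBuf {
    if is_production() {
        "/opt/gbo/work".into()
    } else {
        "./botserver-stack/data/system/work".into()
    }
}
```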
- Store last_sent timestamp in Redis (whatsapp:last_sent:<phone>)
- Always wait 6 seconds between messages to the same recipient
- Persists across restarts
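A sketch of the pacing logic using the redis crate's async API (connection handling and error mapping are illustrative):

```rust
use redis::AsyncCommands;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

const MIN_GAP_SECS: u64 = 6;

async fn pace_whatsapp_send(
    conn: &mut redis::aio::MultiplexedConnection,
    phone: &str,
) -> redis::RedisResult<()> {
    let key = format!("whatsapp:last_sent:{phone}");
    let now = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
    // Storing the timestamp in Redis makes the pacing survive restarts.
    let last: Option<u64> = conn.get(&key).await?;
    if let Some(last) = last {
        let gap = now.saturating_sub(last);
        if gap < MIN_GAP_SECS {
            tokio::time::sleep(Duration::from_secs(MIN_GAP_SECS - gap)).await;
        }
    }
    let _: () = conn.set(&key, now).await?;
    Ok(())
}
```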
- Move system_prompt retrieval inside spawn_blocking closure
- Include system_prompt in the return tuple to fix the scope issue
- Add trace logging for debugging system-prompt loading
- GLM-5 and other LLM providers now correctly receive custom system prompts
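A sketch of the closure change; load_system_prompt and load_history are hypothetical stand-ins for the blocking DB lookups:

```rust
async fn load_prompt_and_history(
    bot_id: String,
    session_id: String,
) -> anyhow::Result<(String, Vec<String>)> {
    // Both lookups run inside one blocking closure, and system_prompt is
    // returned in the tuple so it stays in scope for the LLM request.
    let (system_prompt, history) = tokio::task::spawn_blocking(move || {
        let system_prompt = load_system_prompt(&bot_id); // hypothetical helper
        let history = load_history(&session_id);         // hypothetical helper
        (system_prompt, history)
    })
    .await?;
    tracing::trace!(%system_prompt, "loaded system prompt");
    Ok((system_prompt, history))
}
```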
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Lower score_threshold in kb_indexer.rs from 0.5 to 0.3
- Lower website search threshold in kb_context.rs from 0.6 to 0.4
- Lower KB search threshold in kb_context.rs from 0.7 to 0.5
- Add Cloudflare AI (/ai/run/) URL detection in cache.rs
- Add Cloudflare AI request format ({"text": ...}) in cache.rs
- Add Cloudflare AI response parsing (result.data) in cache.rs (sketched below)
This fixes the issue where KB search returned 0 results even with
114 chunks indexed. The high thresholds were filtering out all results.
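A sketch of the Cloudflare AI branch in cache.rs, assuming serde_json values (helper names are illustrative):

```rust
use serde_json::{json, Value};

fn is_cloudflare_ai(url: &str) -> bool {
    url.contains("/ai/run/")
}

fn build_embedding_request(url: &str, input: &str, model: &str) -> Value {
    if is_cloudflare_ai(url) {
        json!({ "text": input }) // Cloudflare AI format
    } else {
        json!({ "input": input, "model": model }) // OpenAI-style format
    }
}

fn parse_embedding(url: &str, body: &Value) -> Option<Vec<f32>> {
    let arr = if is_cloudflare_ai(url) {
        body.pointer("/result/data/0")? // Cloudflare AI: result.data
    } else {
        body.pointer("/data/0/embedding")?
    };
    arr.as_array()?
        .iter()
        .map(|v| v.as_f64().map(|f| f as f32))
        .collect()
}
```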
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Added api_key field to LocalEmbeddingService for authentication
- Added embedding-key config parameter support in bootstrap
- Smart URL handling: doesn't append /embedding for full URLs (HuggingFace, OpenAI, etc.)
- Detects HuggingFace URLs and uses correct request format (inputs instead of input/model)
- Handles multiple response formats:
- HuggingFace: direct array [0.1, 0.2, ...]
- Standard/OpenAI: {"data": [{"embedding": [...]}]}
- Added Authorization header with Bearer token when api_key is provided
- Improved error messages with full response details
Fixes embedding errors when using HuggingFace Inference API
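A sketch of the request-side handling, assuming reqwest (the HuggingFace detection is illustrative):

```rust
use serde_json::json;

fn build_request(
    client: &reqwest::Client,
    url: &str,
    api_key: Option<&str>,
    text: &str,
    model: &str,
) -> reqwest::RequestBuilder {
    let body = if url.contains("huggingface") {
        json!({ "inputs": text }) // HuggingFace Inference API format
    } else {
        json!({ "input": text, "model": model }) // OpenAI-style format
    };
    let mut req = client.post(url).json(&body);
    if let Some(key) = api_key {
        req = req.bearer_auth(key); // Authorization: Bearer <key>
    }
    req
}
```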
- Enhanced error messages in LocalEmbeddingService to show actual HTTP status and response
- Added semantic-cache-enabled config parameter to disable semantic matching when embedding service unavailable
- Improved error logging with full response details for debugging production issues
- Prevents 'Invalid embedding response' errors by allowing graceful fallback
- Fixed RCE vulnerability in trusted_shell_script_arg execution
- Fixed SSRF vulnerability in GET command with internal IP blocking
- Updated SafeCommand to use explicit positional arguments
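A sketch of the internal-IP blocking, using only std address checks (the real handler may cover more ranges):

```rust
use std::net::{IpAddr, ToSocketAddrs};

fn is_internal(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.is_private() || v4.is_loopback() || v4.is_link_local(),
        IpAddr::V6(v6) => v6.is_loopback(),
    }
}

/// Resolve the host and refuse any address in an internal range before
/// issuing the GET, closing the SSRF hole.
fn check_url_allowed(host: &str, port: u16) -> anyhow::Result<()> {
    for addr in (host, port).to_socket_addrs()? {
        if is_internal(addr.ip()) {
            anyhow::bail!("blocked request to internal address {}", addr.ip());
        }
    }
    Ok(())
}
```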
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The DeepSeek R1 model outputs reasoning_content first, then content.
With n_predict=50, responses were truncated during the reasoning phase.
Increased n_predict to 512 to allow the full reasoning plus the response.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Change model paths to use ./data/llm/ instead of relative paths from build dir
- Remove cd command when starting llama-server to keep botserver root as cwd
- This fixes model loading when servers are started from different directories
- Both LLM and embedding servers now start successfully
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add vector_db_health_check() function to verify Qdrant availability
- Add wait loop for vector_db startup in bootstrap (15 seconds)
- Fall back to the local LLM when an external URL is configured but no API key is provided
- Prevent external LLM (api.z.ai) usage without authentication key
This fixes the production issues:
- Qdrant vector database not available at https://localhost:6333
- External LLM being used instead of local when no key is configured
- Ensures vector_db is properly started and ready before use
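A sketch of the health check and wait loop; the /healthz endpoint is an assumption about the Qdrant deployment, and the 15 one-second polls mirror the timing above:

```rust
use std::time::Duration;

async fn vector_db_health_check(base_url: &str) -> bool {
    reqwest::get(format!("{base_url}/healthz")) // assumed Qdrant health endpoint
        .await
        .map(|r| r.status().is_success())
        .unwrap_or(false)
}

async fn wait_for_vector_db(base_url: &str) -> bool {
    for _ in 0..15 {
        if vector_db_health_check(base_url).await {
            return true;
        }
        tokio::time::sleep(Duration::from_secs(1)).await;
    }
    false
}
```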
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
config_manager.get_config() can return Ok("") for empty config values,
which passed through unwrap_or_else() without applying the default.
Added checks after config retrieval to use defaults when config values
are empty strings:
- gpu_layers: "20" (default for GPU layers)
- n_moe: "4" (default for MoE)
- parallel: "1" (default for parallel)
- n_predict: "50" (default for predict)
- n_ctx_size: "32000" (default for context size)
This fixes the error: "error while handling argument --n-gpu-layers: stoi"
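A sketch of the guard applied after each retrieval; the error type is generic because only the Ok("") case matters here:

```rust
/// unwrap_or_else only covers Err, so Ok("") previously slipped through and
/// llama-server received an empty --n-gpu-layers argument.
fn config_or_default<E>(value: Result<String, E>, default: &str) -> String {
    match value {
        Ok(v) if !v.trim().is_empty() => v,
        _ => default.to_string(), // empty string or error: use the default
    }
}

// e.g. let gpu_layers = config_or_default(get_config("gpu_layers"), "20");
```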
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
When no default.gbai/config.csv exists, the system now:
- Sets default llm_server_path to ./botserver-stack/bin/llm/build/bin
- Uses correct relative paths to model files: ../../../../data/llm/
- Uses actual model filenames from 3rdparty.toml
This fixes the issue where LLM/embedding servers couldn't find model files
because the paths were constructed incorrectly.
Model filenames:
- LLM: DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_M.gguf
- Embedding: bge-small-en-v1.5-f32.gguf
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed the default LLM model from glm-4 to deepseek-small to match
the model defined in 3rdparty.toml ([models.deepseek_small]).
This ensures that when no default.gbai/config.csv exists, the system
uses the correct default local model.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
When no default.gbai/config.csv exists or when llm-model/embedding-model
config is empty, the system now uses default local models instead of
skipping server startup.
Changes:
- Default LLM model: glm-4
- Default Embedding model: bge-small-en-v1.5
- Logs when using defaults
This fixes the issue where the "default" bot would fail to load LLM
and Embedding services when no config.csv was present, causing the
error: "not loading embedding neither llm local for default bot"
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>