- Remove excessive trace/debug logging in hot loops
- Fix broadcast_theme_change lock contention by cloning channels before iterating
- Increase default sleep interval from 10s to 30s
- Remove [MODULE] prefixes from log messages
- Fix PDF re-download bug by using only last_modified (not ETag) for change detection
- Re-enable DriveMonitor in bootstrap (was disabled for testing)
- Kimi factory: add max_tokens=16384, temperature=1.0, top_p=1.0,
and chat_template_kwargs.thinking=true for kimi models
- Add chunk count traces in stream_response so we see LLM progress
immediately in logs: 'LLM chunk #N received (len=X)'
- Keep generic stream parser clean — model-specific logic lives in
the request builder (Kimi factory pattern)
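A minimal sketch of the request-builder idea described above, assuming a hypothetical `RequestParams` struct and `params_for_model` function (neither name is from the codebase). Only the four values come from this commit; matching on the substring "kimi" is an assumption.

```rust
/// Hypothetical per-request sampling parameters; `None` means "leave the
/// provider default untouched".
#[derive(Debug, Default, PartialEq)]
struct RequestParams {
    max_tokens: Option<u32>,
    temperature: Option<f32>,
    top_p: Option<f32>,
    /// Would be serialized as chat_template_kwargs.thinking in the request body.
    thinking: Option<bool>,
}

/// Model-specific logic lives here, so the stream parser stays generic.
fn params_for_model(model: &str) -> RequestParams {
    // Assumption: Kimi models are recognized by name.
    if model.to_ascii_lowercase().contains("kimi") {
        RequestParams {
            max_tokens: Some(16384),
            temperature: Some(1.0),
            top_p: Some(1.0),
            thinking: Some(true),
        }
    } else {
        RequestParams::default()
    }
}
```

Any other model gets `RequestParams::default()`, so adding a new provider quirk means touching only this function.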
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
1. CI: restart the system container instead of just systemctl restart botserver
- ensures full env reload, Vault re-auth, DriveMonitor fresh state
2. Health endpoint: add 'commit' field with short git SHA
- build.rs passes BOTSERVER_COMMIT from CI via rustc-env
- both /health and /api/health now report the running commit
3. WebSocket recv_task: spawn stream_response in separate tokio task
- prevents one hung LLM from freezing all message processing
- each WebSocket connection can now handle multiple messages
concurrently regardless of LLM latency
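A rough sketch of item 2, assuming build.rs emits `cargo:rustc-env=BOTSERVER_COMMIT=<sha>`. The function names, the "status" field and the 7-character cut are illustrative assumptions, not the actual handler.

```rust
/// Commit baked in at compile time. build.rs is assumed to print
/// `cargo:rustc-env=BOTSERVER_COMMIT=<sha>`; falls back to "unknown"
/// when the variable was not set during the build.
fn running_commit() -> &'static str {
    option_env!("BOTSERVER_COMMIT").unwrap_or("unknown")
}

/// First 7 characters of the SHA, or the whole string if it is shorter.
fn short_sha(full: &str) -> &str {
    full.get(..7).unwrap_or(full)
}

/// Hypothetical JSON body shared by /health and /api/health.
fn health_body(commit: &str) -> String {
    format!(r#"{{"status":"ok","commit":"{}"}}"#, short_sha(commit))
}
```

A handler would call `health_body(running_commit())`, so both endpoints report the same value.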
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
DriveMonitor polling may be consuming resources and interfering with
LLM response delivery. Disabling to isolate the chat pipeline.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Previously the recv_task awaited stream_response() directly, which
froze the entire WebSocket message receiver while the LLM ran (30s+).
This meant a second user message couldn't be processed until the
first LLM call finished. This is head-of-line blocking rather than a
race condition, and it stalled the session.
Now stream_response runs in its own tokio::spawn, keeping recv_task
free to handle new messages immediately. Also fixed a borrow/lifetime
issue by cloning the response channel sender out of the lock scope.
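The shape of this change, sketched with std threads and std mpsc standing in for tokio tasks and channels (an intentional substitution so the example has no dependencies). `Channels`, `handle_message` and the body of `stream_response` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

type Channels = Arc<Mutex<HashMap<String, Sender<String>>>>;

/// Stand-in for the slow LLM call (the real one is async and streams chunks).
fn stream_response(prompt: &str, tx: Sender<String>) {
    thread::sleep(Duration::from_millis(50));
    // The receiver may already be gone; ignore the send error in that case.
    let _ = tx.send(format!("reply to {prompt}"));
}

/// Called by the receive loop for each incoming message. Returns immediately;
/// the slow work runs on its own thread.
fn handle_message(channels: &Channels, session: &str, prompt: String) -> Option<JoinHandle<()>> {
    // Clone the sender inside a short scope so the lock is released
    // before any slow work starts.
    let maybe_tx = {
        let guard = channels.lock().unwrap();
        guard.get(session).cloned()
    };
    let tx = maybe_tx?;
    Some(thread::spawn(move || stream_response(&prompt, tx)))
}
```

Because the guard is dropped at the end of the inner block, a second message for the same session can take the lock while the first reply is still being produced.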
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Without connect_timeout, reqwest can hang for the full 60s timeout
when the remote server is unreachable (DNS, TCP connect, etc.).
Now fails in 5s max for connection issues, 30s for full request.
This means one user's LLM failure no longer blocks new users for
a full minute — the channel closes quickly and the WebSocket is freed.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
reqwest::Client::new() has no timeout — when external APIs (NVIDIA,
Groq, etc.) hang or throttle, the request blocks forever, freezing the
entire response pipeline for the user.
Also add std::time::Duration import to llm/mod.rs.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
The previous fix used Handle::current().block_on() which deadlocks when
the Rhai engine runs on a Tokio worker thread — it blocks the very
thread the async task needs to make progress.
New approach: spawn a dedicated background thread with its own
single-threaded Tokio runtime, communicate via mpsc channel with a
45s timeout. This completely isolates the LLM runtime from the
caller's runtime, eliminating any possibility of thread starvation
or nested-runtime deadlock.
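The isolation pattern described above, reduced to std only. A plain closure takes the place of the dedicated thread's single-threaded Tokio runtime, and `run_isolated` is a hypothetical name. The real call would pass a 45s timeout.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Runs `work` on a dedicated thread and waits at most `timeout` for its result.
fn run_isolated<T, F>(work: F, timeout: Duration) -> Result<T, RecvTimeoutError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // In the real code this thread would build its own current-thread
        // Tokio runtime and block_on the LLM future here.
        // If the caller already timed out, the send fails and is ignored.
        let _ = tx.send(work());
    });
    // Blocks only the calling thread, and only up to `timeout`.
    rx.recv_timeout(timeout)
}
```

On timeout the worker thread is not cancelled. It runs to completion and its result is dropped, which is a trade-off of this approach.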
Also remove unused 'trace' import from llm/mod.rs.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add fallback: skip files from indexed KB folders even when file_states is empty
- Add file_states_count to debug log to detect load failures
- Add indexed_kb_names set for quick KB folder lookup
- This prevents the infinite download loop when file_states.json fails to deserialize
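One way the fallback could look, assuming file paths have the form `kb_name/file` and `file_states` maps a path to its last_modified string. Both layouts and the `should_download` name are assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Decides whether a remote file needs to be downloaded.
fn should_download(
    path: &str,
    file_states: &HashMap<String, String>,
    indexed_kb_names: &HashSet<String>,
) -> bool {
    if !file_states.is_empty() {
        // Normal case: only files that are not tracked yet are downloaded.
        return !file_states.contains_key(path);
    }
    // Fallback: file_states failed to load, so rely on the set of KB folders
    // that are already indexed to avoid re-downloading everything.
    let kb_name = path.split('/').next().unwrap_or("");
    !indexed_kb_names.contains(kb_name)
}
```

Logging `file_states.len()` next to this decision makes a failed load visible: a count of zero while KBs are indexed points at the deserialization problem.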
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Don't skip entire GBKB scan when all KBs are indexed
- Instead, skip individual files that are already tracked (not new)
- This allows new PDFs added to existing KB folders to be detected and indexed
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Implement ADD_SWITCHER keyword following the same pattern as ADD_SUGGESTION_TOOL:
- Created switcher.rs module with add_switcher_keyword() and clear_switchers_keyword()
- Added preprocessing to convert "ADD SWITCHER" to "ADD_SWITCHER"
- Added to keyword patterns and get_all_keywords()
- Stores switcher suggestions in Redis with type "switcher" and action "switch_context"
- Supports both "ADD SWITCHER" and "ADD_SWITCHER" syntax
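The preprocessing step can be sketched as a single line rewrite. `convert_add_switcher` is a hypothetical name, and the quoted argument in the usage is only an example, since the keyword's real argument syntax is not shown in this commit.

```rust
/// Rewrites a leading "ADD SWITCHER" (any case) to "ADD_SWITCHER",
/// leaving indentation and the rest of the line untouched.
fn convert_add_switcher(line: &str) -> String {
    const SPACED: &str = "ADD SWITCHER";
    let trimmed = line.trim_start();
    let indent = &line[..line.len() - trimmed.len()];
    match trimmed.get(..SPACED.len()) {
        Some(head) if head.eq_ignore_ascii_case(SPACED) => {
            let rest = &trimmed[SPACED.len()..];
            // Require a word boundary so "ADD SWITCHERS" is not rewritten.
            let at_boundary = rest
                .chars()
                .next()
                .map_or(true, |c| !c.is_alphanumeric() && c != '_');
            if at_boundary {
                format!("{indent}ADD_SWITCHER{rest}")
            } else {
                line.to_string()
            }
        }
        _ => line.to_string(),
    }
}
```

Lines already written as ADD_SWITCHER pass through unchanged, which is how both spellings end up supported.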
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Show thinking indicator while LLM is in reasoning mode
- Omit reasoning content (thinking text) from the user response
- Only show actual HTML content after thinking ends
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- GLM4.7 and Kimi K2.5 send the response in the 'reasoning_content' field; 'content' is null
- Prefer 'content' for normal models, fall back to 'reasoning_content' for reasoning models
- Fixes blank white screen when using z-ai/glm4.7 model
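The field selection amounts to a one-line fallback. This sketch assumes the two fields have already been extracted from the chunk as optional strings. `pick_text` is a hypothetical name, and treating an empty 'content' as missing is an extra assumption.

```rust
/// Prefers `content`; falls back to `reasoning_content` when `content`
/// is null (or empty, by assumption).
fn pick_text<'a>(content: Option<&'a str>, reasoning_content: Option<&'a str>) -> Option<&'a str> {
    content.filter(|s| !s.is_empty()).or(reasoning_content)
}
```

With both fields null the function returns `None`, so the caller can skip the chunk instead of rendering an empty page.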
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
SADD stores suggestions in a set (deduplicated) instead of a list (accumulates).
get_suggestions now uses SMEMBERS instead of LRANGE. Removed the TODO about
clearing suggestions since SADD inherently prevents duplicates.
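An in-memory analogy of the difference, with a `Vec` playing the Redis list and a `HashSet` playing the Redis set. No Redis client is involved and the function names are illustrative.

```rust
use std::collections::HashSet;

/// List semantics: every push is kept, so repeated runs accumulate entries.
fn list_push(list: &mut Vec<String>, item: &str) -> usize {
    list.push(item.to_string());
    list.len()
}

/// Set semantics (like SADD): returns true only if the item was new.
fn set_add(set: &mut HashSet<String>, item: &str) -> bool {
    set.insert(item.to_string())
}
```

One consequence worth keeping in mind: Redis sets are unordered, so SMEMBERS does not guarantee the insertion order that LRANGE returned.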
- Move all preprocessing transforms (convert_multiword_keywords, preprocess_llm_keyword,
convert_while_wend_syntax, predeclare_variables) into BasicCompiler::preprocess_basic
so .ast files are fully preprocessed by Drive Monitor
- Replace ScriptService compile/compile_preprocessed/compile_tool_script with
single run(ast_content) that does engine.compile() + eval_ast_with_scope()
- Remove .bas fallback in tool_executor and start.bas paths - .ast only
- Remove dead code: preprocess_basic_script, normalize_variables_to_lowercase,
convert_save_for_tools, parse_save_parts, normalize_word
- Fix: USE KB 'cartas' in tool .ast now correctly converted to USE_KB('cartas')
during compilation, ensuring KB context injection works after tool execution
- Fix: add trace import in llm/mod.rs
- Add tokio timeout to SSE stream reads in OpenAI client (60s)
- Prevents indefinite hang when Kimi/Nvidia stops responding
- Add scanning AtomicBool to prevent concurrent check_gbkb_changes calls
- Skip GBKB scan entirely when all KBs already indexed in Qdrant
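A possible form of the scanning guard, using compare_exchange so that only one caller can flip the flag from false to true. `ScanGuard` is a hypothetical helper, and the scan body is elided.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Holds the "scanning" flag and clears it on drop, including on early return.
struct ScanGuard<'a>(&'a AtomicBool);

impl<'a> ScanGuard<'a> {
    /// Succeeds only if no scan is currently running.
    fn try_acquire(flag: &'a AtomicBool) -> Option<Self> {
        flag.compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| ScanGuard(flag))
    }
}

impl Drop for ScanGuard<'_> {
    fn drop(&mut self) {
        self.0.store(false, Ordering::Release);
    }
}

/// Returns false when another scan is already in progress.
fn check_gbkb_changes(scanning: &AtomicBool) -> bool {
    let Some(_guard) = ScanGuard::try_acquire(scanning) else {
        return false;
    };
    // ... scan work would run here while the flag is held ...
    true
}
```

Resetting the flag in `Drop` means an error path inside the scan cannot leave the flag stuck at true.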
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>