DriveMonitor polling may be consuming resources and interfering with
LLM response delivery. Disabling to isolate the chat pipeline.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Previously the recv_task awaited stream_response() directly, which
froze the entire WebSocket message receiver while the LLM ran (30s+).
This meant a second user message couldn't be processed until the
first LLM call finished: head-of-line blocking that locked the session.
Now stream_response runs in its own tokio::spawn, keeping recv_task
free to handle new messages immediately. Also fixed a borrow/lifetime
issue by cloning the response channel sender out of the lock scope.
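The borrow fix can be sketched in plain std (tokio's mpsc sender clones the same way): take the clone inside a short lock scope so the guard is never held or moved across the spawn. Names here are hypothetical stand-ins for the real session types.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the lock-guarded session state.
struct Session {
    tx: mpsc::Sender<String>,
}

fn handle_message(session: &Arc<Mutex<Session>>, msg: String) {
    // Clone the sender inside a short lock scope; the guard is dropped
    // at the end of this statement, so it is never moved into the spawn.
    let tx = session.lock().unwrap().tx.clone();
    thread::spawn(move || {
        // Long-running work (the LLM call in the real code) no longer
        // blocks the receiver loop that called handle_message.
        let _ = tx.send(format!("reply to: {msg}"));
    });
}
```

The same shape applies with `tokio::spawn` and `tokio::sync::mpsc`: clone out of the lock, move the clone into the task.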
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Without connect_timeout, reqwest can hang for the full 60s timeout
when the remote server is unreachable (DNS, TCP connect, etc.).
Now fails in 5s max for connection issues, 30s for full request.
This means one user's LLM failure no longer blocks new users for
a full minute — the channel closes quickly and the WebSocket is freed.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
reqwest::Client::new() has no timeout — when external APIs (NVIDIA,
Groq, etc.) hang or throttle, the request blocks forever, freezing the
entire response pipeline for the user.
Also add std::time::Duration import to llm/mod.rs.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
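Taken together, the two timeout commits amount to a client configured roughly like this (a configuration sketch requiring the reqwest crate; the 5s/30s values are the ones stated above, the exact call site in llm/mod.rs may differ):

```rust
use std::time::Duration;

// Sketch: connect_timeout bounds DNS/TCP connect, timeout caps the
// whole request, so neither an unreachable host nor a hung API can
// block the pipeline indefinitely.
fn build_llm_client() -> reqwest::Client {
    reqwest::Client::builder()
        .connect_timeout(Duration::from_secs(5))
        .timeout(Duration::from_secs(30))
        .build()
        .expect("failed to build HTTP client")
}
```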
The previous fix used Handle::current().block_on() which deadlocks when
the Rhai engine runs on a Tokio worker thread — it blocks the very
thread the async task needs to make progress.
New approach: spawn a dedicated background thread with its own
single-threaded Tokio runtime, communicate via mpsc channel with a
45s timeout. This completely isolates the LLM runtime from the
caller's runtime, eliminating any possibility of thread starvation
or nested-runtime deadlock.
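The channel-and-timeout mechanics can be sketched in pure std (function names hypothetical; in the real code the spawned thread builds a single-threaded Tokio runtime and block_on()s the async request instead of calling a blocking function directly):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for the LLM call.
fn call_llm_blocking(prompt: &str) -> String {
    format!("echo: {prompt}")
}

// Run the call on a dedicated thread and give the caller a hard cap,
// so a Rhai script running on a Tokio worker thread never blocks the
// runtime it depends on.
fn llm_with_timeout(prompt: String, timeout: Duration) -> Result<String, mpsc::RecvTimeoutError> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver timed out and dropped rx, send fails harmlessly.
        let _ = tx.send(call_llm_blocking(&prompt));
    });
    rx.recv_timeout(timeout)
}
```

`recv_timeout(Duration::from_secs(45))` gives the 45s bound described above; the worker thread simply finishes on its own if the caller has given up.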
Also remove unused 'trace' import from llm/mod.rs.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add fallback: skip files from indexed KB folders even when file_states is empty
- Add file_states_count to debug log to detect load failures
- Add indexed_kb_names set for quick KB folder lookup
- This prevents the infinite download loop when file_states.json fails to deserialize
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
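The fallback check can be sketched like this (shapes and names are hypothetical, and the sketch assumes the first path segment is the KB folder name):

```rust
use std::collections::{HashMap, HashSet};

// file_states maps a file path to its last-seen state; indexed_kb_names
// is the quick-lookup set from the commit above.
fn should_skip(
    path: &str,
    file_states: &HashMap<String, String>,
    indexed_kb_names: &HashSet<String>,
) -> bool {
    if file_states.contains_key(path) {
        return true; // already tracked
    }
    // Fallback: even if file_states failed to deserialize (empty map),
    // skip files inside KB folders already indexed in Qdrant, breaking
    // the infinite re-download loop.
    path.split('/')
        .next()
        .map_or(false, |kb| indexed_kb_names.contains(kb))
}
```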
- Don't skip entire GBKB scan when all KBs are indexed
- Instead, skip individual files that are already tracked (not new)
- This allows new PDFs added to existing KB folders to be detected and indexed
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Implement ADD_SWITCHER keyword following the same pattern as ADD_SUGGESTION_TOOL:
- Created switcher.rs module with add_switcher_keyword() and clear_switchers_keyword()
- Added preprocessing to convert "ADD SWITCHER" to "ADD_SWITCHER"
- Added to keyword patterns and get_all_keywords()
- Stores switcher suggestions in Redis with type "switcher" and action "switch_context"
- Supports both "ADD SWITCHER" and "ADD_SWITCHER" syntax
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
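The preprocessing step can be sketched as a plain string rewrite (function name and the CLEAR SWITCHERS spelling are assumptions mirroring the ADD_SUGGESTION_TOOL pattern):

```rust
// Normalize the multi-word forms to the underscore keywords before the
// keyword patterns run, so both syntaxes are accepted.
fn preprocess_switcher(script: &str) -> String {
    script
        .replace("ADD SWITCHER", "ADD_SWITCHER")
        .replace("CLEAR SWITCHERS", "CLEAR_SWITCHERS")
}
```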
- Show thinking indicator while LLM is in reasoning mode
- Skip reasoning content (thinking text) from user response
- Only show actual HTML content after thinking ends
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- GLM4.7 and Kimi K2.5 send the response in the 'reasoning_content' field; 'content' is null
- Prefer 'content' for normal models, fall back to 'reasoning_content' for reasoning models
- Fixes blank white screen when using z-ai/glm4.7 model
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
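The field preference amounts to a simple Option fallback (the struct is a hypothetical stand-in for the deserialized SSE chunk delta):

```rust
// Hypothetical delta shape read from each SSE chunk's JSON.
struct Delta {
    content: Option<String>,
    reasoning_content: Option<String>,
}

// Prefer `content`; GLM4.7 / Kimi K2.5 leave it null and put the answer
// in `reasoning_content`, so fall back to that.
fn visible_text(d: &Delta) -> Option<&str> {
    d.content.as_deref().or(d.reasoning_content.as_deref())
}
```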
SADD stores suggestions in a set (deduplicated) instead of a list (which accumulates duplicates).
get_suggestions now uses SMEMBERS instead of LRANGE. Removed the TODO about
clearing suggestions since SADD inherently prevents duplicates.
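The set semantics can be illustrated with std's HashSet, which behaves like SADD here (helper name hypothetical):

```rust
use std::collections::HashSet;

// Like Redis SADD: returns true when the member was new, false when it
// was already present, so duplicates never accumulate.
fn add_suggestion(set: &mut HashSet<String>, s: &str) -> bool {
    set.insert(s.to_string())
}
```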
- Move all preprocessing transforms (convert_multiword_keywords, preprocess_llm_keyword,
convert_while_wend_syntax, predeclare_variables) into BasicCompiler::preprocess_basic
so .ast files are fully preprocessed by Drive Monitor
- Replace ScriptService compile/compile_preprocessed/compile_tool_script with
single run(ast_content) that does engine.compile() + eval_ast_with_scope()
- Remove .bas fallback in tool_executor and start.bas paths - .ast only
- Remove dead code: preprocess_basic_script, normalize_variables_to_lowercase,
convert_save_for_tools, parse_save_parts, normalize_word
- Fix: USE KB 'cartas' in tool .ast now correctly converted to USE_KB('cartas')
during compilation, ensuring KB context injection works after tool execution
- Fix: add trace import in llm/mod.rs
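The USE KB rewrite noted above can be sketched as a per-line transform (function name hypothetical; the real pass lives in BasicCompiler::preprocess_basic):

```rust
// Convert BASIC-style `USE KB 'cartas'` into the Rhai-callable
// `USE_KB('cartas')` during compilation.
fn convert_use_kb(line: &str) -> String {
    match line.trim().strip_prefix("USE KB ") {
        Some(arg) => format!("USE_KB({arg})"),
        None => line.to_string(),
    }
}
```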
- Add tokio timeout to SSE stream reads in OpenAI client (60s)
- Prevents indefinite hang when Kimi/Nvidia stops responding
- Add scanning AtomicBool to prevent concurrent check_gbkb_changes calls
- Skip GBKB scan entirely when all KBs already indexed in Qdrant
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
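The concurrency guard can be sketched like this (a static for brevity; the real flag presumably lives on the monitor struct):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static SCANNING: AtomicBool = AtomicBool::new(false);

// Returns true only for the caller that flipped the flag, so at most
// one check_gbkb_changes scan runs at a time.
fn try_begin_scan() -> bool {
    SCANNING
        .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
}

fn end_scan() {
    SCANNING.store(false, Ordering::SeqCst);
}
```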
- Log full LLM response preview (500 chars) with has_html detection
- Log WebSocket send with message type, completeness, and content preview
- Use clone() for chunk in BotResponse to ensure accurate logging
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add BOTSERVER_BUILD_DATE env var to /api/health response
- Set build date during CI compilation via environment variable
- Enables checking deployed binary age without SSH access
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
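One way to bake the CI variable in at compile time is `option_env!` (a sketch; the actual wiring into /api/health may differ):

```rust
// Captured at compile time; None when CI did not set the variable, so
// local dev builds still compile.
const BUILD_DATE: Option<&str> = option_env!("BOTSERVER_BUILD_DATE");

fn build_date() -> &'static str {
    BUILD_DATE.unwrap_or("unknown")
}
```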
- Use last_modified timestamp instead of ETag for change detection
- Skip re-queueing KBs that are already indexed in Qdrant
- Preserve indexed status across scans when content unchanged
- Add normalize_etag helper for consistent ETag comparison
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
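The normalize_etag helper might look like this (a sketch: it strips the weak-validator prefix and surrounding quotes so `W/"abc"` and `"abc"` compare equal):

```rust
// Normalize an HTTP ETag for comparison: drop the `W/` weak prefix and
// the surrounding double quotes.
fn normalize_etag(etag: &str) -> &str {
    etag.trim_start_matches("W/").trim_matches('"')
}
```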
When a tool button like Cartas activates a KB via USE KB, instead of
returning just the tool result (empty/label), the handler now checks
whether the session has active KBs. If so, and the result is empty or
trivial, it falls through to the full LLM pipeline, which injects KB context.
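The decision can be sketched as a predicate (names hypothetical, and "trivial" is reduced to whitespace-only here; the real check may also treat short labels as trivial):

```rust
// Fall through to the full LLM pipeline only when the session has
// active KBs and the tool produced nothing worth returning.
fn should_fall_through_to_llm(tool_result: &str, active_kb_count: usize) -> bool {
    active_kb_count > 0 && tool_result.trim().is_empty()
}
```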