- Add AzureGPT5Client struct for Responses API
- Add AzureGPT5 to LLMProviderType enum
- Detect provider via 'azuregpt5' or 'gpt5' in the llm-provider config (see the sketch after this list)
- Fix gpt_oss_120b.rs chars.peek() issue
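A minimal sketch of the detection, assuming the provider is chosen from a plain config string (LLMProviderType and AzureGPT5 are from this change; from_config is a hypothetical helper):

    #[derive(Debug, PartialEq)]
    enum LLMProviderType {
        OpenAI,
        AzureGPT5,
    }

    fn from_config(llm_provider: &str) -> LLMProviderType {
        let p = llm_provider.to_lowercase();
        // 'azuregpt5' also contains 'gpt5', so the second check alone would
        // match both; listed separately to mirror the config values above.
        if p.contains("azuregpt5") || p.contains("gpt5") {
            LLMProviderType::AzureGPT5
        } else {
            LLMProviderType::OpenAI
        }
    }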
Move detailed LLM and DriveMonitor logs from info! to trace!
to reduce noise in production logs:
- bot/mod.rs: LLM chunk logs, streaming start, abort
- llm/mod.rs: LLM Request Details, provider creation logs
These logs are useful for debugging but generate a lot of noise in production.
With trace! they only appear when RUST_LOG=trace is set.
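The change itself is a one-macro swap; a sketch of the pattern:

    use tracing::trace;

    fn log_chunk(n: usize, len: usize) {
        // Was info!, visible in production; as trace! it is emitted only
        // when RUST_LOG=trace (or an equivalent filter) is configured.
        trace!("LLM chunk #{n} received (len={len})");
    }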
- Remove excessive trace/debug logging in hot loops
- Fix broadcast_theme_change lock contention by cloning channels before iterating (see the sketch after this list)
- Increase default sleep interval from 10s to 30s
- Remove [MODULE] prefixes from log messages
- Fix PDF re-download bug by using only last_modified (not ETag) for change detection
- Re-enable DriveMonitor in bootstrap (was disabled for testing)
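A sketch of the broadcast fix, assuming the per-session senders sit behind a single Mutex-guarded map (struct and field names hypothetical):

    use std::collections::HashMap;
    use std::sync::Mutex;
    use tokio::sync::mpsc::UnboundedSender;

    struct ThemeBroadcaster {
        channels: Mutex<HashMap<String, UnboundedSender<String>>>,
    }

    impl ThemeBroadcaster {
        fn broadcast_theme_change(&self, theme: &str) {
            // Hold the lock only long enough to clone the senders...
            let senders: Vec<UnboundedSender<String>> =
                self.channels.lock().unwrap().values().cloned().collect();
            // ...then send with the lock released, so a slow or re-entrant
            // receiver cannot stall every other caller.
            for tx in senders {
                let _ = tx.send(theme.to_string());
            }
        }
    }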
- Kimi factory: add max_tokens=16384, temperature=1.0, top_p=1.0,
and chat_template_kwargs.thinking=true for kimi models
- Add chunk count traces in stream_response so we see LLM progress
immediately in logs: 'LLM chunk #N received (len=X)'
- Keep generic stream parser clean — model-specific logic lives in
the request builder (Kimi factory pattern)
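A sketch of the factory branch, assuming a serde_json request body (parameter names are from this message; the builder itself is hypothetical):

    use serde_json::{json, Value};

    // Model-specific knobs live in the request builder, keeping the
    // generic stream parser model-agnostic.
    fn build_request_body(model: &str, messages: Value) -> Value {
        let mut body = json!({ "model": model, "messages": messages });
        if model.contains("kimi") {
            body["max_tokens"] = json!(16384);
            body["temperature"] = json!(1.0);
            body["top_p"] = json!(1.0);
            // Vendor-specific extension that turns on thinking mode.
            body["chat_template_kwargs"] = json!({ "thinking": true });
        }
        body
    }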
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Without connect_timeout, reqwest can hang for the full 60s timeout
when the remote server is unreachable (DNS, TCP connect, etc.).
Connection failures now surface within 5s at most; the full request is capped at 30s.
This means one user's LLM failure no longer blocks new users for
a full minute — the channel closes quickly and the WebSocket is freed.
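A sketch of the resulting client construction (timeout values from this message):

    use std::time::Duration;

    fn build_http_client() -> reqwest::Result<reqwest::Client> {
        reqwest::Client::builder()
            // Fail within ~5s on DNS/TCP connect problems...
            .connect_timeout(Duration::from_secs(5))
            // ...and cap the whole request at 30s.
            .timeout(Duration::from_secs(30))
            .build()
    }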
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
reqwest::Client::new() has no timeout — when external APIs (NVIDIA,
Groq, etc.) hang or throttle, the request blocks forever, freezing the
entire response pipeline for the user.
Also add std::time::Duration import to llm/mod.rs.
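A sketch of the before/after (the 60s value is the one this fix introduced):

    use std::time::Duration;

    // Before: reqwest::Client::new() has no deadline at all.
    // After: every request gets an upper bound.
    fn build_http_client() -> reqwest::Result<reqwest::Client> {
        reqwest::Client::builder()
            .timeout(Duration::from_secs(60))
            .build()
    }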
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
The previous fix used Handle::current().block_on() which deadlocks when
the Rhai engine runs on a Tokio worker thread — it blocks the very
thread the async task needs to make progress.
New approach: spawn a dedicated background thread with its own
single-threaded Tokio runtime, communicate via mpsc channel with a
45s timeout. This completely isolates the LLM runtime from the
caller's runtime, eliminating any possibility of thread starvation
or nested-runtime deadlock.
Also remove unused 'trace' import from llm/mod.rs.
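A sketch of the pattern, assuming a synchronous entry point called from Rhai (function names hypothetical):

    use std::sync::mpsc;
    use std::thread;
    use std::time::Duration;

    fn call_llm_blocking(prompt: String) -> Result<String, String> {
        let (tx, rx) = mpsc::channel();
        thread::spawn(move || {
            // Dedicated single-threaded runtime, fully isolated from the
            // caller's Tokio runtime, so nothing can deadlock on it.
            let rt = tokio::runtime::Builder::new_current_thread()
                .enable_all()
                .build()
                .expect("failed to build LLM runtime");
            let _ = tx.send(rt.block_on(async_llm_call(prompt)));
        });
        // 45s budget; on timeout the detached worker is simply abandoned.
        rx.recv_timeout(Duration::from_secs(45))
            .map_err(|_| "LLM call timed out after 45s".to_string())?
    }

    async fn async_llm_call(prompt: String) -> Result<String, String> {
        // Stand-in for the real async client call.
        Ok(format!("echo: {prompt}"))
    }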
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Show thinking indicator while LLM is in reasoning mode
- Skip reasoning content (thinking text) from user response
- Only show actual HTML content after thinking ends
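A sketch of the streaming filter, assuming OpenAI-style deltas carrying a reasoning_content field (struct shape hypothetical):

    struct Delta {
        content: Option<String>,
        reasoning_content: Option<String>,
    }

    fn handle_delta(delta: &Delta, thinking: &mut bool, out: &mut String) {
        if delta.reasoning_content.is_some() {
            // Reasoning tokens: surface a thinking indicator once,
            // but keep the text out of the user-visible reply.
            if !*thinking {
                *thinking = true;
                // e.g. send_thinking_indicator();
            }
            return;
        }
        if let Some(text) = &delta.content {
            *thinking = false; // thinking ended, actual content begins
            out.push_str(text);
        }
    }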
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- GLM4.7 and Kimi K2.5 send the response in the 'reasoning_content' field, while 'content' is null
- Prefer 'content' for normal models, fallback to 'reasoning_content' for reasoning models
- Fixes blank white screen when using z-ai/glm4.7 model
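A sketch of the extraction order for this commit (field names from the message above; the helper is hypothetical):

    use serde_json::Value;

    // Prefer 'content'; when a reasoning model (GLM4.7, Kimi K2.5) leaves
    // it null, fall back to 'reasoning_content' so the reply is not blank.
    fn extract_text(message: &Value) -> Option<&str> {
        message["content"]
            .as_str()
            .filter(|s| !s.is_empty())
            .or_else(|| message["reasoning_content"].as_str())
    }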
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Move all preprocessing transforms (convert_multiword_keywords, preprocess_llm_keyword,
convert_while_wend_syntax, predeclare_variables) into BasicCompiler::preprocess_basic
so .ast files are fully preprocessed by DriveMonitor
- Replace ScriptService compile/compile_preprocessed/compile_tool_script with a
single run(ast_content) that does engine.compile() + eval_ast_with_scope() (see the sketch after this list)
- Remove the .bas fallback in tool_executor and start.bas paths; .ast only
- Remove dead code: preprocess_basic_script, normalize_variables_to_lowercase,
convert_save_for_tools, parse_save_parts, normalize_word
- Fix: USE KB 'cartas' in tool .ast now correctly converted to USE_KB('cartas')
during compilation, ensuring KB context injection works after tool execution
- Fix: add trace import in llm/mod.rs
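A sketch of the consolidated entry point, assuming a Rhai engine (error handling simplified to strings):

    use rhai::{Engine, Scope};

    // Single run() replacing the three compile variants: the .ast content
    // arrives already preprocessed, so this only compiles and evaluates.
    fn run(engine: &Engine, ast_content: &str) -> Result<(), String> {
        let ast = engine.compile(ast_content).map_err(|e| e.to_string())?;
        let mut scope = Scope::new();
        engine
            .eval_ast_with_scope::<rhai::Dynamic>(&mut scope, &ast)
            .map_err(|e| e.to_string())?;
        Ok(())
    }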
- Add tokio timeout to SSE stream reads in OpenAI client (60s)
- Prevents indefinite hang when Kimi/Nvidia stops responding (see the sketch after this list)
- Add scanning AtomicBool to prevent concurrent check_gbkb_changes calls
- Skip GBKB scan entirely when all KBs already indexed in Qdrant
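A sketch of the bounded SSE read loop (the 60s value is from this message; the stream type is hypothetical):

    use std::time::Duration;
    use futures_util::StreamExt;

    // Each read gets its own 60s deadline, so a stalled upstream errors
    // out instead of hanging the response pipeline indefinitely.
    async fn read_sse<S>(mut stream: S) -> Result<(), String>
    where
        S: futures_util::Stream<Item = Result<bytes::Bytes, reqwest::Error>> + Unpin,
    {
        while let Some(chunk) = tokio::time::timeout(Duration::from_secs(60), stream.next())
            .await
            .map_err(|_| "SSE read timed out after 60s".to_string())?
        {
            let bytes = chunk.map_err(|e| e.to_string())?;
            // ... parse SSE lines from `bytes` ...
            let _ = bytes;
        }
        Ok(())
    }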
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add BOTSERVER_BUILD_DATE env var to /api/health response
- Set build date during CI compilation via environment variable
- Enables checking deployed binary age without SSH access
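A sketch of the compile-time wiring (the env var name is from this message; the endpoint shape is hypothetical):

    // Baked in at compile time; CI exports the variable before building, e.g.
    //   BOTSERVER_BUILD_DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ) cargo build --release
    const BUILD_DATE: Option<&str> = option_env!("BOTSERVER_BUILD_DATE");

    fn health_build_date() -> &'static str {
        BUILD_DATE.unwrap_or("unknown")
    }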
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- drive_monitor: replace hardcoded salesianos.gbot with dynamic bot_name
- llm/mod.rs: stop falling back to reasoning_content as content
- llm/claude.rs: same fix for Claude handler
- deepseek_r3: export strip_think_tags for reuse
- gpt_oss_20b: use strip_think_tags so all models strip tags
- gpt_oss_120b: use strip_think_tags so all models strip tags
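A sketch of the shared helper, assuming DeepSeek-style <think>...</think> markers (the real implementation may differ):

    // Shared by deepseek_r3 and both gpt_oss handlers: drop everything
    // between <think> and </think>, keep the visible answer.
    pub fn strip_think_tags(text: &str) -> String {
        let mut out = String::with_capacity(text.len());
        let mut rest = text;
        while let Some(start) = rest.find("<think>") {
            out.push_str(&rest[..start]);
            match rest[start..].find("</think>") {
                Some(end) => rest = &rest[start + end + "</think>".len()..],
                None => return out, // unclosed tag: drop the dangling tail
            }
        }
        out.push_str(rest);
        out
    }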
Previously, ensure_llama_servers_running() would return early
when both LLM and embedding servers were already running, without
calling set_embedding_server_ready(true). This caused DriveMonitor
to skip KB indexing with 'Embedding server not yet marked ready'.
Fix: call set_embedding_server_ready(true) before returning early
when servers are already running.
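A sketch of the corrected path (readiness flag modeled as an AtomicBool; the real signatures may differ):

    use std::sync::atomic::{AtomicBool, Ordering};

    static EMBEDDING_SERVER_READY: AtomicBool = AtomicBool::new(false);

    fn set_embedding_server_ready(ready: bool) {
        EMBEDDING_SERVER_READY.store(ready, Ordering::SeqCst);
    }

    fn ensure_llama_servers_running(llm_running: bool, embedding_running: bool) {
        if llm_running && embedding_running {
            // The fix: mark readiness on the early-return path too, so
            // DriveMonitor no longer skips KB indexing.
            set_embedding_server_ready(true);
            return;
        }
        // ... start whichever server is missing, then mark readiness ...
    }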