Move detailed LLM and DriveMonitor logs from info! to trace!
to reduce noise in production logs:
- bot/mod.rs: LLM chunk logs, streaming start, abort
- llm/mod.rs: LLM Request Details, provider creation logs
These logs are useful for debugging but generate a lot of noise in production.
With trace!, they only appear when RUST_LOG=trace is set.
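Why demoting to trace! hides these logs by default can be shown with a small sketch. It is self-contained and only mirrors the level ordering of the `log` crate (Error < Warn < Info < Debug < Trace). It is not the real env_logger or tracing parser: it handles a single global level and ignores per-module directives. The `parse_level`/`enabled` names and the Info fallback are assumptions.

```rust
/// Verbosity levels from least to most verbose. This is a self-contained
/// stand-in that mirrors the ordering of the `log` crate's levels.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Parses one global level such as "info" or "trace". Real filters also accept
/// per-module directives (e.g. "my_crate=trace"); this sketch does not.
fn parse_level(value: &str) -> Option<Level> {
    match value.trim().to_ascii_lowercase().as_str() {
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

/// A message is emitted only if it is no more verbose than the configured maximum.
fn enabled(message: Level, max: Level) -> bool {
    message <= max
}

fn main() {
    // Fallback when RUST_LOG is unset or unrecognized (an assumption of this sketch).
    let max = std::env::var("RUST_LOG")
        .ok()
        .and_then(|v| parse_level(&v))
        .unwrap_or(Level::Info);
    println!("LLM chunk logs (trace) visible: {}", enabled(Level::Trace, max));
    println!("startup logs (info) visible: {}", enabled(Level::Info, max));
}
```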
Broaden the KB search to capture more relevant context,
especially in documents with many entities, such as
phone extension lists.
- inject_kb_context: 5 -> 20 results
- think_kb: 10 -> 25 results
- search_active_websites: 5 -> 20 results
Bug: Using break instead of continue when encountering low-relevance
results caused the search to stop prematurely, missing potentially
relevant chunks in subsequent results.
- Changed break to continue when score < 0.4 in search_single_collection
- Changed break to continue when score < 0.4 in search_single_kb
- Lowered threshold from 0.5 to 0.4 for consistency
This ensures all search results are evaluated, not just those before
the first low-relevance result.
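The break/continue difference can be sketched with a hypothetical `Hit` type. It only appears when a low-scoring hit is followed by a higher-scoring one, for example when results are not strictly sorted by this score. With a strictly descending order, both loops return the same hits. The 0.4 threshold comes from the change above. `limit` stands in for the result counts configured for the KB searches, and how the real store orders results is an assumption.

```rust
/// A search hit (hypothetical shape; real results carry chunk text and metadata).
#[derive(Debug, Clone)]
struct Hit {
    id: u32,
    score: f32,
}

const MIN_SCORE: f32 = 0.4;

/// Old behavior: the first low-scoring hit ends the loop,
/// so every later hit is dropped even if it is relevant.
fn filter_with_break(hits: &[Hit], limit: usize) -> Vec<u32> {
    let mut kept = Vec::new();
    for hit in hits.iter().take(limit) {
        if hit.score < MIN_SCORE {
            break;
        }
        kept.push(hit.id);
    }
    kept
}

/// Fixed behavior: low-scoring hits are skipped, but the rest are still evaluated.
fn filter_with_continue(hits: &[Hit], limit: usize) -> Vec<u32> {
    let mut kept = Vec::new();
    for hit in hits.iter().take(limit) {
        if hit.score < MIN_SCORE {
            continue;
        }
        kept.push(hit.id);
    }
    kept
}

fn main() {
    let hits = vec![
        Hit { id: 1, score: 0.82 },
        Hit { id: 2, score: 0.35 },
        Hit { id: 3, score: 0.61 },
        Hit { id: 4, score: 0.47 },
    ];
    println!("break:    {:?}", filter_with_break(&hits, 20));
    println!("continue: {:?}", filter_with_continue(&hits, 20));
}
```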
- Broadcast channel allows multiple subscribers for cancellation
- Aborts LLM task when user sends new message
- Properly stops LLM generation when cancelled
- Add active_streams HashMap to AppState to track streaming sessions
- Create cancellation channel for each streaming session
- Cancel existing streaming when new message arrives
- Prevents overlapping responses and improves UX
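The per-session cancellation bookkeeping can be sketched with std only. This sketch swaps the tokio broadcast channel for a single-consumer `std::sync::mpsc` channel, so it does not show multiple subscribers. `ActiveStreams`, `start` and `simulate_stream` are hypothetical names, and the loop only simulates chunk forwarding. It shows the core idea: a new message for a session cancels the stream already registered for it, and the streaming loop checks for cancellation between chunks.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};

/// One cancellation sender per session (a stand-in for the active_streams map).
struct ActiveStreams {
    streams: HashMap<String, Sender<()>>,
}

impl ActiveStreams {
    fn new() -> Self {
        Self { streams: HashMap::new() }
    }

    /// Cancels any stream already running for `session_id`, then registers a
    /// fresh cancellation channel and returns its receiving end.
    fn start(&mut self, session_id: &str) -> Receiver<()> {
        if let Some(previous) = self.streams.remove(session_id) {
            // Ignore send errors: the previous stream may already have finished.
            let _ = previous.send(());
        }
        let (tx, rx) = mpsc::channel();
        self.streams.insert(session_id.to_string(), tx);
        rx
    }
}

/// Simulated LLM stream: checks for cancellation before each chunk and
/// returns how many chunks it forwarded.
fn simulate_stream(cancel: Receiver<()>, total_chunks: usize) -> usize {
    let mut forwarded = 0;
    for _ in 0..total_chunks {
        if cancel.try_recv().is_ok() {
            break; // a newer message superseded this stream
        }
        forwarded += 1; // stand-in for sending one chunk to the client
    }
    forwarded
}

fn main() {
    let mut streams = ActiveStreams::new();
    let first = streams.start("session-1");
    // A new message arrives for the same session before the first stream ends.
    let second = streams.start("session-1");
    println!("first stream forwarded {} chunks", simulate_stream(first, 10));
    println!("second stream forwarded {} chunks", simulate_stream(second, 10));
}
```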
Thinking signals ({"type":"thinking"} and {"type":"thinking_clear"})
were leaking into the final HTML response when they appeared in the
middle or end of chunks, concatenated with regular content.
The previous check only looked at the start of chunks with
chunk.trim().starts_with('{'), which missed embedded signals.
Solution:
- Use regex to find ALL thinking signal JSON objects anywhere in the chunk
- Send each thinking signal separately to the frontend
- Remove thinking signals from the chunk before content processing
- Skip to next iteration if chunk contained only thinking signals
This prevents thinking signals from appearing in the final HTML output
and ensures they're properly handled by the frontend thinking indicator.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
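The extraction step can be sketched without the regex crate. This version swaps the regex for exact substring matching on the two signal strings, which assumes they are emitted in exactly this compact form with no extra whitespace. `split_thinking_signals` is a hypothetical name. It returns the signals in order of appearance together with the remaining content, and a caller skips the chunk when that content is blank.

```rust
/// Signal strings, assumed to be emitted verbatim (compact JSON, no spaces).
const SIGNALS: [&str; 2] = [r#"{"type":"thinking"}"#, r#"{"type":"thinking_clear"}"#];

/// Splits a chunk into (signals in order of appearance, remaining content),
/// finding signals anywhere in the chunk, not only at its start.
fn split_thinking_signals(chunk: &str) -> (Vec<&'static str>, String) {
    let mut signals = Vec::new();
    let mut content = String::with_capacity(chunk.len());
    let mut rest = chunk;
    while let Some(ch) = rest.chars().next() {
        if let Some(signal) = SIGNALS.iter().find(|s| rest.starts_with(**s)) {
            signals.push(*signal);
            rest = &rest[signal.len()..];
        } else {
            // Advance one character (not one byte) to stay on UTF-8 boundaries.
            content.push(ch);
            rest = &rest[ch.len_utf8()..];
        }
    }
    (signals, content)
}

fn main() {
    let chunk = r#"Hello{"type":"thinking"} world{"type":"thinking_clear"}"#;
    let (signals, content) = split_thinking_signals(chunk);
    for signal in &signals {
        println!("send to frontend separately: {signal}");
    }
    if content.trim().is_empty() {
        println!("chunk held only thinking signals; skip content processing");
    } else {
        println!("content: {content}");
    }
}
```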
- Kimi factory: add max_tokens=16384, temperature=1.0, top_p=1.0,
and chat_template_kwargs.thinking=true for kimi models
- Add chunk count traces in stream_response so we see LLM progress
immediately in logs: 'LLM chunk #N received (len=X)'
- Keep the generic stream parser clean: model-specific logic lives in
the request builder (Kimi factory pattern)
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
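The factory pattern above keeps model quirks in the request builder. A minimal sketch follows, assuming a hypothetical `ChatRequest` struct whose fields mirror the parameters listed, with `thinking` standing in for `chat_template_kwargs.thinking`. Detecting Kimi models by a case-insensitive "kimi" substring in the model name is also an assumption. The stream parser never sees these fields.

```rust
/// Hypothetical request parameters; not a specific client library's type.
#[derive(Debug, Default, Clone, PartialEq)]
struct ChatRequest {
    model: String,
    max_tokens: Option<u32>,
    temperature: Option<f32>,
    top_p: Option<f32>,
    /// Would be serialized as chat_template_kwargs.thinking.
    thinking: Option<bool>,
}

/// Builds a request and applies model-specific defaults in one place.
fn build_request(model: &str) -> ChatRequest {
    let mut request = ChatRequest {
        model: model.to_string(),
        ..Default::default()
    };
    // Assumed detection rule: any model name containing "kimi".
    if model.to_ascii_lowercase().contains("kimi") {
        request.max_tokens = Some(16384);
        request.temperature = Some(1.0);
        request.top_p = Some(1.0);
        request.thinking = Some(true);
    }
    request
}

fn main() {
    println!("{:?}", build_request("kimi-k2-instruct"));
    println!("{:?}", build_request("some-other-model"));
}
```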
Previously the recv_task awaited stream_response() directly, which
froze the entire WebSocket message receiver while the LLM ran (30s+).
This meant a second user message couldn't be processed until the
first LLM call finished, effectively locking the session.
Now stream_response runs in its own tokio::spawn task, keeping recv_task
free to handle new messages immediately. Also fixed a borrow/lifetime
issue by cloning the response channel sender out of the lock scope.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
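The decoupling can be sketched with std threads in place of tokio tasks. This is a swapped-in technique: the real code is async and uses tokio::spawn. `Session`, `recv_loop` and this `stream_response` are hypothetical stand-ins. The sketch shows the two points above. Each message gets its own worker, so the receive loop never waits on the LLM. The sender is cloned out of the lock, so no guard is held while the worker runs.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Shared session state with the response sender behind a lock (hypothetical layout).
struct Session {
    response_tx: Mutex<Sender<String>>,
}

/// Stand-in for the slow LLM call; a real one streams many chunks.
fn stream_response(message: String, tx: Sender<String>) {
    let _ = tx.send(format!("reply to: {message}"));
}

/// Receive loop: hands each message to its own worker and moves on.
fn recv_loop(session: Arc<Session>, incoming: Vec<String>) -> Vec<JoinHandle<()>> {
    let mut workers = Vec::new();
    for message in incoming {
        // The MutexGuard is a temporary dropped at the end of this statement,
        // so only the cloned sender moves into the worker.
        let tx = session.response_tx.lock().unwrap().clone();
        workers.push(thread::spawn(move || stream_response(message, tx)));
    }
    workers
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let session = Arc::new(Session { response_tx: Mutex::new(tx) });
    let workers = recv_loop(Arc::clone(&session), vec!["first".into(), "second".into()]);
    for worker in workers {
        worker.join().unwrap();
    }
    drop(session); // drop the last sender so the receiver iterator ends
    for reply in rx.iter() {
        println!("{reply}");
    }
}
```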
- Move all preprocessing transforms (convert_multiword_keywords, preprocess_llm_keyword,
convert_while_wend_syntax, predeclare_variables) into BasicCompiler::preprocess_basic
so .ast files are fully preprocessed by Drive Monitor
- Replace ScriptService compile/compile_preprocessed/compile_tool_script with
single run(ast_content) that does engine.compile() + eval_ast_with_scope()
- Remove .bas fallback in tool_executor and start.bas paths: .ast only
- Remove dead code: preprocess_basic_script, normalize_variables_to_lowercase,
convert_save_for_tools, parse_save_parts, normalize_word
- Fix: USE KB 'cartas' in a tool .ast is now correctly converted to USE_KB('cartas')
during compilation, so KB context injection works after tool execution
- Fix: add trace import in llm/mod.rs
- Log full LLM response preview (500 chars) with has_html detection
- Log WebSocket send with message type, completeness, and content preview
- Use clone() for chunk in BotResponse to ensure accurate logging
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
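The USE KB fix is a multiword-keyword rewrite applied during preprocessing. The sketch below is a single-keyword illustration, not the actual convert_multiword_keywords. `convert_use_kb` and `preprocess` are hypothetical names. Case-insensitive matching, exactly one space between USE and KB, and keeping the quoted argument verbatim are all assumptions.

```rust
const KEYWORD: &str = "USE KB ";

/// Rewrites a `USE KB 'name'` statement into `USE_KB('name')`,
/// preserving indentation and leaving other lines untouched.
fn convert_use_kb(line: &str) -> String {
    let trimmed = line.trim_start();
    let indent = &line[..line.len() - trimmed.len()];
    // `get` returns None instead of panicking if the cut is not a char boundary.
    match trimmed.get(..KEYWORD.len()) {
        Some(head) if head.eq_ignore_ascii_case(KEYWORD) => {
            let argument = trimmed[KEYWORD.len()..].trim();
            format!("{indent}USE_KB({argument})")
        }
        _ => line.to_string(),
    }
}

/// Applies the rewrite to every line of a script before compilation.
fn preprocess(source: &str) -> String {
    source.lines().map(convert_use_kb).collect::<Vec<_>>().join("\n")
}

fn main() {
    let script = "USE KB 'cartas'\nTALK \"Pronto\"";
    println!("{}", preprocess(script));
}
```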
When a tool button like Cartas activates a KB via USE KB, the handler no
longer returns just the tool result (empty or a label). It now checks
whether the session has active KBs; if so, and the result is empty or trivial,
it falls through to the full LLM pipeline, which injects the KB context.
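The fall-through rule reduces to a small decision. In this sketch, `Reply`, `is_trivial` and `handle_tool_result` are hypothetical. Treating a result as trivial when it is blank or merely repeats the tool's label is an assumption about what "empty/trivial" means.

```rust
/// What the handler does with a tool result (hypothetical type).
#[derive(Debug, PartialEq)]
enum Reply {
    /// Send the tool output as-is.
    ToolResult(String),
    /// Run the full LLM pipeline, which injects active KB context.
    RunLlmPipeline,
}

/// Assumed definition of "trivial": blank, or just the tool's label.
fn is_trivial(result: &str, tool_label: &str) -> bool {
    let result = result.trim();
    result.is_empty() || result.eq_ignore_ascii_case(tool_label)
}

fn handle_tool_result(result: String, tool_label: &str, active_kbs: &[String]) -> Reply {
    if !active_kbs.is_empty() && is_trivial(&result, tool_label) {
        Reply::RunLlmPipeline
    } else {
        Reply::ToolResult(result)
    }
}

fn main() {
    let kbs = vec!["cartas".to_string()];
    println!("{:?}", handle_tool_result(String::new(), "Cartas", &kbs));
    println!("{:?}", handle_tool_result(String::new(), "Cartas", &[]));
}
```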
When a bot lacks embedding-url in its own config, from_bot_config now
falls back to the default bot's config via ConfigManager::get_config.
Previously it returned an empty string, causing embedding server connection
failures for bots without explicit embedding configuration.
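The fallback order can be sketched with an in-memory map in place of ConfigManager::get_config. `Configs`, `embedding_url`, the "default" bot name and the localhost URL are placeholders. Treating a blank value the same as a missing one is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical name of the bot whose config acts as the fallback.
const DEFAULT_BOT: &str = "default";

/// In-memory stand-in for per-bot configuration.
struct Configs {
    by_bot: HashMap<String, HashMap<String, String>>,
}

impl Configs {
    /// Returns a non-blank value for `key` in `bot`'s config, if present.
    fn get(&self, bot: &str, key: &str) -> Option<&str> {
        self.by_bot
            .get(bot)?
            .get(key)
            .map(String::as_str)
            .filter(|value| !value.trim().is_empty())
    }
}

/// Looks up embedding-url for `bot`, falling back to the default bot's value
/// instead of returning an empty string.
fn embedding_url(configs: &Configs, bot: &str) -> Option<String> {
    configs
        .get(bot, "embedding-url")
        .or_else(|| configs.get(DEFAULT_BOT, "embedding-url"))
        .map(str::to_string)
}

fn main() {
    let mut by_bot = HashMap::new();
    // Placeholder URL for illustration only.
    by_bot.insert(
        DEFAULT_BOT.to_string(),
        HashMap::from([("embedding-url".to_string(), "http://localhost:8082".to_string())]),
    );
    by_bot.insert("salesianos".to_string(), HashMap::new());
    let configs = Configs { by_bot };
    println!("{:?}", embedding_url(&configs, "salesianos"));
}
```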
Header detection only checked for key,value, but the actual
CSV uses name,value as its header row. Now both are detected and skipped.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Removed an unconditional .skip(1) that was skipping the first config line
- Added header detection: the first line is skipped only if it looks like a 'key,value' header
- Added validation to skip empty keys
- Also fixed indentation in drive_monitor gbkb file processing
This fixes the issue where config.csv changes on Drive weren't being
synced to the bot_configuration database table for the salesianos bot.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
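The two config.csv fixes (conditional header skip accepting key,value or name,value, plus empty-key validation) can be combined in one sketch. `parse_config_csv` and `is_header` are hypothetical names. Splitting each line on its first comma and matching the header case-insensitively are assumptions, and this is not a full CSV parser: quoted fields are not handled.

```rust
/// True for a header row: key,value or name,value (case-insensitive, assumed).
fn is_header(key: &str, value: &str) -> bool {
    (key.eq_ignore_ascii_case("key") || key.eq_ignore_ascii_case("name"))
        && value.eq_ignore_ascii_case("value")
}

/// Parses config.csv lines into (key, value) pairs.
fn parse_config_csv(body: &str) -> Vec<(String, String)> {
    let mut entries = Vec::new();
    for (index, line) in body.lines().enumerate() {
        // Split on the first comma only, so values may contain commas.
        let Some((key, value)) = line.split_once(',') else {
            continue; // not a key,value line
        };
        let (key, value) = (key.trim(), value.trim());
        // Skip the first line only when it looks like a header,
        // instead of an unconditional .skip(1).
        if index == 0 && is_header(key, value) {
            continue;
        }
        if key.is_empty() {
            continue; // ignore rows with an empty key
        }
        entries.push((key.to_string(), value.to_string()));
    }
    entries
}

fn main() {
    let csv = "name,value\nembedding-url,http://localhost:8082\n,orphan";
    for (key, value) in parse_config_csv(csv) {
        println!("{key} = {value}");
    }
}
```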