Commit graph

177 commits

Author SHA1 Message Date
f9178e947e Use max_completion_tokens for GPT-5 models
All checks were successful
BotServer CI/CD / build (push) Successful in 3m13s
2026-04-16 16:21:16 -03:00
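Newer OpenAI reasoning models take `max_completion_tokens` in place of `max_tokens`. Below is a minimal sketch of choosing the field per model. It assumes GPT-5 models can be recognized by a model id starting with `gpt-5`. The function names `token_limit_field` and `token_limit_json` are hypothetical, and the JSON is hand-formatted only to keep the example std-only.

```rust
/// Returns the JSON field name used to cap output tokens for a model.
/// Assumption: any model id starting with "gpt-5" (case-insensitive) needs
/// `max_completion_tokens`; every other model keeps the classic `max_tokens`.
fn token_limit_field(model: &str) -> &'static str {
    if model.to_ascii_lowercase().starts_with("gpt-5") {
        "max_completion_tokens"
    } else {
        "max_tokens"
    }
}

/// Builds the token-limit fragment of a request body (hypothetical helper,
/// hand-formatted instead of using a JSON crate).
fn token_limit_json(model: &str, limit: u32) -> String {
    format!("\"{}\": {}", token_limit_field(model), limit)
}
```

Azure deployments can have arbitrary names, so a real client may need to key this decision on the detected provider rather than on the model id.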
05f2a5b2ab Add debug logging for LLM provider detection
All checks were successful
BotServer CI/CD / build (push) Successful in 9m0s
2026-04-16 15:13:15 -03:00
c603618865 Add AzureGPT5 client and provider detection
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Add AzureGPT5Client struct for Responses API
- Add AzureGPT5 to LLMProviderType enum
- Detect provider via azuregpt5 or gpt5 in llm-provider config
- Fix gpt_oss_120b.rs chars.peek() issue
2026-04-16 15:07:23 -03:00
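Here is a sketch of the provider detection described above. Only `LLMProviderType` and `AzureGPT5` come from the commit. The `OpenAI` fallback variant, case-insensitive matching and substring search are assumptions.

```rust
/// Hypothetical subset of the provider enum; only `AzureGPT5` is named in the commit.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum LLMProviderType {
    OpenAI,
    AzureGPT5,
}

/// Maps the `llm-provider` config value to a provider type.
/// Assumption: matching is case-insensitive and substring-based.
/// "azuregpt5" already contains "gpt5"; the explicit check just mirrors the commit text.
fn detect_provider(llm_provider: &str) -> LLMProviderType {
    let value = llm_provider.trim().to_ascii_lowercase();
    if value.contains("azuregpt5") || value.contains("gpt5") {
        LLMProviderType::AzureGPT5
    } else {
        LLMProviderType::OpenAI
    }
}
```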
5d1a87f145 docs: Add comment to GptOss120bHandler
All checks were successful
BotServer CI/CD / build (push) Successful in 58s
2026-04-16 09:38:20 -03:00
355854bc6b Fix: proper process_content_streaming for GptOss120bHandler to handle thinking tags during streaming
All checks were successful
BotServer CI/CD / build (push) Successful in 1m7s
2026-04-16 09:12:27 -03:00
e63c187f32 Improve LLM streaming logs: start/end with size/content, first/last bytes; fix html2md function name
Some checks failed
BotServer CI/CD / build (push) Failing after 16m34s
2026-04-15 18:15:42 -03:00
adbf84f812 refactor: move verbose logs from info! to trace!
All checks were successful
BotServer CI/CD / build (push) Successful in 3m25s
Move detailed LLM and DriveMonitor logs from info! to trace!
to reduce noise in production logs:

- bot/mod.rs: LLM chunk logs, streaming start, abort
- llm/mod.rs: LLM Request Details, provider creation logs

These logs are useful for debugging but generate a lot of noise in production.
With trace!, they only appear when RUST_LOG=trace is set.
2026-04-15 12:41:31 -03:00
21591e22dd fix: remove unused backend code
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
2026-04-14 17:56:56 -03:00
62cdf1c638 fix: handle GLM 4.7 reasoning_content and chat_template_kwargs
All checks were successful
BotServer CI/CD / build (push) Successful in 3m29s
2026-04-14 17:14:04 -03:00
ba3e2675ef feat: stateful thinking tag stripping for Kimi, Minimax and DeepSeek stream
All checks were successful
BotServer CI/CD / build (push) Successful in 3m42s
2026-04-14 16:15:31 -03:00
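Stateful stripping matters because a tag such as `<think>` can be split across two stream chunks. Below is a minimal sketch using a hypothetical `ThinkStripper` type. It assumes `<think>`/`</think>` delimiters for all three models, since the commit does not name the tags. The stripper buffers only the tail that could still grow into a tag, and it drops text while inside a thinking block.

```rust
// Assumption: these are the delimiters the models emit.
const OPEN: &str = "<think>";
const CLOSE: &str = "</think>";

/// Hypothetical stateful filter for streamed text.
#[derive(Default)]
struct ThinkStripper {
    in_think: bool,  // true while inside an unclosed <think> block
    pending: String, // buffered text that may still contain a partial tag
}

impl ThinkStripper {
    /// Feeds one chunk and returns the text that is safe to show the user.
    fn push(&mut self, chunk: &str) -> String {
        self.pending.push_str(chunk);
        let mut out = String::new();
        loop {
            let tag = if self.in_think { CLOSE } else { OPEN };
            if let Some(pos) = self.pending.find(tag) {
                if !self.in_think {
                    out.push_str(&self.pending[..pos]);
                }
                self.pending.replace_range(..pos + tag.len(), "");
                self.in_think = !self.in_think;
                continue;
            }
            // No complete tag: keep only a suffix that could still become `tag`.
            let keep = partial_suffix_len(&self.pending, tag);
            let cut = self.pending.len() - keep;
            if !self.in_think {
                out.push_str(&self.pending[..cut]);
            }
            self.pending.replace_range(..cut, "");
            return out;
        }
    }

    /// Flushes buffered text at end of stream; an unclosed <think> is dropped.
    fn finish(&mut self) -> String {
        let rest = std::mem::take(&mut self.pending);
        if self.in_think { String::new() } else { rest }
    }
}

/// Length of the longest suffix of `s` that is a proper prefix of `tag`.
fn partial_suffix_len(s: &str, tag: &str) -> usize {
    (1..tag.len().min(s.len() + 1))
        .rev()
        .find(|&n| s.is_char_boundary(s.len() - n) && tag.starts_with(&s[s.len() - n..]))
        .unwrap_or(0)
}
```

For example, the chunks `"Hello <th"`, `"ink>secret</thi"` and `"nk> world"` yield `"Hello "`, `""` and `" world"` in turn.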
8ccc4e1c5e fix: update Minimax and DeepSeek handlers to strip unclosed thinking tags
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
2026-04-14 16:13:41 -03:00
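For complete (non-streamed) text, an unclosed tag can be handled by treating everything after a dangling `<think>` as reasoning. The sketch below assumes `<think>`/`</think>` delimiters. `strip_think_blocks` is a hypothetical stand-in, not the `strip_think_tags` function mentioned elsewhere in this log.

```rust
/// Removes every <think>...</think> block. If a <think> is never closed,
/// everything from it to the end is treated as reasoning and dropped.
/// Assumption: these exact tag strings are what the models emit.
fn strip_think_blocks(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find("<think>") {
        out.push_str(&rest[..start]);
        let after_open = &rest[start + "<think>".len()..];
        match after_open.find("</think>") {
            Some(end) => rest = &after_open[end + "</think>".len()..],
            None => return out, // unclosed tag: drop the rest of the text
        }
    }
    out.push_str(rest);
    out
}
```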
a6f825526f fix: LLM duplicate URL path and episodic memory bot context/defaults
All checks were successful
BotServer CI/CD / build (push) Successful in 3m53s
2026-04-14 15:55:49 -03:00
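Here is a minimal sketch of avoiding a duplicated path when the configured URL may already be the full endpoint. It assumes the OpenAI-style `/chat/completions` suffix. `chat_completions_url` is a hypothetical helper.

```rust
/// Joins a configured base URL with the chat-completions path without
/// appending it twice when the config already holds the full endpoint.
/// Assumption: the endpoint suffix is "/chat/completions".
fn chat_completions_url(base: &str) -> String {
    const SUFFIX: &str = "/chat/completions";
    let trimmed = base.trim_end_matches('/');
    if trimmed.ends_with(SUFFIX) {
        trimmed.to_string()
    } else {
        format!("{}{}", trimmed, SUFFIX)
    }
}
```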
32f8a10825 fix: normalize episodic/compact roles to system in all LLM providers
All checks were successful
BotServer CI/CD / build (push) Successful in 4m1s
2026-04-14 13:47:18 -03:00
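Role normalization can be sketched as a small mapping applied before a request is built. The role names `episodic` and `compact` come from the commit. Representing messages as `(role, content)` pairs and the function names are assumptions.

```rust
/// Maps internal roles that providers do not understand onto "system".
/// Assumption: only "episodic" and "compact" need mapping; other roles pass through.
fn normalize_role(role: &str) -> &str {
    match role {
        "episodic" | "compact" => "system",
        other => other,
    }
}

/// Hypothetical message list as (role, content) pairs, normalized in place.
fn normalize_roles(messages: &mut [(String, String)]) {
    for (role, _content) in messages.iter_mut() {
        let normalized = normalize_role(role.as_str()).to_string();
        *role = normalized;
    }
}
```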
d6527a438b fix: normalize roles to system for bedrock and vertex LLM providers
2026-04-14 13:44:12 -03:00
f04745ae1c fix: DriveMonitor loop performance and WebSocket blocking
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Remove excessive trace/debug logging in hot loops

- Fix broadcast_theme_change lock contention by cloning channels before iterating

- Increase default sleep interval from 10s to 30s

- Remove [MODULE] prefixes from log messages

- Fix PDF re-download bug by using only last_modified (not ETag) for change detection

- Re-enable DriveMonitor in bootstrap (was disabled for testing)
2026-04-14 13:42:23 -03:00
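Below is a sketch of the clone-before-iterating fix for broadcast_theme_change. It assumes connections are tracked as a mutex-guarded list of channel senders; the commit does not say which channel type is used. This is not the project's function; it only illustrates the pattern.

```rust
use std::sync::mpsc::Sender;
use std::sync::Mutex;

/// Hypothetical registry of per-connection senders guarded by one mutex.
struct Channels {
    senders: Mutex<Vec<Sender<String>>>,
}

impl Channels {
    /// Sends `theme` to every registered connection and returns how many
    /// sends succeeded. The lock is held only while cloning the senders,
    /// never while sending, so other tasks can register or remove
    /// connections in the meantime.
    fn broadcast_theme_change(&self, theme: &str) -> usize {
        // The temporary MutexGuard is dropped at the end of this statement.
        let snapshot: Vec<Sender<String>> = self.senders.lock().unwrap().clone();
        snapshot
            .iter()
            .filter(|tx| tx.send(theme.to_string()).is_ok())
            .count()
    }
}
```

With std's unbounded channels a send never blocks. The pattern matters more when sends are async or bounded, because holding a lock across them stalls every other user of the registry.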
679bf05504 fix: Kimi K2.5 factory + LLM chunk traces
All checks were successful
BotServer CI/CD / build (push) Successful in 4m35s
- Kimi factory: add max_tokens=16384, temperature=1.0, top_p=1.0,
  and chat_template_kwargs.thinking=true for kimi models
- Add chunk count traces in stream_response so we see LLM progress
  immediately in logs: 'LLM chunk #N received (len=X)'
- Keep generic stream parser clean — model-specific logic lives in
  the request builder (Kimi factory pattern)

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-04-14 10:20:02 -03:00
ed3406dd80 revert: restore working LLM streaming code from 260a13e7
All checks were successful
BotServer CI/CD / build (push) Successful in 11m59s
The recent LLM changes (timeouts, tool call accumulation, extra logging)
broke the WebSocket message flow. Reverting to the known working version.
2026-04-14 01:15:20 -03:00
301a7dda33 Add LLM stream timeout and debug logs
All checks were successful
BotServer CI/CD / build (push) Successful in 4m8s
2026-04-14 00:55:43 -03:00
da9facf036 fix: add 5s connect_timeout to LLM HTTP client so unreachable APIs fail fast
All checks were successful
BotServer CI/CD / build (push) Successful in 3m52s
Without connect_timeout, reqwest can hang for the full 60s timeout
when the remote server is unreachable (DNS, TCP connect, etc.).
Now fails in 5s max for connection issues, 30s for full request.

This means one user's LLM failure no longer blocks new users for
a full minute — the channel closes quickly and the WebSocket is freed.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-04-13 23:54:50 -03:00
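In reqwest, this split is expressed with `ClientBuilder::connect_timeout` and `ClientBuilder::timeout`. The sketch below shows the same idea with std's `TcpStream` so it needs no extra crates. `connect_fail_fast` is a hypothetical name. A std read timeout bounds each individual read call, not the whole request.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Bounds the TCP handshake separately from reads, so an unreachable host
/// fails after `connect` instead of waiting for the longer `read` budget.
fn connect_fail_fast(addr: SocketAddr, connect: Duration, read: Duration) -> io::Result<TcpStream> {
    let stream = TcpStream::connect_timeout(&addr, connect)?; // fails after `connect`
    stream.set_read_timeout(Some(read))?; // each read fails after `read`
    Ok(stream)
}
```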
3ec72f6121 fix: add 60s timeout to OpenAI-compatible HTTP client preventing LLM deadlock
All checks were successful
BotServer CI/CD / build (push) Successful in 4m2s
reqwest::Client::new() has no timeout — when external APIs (NVIDIA,
Groq, etc.) hang or throttle, the request blocks forever, freezing the
entire response pipeline for the user.

Also add std::time::Duration import to llm/mod.rs.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-04-13 23:31:12 -03:00
25d6d2fd57 fix: eliminate LLM keyword deadlock with isolated worker thread
All checks were successful
BotServer CI/CD / build (push) Successful in 3m32s
The previous fix used Handle::current().block_on() which deadlocks when
the Rhai engine runs on a Tokio worker thread — it blocks the very
thread the async task needs to make progress.

New approach: spawn a dedicated background thread with its own
single-threaded Tokio runtime, communicate via mpsc channel with a
45s timeout. This completely isolates the LLM runtime from the
caller's runtime, eliminating any possibility of thread starvation
or nested-runtime deadlock.

Also remove unused 'trace' import from llm/mod.rs.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-04-13 23:20:10 -03:00
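The isolation pattern can be sketched with a dedicated OS thread and `Receiver::recv_timeout`. In the commit, the thread hosts its own single-threaded Tokio runtime (for example one built with `tokio::runtime::Builder::new_current_thread()`). To stay std-only, the job here is a plain blocking closure, and `run_isolated` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `job` on a dedicated OS thread and waits at most `timeout` for its
/// result, so the caller never blocks the thread its own async task needs.
/// Assumption: a blocking closure stands in for the per-thread Tokio runtime.
fn run_isolated<T, F>(job: F, timeout: Duration) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already timed out, the receiver is gone; ignore the error.
        let _ = tx.send(job());
    });
    rx.recv_timeout(timeout)
        .map_err(|e| format!("LLM worker did not answer: {e}"))
}
```

On timeout the worker thread is not cancelled. It runs until the job returns, and its final send then fails silently. A panicking job drops the sender, which also surfaces as an `Err`.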
7d911194f3 fix: disable all thinking detection to prevent deadlock
All checks were successful
BotServer CI/CD / build (push) Successful in 3m36s
2026-04-13 22:47:27 -03:00
3ddcc5a1d1 fix: simplify MinimaxHandler without regex
All checks were successful
BotServer CI/CD / build (push) Successful in 3m17s
2026-04-13 21:35:41 -03:00
6acf5fb4c0 fix: add reasoning_split=false to Minimax API calls
All checks were successful
BotServer CI/CD / build (push) Successful in 3m8s
2026-04-13 21:22:02 -03:00
2c82a8bd2e fix: add MinimaxHandler to strip thinking tags from content
All checks were successful
BotServer CI/CD / build (push) Successful in 4m8s
2026-04-13 21:17:01 -03:00
ea8857ec8a debug: trace LLM delta content for Minimax
All checks were successful
BotServer CI/CD / build (push) Successful in 4m7s
2026-04-13 21:08:46 -03:00
22e94f32ed fix: filter reasoning when content exists (Minimax/GLM/Kimi)
All checks were successful
BotServer CI/CD / build (push) Successful in 3m57s
2026-04-13 20:58:58 -03:00
517d5435a9 fix: add max_tokens 131072 to OpenAI provider for minimax
All checks were successful
BotServer CI/CD / build (push) Successful in 4m25s
2026-04-13 20:26:03 -03:00
1b040a4278 fix: kimi stream - handle channel close, add parse error trace, fix buffer handling
All checks were successful
BotServer CI/CD / build (push) Successful in 3m27s
2026-04-13 19:59:00 -03:00
79997fc3b3 fix: kimi max_tokens 131072, add stream traces, content chars tracking
All checks were successful
BotServer CI/CD / build (push) Successful in 4m12s
2026-04-13 19:42:31 -03:00
4d9d38ffda fix: enable chat_template_kwargs for GLM thinking mode, add stream traces, fix config_manager scope
All checks were successful
BotServer CI/CD / build (push) Successful in 3m55s
2026-04-13 19:23:19 -03:00
d6ffe265ef fix: GLM max_tokens 131072, disable thinking kwargs - model sends content after reasoning naturally
All checks were successful
BotServer CI/CD / build (push) Successful in 3m11s
2026-04-13 18:52:02 -03:00
87df733db0 fix: GLM client - add chat_template_kwargs, handle reasoning_content, increase max_tokens to 16384
All checks were successful
BotServer CI/CD / build (push) Successful in 5m52s
2026-04-13 18:33:16 -03:00
8a65afbfc5 feat: add [BASIC_EXEC] traces for start, tool, scheduler, webhook execution
All checks were successful
BotServer CI/CD / build (push) Successful in 3m18s
2026-04-13 18:16:01 -03:00
5a24137a5b fix: remove extra_body param from GLM client - NVIDIA API rejects it
All checks were successful
BotServer CI/CD / build (push) Successful in 5m58s
2026-04-13 17:57:02 -03:00
81c60ceb25 feat: add Kimi client and GLM thinking mode support, fix tool exec direct return
All checks were successful
BotServer CI/CD / build (push) Successful in 6m22s
2026-04-13 17:36:31 -03:00
60fd3dbbc4 fix: handle both reasoning_content and reasoning fields for NVIDIA API
All checks were successful
BotServer CI/CD / build (push) Successful in 3m12s
2026-04-13 16:21:05 -03:00
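Here is a sketch of reading reasoning text from whichever field a provider sends. The `Delta` struct is hypothetical. Only the field names `reasoning_content` and `reasoning` come from the commit. Preferring `reasoning_content` when both are present is an assumption.

```rust
/// Hypothetical view of one streamed delta; only the field names come from the commit.
#[derive(Default)]
struct Delta {
    reasoning_content: Option<String>,
    reasoning: Option<String>,
}

impl Delta {
    /// Returns reasoning text from either field, preferring `reasoning_content`
    /// and treating empty strings as absent.
    fn reasoning_text(&self) -> Option<&str> {
        self.reasoning_content
            .as_deref()
            .filter(|s| !s.is_empty())
            .or_else(|| self.reasoning.as_deref().filter(|s| !s.is_empty()))
    }
}
```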
8ddcde4830 fix: detect NVIDIA API as GLM provider, handle full URL path
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
2026-04-13 16:18:00 -03:00
498c771d7b feat: add thinking indicator for reasoning models (GLM4.7, Kimi K2.5)
All checks were successful
BotServer CI/CD / build (push) Successful in 3m27s
- Show thinking indicator while LLM is in reasoning mode
- Skip reasoning content (thinking text) from user response
- Only show actual HTML content after thinking ends

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-04-13 15:35:22 -03:00
3e99235a49 fix: support reasoning models (GLM4.7, Kimi K2.5) - use reasoning_content when content is null
All checks were successful
BotServer CI/CD / build (push) Successful in 3m19s
- GLM4.7 and Kimi K2.5 send response in 'reasoning_content' field, 'content' is null
- Prefer 'content' for normal models, fallback to 'reasoning_content' for reasoning models
- Fixes blank white screen when using z-ai/glm4.7 model

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-04-13 15:18:21 -03:00
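This commit describes preferring `content` and falling back to `reasoning_content` when `content` is null. The earlier commit dd68cdbe6c in this log had stopped that fallback. A hypothetical helper with a flag makes both policies explicit. Treating an empty `content` like null is an assumption.

```rust
/// Picks the text to show for a response.
/// `allow_reasoning_fallback = true` mirrors this commit; `false` mirrors the
/// earlier dd68cdbe6c behavior that never shows reasoning_content.
fn visible_text<'a>(
    content: Option<&'a str>,
    reasoning_content: Option<&'a str>,
    allow_reasoning_fallback: bool,
) -> Option<&'a str> {
    match content {
        Some(c) if !c.is_empty() => Some(c),
        _ if allow_reasoning_fallback => reasoning_content,
        _ => None,
    }
}
```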
c5d30adebe revert: restore llm/mod.rs to stable April 9 version
All checks were successful
BotServer CI/CD / build (push) Successful in 3m26s
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-04-13 15:07:19 -03:00
f8b47d1ac2 refactor: unify BASIC compilation into BasicCompiler only, runtime uses ScriptService::run() on pre-compiled .ast
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Move all preprocessing transforms (convert_multiword_keywords, preprocess_llm_keyword,
  convert_while_wend_syntax, predeclare_variables) into BasicCompiler::preprocess_basic
  so .ast files are fully preprocessed by Drive Monitor
- Replace ScriptService compile/compile_preprocessed/compile_tool_script with
  single run(ast_content) that does engine.compile() + eval_ast_with_scope()
- Remove .bas fallback in tool_executor and start.bas paths - .ast only
- Remove dead code: preprocess_basic_script, normalize_variables_to_lowercase,
  convert_save_for_tools, parse_save_parts, normalize_word
- Fix: USE KB 'cartas' in tool .ast now correctly converted to USE_KB('cartas')
  during compilation, ensuring KB context injection works after tool execution
- Fix: add trace import in llm/mod.rs
2026-04-13 14:05:55 -03:00
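Below is a deliberately narrow sketch of the multiword-keyword rewrite for the `USE KB` case mentioned above. `convert_use_kb` is hypothetical and handles only `USE KB` followed by a single argument. The actual `convert_multiword_keywords` is not shown in this log.

```rust
/// Rewrites `USE KB 'name'` into the `USE_KB('name')` call form.
/// Hypothetical: one keyword, one argument, case-insensitive keyword match,
/// leading indentation preserved; any other line is returned unchanged.
fn convert_use_kb(line: &str) -> String {
    let trimmed = line.trim_start();
    let indent = &line[..line.len() - trimmed.len()];
    // ASCII upper-casing keeps byte offsets identical to `trimmed`.
    let upper = trimmed.to_ascii_uppercase();
    if let Some(rest) = upper.strip_prefix("USE KB ") {
        // Take the argument from the original text so its case is preserved.
        let arg = trimmed[trimmed.len() - rest.len()..].trim();
        return format!("{indent}USE_KB({arg})");
    }
    line.to_string()
}
```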
723407cfd6 fix: add 60s timeout to LLM stream reads and add concurrent scan guard
All checks were successful
BotServer CI/CD / build (push) Successful in 3m53s
- Add tokio timeout to SSE stream reads in OpenAI client (60s)
- Prevents indefinite hang when Kimi/Nvidia stops responding
- Add scanning AtomicBool to prevent concurrent check_gbkb_changes calls
- Skip GBKB scan entirely when all KBs already indexed in Qdrant

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-04-13 12:58:11 -03:00
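The concurrent-scan guard can be sketched with `AtomicBool::compare_exchange`, so only the caller that flips the flag from false to true runs the scan. `Monitor`, `ScanGuard` and `try_begin_scan` are hypothetical names. Releasing the flag in `Drop` is a design choice, so that early returns and panics cannot leave it stuck.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical monitor whose scans must not overlap.
struct Monitor {
    scanning: AtomicBool,
}

/// Clears the flag when dropped, releasing the scan slot.
struct ScanGuard<'a>(&'a AtomicBool);

impl Drop for ScanGuard<'_> {
    fn drop(&mut self) {
        self.0.store(false, Ordering::Release);
    }
}

impl Monitor {
    /// Starts a scan if none is running; returns None if one already is.
    fn try_begin_scan(&self) -> Option<ScanGuard<'_>> {
        self.scanning
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .ok()
            .map(|_| ScanGuard(&self.scanning))
    }
}
```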
d1652fc413 feat: add build_date to health endpoint for CI deploy verification
All checks were successful
BotServer CI/CD / build (push) Successful in 4m21s
- Add BOTSERVER_BUILD_DATE env var to /api/health response
- Set build date during CI compilation via environment variable
- Enables checking deployed binary age without SSH access

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-04-13 11:49:10 -03:00
dd68cdbe6c fix: remove hardcoded salesianos, strip think tags globally, block reasoning_content leak
All checks were successful
BotServer CI/CD / build (push) Successful in 6m38s
- drive_monitor: replace hardcoded salesianos.gbot with dynamic bot_name
- llm/mod.rs: stop falling back to reasoning_content as content
- llm/claude.rs: same fix for Claude handler
- deepseek_r3: export strip_think_tags for reuse
- gpt_oss_20b: use strip_think_tags so all models strip tags
- gpt_oss_120b: use strip_think_tags so all models strip tags
2026-04-13 09:04:22 -03:00
1977c4c0af fix: extract base URL for embedding health checks
All checks were successful
BotServer CI/CD / build (push) Successful in 4m2s
- Add extract_base_url() helper to parse scheme://host:port from full URLs
- Fix health check to use base URL instead of full endpoint path
- Allows embedding-url config like http://host:port/v1/embeddings to work correctly
- Health check now goes to http://host:port/health instead of http://host:port/v1/embeddings/health
2026-04-12 19:33:35 -03:00
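Here is a std-only sketch of what a helper like `extract_base_url()` might do. The signature and the `health_url` wrapper are assumptions, and a production version would more likely use a URL-parsing crate.

```rust
/// Returns "scheme://host[:port]" from a full URL, dropping any path,
/// query or fragment.
fn extract_base_url(url: &str) -> &str {
    let after_scheme = url.find("://").map(|i| i + 3).unwrap_or(0);
    let end = url[after_scheme..]
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .map(|i| after_scheme + i)
        .unwrap_or(url.len());
    &url[..end]
}

/// Hypothetical wrapper that builds the health-check URL from an embedding URL.
fn health_url(embedding_url: &str) -> String {
    format!("{}/health", extract_base_url(embedding_url))
}
```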
2f3dd957e3 fix: resolve kb_collections and kb_group_associations imports for directory feature
All checks were successful
BotServer CI/CD / build (push) Successful in 7m50s
- Extract kb_collections and kb_group_associations into dedicated schema/kb.rs module
- Gate kb module behind rbac feature (directory depends on rbac)
- Remove duplicate definitions from research.rs
- Fix import paths in directory/groups/kbs.rs
- Remove dead rbac_kb imports from settings/rbac.rs
- Gate llm::local module behind llm feature to fix missing set_embedding_server_ready

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-04-12 12:48:42 -03:00
180bab0358 fix: mark embedding server ready when already running
All checks were successful
BotServer CI/CD / build (push) Successful in 3m36s
Previously, ensure_llama_servers_running() would return early
when both LLM and embedding servers were already running, without
calling set_embedding_server_ready(true). This caused DriveMonitor
to skip KB indexing with 'Embedding server not yet marked ready'.

Fix: call set_embedding_server_ready(true) before returning early
when servers are already running.
2026-04-12 10:27:23 -03:00
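The control-flow fix can be sketched with a global flag. Only `set_embedding_server_ready` is named in the commit, and its body here is assumed. `EMBEDDING_READY`, `is_embedding_server_ready` and `ensure_servers_running` are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical global readiness flag consulted before KB indexing.
static EMBEDDING_READY: AtomicBool = AtomicBool::new(false);

fn set_embedding_server_ready(ready: bool) {
    EMBEDDING_READY.store(ready, Ordering::SeqCst);
}

fn is_embedding_server_ready() -> bool {
    EMBEDDING_READY.load(Ordering::SeqCst)
}

/// Sketch of the fixed control flow; returns true if servers had to be started.
fn ensure_servers_running(llm_running: bool, embedding_running: bool) -> bool {
    if llm_running && embedding_running {
        set_embedding_server_ready(true); // the call the early return used to skip
        return false;
    }
    // Starting the missing servers is omitted; assume startup succeeded.
    set_embedding_server_ready(true);
    true
}
```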
be3e4c4e54 Fix: Handle 'reasoning' field from NVIDIA kimi-k2.5 model
All checks were successful
BotServer CI/CD / build (push) Successful in 3m6s
2026-04-11 22:58:50 -03:00
47cb470c8e Fix: Handle reasoning_content from NVIDIA reasoning models (gpt-oss-120b)
All checks were successful
BotServer CI/CD / build (push) Successful in 3m16s
2026-04-11 22:30:39 -03:00