fix: Skip health check for remote HTTPS embedding APIs and wait for server in single-file indexing

Two fixes for KB indexing failures with Cloudflare Workers AI:
1. check_health() now short-circuits for HTTPS URLs (remote APIs like
   Cloudflare don't have /health endpoints and return 401/301/403 on
   probes, which were incorrectly treated as 'unreachable')
2. index_single_file_with_id() now calls wait_for_server(30) instead
   of immediately failing, giving the embedding server time to become
   ready

Root cause: EMBEDDING_SERVER_READY is a global flag. When the default
bot's local embedding server check fails, it blocks ALL bots including
those using remote HTTPS APIs that don't need a local health check.
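
The root cause can be illustrated with a simplified model. This is a sketch under stated assumptions, not the project's code: the flag's real definition is not shown in this commit, so the `AtomicBool` and both `can_index_*` helpers are hypothetical stand-ins.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Assumed shape of the process-wide flag; the real definition is not in this diff.
static EMBEDDING_SERVER_READY: AtomicBool = AtomicBool::new(false);

pub fn set_embedding_server_ready(ready: bool) {
    EMBEDDING_SERVER_READY.store(ready, Ordering::SeqCst);
}

pub fn is_embedding_server_ready() -> bool {
    EMBEDDING_SERVER_READY.load(Ordering::SeqCst)
}

/// Hypothetical pre-fix gate: every bot consults the same global flag,
/// so the bot's own embedding URL plays no part in the decision.
pub fn can_index_before_fix(_embedding_url: &str) -> bool {
    is_embedding_server_ready()
}

/// Hypothetical post-fix gate: an HTTPS URL marks the flag ready itself,
/// regardless of what an earlier local probe recorded.
pub fn can_index_after_fix(embedding_url: &str) -> bool {
    if embedding_url.starts_with("https://") {
        set_embedding_server_ready(true);
    }
    is_embedding_server_ready()
}
```

In this model a failed local probe leaves the flag `false`, and `can_index_before_fix` then rejects a bot configured with an HTTPS URL. Because the flag stays global, the post-fix path also marks it ready for every other bot in the process.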
Rodrigo Rodriguez (Pragmatismo) 2026-04-22 17:06:03 +00:00
parent 248165c3cb
commit 6bf879a78a
2 changed files with 18 additions and 7 deletions

@@ -299,7 +299,16 @@ impl KbEmbeddingGenerator {
     }
     pub async fn check_health(&self) -> bool {
-        // Strategy: try /health endpoint on BASE URL first.
+        // Remote HTTPS APIs (Cloudflare Workers AI, OpenAI, etc.) are assumed available
+        // — they don't have /health endpoints and return 401/403/301 on probe.
+        // Only local servers need TCP health checks.
+        if self.config.embedding_url.starts_with("https://") {
+            info!("Embedding server is remote HTTPS API ({}), assuming available", self.config.embedding_url);
+            set_embedding_server_ready(true);
+            return true;
+        }
+        // Strategy for local servers: try /health endpoint on BASE URL first.
         // - 200 OK → local server with health endpoint, ready
         // - 404/405 etc → server is reachable but has no /health (remote API or llama.cpp)
         // - Connection refused/timeout → server truly unavailable
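
The decision order in this hunk can be sketched without any HTTP client. This is a minimal illustration, not the project's implementation: `Probe` and `embedding_available` are hypothetical names, and rejecting statuses other than 200/404/405 is an assumption drawn from the commit message, which says 401/301/403 responses were treated as unreachable.

```rust
/// Outcome of a hypothetical GET to `<base>/health` (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Probe {
    Status(u16),
    ConnectionFailed,
}

/// Returns true when the embedding endpoint should be treated as available.
/// `probe` is only invoked for non-HTTPS URLs, mirroring the early return above.
pub fn embedding_available(embedding_url: &str, probe: impl FnOnce() -> Probe) -> bool {
    // Remote HTTPS APIs are assumed available; no probe is sent.
    if embedding_url.starts_with("https://") {
        return true;
    }
    match probe() {
        // Server with a working health endpoint.
        Probe::Status(200) => true,
        // Reachable, but exposes no /health route.
        Probe::Status(404) | Probe::Status(405) => true,
        // Assumption: any other status counts as unavailable.
        Probe::Status(_) => false,
        // Connection refused or timed out.
        Probe::ConnectionFailed => false,
    }
}
```

Note that `starts_with("https://")` is a case-sensitive prefix test, so in this sketch a URL written as `HTTPS://...` would still be probed.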

@@ -549,12 +549,14 @@ pub async fn index_single_file_with_id(
         kb_name: &str,
         file_path: &Path,
         document_id: Option<&str>,
     ) -> Result<IndexingResult> {
         if !is_embedding_server_ready() {
-            return Err(anyhow::anyhow!(
-                "Embedding server not available. Cannot index file."
-            ));
-        }
+            if !self.embedding_generator.wait_for_server(30).await {
+                return Err(anyhow::anyhow!(
+                    "Embedding server not available. Cannot index file."
+                ));
+            }
+        }
         if !self.check_qdrant_health().await.unwrap_or(false) {
             return Err(anyhow::anyhow!(
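
The body of `wait_for_server` is not part of this diff, and the unit of its argument (`30`) is not stated. A bounded wait of this kind could look like the following synchronous sketch; `wait_until_ready`, its parameters, and the use of `std::thread::sleep` instead of an async timer are all assumptions made for illustration.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `is_ready` until it returns true or `timeout` elapses.
/// Returns true if the check succeeded before the deadline.
pub fn wait_until_ready(
    mut is_ready: impl FnMut() -> bool,
    timeout: Duration,
    interval: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if is_ready() {
            return true;
        }
        let now = Instant::now();
        if now >= deadline {
            return false;
        }
        // Sleep for the poll interval, but never past the deadline.
        sleep(interval.min(deadline - now));
    }
}
```

Compared with failing immediately, a loop like this lets a server that is still starting up pass the check, at the cost of delaying the error by up to `timeout` when the server never comes up.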