Reduce embedding batch_size from 16 to 2 to prevent llama-server crash
All checks were successful
BotServer CI/CD / build (push) Successful in 2m3s
The bge-small-en-v1.5-f32.gguf model has n_ctx_train=512. With batch_size=16 and ~300+ tokens per chunk, the total token count exceeds 512, causing a GGML_ASSERT crash in llama-server. With batch_size=2, embedding batches stay within the context window.
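The fix hard-codes batch_size=2, but the underlying constraint is batch_size × tokens_per_chunk ≤ n_ctx_train. A minimal sketch of deriving the batch size from that budget instead of hard-coding it (function and variable names here are illustrative, not from the codebase):

```rust
/// Largest batch size whose worst-case total token count fits in the
/// model's training context window. Assumes the server packs the whole
/// batch into one context, as the crash above suggests.
fn max_safe_batch_size(n_ctx_train: usize, max_tokens_per_chunk: usize) -> usize {
    // Integer division gives how many worst-case chunks fit; always
    // allow at least one so oversized chunks fail downstream, not here.
    (n_ctx_train / max_tokens_per_chunk).max(1)
}

fn main() {
    // With ~300-token chunks and n_ctx_train = 512, only one chunk fits
    // per batch; batch_size = 2 is safe only if chunks stay <= 256 tokens.
    println!("{}", max_safe_batch_size(512, 300)); // prints "1"
    println!("{}", max_safe_batch_size(512, 256)); // prints "2"
}
```

Note that with ~300-token chunks even batch_size=2 can exceed 512 tokens; the chosen value implicitly assumes chunks average at most 256 tokens.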
This commit is contained in:
parent
f48fa6d5f0
commit
cd0e049e81
1 changed file with 1 addition and 1 deletion
@@ -43,7 +43,7 @@ impl Default for EmbeddingConfig {
             embedding_model: "BAAI/bge-multilingual-gemma2".to_string(),
             embedding_key: None,
             dimensions: 2048,
-            batch_size: 16,
+            batch_size: 2,
             timeout_seconds: 60,
             max_concurrent_requests: 1,
             connect_timeout_seconds: 10,