# LLM Package - Large Language Model Integration
## Purpose

Manages large language model integration and operations, providing a unified interface for working with multiple LLM providers.
## Key Files
- `bedrock.rs`: AWS Bedrock integration
- `cache.rs`: LLM response caching
- `claude.rs`: Anthropic Claude integration
- `context/`: Context management for LLM conversations
- `episodic_memory.rs`: Episodic memory for LLM interactions
- `glm.rs`: GLM model integration
- `hallucination_detector.rs`: Hallucination detection
- `llm_models/`: Supported LLM model definitions
- `local.rs`: Local LLM integration
- `mod.rs`: Module entry point and exports
- `observability.rs`: LLM observability and logging
- `prompt_manager/`: Prompt management system
- `rate_limiter.rs`: LLM API rate limiting
- `smart_router.rs`: Smart routing for LLM requests
- `vertex.rs`: Google Vertex AI integration
## Features

### Multi-Provider Support
```rust
use crate::llm::LLMService;
use crate::llm::models::ModelType;

let llm_service = LLMService::new();

// Generate text with a specific model
let result = llm_service.generate_text(
    ModelType::Claude3,
    "Write a poem about technology".to_string(),
    None,
).await?;
```
### Context Management
```rust
use crate::llm::context::ConversationContext;

let mut context = ConversationContext::new();
context.add_user_message("What's the capital of France?");
context.add_assistant_message("The capital of France is Paris.");

// Get the accumulated context for the next message
let context_text = context.get_context();
```
### Episodic Memory
```rust
use crate::llm::episodic_memory::EpisodicMemory;

let memory = EpisodicMemory::new();

// Store a memory for this user
memory.store_memory(
    user_id,
    "user asked about France".to_string(),
    "Paris is the capital".to_string(),
).await?;

// Retrieve memories relevant to a query
let memories = memory.retrieve_relevant_memories(
    user_id,
    "capital of France",
).await?;
```
### Supported Models
- Claude (Anthropic): Claude 3 family
- Bedrock: AWS Bedrock models (Claude 3, Titan, etc.)
- Vertex AI: Google Cloud LLM models
- Local Models: Local inference support
- GLM: Chinese language models
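
The concrete model definitions live in `llm_models/`. As a minimal sketch only, a provider-agnostic model enum could look like the following; all variant names here are illustrative assumptions, not the crate's actual definitions:

```rust
// Hypothetical sketch of a provider-agnostic model enum; the real
// definitions live in llm_models/ and may differ.
pub enum ModelType {
    Claude3,          // Anthropic Claude 3 family
    BedrockClaude3,   // Claude 3 served via AWS Bedrock
    BedrockTitan,     // Amazon Titan via Bedrock
    VertexGemini,     // Google Vertex AI models
    Glm4,             // GLM family
    Local(String),    // Local model, identified by name
}
```

A single enum like this lets callers pick a provider without touching provider-specific client code, which is what makes the multi-provider interface above possible.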
### Prompt Management
```rust
use crate::llm::prompt_manager::PromptManager;

let prompt_manager = PromptManager::new();

// Fetch a named prompt template
let template = prompt_manager.get_prompt_template("code_review").await?;

// Render the template with variables
let prompt = template.render(&[("code", code_snippet)]);
```
### Hallucination Detection
```rust
use crate::llm::hallucination_detector::HallucinationDetector;

let detector = HallucinationDetector::new();

// Check a model response for hallucinations
let result = detector.detect_hallucinations(response_text).await?;
if result.is_hallucination {
    log::warn!("Hallucination detected: {}", result.reason);
}
```
### Rate Limiting
```rust
use crate::llm::rate_limiter::RateLimiter;

let rate_limiter = RateLimiter::new();

// Check the rate limit before making a request
if rate_limiter.is_rate_limited(user_id).await? {
    return Err(Error::RateLimited);
}

// Make the LLM request
let response = make_llm_request().await?;

// Record the request against the user's quota
rate_limiter.update_rate_limit(user_id).await?;
```
### Observability
```rust
use crate::llm::observability::LLMObservability;

let observability = LLMObservability::new();

// Log an LLM request/response pair with timing
observability.log_request(
    user_id,
    model_type,
    prompt_text,
    response_text,
    duration_ms,
).await?;
```
## Configuration

LLM settings are configured in:

- `botserver/.env`: API keys and endpoints
- `config/llm/`: Model configuration
- Database for dynamic settings
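
For illustration, the `.env` entries could look like the following; every variable name here is an assumption, not the server's actual configuration keys:

```env
# Hypothetical .env entries; actual variable names may differ.
ANTHROPIC_API_KEY=sk-ant-...
AWS_REGION=us-east-1
VERTEX_PROJECT_ID=my-gcp-project
LLM_DEFAULT_MODEL=claude-3
```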
## Error Handling

Errors are surfaced through the `LLMError` type, which covers:
- Provider-specific errors
- Rate limiting errors
- API errors
- Validation errors
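
A minimal sketch of consuming these errors; the variant names used here (`RateLimited`, `Provider`) are assumptions, not the crate's actual definitions:

```rust
use crate::llm::LLMError;

// Hypothetical sketch of handling LLMError; the variant names are
// assumptions and may not match the crate's actual definitions.
match llm_service.generate_text(model, prompt, None).await {
    Ok(response) => println!("{}", response),
    Err(LLMError::RateLimited) => {
        // Back off and retry later
        log::warn!("Rate limited; retrying after delay");
    }
    Err(LLMError::Provider(msg)) => log::error!("Provider error: {}", msg),
    Err(e) => return Err(e.into()),
}
```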
## Testing

The LLM package is tested with:
- Unit tests for core functionality
- Integration tests with real APIs
- Mocked tests for fast execution (see the sketch below)
- Error handling tests
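
A minimal sketch of a mocked test, assuming a hypothetical `MockLLMClient` and an `LLMService::with_client` constructor; neither is confirmed by this package's actual test API:

```rust
// Hypothetical sketch of a mocked LLM test; `MockLLMClient` and
// `LLMService::with_client` are assumptions, not the crate's test API.
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn generate_text_returns_mocked_response() {
        let client = MockLLMClient::with_response("Paris is the capital.");
        let service = LLMService::with_client(client);

        let result = service
            .generate_text(ModelType::Claude3, "Capital of France?".to_string(), None)
            .await
            .unwrap();

        assert!(result.contains("Paris"));
    }
}
```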