LLM Package - Large Language Model Integration

Purpose

Manages large language model (LLM) integration and operations, providing a unified interface for working with multiple LLM providers.

Key Files

  • bedrock.rs: AWS Bedrock integration
  • cache.rs: LLM response caching
  • claude.rs: Anthropic Claude integration
  • context/: Context management for LLM conversations
  • episodic_memory.rs: Episodic memory for LLM interactions
  • glm.rs: GLM model integration
  • hallucination_detector.rs: Hallucination detection
  • llm_models/: Supported LLM model definitions
  • local.rs: Local LLM integration
  • mod.rs: Module entry point and exports
  • observability.rs: LLM observability and logging
  • prompt_manager/: Prompt management system
  • rate_limiter.rs: LLM API rate limiting
  • smart_router.rs: Smart routing for LLM requests
  • vertex.rs: Google Vertex AI integration

Features

Multi-Provider Support

use crate::llm::LLMService;
use crate::llm::llm_models::ModelType;

let llm_service = LLMService::new();

// Generate text with specific model
let result = llm_service.generate_text(
    ModelType::Claude3,
    "Write a poem about technology".to_string(),
    None
).await?;

Context Management

use crate::llm::context::ConversationContext;

let mut context = ConversationContext::new();
context.add_user_message("What's the capital of France?");
context.add_assistant_message("The capital of France is Paris.");

// Get context for next message
let context_text = context.get_context();

Episodic Memory

use crate::llm::episodic_memory::EpisodicMemory;

let memory = EpisodicMemory::new();

// Store memory
memory.store_memory(
    user_id,
    "user asked about France".to_string(),
    "Paris is the capital".to_string()
).await?;

// Retrieve relevant memories
let memories = memory.retrieve_relevant_memories(
    user_id,
    "capital of France"
).await?;

Supported Models

  • Claude (Anthropic): Claude 3 family
  • Bedrock: AWS Bedrock models (Claude 3, Titan, etc.)
  • Vertex AI: Google Cloud LLM models
  • Local Models: Local inference support
  • GLM: GLM family of Chinese language models (e.g., ChatGLM)
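
As an illustration, here is a minimal sketch of what the ModelType enum in llm_models/ might look like, with one variant per provider family above. Apart from Claude3, which appears in the earlier example, the variant names are assumptions rather than the crate's actual definitions:

// Hypothetical sketch; the real variants live in src/llm/llm_models/.
pub enum ModelType {
    Claude3,          // Anthropic Claude 3 family
    BedrockClaude3,   // Claude 3 served through AWS Bedrock
    BedrockTitan,     // Amazon Titan through AWS Bedrock
    VertexGemini,     // Google Vertex AI models
    Glm4,             // GLM family
    Local(String),    // Local inference, keyed by model name
}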

Prompt Management

use crate::llm::prompt_manager::PromptManager;

let prompt_manager = PromptManager::new();

// Get prompt template
let template = prompt_manager.get_prompt_template("code_review").await?;

// Render prompt with variables
let prompt = template.render(&[("code", code_snippet)]);

Hallucination Detection

use crate::llm::hallucination_detector::HallucinationDetector;

let detector = HallucinationDetector::new();

// Check response for hallucinations
let result = detector.detect_hallucinations(response_text).await?;

if result.is_hallucination {
    log::warn!("Hallucination detected: {}", result.reason);
}

Rate Limiting

use crate::llm::rate_limiter::RateLimiter;

let rate_limiter = RateLimiter::new();

// Check rate limit before request
if rate_limiter.is_rate_limited(user_id).await? {
    return Err(LLMError::RateLimited);
}

// Make LLM request
let response = make_llm_request().await?;

// Update rate limit
rate_limiter.update_rate_limit(user_id).await?;

Observability

use crate::llm::observability::LLMObservability;

let observability = LLMObservability::new();

// Log LLM request
observability.log_request(
    user_id,
    model_type,
    prompt_text,
    response_text,
    duration_ms
).await?;

Configuration

LLM settings are configured in:

  • botserver/.env - API keys and endpoints
  • config/llm/ - Model configuration
  • Database - Dynamic settings
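
For example, provider credentials could be read from botserver/.env at startup. This is a hedged sketch: the key names ANTHROPIC_API_KEY and BEDROCK_REGION are illustrative assumptions, so check botserver/.env for the real ones:

use std::env;

// Hypothetical key names; the actual keys are defined in botserver/.env.
let claude_api_key = env::var("ANTHROPIC_API_KEY")
    .expect("ANTHROPIC_API_KEY must be set in botserver/.env");
let bedrock_region = env::var("BEDROCK_REGION")
    .unwrap_or_else(|_| "us-east-1".to_string());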

Error Handling

Use the LLMError type, which includes:

  • Provider-specific errors
  • Rate limiting errors
  • API errors
  • Validation errors
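
A minimal sketch of how such an error type could be shaped, assuming the thiserror crate; the actual LLMError definition in the crate may differ:

use thiserror::Error;

// Hypothetical shape of LLMError; see the crate source for the real type.
#[derive(Debug, Error)]
pub enum LLMError {
    #[error("provider error from {provider}: {message}")]
    Provider { provider: String, message: String },
    #[error("rate limit exceeded")]
    RateLimited,
    #[error("API error: {0}")]
    Api(String),
    #[error("validation error: {0}")]
    Validation(String),
}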

Testing

The LLM package is tested with:

  • Unit tests for core functionality
  • Integration tests with real APIs
  • Mocked tests for fast execution
  • Error handling tests
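
As a sketch of the mocked style, the provider call can be stubbed so tests run without network access. MockProvider below is a hypothetical helper, not an existing test utility, and the async test attribute assumes tokio:

#[cfg(test)]
mod tests {
    // Hypothetical mock; the crate's real test helpers may differ.
    struct MockProvider;

    impl MockProvider {
        async fn generate(&self, _prompt: &str) -> Result<String, String> {
            Ok("mocked response".to_string())
        }
    }

    #[tokio::test]
    async fn returns_mocked_response() {
        let provider = MockProvider;
        let response = provider.generate("hello").await.unwrap();
        assert_eq!(response, "mocked response");
    }
}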