
Episodic Memory

Episodic memory keeps conversation history within LLM token limits by summarizing older exchanges while keeping recent ones in full. Compaction runs automatically, so conversations stay coherent without manual intervention.

Overview

Large Language Models have fixed context windows (e.g., 8K, 32K, 128K tokens). Long conversations can exceed these limits, causing truncation or errors. Episodic memory solves this by:

  1. Monitoring conversation length
  2. Summarizing older exchanges when thresholds are reached
  3. Keeping recent messages in full detail
  4. Storing summaries as "episodic memory" for continuity

Configuration

Episodic memory is controlled by parameters in config.csv:

name,value
episodic-memory-enabled,true
episodic-memory-threshold,4
episodic-memory-history,2
episodic-memory-model,fast
episodic-memory-max-episodes,100
episodic-memory-retention-days,365
episodic-memory-auto-summarize,true

Parameter Reference

| Parameter | Default | Type | Description |
|-----------|---------|------|-------------|
| episodic-memory-enabled | true | Boolean | Enable/disable episodic memory system |
| episodic-memory-threshold | 4 | Integer | Number of exchanges before compaction triggers |
| episodic-memory-history | 2 | Integer | Recent exchanges to keep in full detail |
| episodic-memory-model | fast | String | Model for generating summaries (fast, quality, or model name) |
| episodic-memory-max-episodes | 100 | Integer | Maximum episode summaries per user |
| episodic-memory-retention-days | 365 | Integer | Days to retain episode summaries |
| episodic-memory-auto-summarize | true | Boolean | Automatically summarize when threshold reached |
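
As an illustration of how these parameters and their types fit together, the sketch below reads config.csv text and falls back to the documented defaults. The helper name `load_episodic_config` and the parsing rules are assumptions made for this example, not the server's actual loader.

```python
import csv
import io

# Defaults and types as listed in the parameter reference.
DEFAULTS = {
    "episodic-memory-enabled": "true",
    "episodic-memory-threshold": "4",
    "episodic-memory-history": "2",
    "episodic-memory-model": "fast",
    "episodic-memory-max-episodes": "100",
    "episodic-memory-retention-days": "365",
    "episodic-memory-auto-summarize": "true",
}
BOOLEAN_KEYS = {"episodic-memory-enabled", "episodic-memory-auto-summarize"}
STRING_KEYS = {"episodic-memory-model"}


def load_episodic_config(csv_text: str) -> dict:
    """Hypothetical helper: overlay config.csv rows on the documented defaults."""
    raw = dict(DEFAULTS)
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get("name") or "").strip()
        value = (row.get("value") or "").strip()
        if name in DEFAULTS and value:
            raw[name] = value

    typed = {}
    for name, value in raw.items():
        if name in BOOLEAN_KEYS:
            typed[name] = value.lower() == "true"
        elif name in STRING_KEYS:
            typed[name] = value
        else:
            typed[name] = int(value)  # remaining parameters are integers
    return typed


config = load_episodic_config("name,value\nepisodic-memory-threshold,10\n")
print(config["episodic-memory-threshold"], config["episodic-memory-history"])
```

Parameters missing from the file keep their defaults, so a config.csv only needs to list the values it overrides.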

How It Works

Context Compaction Process

  1. Monitor: System tracks message count since last summary
  2. Trigger: When count reaches episodic-memory-threshold, compaction starts
  3. Summarize: Older messages are summarized using the configured LLM
  4. Preserve: Last episodic-memory-history exchanges remain in full
  5. Store: Summary saved with role "episodic" for future context
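
The five steps above can be sketched as a single function. This is a simplified model under stated assumptions, not the actual implementation: it treats one exchange as a user message plus the assistant reply, triggers once the number of exchanges reaches the threshold, and takes the summarizer as a hypothetical callback standing in for the configured LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str      # "user", "assistant", or "episodic" for a stored summary
    content: str


def compact(
    messages: List[Message],
    threshold: int,
    history: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Return the active context after one compaction pass.

    Assumption: `history` exchanges correspond to the last 2 * history messages.
    """
    if threshold <= 0:
        return messages  # a threshold of 0 disables compaction

    # Monitor: count exchanges by counting user turns (summaries are not counted).
    exchanges = sum(1 for m in messages if m.role == "user")
    if exchanges < threshold:
        return messages  # Trigger not reached yet

    # Preserve: split off the most recent exchanges.
    keep = 2 * history
    older = messages[:-keep] if keep else messages
    recent = messages[-keep:] if keep else []
    if not older:
        return messages

    # Summarize and store: replace older messages with one "episodic" message.
    summary = Message(role="episodic", content=summarize(older))
    return [summary] + recent
```

Because an earlier summary sits at the front of the message list, it is passed to the summarizer again on the next pass, which is one simple way to carry older context forward.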

Example Timeline

With defaults (episodic-memory-threshold=4, episodic-memory-history=2):

| Exchange | Action | Context State |
|----------|--------|---------------|
| 1-2 | Normal | Messages 1-2 in full |
| 3-4 | Normal | Messages 1-4 in full |
| 5 | Compaction | Summary of 1-2 + Messages 3-5 in full |
| 6-7 | Normal | Summary + Messages 3-7 in full |
| 8 | Compaction | Summary of 1-5 + Messages 6-8 in full |

Automatic Behavior

The system automatically:

  1. Tracks conversation length
  2. Triggers compaction when exchanges exceed episodic-memory-threshold
  3. Summarizes older messages using the configured LLM
  4. Keeps only the last episodic-memory-history exchanges in full
  5. Stores the summary as an "episodic memory" for future context

The scheduler runs every 60 seconds, checking all active sessions and processing those that exceed the threshold.
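
A polling loop of this kind might look roughly like the sketch below. Only the 60-second interval and the "exceeds the threshold" rule come from the text; the function names, the session-count dictionary, and the `max_ticks` parameter (added so the loop can be stopped in tests) are hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional

CHECK_INTERVAL_SECONDS = 60  # interval stated in the text


def sessions_needing_compaction(exchange_counts: Dict[str, int], threshold: int) -> List[str]:
    """Return the ids of sessions whose exchange count exceeds the threshold."""
    if threshold <= 0:
        return []  # a threshold of 0 disables compaction
    return sorted(sid for sid, count in exchange_counts.items() if count > threshold)


def run_scheduler(
    get_counts: Callable[[], Dict[str, int]],
    compact_session: Callable[[str], None],
    threshold: int,
    sleep: Callable[[float], None] = time.sleep,
    max_ticks: Optional[int] = None,
) -> None:
    """Check all active sessions once per interval and compact those over the limit."""
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        for session_id in sessions_needing_compaction(get_counts(), threshold):
            compact_session(session_id)
        ticks += 1
        sleep(CHECK_INTERVAL_SECONDS)
```

Since the check is periodic, a session can briefly hold more exchanges than the threshold before the next pass compacts it.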

Tuning Guidelines

High-Context Conversations

For complex discussions requiring more history:

name,value
episodic-memory-history,5
episodic-memory-threshold,10

Token-Constrained Environments

For smaller context windows or cost optimization:

name,value
episodic-memory-history,1
episodic-memory-threshold,2

Disable Compaction

Set episodic-memory-threshold to 0 to disable automatic compaction:

name,value
episodic-memory-threshold,0

Extended Retention

For long-term memory across sessions:

name,value
episodic-memory-max-episodes,500
episodic-memory-retention-days,730
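
To make the effect of these two limits concrete, the sketch below prunes a list of episodes by age and then by count. The documentation does not say how the server enforces the limits, so the order of the two steps, the oldest-first eviction, and the `(created_at, summary)` tuple shape are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Episode = Tuple[datetime, str]  # hypothetical shape: (created_at, summary text)


def prune_episodes(
    episodes: List[Episode],
    now: datetime,
    retention_days: int,
    max_episodes: int,
) -> List[Episode]:
    """Drop episodes older than the retention window, then keep the newest ones."""
    cutoff = now - timedelta(days=retention_days)
    fresh = sorted((e for e in episodes if e[0] >= cutoff), key=lambda e: e[0])
    if max_episodes <= 0:
        return []
    return fresh[-max_episodes:]  # newest max_episodes entries, oldest first
```

With the extended settings above (500 episodes, 730 days), an episode survives until it is either more than two years old or pushed out by 500 newer ones, whichever happens first under this model.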

Use Case Recommendations

| Use Case | History | Threshold | Rationale |
|----------|---------|-----------|-----------|
| FAQ Bot | 1 | 2 | Questions are independent |
| Customer Support | 2 | 4 | Some context needed |
| Technical Discussion | 4 | 8 | Complex topics require history |
| Therapy/Coaching | 5 | 10 | Continuity is critical |
| Long-term Assistant | 3 | 6 | Balance memory and context |

Token Savings

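Compaction reduces token usage, and the saving grows with conversation length. The figures below are illustrative and assume roughly 500 tokens per exchange: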

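| Scenario | Without Compaction | With Compaction | Savings |
|----------|--------------------|-----------------|---------|
| 10 exchanges | ~5,000 tokens | ~2,000 tokens | 60% |
| 20 exchanges | ~10,000 tokens | ~3,000 tokens | 70% |
| 50 exchanges | ~25,000 tokens | ~5,000 tokens | 80% |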

Actual savings depend on message length and summary quality.
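
The savings column is the relative reduction in tokens. A small sketch of that arithmetic, using the approximate figures from the table (the function name is illustrative):

```python
def savings_percent(without_compaction: int, with_compaction: int) -> float:
    """Relative token reduction, as a percentage of the uncompacted total."""
    return 100.0 * (without_compaction - with_compaction) / without_compaction


# Approximate token counts taken from the table above.
for exchanges, before, after in [(10, 5000, 2000), (20, 10000, 3000), (50, 25000, 5000)]:
    print(f"{exchanges} exchanges: {savings_percent(before, after):.0f}% saved")
```

The same function can be applied to token counts measured in your own deployment to check whether the chosen threshold and history values pay off.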

Summary Storage

Summaries are stored with special role identifiers:

  • Summary messages carry the role episodic or compact
  • Summaries include key points from the compacted exchanges
  • Original messages are not deleted, only excluded from the active context
  • Episodes are searchable for context retrieval across sessions
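
One way to picture this storage model is sketched below. The role names come from the list above; the `compacted` flag, the `StoredMessage` type, and the keyword search are hypothetical stand-ins, since the actual schema and retrieval method are not described here.

```python
from dataclasses import dataclass
from typing import List

SUMMARY_ROLES = {"episodic", "compact"}  # roles that mark summary messages


@dataclass
class StoredMessage:
    role: str
    content: str
    compacted: bool = False  # hypothetical flag: True once a summary covers this message


def active_context(stored: List[StoredMessage]) -> List[StoredMessage]:
    """Messages sent to the LLM: compacted originals stay stored but are skipped."""
    return [m for m in stored if not m.compacted]


def search_episodes(stored: List[StoredMessage], keyword: str) -> List[StoredMessage]:
    """Naive case-insensitive keyword search over summary messages only."""
    needle = keyword.lower()
    return [m for m in stored if m.role in SUMMARY_ROLES and needle in m.content.lower()]
```

Keeping the originals and filtering at read time means a summary can be regenerated later, for example with a different model, without losing the source messages.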

Benefits

  • Automatic management - No manual intervention needed
  • Token efficiency - Stay within model context limits
  • Context preservation - Important information kept via summaries
  • Relevant context - Recent exchanges kept in full detail
  • Cost savings - Fewer tokens = lower API costs
  • Long-term memory - Episode storage enables recall across sessions

Interaction with Caching

Episodic memory works alongside semantic caching:

  • Caching: Reuses responses for similar queries (see Semantic Caching)
  • Episodic Memory: Manages conversation length over time

Both features reduce costs and improve performance independently.

Best Practices

  1. Start with defaults - Work well for most use cases
  2. Monitor token usage - Adjust if hitting context limits
  3. Consider conversation type - Support vs complex discussion
  4. Test different values - Find optimal balance for your users
  5. Set retention appropriately - Balance memory vs privacy requirements

Troubleshooting

| Issue | Cause | Solution |
|-------|-------|----------|
| Context too long | Threshold too high | Lower episodic-memory-threshold |
| Lost context | History too low | Increase episodic-memory-history |
| Summaries missing info | Model limitations | Use quality instead of fast |
| No compaction occurring | Threshold is 0 or disabled | Set positive threshold, enable feature |
| Old episodes not deleted | Retention too long | Lower episodic-memory-retention-days |

See Also