
ADR-003: Memory Management and Result Storage

Status

Accepted

Context

Sifaka processes text through multiple iterations, with each iteration generating new text versions, critiques, and validation results.

This can lead to significant memory usage.

We need a strategy to manage memory efficiently while maintaining functionality.

Decision

We will implement a bounded memory management system with configurable limits and intelligent cleanup.

# Configuration
config = Config(
    max_generations=10,      # Keep last 10 text versions
    max_critiques=50,        # Keep last 50 critiques
    max_validations=20,      # Keep last 20 validation results
    memory_limit_mb=100      # Total memory limit
)

# Automatic cleanup
result = await improve("text", config=config)

Memory Management Strategy

1. Bounded Collections

Use fixed-size collections that automatically evict old items:
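As a minimal sketch of this eviction behavior, the example below uses `collections.deque` with `maxlen`, the same structure the implementation section of this ADR relies on. The `history` name, the limit of 3, and the sample strings are illustrative assumptions only.

```python
from collections import deque

# Hypothetical bounded history: keeps only the 3 most recent items.
history = deque(maxlen=3)

for i in range(5):
    # Once the deque is full, appending drops the oldest item from the left.
    history.append(f"generation-{i}")

print(list(history))  # ['generation-2', 'generation-3', 'generation-4']
```

Eviction here is purely count-based: the deque limits how many items are kept, not how many bytes they occupy, which is why a separate byte-level limit is still useful.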

2. Lazy Loading

3. Garbage Collection

4. Storage Backends

Multiple storage options for different use cases:

Implementation Details

Bounded Result Storage

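from collections import deque

class SifakaResult:
    def __init__(self, max_generations=10, max_critiques=50, max_validations=20):
        # deque(maxlen=N) drops the oldest item automatically once N items are stored
        self.generations = deque(maxlen=max_generations)
        self.critiques = deque(maxlen=max_critiques)
        self.validations = deque(maxlen=max_validations)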

Memory Monitoring

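class MemoryMonitor:
    def __init__(self, limit_mb=100):
        # Convert the configured limit to bytes (1 MB taken as 1024 * 1024 bytes)
        self.limit_bytes = limit_mb * 1024 * 1024

    def check_memory_usage(self, result: SifakaResult):
        # get_memory_usage() and cleanup_old_data() are not shown in this ADR
        if self.get_memory_usage() > self.limit_bytes:
            self.cleanup_old_data(result)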
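The monitor above calls `get_memory_usage()` and `cleanup_old_data()` without defining them. One possible shape for those helpers is sketched below, under loud assumptions: memory is approximated by the UTF-8 size of the stored text, cleanup evicts the oldest critiques before the oldest generations, and `get_memory_usage` takes the result as an argument (unlike the signature above). `ResultStub` and `SimpleMemoryMonitor` are hypothetical names, not Sifaka classes.

```python
from collections import deque


class ResultStub:
    """Hypothetical stand-in for SifakaResult; holds plain strings only."""

    def __init__(self):
        self.generations = deque(maxlen=10)
        self.critiques = deque(maxlen=50)


class SimpleMemoryMonitor:
    """Illustrative monitor that sizes a result by its UTF-8 text payload."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes

    def get_memory_usage(self, result):
        # Assumption: text dominates memory, so count encoded bytes only.
        items = list(result.generations) + list(result.critiques)
        return sum(len(item.encode("utf-8")) for item in items)

    def cleanup_old_data(self, result):
        # Evict the oldest critiques first, then the oldest generations,
        # always keeping at least the most recent generation.
        while self.get_memory_usage(result) > self.limit_bytes:
            if result.critiques:
                result.critiques.popleft()
            elif len(result.generations) > 1:
                result.generations.popleft()
            else:
                break

    def check_memory_usage(self, result):
        if self.get_memory_usage(result) > self.limit_bytes:
            self.cleanup_old_data(result)


result = ResultStub()
result.generations.extend(["a" * 40, "b" * 40])
result.critiques.extend(["c" * 40, "d" * 40])

monitor = SimpleMemoryMonitor(limit_bytes=100)
monitor.check_memory_usage(result)  # 160 bytes stored, so both critiques are evicted
```

Counting encoded text is only a rough proxy. Real object overhead is higher, so a production monitor would likely need a more accurate measurement.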

Storage Abstraction

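from abc import ABC, abstractmethod

class StorageBackend(ABC):
    @abstractmethod
    async def save(self, result: SifakaResult) -> str:
        """Persist the result and return its ID."""

    @abstractmethod
    async def load(self, result_id: str) -> SifakaResult:
        """Return the result previously saved under result_id."""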
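To illustrate how a concrete backend could satisfy this interface, here is a minimal in-memory sketch. `InMemoryBackend`, the UUID-based ID scheme, and the dict payload standing in for a `SifakaResult` are all assumptions made for the example; the interface is redeclared without type hints so the block runs on its own.

```python
import asyncio
import uuid
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    @abstractmethod
    async def save(self, result) -> str: ...

    @abstractmethod
    async def load(self, result_id: str): ...


class InMemoryBackend(StorageBackend):
    """Hypothetical backend that keeps results in a dict for the process lifetime."""

    def __init__(self):
        self._results = {}

    async def save(self, result) -> str:
        result_id = uuid.uuid4().hex  # assumed ID scheme, not specified by the ADR
        self._results[result_id] = result
        return result_id

    async def load(self, result_id: str):
        return self._results[result_id]  # raises KeyError for an unknown ID


async def roundtrip(payload):
    backend = InMemoryBackend()
    result_id = await backend.save(payload)
    return await backend.load(result_id)


loaded = asyncio.run(roundtrip({"final_text": "hello"}))
```

Because both methods are coroutines, a file or Redis backend could perform real I/O behind the same two calls without changing the callers.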

Configuration Options

Memory Limits

config = Config(
    # Collection size limits
    max_generations=10,
    max_critiques=50,
    max_validations=20,

    # Memory limits
    memory_limit_mb=100,
    gc_threshold=0.8,  # Trigger cleanup at 80% usage

    # Storage options
    storage_backend="memory",  # "memory", "file", "redis"
    persistent_storage=True,
)
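The configuration above does not spell out what `gc_threshold` is measured against. Assuming it is a fraction of `memory_limit_mb`, as the "80% usage" comment suggests, the byte count at which cleanup would begin can be worked out as follows. The helper name `cleanup_threshold_bytes` is hypothetical.

```python
def cleanup_threshold_bytes(memory_limit_mb: int, gc_threshold: float) -> int:
    """Return the byte count at which cleanup would start (assumed semantics)."""
    limit_bytes = memory_limit_mb * 1024 * 1024
    return round(limit_bytes * gc_threshold)


threshold = cleanup_threshold_bytes(memory_limit_mb=100, gc_threshold=0.8)
print(threshold)  # 83886080 bytes, i.e. 80 MB of the 100 MB limit
```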

Cleanup Strategies

Consequences

Positive

Negative

Mitigation

Storage Backend Comparison

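| Backend  | Speed  | Persistence | Scalability | Use Case                 |
|----------|--------|-------------|-------------|--------------------------|
| Memory   | Fast   | No          | Limited     | Development, testing     |
| File     | Medium | Yes         | Medium      | Single-user, persistence |
| Redis    | Fast   | Yes         | High        | Multi-user, production   |
| Database | Medium | Yes         | High        | Enterprise, analytics    |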

Memory Optimization Techniques

1. Lazy Evaluation

# Don't compute expensive metrics until needed
@property
def similarity_score(self):
    if not hasattr(self, '_similarity_score'):
        self._similarity_score = self._compute_similarity()
    return self._similarity_score
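The same lazy pattern can be written with the standard library's `functools.cached_property`, which stores the value on first access. This is a swapped-in alternative to the `hasattr` check above, not necessarily what Sifaka uses. The `Comparison` class and the word-overlap metric are placeholders invented for the example.

```python
from functools import cached_property


class Comparison:
    """Hypothetical holder for two texts whose similarity is costly to compute."""

    def __init__(self, original: str, improved: str):
        self.original = original
        self.improved = improved
        self.compute_calls = 0  # only here to make the caching visible

    @cached_property
    def similarity_score(self) -> float:
        # Placeholder metric (Jaccard overlap of word sets), not Sifaka's.
        self.compute_calls += 1
        a, b = set(self.original.split()), set(self.improved.split())
        return len(a & b) / len(a | b) if a | b else 1.0


comparison = Comparison("the quick fox", "the quick brown fox")
first = comparison.similarity_score   # computed on this access
second = comparison.similarity_score  # served from the stored value
```

After both accesses, `compute_calls` is 1, which confirms that the metric ran only once.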

2. Weak References

# Use weak references for cached objects
import weakref
self._cache = weakref.WeakValueDictionary()
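A short sketch of how a `WeakValueDictionary` behaves: an entry disappears once nothing else holds a strong reference to its value. The `Embedding` class is a hypothetical cached object. Note that built-in `str` instances cannot be weakly referenced, so raw text would need to be wrapped in a class like this one before being cached this way.

```python
import gc
import weakref


class Embedding:
    """Hypothetical cached object; plain classes support weak references."""

    def __init__(self, text: str):
        self.text = text


cache = weakref.WeakValueDictionary()

embedding = Embedding("some text")
cache["some text"] = embedding
present_while_referenced = "some text" in cache

del embedding   # drop the only strong reference
gc.collect()    # not required on CPython, but makes the intent explicit
present_after_release = "some text" in cache
```

The cache therefore never keeps an object alive on its own, which bounds its memory cost by what the rest of the program is already holding.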

3. Compression

# Compress large text objects
import gzip
self.compressed_text = gzip.compress(text.encode('utf-8'))
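A round-trip sketch of the compression step, including the decompression needed to read the text back. The sample text is invented and deliberately repetitive. Very short strings can come out larger after gzip because of its fixed header overhead, so a size cutoff (not shown) may be worth considering.

```python
import gzip

# Illustrative, highly repetitive input; real text will compress less well.
text = "Sifaka improves text through repeated critique. " * 200

compressed = gzip.compress(text.encode("utf-8"))
restored = gzip.decompress(compressed).decode("utf-8")

original_size = len(text.encode("utf-8"))
compressed_size = len(compressed)
print(original_size, compressed_size)
```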