
Sifaka Architecture

This document provides a comprehensive overview of Sifaka's architecture, design principles, and component relationships.

🎯 Design Philosophy

Sifaka is built around three core principles:

  1. Research-Backed Critique - Every improvement method implements a technique from a peer-reviewed academic paper
  2. Alpha Software with Production Practices - Memory-bounded and observable operations
  3. Developer Experience - Simple API, clear errors, comprehensive testing

๐Ÿ—๏ธ High-Level Architecture

graph LR
    A["Input Text"] --> B["LLM Model"]
    B --> C["Generated Text"]
    C --> D["Critics & Validators"]
    D --> E{"Needs Improvement?"}
    E -->|Yes| F["Revision Prompt"]
    F --> B
    E -->|No| G["Final Text"]

    C --> H["Thought History"]
    D --> H
    F --> H
    H --> I["File Storage"]

    style A fill:#e1f5fe
    style G fill:#c8e6c9
    style H fill:#fff3e0
    style I fill:#f3e5f5

🧩 Component Architecture

1. User Interface Layer

improve() Function

The single entry point for all text improvement operations.

async def improve(
    text: str,
    *,
    max_iterations: int = 3,
    model: str = "gpt-4o-mini",
    critics: list[str] | None = None,  # None means the default, ["reflexion"]
    validators: list[Validator] | None = None,
    temperature: float = 0.7,
    timeout_seconds: int = 300,
    storage: StorageBackend | None = None,
) -> SifakaResult: ...
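
A minimal usage sketch, assuming improve() is exported from the top-level sifaka package (and that SifakaResult exposes a final_text field; both are assumptions here, not confirmed API):

import asyncio

from sifaka import improve

async def main() -> None:
    result = await improve(
        "Write a short explanation of photosynthesis.",
        max_iterations=2,
        critics=["reflexion"],
    )
    # final_text is an assumed field name on SifakaResult.
    print(result.final_text)

asyncio.run(main())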

Design Decisions:

2. Core Engine

SifakaEngine

The central orchestrator that coordinates all components.

Responsibilities:

Key Features:

3. Critique System

Each critic implements a specific research methodology:

Reflexion Critic

Research: Reflexion: Language Agents with Verbal Reinforcement Learning

Approach: Self-reflection on previous outputs to identify and correct mistakes.

Implementation:

class ReflexionCritic(Critic):
    async def critique(self, text: str, result: SifakaResult) -> CritiqueResult:
        # Analyze previous iterations for learning opportunities
        # Generate reflection-based feedback
        # Provide specific improvement suggestions
        ...

Constitutional AI Critic

Research: Constitutional AI: Harmlessness from AI Feedback

Approach: Principle-based evaluation against ethical and quality guidelines.

Self-Refine Critic

Research: Self-Refine: Iterative Refinement with Self-Feedback

Approach: Iterative self-improvement through quality-focused feedback.

N-Critics Critic

Research: N-Critics: Self-Refinement of Large Language Models with Ensemble of Critics

Approach: Multi-perspective ensemble evaluation for comprehensive analysis.

Self-RAG Critic

Research: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Approach: Retrieval-augmented critique for factual accuracy verification.

Meta-Rewarding Critic

Research: Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Approach: Two-stage judgment with meta-evaluation of evaluation quality.

Self-Consistency Critic

Research: Self-Consistency Improves Chain of Thought Reasoning in Language Models

Approach: Multiple independent evaluations with consensus building.
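
Critics are selected by name at the improve() call site, so methodologies can be combined in a single run. A sketch, assuming the registry names below mirror the critic names above (the exact strings are an assumption):

import asyncio

from sifaka import improve

async def multi_critic_demo() -> None:
    # Registry names are assumed to match the critics documented above.
    result = await improve(
        "Summarize the causes of the 1929 stock market crash.",
        critics=["reflexion", "constitutional", "self_refine"],
        max_iterations=2,
    )
    print(result.final_text)  # assumed field name

asyncio.run(multi_critic_demo())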

4. Validation System

Abstract Validator Interface

from abc import ABC, abstractmethod

class Validator(ABC):
    @property
    @abstractmethod
    def name(self) -> str:
        pass

    @abstractmethod
    async def validate(self, text: str, result: SifakaResult) -> ValidationResult:
        pass

Built-in Validators

Custom Validators

Users can implement custom validation logic by extending the Validator interface.
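
For illustration, a minimal length check might look like the sketch below. The ValidationResult constructor fields (passed, detail) are assumptions for this example, not the confirmed API:

class MinLengthValidator(Validator):
    """Fails any text shorter than a configured number of characters."""

    def __init__(self, min_chars: int = 100):
        self.min_chars = min_chars

    @property
    def name(self) -> str:
        return "min_length"

    async def validate(self, text: str, result: SifakaResult) -> ValidationResult:
        # passed/detail are assumed ValidationResult fields.
        return ValidationResult(
            passed=len(text) >= self.min_chars,
            detail=f"text length {len(text)} (minimum {self.min_chars})",
        )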

5. Storage System

Plugin Architecture

from abc import ABC, abstractmethod
from typing import Optional

class StorageBackend(ABC):
    @abstractmethod
    async def save(self, result: SifakaResult) -> str:
        pass

    @abstractmethod
    async def load(self, result_id: str) -> Optional[SifakaResult]:
        pass
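
A toy in-memory backend illustrates the contract; the only requirements the interface implies are returning an ID from save() and None on a cache miss in load():

import uuid
from typing import Optional

class MemoryStorage(StorageBackend):
    """Keeps results in a dict; useful for tests, not for persistence."""

    def __init__(self) -> None:
        self._results: dict[str, SifakaResult] = {}

    async def save(self, result: SifakaResult) -> str:
        result_id = str(uuid.uuid4())
        self._results[result_id] = result
        return result_id

    async def load(self, result_id: str) -> Optional[SifakaResult]:
        return self._results.get(result_id)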

Built-in Storage

Plugin System

6. Model Provider Abstraction

Unified Interface

All model providers (OpenAI, Anthropic, Google) are accessed through a unified interface that handles:

🔄 Data Flow

1. Request Flow

User Input → Configuration Validation → Engine Initialization → Iteration Loop

2. Iteration Loop

Text Generation → Critique Evaluation → Validation Check → Storage Update → Continue/Stop Decision
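
In code form, one pass of this loop might look like the sketch below. All names here (engine.generate, engine.critics, result.record, needs_improvement, passed) are hypothetical stand-ins for the engine's internals, not the actual API:

async def run_iteration(engine, text: str, result: SifakaResult) -> tuple[str, bool]:
    """One loop pass; returns the revised text and whether to continue."""
    text = await engine.generate(text, result)                                 # text generation
    critiques = [await c.critique(text, result) for c in engine.critics]       # critique evaluation
    validations = [await v.validate(text, result) for v in engine.validators]  # validation check
    result.record(text, critiques, validations)                                # audit-trail update (hypothetical)
    if engine.storage is not None:
        await engine.storage.save(result)                                      # storage update
    # Continue/stop decision: iterate again only if something still fails.
    keep_going = any(c.needs_improvement for c in critiques) or any(
        not v.passed for v in validations
    )
    return text, keep_going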

3. Response Flow

Final Result Assembly → Audit Trail Creation → Result Return

๐Ÿ›ก๏ธ Error Handling Strategy

Hierarchical Error System

SifakaError
├── ConfigurationError      # Invalid parameters
├── ModelProviderError      # API failures
├── ResourceLimitError      # Resource limits exceeded
├── TimeoutError            # Operation timeout
├── ValidationError         # Validation failures
└── CriticError             # Critique failures
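
Callers can branch on the subclass they care about. A sketch: the class names come from the hierarchy above, but the sifaka.exceptions import path is an assumption:

from sifaka import improve
from sifaka.exceptions import ModelProviderError, TimeoutError  # assumed module path

async def robust_improve(text: str):
    try:
        return await improve(text, timeout_seconds=60)
    except TimeoutError:
        # Sifaka's TimeoutError (imported above, shadowing the builtin).
        return None
    except ModelProviderError as exc:
        # API failure: log, then re-raise for the caller to handle.
        print(f"provider failed: {exc}")
        raise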

Recovery Strategies

📊 Observability

Complete Audit Trail

Every operation generates a comprehensive audit trail including:

Memory Management

🔌 Extensibility Points

1. Custom Critics

Implement the Critic interface to add new critique methodologies.
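
A skeleton custom critic, as a sketch; the CritiqueResult constructor fields (needs_improvement, feedback, suggestions) are assumptions for illustration:

class ToneCritic(Critic):
    """Flags text whose tone drifts from a formal register (toy heuristic)."""

    async def critique(self, text: str, result: SifakaResult) -> CritiqueResult:
        too_casual = "gonna" in text or "stuff" in text  # deliberately simplistic check
        return CritiqueResult(
            needs_improvement=too_casual,
            feedback="Tone is too casual." if too_casual else "Tone is acceptable.",
            suggestions=["Replace informal phrasing with neutral wording."] if too_casual else [],
        )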

2. Custom Validators

Extend the Validator interface for domain-specific validation.

3. Storage Plugins

Create new storage backends for different persistence needs.

4. Model Providers

Add support for new LLM providers through the unified interface.

🚀 Performance Characteristics

Scalability

Efficiency

🔒 Security Considerations

API Key Management

Input Validation

Output Safety

🧪 Testing Strategy

Comprehensive Coverage

Test Categories

This architecture provides a solid foundation for reliable, scalable, and maintainable text improvement operations while remaining simple to use and extend.