This document provides a comprehensive overview of Sifaka's architecture, design principles, and component relationships.
Sifaka is built around three core principles. The overall architecture and data flow are shown below:
```mermaid
graph LR
    A["Input Text"] --> B["LLM Model"]
    B --> C["Generated Text"]
    C --> D["Critics & Validators"]
    D --> E{"Needs Improvement?"}
    E -->|Yes| F["Revision Prompt"]
    F --> B
    E -->|No| G["Final Text"]
    C --> H["Thought History"]
    D --> H
    F --> H
    H --> I["File Storage"]
    style A fill:#e1f5fe
    style G fill:#c8e6c9
    style H fill:#fff3e0
    style I fill:#f3e5f5
```
The improve() function is the single entry point for all text improvement operations:
```python
async def improve(
    text: str,
    *,
    max_iterations: int = 3,
    model: str = "gpt-4o-mini",
    critics: list[str] = ["reflexion"],
    validators: list[Validator] | None = None,
    temperature: float = 0.7,
    timeout_seconds: int = 300,
    storage: StorageBackend | None = None,
) -> SifakaResult
```
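A minimal call might look like the sketch below. It assumes improve() is importable from the top-level sifaka package and that SifakaResult exposes a final_text attribute, neither of which is guaranteed by the signature above.

```python
# Usage sketch. Assumptions: improve() is exported from the sifaka
# package, and SifakaResult has a final_text attribute.
import asyncio
from sifaka import improve

async def main() -> None:
    result = await improve(
        "Quantum computers will replace classical computers entirely.",
        max_iterations=2,
        critics=["reflexion"],
    )
    print(result.final_text)

asyncio.run(main())
```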
The engine is the central orchestrator that coordinates all components, driving text generation, critique evaluation, validation checks, and storage updates across the iteration loop.
Each critic implements a specific research methodology:
Research: Reflexion: Language Agents with Verbal Reinforcement Learning
Approach: Self-reflection on previous outputs to identify and correct mistakes.
Implementation:
```python
class ReflexionCritic(Critic):
    async def critique(self, text: str, result: SifakaResult) -> CritiqueResult:
        # Analyze previous iterations for learning opportunities,
        # generate reflection-based feedback, and provide specific
        # improvement suggestions.
        ...
```
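Purely as an illustration of how those steps might be realized, the sketch below fills in the body; the result.generations history, the _call_model helper, and the CritiqueResult fields are all hypothetical, not Sifaka's confirmed API.

```python
# Hypothetical elaboration of the stub above. result.generations,
# self._call_model(), and CritiqueResult(feedback=..., needs_improvement=...)
# are illustrative assumptions, not the library's documented schema.
class ReflexionCritic(Critic):
    async def critique(self, text: str, result: SifakaResult) -> CritiqueResult:
        # Give the model the recent drafts so it can reflect on past mistakes
        history = "\n---\n".join(result.generations[-3:])
        prompt = (
            "Reflect on the previous drafts and the current text. "
            "Identify concrete mistakes and suggest specific fixes.\n\n"
            f"Previous drafts:\n{history}\n\nCurrent text:\n{text}"
        )
        feedback = await self._call_model(prompt)  # assumed LLM helper
        return CritiqueResult(
            feedback=feedback,
            needs_improvement="no issues" not in feedback.lower(),
        )
```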
Research: Constitutional AI: Harmlessness from AI Feedback
Approach: Principle-based evaluation against ethical and quality guidelines.
Research: Self-Refine: Iterative Refinement with Self-Feedback
Approach: Iterative self-improvement through quality-focused feedback.
Research: N-Critics: Self-Refinement of Large Language Models with Ensemble of Critics
Approach: Multi-perspective ensemble evaluation for comprehensive analysis.
Research: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Approach: Retrieval-augmented critique for factual accuracy verification.
Research: Meta-Rewarding: Learning to Judge Judges with Self-Generated Meta-Judgments
Approach: Two-stage judgment with meta-evaluation of evaluation quality.
Research: Self-Consistency Improves Chain of Thought Reasoning in Language Models
Approach: Multiple independent evaluations with consensus building.
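Since critics are selected by name in improve(), combining several of these perspectives could look like the fragment below; only the "reflexion" key is confirmed above, and the other keys are assumed to follow the same lowercase naming convention.

```python
# Fragment: ensemble-style critique via multiple critics. Only the
# "reflexion" key is confirmed; "constitutional" and "self_refine" are
# assumed registry names.
result = await improve(
    draft_text,
    critics=["reflexion", "constitutional", "self_refine"],
    max_iterations=3,
)
```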
```python
from abc import ABC, abstractmethod

class Validator(ABC):
    @property
    @abstractmethod
    def name(self) -> str:
        pass

    @abstractmethod
    async def validate(self, text: str, result: SifakaResult) -> ValidationResult:
        pass
```
Users can implement custom validation logic by extending the Validator interface.
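For instance, a simple length check might look like the sketch below; the ValidationResult constructor fields (passed, details) are assumptions, since only the class name appears in this document.

```python
# Hypothetical domain-specific validator built on the interface above.
# ValidationResult(passed=..., details=...) is an assumed constructor.
class LengthValidator(Validator):
    def __init__(self, max_chars: int = 2000) -> None:
        self.max_chars = max_chars

    @property
    def name(self) -> str:
        return "length"

    async def validate(self, text: str, result: SifakaResult) -> ValidationResult:
        ok = len(text) <= self.max_chars
        return ValidationResult(
            passed=ok,
            details=f"{len(text)} chars (limit: {self.max_chars})",
        )
```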
```python
from abc import ABC, abstractmethod
from typing import Optional

class StorageBackend(ABC):
    @abstractmethod
    async def save(self, result: SifakaResult) -> str:
        pass

    @abstractmethod
    async def load(self, result_id: str) -> Optional[SifakaResult]:
        pass
```
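A backend only needs those two methods. The in-memory sketch below illustrates the shape; the result.id attribute used as a key is an assumption.

```python
# Hypothetical in-memory backend for the StorageBackend interface above.
# Assumes SifakaResult carries a unique id attribute.
class MemoryStorage(StorageBackend):
    def __init__(self) -> None:
        self._results: dict[str, SifakaResult] = {}

    async def save(self, result: SifakaResult) -> str:
        self._results[result.id] = result
        return result.id

    async def load(self, result_id: str) -> Optional[SifakaResult]:
        return self._results.get(result_id)
```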
All model providers (OpenAI, Anthropic, Google) are accessed through a unified interface that abstracts away provider-specific request and response handling.
User Input → Configuration Validation → Engine Initialization → Iteration Loop
Text Generation → Critique Evaluation → Validation Check → Storage Update → Continue/Stop Decision
Final Result Assembly → Audit Trail Creation → Result Return
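To make the middle stage concrete, here is a self-contained sketch of such an iteration loop; every function and class in it is a stand-in stub, not Sifaka's actual engine code, and the validation and storage steps are omitted for brevity.

```python
# Illustrative, runnable sketch of the iteration loop described above.
# generate() and run_critics() are stand-in stubs for the real components.
import asyncio
from dataclasses import dataclass

@dataclass
class Critique:
    feedback: str
    needs_improvement: bool

async def generate(prompt: str) -> str:
    # Stand-in for the LLM call (Text Generation)
    return prompt.strip() + " (revised)"

async def run_critics(draft: str) -> list[Critique]:
    # Stand-in critique: flag drafts that are still too short
    ok = len(draft) > 40
    return [Critique("Looks fine." if ok else "Too brief; expand.", not ok)]

async def iteration_loop(text: str, max_iterations: int = 3) -> str:
    draft = text
    for _ in range(max_iterations):
        draft = await generate(draft)         # Text Generation
        critiques = await run_critics(draft)  # Critique Evaluation
        if not any(c.needs_improvement for c in critiques):
            break                             # Continue/Stop Decision
        # Fold the feedback into the next prompt (Revision Prompt)
        draft = f"{draft}\n\nFeedback: {critiques[0].feedback}"
    return draft

print(asyncio.run(iteration_loop("Short draft.")))
```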
```
SifakaError
├── ConfigurationError   # Invalid parameters
├── ModelProviderError   # API failures
├── ResourceLimitError   # Resource limits exceeded
├── TimeoutError         # Operation timeout
├── ValidationError      # Validation failures
└── CriticError          # Critique failures
```
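Callers can catch a specific subclass and fall back to the base class, as in the sketch below; the sifaka import path is an assumption, while the exception names come from the tree above.

```python
# Sketch: handling the error hierarchy. The import path is an
# assumption; the exception names match the tree above.
from sifaka import improve, SifakaError, ModelProviderError

async def robust_improve(text: str):
    try:
        return await improve(text, timeout_seconds=60)
    except ModelProviderError:
        # API failure: retry once with another provider's model
        # (routing by model name is assumed here)
        return await improve(text, model="claude-3-haiku-20240307")
    except SifakaError as exc:
        # Any other Sifaka-specific failure: surface it to the caller
        raise RuntimeError(f"Improvement failed: {exc}") from exc
```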
Every operation generates a comprehensive audit trail: as the architecture diagram above shows, the generated text, critique and validation feedback, and revision prompts from each iteration are recorded in the thought history and persisted to storage.
Sifaka is designed to be extended at several points:

- Implement the Critic interface to add new critique methodologies.
- Extend the Validator interface for domain-specific validation.
- Create new storage backends for different persistence needs.
- Add support for new LLM providers through the unified interface.
This architecture provides a solid foundation for reliable, scalable, and maintainable text improvement operations while remaining simple to use and extend.