Synthesis

CEGIS engine, genetic algorithm, LLM clients, fitness evaluation, and entropy monitoring.

Synthesis package: CEGIS-based code patch synthesis with genetic algorithm evolution.

This package implements M4 of the curate-ipsum roadmap — the synthesis loop that transforms LLM-generated code candidates into verified patches via counterexample-guided inductive synthesis (CEGIS) and genetic algorithm population management.

Optional dependency: httpx (for cloud/local LLM backends). Core synthesis (models, fitness, AST operators) uses only stdlib.

class curate_ipsum.synthesis.CodePatch(code, source=PatchSource.LLM, diff='', region_id='', original_code='', metadata=<factory>)[source]

Bases: object

A code patch produced by synthesis.

Parameters:
code: str
source: PatchSource = 'llm'
diff: str = ''
region_id: str = ''
original_code: str = ''
metadata: dict[str, Any]
to_dict()[source]
Return type:

dict[str, Any]

class curate_ipsum.synthesis.Counterexample(id=<factory>, input_values=<factory>, expected_output=None, actual_output=None, mutant_id='', error_message='', test_command='', metadata=<factory>)[source]

Bases: object

A counterexample that a candidate patch fails on.

Parameters:
id: str
input_values: dict[str, Any]
expected_output: Any = None
actual_output: Any = None
mutant_id: str = ''
error_message: str = ''
test_command: str = ''
metadata: dict[str, Any]
to_dict()[source]
Return type:

dict[str, Any]

class curate_ipsum.synthesis.Individual(id=<factory>, code='', fitness=0.0, lineage=<factory>, generation=0, source=PatchSource.SEED, metadata=<factory>)[source]

Bases: object

A candidate patch in the genetic algorithm population.

Parameters:
id: str
code: str = ''
fitness: float = 0.0
lineage: list[str]
generation: int = 0
source: PatchSource = 'seed'
metadata: dict[str, Any]
is_valid()[source]

Check if the code is syntactically valid Python.

Return type:

bool
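
A minimal sketch of such a validity check, assuming it relies on `ast.parse()` in the same way the AST operators in this package are described as doing. The helper name `is_syntactically_valid` is hypothetical and not part of the package.

```python
import ast


def is_syntactically_valid(code: str) -> bool:
    """Return True if `code` parses as Python source (hypothetical helper)."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        # SyntaxError: malformed source.
        # ValueError: raised by some Python versions for embedded null bytes.
        return False
    return True


ok = is_syntactically_valid("def f(x):\n    return x + 1\n")
broken = is_syntactically_valid("def f(x) return x")
```

A check like this only establishes that the text parses; it says nothing about whether the candidate passes any test.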

class curate_ipsum.synthesis.PatchSource(*values)[source]

Bases: StrEnum

How a code patch was produced.

LLM = 'llm'
CROSSOVER = 'crossover'
MUTATION = 'mutation'
SEED = 'seed'
class curate_ipsum.synthesis.Specification(target_region='', original_code='', surviving_mutant_ids=<factory>, test_commands=<factory>, mutation_command='', working_directory='', assertion_ids=<factory>, preconditions=<factory>, postconditions=<factory>, context_code='', metadata=<factory>)[source]

Bases: object

What a synthesized patch must satisfy.

Parameters:
target_region: str = ''
original_code: str = ''
surviving_mutant_ids: list[str]
test_commands: list[str]
mutation_command: str = ''
working_directory: str = ''
assertion_ids: list[str]
preconditions: list[str]
postconditions: list[str]
context_code: str = ''
metadata: dict[str, Any]
class curate_ipsum.synthesis.SynthesisConfig(population_size=20, max_iterations=100, mutation_rate=0.3, crossover_rate=0.7, elite_ratio=0.1, entropy_threshold=1.0, llm_backend='mock', llm_model='codellama:7b', temperature=0.8, top_k=10, ce_weight=0.4, spec_weight=0.5, complexity_weight=0.1, test_timeout_seconds=30.0, synthesis_timeout_seconds=300.0)[source]

Bases: object

Configuration for a synthesis run.

Parameters:
  • population_size (int)

  • max_iterations (int)

  • mutation_rate (float)

  • crossover_rate (float)

  • elite_ratio (float)

  • entropy_threshold (float)

  • llm_backend (str)

  • llm_model (str)

  • temperature (float)

  • top_k (int)

  • ce_weight (float)

  • spec_weight (float)

  • complexity_weight (float)

  • test_timeout_seconds (float)

  • synthesis_timeout_seconds (float)

population_size: int = 20
max_iterations: int = 100
mutation_rate: float = 0.3
crossover_rate: float = 0.7
elite_ratio: float = 0.1
entropy_threshold: float = 1.0
llm_backend: str = 'mock'
llm_model: str = 'codellama:7b'
temperature: float = 0.8
top_k: int = 10
ce_weight: float = 0.4
spec_weight: float = 0.5
complexity_weight: float = 0.1
test_timeout_seconds: float = 30.0
synthesis_timeout_seconds: float = 300.0
class curate_ipsum.synthesis.SynthesisResult(id=<factory>, status=SynthesisStatus.FAILED, patch=None, iterations=0, counterexamples_resolved=0, duration_ms=0, fitness_history=<factory>, final_entropy=0.0, total_candidates_evaluated=0, error_message='', metadata=<factory>)[source]

Bases: object

Outcome of a synthesis run.

Parameters:
id: str
status: SynthesisStatus = 'failed'
patch: CodePatch | None = None
iterations: int = 0
counterexamples_resolved: int = 0
duration_ms: int = 0
fitness_history: list[float]
final_entropy: float = 0.0
total_candidates_evaluated: int = 0
error_message: str = ''
metadata: dict[str, Any]
to_dict()[source]
Return type:

dict[str, Any]

class curate_ipsum.synthesis.SynthesisStatus(*values)[source]

Bases: StrEnum

Outcome of a synthesis run.

SUCCESS = 'success'
FAILED = 'failed'
TIMEOUT = 'timeout'
CANCELLED = 'cancelled'

Core data models for the synthesis loop.

All models use Pydantic for validation and serialization, matching the project’s existing pattern (see models.py in the root).

class curate_ipsum.synthesis.models.SynthesisStatus(*values)[source]

Bases: StrEnum

Outcome of a synthesis run.

SUCCESS = 'success'
FAILED = 'failed'
TIMEOUT = 'timeout'
CANCELLED = 'cancelled'
class curate_ipsum.synthesis.models.PatchSource(*values)[source]

Bases: StrEnum

How a code patch was produced.

LLM = 'llm'
CROSSOVER = 'crossover'
MUTATION = 'mutation'
SEED = 'seed'
class curate_ipsum.synthesis.models.LLMBackend(*values)[source]

Bases: StrEnum

Which LLM backend to use.

CLOUD = 'cloud'
LOCAL = 'local'
MOCK = 'mock'
class curate_ipsum.synthesis.models.SynthesisConfig(population_size=20, max_iterations=100, mutation_rate=0.3, crossover_rate=0.7, elite_ratio=0.1, entropy_threshold=1.0, llm_backend='mock', llm_model='codellama:7b', temperature=0.8, top_k=10, ce_weight=0.4, spec_weight=0.5, complexity_weight=0.1, test_timeout_seconds=30.0, synthesis_timeout_seconds=300.0)[source]

Bases: object

Configuration for a synthesis run.

Parameters:
  • population_size (int)

  • max_iterations (int)

  • mutation_rate (float)

  • crossover_rate (float)

  • elite_ratio (float)

  • entropy_threshold (float)

  • llm_backend (str)

  • llm_model (str)

  • temperature (float)

  • top_k (int)

  • ce_weight (float)

  • spec_weight (float)

  • complexity_weight (float)

  • test_timeout_seconds (float)

  • synthesis_timeout_seconds (float)

population_size: int = 20
max_iterations: int = 100
mutation_rate: float = 0.3
crossover_rate: float = 0.7
elite_ratio: float = 0.1
entropy_threshold: float = 1.0
llm_backend: str = 'mock'
llm_model: str = 'codellama:7b'
temperature: float = 0.8
top_k: int = 10
ce_weight: float = 0.4
spec_weight: float = 0.5
complexity_weight: float = 0.1
test_timeout_seconds: float = 30.0
synthesis_timeout_seconds: float = 300.0
class curate_ipsum.synthesis.models.Individual(id=<factory>, code='', fitness=0.0, lineage=<factory>, generation=0, source=PatchSource.SEED, metadata=<factory>)[source]

Bases: object

A candidate patch in the genetic algorithm population.

Parameters:
id: str
code: str = ''
fitness: float = 0.0
lineage: list[str]
generation: int = 0
source: PatchSource = 'seed'
metadata: dict[str, Any]
is_valid()[source]

Check if the code is syntactically valid Python.

Return type:

bool

class curate_ipsum.synthesis.models.CodePatch(code, source=PatchSource.LLM, diff='', region_id='', original_code='', metadata=<factory>)[source]

Bases: object

A code patch produced by synthesis.

Parameters:
code: str
source: PatchSource = 'llm'
diff: str = ''
region_id: str = ''
original_code: str = ''
metadata: dict[str, Any]
to_dict()[source]
Return type:

dict[str, Any]

class curate_ipsum.synthesis.models.Specification(target_region='', original_code='', surviving_mutant_ids=<factory>, test_commands=<factory>, mutation_command='', working_directory='', assertion_ids=<factory>, preconditions=<factory>, postconditions=<factory>, context_code='', metadata=<factory>)[source]

Bases: object

What a synthesized patch must satisfy.

Parameters:
target_region: str = ''
original_code: str = ''
surviving_mutant_ids: list[str]
test_commands: list[str]
mutation_command: str = ''
working_directory: str = ''
assertion_ids: list[str]
preconditions: list[str]
postconditions: list[str]
context_code: str = ''
metadata: dict[str, Any]
class curate_ipsum.synthesis.models.Counterexample(id=<factory>, input_values=<factory>, expected_output=None, actual_output=None, mutant_id='', error_message='', test_command='', metadata=<factory>)[source]

Bases: object

A counterexample that a candidate patch fails on.

Parameters:
id: str
input_values: dict[str, Any]
expected_output: Any = None
actual_output: Any = None
mutant_id: str = ''
error_message: str = ''
test_command: str = ''
metadata: dict[str, Any]
to_dict()[source]
Return type:

dict[str, Any]

class curate_ipsum.synthesis.models.SynthesisResult(id=<factory>, status=SynthesisStatus.FAILED, patch=None, iterations=0, counterexamples_resolved=0, duration_ms=0, fitness_history=<factory>, final_entropy=0.0, total_candidates_evaluated=0, error_message='', metadata=<factory>)[source]

Bases: object

Outcome of a synthesis run.

Parameters:
id: str
status: SynthesisStatus = 'failed'
patch: CodePatch | None = None
iterations: int = 0
counterexamples_resolved: int = 0
duration_ms: int = 0
fitness_history: list[float]
final_entropy: float = 0.0
total_candidates_evaluated: int = 0
error_message: str = ''
metadata: dict[str, Any]
to_dict()[source]
Return type:

dict[str, Any]

CEGIS Engine: Counterexample-Guided Inductive Synthesis.

Main synthesis loop:

1. Generate initial candidates via LLM
2. Initialize genetic algorithm population
3. Iterate: evaluate → verify → extract counterexample → evolve → check entropy
4. Return verified patch or failure

Integrates with M3 belief revision for provenance tracking and failure analysis. Integrates with M5 verification backends for formal property checking. Integrates with M6-deferred RAG for context-aware prompt building.
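
The loop outlined above can be illustrated with a toy CEGIS run over a fixed candidate pool. This is a sketch of the general technique only: the function `cegis`, the candidate pool, and the exhaustive check over a small integer domain are all assumptions for illustration, standing in for the LLM generation, genetic evolution, and test-based verification that the engine is described as using.

```python
from typing import Callable, Iterable, Optional

IntFn = Callable[[int], int]


def cegis(
    candidates: Iterable[IntFn],
    reference: IntFn,
    domain: range,
) -> tuple[Optional[IntFn], list[int]]:
    """Toy CEGIS loop (hypothetical, not the package's engine).

    Returns the first candidate that agrees with `reference` on the whole
    domain, plus the counterexamples collected along the way.
    """
    counterexamples: list[int] = []
    for candidate in candidates:
        # Inductive step: cheaply reject candidates that fail a known counterexample.
        if any(candidate(x) != reference(x) for x in counterexamples):
            continue
        # Verification step: search the domain for a new counterexample.
        failing = next((x for x in domain if candidate(x) != reference(x)), None)
        if failing is None:
            return candidate, counterexamples
        counterexamples.append(failing)
    return None, counterexamples


# Three candidate "patches" for an absolute-value function.
pool: list[IntFn] = [
    lambda x: x,
    lambda x: -x,
    lambda x: x if x >= 0 else -x,
]
patch, found = cegis(pool, abs, range(-3, 4))
```

Here the first candidate fails on -3 and the second on 1, so `found` ends as `[-3, 1]` and `patch` is the third candidate. Each recorded counterexample becomes a cheap filter for later candidates, which is the core idea behind counterexample-guided synthesis.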

class curate_ipsum.synthesis.cegis.CEGISEngine(config, llm_client, theory_manager=None, verification_backend=None, rag_pipeline=None)[source]

Bases: object

Counterexample-Guided Inductive Synthesis engine.

Orchestrates LLM candidate generation, genetic algorithm evolution, and counterexample-driven refinement to produce verified patches.

Parameters:
  • config (SynthesisConfig)

  • llm_client (LLMClient)

  • theory_manager ('TheoryManager' | None)

  • verification_backend ('VerificationBackend' | None)

  • rag_pipeline ('RAGPipeline' | None)

cancel()[source]

Cancel the current synthesis run.

Return type:

None

async synthesize(spec)[source]

Run the full CEGIS loop.

Returns a SynthesisResult containing the status, the patch (if synthesis succeeded), and run metrics.

Parameters:

spec (Specification)

Return type:

SynthesisResult

AST-aware genetic operators for code synthesis.

Crossover: swap compatible subtrees between two parent ASTs.

Mutation: directed modifications guided by counterexample analysis.

All operators validate output via ast.parse() — invalid results are discarded.

class curate_ipsum.synthesis.ast_operators.ASTCrossover[source]

Bases: object

AST-aware crossover between parent patches.

crossover(parent1, parent2, generation=0)[source]

Swap compatible subtrees between two parents.

Returns two children, or (None, None) if crossover fails. Both children are validated for syntactic correctness.

Parameters:
Return type:

tuple[Individual | None, Individual | None]
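
To make subtree swapping concrete, here is a small sketch that exchanges the first `return` expression between two parents and re-parses both children. It works on plain source strings rather than `Individual` objects, and the function name `swap_return_expressions` and the choice of "first return statement" as the swap point are assumptions; the package's own rule for picking compatible subtrees is not documented here.

```python
import ast
from typing import Optional


def _first_return(tree: ast.AST) -> Optional[ast.Return]:
    """Find the first `return` statement in breadth-first order."""
    return next((n for n in ast.walk(tree) if isinstance(n, ast.Return)), None)


def swap_return_expressions(
    code_a: str, code_b: str
) -> tuple[Optional[str], Optional[str]]:
    """Hypothetical crossover: swap the first return expression of each parent."""
    try:
        tree_a, tree_b = ast.parse(code_a), ast.parse(code_b)
    except SyntaxError:
        return None, None
    ret_a, ret_b = _first_return(tree_a), _first_return(tree_b)
    if ret_a is None or ret_b is None:
        return None, None  # nothing compatible to swap
    ret_a.value, ret_b.value = ret_b.value, ret_a.value
    children = []
    for tree in (tree_a, tree_b):
        child = ast.unparse(tree)
        try:
            ast.parse(child)  # discard syntactically invalid offspring
        except SyntaxError:
            return None, None
        children.append(child)
    return children[0], children[1]


parent_a = "def f(x):\n    return x + 1\n"
parent_b = "def f(x):\n    return x * 2\n"
child_a, child_b = swap_return_expressions(parent_a, parent_b)
```

A swapped expression may refer to names that do not exist in its new parent, so syntactic validity alone does not guarantee the child runs; that is left to later evaluation.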

class curate_ipsum.synthesis.ast_operators.ASTMutator[source]

Bases: object

Directed mutation operators guided by counterexample analysis.

Operators:

  • constant_tweak: modify numeric/string constants
  • operator_swap: replace +/- with -/+, </> with >/< etc.
  • guard_insertion: add if-checks for edge cases
  • branch_flip: swap if/else branches
  • argument_reorder: shuffle function arguments

OPERATORS = ['constant_tweak', 'operator_swap', 'guard_insertion', 'branch_flip']
mutate(individual, generation=0, counterexample=None)[source]

Apply a single mutation operator.

If a counterexample is provided, select the most relevant operator. Otherwise, pick randomly.

Parameters:
Return type:

Individual | None
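
As an illustration of one such operator, the sketch below implements a simple operator swap with `ast.NodeTransformer`. It is an assumption-laden stand-in: it works on source strings, swaps every `+` and `-` it finds rather than a single targeted site, and the names `SwapAddSub` and `operator_swap` are hypothetical.

```python
import ast
from typing import Optional


class SwapAddSub(ast.NodeTransformer):
    """Swap + and - in binary expressions (one possible operator swap)."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # handle nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        elif isinstance(node.op, ast.Sub):
            node.op = ast.Add()
        return node


def operator_swap(code: str) -> Optional[str]:
    """Return the mutated source, or None if input or output fails to parse."""
    try:
        tree = SwapAddSub().visit(ast.parse(code))
        mutated = ast.unparse(tree)
        ast.parse(mutated)  # validate before handing the result back
    except SyntaxError:
        return None
    return mutated


mutated = operator_swap("def f(a, b):\n    return a + b\n")
```

A counterexample-guided variant would first decide which operator is most likely to address the failing input; how that relevance is scored is not specified here.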

Population management for the genetic algorithm.

Handles individual selection, replacement, and population-level operations. The population is the mutable state of the GA loop — it evolves across iterations.

class curate_ipsum.synthesis.population.Population(individuals=None)[source]

Bases: object

Manages a population of candidate patches for genetic evolution.

Parameters:

individuals (list[Individual] | None)

classmethod from_candidates(candidates, generation=0, source=PatchSource.LLM)[source]

Initialize population from raw code strings (e.g., LLM outputs).

Parameters:
Return type:

Population

property individuals: list[Individual]
property best: Individual | None

Return the individual with highest fitness, or None if empty.

property average_fitness: float
select_elite(n)[source]

Select top-n individuals by fitness.

Parameters:

n (int)

Return type:

list[Individual]

tournament_select(n, k=3)[source]

Select n individuals via k-tournament selection.

For each selection: pick k random individuals, keep the fittest.

Parameters:
Return type:

list[Individual]
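
The selection rule described above can be sketched over a plain list of fitness values. The function returns indices rather than `Individual` objects, draws each tournament without replacement, and accepts an optional `rng`; all three are assumptions made so the example is self-contained and testable.

```python
import random
from typing import Optional, Sequence


def tournament_select(
    fitnesses: Sequence[float],
    n: int,
    k: int = 3,
    rng: Optional[random.Random] = None,
) -> list[int]:
    """Pick n winners; each is the fittest of k randomly drawn contenders."""
    if not fitnesses:
        return []
    rng = rng or random.Random()
    size = max(1, min(k, len(fitnesses)))  # a tournament cannot exceed the population
    winners: list[int] = []
    for _ in range(n):
        contenders = rng.sample(range(len(fitnesses)), size)
        winners.append(max(contenders, key=lambda i: fitnesses[i]))
    return winners


scores = [0.1, 0.9, 0.5]
picks = tournament_select(scores, n=5, k=2, rng=random.Random(0))
```

Larger `k` increases selection pressure: when `k` equals the population size, every tournament is won by the single best individual, while `k=1` degenerates to uniform random selection.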

add_individual(individual)[source]
Parameters:

individual (Individual)

Return type:

None

add_individuals(individuals)[source]
Parameters:

individuals (list[Individual])

Return type:

None

remove_weakest(n)[source]

Remove and return the n weakest individuals.

Parameters:

n (int)

Return type:

list[Individual]

replace_with(new_generation)[source]

Replace entire population with a new generation.

Parameters:

new_generation (list[Individual])

Return type:

None

size()[source]
Return type:

int

Fitness evaluation for synthesis candidates.

Fitness = (ce_weight * CE_avoidance) + (spec_weight * spec_satisfaction) - (complexity_weight * complexity)

CE avoidance: fraction of known counterexamples NOT triggered.

Spec satisfaction: fraction of test commands that pass.

Complexity penalty: AST node count / 100.

Uses tools.py::run_command() for test execution. Decision: D-013 — fitness function formula.
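
The formula above translates directly into code. The sketch below takes the two fractions as inputs instead of running tests, and uses the default weights from `SynthesisConfig`. The helper `ast_node_count` shows one plausible way to count nodes (every node visited by `ast.walk`); whether the package counts nodes this way is an assumption.

```python
import ast


def ast_node_count(code: str) -> int:
    """Count every node in the parsed AST (one plausible size measure)."""
    return sum(1 for _ in ast.walk(ast.parse(code)))


def fitness(
    ce_avoidance: float,
    spec_satisfaction: float,
    node_count: int,
    ce_weight: float = 0.4,
    spec_weight: float = 0.5,
    complexity_weight: float = 0.1,
) -> float:
    """Weighted score: reward CE avoidance and passing tests, penalize size."""
    complexity = node_count / 100
    return (
        ce_weight * ce_avoidance
        + spec_weight * spec_satisfaction
        - complexity_weight * complexity
    )


best_case = fitness(1.0, 1.0, node_count=50)    # about 0.85
worst_case = fitness(0.0, 0.0, node_count=200)  # about -0.2
```

With the default weights, a candidate that avoids every counterexample and passes every test reaches at most 0.9 before the complexity penalty is subtracted, and a large candidate that satisfies nothing scores below zero.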

class curate_ipsum.synthesis.fitness.FitnessEvaluator(config)[source]

Bases: object

Evaluates candidate fitness against specification and counterexamples.

Parameters:

config (SynthesisConfig)

async evaluate(individual, spec, counterexamples)[source]

Compute fitness score for an individual.

Returns a float nominally in the range [0, 1]; the complexity penalty can push the score below 0.

Parameters:
Return type:

float

async evaluate_population(individuals, spec, counterexamples)[source]

Evaluate fitness for all individuals in a population.

Parameters:
Return type:

None

Entropy-aware diversity maintenance for the genetic algorithm.

Monitors Shannon entropy of the population’s structural features. When entropy drops below threshold (premature convergence), injects diversity by requesting novel candidates from the LLM client.

No sklearn dependency — uses simple binning for clustering.

class curate_ipsum.synthesis.entropy.EntropyManager(config)[source]

Bases: object

Monitor and maintain population diversity.

Parameters:

config (SynthesisConfig)

compute_entropy(individuals)[source]

Compute Shannon entropy over structural feature clusters.

High entropy = diverse population (good). Low entropy = convergence, possibly premature (needs injection).

Returns entropy in bits. Max = log2(n) for n individuals.

Parameters:

individuals (list[Individual])

Return type:

float
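
A minimal sketch of the entropy calculation, assuming each individual has already been assigned a cluster or bin label. How the package extracts structural features and bins them is not shown here, so `labels` is a hypothetical input.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def shannon_entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the distribution of bin labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # sum of p * log2(1/p) over bins, with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


all_distinct = shannon_entropy(["a", "b", "c", "d"])  # log2(4) = 2.0
two_bins = shannon_entropy(["a", "a", "b", "b"])      # 1.0
collapsed = shannon_entropy(["a", "a", "a", "a"])     # 0.0
```

The maximum of log2(n) is reached only when all n individuals fall in different bins. A population split evenly across two bins sits at exactly 1 bit, which matches the default `entropy_threshold` of 1.0, so under the defaults anything less diverse than that would presumably be flagged for injection.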

needs_injection(individuals)[source]

Check if population entropy is below threshold.

Parameters:

individuals (list[Individual])

Return type:

bool

select_for_replacement(individuals, n)[source]

Select indices of the n most similar individuals for replacement.

Strategy: find the most common feature bin, select the n lowest-fitness individuals from that bin.

Parameters:
Return type:

list[int]
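
The replacement strategy described above can be sketched with parallel lists of bin labels and fitness values. The inputs `bins` and `fitnesses` are hypothetical stand-ins for whatever the manager derives from each `Individual`.

```python
from collections import Counter
from typing import Hashable, Sequence


def select_for_replacement(
    bins: Sequence[Hashable], fitnesses: Sequence[float], n: int
) -> list[int]:
    """Indices of the n weakest individuals inside the most crowded bin."""
    if not bins:
        return []
    crowded_bin, _ = Counter(bins).most_common(1)[0]
    members = [i for i, b in enumerate(bins) if b == crowded_bin]
    members.sort(key=lambda i: fitnesses[i])  # weakest first
    return members[:n]


bins = ["a", "b", "a", "a", "c"]
fitnesses = [0.9, 0.1, 0.2, 0.5, 0.0]
to_replace = select_for_replacement(bins, fitnesses, n=2)  # [2, 3]
```

Note that the globally weakest individuals (indices 1 and 4) are left alone because they sit in sparsely populated bins; removing them would reduce diversity rather than restore it.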

Abstract LLM client for code synthesis.

Defines the LLMClient ABC and MockLLMClient for testing. Cloud and local backends are in separate modules (cloud_llm.py, local_llm.py).

Design decision: mirrors D-001’s dual extractor pattern — abstract base class with multiple concrete backends selectable at runtime.

class curate_ipsum.synthesis.llm_client.LLMClient[source]

Bases: ABC

Abstract base class for LLM code generation backends.

abstractmethod async generate_candidates(prompt, n=5, temperature=0.8)[source]

Generate n code candidate strings from the LLM.

Returns raw code strings (not parsed). Caller is responsible for syntactic validation.

Parameters:
Return type:

list[str]

async close()[source]

Clean up resources (e.g., HTTP clients). Override if needed.

Return type:

None
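
A sketch of what a custom backend might look like. To keep the example runnable without the package installed, it defines a stand-in ABC, `LLMClientLike`, that mirrors the documented method signatures; in real use one would presumably subclass `LLMClient` itself. The `TemplateClient` class and the `str` type assumed for `prompt` are illustrative, not part of the package.

```python
import asyncio
from abc import ABC, abstractmethod


class LLMClientLike(ABC):
    """Stand-in mirroring the documented LLMClient interface (assumption)."""

    @abstractmethod
    async def generate_candidates(
        self, prompt: str, n: int = 5, temperature: float = 0.8
    ) -> list[str]:
        """Return n raw code strings."""

    async def close(self) -> None:
        """Default no-op cleanup; override when holding resources."""
        return None


class TemplateClient(LLMClientLike):
    """Hypothetical backend that fills a template instead of calling a model."""

    def __init__(self, template: str) -> None:
        self._template = template
        self.closed = False

    async def generate_candidates(
        self, prompt: str, n: int = 5, temperature: float = 0.8
    ) -> list[str]:
        return [self._template.format(k=k) for k in range(n)]

    async def close(self) -> None:
        self.closed = True


async def main() -> tuple[list[str], bool]:
    client = TemplateClient("def f(x):\n    return x + {k}\n")
    try:
        generated = await client.generate_candidates("add a constant", n=3)
    finally:
        await client.close()  # always release resources, even on failure
    return generated, client.closed


candidates, was_closed = asyncio.run(main())
```

Because the returned strings are raw, a caller would still need to validate each one before adding it to a population.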

class curate_ipsum.synthesis.llm_client.MockLLMClient(responses=None)[source]

Bases: LLMClient

Mock LLM client for testing.

Returns canned responses or generates simple variants of a template.

Parameters:

responses (list[str] | None)

async generate_candidates(prompt, n=5, temperature=0.8)[source]

Generate n code candidate strings from the LLM.

Returns raw code strings (not parsed). Caller is responsible for syntactic validation.

Parameters:
Return type:

list[str]

property call_count: int
curate_ipsum.synthesis.llm_client.build_synthesis_prompt(spec, counterexamples=None, context_code='')[source]

Build an LLM prompt for code synthesis.

Includes:

  • The original code being replaced
  • Test requirements
  • Surviving mutant information
  • Counterexample history (what previous attempts failed on)
  • Preconditions and postconditions from M3 assertions

Parameters:
Return type:

str

Cloud LLM client: Anthropic (Claude) and OpenAI (GPT) backends.

Uses httpx for async HTTP. API key from environment variable CURATE_IPSUM_LLM_API_KEY or passed directly.

Decision: D-012 — abstract LLM client with cloud/local/mock backends.
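
The key lookup described above could be sketched as follows. The precedence (an explicitly passed key wins over the environment variable) and the error raised when neither is present are assumptions; the helper `resolve_api_key` is hypothetical.

```python
import os
from typing import Optional

ENV_VAR = "CURATE_IPSUM_LLM_API_KEY"


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Prefer an explicit key, fall back to the environment variable."""
    key = api_key or os.environ.get(ENV_VAR)
    if not key:
        raise ValueError(f"no API key: pass api_key or set {ENV_VAR}")
    return key
```

Failing early with a clear message avoids sending unauthenticated requests and getting a less obvious HTTP error back.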

class curate_ipsum.synthesis.cloud_llm.CloudLLMClient(api_key=None, provider='anthropic', model='claude-sonnet-4-5-20250929', base_url=None, max_retries=3, requests_per_second=5.0)[source]

Bases: LLMClient

Cloud LLM backend using Anthropic or OpenAI APIs.

Supports:

  • anthropic: Claude models via messages API
  • openai: GPT models via chat completions API

Parameters:
  • api_key (str | None)

  • provider (str)

  • model (str)

  • base_url (str | None)

  • max_retries (int)

  • requests_per_second (float)

async generate_candidates(prompt, n=5, temperature=0.8)[source]

Generate n candidates by making n API calls (one per candidate).

Parameters:
Return type:

list[str]
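
The one-call-per-candidate pattern can be sketched with `asyncio.gather` and a stubbed request. Nothing here touches the network: `fake_completion` is a hypothetical stand-in for a provider call, and both the concurrent fan-out and the choice to drop failed calls silently are assumptions. The real client may issue requests sequentially, retry up to `max_retries`, or throttle to `requests_per_second`.

```python
import asyncio


async def fake_completion(prompt: str, temperature: float, call_index: int) -> str:
    """Stand-in for a single provider request (no network involved)."""
    await asyncio.sleep(0)  # yield control, as a real HTTP call would
    if call_index == 1:
        raise RuntimeError("simulated transient API error")
    return f"# candidate {call_index} for: {prompt}\npass\n"


async def generate_candidates(
    prompt: str, n: int = 5, temperature: float = 0.8
) -> list[str]:
    """Issue n independent requests and keep the ones that succeed."""
    results = await asyncio.gather(
        *(fake_completion(prompt, temperature, i) for i in range(n)),
        return_exceptions=True,  # collect failures instead of aborting the batch
    )
    return [r for r in results if isinstance(r, str)]


candidates = asyncio.run(generate_candidates("fix off-by-one", n=3))
```

In this run the second of three simulated calls fails, so two candidates come back. Making one request per candidate means cost and latency scale with `n`, which is worth keeping in mind when choosing a population size.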

property total_cost_estimate: float
async close()[source]

Clean up resources (e.g., HTTP clients). Override if needed.

Return type:

None

Local LLM client: Ollama backend.

Connects to a locally running Ollama instance at http://localhost:11434. Default model: codellama:7b.

Decision: D-012 — abstract LLM client with cloud/local/mock backends.

class curate_ipsum.synthesis.local_llm.LocalLLMClient(base_url='http://localhost:11434', model='codellama:7b', timeout=120.0)[source]

Bases: LLMClient

Local LLM backend using Ollama’s HTTP API.

Requires Ollama to be running locally (see https://ollama.ai).

Default model: codellama:7b (good for code generation, runs on 8GB+ GPU).

Parameters:
  • base_url (str)

  • model (str)

  • timeout (float)

async is_available()[source]

Check if Ollama is running and the model is available.

Return type:

bool

async generate_candidates(prompt, n=5, temperature=0.8)[source]

Generate n code candidate strings from the LLM.

Returns raw code strings (not parsed). Caller is responsible for syntactic validation.

Parameters:
Return type:

list[str]

async close()[source]

Clean up resources (e.g., HTTP clients). Override if needed.

Return type:

None