Curate-Ipsum — Progress Log
Last updated: 2026-02-09
Project Goal
Curate-ipsum is a mutation testing orchestration MCP server that bridges LLM-generated code and formally verified patches. It uses graph-spectral decomposition, belief revision (AGM theory), and a CEGIS/CEGAR synthesis loop to transform mutation testing from a quality metric into the foundation of a verified code synthesis pipeline.
Primary Environment
Python 3.11+ on any POSIX-compatible system
Dependencies: py-brs>=2.0.0, pydantic>=2.0, mcp>=1.0.0
Optional: scipy>=1.10, networkx>=3.0 (graph extras), z3-solver, sympy (Phase 5+)
Companion Repositories
py-brs (github.com/egoughnour/brs, PyPI: py-brs): AGM-compliant belief revision library. v2.0.0 released with contraction + entrenchment. Import as brs.
curate-ipsum (github.com/egoughnour/curate-ipsum): This repository.
Architecture Overview
┌──────────────────────────────────────────────────────────┐
│ MCP Interface │
│ (42 tools registered) │
├──────────────────────────────────────────────────────────┤
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Mutation │ │ Graph │ │ Symbolic │ │
│ │ Parsers │ │ Spectral │ │ Execution │ │
│ │ (M1 ✓) │ │ (M2 ✓) │ │ (Phase 6) │ │
│ └──────┬─────┘ └──────┬─────┘ └──────┬─────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Belief Revision Engine (Phase 4) │ │
│ │ py-brs: AGM theory, entrenchment │ │
│ └───────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Synthesis Loop (Phase 5) │ │
│ │ CEGIS + CEGAR + Genetic Algorithm │ │
│ └───────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────┘
Full vision: architectural_vision.md. Decisions: DECISIONS.md.
Current Status
M1: Multi-Framework Foundation ✓ (Complete)
AMENDED 2026-02-08: All 5 mutation framework parsers implemented. M1 exit criteria met: “Run any Python mutation tool through single MCP interface.”
| Item | Status | File(s) | Notes |
|---|---|---|---|
| MCP server infrastructure | ✓ | | FastMCP-based, 35 tools |
| Stryker report parsing | ✓ | | JavaScript mutation tool |
| mutmut parser | ✓ | | Python mutation tool (SQLite cache) |
| Run history + PID metrics | ✓ | | Precision/completeness tracking |
| Flexible region model | ✓ | | Hierarchical: file → class → func → lines |
| Framework auto-detection | ✓ | | Language + tool detection |
| Unified parser interface | ✓ | | Routes to correct parser |
| cosmic-ray parser | ✓ | | JSON dump + SQLite session DB |
| poodle parser | ✓ | | JSON mutation-testing-report-schema |
| universalmutator parser | ✓ | | Plain text killed.txt / not-killed.txt |
M2: Graph-Spectral Infrastructure ✓ (Complete)
AMENDED 2026-02-08: All 9 steps from PHASE2_PLAN.md implemented. 195 tests passing. Committed as d34b411.
| Item | Status | File(s) | Notes |
|---|---|---|---|
| Graph models (CallGraph, Node, Edge) | ✓ | | Backend-agnostic, serializable |
| Call graph extraction (AST) | ✓ | | Two-pass: definitions → calls |
| ASR extractor (LPython) | ✓ | | Optional, requires LPython |
| Tarjan SCC detection | ✓ | | |
| Graph condensation (DAG of SCCs) | ✓ | | |
| BFS reachability | ✓ | | |
| Topological sort | ✓ | | For DAG ordering |
| DOT export | ✓ | | Graphviz visualization |
| Dependency graph extraction | ✓ | | Module-level import graphs |
| Laplacian construction | ✓ | | Sparse L = D − A, symmetrized |
| Fiedler vector computation | ✓ | | |
| Recursive partitioning | ✓ | | Binary tree of Fiedler bipartitions |
| Virtual sink/source augmentation | ✓ | | |
| Hierarchical SCC condensation | ✓ | | Alternating condense/partition |
| Planar subgraph identification | ✓ | | Boyer-Myrvold + Kuratowski |
| Kameda O(1) reachability | ✓ | | 2D dominance labels + BFS fallback |
| MCP tools for graph queries | ✓ | | 5 tools: extract, partition, reach, hierarchy, find |
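The Laplacian, Fiedler vector, and bipartition rows in the table above can be illustrated with a minimal sketch. It is not the code in graph/spectral.py: it swaps the sparse eigensolver for shifted power iteration in pure Python so that it runs without scipy, and the function name `fiedler_bipartition` and the toy graph are hypothetical.

```python
def fiedler_bipartition(n, edges, iterations=500):
    """Split nodes 0..n-1 by the sign of an approximate Fiedler vector.

    Hypothetical helper for illustration; assumes the symmetrized
    graph is connected and has at least one edge.
    """
    # Symmetrize: treat each directed edge as undirected, drop self-loops.
    neighbors = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    degree = [len(nb) for nb in neighbors]
    shift = 2 * max(degree)  # upper bound on the largest eigenvalue of L = D - A

    def step(x):
        # y = (shift*I - L) x, where (L x)[i] = deg(i)*x[i] - sum of neighbor values.
        y = [(shift - degree[i]) * x[i] + sum(x[j] for j in neighbors[i])
             for i in range(n)]
        mean = sum(y) / n
        y = [value - mean for value in y]  # project out the constant eigenvector
        norm = sum(value * value for value in y) ** 0.5
        return [value / norm for value in y]

    # Power iteration on the shifted matrix converges to the eigenvector
    # of the second-smallest Laplacian eigenvalue (the Fiedler vector).
    x = step([float(i) for i in range(n)])
    for _ in range(iterations):
        x = step(x)
    left = {i for i in range(n) if x[i] < 0}
    return left, set(range(n)) - left


# Two triangles joined by a single bridge edge (2 -> 3).
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
left, right = fiedler_bipartition(6, edges)
parts = sorted([sorted(left), sorted(right)])
print(parts)
```

On this toy graph the cut falls on the bridge edge, which is the behavior recursive partitioning relies on: each half can be bipartitioned again to build the binary tree the table describes.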
M3: Belief Revision Engine ✓ (Complete)
AMENDED 2026-02-08: All M3 items implemented. Exit criteria met: “Track belief evolution across synthesis attempts with full provenance.” 32 MCP tools total (8 new M3 tools).
| Item | Status | File(s) | Notes |
|---|---|---|---|
| py-brs library (AGM core) | ✓ | PyPI | v2.0.0 released |
| Evidence adapter (mutation→belief) | ✓ | | Mutation results → BRS Evidence |
| Theory manager | ✓ | | High-level API, provenance wired |
| AGM contraction | ✓ | In py-brs v2.0.0 | 3 strategies: entrenchment, minimal, full_cascade |
| Entrenchment calculation | ✓ | In py-brs v2.0.0 | |
| Typed assertion model | ✓ | | 6 kinds + contradiction detection → D-010 |
| Provenance DAG | ✓ | | Append-only causal chain → D-010 |
| Rollback mechanism | ✓ | | Checkpoints + undo via provenance DAG |
| Failure mode analyzer | ✓ | | 7 failure modes, heuristic classification → D-011 |
| MCP tools (8 new) | ✓ | | store_evidence, provenance, why_believe, stability, rollback, undo, analyze_failure, world_history |
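As a rough illustration of the "append-only causal chain" and "checkpoints + undo" rows above, the sketch below models a provenance DAG as an event log whose entries point at their causes. Every name here (`ProvenanceEvent`, `ProvenanceDAG`, `record`, `why`, `checkpoint`, `visible_at`) is hypothetical; this is neither the py-brs API nor the contents of theory/provenance.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    event_id: int
    kind: str            # e.g. "evidence", "revision", "contraction"
    payload: str
    parents: tuple = ()  # ids of the events that caused this one


class ProvenanceDAG:
    """Append-only event log: events are never edited or deleted."""

    def __init__(self):
        self._events = []

    def record(self, kind, payload, parents=()):
        # Parents must already exist, so edges always point backwards
        # and the structure stays acyclic by construction.
        for parent in parents:
            if not 0 <= parent < len(self._events):
                raise ValueError(f"unknown parent event {parent}")
        event = ProvenanceEvent(len(self._events), kind, payload, tuple(parents))
        self._events.append(event)
        return event.event_id

    def why(self, event_id):
        """Return the ids of all causal ancestors of event_id, oldest first."""
        seen, stack = set(), [event_id]
        while stack:
            current = stack.pop()
            for parent in self._events[current].parents:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return sorted(seen)

    def checkpoint(self):
        """A checkpoint is just the current length of the log."""
        return len(self._events)

    def visible_at(self, checkpoint):
        """Rollback view: the events that existed when the checkpoint was taken."""
        return self._events[:checkpoint]


dag = ProvenanceDAG()
e0 = dag.record("evidence", "mutant 17 survived")
e1 = dag.record("evidence", "mutant 17 killed by new test")
cp = dag.checkpoint()
e2 = dag.record("revision", "branch is covered", parents=[e0, e1])
ancestors = dag.why(e2)
rolled_back = dag.visible_at(cp)
```

Because nothing is deleted, a rollback in this sketch is only a change of view: the later events remain in the log and can still be inspected.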
M4: Synthesis Loop ✓ (Complete)
AMENDED 2026-02-08: All M4 items implemented. Exit criteria met: “Generate patch that kills previously-surviving mutant, verified correct.”
| Item | Status | File(s) | Notes |
|---|---|---|---|
| Synthesis data models | ✓ | | Individual, CodePatch, Specification, Counterexample, SynthesisResult |
| Abstract LLM client | ✓ | | ABC + MockLLMClient + prompt builder |
| Cloud LLM backend | ✓ | | Anthropic + OpenAI via httpx → D-012 |
| Local LLM backend | ✓ | | Ollama HTTP API → D-012 |
| Population management | ✓ | | Elite/tournament selection, add/remove |
| Fitness evaluation | ✓ | | CE avoidance + spec satisfaction - complexity → D-013 |
| AST-aware crossover | ✓ | | Subtree swap, directed mutation |
| Entropy manager | ✓ | | Shannon entropy, diversity injection |
| CEGIS engine | ✓ | | Full loop: LLM → GA → verify → CE feedback |
| MCP tools (4 new) | ✓ | | synthesize_patch, synthesis_status, cancel_synthesis, list_synthesis_runs |
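The fitness and entropy rows above can be made concrete with a small sketch. It assumes the fitness is a plain sum of a counterexample-avoidance ratio and a specification-satisfaction ratio minus a weighted size penalty, and that diversity is measured over exact duplicate candidates. The weight, the threshold, and all function names are illustrative assumptions, not values taken from D-013 or synthesis/fitness.py.

```python
import math
from collections import Counter


def population_entropy(candidates):
    """Shannon entropy (in bits) of the distribution of distinct candidates."""
    counts = Counter(candidates)
    total = len(candidates)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def needs_diversity_injection(candidates, threshold_bits=1.0):
    # Assumed policy: inject fresh candidates once entropy drops below a threshold.
    return population_entropy(candidates) < threshold_bits


def fitness(ce_avoided, ce_total, specs_met, specs_total, ast_nodes,
            complexity_weight=0.01):
    """CE avoidance + spec satisfaction - complexity, each term assumed in [0, 1]."""
    ce_term = ce_avoided / ce_total if ce_total else 1.0
    spec_term = specs_met / specs_total if specs_total else 1.0
    return ce_term + spec_term - complexity_weight * ast_nodes


diverse = ["return a + b", "return b + a", "return sum((a, b))", "return a - -b"]
collapsed = ["return a + b"] * 4
print(population_entropy(diverse), population_entropy(collapsed))
print(needs_diversity_injection(collapsed))
print(fitness(ce_avoided=3, ce_total=4, specs_met=2, specs_total=2, ast_nodes=10))
```

A population of identical patches has zero entropy, which is the situation diversity injection is meant to break.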
M6: Graph Persistence + RAG ✓ (Complete)
AMENDED 2026-02-09: Graph persistence + RAG layer fully implemented. 70+ new tests, 42 MCP tools total. Storage package with SQLite (primary) + Kuzu (optional) backends, incremental update engine, synthesis persistence, vector store ABC with ChromaDB, and RAG pipeline.
| Item | Status | File(s) | Notes |
|---|---|---|---|
| Abstract GraphStore ABC | ✓ | | Factory pattern mirrors D-012 → D-014 |
| SQLite graph store (primary) | ✓ | | 7 tables, WAL mode, zero deps |
| Kuzu graph store (optional) | ✓ | | Cypher queries, embedded graph DB |
| Synthesis result persistence | ✓ | | JSONL append-only, multi-project |
| Kameda index persistence | ✓ | | O(1) reachability survives restart |
| Fiedler partition persistence | ✓ | | Materialized path encoding |
| Incremental update engine | ✓ | | SHA-256 file hashing → D-015 |
| MCP tools (3 new) | ✓ | | incremental_update, persistent_graph_stats, graph_query |
| Server wiring | ✓ | | extract_call_graph, compute_partitioning, synthesize_patch persist automatically |
| Code embedding model | ✓ | | sentence-transformers (all-MiniLM-L6-v2, 384-dim) |
| Vector store ABC | ✓ | | Abstract interface → D-017 |
| ChromaDB backend | ✓ | | Embedded + HTTP client modes |
| RAG pipeline | ✓ | | Vector top-k + graph expansion decay |
| CEGIS + RAG integration | ✓ | | Optional |
| MCP tools (3 new) | ✓ | | store_code_embeddings, rag_search, retrieve_context |
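The incremental update row above names SHA-256 file hashing as the change detector. A minimal sketch of that idea follows, assuming the engine keeps a mapping from relative path to digest between runs. The function names and the returned tuple shape are hypothetical, not the interface of storage/incremental.py.

```python
import hashlib
from pathlib import Path


def hash_file(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def detect_changes(root, previous_hashes):
    """Compare current *.py hashes under root with a stored mapping.

    Returns (added, modified, removed, current_hashes), where the first
    three are sorted lists of POSIX-style paths relative to root.
    """
    root = Path(root)
    current = {
        path.relative_to(root).as_posix(): hash_file(path)
        for path in sorted(root.rglob("*.py"))
    }
    added = sorted(set(current) - set(previous_hashes))
    removed = sorted(set(previous_hashes) - set(current))
    modified = sorted(
        name for name in current
        if name in previous_hashes and current[name] != previous_hashes[name]
    )
    return added, modified, removed, current
```

Only the files in `added` and `modified` would need re-extraction, and nodes belonging to `removed` files could be dropped from the stored graph; persisting `current_hashes` after each run makes the next comparison possible.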
Phases 5, 7–8: Not Started
Verification Backends, Production Hardening. See ROADMAP.md for details.
What’s Next
RAG + Embeddings (M6 Follow-Up)
Exit criteria: Natural language queries over codebase with graph-backed retrieval.
Key tasks: Code embedding model, vector index, semantic search, text-to-Cypher pipeline.
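Graph-backed retrieval of the kind named in the exit criteria can be sketched as a two-stage lookup: rank code units by vector similarity, then walk outgoing call-graph edges from the top hits and score each neighbor at a decayed fraction of its source. The example below is an assumption-laden illustration in pure Python; the `retrieve` function, the decay factor, and the toy embeddings are hypothetical and do not reflect the ChromaDB-backed pipeline.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, embeddings, call_graph, k=2, hops=1, decay=0.5):
    """Vector top-k seeds, then expansion along graph edges with decayed scores."""
    similarity = {name: cosine(query_vec, vec) for name, vec in embeddings.items()}
    seeds = sorted(similarity, key=similarity.get, reverse=True)[:k]
    scores = {name: similarity[name] for name in seeds}
    frontier = dict(scores)
    for _ in range(hops):
        next_frontier = {}
        for name, score in frontier.items():
            for neighbor in call_graph.get(name, ()):
                candidate = score * decay
                # Keep the best score seen for each node.
                if candidate > scores.get(neighbor, 0.0):
                    scores[neighbor] = candidate
                    next_frontier[neighbor] = candidate
        frontier = next_frontier
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: 2-dimensional embeddings and a call graph as adjacency lists.
embeddings = {"parse": [1.0, 0.0], "load": [0.8, 0.6], "render": [0.0, 1.0]}
call_graph = {"parse": ["tokenize"], "load": ["parse"]}
results = retrieve([1.0, 0.0], embeddings, call_graph)
print([name for name, _ in results])
```

Here `tokenize` has no embedding of its own and is reached only through the call edge from `parse`, which is the case graph expansion is meant to cover.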
M5: Verification Backends
Exit criteria: Verify patch correctness against specification with proof certificate.
Key tasks: Z3 integration, CEGAR abstraction levels, SymPy path conditions, KLEE container. See ROADMAP.md for details.
Known Limitations & Open Questions
LPython optional: The ASR extractor requires LPython, which is alpha-status. The AST extractor is always available. See docs/lpython_klee_feasibility.md. → D-001
~~Fiedler on disconnected graphs: Need to handle disconnected components separately.~~ Resolved in graph/spectral.py via per-component Fiedler. → D-004
~~Planarity NP-hard: Maximal planar subgraph identification is NP-hard in general.~~ Mitigated via iterative edge removal heuristic in graph/planarity.py. → D-006
~~scipy not yet in dependencies~~ Resolved: Added as [graph] optional dependency. → D-007
Test Suite Summary
| Test File | Count | Covers |
|---|---|---|
| | 25 | Region model parsing, containment, overlap |
| | 25 | Stryker + mutmut parsing, detection |
| | 62 | cosmic-ray, poodle, universalmutator parsers |
| | 26 | AST extractor, call resolution |
| | 15 | BRS evidence adapter integration |
| | 41 | Laplacian, Fiedler, partitioner, virtual nodes |
| | 54 | Planarity, Kameda reachability, BFS verification |
| | 48 | Hierarchy, dependency extractor, imports |
| | 26 | MCP graph tools end-to-end pipeline |
| | 38 | Typed assertions, serialization, contradiction detection |
| | 30 | Provenance DAG, event recording, path queries |
| | 12 | Rollback, checkpoints, undo operations |
| | 38 | Failure classification, overfitting, contraction suggestions |
| | 9 | M3 full lifecycle: evidence → assertion → provenance → rollback |
| | 28 | Synthesis data models, config validation |
| | 18 | Mock/cloud/local LLM clients, prompt building |
| | 27 | Population, fitness, AST operators, entropy |
| | 8 | CEGIS engine, cancellation, timeout |
| | 11 | M4 full pipeline end-to-end |
| | 10 | Synthesis store JSONL persistence |
| | 25 | SQLite graph store round-trips, queries |
| | 12 | Kuzu graph store (skipped if kuzu not installed) |
| | 15 | Incremental update engine, change detection |
| | 7 | M6 full pipeline: persist → query → update |
| Total | 616 passed, 1 pre-existing failure, 1 skipped | |
File Inventory
curate-ipsum/
├── server.py # MCP server entry point (42 tools)
├── tools.py # Async test/mutation execution layer
├── models.py # Pydantic data models (MutationRunResult, etc.)
├── config.toml # Server configuration
├── pyproject.toml # Package metadata + dependencies
├── graph/
│ ├── __init__.py # Public API + optional dependency flags
│ ├── models.py # CallGraph, GraphNode, GraphEdge, SCC, condensation
│ ├── extractor.py # Abstract base class for extractors
│ ├── ast_extractor.py # Python AST-based call graph extraction
│ ├── asr_extractor.py # LPython ASR-based extraction (optional)
│ ├── dependency_extractor.py # Module-level import graph extraction
│ ├── spectral.py # Laplacian + Fiedler vector computation
│ ├── partitioner.py # Recursive Fiedler partitioning + virtual nodes
│ ├── hierarchy.py # Alternating condense/partition tree
│ ├── planarity.py # Boyer-Myrvold planarity + Kuratowski
│ └── kameda.py # O(1) reachability index (2D dominance)
├── parsers/
│ ├── __init__.py # Unified parser interface (routes 5 frameworks)
│ ├── detection.py # Framework + language auto-detection
│ ├── stryker_parser.py # Stryker JSON report parser
│ ├── mutmut_parser.py # mutmut SQLite cache parser
│ ├── cosmic_ray_parser.py # cosmic-ray JSON dump + SQLite parser
│ ├── poodle_parser.py # poodle JSON mutation-testing-report parser
│ └── universalmutator_parser.py # universalmutator text file parser
├── regions/
│ └── models.py # Region, RegionLevel (file/class/func/lines)
├── adapters/
│ └── evidence_adapter.py # Mutation results → BRS beliefs
├── theory/
│ ├── __init__.py # Package with submodule listing
│ ├── manager.py # Theory manager (provenance + rollback wired)
│ ├── assertions.py # Typed assertion model + contradiction detection
│ ├── provenance.py # Append-only provenance DAG
│ ├── rollback.py # Rollback manager + checkpoints
│ └── failure_analyzer.py # Heuristic failure classification
├── synthesis/
│ ├── __init__.py # Public API + optional dependency flag
│ ├── models.py # Individual, CodePatch, Specification, SynthesisResult
│ ├── llm_client.py # LLMClient ABC + MockLLMClient + prompt builder
│ ├── cloud_llm.py # Cloud LLM (Anthropic/OpenAI) via httpx
│ ├── local_llm.py # Local LLM (Ollama) via httpx
│ ├── population.py # GA population management
│ ├── fitness.py # Fitness evaluation (CE + spec - complexity)
│ ├── ast_operators.py # AST crossover + directed mutation
│ ├── entropy.py # Shannon entropy + diversity injection
│ └── cegis.py # CEGIS engine (main synthesis loop)
├── storage/
│ ├── __init__.py # Package init, exports
│ ├── synthesis_store.py # JSONL persistence for synthesis results
│ ├── graph_store.py # Abstract GraphStore ABC + factory → D-014
│ ├── sqlite_graph_store.py # SQLite backend (primary, zero deps)
│ ├── kuzu_graph_store.py # Kuzu backend (optional, Cypher queries)
│ └── incremental.py # File hash tracking + delta updates → D-015
├── rag/
│ ├── __init__.py # RAG package init
│ ├── embedding_provider.py # EmbeddingProvider ABC + sentence-transformers → D-017
│ ├── vector_store.py # VectorStore ABC + ChromaDB backend → D-017
│ ├── chroma_vector_store.py # ChromaDB implementation (embedded/HTTP)
│ └── search.py # RAG pipeline (vector + graph expansion)
├── verification/
│ ├── __init__.py # Verification package init
│ ├── backend.py # VerificationBackend ABC → D-016
│ ├── backends/
│ │ ├── __init__.py
│ │ ├── z3_backend.py # Z3 constraint solving backend
│ │ ├── angr_docker.py # angr symbolic execution via Docker
│ │ └── mock_backend.py # Mock verification for testing
│ └── orchestrator.py # VerificationOrchestrator with CEGAR budget escalation → D-016
├── tests/
│ ├── test_m1_regions.py # Region model tests
│ ├── test_m1_parsers.py # Parser tests (Stryker + mutmut)
│ ├── test_new_parsers.py # cosmic-ray, poodle, universalmutator tests
│ ├── test_graph_extraction.py # AST extractor tests
│ ├── test_brs_integration.py # BRS integration tests
│ ├── test_assertions.py # Typed assertions + contradiction detection
│ ├── test_provenance.py # Provenance DAG tests
│ ├── test_rollback.py # Rollback + checkpoint tests
│ ├── test_failure_analyzer.py # Failure classification tests
│ ├── test_m3_end_to_end.py # M3 full lifecycle E2E
│ ├── test_spectral.py # Laplacian/Fiedler/partitioner tests
│ ├── test_planarity_kameda.py # Planarity + Kameda tests
│ ├── test_hierarchy_deps.py # Hierarchy + dependency tests
│ ├── test_mcp_graph.py # MCP graph tool integration tests
│ ├── test_synthesis_models.py # Synthesis data model tests
│ ├── test_llm_client.py # LLM client tests
│ ├── test_genetic_operators.py # GA operator tests
│ ├── test_cegis.py # CEGIS engine tests
│ ├── test_m4_end_to_end.py # M4 full pipeline E2E
│ ├── test_synthesis_store.py # Synthesis JSONL store tests
│ ├── test_sqlite_graph_store.py # SQLite graph store tests
│ ├── test_kuzu_graph_store.py # Kuzu graph store tests (skip if no kuzu)
│ ├── test_incremental.py # Incremental update engine tests
│ ├── test_m6_end_to_end.py # M6 full pipeline E2E
│ ├── test_embedding_provider.py # Embedding provider tests
│ ├── test_vector_store.py # Vector store ABC + ChromaDB tests
│ ├── test_rag_pipeline.py # RAG search + graph expansion tests
│ ├── test_verification_backend.py # Verification backend ABC tests
│ ├── test_z3_backend.py # Z3 backend constraint solving tests
│ ├── test_angr_docker.py # angr Docker backend tests
│ ├── test_verification_orchestrator.py # CEGAR orchestrator tests
│ └── test_m5_m6_complete.py # M5 + M6 complete integration tests
├── docs/
│ ├── m1_m3_audit.md # M1-M3 audit findings
│ └── lpython_klee_feasibility.md # LPython/KLEE feasibility study
├── README.md # Project overview + roadmap
├── ROADMAP.md # Full milestone tracker (M1–M7)
├── CONTEXT.md # Directory structure + naming conventions
├── DOCS_INDEX.md # Documentation navigation guide
├── PROGRESS.md # ← You are here
├── DECISIONS.md # Architectural decision log (D-001 through D-017)
├── PHASE2_PLAN.md # Phase 2 implementation plan (complete)
├── architectural_vision.md # Graph-spectral framework theory
├── synthesis_framework.md # CEGIS/CEGAR/genetic approach
├── belief_revision_framework.md # AGM theory + provenance
├── m1_multi_framework_plan.md # Phase 1 implementation plan (done)
├── brs_integration_plan.md # BRS ↔ curate-ipsum mapping
├── brs_v2_refactoring_plan.md # py-brs v2.0.0 plan
├── brs_contract_pr.md # AGM contraction spec
├── brs_cicd.md # CI/CD pipeline docs
├── summary.md # Functionality catalog
├── potential_directions.md # Enhancement ideas
├── synergies.md # Tool ecosystem integration
└── inferred_goals.md # Evidence-based goal hierarchy
Revision History
v1.0 (2026-02-08): Initial PROGRESS.md created from comprehensive codebase audit. Phase 1 complete, Phase 2 active.
v2.0 (2026-02-08): Phase 2 (M2) complete — all 9 steps implemented (195 tests). M1 remaining parsers now active focus. Updated architecture diagram, file inventory, test summary, and known limitations.
v3.0 (2026-02-08): M1 ✅ complete (3 new parsers: cosmic-ray, poodle, universalmutator). M3 ✅ complete (assertions, provenance DAG, rollback, failure analyzer, 8 new MCP tools). 468 tests passing. Updated all inventories.
v4.0 (2026-02-08): M4 ✅ complete (synthesis loop: CEGIS + genetic algorithm + LLM client). 560 tests passing. 10 new synthesis files, 5 new test files, 4 new MCP tools.
v5.0 (2026-02-08): M6 🟡 partial (graph persistence: SQLite + Kuzu backends, incremental updates, synthesis persistence). 616 tests passing. 6 new storage files, 5 new test files, 3 new MCP tools (35 total). RAG deferred.
v6.0 (2026-02-09): M5 ✅ complete (verification backend ABC with Z3, angr Docker, mock backends + CEGAR orchestrator). M6 deferred RAG items ✅ complete (Chroma vector store, embedding provider, RAG pipeline, 7 new MCP tools). 700+ tests passing. 9 new verification files, 5 new RAG files, 10 new test files. 42 MCP tools total.