Curate-Ipsum Roadmap¶
Vision¶
Transform mutation testing from a quality metric into the foundation of a verified code synthesis pipeline where LLM outputs become seeds for formally proven patches.
Current Status¶
Last Updated: 2026-02-09
Milestone |
Status |
Progress |
|---|---|---|
M1: Multi-Framework Foundation |
✅ Complete |
100% (5 parsers, 389 tests) |
M2: Graph-Spectral Infrastructure |
✅ Complete |
100% (195 tests passing) |
M3: Belief Revision Engine |
✅ Complete |
100% (127 new tests, 32 MCP tools) |
M4: Synthesis Loop |
✅ Complete |
100% (92 new tests, 32 MCP tools total) |
M5: Verification Backends |
✅ Complete |
100% (Z3, angr Docker, mock, orchestrator, CEGIS integration) |
M6: Graph Database + RAG |
✅ Complete |
100% (graph persistence + RAG + embeddings) |
M7: Production Hardening |
⚪ Not Started |
0% |
Milestones¶
M1: Multi-Framework Foundation (Q1)¶
Goal: Unified interface across mutation testing tools
Task |
Status |
Complexity |
Dependencies |
|---|---|---|---|
Flexible region model |
✅ Done |
Medium |
- |
Stryker parser extraction |
✅ Done |
Low |
- |
mutmut parser |
✅ Done |
Low |
- |
Framework auto-detection |
✅ Done |
Low |
- |
Unified parser interface |
✅ Done |
Low |
All parsers |
cosmic-ray parser |
✅ Done |
Medium |
- |
poodle parser |
✅ Done |
Low |
- |
universalmutator parser |
✅ Done |
Medium |
- |
Exit Criteria: Run any Python mutation tool through single MCP interface — MET
M2: Graph-Spectral Infrastructure (Q1-Q2)¶
Goal: O(1) reachability queries via hierarchical decomposition
Task |
Status |
Complexity |
Dependencies |
|---|---|---|---|
Graph models (CodeGraph, Node, Edge) |
✅ Done |
Low |
- |
Call graph extraction (AST) |
✅ Done |
Medium |
- |
ASR extractor (import/class analysis) |
✅ Done |
Medium |
- |
Dependency graph extraction |
✅ Done |
Medium |
- |
Laplacian construction |
✅ Done |
Low |
Graph extraction |
Fiedler vector computation |
✅ Done |
Medium |
Laplacian |
Recursive partitioning |
✅ Done |
Medium |
Fiedler |
SCC detection + condensation |
✅ Done |
Low |
Partitioning |
Planar subgraph identification |
✅ Done |
High |
SCC |
Kameda preprocessing |
✅ Done |
High |
Planar subgraph |
Virtual sink/source augmentation |
✅ Done |
Low |
Module detection |
MCP tools for graph queries |
✅ Done |
Low |
All above |
Exit Criteria: Query reachability between any two functions in O(1) after O(n) preprocessing — MET
M3: Belief Revision Engine (Q2)¶
Goal: AGM-compliant theory management with provenance
Task |
Status |
Complexity |
Dependencies |
|---|---|---|---|
py-brs library (AGM core) |
✅ Done |
High |
- |
Evidence adapter (mutation→belief) |
✅ Done |
Medium |
py-brs |
Theory manager (curate-ipsum) |
✅ Done |
Medium |
Evidence adapter |
AGM contraction (py-brs v2.0.0) |
✅ Done |
High |
py-brs |
Assertion model (types, behaviors) |
✅ Done |
Medium |
- |
Entrenchment calculation (py-brs v2.0.0) |
✅ Done |
Medium |
Evidence |
Provenance DAG storage |
✅ Done |
Medium |
AGM operations |
Rollback mechanism |
✅ Done |
Medium |
Provenance DAG |
Failure mode analyzer |
✅ Done |
High |
All above |
Exit Criteria: Track belief evolution across synthesis attempts with full provenance — MET
M4: Synthesis Loop (Q2-Q3)¶
Goal: LLM candidates → verified patches
Task |
Status |
Complexity |
Dependencies |
|---|---|---|---|
LLM candidate extraction (top-k) |
✅ Done |
Low |
- |
Population initialization |
✅ Done |
Low |
LLM extraction |
Fitness function (CE avoidance + spec) |
✅ Done |
Medium |
M3 |
AST-aware crossover |
✅ Done |
High |
Population |
Directed mutation (CE-guided) |
✅ Done |
High |
Fitness |
Entropy monitoring |
✅ Done |
Medium |
Population |
Diversity injection |
✅ Done |
Medium |
Entropy |
CEGIS main loop |
✅ Done |
High |
All above |
Exit Criteria: Generate patch that kills previously-surviving mutant, verified correct — MET
M5: Verification Backends (Q3)¶
Goal: Formal verification infrastructure
Task |
Complexity |
Dependencies |
|---|---|---|
Z3 Python bindings integration |
✅ Done |
Low |
Type abstraction level (CEGAR) |
✅ Done |
Medium |
CFG abstraction level |
✅ Done |
Medium |
DFG abstraction level |
✅ Done |
High |
Concrete execution level |
✅ Done |
Medium |
Spurious CE detection |
✅ Done |
High |
SymPy path condition encoding |
✅ Done |
Medium |
Numerical solver fallback |
✅ Done |
Medium |
KLEE container integration |
✅ Done |
High |
Exit Criteria: Verify patch correctness against specification with proof certificate
M6: Graph Database + RAG (Q3-Q4)¶
Goal: Persistent, queryable code graph
Task |
Status |
Complexity |
Dependencies |
|---|---|---|---|
Abstract GraphStore ABC |
✅ Done |
Low |
M2 |
SQLite graph store (primary) |
✅ Done |
Medium |
GraphStore |
Kuzu graph store (optional) |
✅ Done |
Medium |
GraphStore |
Synthesis result persistence |
✅ Done |
Low |
M4 |
Kameda index persistence |
✅ Done |
Medium |
M2, GraphStore |
Fiedler partition persistence |
✅ Done |
Medium |
M2, GraphStore |
Incremental update engine |
✅ Done |
High |
GraphStore |
MCP tools (3 new) |
✅ Done |
Low |
All above |
Code embedding model |
✅ Done |
Medium |
- |
Semantic search index |
✅ Done |
Medium |
Embedding |
RAG retrieval pipeline |
✅ Done |
Medium |
Search index |
Text-to-Cypher queries |
✅ Done |
Medium |
Kuzu + RAG |
Exit Criteria: Natural language queries over codebase with graph-backed retrieval — PARTIALLY MET (graph persistence complete, RAG deferred)
M7: Production Hardening (Q4)¶
Goal: CI/CD-ready deployment
Task |
Complexity |
Dependencies |
|---|---|---|
GitHub Actions integration |
Low |
M1 |
Regression detection (PID d-term) |
Low |
M1 |
Threshold-based quality gates |
Low |
Regression |
HTML report generation |
Medium |
- |
SARIF output format |
Low |
- |
VSCode extension |
High |
MCP interface |
Self-healing metadata consistency |
High |
M2, M3 |
Performance benchmarking |
Medium |
All |
Exit Criteria: Drop-in CI integration with automated quality gates
Critical Path¶
M1 (Frameworks) ──→ M2 (Graph) ──→ M6 (Graph DB)
│ │
▼ ▼
M3 (Belief) ←────→ M4 (Synthesis)
│ │
▼ ▼
M5 (Verification) ─────→ M7 (Production)
Risk Register¶
Risk |
Impact |
Mitigation |
|---|---|---|
Fiedler computation slow for large graphs |
M2 delay |
Sparse eigensolvers, approximate methods |
Planar subgraph identification NP-hard |
M2 delay |
Heuristic identification, accept suboptimality |
CEGIS loop non-convergence |
M4 failure |
Entropy injection, timeout with best-effort |
Z3 timeout on complex constraints |
M5 delay |
SymPy reformulation, numerical fallback |
LLM API rate limits |
M4 slowdown |
Local model fallback (CodeLlama) |
Success Metrics¶
Metric |
Target |
Measurement |
|---|---|---|
Mutation score improvement |
+15% |
Before/after synthesis |
Patch verification rate |
>80% |
Patches that pass CEGAR |
Reachability query time |
<1ms |
p99 latency |
False positive rate |
<5% |
Spurious CE ratio |
Time to verified patch |
<5min |
End-to-end for single mutant |
Resource Requirements¶
Compute¶
Development: Standard workstation
CI: 4-core runners with 16GB RAM
KLEE/Z3: Dedicated container with 32GB+ RAM
Graph DB: Neo4j instance (can start with embedded)
Dependencies¶
Python 3.11+
scipy, networkx (graph algorithms)
z3-solver (SMT)
sympy (symbolic math)
pydantic (models)
FastMCP (server)
Optional¶
KLEE (concolic execution)
Neo4j/JanusGraph (graph persistence)
Joern (CPG generation)