Milestones M1-M3 Completeness Audit¶
Date: 2026-01-27
Auditor: Claude
Project: curate-ipsum
Executive Summary¶
| Milestone | Status | Completion | Notes |
|---|---|---|---|
| M1: Multi-Framework Foundation | ⚠️ PARTIAL | ~50% | Core parsers done, 3 frameworks missing |
| M2: Graph-Spectral Infrastructure | ❌ NOT STARTED | 0% | No implementation |
| M3: Belief Revision Engine | ✅ MOSTLY DONE | ~85% | Via py-brs integration |
M1: Multi-Framework Foundation¶
Goal: Unified interface across mutation testing tools
Exit Criteria: Run any Python mutation tool through a single MCP interface
Task Breakdown¶
| Task | Complexity | Status | Evidence |
|---|---|---|---|
| mutmut parser | Low | ✅ DONE | |
| cosmic-ray parser | Medium | ❌ NOT DONE | Only error stub in |
| poodle parser | Low | ❌ NOT DONE | Not implemented |
| universalmutator parser | Medium | ❌ NOT DONE | Not implemented |
| Framework auto-detection | Low | ✅ DONE | |
| Non-contradictory region assignment | Medium | ⚠️ PARTIAL | Region model done, assignment logic missing |
Detailed Analysis¶
✅ mutmut parser¶
Full SQLite cache parser supporting v1 and v2 schemas
Status mapping: ok_killed, bad_survived, bad_timeout, ok_suspicious, untested, skipped
Region-level filtering via get_mutmut_region_mutants()
Comprehensive test coverage in tests/test_m1_parsers.py
✅ Framework auto-detection¶
Language detection from file extensions and config files
Framework detection from output files, cache, and config
Recommendation engine based on language and existing setup
MCP tool: detect_frameworks_tool()
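The detection steps listed above can be illustrated with a minimal sketch. The marker-file lists, the extension table, and both function names are assumptions made for illustration; the signals actually used by detection.py are not shown in this audit.

```python
from pathlib import Path

# Hypothetical marker files; the real detection module may use other signals.
FRAMEWORK_MARKERS = {
    "mutmut": [".mutmut-cache"],
    "stryker": ["stryker.conf.js", "stryker.conf.json"],
}

# Hypothetical extension table covering only the languages named in this audit.
LANGUAGE_EXTENSIONS = {".py": "python", ".js": "javascript", ".ts": "typescript"}


def detect_languages(root: Path) -> set[str]:
    """Collect languages whose source extensions appear anywhere under root."""
    return {
        LANGUAGE_EXTENSIONS[p.suffix]
        for p in root.rglob("*")
        if p.is_file() and p.suffix in LANGUAGE_EXTENSIONS
    }


def detect_frameworks(root: Path) -> list[str]:
    """Return frameworks with at least one marker file directly under root."""
    return sorted(
        name
        for name, markers in FRAMEWORK_MARKERS.items()
        if any((root / marker).exists() for marker in markers)
    )
```

A recommendation step could then combine both results, for example suggesting mutmut when Python sources are found and no framework marker exists yet.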
⚠️ Non-contradictory region assignment¶
Implemented:
Region model with hierarchical levels (FILE, CLASS, FUNCTION, LINES)
contains() and overlaps() methods for relationship checking
String serialization: file:path::class:name::func:name::lines:start-end
MCP tools: parse_region_tool(), check_region_relationship_tool(), create_region_tool()
Missing:
Automatic region assignment from mutation results
Region hierarchy building from AST
Non-contradictory assignment algorithm (ensuring nested regions don’t conflict)
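One possible shape for the missing assignment step is sketched below, assuming that "non-contradictory" means every mutant maps to exactly one innermost region and that all regions covering it are properly nested. The Span class is a hypothetical stand-in and not the project's Region model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for the project's Region model."""
    file: str
    name: str
    start: int  # first line, inclusive
    end: int    # last line, inclusive

    def contains_line(self, file: str, line: int) -> bool:
        return self.file == file and self.start <= line <= self.end

    def contains(self, other: "Span") -> bool:
        return (
            self.file == other.file
            and self.start <= other.start
            and other.end <= self.end
        )


def assign_region(spans: list[Span], file: str, line: int) -> Optional[Span]:
    """Pick the innermost span covering the line, or None if nothing covers it.

    Raises ValueError when the covering spans do not form a nested chain,
    because then no single most-specific region exists.
    """
    covering = [s for s in spans if s.contains_line(file, line)]
    if not covering:
        return None
    # Smallest first, so each span must be contained in the next one.
    covering.sort(key=lambda s: s.end - s.start)
    for inner, outer in zip(covering, covering[1:]):
        if not outer.contains(inner):
            raise ValueError(f"{inner.name} and {outer.name} overlap without nesting")
    return covering[0]
```

Rejecting partially overlapping regions up front keeps the assignment deterministic; a real implementation would likely build the hierarchy from the AST so that such overlaps cannot occur.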
❌ cosmic-ray parser¶
Current state: Error stub only
elif tool_lower in ("cosmic_ray", "cosmicray", "cosmic"):
    raise UnsupportedFrameworkError(
        f"cosmic-ray parser not yet implemented. "
        f"Supported frameworks: stryker, mutmut"
    )
Required:
Parse cosmic-ray’s SQLite session database
Status mapping from cosmic-ray’s outcomes (KILLED, SURVIVED, INCOMPETENT, etc.)
Integration with unified parser interface
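A rough sketch of the status-mapping step follows. The outcome names come from the list above; the unified labels on the right-hand side and the results(job_id, test_outcome) table are placeholders, so the real cosmic-ray session schema would need to be checked before adapting the query.

```python
import sqlite3
from collections import Counter

# Outcome names are taken from the audit text; the unified labels are
# hypothetical and would need to match the project's parser interface.
OUTCOME_TO_STATUS = {
    "KILLED": "killed",
    "SURVIVED": "survived",
    "INCOMPETENT": "invalid",
}


def summarize_session(conn: sqlite3.Connection) -> Counter:
    """Count unified statuses from a session database.

    Assumes a placeholder table results(job_id, test_outcome); this is
    not a confirmed cosmic-ray schema.
    """
    counts: Counter = Counter()
    for (outcome,) in conn.execute("SELECT test_outcome FROM results"):
        status = OUTCOME_TO_STATUS.get(str(outcome).upper(), "unknown")
        counts[status] += 1
    return counts
```

Unrecognized outcomes fall into an explicit "unknown" bucket rather than being dropped, which makes schema drift visible in the summary.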
❌ poodle / universalmutator parsers¶
No implementation started. Would need:
poodle: Parse JSON/text output format
universalmutator: Parse output format (varies by target language)
Exit Criteria Assessment¶
“Run any Python mutation tool through single MCP interface”
| Tool | Supported | Notes |
|---|---|---|
| mutmut | ✅ Yes | Full support |
| cosmic-ray | ❌ No | UnsupportedFrameworkError |
| poodle | ❌ No | UnsupportedFrameworkError |
| universalmutator | ❌ No | UnsupportedFrameworkError |
| Stryker (JS/TS) | ✅ Yes | Full support |
Exit criteria NOT MET - only 2 of the 5 listed tools are supported, and since Stryker targets JS/TS, only 1 of the 4 Python tools (mutmut) works.
M2: Graph-Spectral Infrastructure¶
Goal: O(1) reachability queries via hierarchical decomposition
Exit Criteria: Query reachability between any two functions in O(1) after O(n) preprocessing
Task Breakdown¶
| Task | Complexity | Status | Evidence |
|---|---|---|---|
| Call graph extraction (AST) | Medium | ❌ NOT DONE | - |
| Dependency graph extraction | Medium | ❌ NOT DONE | - |
| Laplacian construction | Low | ❌ NOT DONE | - |
| Fiedler vector computation | Medium | ❌ NOT DONE | - |
| Recursive partitioning | Medium | ❌ NOT DONE | - |
| SCC detection + condensation | Low | ❌ NOT DONE | - |
| Planar subgraph identification | High | ❌ NOT DONE | - |
| Kameda preprocessing | High | ❌ NOT DONE | - |
| Virtual sink/source augmentation | Low | ❌ NOT DONE | - |
Detailed Analysis¶
No implementation exists. This milestone has not been started.
The infrastructure would require:
AST parsing for Python/JS to extract function call relationships
NetworkX or similar for graph operations
SciPy for sparse eigensolvers (Fiedler vector)
Custom algorithms for Kameda preprocessing (planar reachability)
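To give a sense of the first requirement, here is a minimal sketch of call graph extraction with Python's standard ast module, followed by a precomputed reachability index. This is a plain transitive closure and not Kameda preprocessing: lookups are constant time on average, but preprocessing and memory grow roughly quadratically, so it does not satisfy the O(n) preprocessing target. Both function names are hypothetical.

```python
import ast


def extract_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function name to the simple names it calls.

    Only direct calls of the form name(...) are recorded; attribute calls,
    methods, and dynamic dispatch are ignored in this sketch.
    """
    graph: dict[str, set[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            callees = graph.setdefault(node.name, set())
            for child in ast.walk(node):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    callees.add(child.func.id)
    return graph


def reachability_index(graph: dict[str, set[str]]) -> dict[str, frozenset[str]]:
    """Precompute, for every function, the set of names reachable from it."""
    index: dict[str, frozenset[str]] = {}
    for start in graph:
        seen: set[str] = set()
        stack = list(graph[start])
        while stack:  # iterative depth-first search
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(graph.get(current, ()))
        index[start] = frozenset(seen)
    return index
```

With the index built, a query is a set membership test such as `"c" in index["a"]`. This may be enough for a reduced-scope version of the milestone.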
Exit Criteria Assessment¶
“Query reachability between any two functions in O(1) after O(n) preprocessing”
Exit criteria NOT MET - 0% implementation.
M3: Belief Revision Engine¶
Goal: AGM-compliant theory management with provenance
Exit Criteria: Track belief evolution across synthesis attempts with full provenance
Task Breakdown¶
| Task | Complexity | Status | Evidence |
|---|---|---|---|
| Assertion model | Medium | ✅ DONE | |
| Evidence types + grounding rules | Low | ✅ DONE | |
| Entrenchment calculation | Medium | ✅ DONE | |
| AGM expansion/contraction/revision | High | ✅ DONE | Via py-brs |
| Provenance DAG storage | Medium | ✅ DONE | Via py-brs CASStore WorldBundle |
| Rollback mechanism | Medium | ⚠️ PARTIAL | World forking exists, no explicit API |
| Failure mode analyzer | High | ❌ NOT DONE | - |
Detailed Analysis¶
✅ Assertion model¶
Supports typed assertions:
type - Type assertions about code
behavior - Behavioral properties
invariant - Invariant conditions
contract - Pre/post conditions
precondition, postcondition
Each assertion has:
Confidence score (0.0-1.0)
Region binding (optional)
Evidence grounding (required)
✅ Evidence types + grounding rules¶
test_result_to_evidence() - Maps TestRunResult → BRS Evidence
mutation_result_to_evidence() - Maps MutationRunResult → BRS Evidence
Evidence reliability derived from test outcomes
Grounding edges link Evidence → Assertion nodes
✅ Entrenchment calculation¶
Via py-brs compute_entrenchment():
Considers incoming edge tiers
Weighs by confidence scores
Returns 0.0-1.0 resilience score
✅ AGM expansion/contraction/revision¶
Expansion: add_assertion() - Adds new belief with evidence
Contraction: contract_assertion() - Removes belief via AGM contraction
Strategies: entrenchment, minimal, full_cascade
Revision: revise_with_assertion() - Implements Levi identity K*φ = (K÷¬φ)+φ
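The Levi identity can be illustrated on a toy belief base. The sketch assumes beliefs are plain literals with "~" as the negation prefix, with no logical closure and no entrenchment ordering. It is not the py-brs API, and all four function names are hypothetical.

```python
def negate(literal: str) -> str:
    """Return the complement of a literal, using '~' as the negation prefix."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def expand(beliefs: frozenset[str], literal: str) -> frozenset[str]:
    """Expansion K + φ: add the literal without checking consistency."""
    return beliefs | {literal}


def contract(beliefs: frozenset[str], literal: str) -> frozenset[str]:
    """Contraction K ÷ φ: give up the literal if it is currently held."""
    return beliefs - {literal}


def revise(beliefs: frozenset[str], literal: str) -> frozenset[str]:
    """Revision via the Levi identity: K * φ = (K ÷ ¬φ) + φ."""
    return expand(contract(beliefs, negate(literal)), literal)
```

Revising `{"p", "~q"}` with `"q"` first drops `"~q"` and then adds `"q"`, whereas plain expansion would leave both `"q"` and `"~q"` in the base. A contraction strategy such as entrenchment only matters once beliefs have dependencies, which this toy model omits.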
✅ Provenance DAG storage¶
Via py-brs CASStore:
Content-addressed storage (immutable objects)
WorldBundle tracks: node_ids, edge_ids, evidence_ids
Version labels for world states
Full audit trail via hash chains
⚠️ Rollback mechanism¶
Implemented:
World forking via store.get_world(domain, label)
Can create new world versions
Can query historical worlds by hash
Missing:
Explicit rollback_to(version) API
World comparison / diff tools
Time-travel queries
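A minimal sketch of what an explicit rollback API could look like on top of content-addressed storage is given below. WorldStore and its methods are hypothetical and are not the py-brs CASStore interface; the sketch only assumes that world snapshots are immutable, hashed, and tagged with version labels.

```python
import hashlib
import json
from typing import Optional


class WorldStore:
    """Hypothetical content-addressed store; not the py-brs CASStore API."""

    def __init__(self) -> None:
        self._objects: dict[str, str] = {}  # hash -> serialized snapshot
        self._labels: dict[str, str] = {}   # version label -> hash
        self.head: Optional[str] = None

    def commit(self, world: dict, label: str) -> str:
        """Store a snapshot that records its parent hash, then tag it."""
        payload = json.dumps({"parent": self.head, "world": world}, sort_keys=True)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        self._objects[digest] = payload
        self._labels[label] = digest
        self.head = digest
        return digest

    def current(self) -> dict:
        """Return the world the head points at."""
        if self.head is None:
            raise LookupError("no world committed yet")
        return json.loads(self._objects[self.head])["world"]

    def rollback_to(self, label: str) -> dict:
        """Move head back to a labelled version; later objects stay stored."""
        self.head = self._labels[label]
        return self.current()
```

Because rollback only moves the head pointer, nothing is deleted and the audit trail stays intact. A diff tool could compare two snapshots retrieved by label in the same way.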
❌ Failure mode analyzer¶
No implementation. Would analyze:
Why beliefs were contracted
Contradiction patterns
Evidence invalidation chains
Synthesis attempt failure modes
MCP Tools Implemented¶
| Tool | Function |
|---|---|
| | Add typed assertion with evidence |
| | AGM contraction |
| | Query entrenchment score |
| | List/filter assertions |
| | World state snapshot |
| | AGM revision |
Exit Criteria Assessment¶
“Track belief evolution across synthesis attempts with full provenance”
| Capability | Status |
|---|---|
| Track belief additions | ✅ Yes |
| Track belief removals | ✅ Yes |
| Track belief revisions | ✅ Yes |
| Query provenance | ✅ Yes (via CASStore) |
| Analyze failure modes | ❌ No |
Exit criteria MOSTLY MET - core belief tracking works, failure analysis missing.
Recommendations¶
Priority 1: Complete M1 Exit Criteria¶
Implement cosmic-ray parser (most common after mutmut)
Defer poodle/universalmutator (lower adoption)
Add region auto-assignment from AST
Priority 2: M3 Gap Closure¶
Add explicit rollback_to(version) API
Implement basic failure mode analyzer
Priority 3: M2 (defer or reduce scope)¶
M2 is a significant undertaking. Consider:
Using existing tools (e.g., pyan for Python call graphs)
Reducing scope to basic reachability (no O(1) requirement)
Deferring until M4 (Synthesis Loop) actually needs it
Test Coverage¶
| Module | Test File | Coverage |
|---|---|---|
| regions | | Good (356 lines) |
| parsers | | Good (458 lines) |
| BRS integration | | Basic |
| theory | - | Needs tests |
| adapters | - | Needs tests |
Files Inventory¶
M1 Implementation¶
parsers/
├── __init__.py # Unified interface (169 lines)
├── detection.py # Language/framework detection (354 lines)
├── stryker_parser.py # Stryker JSON parser (236 lines)
└── mutmut_parser.py # mutmut SQLite parser (434 lines)
regions/
├── __init__.py # Module exports
└── models.py # Region model (325 lines)
M3 Implementation¶
theory/
├── __init__.py # Module exports
└── manager.py # TheoryManager (486 lines)
adapters/
├── __init__.py # Module exports
└── evidence_adapter.py # Evidence mapping
domains/
├── __init__.py # Module exports
└── code_mutation_smoke.py # BRS domain extension
M2 Implementation¶
(none)