Decision Log¶
Every significant architectural and implementation decision, with reasoning and context. Written for cold-start — assumes the reader has zero prior context.
Last updated: 2026-02-09
D-001: Use Dual AST/ASR Extractors with AST as Default¶
Date: 2026-01-27
Status: Active
Affects: graph/ast_extractor.py, graph/asr_extractor.py, graph/extractor.py
Context: Curate-ipsum needs to extract call graphs from Python source code for graph-spectral analysis (Fiedler partitioning, reachability queries). LPython’s ASR (Abstract Semantic Representation) provides richer semantic data including resolved types and imports, but LPython is alpha-status and requires type-annotated code. Python’s built-in ast module is always available but provides only syntactic information.
Decision: Implement both extractors behind an abstract CallGraphExtractor base class. AST backend is always available and is the default. ASR backend is optional and auto-selected when LPython is detected. Both produce the same CallGraph data structure.
Reasoning:
AST is universally available — no external dependency needed for basic functionality.
ASR provides higher-confidence call resolution (resolved imports, type information), valuable for Fiedler partitioning accuracy.
The abstract interface (
extractor.py) means downstream code (spectral analysis, partitioning) is backend-agnostic.LPython’s alpha status means we can’t depend on it for production use. See
docs/lpython_klee_feasibility.md.
Upgrade path: When LPython stabilizes, ASR can become the default. The LLVM backend may eventually enable the LPython → C → KLEE symbolic execution pipeline.
D-002: CallGraph as Central Data Structure for All Graph Operations¶
Date: 2026-01-27
Status: Active
Affects: graph/models.py, all Phase 2 modules
Context: Multiple Phase 2 operations (Laplacian construction, Fiedler computation, SCC detection, reachability) all operate on graph data. The question is whether to use networkx graphs directly, define custom models, or both.
Decision: Define a custom CallGraph class in graph/models.py with typed GraphNode and GraphEdge objects. Built-in graph algorithms (SCC, reachability, condensation, topological sort) operate directly on CallGraph. Phase 2 spectral methods will convert to scipy sparse matrices as needed.
Reasoning:
Custom models carry domain-specific metadata (source locations, signatures, confidence scores, edge kinds) that networkx’s generic node/edge attributes would lose type safety on.
CallGraphalready has index structures (_outgoing,_incoming,_by_file) optimized for common queries.Serialization (
to_dict(),from_dict(),to_dot()) is cleaner with custom models.Conversion to scipy sparse matrices for eigenvalue computation is a one-line operation (
scipy.sparse.csr_matrix(adjacency)).
Trade-off: Algorithms like planarity testing will require conversion to networkx format, since we shouldn’t reimplement networkx’s planarity algorithms.
D-003: Laplacian and Fiedler Computation via scipy.sparse¶
Date: 2026-02-08
Status: Active
Affects: graph/spectral.py (new file)
Context: The Fiedler vector (second eigenvector of the graph Laplacian L = D − A) is the core of Phase 2’s spectral partitioning. For production codebases, call graphs can have thousands of nodes. The Laplacian is typically sparse (most functions call only a few others), so dense eigenvalue decomposition would waste memory and time.
Decision: Use scipy.sparse.linalg.eigsh on a scipy.sparse.csr_matrix Laplacian. Restrict computation to the two smallest eigenvalues (k=2, which='SM'). Handle disconnected components by computing Fiedler vectors per connected component.
Reasoning:
eigshuses ARPACK (iterative Arnoldi method) — O(n·k) for k eigenvectors, efficient for sparse matrices.Call graphs are naturally sparse: a function with 5 callees in a 1000-node graph produces a matrix that is 99.5% zeros.
Connected component handling is essential: the Fiedler vector is undefined for disconnected graphs (λ₂ = 0 with multiplicity > 1). Each connected component gets its own Fiedler partitioning.
Fallback: If eigsh fails to converge (rare, but possible for near-singular Laplacians), fall back to dense numpy.linalg.eigh for small graphs (< 500 nodes) or report the failure for large ones.
D-004: Per-Component Fiedler Partitioning for Disconnected Graphs¶
Date: 2026-02-08
Status: Active
Affects: graph/spectral.py, graph/partitioner.py (new files)
Context: Real-world codebases often have disconnected call graphs — utility modules that are imported but don’t call each other, test files that are independent. The algebraic connectivity λ₂ is 0 for disconnected graphs, making the Fiedler vector meaningless.
Decision: Before Fiedler computation, decompose the graph into connected components. Apply Fiedler partitioning independently to each component with ≥ 3 nodes. Components with 1–2 nodes are treated as singleton partitions.
Reasoning:
This is the mathematically correct approach — Fiedler’s theorem assumes connectivity.
Most connected components in a codebase will be non-trivial (tens to hundreds of nodes), so partitioning is still valuable.
Singleton components (isolated utility functions) don’t benefit from spectral analysis.
D-005: Hierarchical Decomposition via Alternating Condense/Partition¶
Date: 2026-02-08
Status: Active
Affects: graph/hierarchy.py (new file), graph/partitioner.py
Context: The architectural vision calls for a hierarchical decomposition strategy: condense SCCs into single nodes, then apply Fiedler partitioning to the condensed DAG, then recurse within each partition. This produces a tree of progressively finer partitions, terminating at Kuratowski atoms (K₅, K₃,₃ subgraphs) or singleton nodes.
Decision: Implement a HierarchyBuilder that alternates: (1) SCC condensation (using existing CallGraph.condensation()), (2) Fiedler bipartition on the resulting DAG, (3) recurse into each partition. Stop recursion when a partition is below min_partition_size (default: 3) or when the Fiedler value λ₂ exceeds a threshold (indicating the partition is already well-connected).
Reasoning:
Alternating condense/partition respects both cyclic structure (via SCC) and spectral structure (via Fiedler).
The condensation step guarantees the partitioner always operates on a DAG, avoiding issues with directed cycles in Fiedler computation.
Recursion depth is bounded by
log₂(n)in practice (each bipartition halves the graph).This matches the architectural vision in
architectural_vision.md§ A.6.
Trade-off: The alternating strategy may over-partition tightly coupled code. The min_partition_size parameter provides a tuning knob.
D-006: Use networkx for Planarity Testing, Custom Code for Kameda¶
Date: 2026-02-08
Status: Active
Affects: graph/planarity.py, graph/kameda.py (new files)
Context: O(1) reachability queries require Kameda’s algorithm, which only works on planar directed graphs. We need to: (1) identify the maximal planar subgraph, and (2) implement Kameda preprocessing on it. Planarity testing and maximal planar subgraph extraction are well-solved problems available in networkx. Kameda’s algorithm is specialized and not available in any Python library.
Decision: Convert CallGraph to networkx DiGraph for planarity testing using networkx.check_planarity(). Extract Kuratowski subgraphs (K₅, K₃,₃) as the non-planar complement. Implement Kameda’s O(n) preprocessing and O(1) query as custom code in graph/kameda.py, following the original 1975 paper.
Reasoning:
networkx has a well-tested, O(n) Boyer-Myrvold planarity implementation — no reason to reimplement.
Kameda’s algorithm is obscure enough that no Python library implements it. The original paper provides pseudocode that is straightforward to implement.
Converting between
CallGraphand networkxDiGraphis low-cost (iterate nodes and edges, O(n + m)).Kuratowski subgraph extraction (networkx provides this as part of planarity checking) identifies the atomic non-planar units — useful for downstream analysis.
Trade-off: networkx becomes a required dependency for Phase 2 (currently optional).
Fallback: If planarity-based O(1) reachability proves unnecessary for practical graph sizes, BFS/DFS reachability (already implemented in CallGraph.reachable_from()) is O(n + m) per query and may suffice.
D-007: Add scipy and networkx as Phase 2 Dependencies¶
Date: 2026-02-08
Status: Active
Affects: pyproject.toml
Context: Phase 2 requires sparse eigenvalue computation (scipy) and planarity testing (networkx). Currently neither is in the project’s dependency list.
Decision: Add scipy>=1.10 and networkx>=3.0 as optional dependencies under a [project.optional-dependencies.graph] extra, so users who only need Phase 1 (mutation parsing) don’t pull in heavy numerical libraries.
Reasoning:
scipy is ~30 MB; networkx is ~3 MB. Users who only need mutation score reporting shouldn’t be forced to install them.
Optional dependency pattern matches how py-brs handles optional features.
Code should fail gracefully with a clear error message if the graph extra isn’t installed.
Upgrade path: If Phase 2 becomes core functionality, move these to required dependencies.
D-008: Virtual Sink/Source Augmentation per Module¶
Date: 2026-02-08
Status: Active
Affects: graph/partitioner.py
Context: The architectural vision calls for virtual source nodes (edges to all entry points) and virtual sink nodes (edges from all exit points) for each module/partition. This enables uniform reachability queries and module-to-module flow analysis.
Decision: After partitioning, augment each partition with a virtual source s_{partition_id} connected to nodes with no incoming edges from within the partition (entry points), and a virtual sink t_{partition_id} connected from nodes with no outgoing edges within the partition (exit points). Virtual nodes have a special NodeKind or metadata flag to distinguish them from real code nodes.
Reasoning:
Uniform interface: any reachability query can be phrased as “can s_A reach t_B?” for module-level flow.
Entry/exit detection is straightforward: entry = in-degree 0 within partition, exit = out-degree 0 within partition.
Virtual nodes don’t affect spectral properties (they’re added after partitioning).
D-009: Standardized Parser Interface for All Mutation Frameworks¶
Date: 2026-02-08
Status: Active
Affects: parsers/__init__.py, all parsers/*_parser.py modules
Context: M1 exit criteria require running “any Python mutation tool through single MCP interface.” The project already has Stryker (JS) and mutmut (Python) parsers. Three more Python mutation frameworks — cosmic-ray, poodle, and universalmutator — need parsers. The MCP server’s run_mutation_tests_tool already routes through the unified parse_mutation_output() interface, so new parsers only need to implement the standard entry point and be wired into the router.
Decision: Every parser module exports a single entry-point function with this signature:
def parse_<tool>_output(
working_directory: str,
report_path: Optional[str] = None,
) -> Tuple[int, int, int, int, float, List[FileMutationStats]]:
Returning (total, killed, survived, no_coverage, score, by_file). The parsers/__init__.py router normalizes tool name variations (e.g., "cosmic-ray" → "cosmic_ray") and imports lazily. Each parser also exports a find_<tool>_report() locator function. Detection signals go in parsers/detection.py.
Reasoning:
Uniform 6-tuple return means no downstream changes needed —
tools.py,server.py, andmodels.pyare already wired.Lazy imports avoid ImportError if a parser’s optional dependencies aren’t installed.
Detection and parsing are separate concerns: detection tells you which tool ran; parsing reads the output.
D-010: Provenance DAG as Append-Only Event Log¶
Date: 2026-02-08
Status: Active
Affects: theory/provenance.py, theory/manager.py, theory/rollback.py
Context: M3 requires tracking WHY each world state exists — not just WHAT it looks like (already handled by CASStore’s content-addressed snapshots). We need a causal chain: which evidence triggered which revision, which assertions were added or removed, and why.
Decision: Implement a ProvenanceDAG as an append-only event log stored alongside CASStore objects. Events record (event_type, assertion_id, evidence_id, from_world_hash, to_world_hash, strategy, reason, nodes_removed, nodes_added). The DAG structure is derived from event ordering (no explicit parent pointers). Serialized to CASStore as a single object per domain.
Reasoning:
Append-only log provides an audit trail — events are immutable once recorded.
Content-addressed storage means all historical worlds are already preserved; provenance just adds the “why.”
Storing the entire DAG as one object simplifies persistence (vs. storing individual events).
The rollback mechanism only changes the world_label pointer — it doesn’t copy or mutate any worlds, which is correct because CAS preserves everything.
Indexes by assertion_id and to_world_hash enable efficient queries (why_believe, when_added, when_removed, belief_stability).
Trade-off: Storing the entire DAG as one object means write amplification (the whole DAG is re-serialized on each event). For typical synthesis sessions with <1000 events, this is negligible. If scaling becomes an issue, events could be stored individually with an index object.
D-011: Heuristic Failure Classification (Formal Verification Deferred to M5)¶
Date: 2026-02-08
Status: Active
Affects: theory/failure_analyzer.py, theory/manager.py
Context: When a synthesis attempt fails, the system needs to understand WHY it failed to guide the next iteration. Formal verification (SMT/CEGAR) is planned for M5, but M3 needs failure classification now to connect failure modes to belief revision operations (which assertions to contract).
Decision: Implement heuristic failure classification using regex pattern matching on error messages, combined with statistical detection of overfitting (high test pass + low mutation kill) and underfitting (low test pass). Map failure modes to assertion kinds: TYPE_MISMATCH → contract TYPE assertions, POSTCONDITION_VIOLATION → contract POSTCONDITION assertions, UNDERFITTING → contract all assertions in region. Sort contraction candidates by confidence (weakest first — contract least-entrenched beliefs).
Reasoning:
Pattern matching on error messages works surprisingly well for common failure modes (TypeError, AssertionError, IndexError, etc.).
Overfitting/underfitting detection from test pass rate vs mutation kill rate is a well-established technique in mutation testing literature.
The mapping from failure mode to assertion kind provides actionable guidance: “this failed because of a type issue, so contract the type-related beliefs.”
Heuristic approach is fast (no SMT solver), deterministic, and easy to debug.
Formal verification in M5 will replace/augment these heuristics with proof-based classification.
Trade-off: Heuristic classification has lower precision than formal verification. The confidence field on FailureAnalysis communicates this uncertainty to downstream consumers. The SEMANTIC_DRIFT detection (test pass rate 50–80% with no specific error pattern) is particularly low confidence (0.5).
D-012: Hybrid LLM Client with Abstract Interface¶
Date: 2026-02-08
Status: Active
Affects: synthesis/llm_client.py, synthesis/cloud_llm.py, synthesis/local_llm.py
Context: M4 requires LLM-generated code candidates as seeds for the genetic algorithm. Two options: cloud APIs (Claude/GPT) offer higher quality (85-92% syntactically valid) but cost $0.80-2/run and require API keys. Local models (Ollama + codellama:7b) are free but produce lower quality (60-70% valid) and require GPU for usable speed.
Decision: Implement both behind an abstract LLMClient ABC, mirroring D-001’s dual extractor pattern. Three concrete implementations: CloudLLMClient (Anthropic/OpenAI via httpx), LocalLLMClient (Ollama HTTP API), and MockLLMClient (testing). Backend is selectable at runtime via SynthesisConfig.llm_backend.
Reasoning:
Mirrors the project’s established pattern (D-001: dual AST/ASR extractors behind abstract base).
~50 extra lines of abstraction enables runtime switching and zero-dependency testing.
Cloud API for high-quality initial seeds; local model for cost-free refinement iterations.
CI/CD runs with MockLLMClient — no API keys or GPU needed for testing.
Users without GPU or API keys can still use whichever backend is available.
Trade-off: httpx becomes an optional dependency for cloud/local backends. Mock backend works without it.
D-013: Fitness Function Formula for Genetic Algorithm¶
Date: 2026-02-08
Status: Active
Affects: synthesis/fitness.py, synthesis/cegis.py
Context: The genetic algorithm needs a fitness function to evaluate candidate patches. The function must balance three concerns: (1) avoiding known counterexamples, (2) satisfying the specification (tests pass), and (3) code simplicity.
Decision: Fitness = (0.4 × CE_avoidance) + (0.5 × spec_satisfaction) - (0.1 × complexity_penalty). CE_avoidance is the fraction of counterexamples NOT triggered. Spec_satisfaction is the fraction of test commands that pass. Complexity_penalty is AST node count / 100, capped at 1.0.
Reasoning:
Spec satisfaction dominates (0.5 weight) because passing tests is the primary goal.
CE avoidance has second priority (0.4) because avoiding known failures is critical for convergence.
Complexity penalty is low (0.1) — a tiebreaker to prefer simpler code, not a primary concern.
AST node count is a simple, fast proxy for complexity (no external dependency needed).
The formula matches
synthesis_framework.md§Genetic Algorithm Population Management.
Trade-off: Static weights may not be optimal for all codebases. Could be made adaptive in future milestones.
D-014: Abstract GraphStore with SQLite/Kuzu Dual Backends¶
Date: 2026-02-08
Status: Active
Affects: storage/graph_store.py, storage/sqlite_graph_store.py, storage/kuzu_graph_store.py
Context: M6 requires persistent graph storage for call graphs, Kameda reachability indices, and Fiedler partition trees. Prior to M6, all graph data was in-memory and lost on restart. Synthesis results from M4 were similarly ephemeral.
Decision: Abstract GraphStore ABC with two concrete backends: SQLite (primary, zero external deps) and Kuzu (optional embedded graph DB with Cypher). Mirrors the D-012 pattern (hybrid client with abstract base + concrete backends). Factory function build_graph_store(backend, project_path) selects backend at runtime.
Reasoning:
SQLite is stdlib — zero-dependency primary backend ensures curate-ipsum works out of the box.
Kuzu provides native Cypher query language and efficient multi-hop traversals — important for future RAG and text-to-Cypher features.
Storing both backends behind an ABC means new backends (Neo4j, JanusGraph) can be added without changing consumer code.
Storage location:
.curate_ipsum/directory alongside existingbeliefs.db.
Alternatives Considered:
Neo4j server: Rejected — external server dependency too heavy for embedded tool.
Joern CPG + Neo4j: Deferred — Joern adds JVM dependency; Kuzu provides similar Cypher without JVM.
SQLite only: Would work but blocks future Cypher-based queries.
D-015: Incremental Graph Updates via File Hashing¶
Date: 2026-02-08
Status: Active
Affects: storage/incremental.py, storage/sqlite_graph_store.py
Context: Full call graph re-extraction is expensive for large projects. Users modify a few files between queries, making full re-extraction wasteful.
Decision: SHA-256 file hashing with change detection. The IncrementalEngine computes hashes for all .py files, compares with stored hashes, and produces a ChangeSet (added/modified/removed files). Only affected nodes and edges are updated. File hashes are persisted in the graph store’s file_hashes table.
Reasoning:
SHA-256 is fast enough for file-level change detection and avoids false positives.
File-level granularity balances precision vs. complexity — function-level diffing would require AST parsing before change detection.
Removed files trigger node/edge deletion. Modified files trigger delete + re-extract. Added files trigger extraction only.
Full rebuild is always available as a fallback via
force_full_rebuild().
D-016: Verification Backend ABC with Z3 + angr Docker Backends¶
Date: 2026-02-09
Status: Active
Affects: verification/backend.py, verification/backends/z3_backend.py, verification/backends/angr_docker.py, verification/orchestrator.py
Context: M5 requires formal verification backends to prove patch correctness. The CEGIS engine (M4) currently uses test-based verification only. D-011 explicitly notes that formal verification will replace/augment heuristic failure classification.
Decision: Implement an abstract VerificationBackend ABC (mirrors D-012/D-014 pattern) with three backends: Z3 (constraint solving, “cheap” tier), angr Docker (symbolic execution, “expensive” tier), and Mock (testing). A VerificationOrchestrator implements CEGAR with budget escalation (10s → 30s → 120s). Wire into synthesis/cegis.py as an optional verification layer in _verify_patch.
Reasoning:
Z3 is fast for constraint satisfaction but cannot handle addr_reached (needs actual binary execution).
angr via Docker isolates the heavy symbolic execution environment and matches the angr_adapter_baseline reference implementation.
CEGAR budget escalation prevents wasting time on cheap verification that won’t converge.
Optional integration: when no verification backend is configured, behavior is unchanged (pure test-based).
Factory function pattern consistent with build_graph_store (D-014) and LLMClient (D-012).
Trade-off: Docker dependency for angr means CI needs Docker. Mock backend enables testing without Docker or Z3.
D-017: Chroma Vector Store + Graph-Expanded RAG Pipeline¶
Date: 2026-02-09
Status: Active
Affects: rag/vector_store.py, rag/embedding_provider.py, rag/search.py, synthesis/cegis.py
Context: M6 deferred RAG/semantic search. The CEGIS engine needs code context to generate better LLM candidates. Previously, context was limited to spec.context_code (manually provided). RAG enables automatic retrieval of relevant code from the project’s call graph.
Decision: Implement VectorStore ABC with ChromaDB backend (embedded PersistentClient for zero-dep operation, HttpClient for Docker). EmbeddingProvider ABC with sentence-transformers (all-MiniLM-L6-v2, 384-dim). RAGPipeline combines vector top-k with graph expansion via existing GraphStore.get_neighbors(). Wire into CEGIS as optional rag_pipeline parameter that enriches build_synthesis_prompt context.
Reasoning:
ChromaDB chosen for near-zero external dependency in embedded mode (vs FAISS needing native builds, Qdrant needing server).
Graph expansion leverages existing GraphStore infrastructure (D-014) for caller/callee retrieval.
Decay factors (callee=0.8, caller=0.7) ensure direct neighbors rank higher than transitive.
Optional integration: CEGIS works identically without RAG; RAG enriches context when available.
all-MiniLM-L6-v2 balances speed and quality for code embedding (384-dim, fast inference).
Trade-off: ChromaDB adds ~50MB dependency. sentence-transformers adds ~500MB with model. Both are optional deps under [rag] extra.
Decision Index¶
ID |
Short Name |
Date |
Status |
|---|---|---|---|
D-001 |
Dual AST/ASR extractors |
2026-01-27 |
Active |
D-002 |
CallGraph as central data structure |
2026-01-27 |
Active |
D-003 |
Sparse Laplacian + Fiedler via scipy |
2026-02-08 |
Active |
D-004 |
Per-component Fiedler for disconnected graphs |
2026-02-08 |
Active |
D-005 |
Alternating condense/partition hierarchy |
2026-02-08 |
Active |
D-006 |
networkx for planarity, custom Kameda |
2026-02-08 |
Active |
D-007 |
scipy/networkx as optional dependencies |
2026-02-08 |
Active |
D-008 |
Virtual sink/source augmentation |
2026-02-08 |
Active |
D-009 |
Standardized parser interface |
2026-02-08 |
Active |
D-010 |
Provenance DAG as append-only event log |
2026-02-08 |
Active |
D-011 |
Heuristic failure classification |
2026-02-08 |
Active |
D-012 |
Hybrid LLM client with abstract interface |
2026-02-08 |
Active |
D-013 |
Fitness function formula for GA |
2026-02-08 |
Active |
D-014 |
Abstract GraphStore with SQLite/Kuzu backends |
2026-02-08 |
Active |
D-015 |
Incremental graph updates via file hashing |
2026-02-08 |
Active |
D-016 |
Verification backend ABC with Z3 + angr Docker |
2026-02-09 |
Active |
D-017 |
Chroma vector store + graph-expanded RAG |
2026-02-09 |
Active |
Revision History¶
v1.0 (2026-02-08): Initial decision log created. D-001 and D-002 documented retroactively from existing code. D-003 through D-008 are Phase 2 design decisions.
v1.1 (2026-02-08): Added D-009 (standardized parser interface) for M1 completion. All Phase 2 decisions (D-003 through D-008) confirmed as implemented and tested.
v1.2 (2026-02-08): Added D-010 (provenance DAG) and D-011 (heuristic failure classification) for M3 completion.
v1.3 (2026-02-08): Added D-012 (hybrid LLM client) and D-013 (fitness function formula) for M4 completion.
v1.4 (2026-02-08): Added D-014 (abstract GraphStore) and D-015 (incremental updates) for M6 graph persistence.
v1.5 (2026-02-09): Added D-016 (verification backend ABC) and D-017 (Chroma RAG pipeline) for M5 and M6-deferred completion.