Storage
Persistent stores for graphs, synthesis results, and incremental updates.
Abstract graph store interface.
Defines the GraphStore ABC with two backends:
- SQLiteGraphStore (primary, zero dependencies)
- KuzuGraphStore (optional, embedded graph DB with Cypher)
Follows the D-012 pattern (hybrid client with abstract base + concrete backends).
Decision: D-014
-
class curate_ipsum.storage.graph_store.GraphStore
Bases: ABC
Abstract interface for persistent graph storage.
Implementations persist call graphs, reachability indices,
Fiedler partitions, and file hashes for incremental updates.
-
abstractmethod store_graph(graph, project_id)
Persist an entire call graph (nodes + edges).
- Parameters:
graph (CallGraph)
project_id (str)
- Return type:
None
-
abstractmethod load_graph(project_id)
Load a previously stored call graph.
- Parameters:
project_id (str)
- Return type:
CallGraph | None
-
abstractmethod store_node(node_data, project_id)
Store or update a single node.
- Parameters:
node_data
project_id (str)
- Return type:
None
-
abstractmethod store_edge(edge_data, project_id)
Store or update a single edge.
- Parameters:
edge_data
project_id (str)
- Return type:
None
-
abstractmethod get_node(node_id, project_id)
Get a single node’s data by ID.
- Parameters:
node_id (str)
project_id (str)
- Return type:
dict[str, Any] | None
-
abstractmethod get_neighbors(node_id, project_id, direction='outgoing', edge_kind=None)
Get neighboring node IDs.
- Parameters:
node_id (str) – Source node
project_id (str) – Project identifier
direction (str) – “outgoing”, “incoming”, or “both”
edge_kind (str | None) – Filter by edge kind (None = all)
- Return type:
list[str]
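The direction and edge_kind filters can be illustrated with a minimal sketch over an in-memory edge list. The Edge record and the neighbors() helper are hypothetical and not part of curate_ipsum. Raising ValueError for an unrecognised direction is also an assumption about how a backend might validate its input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    # Hypothetical edge record; the real edge schema is not shown in this reference.
    source: str
    target: str
    kind: str


def neighbors(edges, node_id, direction="outgoing", edge_kind=None):
    """Return neighbor IDs of node_id using the documented direction/edge_kind filters."""
    # Assumption: unknown directions are rejected rather than silently ignored.
    if direction not in ("outgoing", "incoming", "both"):
        raise ValueError(f"unknown direction: {direction!r}")
    result = []
    for e in edges:
        # edge_kind=None means "all edge kinds"
        if edge_kind is not None and e.kind != edge_kind:
            continue
        if direction in ("outgoing", "both") and e.source == node_id:
            result.append(e.target)
        if direction in ("incoming", "both") and e.target == node_id:
            result.append(e.source)
    # Deduplicate while keeping first-seen order
    return list(dict.fromkeys(result))
```

For example, with edges a→b (call), c→a (call) and a→d (import), `neighbors(edges, "a", "both", "call")` yields `["b", "c"]`.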
-
abstractmethod query_reachable(source_id, target_id, project_id)
Check whether target_id is reachable from source_id using the stored Kameda labels.
Falls back to the non-planar reachability table if needed.
- Parameters:
source_id (str)
target_id (str)
project_id (str)
- Return type:
bool
-
abstractmethod store_reachability_index(kameda_data, project_id)
Persist Kameda reachability index.
- kameda_data keys: left_rank, right_rank, source_id, sink_id,
non_planar_reachability, all_node_ids
- Parameters:
kameda_data (dict[str, Any])
project_id (str)
- Return type:
None
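Here is a minimal sketch of how a query_reachable-style lookup could use the stored index. It assumes that left_rank and right_rank map each node ID to its position in Kameda's two topological orders of a planar st-graph. In that labelling, one node reaches another exactly when it comes first in both orders (a two-dimensional dominance test). It also assumes that non_planar_reachability maps a node ID to the IDs it reaches through edges the labels do not cover. The helper name kameda_reachable and both data shapes are assumptions; the backends' actual encoding is not documented here.

```python
def kameda_reachable(kameda_data: dict, source_id: str, target_id: str) -> bool:
    """Sketch of a reachability lookup over a stored two-label index.

    Assumed shapes (not confirmed by the reference):
      left_rank / right_rank: node ID -> position in Kameda's two orders
      non_planar_reachability: node ID -> iterable of node IDs reached via
      edges that the two labels do not cover
    """
    if source_id == target_id:
        return True
    left = kameda_data.get("left_rank", {})
    right = kameda_data.get("right_rank", {})
    labelled = all(n in left and n in right for n in (source_id, target_id))
    if labelled:
        # For Kameda's labelling of a planar st-graph, u reaches v
        # iff u precedes v in *both* orders.
        if left[source_id] < left[target_id] and right[source_id] < right[target_id]:
            return True
    # Fallback: explicit pairs recorded for the non-planar part of the graph.
    extra = kameda_data.get("non_planar_reachability", {})
    return target_id in extra.get(source_id, ())
```

Take the diamond s→a, s→b, a→t, b→t with orders s,a,b,t and s,b,a,t. Here a and b are incomparable, because a precedes b in only one of the two orders.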
-
abstractmethod load_reachability_index(project_id)
Load stored Kameda reachability index.
- Parameters:
project_id (str)
- Return type:
dict[str, Any] | None
-
abstractmethod store_partitions(partition_data, project_id)
Persist Fiedler partition tree.
partition_data is the recursive tree structure with
id, node_ids, children, fiedler_value, depth.
- Parameters:
partition_data (dict[str, Any])
project_id (str)
- Return type:
None
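The recursive partition structure lends itself to a simple descent. The helpers below are a hypothetical sketch. They assume each partition is a dict with the listed keys, that children is a list of partitions with the same shape, and that a parent's node_ids include every node in its children.

```python
def iter_partitions(tree):
    """Yield every partition in pre-order (parent before its children)."""
    yield tree
    for child in tree.get("children", ()):
        yield from iter_partitions(child)


def find_leaf_partition(tree, node_id):
    """Return the deepest partition whose node_ids contain node_id, or None."""
    if node_id not in tree.get("node_ids", ()):
        return None
    for child in tree.get("children", ()):
        found = find_leaf_partition(child, node_id)
        if found is not None:
            return found
    # No child holds the node, so this partition is the deepest match.
    return tree
```

This pruned descent only works if a parent's node_ids cover its children's nodes. If the real tree stores nodes only at the leaves, iter_partitions can be used to scan every partition instead.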
-
abstractmethod load_partitions(project_id)
Load stored partition tree.
- Parameters:
project_id (str)
- Return type:
dict[str, Any] | None
-
abstractmethod get_file_hashes(project_id)
Get stored file hashes for incremental update detection.
- Parameters:
project_id (str)
- Return type:
dict[str, str]
-
abstractmethod set_file_hashes(project_id, hashes)
Store file hashes for incremental update detection.
- Parameters:
project_id (str)
hashes (dict[str, str])
- Return type:
None
-
abstractmethod delete_nodes_by_file(file_path, project_id)
Delete all nodes (and their edges) belonging to a file.
Returns the number of nodes deleted.
- Parameters:
file_path (str)
project_id (str)
- Return type:
int
-
abstractmethod get_stats(project_id)
Get storage statistics (node count, edge count, etc.).
- Parameters:
project_id (str)
- Return type:
dict[str, Any]
-
abstractmethod close()
Release storage resources.
- Return type:
None
-
curate_ipsum.storage.graph_store.build_graph_store(backend, project_path)
Factory: create a GraphStore of the requested backend type.
- Parameters:
backend
project_path
- Returns:
GraphStore instance
- Raises:
-
- Return type:
GraphStore
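The sketch below shows one way such a factory can dispatch on a backend name. A trimmed two-method ABC and a stub backend stand in for the real classes. The registry, the case-insensitive backend names, the graph.db file name, and raising ValueError for an unknown backend are all assumptions; the actual Raises entry is not preserved in this reference.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class GraphStore(ABC):
    """Trimmed stand-in for the documented ABC: only two of its methods are declared."""

    @abstractmethod
    def get_stats(self, project_id: str) -> dict: ...

    @abstractmethod
    def close(self) -> None: ...


class StubSQLiteStore(GraphStore):
    """Hypothetical backend stub; the real SQLiteGraphStore also takes a db_path."""

    def __init__(self, db_path: Path):
        self.db_path = db_path

    def get_stats(self, project_id: str) -> dict:
        return {"backend": "sqlite", "project_id": project_id}

    def close(self) -> None:
        pass


# Hypothetical registry; an optional backend would be added only if its import succeeds.
_BACKENDS: dict[str, type[GraphStore]] = {"sqlite": StubSQLiteStore}


def build_graph_store(backend: str, project_path: Path) -> GraphStore:
    """Return a GraphStore for the named backend (sketch)."""
    try:
        store_cls = _BACKENDS[backend.lower()]
    except KeyError:
        raise ValueError(f"unknown graph store backend: {backend!r}") from None
    # Assumed layout: one database file inside the project directory.
    return store_cls(Path(project_path) / "graph.db")
```

With a registry, the optional Kuzu backend can be registered only when its package imports, and callers never touch the concrete classes.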
SQLite-backed graph store.
Primary backend — zero external dependencies (stdlib sqlite3).
Uses WAL mode for concurrent read safety and batch inserts for performance.
Schema: 7 tables covering nodes, edges, Kameda labels, non-planar reachability,
partitions, partition membership, and file hashes.
Decision: D-014
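The WAL and batch-insert techniques can be sketched with the standard sqlite3 module. The two-table schema here is a hypothetical simplification of the seven-table layout described above. The helper functions mirror store_graph and delete_nodes_by_file only in spirit; they are not the backend's code.

```python
import sqlite3
from pathlib import Path

# Hypothetical two-table schema; the real store uses seven tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    project_id TEXT NOT NULL,
    node_id    TEXT NOT NULL,
    file_path  TEXT,
    PRIMARY KEY (project_id, node_id)
);
CREATE TABLE IF NOT EXISTS edges (
    project_id TEXT NOT NULL,
    source_id  TEXT NOT NULL,
    target_id  TEXT NOT NULL,
    kind       TEXT NOT NULL,
    PRIMARY KEY (project_id, source_id, target_id, kind)
);
"""


def open_store(db_path: Path) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    # WAL journal: readers keep working while a writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript(SCHEMA)
    return conn


def store_graph(conn, project_id, nodes, edges):
    """Bulk upsert in one transaction; nodes are (node_id, file_path), edges (src, dst, kind)."""
    with conn:  # commits on success, rolls back on error (does not close)
        conn.executemany(
            "INSERT OR REPLACE INTO nodes VALUES (?, ?, ?)",
            [(project_id, node_id, file_path) for node_id, file_path in nodes],
        )
        conn.executemany(
            "INSERT OR REPLACE INTO edges VALUES (?, ?, ?, ?)",
            [(project_id, src, dst, kind) for src, dst, kind in edges],
        )


def delete_nodes_by_file(conn, file_path, project_id) -> int:
    """Delete a file's nodes and every edge touching them; return the node count."""
    with conn:
        node_ids = [row[0] for row in conn.execute(
            "SELECT node_id FROM nodes WHERE project_id = ? AND file_path = ?",
            (project_id, file_path),
        )]
        conn.executemany(
            "DELETE FROM edges WHERE project_id = ? AND (source_id = ? OR target_id = ?)",
            [(project_id, nid, nid) for nid in node_ids],
        )
        conn.execute(
            "DELETE FROM nodes WHERE project_id = ? AND file_path = ?",
            (project_id, file_path),
        )
    return len(node_ids)
```

executemany inside a single `with conn:` block keeps a bulk load to one commit instead of one per row.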
-
class curate_ipsum.storage.sqlite_graph_store.SQLiteGraphStore(db_path)
Bases: GraphStore
SQLite-backed graph storage. Primary backend with zero external dependencies.
- Parameters:
db_path (Path)
-
store_graph(graph, project_id)
Persist an entire call graph (bulk INSERT OR REPLACE).
- Parameters:
graph (CallGraph)
project_id (str)
- Return type:
None
-
load_graph(project_id)
Load a previously stored call graph.
- Parameters:
project_id (str)
- Return type:
CallGraph | None
-
store_node(node_data, project_id)
Store or update a single node.
- Parameters:
node_data
project_id (str)
- Return type:
None
-
store_edge(edge_data, project_id)
Store or update a single edge.
- Parameters:
edge_data
project_id (str)
- Return type:
None
-
get_node(node_id, project_id)
Get a single node’s data by ID.
- Parameters:
node_id (str)
project_id (str)
- Return type:
dict[str, Any] | None
-
get_neighbors(node_id, project_id, direction='outgoing', edge_kind=None)
Get neighboring node IDs.
- Parameters:
node_id (str)
project_id (str)
direction (str)
edge_kind (str | None)
- Return type:
list[str]
-
query_reachable(source_id, target_id, project_id)
Check if target is reachable from source using Kameda labels.
- Parameters:
source_id (str)
target_id (str)
project_id (str)
- Return type:
bool
-
store_reachability_index(kameda_data, project_id)
Persist Kameda reachability index.
- Parameters:
kameda_data (dict[str, Any])
project_id (str)
- Return type:
None
-
load_reachability_index(project_id)
Load stored Kameda reachability index.
- Parameters:
project_id (str)
- Return type:
dict[str, Any] | None
-
store_partitions(partition_data, project_id)
Persist Fiedler partition tree.
- Parameters:
partition_data (dict[str, Any])
project_id (str)
- Return type:
None
-
load_partitions(project_id)
Load stored partition tree.
- Parameters:
project_id (str)
- Return type:
dict[str, Any] | None
-
get_file_hashes(project_id)
Get stored file hashes for incremental update detection.
- Parameters:
project_id (str)
- Return type:
dict[str, str]
-
set_file_hashes(project_id, hashes)
Store file hashes for incremental update detection.
- Parameters:
project_id (str)
hashes (dict[str, str])
- Return type:
None
-
delete_nodes_by_file(file_path, project_id)
Delete all nodes (and their edges) belonging to a file.
- Parameters:
file_path (str)
project_id (str)
- Return type:
int
-
get_stats(project_id)
Get storage statistics.
- Parameters:
project_id (str)
- Return type:
dict[str, Any]
-
close()
Release database connection.
- Return type:
None
Kuzu-backed graph store.
Optional backend; requires the kuzu package (pip install kuzu).
Provides native Cypher query support and efficient multi-hop traversals.
Uses the same GraphStore ABC as the SQLite backend (D-014 pattern).
-
class curate_ipsum.storage.kuzu_graph_store.KuzuGraphStore(db_path)
Bases: GraphStore
Kuzu-backed graph storage with native Cypher support.
- Parameters:
db_path (Path)
-
store_graph(graph, project_id)
Persist an entire call graph.
- Parameters:
graph (CallGraph)
project_id (str)
- Return type:
None
-
load_graph(project_id)
Load a previously stored call graph.
- Parameters:
project_id (str)
- Return type:
CallGraph | None
-
store_node(node_data, project_id)
Store or update a single node.
- Parameters:
node_data
project_id (str)
- Return type:
None
-
store_edge(edge_data, project_id)
Store or update a single edge.
- Parameters:
edge_data
project_id (str)
- Return type:
None
-
get_node(node_id, project_id)
Get a single node’s data by ID.
- Parameters:
node_id (str)
project_id (str)
- Return type:
dict[str, Any] | None
-
get_neighbors(node_id, project_id, direction='outgoing', edge_kind=None)
Get neighboring node IDs.
- Parameters:
node_id (str)
project_id (str)
direction (str)
edge_kind (str | None)
- Return type:
list[str]
-
query_reachable(source_id, target_id, project_id)
Check if target is reachable from source using Kameda labels.
- Parameters:
source_id (str)
target_id (str)
project_id (str)
- Return type:
bool
-
store_reachability_index(kameda_data, project_id)
Persist Kameda reachability index.
- Parameters:
kameda_data (dict[str, Any])
project_id (str)
- Return type:
None
-
load_reachability_index(project_id)
Load stored Kameda reachability index.
- Parameters:
project_id (str)
- Return type:
dict[str, Any] | None
-
store_partitions(partition_data, project_id)
Persist Fiedler partition tree.
- Parameters:
partition_data (dict[str, Any])
project_id (str)
- Return type:
None
-
load_partitions(project_id)
Load stored partition tree.
- Parameters:
project_id (str)
- Return type:
dict[str, Any] | None
-
get_file_hashes(project_id)
Get stored file hashes for incremental update detection.
- Parameters:
project_id (str)
- Return type:
dict[str, str]
-
set_file_hashes(project_id, hashes)
Store file hashes for incremental update detection.
- Parameters:
project_id (str)
hashes (dict[str, str])
- Return type:
None
-
delete_nodes_by_file(file_path, project_id)
Delete all nodes (and their edges) belonging to a file.
- Parameters:
file_path (str)
project_id (str)
- Return type:
int
-
get_stats(project_id)
Get storage statistics.
- Parameters:
project_id (str)
- Return type:
dict[str, Any]
-
close()
Release database resources.
- Return type:
None
Persistent storage for synthesis results.
Uses JSONL (newline-delimited JSON) for append-only persistence,
mirroring the project’s existing tools.py::append_run() pattern.
Each line is a JSON object with the SynthesisResult fields plus
a project_id key for multi-project filtering.
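The append-only JSONL pattern can be sketched in a few lines, with plain dicts standing in for SynthesisResult. The file name and the synthesis_id and region_id keys are assumptions; the description above mentions only the extra project_id key.

```python
import json
from pathlib import Path


class JsonlSynthesisStore:
    """Append-only JSONL store; records are plain dicts in this sketch."""

    def __init__(self, data_dir: Path, filename: str = "synthesis_results.jsonl"):
        self.path = Path(data_dir) / filename  # hypothetical file name
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def append(self, result: dict, project_id: str) -> None:
        record = {**result, "project_id": project_id}
        # One JSON object per line; "a" mode never rewrites earlier lines.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    def _records(self):
        if not self.path.exists():
            return
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)

    def load_all(self, project_id: str) -> list:
        return [r for r in self._records() if r.get("project_id") == project_id]

    def load_by_id(self, synthesis_id: str):
        # Returns the first matching line, or None if the ID is absent.
        for r in self._records():
            if r.get("synthesis_id") == synthesis_id:
                return r
        return None

    def load_by_region(self, project_id: str, region_id: str) -> list:
        return [r for r in self.load_all(project_id) if r.get("region_id") == region_id]
```

Every load in this sketch is a linear scan. That keeps writes trivially safe to append, at the cost of reading the whole file per query.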
-
class curate_ipsum.storage.synthesis_store.SynthesisStore(data_dir)
Bases: object
Append-only JSONL store for synthesis results.
- Parameters:
data_dir (Path)
-
append(result, project_id)
Append a synthesis result to the JSONL store.
- Parameters:
result (SynthesisResult)
project_id (str)
- Return type:
None
-
load_all(project_id)
Load all synthesis results for a project.
- Parameters:
project_id (str)
- Return type:
list[SynthesisResult]
-
load_by_id(synthesis_id)
Load a specific synthesis result by ID.
- Parameters:
synthesis_id (str)
- Return type:
SynthesisResult | None
-
load_by_region(project_id, region_id)
Load all synthesis results for a specific region within a project.
- Parameters:
project_id (str)
region_id (str)
- Return type:
list[SynthesisResult]
Incremental update engine for graph persistence.
Detects which files changed since the last extraction and updates
only the affected graph nodes/edges, avoiding full re-extraction.
Uses SHA-256 file hashing to detect changes. The hash map is persisted
via the GraphStore’s file_hashes table.
Decision: D-015
-
class curate_ipsum.storage.incremental.ChangeSet(added=<factory>, modified=<factory>, removed=<factory>)
Bases: object
Files that changed since the last extraction.
- Parameters:
added (list[str])
modified (list[str])
removed (list[str])
added: list[str]
-
modified: list[str]
-
removed: list[str]
-
property has_changes: bool
-
property total_changed: int
-
to_dict()
- Return type:
dict
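Here is a sketch of a dataclass with the same fields and list defaults (the `<factory>` defaults in the signature). The bodies of has_changes, total_changed and to_dict are assumptions inferred from their names, not taken from the source.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ChangeSet:
    added: list[str] = field(default_factory=list)
    modified: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)

    @property
    def has_changes(self) -> bool:
        # Assumed meaning: at least one file was added, modified, or removed.
        return bool(self.added or self.modified or self.removed)

    @property
    def total_changed(self) -> int:
        # Assumed meaning: combined length of the three lists.
        return len(self.added) + len(self.modified) + len(self.removed)

    def to_dict(self) -> dict:
        # Assumed shape: just the three lists.
        return asdict(self)
```

default_factory gives each instance its own lists, which avoids the shared-mutable-default pitfall.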
-
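class curate_ipsum.storage.incremental.UpdateResult(added_nodes=0, removed_nodes=0, modified_files=0, total_files_scanned=0, duration_ms=0, change_set=None, full_rebuild=False)
Bases: object
Result of an incremental graph update.
- Parameters:
added_nodes (int)
removed_nodes (int)
modified_files (int)
total_files_scanned (int)
duration_ms (int)
change_set (ChangeSet | None)
full_rebuild (bool)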
added_nodes: int = 0
-
removed_nodes: int = 0
-
modified_files: int = 0
-
total_files_scanned: int = 0
-
duration_ms: int = 0
-
change_set: ChangeSet | None = None
-
full_rebuild: bool = False
-
to_dict()
- Return type:
dict
-
class curate_ipsum.storage.incremental.IncrementalEngine(store)
Bases: object
Detects file changes and performs incremental graph updates.
Workflow:
1. Compute current file hashes for all matching files
2. Compare with stored hashes → ChangeSet
3. For removed files: delete nodes/edges
4. For added/modified files: re-extract and merge
5. Update stored file hashes
- Parameters:
store (GraphStore)
-
static compute_file_hashes(directory, pattern='**/*.py')
Compute SHA-256 hashes for all files matching pattern.
- Parameters:
directory (Path)
pattern (str)
- Returns:
Dict mapping relative file paths to their SHA-256 hex digests
- Return type:
dict[str, str]
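The sketch below hashes every file that matches the glob, using only the standard library. Two details are choices made for this illustration, not documented behaviour: keys are POSIX-style relative paths, and files are read in 64 KiB chunks.

```python
import hashlib
from pathlib import Path


def compute_file_hashes(directory, pattern="**/*.py"):
    """Map each matching file's relative POSIX path to its SHA-256 hex digest."""
    root = Path(directory)
    hashes = {}
    for path in sorted(root.glob(pattern)):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            # Read in chunks so large files are not loaded into memory at once.
            for chunk in iter(lambda: fh.read(65536), b""):
                digest.update(chunk)
        hashes[path.relative_to(root).as_posix()] = digest.hexdigest()
    return hashes
```

Relative keys keep the stored map valid when the project directory is moved or checked out elsewhere.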
-
detect_changes(project_id, current_hashes)
Compare current file hashes with stored hashes to find changes.
- Parameters:
project_id (str)
current_hashes (dict[str, str])
- Returns:
ChangeSet with added, modified, and removed files
- Return type:
ChangeSet
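The comparison reduces to set logic over two {path: digest} maps. In this hypothetical sketch the stored map is passed in directly. The documented method instead takes project_id and presumably reads the stored map through the GraphStore's get_file_hashes.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    added: list[str] = field(default_factory=list)
    modified: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)


def detect_changes(stored: dict[str, str], current: dict[str, str]) -> ChangeSet:
    """Classify files by comparing stored and current {path: sha256} maps."""
    return ChangeSet(
        # In current but never hashed before
        added=sorted(p for p in current if p not in stored),
        # Present in both with a different digest
        modified=sorted(p for p in current if p in stored and stored[p] != current[p]),
        # Hashed before but no longer present
        removed=sorted(p for p in stored if p not in current),
    )
```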
-
update_graph(project_id, directory, pattern='**/*.py', extractor_func=None)
Perform an incremental graph update.
- Parameters:
project_id (str) – Project identifier
directory (Path) – Root directory of the project
pattern (str) – File glob pattern
extractor_func – Optional callable(file_path) → (nodes, edges) for extraction.
If None, only file hash tracking and node deletion are performed.
- Returns:
UpdateResult with counts of changes made
- Return type:
UpdateResult
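The five-step workflow listed for IncrementalEngine can be sketched end to end. The sketch runs against a hypothetical in-memory store that implements only get_file_hashes, set_file_hashes, store_node and delete_nodes_by_file. It also makes several assumptions of its own:

- it returns a plain dict instead of UpdateResult;
- it clears a modified file's nodes before re-extracting it;
- it tags each extracted node with its relative file_path, and nodes carry an "id" key;
- it omits edge persistence.

```python
import hashlib
import time
from pathlib import Path


class InMemoryStore:
    """Hypothetical stand-in exposing only the GraphStore methods this sketch uses."""

    def __init__(self):
        self.hashes: dict[str, dict[str, str]] = {}
        self.nodes: dict[str, dict[str, dict]] = {}  # project_id -> node_id -> node

    def get_file_hashes(self, project_id):
        return dict(self.hashes.get(project_id, {}))

    def set_file_hashes(self, project_id, hashes):
        self.hashes[project_id] = dict(hashes)

    def store_node(self, node_data, project_id):
        self.nodes.setdefault(project_id, {})[node_data["id"]] = node_data

    def delete_nodes_by_file(self, file_path, project_id):
        nodes = self.nodes.get(project_id, {})
        doomed = [nid for nid, n in nodes.items() if n["file_path"] == file_path]
        for nid in doomed:
            del nodes[nid]
        return len(doomed)


def update_graph(store, project_id, directory, pattern="**/*.py", extractor_func=None):
    start = time.perf_counter()
    root = Path(directory)
    # 1. Hash every matching file, keyed by relative POSIX path
    current = {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.glob(pattern) if p.is_file()
    }
    # 2. Diff against the stored hashes
    stored = store.get_file_hashes(project_id)
    added = [p for p in current if p not in stored]
    modified = [p for p in current if p in stored and stored[p] != current[p]]
    removed = [p for p in stored if p not in current]
    # 3. Files that disappeared lose their nodes
    removed_nodes = sum(store.delete_nodes_by_file(p, project_id) for p in removed)
    added_nodes = 0
    if extractor_func is not None:
        # 4. Re-extract added/modified files; a modified file's old nodes go first
        for rel in added + modified:
            if rel in stored:
                removed_nodes += store.delete_nodes_by_file(rel, project_id)
            nodes, _edges = extractor_func(root / rel)  # edges not persisted here
            for node in nodes:
                store.store_node({**node, "file_path": rel}, project_id)
            added_nodes += len(nodes)
    # 5. Persist the new hash map
    store.set_file_hashes(project_id, current)
    return {
        "added_nodes": added_nodes,
        "removed_nodes": removed_nodes,
        "modified_files": len(modified),
        "total_files_scanned": len(current),
        "duration_ms": int((time.perf_counter() - start) * 1000),
    }
```

Without an extractor, this sketch only deletes nodes for removed files and refreshes the hash map. That matches the documented behaviour when extractor_func is None.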
-
force_full_rebuild(project_id, graph, directory, pattern='**/*.py')
Force a complete graph rebuild (drop all stored data, then store the full graph).
- Parameters:
project_id (str) – Project identifier
graph – The complete CallGraph to persist
directory (Path) – Root directory for file hash computation
pattern (str) – File glob pattern
- Returns:
UpdateResult marked as full_rebuild
- Return type:
UpdateResult