Storage

Persistent stores for graphs, synthesis results, and incremental updates.

Abstract graph store interface.

Defines the GraphStore ABC with two backends:
  • SQLiteGraphStore (primary, zero dependencies)

  • KuzuGraphStore (optional, embedded graph DB with Cypher)

Follows the D-012 pattern (hybrid client with abstract base + concrete backends).

Decision: D-014

class curate_ipsum.storage.graph_store.GraphStore[source]

Bases: ABC

Abstract interface for persistent graph storage.

Implementations persist call graphs, reachability indices, Fiedler partitions, and file hashes for incremental updates.
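As a rough illustration of the abstract-base-plus-backend pattern, the sketch below defines a cut-down ABC with two of the methods documented here and a dictionary-backed subclass. MiniGraphStore, InMemoryStore, and the "id" key in node_data are hypothetical names chosen for illustration; the real interface declares many more abstract methods.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class MiniGraphStore(ABC):
    """Hypothetical, cut-down stand-in for GraphStore (two methods only)."""

    @abstractmethod
    def store_node(self, node_data: dict[str, Any], project_id: str) -> None:
        ...

    @abstractmethod
    def get_node(self, node_id: str, project_id: str) -> Optional[dict[str, Any]]:
        ...


class InMemoryStore(MiniGraphStore):
    """Toy backend: keeps nodes in a dict keyed by (project_id, node_id)."""

    def __init__(self) -> None:
        self._nodes: dict[tuple[str, str], dict[str, Any]] = {}

    def store_node(self, node_data, project_id):
        # Assumes node_data carries an "id" key; the real schema may differ.
        self._nodes[(project_id, node_data["id"])] = dict(node_data)

    def get_node(self, node_id, project_id):
        # Returns None for unknown nodes, matching the documented return type.
        return self._nodes.get((project_id, node_id))


store = InMemoryStore()
store.store_node({"id": "pkg.mod.func", "kind": "function"}, "demo")
```

Because every method on the base class is abstract, callers can depend on the interface alone and swap backends without changing their code.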

abstractmethod store_graph(graph, project_id)[source]

Persist an entire call graph (nodes + edges).

Parameters:
  • graph (CallGraph)

  • project_id (str)

Return type:

None

abstractmethod load_graph(project_id)[source]

Load a previously stored call graph.

Parameters:

project_id (str)

Return type:

CallGraph | None

abstractmethod store_node(node_data, project_id)[source]

Store or update a single node.

Parameters:
  • node_data (dict[str, Any])

  • project_id (str)

Return type:

None

abstractmethod store_edge(edge_data, project_id)[source]

Store or update a single edge.

Parameters:
  • edge_data (dict[str, Any])

  • project_id (str)

Return type:

None

abstractmethod get_node(node_id, project_id)[source]

Get a single node’s data by ID.

Parameters:
  • node_id (str)

  • project_id (str)

Return type:

dict[str, Any] | None

abstractmethod get_neighbors(node_id, project_id, direction='outgoing', edge_kind=None)[source]

Get neighboring node IDs.

Parameters:
  • node_id (str) – Source node

  • project_id (str) – Project identifier

  • direction (str) – “outgoing”, “incoming”, or “both”

  • edge_kind (str | None) – Filter by edge kind (None = all)

Return type:

list[str]
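The direction and edge_kind semantics can be sketched over a plain list of edges. This is only an illustration: the (source, target, kind) triple layout, the edge kind names, the de-duplication of results, and the ValueError for an unknown direction are assumptions, not documented behavior of either backend.

```python
def neighbors(edges, node_id, direction="outgoing", edge_kind=None):
    """Return neighbor IDs of node_id from (source, target, kind) triples."""
    if direction not in ("outgoing", "incoming", "both"):
        raise ValueError(f"unknown direction: {direction!r}")
    found = []
    for source, target, kind in edges:
        if edge_kind is not None and kind != edge_kind:
            continue  # skip edges of other kinds
        if direction in ("outgoing", "both") and source == node_id:
            found.append(target)
        if direction in ("incoming", "both") and target == node_id:
            found.append(source)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))


# Hypothetical edges: a calls b, c calls a, a imports d.
edges = [("a", "b", "calls"), ("c", "a", "calls"), ("a", "d", "imports")]
outgoing = neighbors(edges, "a")
incoming = neighbors(edges, "a", direction="incoming")
imports_only = neighbors(edges, "a", edge_kind="imports")
```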

abstractmethod query_reachable(source_id, target_id, project_id)[source]

Check if target is reachable from source using stored Kameda labels.

Falls back to the non-planar reachability table if needed.

Parameters:
  • source_id (str)

  • target_id (str)

  • project_id (str)

Return type:

bool

abstractmethod store_reachability_index(kameda_data, project_id)[source]

Persist Kameda reachability index.

kameda_data keys: left_rank, right_rank, source_id, sink_id, non_planar_reachability, all_node_ids

Parameters:
  • kameda_data (dict[str, Any])

  • project_id (str)

Return type:

None
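To make the shape of kameda_data and the label-based query more concrete, here is a minimal sketch. It rests on two assumptions that the text above does not confirm: that a node reaches another when both of its ranks are strictly smaller, and that non_planar_reachability maps a node ID to a list of node IDs reachable from it. The rank values are made up for a four-node diamond graph plus one unlabeled node.

```python
def dominates(index, u, v):
    """Assumed label test: u reaches v if both of u's ranks are smaller."""
    left, right = index["left_rank"], index["right_rank"]
    if u not in left or v not in left or u not in right or v not in right:
        return False  # at least one node has no labels
    return left[u] < left[v] and right[u] < right[v]


def is_reachable(index, source_id, target_id):
    """Label check first, then the (assumed) fallback table."""
    if source_id == target_id:
        return True
    if dominates(index, source_id, target_id):
        return True
    return target_id in index["non_planar_reachability"].get(source_id, [])


# Illustrative data for s -> a, s -> b, a -> t, b -> t, plus x -> t off-plane.
index = {
    "left_rank": {"s": 0, "a": 1, "b": 2, "t": 3},
    "right_rank": {"s": 0, "b": 1, "a": 2, "t": 3},
    "source_id": "s",
    "sink_id": "t",
    "non_planar_reachability": {"x": ["t"]},
    "all_node_ids": ["s", "a", "b", "t", "x"],
}
```

With two ranks per node, the label check is a pair of integer comparisons, so a query needs no graph traversal.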

abstractmethod load_reachability_index(project_id)[source]

Load stored Kameda reachability index.

Parameters:

project_id (str)

Return type:

dict[str, Any] | None

abstractmethod store_partitions(partition_data, project_id)[source]

Persist Fiedler partition tree.

partition_data is a recursive tree in which each node has the keys id, node_ids, children, fiedler_value, and depth.

Parameters:
  • partition_data (dict[str, Any])

  • project_id (str)

Return type:

None
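A small example of what such a partition tree might look like, together with a helper that walks it. The key names come from the description above; the ID format, the numeric values, and the use of None as the fiedler_value of a leaf are assumptions made for illustration.

```python
# Illustrative partition tree: one split of four nodes into two leaves.
tree = {
    "id": "0",
    "node_ids": ["a", "b", "c", "d"],
    "fiedler_value": 0.5,
    "depth": 0,
    "children": [
        {"id": "0.0", "node_ids": ["a", "b"], "fiedler_value": None,
         "depth": 1, "children": []},
        {"id": "0.1", "node_ids": ["c", "d"], "fiedler_value": None,
         "depth": 1, "children": []},
    ],
}


def leaf_partitions(node):
    """Collect the leaves of the tree in depth-first, left-to-right order."""
    if not node["children"]:
        return [node]
    leaves = []
    for child in node["children"]:
        leaves.extend(leaf_partitions(child))
    return leaves


leaf_ids = [leaf["id"] for leaf in leaf_partitions(tree)]
```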

abstractmethod load_partitions(project_id)[source]

Load stored partition tree.

Parameters:

project_id (str)

Return type:

dict[str, Any] | None

abstractmethod get_file_hashes(project_id)[source]

Get stored file hashes for incremental update detection.

Parameters:

project_id (str)

Return type:

dict[str, str]

abstractmethod set_file_hashes(project_id, hashes)[source]

Store file hashes for incremental update detection.

Parameters:
  • project_id (str)

  • hashes (dict[str, str])

Return type:

None

abstractmethod delete_nodes_by_file(file_path, project_id)[source]

Delete all nodes (and their edges) belonging to a file.

Returns the number of nodes deleted.

Parameters:
  • file_path (str)

  • project_id (str)

Return type:

int

abstractmethod get_stats(project_id)[source]

Get storage statistics (node count, edge count, etc.).

Parameters:

project_id (str)

Return type:

dict[str, Any]

abstractmethod close()[source]

Release storage resources.

Return type:

None

curate_ipsum.storage.graph_store.build_graph_store(backend, project_path)[source]

Factory: create a GraphStore of the requested backend type.

Parameters:
  • backend (str) – “sqlite” or “kuzu”

  • project_path (Path) – Root path of the project being analyzed. The storage directory is created at project_path/.curate_ipsum/.

Returns:

GraphStore instance

Raises:
Return type:

GraphStore
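The path handling a factory like this needs can be sketched as follows. The helper resolve_store_path, the database file names, and the ValueError for an unknown backend are all hypothetical; only the backend names and the .curate_ipsum directory come from the description above.

```python
import tempfile
from pathlib import Path


def resolve_store_path(backend: str, project_path: Path) -> Path:
    """Hypothetical helper: choose a database path for the given backend."""
    filenames = {"sqlite": "graph.db", "kuzu": "graph.kuzu"}  # assumed names
    if backend not in filenames:
        # The exception type is an assumption; "Raises" is not documented above.
        raise ValueError(f"unknown backend: {backend!r}")
    storage_dir = project_path / ".curate_ipsum"
    storage_dir.mkdir(parents=True, exist_ok=True)  # create on first use
    return storage_dir / filenames[backend]


with tempfile.TemporaryDirectory() as tmp:
    db_path = resolve_store_path("sqlite", Path(tmp))
    dir_created = db_path.parent.is_dir()
    dir_name = db_path.parent.name
```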

SQLite-backed graph store.

Primary backend — zero external dependencies (stdlib sqlite3). Uses WAL mode for concurrent read safety and batch inserts for performance.

Schema: 7 tables covering nodes, edges, Kameda labels, non-planar reachability, partitions, partition membership, and file hashes.

Decision: D-014
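The WAL and batch-insert approach can be illustrated with the standard library alone. This is a sketch under assumptions: the single nodes table and its columns are invented for the example and do not reproduce the seven-table schema described above.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_store(db_path):
    """Open a connection in WAL mode and create a hypothetical nodes table."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes ("
        " project_id TEXT NOT NULL,"
        " node_id TEXT NOT NULL,"
        " kind TEXT,"
        " PRIMARY KEY (project_id, node_id))"
    )
    return conn


def store_nodes(conn, project_id, nodes):
    """Batch upsert: one transaction, one executemany call."""
    rows = [(project_id, n["id"], n.get("kind")) for n in nodes]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO nodes VALUES (?, ?, ?)", rows)


with tempfile.TemporaryDirectory() as tmp:
    conn = open_store(Path(tmp) / "graph.db")
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    store_nodes(conn, "demo", [{"id": "f", "kind": "function"},
                               {"id": "g", "kind": "function"}])
    store_nodes(conn, "demo", [{"id": "f", "kind": "method"}])  # replaces f
    rows = conn.execute(
        "SELECT node_id, kind FROM nodes ORDER BY node_id").fetchall()
    conn.close()
```

INSERT OR REPLACE relies on the primary key to decide whether a row is new, which is why storing the same node twice updates it instead of duplicating it.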

class curate_ipsum.storage.sqlite_graph_store.SQLiteGraphStore(db_path)[source]

Bases: GraphStore

SQLite-backed graph storage. Primary backend with zero external dependencies.

Parameters:

db_path (Path)

store_graph(graph, project_id)[source]

Persist an entire call graph (bulk INSERT OR REPLACE).

Parameters:
  • graph (CallGraph)

  • project_id (str)

Return type:

None

load_graph(project_id)[source]

Load a previously stored call graph.

Parameters:

project_id (str)

Return type:

CallGraph | None

store_node(node_data, project_id)[source]

Store or update a single node.

Parameters:
  • node_data (dict[str, Any])

  • project_id (str)

Return type:

None

store_edge(edge_data, project_id)[source]

Store or update a single edge.

Parameters:
  • edge_data (dict[str, Any])

  • project_id (str)

Return type:

None

get_node(node_id, project_id)[source]

Get a single node’s data by ID.

Parameters:
  • node_id (str)

  • project_id (str)

Return type:

dict[str, Any] | None

get_neighbors(node_id, project_id, direction='outgoing', edge_kind=None)[source]

Get neighboring node IDs.

Parameters:
  • node_id (str)

  • project_id (str)

  • direction (str)

  • edge_kind (str | None)

Return type:

list[str]

query_reachable(source_id, target_id, project_id)[source]

Check if target is reachable from source using Kameda labels.

Parameters:
  • source_id (str)

  • target_id (str)

  • project_id (str)

Return type:

bool

store_reachability_index(kameda_data, project_id)[source]

Persist Kameda reachability index.

Parameters:
  • kameda_data (dict[str, Any])

  • project_id (str)

Return type:

None

load_reachability_index(project_id)[source]

Load stored Kameda reachability index.

Parameters:

project_id (str)

Return type:

dict[str, Any] | None

store_partitions(partition_data, project_id)[source]

Persist Fiedler partition tree.

Parameters:
  • partition_data (dict[str, Any])

  • project_id (str)

Return type:

None

load_partitions(project_id)[source]

Load stored partition tree.

Parameters:

project_id (str)

Return type:

dict[str, Any] | None

get_file_hashes(project_id)[source]

Get stored file hashes for incremental update detection.

Parameters:

project_id (str)

Return type:

dict[str, str]

set_file_hashes(project_id, hashes)[source]

Store file hashes for incremental update detection.

Parameters:
  • project_id (str)

  • hashes (dict[str, str])

Return type:

None

delete_nodes_by_file(file_path, project_id)[source]

Delete all nodes (and their edges) belonging to a file.

Parameters:
  • file_path (str)

  • project_id (str)

Return type:

int

get_stats(project_id)[source]

Get storage statistics.

Parameters:

project_id (str)

Return type:

dict[str, Any]

close()[source]

Release database connection.

Return type:

None

Kuzu-backed graph store.

Optional backend — requires pip install kuzu. Provides native Cypher query support and efficient multi-hop traversals.

Uses the same GraphStore ABC as the SQLite backend (D-014 pattern).

class curate_ipsum.storage.kuzu_graph_store.KuzuGraphStore(db_path)[source]

Bases: GraphStore

Kuzu-backed graph storage with native Cypher support.

Parameters:

db_path (Path)

store_graph(graph, project_id)[source]

Persist an entire call graph.

Parameters:
  • graph (CallGraph)

  • project_id (str)

Return type:

None

load_graph(project_id)[source]

Load a previously stored call graph.

Parameters:

project_id (str)

Return type:

CallGraph | None

store_node(node_data, project_id)[source]

Store or update a single node.

Parameters:
  • node_data (dict[str, Any])

  • project_id (str)

Return type:

None

store_edge(edge_data, project_id)[source]

Store or update a single edge.

Parameters:
  • edge_data (dict[str, Any])

  • project_id (str)

Return type:

None

get_node(node_id, project_id)[source]

Get a single node’s data by ID.

Parameters:
  • node_id (str)

  • project_id (str)

Return type:

dict[str, Any] | None

get_neighbors(node_id, project_id, direction='outgoing', edge_kind=None)[source]

Get neighboring node IDs.

Parameters:
  • node_id (str)

  • project_id (str)

  • direction (str)

  • edge_kind (str | None)

Return type:

list[str]

query_reachable(source_id, target_id, project_id)[source]

Check if target is reachable from source using Kameda labels.

Parameters:
  • source_id (str)

  • target_id (str)

  • project_id (str)

Return type:

bool

store_reachability_index(kameda_data, project_id)[source]

Persist Kameda reachability index.

Parameters:
  • kameda_data (dict[str, Any])

  • project_id (str)

Return type:

None

load_reachability_index(project_id)[source]

Load stored Kameda reachability index.

Parameters:

project_id (str)

Return type:

dict[str, Any] | None

store_partitions(partition_data, project_id)[source]

Persist Fiedler partition tree.

Parameters:
  • partition_data (dict[str, Any])

  • project_id (str)

Return type:

None

load_partitions(project_id)[source]

Load stored partition tree.

Parameters:

project_id (str)

Return type:

dict[str, Any] | None

get_file_hashes(project_id)[source]

Get stored file hashes for incremental update detection.

Parameters:

project_id (str)

Return type:

dict[str, str]

set_file_hashes(project_id, hashes)[source]

Store file hashes for incremental update detection.

Parameters:
  • project_id (str)

  • hashes (dict[str, str])

Return type:

None

delete_nodes_by_file(file_path, project_id)[source]

Delete all nodes (and their edges) belonging to a file.

Parameters:
  • file_path (str)

  • project_id (str)

Return type:

int

get_stats(project_id)[source]

Get storage statistics.

Parameters:

project_id (str)

Return type:

dict[str, Any]

close()[source]

Release database resources.

Return type:

None

Persistent storage for synthesis results.

Uses JSONL (newline-delimited JSON) for append-only persistence, mirroring the project’s existing tools.py::append_run() pattern.

Each line is a JSON object with the SynthesisResult fields plus a project_id key for multi-project filtering.
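The append-only JSONL layout can be sketched as below. Plain dicts stand in for SynthesisResult, and the file name and the "id" and "status" fields are hypothetical; only the one-object-per-line format and the project_id key come from the description above.

```python
import json
import tempfile
from pathlib import Path


def append_record(path, record, project_id):
    """Append one JSON object per line, tagged with its project_id."""
    line = json.dumps({**record, "project_id": project_id})
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")


def load_records(path, project_id):
    """Read the file line by line and keep records for one project."""
    if not path.exists():
        return []
    records = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if record.get("project_id") == project_id:
                records.append(record)
    return records


with tempfile.TemporaryDirectory() as tmp:
    store_file = Path(tmp) / "synthesis_results.jsonl"  # assumed file name
    append_record(store_file, {"id": "s1", "status": "ok"}, "proj-a")
    append_record(store_file, {"id": "s2", "status": "failed"}, "proj-b")
    append_record(store_file, {"id": "s3", "status": "ok"}, "proj-a")
    proj_a = load_records(store_file, "proj-a")
    line_count = len(store_file.read_text(encoding="utf-8").splitlines())
```

Since records are only ever appended, existing lines are never rewritten, and filtering by project happens at read time.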

class curate_ipsum.storage.synthesis_store.SynthesisStore(data_dir)[source]

Bases: object

Append-only JSONL store for synthesis results.

Parameters:

data_dir (Path)

append(result, project_id)[source]

Append a synthesis result to the JSONL store.

Parameters:
  • result (SynthesisResult)

  • project_id (str)

Return type:

None

load_all(project_id)[source]

Load all synthesis results for a project.

Parameters:

project_id (str)

Return type:

list[SynthesisResult]

load_by_id(synthesis_id)[source]

Load a specific synthesis result by ID.

Parameters:

synthesis_id (str)

Return type:

SynthesisResult | None

load_by_region(project_id, region_id)[source]

Load all synthesis results for a specific region within a project.

Parameters:
  • project_id (str)

  • region_id (str)

Return type:

list[SynthesisResult]

Incremental update engine for graph persistence.

Detects which files changed since the last extraction and updates only the affected graph nodes/edges, avoiding full re-extraction.

Uses SHA-256 file hashing to detect changes. The hash map is persisted via the GraphStore’s file_hashes table.

Decision: D-015

class curate_ipsum.storage.incremental.ChangeSet(added=<factory>, modified=<factory>, removed=<factory>)[source]

Bases: object

Files that changed since last extraction.

Parameters:
  • added (list[str])

  • modified (list[str])

  • removed (list[str])

added: list[str]
modified: list[str]
removed: list[str]
property has_changes: bool
property total_changed: int
to_dict()[source]
Return type:

dict

class curate_ipsum.storage.incremental.UpdateResult(added_nodes=0, removed_nodes=0, modified_files=0, total_files_scanned=0, duration_ms=0, change_set=None, full_rebuild=False)[source]

Bases: object

Result of an incremental graph update.

Parameters:
  • added_nodes (int)

  • removed_nodes (int)

  • modified_files (int)

  • total_files_scanned (int)

  • duration_ms (int)

  • change_set (ChangeSet | None)

  • full_rebuild (bool)

added_nodes: int = 0
removed_nodes: int = 0
modified_files: int = 0
total_files_scanned: int = 0
duration_ms: int = 0
change_set: ChangeSet | None = None
full_rebuild: bool = False
to_dict()[source]
Return type:

dict

class curate_ipsum.storage.incremental.IncrementalEngine(store)[source]

Bases: object

Detects file changes and performs incremental graph updates.

Workflow:
  1. Compute current file hashes for all matching files

  2. Compare with stored hashes → ChangeSet

  3. For removed files: delete nodes/edges

  4. For added/modified files: re-extract and merge

  5. Update stored file hashes

Parameters:

store (GraphStore)

static compute_file_hashes(directory, pattern='**/*.py')[source]

Compute SHA-256 hashes for all files matching pattern.

Parameters:
  • directory (Path) – Root directory to scan

  • pattern (str) – Glob pattern for files

Returns:

Dict mapping relative file paths to their SHA-256 hex digests

Return type:

dict[str, str]
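Hashing every file that matches a glob pattern could look roughly like this. The function name hash_files and the use of POSIX-style separators in the relative paths are assumptions; the real method may normalize paths differently.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_files(directory, pattern="**/*.py"):
    """Map each matching file's relative path to its SHA-256 hex digest."""
    hashes = {}
    for path in sorted(directory.glob(pattern)):
        if path.is_file():
            relative = path.relative_to(directory).as_posix()
            hashes[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "a.py").write_bytes(b"print(1)\n")
    (root / "pkg" / "b.py").write_bytes(b"print(2)\n")
    (root / "notes.txt").write_bytes(b"not python\n")  # ignored by the pattern
    hashes = hash_files(root)
```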

detect_changes(project_id, current_hashes)[source]

Compare current file hashes with stored hashes to find changes.

Parameters:
  • project_id (str) – Project identifier

  • current_hashes (dict[str, str]) – Current file → hash mapping

Returns:

ChangeSet with added, modified, and removed files

Return type:

ChangeSet
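The comparison itself reduces to a few dictionary lookups. The sketch below takes the stored hashes as an argument and returns a plain dict instead of a ChangeSet; the function name diff_hashes and the sorted output order are assumptions.

```python
def diff_hashes(stored, current):
    """Classify paths as added, modified, or removed between two hash maps."""
    added = sorted(p for p in current if p not in stored)
    removed = sorted(p for p in stored if p not in current)
    modified = sorted(
        p for p in current if p in stored and stored[p] != current[p]
    )
    return {"added": added, "modified": modified, "removed": removed}


# Made-up digests, shortened for readability.
stored = {"a.py": "111", "b.py": "222", "c.py": "333"}
current = {"a.py": "111", "b.py": "999", "d.py": "444"}
result = diff_hashes(stored, current)
```

Here a.py is unchanged and therefore appears in none of the three lists.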

update_graph(project_id, directory, pattern='**/*.py', extractor_func=None)[source]

Perform an incremental graph update.

Parameters:
  • project_id (str) – Project identifier

  • directory (Path) – Root directory of the project

  • pattern (str) – File glob pattern

  • extractor_func – Optional callable(file_path) → (nodes, edges) for extraction. If None, only file hash tracking and node deletion are performed.

Returns:

UpdateResult with counts of changes made

Return type:

UpdateResult
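An extractor_func is described only as callable(file_path) → (nodes, edges), so the following is one possible shape, not the project's extractor. It uses the standard ast module to list function definitions as nodes and direct calls by name as edges; the dictionary keys in the returned nodes and edges are assumptions.

```python
import ast
import tempfile
from pathlib import Path


def extract_functions(file_path):
    """Hypothetical extractor_func: return (nodes, edges) for one file."""
    source = Path(file_path).read_text(encoding="utf-8")
    tree = ast.parse(source)
    nodes, edges = [], []
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        nodes.append(
            {"id": func.name, "file_path": str(file_path), "kind": "function"}
        )
        # Record calls of the form name(...) made inside this function.
        for call in ast.walk(func):
            if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                edges.append(
                    {"source": func.name, "target": call.func.id, "kind": "calls"}
                )
    return nodes, edges


with tempfile.TemporaryDirectory() as tmp:
    module = Path(tmp) / "example.py"
    module.write_text(
        "def helper():\n    return 1\n\n\ndef main():\n    return helper()\n",
        encoding="utf-8",
    )
    nodes, edges = extract_functions(module)
```

A callable like this would be passed as extractor_func; when it is omitted, the text above says that only hash tracking and node deletion take place.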

force_full_rebuild(project_id, graph, directory, pattern='**/*.py')[source]

Force a complete graph rebuild (drop all + store full graph).

Parameters:
  • project_id (str) – Project identifier

  • graph – The complete CallGraph to persist

  • directory (Path) – Root directory for file hash computation

  • pattern (str) – File glob pattern

Returns:

UpdateResult marked as full_rebuild

Return type:

UpdateResult