Graph Analysis

Call graph extraction, spectral partitioning, Kameda reachability, and hierarchical decomposition.

Graph extraction and spectral analysis module for curate-ipsum.

This module provides:

  1. Call graph extraction (Phase 1 — complete): - AST backend (always available) - ASR backend (requires LPython) - Dependency extraction (import-level)

  2. Graph-spectral analysis (Phase 2): - Laplacian construction and Fiedler vector computation - Recursive spectral partitioning - Virtual sink/source augmentation - Hierarchical SCC condensation - Planarity testing and Kuratowski extraction - Kameda O(1) reachability index

Usage:

from curate_ipsum.graph import get_extractor, CallGraph

# Extract call graph extractor = get_extractor() graph = extractor.extract_directory(Path(“src/”))

# Spectral partitioning (requires scipy) from curate_ipsum.graph.spectral import compute_fiedler_components from curate_ipsum.graph.partitioner import GraphPartitioner

partitioner = GraphPartitioner(min_partition_size=3) tree = partitioner.partition(graph)

# O(1) reachability (requires networkx) from curate_ipsum.graph.planarity import check_planarity from curate_ipsum.graph.kameda import KamedaIndex

result = check_planarity(graph) index = KamedaIndex.build(result.planar_subgraph, result.embedding) print(index.reaches(“module.func_a”, “module.func_b”))

class curate_ipsum.graph.CallGraph(nodes=<factory>, edges=<factory>, _outgoing=<factory>, _incoming=<factory>, _by_file=<factory>)[source]

Bases: object

Directed graph representing function calls and definitions.

This is the core data structure for M2 graph-spectral analysis. Designed to be populated from either Python AST or LPython ASR.

Parameters:
nodes: dict[str, GraphNode]
edges: set[GraphEdge]
add_node(node)[source]

Add a node to the graph.

Parameters:

node (GraphNode)

Return type:

None

add_edge(edge)[source]

Add an edge to the graph.

Parameters:

edge (GraphEdge)

Return type:

None

get_node(node_id)[source]

Get a node by ID.

Parameters:

node_id (str)

Return type:

GraphNode | None

get_callees(node_id)[source]

Get IDs of functions called by node_id.

Parameters:

node_id (str)

Return type:

set[str]

get_callers(node_id)[source]

Get IDs of functions that call node_id.

Parameters:

node_id (str)

Return type:

set[str]

get_edges_from(node_id, kind=None)[source]

Get all edges originating from a node.

Parameters:
Return type:

Iterator[GraphEdge]

get_edges_to(node_id, kind=None)[source]

Get all edges pointing to a node.

Parameters:
Return type:

Iterator[GraphEdge]

get_nodes_in_file(file_path)[source]

Get all node IDs in a specific file.

Parameters:

file_path (str)

Return type:

set[str]

functions()[source]

Iterate over all function/method nodes.

Return type:

Iterator[GraphNode]

classes()[source]

Iterate over all class nodes.

Return type:

Iterator[GraphNode]

modules()[source]

Iterate over all module nodes.

Return type:

Iterator[GraphNode]

reachable_from(node_id, max_depth=None)[source]

Get all nodes reachable from node_id via calls.

Parameters:
  • node_id (str) – Starting node

  • max_depth (int | None) – Maximum traversal depth (None = unlimited)

Returns:

Set of reachable node IDs (excluding start node)

Return type:

set[str]

reaches(node_id, max_depth=None)[source]

Get all nodes that can reach node_id via calls.

Parameters:
  • node_id (str) – Target node

  • max_depth (int | None) – Maximum traversal depth (None = unlimited)

Returns:

Set of node IDs that can reach target (excluding target)

Return type:

set[str]

strongly_connected_components()[source]

Find strongly connected components using Tarjan’s algorithm.

Returns:

List of SCCs, each as a frozenset of node IDs

Return type:

list[frozenset[str]]

topological_sort()[source]

Topological sort of nodes (only valid for DAG).

Returns:

List of node IDs in topological order

Raises:

ValueError – If graph contains cycles

Return type:

list[str]

condensation()[source]

Create condensation graph (DAG of SCCs).

Each SCC becomes a single node in the condensation graph. Useful for hierarchical decomposition (M2).

Returns:

New CallGraph where each node is an SCC

Return type:

CallGraph

to_dict()[source]

Serialize to dictionary.

Return type:

dict

classmethod from_dict(data)[source]

Deserialize from dictionary.

Parameters:

data (dict)

Return type:

CallGraph

to_dot(title='Call Graph')[source]

Export to Graphviz DOT format.

Parameters:

title (str)

Return type:

str

class curate_ipsum.graph.GraphNode(id, kind, name, location=None, signature=None, docstring=None, metadata=<factory>)[source]

Bases: object

Node in the call graph.

Parameters:
id: str
kind: NodeKind
name: str
location: SourceLocation | None = None
signature: FunctionSignature | None = None
docstring: str | None = None
metadata: dict[str, any]
class curate_ipsum.graph.GraphEdge(source_id, target_id, kind, location=None, is_conditional=False, is_dynamic=False, confidence=1.0)[source]

Bases: object

Edge in the call graph.

Parameters:
source_id: str
target_id: str
kind: EdgeKind
location: SourceLocation | None = None
is_conditional: bool = False
is_dynamic: bool = False
confidence: float = 1.0
class curate_ipsum.graph.NodeKind(*values)[source]

Bases: StrEnum

Kind of node in the call graph.

MODULE = 'module'
CLASS = 'class'
FUNCTION = 'function'
METHOD = 'method'
LAMBDA = 'lambda'
COMPREHENSION = 'comprehension'
class curate_ipsum.graph.EdgeKind(*values)[source]

Bases: StrEnum

Kind of edge in the call graph.

CALLS = 'calls'
DEFINES = 'defines'
INHERITS = 'inherits'
IMPORTS = 'imports'
REFERENCES = 'references'
class curate_ipsum.graph.SourceLocation(file, line_start, line_end, col_start=0, col_end=0)[source]

Bases: object

Source code location.

Parameters:
file: str
line_start: int
line_end: int
col_start: int = 0
col_end: int = 0
contains_line(line)[source]

Check if a line number falls within this location.

Parameters:

line (int)

Return type:

bool

class curate_ipsum.graph.FunctionSignature(name, params=(), return_type=None, decorators=(), is_async=False, is_generator=False)[source]

Bases: object

Function signature information.

Parameters:
name: str
params: tuple[str, ...] = ()
return_type: str | None = None
decorators: tuple[str, ...] = ()
is_async: bool = False
is_generator: bool = False
property arity: int

Number of parameters.

class curate_ipsum.graph.CallGraphExtractor(include_lambdas=True, include_comprehensions=False, include_dynamic_calls=True, resolve_imports=True)[source]

Bases: ABC

Abstract base class for call graph extraction.

Implementations: - ASTExtractor: Uses Python’s built-in ast module (always available) - ASRExtractor: Uses LPython’s ASR (requires LPython installation)

The interface is designed to be semantic-representation-agnostic, allowing the same downstream code to work with either backend.

Parameters:
  • include_lambdas (bool)

  • include_comprehensions (bool)

  • include_dynamic_calls (bool)

  • resolve_imports (bool)

__init__(include_lambdas=True, include_comprehensions=False, include_dynamic_calls=True, resolve_imports=True)[source]

Initialize the extractor.

Parameters:
  • include_lambdas (bool) – Include lambda expressions as nodes

  • include_comprehensions (bool) – Include list/dict/set comprehensions

  • include_dynamic_calls (bool) – Try to resolve getattr/operator.attrgetter calls

  • resolve_imports (bool) – Follow imports to build cross-module graph

abstractmethod extract_file(file_path)[source]

Extract call graph from a single file.

Parameters:

file_path (Path) – Path to Python source file

Returns:

CallGraph containing nodes and edges from the file

Raises:
Return type:

CallGraph

abstractmethod extract_module(source, module_name='<module>')[source]

Extract call graph from source code string.

Parameters:
  • source (str) – Python source code

  • module_name (str) – Name to use for the module node

Returns:

CallGraph containing nodes and edges

Raises:

ParseError – If the source cannot be parsed

Return type:

CallGraph

extract_files(file_paths)[source]

Extract call graph from multiple files.

Parameters:

file_paths (list[Path]) – List of paths to Python source files

Returns:

Combined CallGraph from all files

Return type:

CallGraph

extract_directory(directory, pattern='**/*.py', exclude=None)[source]

Extract call graph from all Python files in a directory.

Parameters:
  • directory (Path) – Root directory to search

  • pattern (str) – Glob pattern for file matching

  • exclude (set[str] | None) – Set of directory/file names to exclude

Returns:

Combined CallGraph from all matching files

Return type:

CallGraph

abstract property backend_name: str

Name of the extraction backend (e.g., ‘ast’, ‘asr’).

abstract property backend_version: str

Version of the extraction backend.

class curate_ipsum.graph.ASTExtractor(include_lambdas=True, include_comprehensions=False, include_dynamic_calls=True, resolve_imports=True)[source]

Bases: CallGraphExtractor

Call graph extractor using Python’s built-in AST module.

This is the default extractor that is always available without external dependencies.

Parameters:
  • include_lambdas (bool)

  • include_comprehensions (bool)

  • include_dynamic_calls (bool)

  • resolve_imports (bool)

extract_file(file_path)[source]

Extract call graph from a file.

Parameters:

file_path (Path)

Return type:

CallGraph

extract_module(source, module_name='<module>', file_path='<string>')[source]

Extract call graph from source string.

Parameters:
  • source (str)

  • module_name (str)

  • file_path (str)

Return type:

CallGraph

property backend_name: str

Name of the extraction backend (e.g., ‘ast’, ‘asr’).

property backend_version: str

Version of the extraction backend.

class curate_ipsum.graph.ASRExtractor(lpython_path=None, **kwargs)[source]

Bases: CallGraphExtractor

Call graph extractor using LPython’s ASR.

ASR (Abstract Semantic Representation) provides richer semantic information than Python’s AST, making call resolution more accurate.

Note: LPython requires type-annotated Python code. Code without type annotations may fail to parse or produce incomplete ASR.

Parameters:

lpython_path (str | None)

__init__(lpython_path=None, **kwargs)[source]

Initialize ASR extractor.

Parameters:
  • lpython_path (str | None) – Path to lpython executable (auto-detected if None)

  • **kwargs – Additional arguments passed to parent

extract_file(file_path)[source]

Extract call graph from a file using LPython ASR.

Parameters:

file_path (Path)

Return type:

CallGraph

extract_module(source, module_name='<module>', file_path='<string>')[source]

Extract call graph from source string using LPython ASR.

Parameters:
  • source (str)

  • module_name (str)

  • file_path (str)

Return type:

CallGraph

property backend_name: str

Name of the extraction backend (e.g., ‘ast’, ‘asr’).

property backend_version: str

Version of the extraction backend.

class curate_ipsum.graph.DependencyExtractor(include_stdlib=False, include_thirdparty=False)[source]

Bases: object

Extract module-level dependency graph from a Python project.

Produces a CallGraph where: - Each module is a GraphNode with NodeKind.MODULE - Each import is a GraphEdge with EdgeKind.IMPORTS - Only local (intra-project) imports create edges - stdlib and third-party imports are excluded

Parameters:
  • include_stdlib (bool)

  • include_thirdparty (bool)

__init__(include_stdlib=False, include_thirdparty=False)[source]
Parameters:
  • include_stdlib (bool) – If True, include stdlib modules as nodes/edges.

  • include_thirdparty (bool) – If True, include third-party modules as nodes/edges.

extract_directory(directory, pattern='**/*.py', exclude=None)[source]

Build a module-level dependency graph from all Python files in a directory.

Parameters:
  • directory (Path) – Root directory of the Python project.

  • pattern (str) – Glob pattern for matching Python files.

  • exclude (set[str] | None) – Directory/file names to skip.

Returns:

CallGraph with MODULE nodes and IMPORTS edges.

Return type:

CallGraph

extract_file(file_path)[source]

Extract imports from a single Python file.

Returns a list of ImportInfo objects.

Parameters:

file_path (Path)

Return type:

list[ImportInfo]

curate_ipsum.graph.get_extractor(backend='auto', **kwargs)[source]

Factory function to get an appropriate extractor.

Parameters:
  • backend (str) – Backend to use (‘auto’, ‘ast’, ‘asr’)

  • **kwargs – Additional arguments passed to extractor constructor

Returns:

CallGraphExtractor instance

Raises:

ValueError – If requested backend is not available

Return type:

CallGraphExtractor

exception curate_ipsum.graph.ExtractorError[source]

Bases: Exception

Base exception for extraction errors.

exception curate_ipsum.graph.ParseError[source]

Bases: ExtractorError

Error parsing source code.

exception curate_ipsum.graph.UnsupportedFeatureError[source]

Bases: ExtractorError

Source contains features not supported by this extractor.

exception curate_ipsum.graph.LPythonNotFoundError[source]

Bases: ImportError

LPython is not installed or not in PATH.

class curate_ipsum.graph.PlanarityResult(is_planar, planar_subgraph, non_planar_edges, kuratowski_edges, embedding)[source]

Bases: object

Result of planarity analysis on a subgraph.

Parameters:
is_planar: bool

Whether the graph is planar.

planar_subgraph: CallGraph

The maximal planar subgraph (equals the input graph if planar).

non_planar_edges: set[GraphEdge]

Edges removed to achieve planarity (empty if already planar).

kuratowski_edges: set[tuple[str, str]] | None

Edge set of the Kuratowski subgraph (K₅ or K₃,₃) if non-planar, else None.

embedding: dict | None

node_id → ordered list of neighbor IDs representing the clockwise order of edges around each vertex. None if graph has 0 or 1 nodes.

Type:

Planar embedding as a dict

curate_ipsum.graph.check_planarity(graph, edge_kinds=None)[source]

Test if a CallGraph is planar and extract planarity-related structures.

Uses networkx’s Boyer-Myrvold O(n) planarity test. If the graph is not planar, identifies a Kuratowski subgraph and computes a maximal planar subgraph by removing edges from the Kuratowski certificate.

Planarity is tested on the underlying undirected graph (ignoring edge direction), since planarity is a property of the undirected structure.

Parameters:
  • graph (CallGraph) – The CallGraph to test.

  • edge_kinds (set[EdgeKind] | None) – Edge types to consider. Default: {CALLS}.

Returns:

PlanarityResult with planarity status, planar subgraph, removed edges, and Kuratowski subgraph if applicable.

Return type:

PlanarityResult

curate_ipsum.graph.callgraph_to_networkx(graph, edge_kinds=None, as_undirected=False)[source]

Convert a CallGraph to a networkx graph.

Preserves node and edge metadata as attributes.

Parameters:
  • graph (CallGraph) – The CallGraph to convert.

  • edge_kinds (set[EdgeKind] | None) – Edge types to include. Default: all.

  • as_undirected (bool) – If True, return an undirected Graph.

Returns:

networkx DiGraph (or Graph if as_undirected=True).

Return type:

nx.DiGraph | nx.Graph

curate_ipsum.graph.networkx_to_callgraph(nx_graph, original=None)[source]

Convert a networkx graph back to a CallGraph.

If an original CallGraph is provided, uses it to restore full GraphNode/GraphEdge metadata for nodes/edges present in nx_graph.

Parameters:
  • nx_graph (nx.DiGraph | nx.Graph) – The networkx graph.

  • original (CallGraph | None) – Optional original CallGraph for metadata recovery.

Returns:

A new CallGraph.

Return type:

CallGraph

class curate_ipsum.graph.KamedaIndex(left_rank, right_rank, source_id, sink_id, non_planar_reachability, all_node_ids)[source]

Bases: object

O(1) reachability index for a planar directed acyclic graph.

After O(n) construction, reachability queries take O(1) time by comparing 2D labels. For nodes not in the planar subgraph, or when edges were removed during planarization, a fallback set is checked.

Parameters:
left_rank: dict[str, int]

Left-boundary topological rank for each node.

right_rank: dict[str, int]

Right-boundary topological rank for each node.

source_id: str

The single source (or virtual source) of the st-graph.

sink_id: str

The single sink (or virtual sink) of the st-graph.

non_planar_reachability: dict[str, set[str]]

Fallback reachability for edges that were removed during planarization. Maps node_id → set of additional nodes reachable via non-planar edges. Precomputed during build.

all_node_ids: frozenset[str]

All node IDs in the indexed graph.

classmethod build(graph, embedding=None, non_planar_edges=None, source_id=None, sink_id=None, edge_kinds=None)[source]

Build the reachability index from a planar DAG.

Parameters:
  • graph (CallGraph) – A planar directed acyclic graph (the planar subgraph).

  • embedding (dict | None) – Planar embedding dict from planarity.check_planarity(). Maps node_id → clockwise-ordered list of neighbor IDs. If None, a simple topological-order heuristic is used.

  • non_planar_edges (set[GraphEdge] | None) – Edges removed during planarization. These are handled via a fallback BFS precomputation.

  • source_id (str | None) – ID of the source node. If None, auto-detected (node with no incoming edges).

  • sink_id (str | None) – ID of the sink node. If None, auto-detected (node with no outgoing edges).

  • edge_kinds (set[EdgeKind] | None) – Edge types to consider. Default: {CALLS}.

Returns:

KamedaIndex ready for O(1) queries.

Raises:

ValueError – If graph is not a DAG or has no valid source/sink.

Return type:

KamedaIndex

reaches(source, target)[source]

O(1) reachability query: can source reach target?

Uses the 2D dominance test on left/right ranks, falling back to the precomputed non-planar reachability set for edges that were removed during planarization.

Parameters:
  • source (str) – Source node ID.

  • target (str) – Target node ID.

Returns:

True if there exists a directed path from source to target.

Return type:

bool

all_reachable_from(source)[source]

Get all nodes reachable from source.

Uses the index for the planar portion and merges with non-planar fallback reachability.

Parameters:

source (str) – Source node ID.

Returns:

Set of all reachable node IDs (excluding source itself).

Return type:

set[str]

to_dict()[source]

Serialize the index for storage/transmission.

Return type:

dict

Graph models for call graph representation.

These models are backend-agnostic and can be populated from either Python’s AST or LPython’s ASR.

class curate_ipsum.graph.models.NodeKind(*values)[source]

Bases: StrEnum

Kind of node in the call graph.

MODULE = 'module'
CLASS = 'class'
FUNCTION = 'function'
METHOD = 'method'
LAMBDA = 'lambda'
COMPREHENSION = 'comprehension'
class curate_ipsum.graph.models.EdgeKind(*values)[source]

Bases: StrEnum

Kind of edge in the call graph.

CALLS = 'calls'
DEFINES = 'defines'
INHERITS = 'inherits'
IMPORTS = 'imports'
REFERENCES = 'references'
class curate_ipsum.graph.models.SourceLocation(file, line_start, line_end, col_start=0, col_end=0)[source]

Bases: object

Source code location.

Parameters:
file: str
line_start: int
line_end: int
col_start: int = 0
col_end: int = 0
contains_line(line)[source]

Check if a line number falls within this location.

Parameters:

line (int)

Return type:

bool

class curate_ipsum.graph.models.FunctionSignature(name, params=(), return_type=None, decorators=(), is_async=False, is_generator=False)[source]

Bases: object

Function signature information.

Parameters:
name: str
params: tuple[str, ...] = ()
return_type: str | None = None
decorators: tuple[str, ...] = ()
is_async: bool = False
is_generator: bool = False
property arity: int

Number of parameters.

class curate_ipsum.graph.models.GraphNode(id, kind, name, location=None, signature=None, docstring=None, metadata=<factory>)[source]

Bases: object

Node in the call graph.

Parameters:
id: str
kind: NodeKind
name: str
location: SourceLocation | None = None
signature: FunctionSignature | None = None
docstring: str | None = None
metadata: dict[str, any]
class curate_ipsum.graph.models.GraphEdge(source_id, target_id, kind, location=None, is_conditional=False, is_dynamic=False, confidence=1.0)[source]

Bases: object

Edge in the call graph.

Parameters:
source_id: str
target_id: str
kind: EdgeKind
location: SourceLocation | None = None
is_conditional: bool = False
is_dynamic: bool = False
confidence: float = 1.0
class curate_ipsum.graph.models.CallGraph(nodes=<factory>, edges=<factory>, _outgoing=<factory>, _incoming=<factory>, _by_file=<factory>)[source]

Bases: object

Directed graph representing function calls and definitions.

This is the core data structure for M2 graph-spectral analysis. Designed to be populated from either Python AST or LPython ASR.

Parameters:
nodes: dict[str, GraphNode]
edges: set[GraphEdge]
add_node(node)[source]

Add a node to the graph.

Parameters:

node (GraphNode)

Return type:

None

add_edge(edge)[source]

Add an edge to the graph.

Parameters:

edge (GraphEdge)

Return type:

None

get_node(node_id)[source]

Get a node by ID.

Parameters:

node_id (str)

Return type:

GraphNode | None

get_callees(node_id)[source]

Get IDs of functions called by node_id.

Parameters:

node_id (str)

Return type:

set[str]

get_callers(node_id)[source]

Get IDs of functions that call node_id.

Parameters:

node_id (str)

Return type:

set[str]

get_edges_from(node_id, kind=None)[source]

Get all edges originating from a node.

Parameters:
Return type:

Iterator[GraphEdge]

get_edges_to(node_id, kind=None)[source]

Get all edges pointing to a node.

Parameters:
Return type:

Iterator[GraphEdge]

get_nodes_in_file(file_path)[source]

Get all node IDs in a specific file.

Parameters:

file_path (str)

Return type:

set[str]

functions()[source]

Iterate over all function/method nodes.

Return type:

Iterator[GraphNode]

classes()[source]

Iterate over all class nodes.

Return type:

Iterator[GraphNode]

modules()[source]

Iterate over all module nodes.

Return type:

Iterator[GraphNode]

reachable_from(node_id, max_depth=None)[source]

Get all nodes reachable from node_id via calls.

Parameters:
  • node_id (str) – Starting node

  • max_depth (int | None) – Maximum traversal depth (None = unlimited)

Returns:

Set of reachable node IDs (excluding start node)

Return type:

set[str]

reaches(node_id, max_depth=None)[source]

Get all nodes that can reach node_id via calls.

Parameters:
  • node_id (str) – Target node

  • max_depth (int | None) – Maximum traversal depth (None = unlimited)

Returns:

Set of node IDs that can reach target (excluding target)

Return type:

set[str]

strongly_connected_components()[source]

Find strongly connected components using Tarjan’s algorithm.

Returns:

List of SCCs, each as a frozenset of node IDs

Return type:

list[frozenset[str]]

topological_sort()[source]

Topological sort of nodes (only valid for DAG).

Returns:

List of node IDs in topological order

Raises:

ValueError – If graph contains cycles

Return type:

list[str]

condensation()[source]

Create condensation graph (DAG of SCCs).

Each SCC becomes a single node in the condensation graph. Useful for hierarchical decomposition (M2).

Returns:

New CallGraph where each node is an SCC

Return type:

CallGraph

to_dict()[source]

Serialize to dictionary.

Return type:

dict

classmethod from_dict(data)[source]

Deserialize from dictionary.

Parameters:

data (dict)

Return type:

CallGraph

to_dot(title='Call Graph')[source]

Export to Graphviz DOT format.

Parameters:

title (str)

Return type:

str

Abstract base class for call graph extraction.

This module defines the interface that both Python AST and LPython ASR extractors implement. The common interface allows swapping backends while maintaining consistent behavior.

exception curate_ipsum.graph.extractor.ExtractorError[source]

Bases: Exception

Base exception for extraction errors.

exception curate_ipsum.graph.extractor.ParseError[source]

Bases: ExtractorError

Error parsing source code.

exception curate_ipsum.graph.extractor.UnsupportedFeatureError[source]

Bases: ExtractorError

Source contains features not supported by this extractor.

class curate_ipsum.graph.extractor.CallGraphExtractor(include_lambdas=True, include_comprehensions=False, include_dynamic_calls=True, resolve_imports=True)[source]

Bases: ABC

Abstract base class for call graph extraction.

Implementations: - ASTExtractor: Uses Python’s built-in ast module (always available) - ASRExtractor: Uses LPython’s ASR (requires LPython installation)

The interface is designed to be semantic-representation-agnostic, allowing the same downstream code to work with either backend.

Parameters:
  • include_lambdas (bool)

  • include_comprehensions (bool)

  • include_dynamic_calls (bool)

  • resolve_imports (bool)

__init__(include_lambdas=True, include_comprehensions=False, include_dynamic_calls=True, resolve_imports=True)[source]

Initialize the extractor.

Parameters:
  • include_lambdas (bool) – Include lambda expressions as nodes

  • include_comprehensions (bool) – Include list/dict/set comprehensions

  • include_dynamic_calls (bool) – Try to resolve getattr/operator.attrgetter calls

  • resolve_imports (bool) – Follow imports to build cross-module graph

abstractmethod extract_file(file_path)[source]

Extract call graph from a single file.

Parameters:

file_path (Path) – Path to Python source file

Returns:

CallGraph containing nodes and edges from the file

Raises:
Return type:

CallGraph

abstractmethod extract_module(source, module_name='<module>')[source]

Extract call graph from source code string.

Parameters:
  • source (str) – Python source code

  • module_name (str) – Name to use for the module node

Returns:

CallGraph containing nodes and edges

Raises:

ParseError – If the source cannot be parsed

Return type:

CallGraph

extract_files(file_paths)[source]

Extract call graph from multiple files.

Parameters:

file_paths (list[Path]) – List of paths to Python source files

Returns:

Combined CallGraph from all files

Return type:

CallGraph

extract_directory(directory, pattern='**/*.py', exclude=None)[source]

Extract call graph from all Python files in a directory.

Parameters:
  • directory (Path) – Root directory to search

  • pattern (str) – Glob pattern for file matching

  • exclude (set[str] | None) – Set of directory/file names to exclude

Returns:

Combined CallGraph from all matching files

Return type:

CallGraph

abstract property backend_name: str

Name of the extraction backend (e.g., ‘ast’, ‘asr’).

abstract property backend_version: str

Version of the extraction backend.

curate_ipsum.graph.extractor.get_extractor(backend='auto', **kwargs)[source]

Factory function to get an appropriate extractor.

Parameters:
  • backend (str) – Backend to use (‘auto’, ‘ast’, ‘asr’)

  • **kwargs – Additional arguments passed to extractor constructor

Returns:

CallGraphExtractor instance

Raises:

ValueError – If requested backend is not available

Return type:

CallGraphExtractor

Python AST-based call graph extractor.

Uses Python’s built-in ast module to extract function definitions, calls, and their relationships. This is always available without external dependencies.

The extractor performs two passes: 1. Definition pass: Collect all function/class/method definitions 2. Call pass: Collect all calls and resolve targets

For unresolved calls (e.g., method calls on unknown objects), we use heuristics and mark edges with lower confidence.

class curate_ipsum.graph.ast_extractor.ScopeTracker(module_name)[source]

Bases: object

Tracks the current scope during AST traversal.

Maintains a stack of scope names to build fully qualified names.

Parameters:

module_name (str)

push(name, kind)[source]

Enter a new scope.

Parameters:
Return type:

None

pop()[source]

Exit current scope.

Return type:

tuple[str, NodeKind]

property current_fqn: str

Fully qualified name of current scope.

property current_kind: NodeKind

Kind of current scope.

fqn_for(name)[source]

Build FQN for a name in current scope.

Parameters:

name (str)

Return type:

str

property depth: int

Current nesting depth.

class curate_ipsum.graph.ast_extractor.DefinitionVisitor(file_path, module_name, include_lambdas=True, include_comprehensions=False)[source]

Bases: NodeVisitor

First pass: collect all definitions (functions, classes, methods).

Builds a symbol table mapping names to their nodes.

Parameters:
  • file_path (str)

  • module_name (str)

  • include_lambdas (bool)

  • include_comprehensions (bool)

visit_Module(node)[source]

Visit module node.

Parameters:

node (Module)

Return type:

None

visit_ClassDef(node)[source]

Visit class definition.

Parameters:

node (ClassDef)

Return type:

None

visit_FunctionDef(node)[source]

Visit function/method definition.

Parameters:

node (FunctionDef)

Return type:

None

visit_AsyncFunctionDef(node)[source]

Visit async function/method definition.

Parameters:

node (AsyncFunctionDef)

Return type:

None

visit_Lambda(node)[source]

Visit lambda expression.

Parameters:

node (Lambda)

Return type:

None

visit_ListComp(node)[source]
Parameters:

node (ListComp)

Return type:

None

visit_SetComp(node)[source]
Parameters:

node (SetComp)

Return type:

None

visit_DictComp(node)[source]
Parameters:

node (DictComp)

Return type:

None

visit_GeneratorExp(node)[source]
Parameters:

node (GeneratorExp)

Return type:

None

class curate_ipsum.graph.ast_extractor.CallVisitor(file_path, module_name, graph, symbol_table, include_dynamic_calls=True)[source]

Bases: NodeVisitor

Second pass: collect all function/method calls.

Uses the symbol table from the definition pass to resolve call targets.

Parameters:
visit_Module(node)[source]
Parameters:

node (Module)

Return type:

None

visit_ClassDef(node)[source]
Parameters:

node (ClassDef)

Return type:

None

visit_FunctionDef(node)[source]
Parameters:

node (FunctionDef)

Return type:

None

visit_AsyncFunctionDef(node)[source]
Parameters:

node (AsyncFunctionDef)

Return type:

None

visit_Call(node)[source]

Visit function call.

Parameters:

node (Call)

Return type:

None

class curate_ipsum.graph.ast_extractor.ASTExtractor(include_lambdas=True, include_comprehensions=False, include_dynamic_calls=True, resolve_imports=True)[source]

Bases: CallGraphExtractor

Call graph extractor using Python’s built-in AST module.

This is the default extractor that is always available without external dependencies.

Parameters:
  • include_lambdas (bool)

  • include_comprehensions (bool)

  • include_dynamic_calls (bool)

  • resolve_imports (bool)

extract_file(file_path)[source]

Extract call graph from a file.

Parameters:

file_path (Path)

Return type:

CallGraph

extract_module(source, module_name='<module>', file_path='<string>')[source]

Extract call graph from source string.

Parameters:
  • source (str)

  • module_name (str)

  • file_path (str)

Return type:

CallGraph

property backend_name: str

Name of the extraction backend (e.g., ‘ast’, ‘asr’).

property backend_version: str

Version of the extraction backend.

LPython ASR-based call graph extractor.

Uses LPython’s Abstract Semantic Representation (ASR) for extraction. ASR provides richer semantic information than Python’s AST, including: - Full type information (requires type annotations) - Resolved imports - Semantic rather than syntactic representation

This extractor requires LPython to be installed:

conda install -c conda-forge lpython

Or built from source:

https://github.com/lcompilers/lpython

exception curate_ipsum.graph.asr_extractor.LPythonNotFoundError[source]

Bases: ImportError

LPython is not installed or not in PATH.

class curate_ipsum.graph.asr_extractor.ASRExtractor(lpython_path=None, **kwargs)[source]

Bases: CallGraphExtractor

Call graph extractor using LPython’s ASR.

ASR (Abstract Semantic Representation) provides richer semantic information than Python’s AST, making call resolution more accurate.

Note: LPython requires type-annotated Python code. Code without type annotations may fail to parse or produce incomplete ASR.

Parameters:

lpython_path (str | None)

__init__(lpython_path=None, **kwargs)[source]

Initialize ASR extractor.

Parameters:
  • lpython_path (str | None) – Path to lpython executable (auto-detected if None)

  • **kwargs – Additional arguments passed to parent

extract_file(file_path)[source]

Extract call graph from a file using LPython ASR.

Parameters:

file_path (Path)

Return type:

CallGraph

extract_module(source, module_name='<module>', file_path='<string>')[source]

Extract call graph from source string using LPython ASR.

Parameters:
  • source (str)

  • module_name (str)

  • file_path (str)

Return type:

CallGraph

property backend_name: str

Name of the extraction backend (e.g., ‘ast’, ‘asr’).

property backend_version: str

Version of the extraction backend.

Module-level dependency graph extraction.

Extracts import relationships between Python modules in a project directory, producing a CallGraph with MODULE nodes and IMPORTS edges. This complements the function-level call graph from ast_extractor.py by providing a higher-level view of module dependencies.

class curate_ipsum.graph.dependency_extractor.ImportInfo(module, names, level, line, is_from_import)[source]

Bases: object

Information about a single import statement.

Parameters:
module
names
level
line
is_from_import
curate_ipsum.graph.dependency_extractor.extract_imports_from_source(source, file_path='<string>')[source]

Extract all import statements from Python source code.

Returns a list of ImportInfo objects, one per import statement.

Parameters:
  • source (str)

  • file_path (str)

Return type:

list[ImportInfo]

class curate_ipsum.graph.dependency_extractor.DependencyExtractor(include_stdlib=False, include_thirdparty=False)[source]

Bases: object

Extract module-level dependency graph from a Python project.

Produces a CallGraph where: - Each module is a GraphNode with NodeKind.MODULE - Each import is a GraphEdge with EdgeKind.IMPORTS - Only local (intra-project) imports create edges - stdlib and third-party imports are excluded

Parameters:
  • include_stdlib (bool)

  • include_thirdparty (bool)

__init__(include_stdlib=False, include_thirdparty=False)[source]
Parameters:
  • include_stdlib (bool) – If True, include stdlib modules as nodes/edges.

  • include_thirdparty (bool) – If True, include third-party modules as nodes/edges.

extract_directory(directory, pattern='**/*.py', exclude=None)[source]

Build a module-level dependency graph from all Python files in a directory.

Parameters:
  • directory (Path) – Root directory of the Python project.

  • pattern (str) – Glob pattern for matching Python files.

  • exclude (set[str] | None) – Directory/file names to skip.

Returns:

CallGraph with MODULE nodes and IMPORTS edges.

Return type:

CallGraph

extract_file(file_path)[source]

Extract imports from a single Python file.

Returns a list of ImportInfo objects.

Parameters:

file_path (Path)

Return type:

list[ImportInfo]

Planar subgraph identification and Kuratowski subgraph extraction.

For each subgraph, determines if it is planar. If not, extracts the Kuratowski subgraph (K₅ or K₃,₃ minor) and computes a maximal planar subgraph by iteratively removing edges that break planarity.

Uses networkx for O(n) Boyer-Myrvold planarity testing.

References

Boyer, J. & Myrvold, W. (2004). On the cutting edge:

Simplified O(n) planarity by edge addition.

Kuratowski, K. (1930). Sur le problème des courbes gauches en topologie. DECISIONS.md → D-006

Requires: networkx>=3.0

class curate_ipsum.graph.planarity.PlanarityResult(is_planar, planar_subgraph, non_planar_edges, kuratowski_edges, embedding)[source]

Bases: object

Result of planarity analysis on a subgraph.

Parameters:
is_planar: bool

Whether the graph is planar.

planar_subgraph: CallGraph

The maximal planar subgraph (equals the input graph if planar).

non_planar_edges: set[GraphEdge]

Edges removed to achieve planarity (empty if already planar).

kuratowski_edges: set[tuple[str, str]] | None

Edge set of the Kuratowski subgraph (K₅ or K₃,₃) if non-planar, else None.

embedding: dict | None

node_id → ordered list of neighbor IDs representing the clockwise order of edges around each vertex. None if graph has 0 or 1 nodes.

Type:

Planar embedding as a dict

curate_ipsum.graph.planarity.callgraph_to_networkx(graph, edge_kinds=None, as_undirected=False)[source]

Convert a CallGraph to a networkx graph.

Preserves node and edge metadata as attributes.

Parameters:
  • graph (CallGraph) – The CallGraph to convert.

  • edge_kinds (set[EdgeKind] | None) – Edge types to include. Default: all.

  • as_undirected (bool) – If True, return an undirected Graph.

Returns:

networkx DiGraph (or Graph if as_undirected=True).

Return type:

nx.DiGraph | nx.Graph

curate_ipsum.graph.planarity.networkx_to_callgraph(nx_graph, original=None)[source]

Convert a networkx graph back to a CallGraph.

If an original CallGraph is provided, uses it to restore full GraphNode/GraphEdge metadata for nodes/edges present in nx_graph.

Parameters:
  • nx_graph (nx.DiGraph | nx.Graph) – The networkx graph.

  • original (CallGraph | None) – Optional original CallGraph for metadata recovery.

Returns:

A new CallGraph.

Return type:

CallGraph

curate_ipsum.graph.planarity.check_planarity(graph, edge_kinds=None)[source]

Test if a CallGraph is planar and extract planarity-related structures.

Uses networkx’s Boyer-Myrvold O(n) planarity test. If the graph is not planar, identifies a Kuratowski subgraph and computes a maximal planar subgraph by removing edges from the Kuratowski certificate.

Planarity is tested on the underlying undirected graph (ignoring edge direction), since planarity is a property of the undirected structure.

Parameters:
  • graph (CallGraph) – The CallGraph to test.

  • edge_kinds (set[EdgeKind] | None) – Edge types to consider. Default: {CALLS}.

Returns:

PlanarityResult with planarity status, planar subgraph, removed edges, and Kuratowski subgraph if applicable.

Return type:

PlanarityResult

Kameda O(1) reachability index for planar directed acyclic graphs.

Implements a variant of Kameda’s 1975 algorithm: for a planar st-DAG (single source s, single sink t), we compute a 2D label for each node such that reachability can be answered in O(1) by comparing labels.

The key insight: in a planar st-digraph, each node can be assigned an interval [left, right] such that u reaches v if and only if u’s interval contains v’s interval (or equivalently, the left/right orderings are consistent).

Algorithm:
  1. Ensure graph is a DAG with single source and single sink (add virtual s/t if needed).

  2. Compute a topological ordering.

  3. Using the planar embedding, compute “left rank” and “right rank” for each node — these are topological orderings that respect the left-boundary and right-boundary of the planar embedding.

  4. Node u reaches node v iff: left_rank[u] <= left_rank[v] AND right_rank[u] <= right_rank[v]

For non-planar edges (identified by planarity.py), a separate fallback set is maintained and checked via BFS.

References

Kameda, T. (1975). On the vector representation of the reachability

in planar directed graphs. Inf. Proc. Letters 3(3), 75-77.

Tamassia, R. & Tollis, I. (1989). Planar grid embedding in linear time. DECISIONS.md → D-006

Requires: networkx>=3.0 (for planar embedding utilities)

class curate_ipsum.graph.kameda.KamedaIndex(left_rank, right_rank, source_id, sink_id, non_planar_reachability, all_node_ids)[source]

Bases: object

O(1) reachability index for a planar directed acyclic graph.

After O(n) construction, reachability queries take O(1) time by comparing 2D labels. For nodes not in the planar subgraph, or when edges were removed during planarization, a fallback set is checked.

Parameters:
left_rank: dict[str, int]

Left-boundary topological rank for each node.

right_rank: dict[str, int]

Right-boundary topological rank for each node.

source_id: str

The single source (or virtual source) of the st-graph.

sink_id: str

The single sink (or virtual sink) of the st-graph.

non_planar_reachability: dict[str, set[str]]

Fallback reachability for edges that were removed during planarization. Maps node_id → set of additional nodes reachable via non-planar edges. Precomputed during build.

all_node_ids: frozenset[str]

All node IDs in the indexed graph.

classmethod build(graph, embedding=None, non_planar_edges=None, source_id=None, sink_id=None, edge_kinds=None)[source]

Build the reachability index from a planar DAG.

Parameters:
  • graph (CallGraph) – A planar directed acyclic graph (the planar subgraph).

  • embedding (dict | None) – Planar embedding dict from planarity.check_planarity(). Maps node_id → clockwise-ordered list of neighbor IDs. If None, a simple topological-order heuristic is used.

  • non_planar_edges (set[GraphEdge] | None) – Edges removed during planarization. These are handled via a fallback BFS precomputation.

  • source_id (str | None) – ID of the source node. If None, auto-detected (node with no incoming edges).

  • sink_id (str | None) – ID of the sink node. If None, auto-detected (node with no outgoing edges).

  • edge_kinds (set[EdgeKind] | None) – Edge types to consider. Default: {CALLS}.

Returns:

KamedaIndex ready for O(1) queries.

Raises:

ValueError – If graph is not a DAG or has no valid source/sink.

Return type:

KamedaIndex

reaches(source, target)[source]

O(1) reachability query: can source reach target?

Uses the 2D dominance test on left/right ranks, falling back to the precomputed non-planar reachability set for edges that were removed during planarization.

Parameters:
  • source (str) – Source node ID.

  • target (str) – Target node ID.

Returns:

True if there exists a directed path from source to target.

Return type:

bool

all_reachable_from(source)[source]

Get all nodes reachable from source.

Uses the index for the planar portion and merges with non-planar fallback reachability.

Parameters:

source (str) – Source node ID.

Returns:

Set of all reachable node IDs (excluding source itself).

Return type:

set[str]

to_dict()[source]

Serialize the index for storage/transmission.

Return type:

dict