Graph Analysis¶
Call graph extraction, spectral partitioning, Kameda reachability, and hierarchical decomposition.
Graph extraction and spectral analysis module for curate-ipsum.
This module provides:
Call graph extraction (Phase 1 — complete): - AST backend (always available) - ASR backend (requires LPython) - Dependency extraction (import-level)
Graph-spectral analysis (Phase 2): - Laplacian construction and Fiedler vector computation - Recursive spectral partitioning - Virtual sink/source augmentation - Hierarchical SCC condensation - Planarity testing and Kuratowski extraction - Kameda O(1) reachability index
- Usage:
from curate_ipsum.graph import get_extractor, CallGraph
# Extract call graph extractor = get_extractor() graph = extractor.extract_directory(Path(“src/”))
# Spectral partitioning (requires scipy) from curate_ipsum.graph.spectral import compute_fiedler_components from curate_ipsum.graph.partitioner import GraphPartitioner
partitioner = GraphPartitioner(min_partition_size=3) tree = partitioner.partition(graph)
# O(1) reachability (requires networkx) from curate_ipsum.graph.planarity import check_planarity from curate_ipsum.graph.kameda import KamedaIndex
result = check_planarity(graph) index = KamedaIndex.build(result.planar_subgraph, result.embedding) print(index.reaches(“module.func_a”, “module.func_b”))
- class curate_ipsum.graph.CallGraph(nodes=<factory>, edges=<factory>, _outgoing=<factory>, _incoming=<factory>, _by_file=<factory>)[source]¶
Bases:
objectDirected graph representing function calls and definitions.
This is the core data structure for M2 graph-spectral analysis. Designed to be populated from either Python AST or LPython ASR.
- Parameters:
- strongly_connected_components()[source]¶
Find strongly connected components using Tarjan’s algorithm.
- topological_sort()[source]¶
Topological sort of nodes (only valid for DAG).
- Returns:
List of node IDs in topological order
- Raises:
ValueError – If graph contains cycles
- Return type:
- condensation()[source]¶
Create condensation graph (DAG of SCCs).
Each SCC becomes a single node in the condensation graph. Useful for hierarchical decomposition (M2).
- Returns:
New CallGraph where each node is an SCC
- Return type:
- class curate_ipsum.graph.GraphNode(id, kind, name, location=None, signature=None, docstring=None, metadata=<factory>)[source]¶
Bases:
objectNode in the call graph.
- Parameters:
id (str)
kind (NodeKind)
name (str)
location (SourceLocation | None)
signature (FunctionSignature | None)
docstring (str | None)
- location: SourceLocation | None = None¶
- signature: FunctionSignature | None = None¶
- class curate_ipsum.graph.GraphEdge(source_id, target_id, kind, location=None, is_conditional=False, is_dynamic=False, confidence=1.0)[source]¶
Bases:
objectEdge in the call graph.
- Parameters:
- location: SourceLocation | None = None¶
- class curate_ipsum.graph.NodeKind(*values)[source]¶
Bases:
StrEnumKind of node in the call graph.
- MODULE = 'module'¶
- CLASS = 'class'¶
- FUNCTION = 'function'¶
- METHOD = 'method'¶
- LAMBDA = 'lambda'¶
- COMPREHENSION = 'comprehension'¶
- class curate_ipsum.graph.EdgeKind(*values)[source]¶
Bases:
StrEnumKind of edge in the call graph.
- CALLS = 'calls'¶
- DEFINES = 'defines'¶
- INHERITS = 'inherits'¶
- IMPORTS = 'imports'¶
- REFERENCES = 'references'¶
- class curate_ipsum.graph.SourceLocation(file, line_start, line_end, col_start=0, col_end=0)[source]¶
Bases:
objectSource code location.
- class curate_ipsum.graph.FunctionSignature(name, params=(), return_type=None, decorators=(), is_async=False, is_generator=False)[source]¶
Bases:
objectFunction signature information.
- Parameters:
- class curate_ipsum.graph.CallGraphExtractor(include_lambdas=True, include_comprehensions=False, include_dynamic_calls=True, resolve_imports=True)[source]¶
Bases:
ABCAbstract base class for call graph extraction.
Implementations: - ASTExtractor: Uses Python’s built-in ast module (always available) - ASRExtractor: Uses LPython’s ASR (requires LPython installation)
The interface is designed to be semantic-representation-agnostic, allowing the same downstream code to work with either backend.
- Parameters:
- __init__(include_lambdas=True, include_comprehensions=False, include_dynamic_calls=True, resolve_imports=True)[source]¶
Initialize the extractor.
- abstractmethod extract_file(file_path)[source]¶
Extract call graph from a single file.
- Parameters:
file_path (Path) – Path to Python source file
- Returns:
CallGraph containing nodes and edges from the file
- Raises:
ParseError – If the file cannot be parsed
FileNotFoundError – If the file doesn’t exist
- Return type:
- abstractmethod extract_module(source, module_name='<module>')[source]¶
Extract call graph from source code string.
- Parameters:
- Returns:
CallGraph containing nodes and edges
- Raises:
ParseError – If the source cannot be parsed
- Return type:
- class curate_ipsum.graph.ASTExtractor(include_lambdas=True, include_comprehensions=False, include_dynamic_calls=True, resolve_imports=True)[source]¶
Bases:
CallGraphExtractorCall graph extractor using Python’s built-in AST module.
This is the default extractor that is always available without external dependencies.
- Parameters:
- class curate_ipsum.graph.ASRExtractor(lpython_path=None, **kwargs)[source]¶
Bases:
CallGraphExtractorCall graph extractor using LPython’s ASR.
ASR (Abstract Semantic Representation) provides richer semantic information than Python’s AST, making call resolution more accurate.
Note: LPython requires type-annotated Python code. Code without type annotations may fail to parse or produce incomplete ASR.
- Parameters:
lpython_path (str | None)
- __init__(lpython_path=None, **kwargs)[source]¶
Initialize ASR extractor.
- Parameters:
lpython_path (str | None) – Path to lpython executable (auto-detected if None)
**kwargs – Additional arguments passed to parent
- class curate_ipsum.graph.DependencyExtractor(include_stdlib=False, include_thirdparty=False)[source]¶
Bases:
objectExtract module-level dependency graph from a Python project.
Produces a CallGraph where: - Each module is a GraphNode with NodeKind.MODULE - Each import is a GraphEdge with EdgeKind.IMPORTS - Only local (intra-project) imports create edges - stdlib and third-party imports are excluded
- extract_directory(directory, pattern='**/*.py', exclude=None)[source]¶
Build a module-level dependency graph from all Python files in a directory.
- curate_ipsum.graph.get_extractor(backend='auto', **kwargs)[source]¶
Factory function to get an appropriate extractor.
- Parameters:
backend (str) – Backend to use (‘auto’, ‘ast’, ‘asr’)
**kwargs – Additional arguments passed to extractor constructor
- Returns:
CallGraphExtractor instance
- Raises:
ValueError – If requested backend is not available
- Return type:
- exception curate_ipsum.graph.ExtractorError[source]¶
Bases:
ExceptionBase exception for extraction errors.
- exception curate_ipsum.graph.ParseError[source]¶
Bases:
ExtractorErrorError parsing source code.
- exception curate_ipsum.graph.UnsupportedFeatureError[source]¶
Bases:
ExtractorErrorSource contains features not supported by this extractor.
- exception curate_ipsum.graph.LPythonNotFoundError[source]¶
Bases:
ImportErrorLPython is not installed or not in PATH.
- class curate_ipsum.graph.PlanarityResult(is_planar, planar_subgraph, non_planar_edges, kuratowski_edges, embedding)[source]¶
Bases:
objectResult of planarity analysis on a subgraph.
- Parameters:
- curate_ipsum.graph.check_planarity(graph, edge_kinds=None)[source]¶
Test if a CallGraph is planar and extract planarity-related structures.
Uses networkx’s Boyer-Myrvold O(n) planarity test. If the graph is not planar, identifies a Kuratowski subgraph and computes a maximal planar subgraph by removing edges from the Kuratowski certificate.
Planarity is tested on the underlying undirected graph (ignoring edge direction), since planarity is a property of the undirected structure.
- Parameters:
- Returns:
PlanarityResult with planarity status, planar subgraph, removed edges, and Kuratowski subgraph if applicable.
- Return type:
- curate_ipsum.graph.callgraph_to_networkx(graph, edge_kinds=None, as_undirected=False)[source]¶
Convert a CallGraph to a networkx graph.
Preserves node and edge metadata as attributes.
- curate_ipsum.graph.networkx_to_callgraph(nx_graph, original=None)[source]¶
Convert a networkx graph back to a CallGraph.
If an original CallGraph is provided, uses it to restore full GraphNode/GraphEdge metadata for nodes/edges present in nx_graph.
- class curate_ipsum.graph.KamedaIndex(left_rank, right_rank, source_id, sink_id, non_planar_reachability, all_node_ids)[source]¶
Bases:
objectO(1) reachability index for a planar directed acyclic graph.
After O(n) construction, reachability queries take O(1) time by comparing 2D labels. For nodes not in the planar subgraph, or when edges were removed during planarization, a fallback set is checked.
- Parameters:
- non_planar_reachability: dict[str, set[str]]¶
Fallback reachability for edges that were removed during planarization. Maps node_id → set of additional nodes reachable via non-planar edges. Precomputed during build.
- classmethod build(graph, embedding=None, non_planar_edges=None, source_id=None, sink_id=None, edge_kinds=None)[source]¶
Build the reachability index from a planar DAG.
- Parameters:
graph (CallGraph) – A planar directed acyclic graph (the planar subgraph).
embedding (dict | None) – Planar embedding dict from planarity.check_planarity(). Maps node_id → clockwise-ordered list of neighbor IDs. If None, a simple topological-order heuristic is used.
non_planar_edges (set[GraphEdge] | None) – Edges removed during planarization. These are handled via a fallback BFS precomputation.
source_id (str | None) – ID of the source node. If None, auto-detected (node with no incoming edges).
sink_id (str | None) – ID of the sink node. If None, auto-detected (node with no outgoing edges).
edge_kinds (set[EdgeKind] | None) – Edge types to consider. Default: {CALLS}.
- Returns:
KamedaIndex ready for O(1) queries.
- Raises:
ValueError – If graph is not a DAG or has no valid source/sink.
- Return type:
- reaches(source, target)[source]¶
O(1) reachability query: can source reach target?
Uses the 2D dominance test on left/right ranks, falling back to the precomputed non-planar reachability set for edges that were removed during planarization.
Graph models for call graph representation.
These models are backend-agnostic and can be populated from either Python’s AST or LPython’s ASR.
- class curate_ipsum.graph.models.NodeKind(*values)[source]¶
Bases:
StrEnumKind of node in the call graph.
- MODULE = 'module'¶
- CLASS = 'class'¶
- FUNCTION = 'function'¶
- METHOD = 'method'¶
- LAMBDA = 'lambda'¶
- COMPREHENSION = 'comprehension'¶
- class curate_ipsum.graph.models.EdgeKind(*values)[source]¶
Bases:
StrEnumKind of edge in the call graph.
- CALLS = 'calls'¶
- DEFINES = 'defines'¶
- INHERITS = 'inherits'¶
- IMPORTS = 'imports'¶
- REFERENCES = 'references'¶
- class curate_ipsum.graph.models.SourceLocation(file, line_start, line_end, col_start=0, col_end=0)[source]¶
Bases:
objectSource code location.
- class curate_ipsum.graph.models.FunctionSignature(name, params=(), return_type=None, decorators=(), is_async=False, is_generator=False)[source]¶
Bases:
objectFunction signature information.
- Parameters:
- class curate_ipsum.graph.models.GraphNode(id, kind, name, location=None, signature=None, docstring=None, metadata=<factory>)[source]¶
Bases:
objectNode in the call graph.
- Parameters:
id (str)
kind (NodeKind)
name (str)
location (SourceLocation | None)
signature (FunctionSignature | None)
docstring (str | None)
- location: SourceLocation | None = None¶
- signature: FunctionSignature | None = None¶
- class curate_ipsum.graph.models.GraphEdge(source_id, target_id, kind, location=None, is_conditional=False, is_dynamic=False, confidence=1.0)[source]¶
Bases:
objectEdge in the call graph.
- Parameters:
- location: SourceLocation | None = None¶
- class curate_ipsum.graph.models.CallGraph(nodes=<factory>, edges=<factory>, _outgoing=<factory>, _incoming=<factory>, _by_file=<factory>)[source]¶
Bases:
objectDirected graph representing function calls and definitions.
This is the core data structure for M2 graph-spectral analysis. Designed to be populated from either Python AST or LPython ASR.
- Parameters:
- strongly_connected_components()[source]¶
Find strongly connected components using Tarjan’s algorithm.
- topological_sort()[source]¶
Topological sort of nodes (only valid for DAG).
- Returns:
List of node IDs in topological order
- Raises:
ValueError – If graph contains cycles
- Return type:
- condensation()[source]¶
Create condensation graph (DAG of SCCs).
Each SCC becomes a single node in the condensation graph. Useful for hierarchical decomposition (M2).
- Returns:
New CallGraph where each node is an SCC
- Return type:
Abstract base class for call graph extraction.
This module defines the interface that both Python AST and LPython ASR extractors implement. The common interface allows swapping backends while maintaining consistent behavior.
- exception curate_ipsum.graph.extractor.ExtractorError[source]¶
Bases:
ExceptionBase exception for extraction errors.
- exception curate_ipsum.graph.extractor.ParseError[source]¶
Bases:
ExtractorErrorError parsing source code.
- exception curate_ipsum.graph.extractor.UnsupportedFeatureError[source]¶
Bases:
ExtractorErrorSource contains features not supported by this extractor.
- class curate_ipsum.graph.extractor.CallGraphExtractor(include_lambdas=True, include_comprehensions=False, include_dynamic_calls=True, resolve_imports=True)[source]¶
Bases:
ABCAbstract base class for call graph extraction.
Implementations: - ASTExtractor: Uses Python’s built-in ast module (always available) - ASRExtractor: Uses LPython’s ASR (requires LPython installation)
The interface is designed to be semantic-representation-agnostic, allowing the same downstream code to work with either backend.
- Parameters:
- __init__(include_lambdas=True, include_comprehensions=False, include_dynamic_calls=True, resolve_imports=True)[source]¶
Initialize the extractor.
- abstractmethod extract_file(file_path)[source]¶
Extract call graph from a single file.
- Parameters:
file_path (Path) – Path to Python source file
- Returns:
CallGraph containing nodes and edges from the file
- Raises:
ParseError – If the file cannot be parsed
FileNotFoundError – If the file doesn’t exist
- Return type:
- abstractmethod extract_module(source, module_name='<module>')[source]¶
Extract call graph from source code string.
- Parameters:
- Returns:
CallGraph containing nodes and edges
- Raises:
ParseError – If the source cannot be parsed
- Return type:
- curate_ipsum.graph.extractor.get_extractor(backend='auto', **kwargs)[source]¶
Factory function to get an appropriate extractor.
- Parameters:
backend (str) – Backend to use (‘auto’, ‘ast’, ‘asr’)
**kwargs – Additional arguments passed to extractor constructor
- Returns:
CallGraphExtractor instance
- Raises:
ValueError – If requested backend is not available
- Return type:
Python AST-based call graph extractor.
Uses Python’s built-in ast module to extract function definitions, calls, and their relationships. This is always available without external dependencies.
The extractor performs two passes: 1. Definition pass: Collect all function/class/method definitions 2. Call pass: Collect all calls and resolve targets
For unresolved calls (e.g., method calls on unknown objects), we use heuristics and mark edges with lower confidence.
- class curate_ipsum.graph.ast_extractor.ScopeTracker(module_name)[source]¶
Bases:
objectTracks the current scope during AST traversal.
Maintains a stack of scope names to build fully qualified names.
- Parameters:
module_name (str)
- class curate_ipsum.graph.ast_extractor.DefinitionVisitor(file_path, module_name, include_lambdas=True, include_comprehensions=False)[source]¶
Bases:
NodeVisitorFirst pass: collect all definitions (functions, classes, methods).
Builds a symbol table mapping names to their nodes.
- visit_FunctionDef(node)[source]¶
Visit function/method definition.
- Parameters:
node (FunctionDef)
- Return type:
None
- visit_AsyncFunctionDef(node)[source]¶
Visit async function/method definition.
- Parameters:
node (AsyncFunctionDef)
- Return type:
None
- visit_GeneratorExp(node)[source]¶
- Parameters:
node (GeneratorExp)
- Return type:
None
- class curate_ipsum.graph.ast_extractor.CallVisitor(file_path, module_name, graph, symbol_table, include_dynamic_calls=True)[source]¶
Bases:
NodeVisitorSecond pass: collect all function/method calls.
Uses the symbol table from the definition pass to resolve call targets.
- Parameters:
- visit_FunctionDef(node)[source]¶
- Parameters:
node (FunctionDef)
- Return type:
None
- visit_AsyncFunctionDef(node)[source]¶
- Parameters:
node (AsyncFunctionDef)
- Return type:
None
- class curate_ipsum.graph.ast_extractor.ASTExtractor(include_lambdas=True, include_comprehensions=False, include_dynamic_calls=True, resolve_imports=True)[source]¶
Bases:
CallGraphExtractorCall graph extractor using Python’s built-in AST module.
This is the default extractor that is always available without external dependencies.
- Parameters:
LPython ASR-based call graph extractor.
Uses LPython’s Abstract Semantic Representation (ASR) for extraction. ASR provides richer semantic information than Python’s AST, including: - Full type information (requires type annotations) - Resolved imports - Semantic rather than syntactic representation
- This extractor requires LPython to be installed:
conda install -c conda-forge lpython
- Or built from source:
- exception curate_ipsum.graph.asr_extractor.LPythonNotFoundError[source]¶
Bases:
ImportErrorLPython is not installed or not in PATH.
- class curate_ipsum.graph.asr_extractor.ASRExtractor(lpython_path=None, **kwargs)[source]¶
Bases:
CallGraphExtractorCall graph extractor using LPython’s ASR.
ASR (Abstract Semantic Representation) provides richer semantic information than Python’s AST, making call resolution more accurate.
Note: LPython requires type-annotated Python code. Code without type annotations may fail to parse or produce incomplete ASR.
- Parameters:
lpython_path (str | None)
- __init__(lpython_path=None, **kwargs)[source]¶
Initialize ASR extractor.
- Parameters:
lpython_path (str | None) – Path to lpython executable (auto-detected if None)
**kwargs – Additional arguments passed to parent
Module-level dependency graph extraction.
Extracts import relationships between Python modules in a project directory, producing a CallGraph with MODULE nodes and IMPORTS edges. This complements the function-level call graph from ast_extractor.py by providing a higher-level view of module dependencies.
- class curate_ipsum.graph.dependency_extractor.ImportInfo(module, names, level, line, is_from_import)[source]¶
Bases:
objectInformation about a single import statement.
- Parameters:
- module¶
- names¶
- level¶
- line¶
- is_from_import¶
- curate_ipsum.graph.dependency_extractor.extract_imports_from_source(source, file_path='<string>')[source]¶
Extract all import statements from Python source code.
Returns a list of ImportInfo objects, one per import statement.
- Parameters:
- Return type:
- class curate_ipsum.graph.dependency_extractor.DependencyExtractor(include_stdlib=False, include_thirdparty=False)[source]¶
Bases:
objectExtract module-level dependency graph from a Python project.
Produces a CallGraph where: - Each module is a GraphNode with NodeKind.MODULE - Each import is a GraphEdge with EdgeKind.IMPORTS - Only local (intra-project) imports create edges - stdlib and third-party imports are excluded
- extract_directory(directory, pattern='**/*.py', exclude=None)[source]¶
Build a module-level dependency graph from all Python files in a directory.
Planar subgraph identification and Kuratowski subgraph extraction.
For each subgraph, determines if it is planar. If not, extracts the Kuratowski subgraph (K₅ or K₃,₃ minor) and computes a maximal planar subgraph by iteratively removing edges that break planarity.
Uses networkx for O(n) Boyer-Myrvold planarity testing.
References
- Boyer, J. & Myrvold, W. (2004). On the cutting edge:
Simplified O(n) planarity by edge addition.
Kuratowski, K. (1930). Sur le problème des courbes gauches en topologie. DECISIONS.md → D-006
Requires: networkx>=3.0
- class curate_ipsum.graph.planarity.PlanarityResult(is_planar, planar_subgraph, non_planar_edges, kuratowski_edges, embedding)[source]¶
Bases:
objectResult of planarity analysis on a subgraph.
- Parameters:
- curate_ipsum.graph.planarity.callgraph_to_networkx(graph, edge_kinds=None, as_undirected=False)[source]¶
Convert a CallGraph to a networkx graph.
Preserves node and edge metadata as attributes.
- curate_ipsum.graph.planarity.networkx_to_callgraph(nx_graph, original=None)[source]¶
Convert a networkx graph back to a CallGraph.
If an original CallGraph is provided, uses it to restore full GraphNode/GraphEdge metadata for nodes/edges present in nx_graph.
- curate_ipsum.graph.planarity.check_planarity(graph, edge_kinds=None)[source]¶
Test if a CallGraph is planar and extract planarity-related structures.
Uses networkx’s Boyer-Myrvold O(n) planarity test. If the graph is not planar, identifies a Kuratowski subgraph and computes a maximal planar subgraph by removing edges from the Kuratowski certificate.
Planarity is tested on the underlying undirected graph (ignoring edge direction), since planarity is a property of the undirected structure.
- Parameters:
- Returns:
PlanarityResult with planarity status, planar subgraph, removed edges, and Kuratowski subgraph if applicable.
- Return type:
Kameda O(1) reachability index for planar directed acyclic graphs.
Implements a variant of Kameda’s 1975 algorithm: for a planar st-DAG (single source s, single sink t), we compute a 2D label for each node such that reachability can be answered in O(1) by comparing labels.
The key insight: in a planar st-digraph, each node can be assigned an interval [left, right] such that u reaches v if and only if u’s interval contains v’s interval (or equivalently, the left/right orderings are consistent).
- Algorithm:
Ensure graph is a DAG with single source and single sink (add virtual s/t if needed).
Compute a topological ordering.
Using the planar embedding, compute “left rank” and “right rank” for each node — these are topological orderings that respect the left-boundary and right-boundary of the planar embedding.
Node u reaches node v iff: left_rank[u] <= left_rank[v] AND right_rank[u] <= right_rank[v]
For non-planar edges (identified by planarity.py), a separate fallback set is maintained and checked via BFS.
References
- Kameda, T. (1975). On the vector representation of the reachability
in planar directed graphs. Inf. Proc. Letters 3(3), 75-77.
Tamassia, R. & Tollis, I. (1989). Planar grid embedding in linear time. DECISIONS.md → D-006
Requires: networkx>=3.0 (for planar embedding utilities)
- class curate_ipsum.graph.kameda.KamedaIndex(left_rank, right_rank, source_id, sink_id, non_planar_reachability, all_node_ids)[source]¶
Bases:
objectO(1) reachability index for a planar directed acyclic graph.
After O(n) construction, reachability queries take O(1) time by comparing 2D labels. For nodes not in the planar subgraph, or when edges were removed during planarization, a fallback set is checked.
- Parameters:
- non_planar_reachability: dict[str, set[str]]¶
Fallback reachability for edges that were removed during planarization. Maps node_id → set of additional nodes reachable via non-planar edges. Precomputed during build.
- classmethod build(graph, embedding=None, non_planar_edges=None, source_id=None, sink_id=None, edge_kinds=None)[source]¶
Build the reachability index from a planar DAG.
- Parameters:
graph (CallGraph) – A planar directed acyclic graph (the planar subgraph).
embedding (dict | None) – Planar embedding dict from planarity.check_planarity(). Maps node_id → clockwise-ordered list of neighbor IDs. If None, a simple topological-order heuristic is used.
non_planar_edges (set[GraphEdge] | None) – Edges removed during planarization. These are handled via a fallback BFS precomputation.
source_id (str | None) – ID of the source node. If None, auto-detected (node with no incoming edges).
sink_id (str | None) – ID of the sink node. If None, auto-detected (node with no outgoing edges).
edge_kinds (set[EdgeKind] | None) – Edge types to consider. Default: {CALLS}.
- Returns:
KamedaIndex ready for O(1) queries.
- Raises:
ValueError – If graph is not a DAG or has no valid source/sink.
- Return type:
- reaches(source, target)[source]¶
O(1) reachability query: can source reach target?
Uses the 2D dominance test on left/right ranks, falling back to the precomputed non-planar reachability set for edges that were removed during planarization.