Question 1

What is index-free adjacency and why does it matter for graph traversal?

Accepted Answer

Index-free adjacency means each node record stores direct pointers (physical memory or disk addresses) to its adjacent relationship records, rather than neighbor IDs that must be resolved through a global index. Reaching a node's relationship list is therefore O(1), and enumerating its neighbors costs time proportional to the node's degree, regardless of the total graph size. In a relational database, finding all friends of a user requires a join on the edges table, which costs O(log N) per lookup via an index or O(N) via a scan, where N is the number of rows in the edges table. In a native graph database, the traversal follows direct pointers, so the cost of a multi-hop traversal (friends of friends) depends on the size of the neighborhood visited rather than on the size of the whole graph.
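The contrast above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: in-memory object references stand in for physical record pointers, a list of (src, dst) tuples stands in for an unindexed edges table, and the names Node, connect, friends_of_friends, and friends_via_edge_scan are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, like distinct records
class Node:
    """A node record holding direct references to its neighbor records."""
    node_id: int
    neighbors: list = field(default_factory=list)


def connect(a: Node, b: Node) -> None:
    """Create an undirected relationship by storing references on both ends."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def friends_of_friends(start: Node) -> set:
    """Two-hop traversal that touches only records reachable from `start`."""
    direct = {n.node_id for n in start.neighbors}
    result = set()
    for friend in start.neighbors:        # hop 1: follow stored references
        for fof in friend.neighbors:      # hop 2: follow stored references
            if fof.node_id != start.node_id and fof.node_id not in direct:
                result.add(fof.node_id)
    return result


def friends_via_edge_scan(edges, node_id):
    """Relational-style lookup without an index: scans every edge row."""
    return {dst for src, dst in edges if src == node_id}


# Small example graph: 1-2, 2-3, 1-4, 4-3, 3-5
nodes = {i: Node(i) for i in range(1, 6)}
for a, b in [(1, 2), (2, 3), (1, 4), (4, 3), (3, 5)]:
    connect(nodes[a], nodes[b])

fof = friends_of_friends(nodes[1])
```

The pointer-following version does work proportional to the degrees of the nodes it visits, while the scan version does work proportional to the number of edge rows, no matter how small the neighborhood is.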

Question 2

When should you use traversal versus index lookup to start a graph query?

Accepted Answer

Index lookup (label index or property index) is used to find the entry-point node, that is, the starting node of a traversal. For example, MATCH (u:User {id: 42}) uses a property index on User.id to find the matching node in O(log N). Once the entry node is found, all subsequent hops use index-free adjacency via direct pointers, not the index. Starting a traversal without an entry-point index forces a scan of the whole graph (or of every node carrying the label) and should be avoided in production. Ensure that the property used to select the entry node, whether it appears in an inline property map such as {id: 42} or in a WHERE clause, is indexed.
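The two phases described above, an index seek for the entry node followed by index-free expansion, might look roughly like this. The sketch assumes a sorted-key index searched with binary search as a stand-in for a tree-based property index, and a plain adjacency dictionary as a stand-in for pointer-linked records; PropertyIndex, seek, and expand are hypothetical names, not a real engine's API.

```python
import bisect
from collections import deque


class PropertyIndex:
    """Sorted-key index: a seek is a binary search, O(log N)."""

    def __init__(self, pairs):
        # pairs: iterable of (property_value, node_id)
        ordered = sorted(pairs)
        self._keys = [key for key, _ in ordered]
        self._node_ids = [node_id for _, node_id in ordered]

    def seek(self, value):
        """Return the node id stored under `value`, or None if absent."""
        i = bisect.bisect_left(self._keys, value)
        if i < len(self._keys) and self._keys[i] == value:
            return self._node_ids[i]
        return None


def expand(adjacency, start, max_hops):
    """Breadth-first expansion from `start`; the index is never consulted."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen


index = PropertyIndex([(42, "n42"), (7, "n7"), (99, "n99")])
adjacency = {"n42": ["n7"], "n7": ["n99"], "n99": []}

entry = index.seek(42)                        # phase 1: one index seek
reachable = expand(adjacency, entry, max_hops=2)  # phase 2: pointer hops only
```

Only the first step pays the index cost; every later hop depends on the local fan-out of the nodes visited.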

Question 3

How does Cypher query compilation work in a graph database engine?

Accepted Answer

Cypher compilation follows a pipeline. Lexing and parsing produce an AST, which is converted to a logical plan made of relational-algebra-like operators (NodeByLabelScan, Expand, Filter, Project). The logical plan is then optimized: filter predicates are pushed as close to the data source as possible to prune traversal early. The physical plan selects execution strategies: NodeIndexSeek for indexed lookups, Expand(All) for neighbor traversal, and optionally parallel execution for independent branches. The compiled plan is cached by query signature so repeated queries avoid re-compilation.
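The caching step at the end of the pipeline can be illustrated with a small sketch. Everything here is an assumption made for illustration: the signature is a hash of whitespace-normalized query text (naive, since it would also alter whitespace inside string literals), toy_compile is a placeholder that returns a fixed operator list instead of really parsing and planning, and PlanCache is a hypothetical name rather than any engine's actual class.

```python
import hashlib


class PlanCache:
    """Caches compiled plans keyed by a signature of the query text."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn
        self._plans = {}
        self.compilations = 0  # how many times the compiler actually ran

    @staticmethod
    def signature(query_text):
        # Collapse whitespace so trivially reformatted queries share a plan.
        normalized = " ".join(query_text.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_plan(self, query_text):
        key = self.signature(query_text)
        if key not in self._plans:
            self._plans[key] = self._compile_fn(query_text)
            self.compilations += 1
        return self._plans[key]


def toy_compile(query_text):
    """Placeholder for parse -> logical plan -> physical plan."""
    return ["NodeIndexSeek", "Expand(All)", "Filter", "Project"]


cache = PlanCache(toy_compile)
plan_a = cache.get_plan("MATCH (u:User {id: $id})-[:FRIEND]->(f) RETURN f")
plan_b = cache.get_plan("MATCH  (u:User {id: $id})-[:FRIEND]->(f)\n  RETURN f")
```

Passing values as parameters (here $id) rather than as literals keeps the query text, and therefore the signature, identical across calls, which is what lets the cached plan be reused.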

Question 4

What is the time complexity of PageRank and community detection on a graph database?

Accepted Answer

PageRank runs iteratively: each iteration is O(V + E) where V is the number of nodes and E the number of edges. Convergence typically requires 10-50 iterations, so overall complexity is O(k*(V+E)) for k iterations. The Louvain community detection algorithm is approximately O(V log V) in practice but varies by graph density and modularity landscape. Dijkstra shortest path is O((V + E) log V) with a binary heap. For very large graphs, these algorithms are typically run offline in batch (e.g., via graph processing frameworks) and the results stored back as node properties for real-time lookup.
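For the Dijkstra bound mentioned above, a minimal binary-heap version is sketched below using Python's heapq module. It assumes non-negative edge weights and an adjacency mapping of node to (neighbor, weight) pairs; the function name dijkstra and the sample graph are illustrative only. Instead of a decrease-key operation it pushes duplicate heap entries and skips stale ones, which keeps the same O((V + E) log V) order of growth.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from `source`; edge weights must be non-negative."""
    dist = {source: 0.0}
    heap = [(0.0, source)]  # binary heap of (distance, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already settled
        for neighbor, weight in graph.get(node, ()):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 3)],
    "d": [],
}
distances = dijkstra(graph, "a")
```

Each edge can push at most one heap entry, and each push or pop costs a logarithmic number of comparisons, which is where the log factor in the bound comes from.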

Question 5

What is index-free adjacency and why does it improve traversal performance?

Accepted Answer

Each node stores direct memory or disk pointers to its adjacent relationship records, enabling O(1) neighbor lookup regardless of graph size; relational databases require index lookups that scale with the total number of edges, not the local neighborhood.

Question 6

How does Cypher query compilation work?

Accepted Answer

The parser produces an AST; the logical planner maps MATCH patterns to logical operators (NodeScan, Expand, Filter); the physical planner selects execution strategies (index seek vs scan, join order) and produces an execution plan; plans are cached by query fingerprint.

Question 7

How is PageRank computed in a graph database?

Accepted Answer

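PageRank iterates: each node's rank = (1-d)/N + d * sum(rank(in_neighbor)/out_degree(in_neighbor)), where d is the damping factor (commonly 0.85), N is the number of nodes, and the sum runs over the node's in-neighbors; iterations continue until the change in rank values between iterations falls below a tolerance threshold; convergence typically requires 20-50 iterations for web-scale graphs.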
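The update rule above translates fairly directly into a power-iteration loop. The following is a sketch, not a production implementation: it keeps the whole graph in a dictionary of out-links, uses the L1 change in ranks as the convergence test, and spreads the rank of nodes without out-links uniformly over all nodes. That last choice is an assumption added here, since the formula as stated does not say how such nodes are handled.

```python
def pagerank(out_links, d=0.85, tol=1e-6, max_iter=100):
    """Return (ranks, iterations_run) for a dict of node -> list of targets."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}, 0

    rank = {v: 1.0 / n for v in nodes}
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assumption: nodes with no out-links share their rank with everyone.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for v, targets in out_links.items():
            if targets:
                share = d * rank[v] / len(targets)  # rank / out_degree
                for t in targets:
                    new_rank[t] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break  # change fell below the tolerance threshold
    return rank, iteration


cycle_ranks, cycle_iters = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
ranks, _ = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
```

Each pass visits every node and every edge once, which matches the per-iteration cost of O(V + E).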

Question 8

How are large graphs partitioned across multiple machines?

Accepted Answer

In a common scheme, edges are co-located with their source node using hash partitioning on the source node ID; cross-partition edges (those whose target node lives on a different partition) are tracked in a routing table; multi-hop traversals that cross partitions use scatter-gather coordination.
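A small sketch of the placement step may help. It assumes CRC32 as the hash function (chosen only because it is stable across Python processes, unlike the built-in hash() for strings) and a routing table keyed by edge; partition_of and place_edges are hypothetical names, and the scatter-gather coordination itself is not shown.

```python
import zlib


def partition_of(node_id, num_partitions):
    """Map a node ID to a partition with a process-stable hash."""
    return zlib.crc32(str(node_id).encode("utf-8")) % num_partitions


def place_edges(edges, num_partitions):
    """Store each edge with its source node; record cross-partition edges."""
    partitions = {p: [] for p in range(num_partitions)}
    routing = {}  # (src, dst) -> partition that owns dst
    for src, dst in edges:
        home = partition_of(src, num_partitions)
        partitions[home].append((src, dst))   # co-locate edge with source
        dst_home = partition_of(dst, num_partitions)
        if dst_home != home:
            routing[(src, dst)] = dst_home    # next hop needs a remote call
    return partitions, routing


edges = [(1, 2), (2, 3), (3, 1), (1, 3)]
partitions, routing = place_edges(edges, num_partitions=3)
```

With this layout the outgoing edges of a node can be read from a single partition, and only the edges listed in the routing table force a hop to another machine.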

Graph Database Low-Level Design: Native Graph Storage, Traversal Engine, and Cypher Query Execution

Native Graph Storage Model

Relationship Record Structure

Traversal Engine: DFS and BFS with Visitor Pattern

Cypher Query Execution

Label and Property Indexes

Graph Algorithms

SQL DDL: Relational Analog

Python: Core Operations

Design Considerations Summary