Question 1

What is a knowledge graph and how does it differ from a relational database?

Accepted Answer

A knowledge graph stores entities (nodes) and typed relationships (edges) as first-class citizens. In a relational database, relationships are expressed as foreign keys and resolved with JOIN operations; a native graph store instead keeps an adjacency list per node, so each hop costs O(degree) of the node being expanded, regardless of total graph size. Knowledge graphs excel at multi-hop path queries, flexible schemas (new entity types or relationship types can be added without a migration), and semantic queries. Use a relational DB for structured, tabular data with complex aggregations; use a knowledge graph for interconnected entity networks.
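To make the adjacency-list idea concrete, here is a minimal in-memory sketch. The class name KnowledgeGraph, its methods, and the sample entities are hypothetical and chosen only for illustration; a real graph database persists and indexes this structure very differently.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory labeled property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}  # node_id -> properties dict
        # node_id -> relationship type -> list of target node ids
        self.out_edges = defaultdict(lambda: defaultdict(list))

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, rel_type, target):
        self.out_edges[source][rel_type].append(target)

    def neighbors(self, node_id, rel_type=None):
        """Read only this node's adjacency list: cost is O(degree)."""
        rels = self.out_edges.get(node_id, {})
        if rel_type is not None:
            return list(rels.get(rel_type, []))
        return [t for targets in rels.values() for t in targets]


kg = KnowledgeGraph()
kg.add_node("alice", type="Person")
kg.add_node("bob", type="Person")
kg.add_node("acme", type="Company")
kg.add_edge("alice", "WORKS_AT", "acme")
kg.add_edge("alice", "KNOWS", "bob")

print(kg.neighbors("alice", "WORKS_AT"))  # ['acme']
print(kg.neighbors("alice"))              # ['acme', 'bob']
```

Looking up a node's neighbors touches only that node's own lists, which is why hop cost here does not grow with the number of nodes in the graph.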

Question 2

What storage backend should you use for a knowledge graph?

Accepted Answer

The choice depends on your primary query patterns. For deep traversal (3+ hops) and pattern matching: use a graph database such as Neo4j (a native graph store built on index-free adjacency) or Amazon Neptune. For simple 1-2 hop queries on existing Postgres infrastructure: use relational tables with recursive CTEs. For semantic/text search on entity properties: add Elasticsearch. For vector similarity (semantic entity matching): add pgvector or Pinecone. Many production knowledge graphs use a hybrid: relational for entity storage, graph DB for traversal, vector DB for semantic search.
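As a rough illustration of the relational option, the sketch below runs a hop-bounded recursive CTE over an edge table. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a server; the table name edges, its columns, and the sample rows are assumptions, and a Postgres version would use the same WITH RECURSIVE shape with that driver's parameter syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (source TEXT, rel_type TEXT, target TEXT);
    INSERT INTO edges VALUES
        ('a', 'LINKS_TO', 'b'),
        ('b', 'LINKS_TO', 'c'),
        ('c', 'LINKS_TO', 'd'),
        ('c', 'LINKS_TO', 'a');  -- cycle back to the start node
""")

# The hop counter bounds the recursion, so the cycle cannot loop forever.
REACHABLE_QUERY = """
WITH RECURSIVE reachable(node, hops) AS (
    SELECT target, 1 FROM edges WHERE source = :start
    UNION
    SELECT e.target, r.hops + 1
    FROM edges e JOIN reachable r ON e.source = r.node
    WHERE r.hops < :max_hops
)
SELECT node, MIN(hops) FROM reachable GROUP BY node ORDER BY MIN(hops), node;
"""


def reachable_within(conn, start, max_hops):
    """Return (node, shortest hop count) pairs within max_hops of start."""
    params = {"start": start, "max_hops": max_hops}
    return conn.execute(REACHABLE_QUERY, params).fetchall()


print(reachable_within(conn, "a", 2))  # [('b', 1), ('c', 2)]
```

Each extra hop adds another self-join over the edge table, which is acceptable for 1-2 hops but is the reason deeper traversals tend to favor a graph store.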

Question 3

How do you prevent infinite loops in graph traversal?

Accepted Answer

Use a visited set that tracks all explored node IDs. Before adding a neighbor to the BFS/DFS queue, check whether it is already in the visited set. In a knowledge graph, also set a max_hops limit to bound the traversal depth and a max_results limit to bound output size. For very large graphs where both endpoints are known, use bidirectional BFS (search from both source and target simultaneously): with branching factor b and path length d, this cuts the explored nodes from roughly b^d to roughly 2·b^(d/2), about the square root of the one-sided search.
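The visited-set and limit checks described above can be sketched as follows. The function name bfs_traverse, the dict-of-lists adjacency format, and the default limits are assumptions made for this example, not part of any particular graph library.

```python
from collections import deque


def bfs_traverse(adjacency, start, max_hops=3, max_results=100):
    """Breadth-first traversal that cannot loop on cycles.

    adjacency: dict mapping node id -> list of neighbor ids (assumed format).
    Returns a list of (node, hops) pairs in discovery order, excluding start.
    """
    visited = {start}             # every node ever enqueued
    queue = deque([(start, 0)])
    results = []

    while queue and len(results) < max_results:
        node, hops = queue.popleft()
        if hops == max_hops:      # depth bound: do not expand further
            continue
        for neighbor in adjacency.get(node, []):
            if neighbor in visited:   # already seen: skip, breaks cycles
                continue
            visited.add(neighbor)
            results.append((neighbor, hops + 1))
            if len(results) >= max_results:   # output bound
                break
            queue.append((neighbor, hops + 1))
    return results


# Sample graph with cycles (a -> b -> a and a -> ... -> d -> a).
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": ["a", "e"], "e": []}

print(bfs_traverse(graph, "a", max_hops=2))
# [('b', 1), ('c', 1), ('d', 2)]
```

Marking a node as visited when it is enqueued, rather than when it is dequeued, keeps the same node from being queued twice through different parents.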

Question 4

How do embeddings enable semantic search in a knowledge graph?

Accepted Answer

Each entity is represented as a high-dimensional embedding vector (e.g., 1536 dimensions from OpenAI text-embedding-3-small) that encodes semantic meaning from its properties and description. To find semantically similar entities, embed the query and rank entities by cosine similarity to the query embedding. Comparing against every entity embedding is an O(n) brute-force scan; an HNSW (Hierarchical Navigable Small World) index in pgvector instead performs approximate nearest-neighbor search in roughly O(log n) time, trading a small amount of recall for speed. This allows queries like "find companies similar to OpenAI" that go beyond keyword matching to semantic understanding.
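For reference, the brute-force ranking that an approximate index replaces looks roughly like this. The three-dimensional vectors and entity names are toy values invented for the example (real embeddings have hundreds or thousands of dimensions), and top_k_similar is a hypothetical helper, not a pgvector API.

```python
import math


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, entity_embeddings, k=3):
    """O(n) scan: score every entity, then keep the k best."""
    scored = [
        (entity_id, cosine_similarity(query, vec))
        for entity_id, vec in entity_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (made-up values, 3 dimensions for readability).
embeddings = {
    "ai_lab_1": [0.9, 0.1, 0.0],
    "ai_lab_2": [0.8, 0.2, 0.1],
    "bakery": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]

for entity_id, score in top_k_similar(query_vec, embeddings, k=2):
    print(entity_id, round(score, 3))
```

The scan is fine for a few thousand entities; beyond that, an approximate index such as HNSW avoids scoring every vector on each query.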

Question 5

How do you handle graph updates efficiently in a knowledge graph?

Accepted Answer

Cache frequently traversed node neighborhoods in Redis (key: node:{id}:rel:{type}:neighbors) with a TTL. On edge insert or delete, invalidate the cached entry for the affected source node, and for the target node as well if incoming neighbors are cached. For embedding updates, use an async pipeline: when entity properties change, enqueue a re-embedding job, and process jobs in batches to minimize API calls to the embedding service. For bulk graph updates (data ingestion), load into a staging table and apply the changes in transactions so that live queries never see a partial graph state.
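The invalidate-on-write pattern can be sketched without a Redis server by using a plain dict as a stand-in. Everything here is an assumption for illustration: the NeighborCache class, the edges dict acting as the source of truth, and the omission of TTL handling, which a real Redis deployment would get from key expiry.

```python
class NeighborCache:
    """Read-through neighbor cache; a dict stands in for Redis here."""

    def __init__(self, load_neighbors):
        self._store = {}             # stand-in for the Redis keyspace
        self._load = load_neighbors  # fallback to the source of truth

    @staticmethod
    def key(node_id, rel_type):
        # Same key layout as in the answer above.
        return f"node:{node_id}:rel:{rel_type}:neighbors"

    def get(self, node_id, rel_type):
        k = self.key(node_id, rel_type)
        if k not in self._store:     # cache miss: load and remember a copy
            self._store[k] = list(self._load(node_id, rel_type))
        return self._store[k]

    def invalidate(self, node_id, rel_type):
        self._store.pop(self.key(node_id, rel_type), None)


# Hypothetical edge store: (source, rel_type) -> list of targets.
edges = {("alice", "KNOWS"): ["bob"]}


def load_neighbors(node_id, rel_type):
    return edges.get((node_id, rel_type), [])


cache = NeighborCache(load_neighbors)


def add_edge(source, rel_type, target):
    """Write the edge, then drop the source node's cached neighbor list."""
    edges.setdefault((source, rel_type), []).append(target)
    cache.invalidate(source, rel_type)


print(cache.get("alice", "KNOWS"))   # ['bob']
add_edge("alice", "KNOWS", "carol")
print(cache.get("alice", "KNOWS"))   # ['bob', 'carol']
```

Invalidating rather than updating the cached list keeps the write path simple: the next read repopulates the entry from the source of truth.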

System Design: Knowledge Graph — Entity Storage, Relationship Traversal, and Semantic Search (2025)

What is a Knowledge Graph?

Data Model: Labeled Property Graph

Storage Backend Options

Multi-Hop Traversal with BFS

Semantic Search with Embeddings