Building a Knowledge Graph
This tutorial shows how to build a knowledge graph for storing and querying interconnected information.
What You'll Build
A knowledge graph of companies, people, and their relationships, similar to what you might find in a business intelligence system.
What You'll Learn
- Model complex domains with multiple node types
- Create various relationship types
- Write sophisticated Cypher queries
- Use aggregation and path queries
Step 1: Design the Schema
Our knowledge graph will have:
Node Types (Labels):
- Company - Organizations
- Person - Individuals
- Technology - Products and technologies
Relationship Types:
- WORKS_AT - Person works at Company
- FOUNDED - Person founded Company
- USES - Company uses Technology
- KNOWS - Person knows Person
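The schema above can also be written down as plain data, which makes it easy to check an edge before inserting it. The sketch below is independent of the graph library. The names `SCHEMA` and `is_valid_edge` are hypothetical helpers introduced here for illustration, and nothing in this tutorial indicates that the library itself enforces such a schema.

```python
# Hypothetical helper: the tutorial's schema as a lookup table.
# Each relationship type maps to (source label, target label).
SCHEMA = {
    "WORKS_AT": ("Person", "Company"),
    "FOUNDED": ("Person", "Company"),
    "USES": ("Company", "Technology"),
    "KNOWS": ("Person", "Person"),
}


def is_valid_edge(src_label, rel_type, dst_label):
    """Return True if the edge direction and labels match the schema."""
    return SCHEMA.get(rel_type) == (src_label, dst_label)


print(is_valid_edge("Person", "WORKS_AT", "Company"))   # matches the schema
print(is_valid_edge("Company", "WORKS_AT", "Person"))   # reversed direction
```

Keeping the schema in one place like this is a design convenience only; the graph will accept whatever labels and relationship types the later steps pass in.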
Step 2: Create the Graph
from graphqlite import Graph
g = Graph("knowledge.db") # Persistent database
# Companies
g.upsert_node("acme", {"name": "Acme Corp", "founded": 2010, "industry": "Software"}, label="Company")
g.upsert_node("globex", {"name": "Globex Inc", "founded": 2015, "industry": "AI"}, label="Company")
# People
g.upsert_node("alice", {"name": "Alice Chen", "role": "CEO"}, label="Person")
g.upsert_node("bob", {"name": "Bob Smith", "role": "CTO"}, label="Person")
g.upsert_node("carol", {"name": "Carol Jones", "role": "Engineer"}, label="Person")
# Technologies
g.upsert_node("python", {"name": "Python", "type": "Language"}, label="Technology")
g.upsert_node("graphql", {"name": "GraphQL", "type": "API"}, label="Technology")
Step 3: Add Relationships
# Employment
g.upsert_edge("alice", "acme", {"since": 2010, "title": "CEO"}, rel_type="WORKS_AT")
g.upsert_edge("bob", "acme", {"since": 2012, "title": "CTO"}, rel_type="WORKS_AT")
g.upsert_edge("carol", "globex", {"since": 2020, "title": "Senior Engineer"}, rel_type="WORKS_AT")
# Founding
g.upsert_edge("alice", "acme", {"year": 2010}, rel_type="FOUNDED")
# Technology usage
g.upsert_edge("acme", "python", {"primary": True}, rel_type="USES")
g.upsert_edge("acme", "graphql", {"primary": False}, rel_type="USES")
g.upsert_edge("globex", "python", {"primary": True}, rel_type="USES")
# Personal connections
g.upsert_edge("alice", "bob", {"since": 2010}, rel_type="KNOWS")
g.upsert_edge("bob", "carol", {"since": 2019}, rel_type="KNOWS")
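Every edge above refers to its endpoints by the node IDs chosen in Step 2, so a typo in an ID is an easy mistake to make. A minimal sketch of a pre-flight check follows. It uses plain Python only, `dangling_edges` is a hypothetical helper, and how the library itself reacts to an unknown ID is not covered by this tutorial.

```python
# Node IDs created in Step 2.
NODE_IDS = {"acme", "globex", "alice", "bob", "carol", "python", "graphql"}

# (source, target, relationship type) for the edges added in Step 3.
EDGES = [
    ("alice", "acme", "WORKS_AT"),
    ("bob", "acme", "WORKS_AT"),
    ("carol", "globex", "WORKS_AT"),
    ("alice", "acme", "FOUNDED"),
    ("acme", "python", "USES"),
    ("acme", "graphql", "USES"),
    ("globex", "python", "USES"),
    ("alice", "bob", "KNOWS"),
    ("bob", "carol", "KNOWS"),
]


def dangling_edges(node_ids, edges):
    """Return the edges whose source or target is not a known node ID."""
    return [e for e in edges if e[0] not in node_ids or e[1] not in node_ids]


print(dangling_edges(NODE_IDS, EDGES))
```

An empty list means every edge points at nodes that exist.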
Step 4: Query the Knowledge Graph
Find all employees of a company
results = g.query("""
MATCH (p:Person)-[r:WORKS_AT]->(c:Company {name: 'Acme Corp'})
RETURN p.name AS employee, r.title AS title, r.since AS since
ORDER BY r.since
""")
Find companies using a technology
results = g.query("""
MATCH (c:Company)-[:USES]->(t:Technology {name: 'Python'})
RETURN c.name AS company, c.industry AS industry
""")
Find connections between people
results = g.query("""
MATCH path = (a:Person {name: 'Alice Chen'})-[:KNOWS*1..3]->(b:Person)
RETURN b.name AS connected_person, length(path) AS distance
""")
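To see what the `[:KNOWS*1..3]` pattern matches, it can help to enumerate the paths by hand. The sketch below is a plain-Python illustration over the two KNOWS edges from Step 3, not the library's implementation. It follows edges in their stored direction, keeps paths of one to three hops, and never reuses an edge within a path, as standard Cypher semantics require. It reports node IDs rather than the `name` property, and `knows_paths` is a hypothetical helper.

```python
# Directed KNOWS edges from Step 3, keyed by source node ID.
KNOWS = {"alice": ["bob"], "bob": ["carol"]}


def knows_paths(start, min_hops=1, max_hops=3):
    """Return (person, distance) for each directed KNOWS path from start."""
    results = []
    # Each stack entry: (current node, hops so far, edges already used).
    stack = [(start, 0, frozenset())]
    while stack:
        node, hops, used = stack.pop()
        if hops >= min_hops:
            results.append((node, hops))
        if hops == max_hops:
            continue
        for nxt in KNOWS.get(node, []):
            edge = (node, nxt)
            if edge not in used:  # an edge may appear only once per path
                stack.append((nxt, hops + 1, used | {edge}))
    return results


print(sorted(knows_paths("alice")))  # [('bob', 1), ('carol', 2)]
```

Because the pattern is directed, starting from Carol finds nothing: no KNOWS edge leaves her node.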
Aggregate: Count employees per company
results = g.query("""
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN c.name AS company, count(p) AS employee_count
ORDER BY employee_count DESC
""")
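The aggregation groups the matched rows by company and counts the people in each group. As a rough plain-Python equivalent over the WORKS_AT edges from Step 3 (an illustration only, using node IDs instead of names):

```python
from collections import Counter

# (person, company) pairs for the WORKS_AT edges created in Step 3.
WORKS_AT = [("alice", "acme"), ("bob", "acme"), ("carol", "globex")]

# Group by company and count, then sort by count, highest first.
counts = Counter(company for _person, company in WORKS_AT)
ranked = counts.most_common()

print(ranked)  # [('acme', 2), ('globex', 1)]
```

As with the Cypher query, a company with no WORKS_AT edge does not appear with a count of zero; it is simply absent, because only matched patterns produce rows.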
Find founders who still work at their company
results = g.query("""
MATCH (p:Person)-[:FOUNDED]->(c:Company),
(p)-[:WORKS_AT]->(c)
RETURN p.name AS founder, c.name AS company
""")
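The two comma-separated patterns share the variables `p` and `c`, so a row is returned only when the same person and company are linked by both a FOUNDED and a WORKS_AT edge. In plain Python this amounts to a set intersection, sketched here with the edges from Step 3:

```python
# (person, company) pairs from Step 3, by node ID.
FOUNDED = {("alice", "acme")}
WORKS_AT = {("alice", "acme"), ("bob", "acme"), ("carol", "globex")}

# Pairs present in both relationship sets.
founders_still_employed = sorted(FOUNDED & WORKS_AT)

print(founders_still_employed)  # [('alice', 'acme')]
```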
Step 5: Update the Graph
Update the graph as facts change or new information becomes available:
# Carol moves to Acme
g.query("""
MATCH (p:Person {name: 'Carol Jones'})-[r:WORKS_AT]->(:Company)
DELETE r
""")
g.upsert_edge("carol", "acme", {"since": 2024, "title": "Staff Engineer"}, rel_type="WORKS_AT")
# Add a new technology
g.upsert_node("rust", {"name": "Rust", "type": "Language"}, label="Technology")
g.upsert_edge("globex", "rust", {"primary": False}, rel_type="USES")
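The job change above takes two steps because the new edge connects a different pair of nodes, so writing it would presumably leave the old Carol-to-Globex edge in place. A minimal sketch of wrapping both steps in one function follows. `move_employee` is a hypothetical helper, it relies only on the `query` and `upsert_edge` calls already used in this tutorial, and it refuses names containing a single quote rather than assuming how string escaping works.

```python
def move_employee(g, person_id, person_name, company_id, props):
    """Replace a person's WORKS_AT edge with one to a new company.

    g is a graph object exposing query() and upsert_edge() as used above.
    """
    # The name is interpolated into the Cypher text, so reject quotes
    # instead of guessing at escaping rules (an assumption of this sketch).
    if "'" in person_name:
        raise ValueError("person_name must not contain a single quote")

    # Step 1: remove any existing employment edge for this person.
    g.query(
        f"MATCH (p:Person {{name: '{person_name}'}})-[r:WORKS_AT]->(:Company) "
        "DELETE r"
    )
    # Step 2: write the new employment edge.
    g.upsert_edge(person_id, company_id, props, rel_type="WORKS_AT")
```

With the graph from this tutorial, the call would be `move_employee(g, "carol", "Carol Jones", "acme", {"since": 2024, "title": "Staff Engineer"})`.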
Next Steps
- Graph Analytics - Run algorithms on your knowledge graph
- Graph Algorithms Reference - Available algorithms