GraphQLite
GraphQLite is an SQLite extension that adds graph database capabilities using the Cypher query language.
Store and query graph data directly in SQLite—combining the simplicity of a single-file, zero-config embedded database with Cypher's expressive power for modeling relationships. Perfect for applications that need graph queries without a separate database server, or for local development and learning without standing up additional infrastructure.
Key Features
- Cypher query language - Use the industry-standard graph query language
- Zero configuration - Works with any SQLite database
- Embedded - No separate server process required
- 15+ graph algorithms - PageRank, shortest paths, community detection, and more
- Multiple bindings - Python, Rust, and raw SQL interfaces
Quick Example
from graphqlite import Graph
g = Graph(":memory:")
g.upsert_node("alice", {"name": "Alice", "age": 30}, label="Person")
g.upsert_node("bob", {"name": "Bob", "age": 25}, label="Person")
g.upsert_edge("alice", "bob", {"since": 2020}, rel_type="KNOWS")
results = g.query("MATCH (a:Person)-[:KNOWS]->(b) RETURN a.name, b.name")
How This Documentation is Organized
This documentation follows the Diátaxis framework:
- Tutorials - Step-by-step lessons to get you started
- How-to Guides - Practical guides for specific tasks
- Reference - Technical descriptions of Cypher support, APIs, and algorithms
- Explanation - Background and design decisions
Getting Started
This tutorial walks you through installing GraphQLite and running your first graph queries.
What You'll Learn
- Install GraphQLite for Python
- Create nodes and relationships
- Query the graph with Cypher
- Use the high-level Graph API
Prerequisites
- Python 3.8 or later
- pip package manager
Step 1: Install GraphQLite
pip install graphqlite
Step 2: Create Your First Graph
Open a Python shell and create an in-memory graph:
from graphqlite import Graph
# Create an in-memory graph database
g = Graph(":memory:")
Step 3: Add Nodes
Add some people to your graph:
g.upsert_node("alice", {"name": "Alice", "age": 30}, label="Person")
g.upsert_node("bob", {"name": "Bob", "age": 25}, label="Person")
g.upsert_node("carol", {"name": "Carol", "age": 35}, label="Person")
print(g.stats()) # {'nodes': 3, 'edges': 0}
Each node has:
- A unique ID ("alice", "bob", "carol")
- Properties (key-value pairs like name and age)
- A label (Person)
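Put together, a node can be pictured as one small record. This is a plain-Python sketch of that shape (the key names follow the examples used later in this tutorial, such as node["properties"]; the exact dict returned by the API may differ):

```python
# Hypothetical in-memory picture of the "alice" node: ID, label, properties.
alice = {
    "id": "alice",                               # unique ID
    "label": "Person",                           # label
    "properties": {"name": "Alice", "age": 30},  # key-value properties
}

print(alice["id"], alice["label"], alice["properties"]["age"])
```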
Step 4: Create Relationships
Connect the nodes with relationships:
g.upsert_edge("alice", "bob", {"since": 2020}, rel_type="KNOWS")
g.upsert_edge("alice", "carol", {"since": 2018}, rel_type="KNOWS")
g.upsert_edge("bob", "carol", {"since": 2021}, rel_type="KNOWS")
print(g.stats()) # {'nodes': 3, 'edges': 3}
Step 5: Query with Cypher
Find all people that Alice knows:
results = g.query("""
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(friend)
RETURN friend.name AS name, friend.age AS age
""")
for row in results:
    print(f"{row['name']} is {row['age']} years old")
Output:
Bob is 25 years old
Carol is 35 years old
Step 6: Explore the Graph
Use built-in methods to explore:
# Get Alice's neighbors
neighbors = g.get_neighbors("alice")
print([n["id"] for n in neighbors]) # ['bob', 'carol']
# Check if an edge exists
print(g.has_edge("alice", "bob")) # True
print(g.has_edge("bob", "alice")) # False (directed edge)
# Get node degree (total connections)
print(g.node_degree("alice")) # 2
Next Steps
- Building a Knowledge Graph - A more complex tutorial
- Graph Analytics - Use graph algorithms
- Cypher Reference - Full language reference
Getting Started with SQL
This tutorial shows how to use GraphQLite directly from the SQLite command line.
Prerequisites
- SQLite3 CLI installed
- GraphQLite extension built (make extension)
Step 1: Load the Extension
sqlite3 my_graph.db
.load build/graphqlite.dylib
-- On Linux: .load build/graphqlite.so
-- On Windows: .load build/graphqlite.dll
Step 2: Create Nodes
-- Create people
SELECT cypher('CREATE (a:Person {name: "Alice", age: 30})');
SELECT cypher('CREATE (b:Person {name: "Bob", age: 25})');
SELECT cypher('CREATE (c:Person {name: "Charlie", age: 35})');
Step 3: Create Relationships
-- Alice knows Bob
SELECT cypher('
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CREATE (a)-[:KNOWS]->(b)
');
-- Bob knows Charlie
SELECT cypher('
MATCH (b:Person {name: "Bob"}), (c:Person {name: "Charlie"})
CREATE (b)-[:KNOWS]->(c)
');
Step 4: Query the Graph
-- Find all people
SELECT cypher('MATCH (p:Person) RETURN p.name, p.age');
-- Find relationships
SELECT cypher('MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name');
-- Friends of friends
SELECT cypher('
MATCH (a:Person {name: "Alice"})-[:KNOWS]->()-[:KNOWS]->(fof)
RETURN fof.name
');
Step 5: Using Parameters
-- Safer queries with parameters
SELECT cypher(
'MATCH (p:Person {name: $name}) RETURN p.age',
'{"name": "Alice"}'
);
Complete Example
Save this as getting_started.sql:
.load build/graphqlite.dylib
-- Create nodes
SELECT cypher('CREATE (a:Person {name: "Alice", age: 30})');
SELECT cypher('CREATE (b:Person {name: "Bob", age: 25})');
SELECT cypher('CREATE (c:Person {name: "Charlie", age: 35})');
-- Create relationships
SELECT cypher('
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CREATE (a)-[:KNOWS]->(b)
');
SELECT cypher('
MATCH (b:Person {name: "Bob"}), (c:Person {name: "Charlie"})
CREATE (b)-[:KNOWS]->(c)
');
-- Query
SELECT 'All people:';
SELECT cypher('MATCH (p:Person) RETURN p.name, p.age');
SELECT '';
SELECT 'Who knows who:';
SELECT cypher('MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name');
SELECT '';
SELECT 'Friends of friends:';
SELECT cypher('
MATCH (a:Person {name: "Alice"})-[:KNOWS]->()-[:KNOWS]->(fof)
RETURN fof.name
');
Run it:
sqlite3 < getting_started.sql
Next Steps
- Query Patterns - More complex pattern matching
- Graph Algorithms in SQL - PageRank, communities
Query Patterns in SQL
This tutorial covers common MATCH patterns for traversing graphs using SQL.
Setup
.load build/graphqlite.dylib
.mode column
.headers on
-- Build a social network
SELECT cypher('CREATE (a:Person {name: "Alice"})');
SELECT cypher('CREATE (b:Person {name: "Bob"})');
SELECT cypher('CREATE (c:Person {name: "Charlie"})');
SELECT cypher('CREATE (d:Person {name: "Diana"})');
SELECT cypher('CREATE (e:Person {name: "Eve"})');
SELECT cypher('MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"}) CREATE (a)-[:FOLLOWS]->(b)');
SELECT cypher('MATCH (a:Person {name: "Alice"}), (c:Person {name: "Charlie"}) CREATE (a)-[:FOLLOWS]->(c)');
SELECT cypher('MATCH (b:Person {name: "Bob"}), (c:Person {name: "Charlie"}) CREATE (b)-[:FOLLOWS]->(c)');
SELECT cypher('MATCH (b:Person {name: "Bob"}), (d:Person {name: "Diana"}) CREATE (b)-[:FOLLOWS]->(d)');
SELECT cypher('MATCH (c:Person {name: "Charlie"}), (e:Person {name: "Eve"}) CREATE (c)-[:FOLLOWS]->(e)');
SELECT cypher('MATCH (d:Person {name: "Diana"}), (e:Person {name: "Eve"}) CREATE (d)-[:FOLLOWS]->(e)');
Direct Connections
Outgoing relationships
-- Who does Alice follow?
SELECT cypher('
MATCH (a:Person {name: "Alice"})-[:FOLLOWS]->(b)
RETURN b.name
');
Incoming relationships
-- Who follows Charlie?
SELECT cypher('
MATCH (a)-[:FOLLOWS]->(b:Person {name: "Charlie"})
RETURN a.name
');
Multi-Hop Patterns
Two hops
-- Friends of friends (people followed by the people Alice follows)
SELECT cypher('
MATCH (a:Person {name: "Alice"})-[:FOLLOWS]->()-[:FOLLOWS]->(c)
RETURN DISTINCT c.name
');
Variable length paths
-- Everyone reachable from Alice in 1-3 hops
SELECT cypher('
MATCH (a:Person {name: "Alice"})-[:FOLLOWS*1..3]->(b)
RETURN DISTINCT b.name
');
Aggregation
Count connections
-- Follower counts
SELECT cypher('
MATCH (a:Person)-[:FOLLOWS]->(b:Person)
RETURN b.name, count(a) as followers
ORDER BY followers DESC
');
Collect into list
-- Group followers by person
SELECT cypher('
MATCH (a:Person)-[:FOLLOWS]->(b:Person)
RETURN b.name, collect(a.name) as followed_by
');
Complex Patterns
Multiple relationship types
-- Find mutual follows
SELECT cypher('
MATCH (a:Person)-[:FOLLOWS]->(b:Person)-[:FOLLOWS]->(a)
RETURN a.name, b.name
');
OPTIONAL MATCH
-- All people and their followers (NULL if none)
SELECT cypher('
MATCH (p:Person)
OPTIONAL MATCH (follower)-[:FOLLOWS]->(p)
RETURN p.name, count(follower) as follower_count
');
Filter with WHERE
-- People followed by more than one person
SELECT cypher('
MATCH (a:Person)-[:FOLLOWS]->(b:Person)
WITH b, count(a) as followers
WHERE followers > 1
RETURN b.name, followers
');
Working with Results in SQL
Extract JSON fields
SELECT
json_extract(value, '$.a.name') as person,
json_extract(value, '$.b.name') as follows
FROM json_each(
cypher('MATCH (a:Person)-[:FOLLOWS]->(b) RETURN a, b')
);
Join with regular tables
-- Assuming you have a users table
WITH graph_data AS (
SELECT
json_extract(value, '$.name') as name,
json_extract(value, '$.followers') as followers
FROM json_each(
cypher('MATCH (a)-[:FOLLOWS]->(b) RETURN b.name as name, count(a) as followers')
)
)
SELECT u.email, g.followers
FROM users u
JOIN graph_data g ON u.username = g.name;
Next Steps
- Graph Algorithms in SQL - PageRank and community detection
- SQL Interface Reference - Complete SQL reference
Building a Knowledge Graph
This tutorial shows how to build a knowledge graph for storing and querying interconnected information.
What You'll Build
A knowledge graph of companies, people, and their relationships—similar to what you might find in a business intelligence system.
What You'll Learn
- Model complex domains with multiple node types
- Create various relationship types
- Write sophisticated Cypher queries
- Use aggregation and path queries
Step 1: Design the Schema
Our knowledge graph will have:
Node Types (Labels):
- Company - Organizations
- Person - Individuals
- Technology - Products and technologies
Relationship Types:
- WORKS_AT - Person works at Company
- FOUNDED - Person founded Company
- USES - Company uses Technology
- KNOWS - Person knows Person
Step 2: Create the Graph
from graphqlite import Graph
g = Graph("knowledge.db") # Persistent database
# Companies
g.upsert_node("acme", {"name": "Acme Corp", "founded": 2010, "industry": "Software"}, label="Company")
g.upsert_node("globex", {"name": "Globex Inc", "founded": 2015, "industry": "AI"}, label="Company")
# People
g.upsert_node("alice", {"name": "Alice Chen", "role": "CEO"}, label="Person")
g.upsert_node("bob", {"name": "Bob Smith", "role": "CTO"}, label="Person")
g.upsert_node("carol", {"name": "Carol Jones", "role": "Engineer"}, label="Person")
# Technologies
g.upsert_node("python", {"name": "Python", "type": "Language"}, label="Technology")
g.upsert_node("graphql", {"name": "GraphQL", "type": "API"}, label="Technology")
Step 3: Add Relationships
# Employment
g.upsert_edge("alice", "acme", {"since": 2010, "title": "CEO"}, rel_type="WORKS_AT")
g.upsert_edge("bob", "acme", {"since": 2012, "title": "CTO"}, rel_type="WORKS_AT")
g.upsert_edge("carol", "globex", {"since": 2020, "title": "Senior Engineer"}, rel_type="WORKS_AT")
# Founding
g.upsert_edge("alice", "acme", {"year": 2010}, rel_type="FOUNDED")
# Technology usage
g.upsert_edge("acme", "python", {"primary": True}, rel_type="USES")
g.upsert_edge("acme", "graphql", {"primary": False}, rel_type="USES")
g.upsert_edge("globex", "python", {"primary": True}, rel_type="USES")
# Personal connections
g.upsert_edge("alice", "bob", {"since": 2010}, rel_type="KNOWS")
g.upsert_edge("bob", "carol", {"since": 2019}, rel_type="KNOWS")
Step 4: Query the Knowledge Graph
Find all employees of a company
results = g.query("""
MATCH (p:Person)-[r:WORKS_AT]->(c:Company {name: 'Acme Corp'})
RETURN p.name AS employee, r.title AS title, r.since AS since
ORDER BY r.since
""")
Find companies using a technology
results = g.query("""
MATCH (c:Company)-[:USES]->(t:Technology {name: 'Python'})
RETURN c.name AS company, c.industry AS industry
""")
Find connections between people
results = g.query("""
MATCH path = (a:Person {name: 'Alice Chen'})-[:KNOWS*1..3]->(b:Person)
RETURN b.name AS connected_person, length(path) AS distance
""")
Aggregate: Count employees per company
results = g.query("""
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN c.name AS company, count(p) AS employee_count
ORDER BY employee_count DESC
""")
Find founders who still work at their company
results = g.query("""
MATCH (p:Person)-[:FOUNDED]->(c:Company),
(p)-[:WORKS_AT]->(c)
RETURN p.name AS founder, c.name AS company
""")
Step 5: Update the Graph
Add new information as it becomes available:
# Carol moves to Acme
g.query("""
MATCH (p:Person {name: 'Carol Jones'})-[r:WORKS_AT]->(:Company)
DELETE r
""")
g.upsert_edge("carol", "acme", {"since": 2024, "title": "Staff Engineer"}, rel_type="WORKS_AT")
# Add a new technology
g.upsert_node("rust", {"name": "Rust", "type": "Language"}, label="Technology")
g.upsert_edge("globex", "rust", {"primary": False}, rel_type="USES")
Next Steps
- Graph Analytics - Run algorithms on your knowledge graph
- Graph Algorithms Reference - Available algorithms
Graph Analytics
This tutorial shows how to use GraphQLite's built-in graph algorithms for analysis.
What You'll Learn
- Run centrality algorithms to find important nodes
- Detect communities in your graph
- Find shortest paths between nodes
- Use algorithm results in your applications
Setup: Create a Social Network
from graphqlite import Graph
g = Graph(":memory:")
# Create a small social network
people = ["alice", "bob", "carol", "dave", "eve", "frank", "grace", "henry"]
for person in people:
    g.upsert_node(person, {"name": person.title()}, label="Person")

# Create connections (who follows whom)
connections = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "carol"), ("bob", "eve"),
    ("carol", "dave"), ("carol", "eve"), ("carol", "frank"),
    ("dave", "frank"),
    ("eve", "frank"), ("eve", "grace"),
    ("frank", "grace"), ("frank", "henry"),
]

for source, target in connections:
    g.upsert_edge(source, target, {}, rel_type="FOLLOWS")
print(g.stats()) # {'nodes': 8, 'edges': 14}
Centrality: Finding Important Nodes
PageRank
PageRank identifies nodes that are linked to by other important nodes:
results = g.pagerank(damping=0.85, iterations=20)
for r in sorted(results, key=lambda x: x["score"], reverse=True)[:3]:
    print(f"{r['user_id']}: {r['score']:.4f}")
Output:
frank: 0.1842
grace: 0.1536
eve: 0.1298
Frank is the most "important" because many well-connected people follow him.
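To make that intuition concrete, here is a tiny power-iteration PageRank in plain Python. It is an illustrative sketch of the algorithm's idea, not GraphQLite's implementation, and the edge list is a hand-picked subset of the network above:

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Toy power-iteration PageRank over a directed edge list."""
    nodes = {n for edge in edges for n in edge}
    out = {n: [t for s, t in edges if s == n] for n in nodes}
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                # Split this node's damped score among its targets
                for t in out[n]:
                    new[t] += damping * score[n] / len(out[n])
            else:
                # Dangling node: spread its score uniformly
                for t in nodes:
                    new[t] += damping * score[n] / len(nodes)
        score = new
    return score

follows = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
           ("carol", "frank"), ("eve", "frank"), ("frank", "grace")]
scores = pagerank(follows)
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {s:.4f}")
```

Nodes with many well-scored in-links (frank, grace) accumulate the most score, which is exactly why Frank tops the ranking above.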
Degree Centrality
Count incoming and outgoing connections:
results = g.degree_centrality()
for r in results:
    print(f"{r['user_id']}: in={r['in_degree']}, out={r['out_degree']}")
Betweenness Centrality
Find nodes that act as bridges between communities:
results = g.query("RETURN betweennessCentrality()")
# Carol and Eve have high betweenness - they connect different groups
Community Detection
Label Propagation
Find clusters of densely connected nodes:
results = g.community_detection(max_iterations=10)
communities = {}
for r in results:
    label = r["community"]
    if label not in communities:
        communities[label] = []
    communities[label].append(r["user_id"])

for label, members in communities.items():
    print(f"Community {label}: {members}")
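The grouping loop above can be written more compactly with collections.defaultdict. The sample rows here are hand-written in the result shape shown in these docs (a user_id plus a community label per node):

```python
from collections import defaultdict

# Hand-written sample rows in the community-detection result shape.
results = [
    {"user_id": "alice", "community": 0},
    {"user_id": "bob", "community": 0},
    {"user_id": "frank", "community": 1},
    {"user_id": "grace", "community": 1},
]

communities = defaultdict(list)
for r in results:
    communities[r["community"]].append(r["user_id"])

for label, members in sorted(communities.items()):
    print(f"Community {label}: {members}")
# Community 0: ['alice', 'bob']
# Community 1: ['frank', 'grace']
```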
Louvain Algorithm
For larger graphs, Louvain provides hierarchical community detection:
results = g.query("RETURN louvain(1.0)")
Path Finding
Shortest Path
Find the shortest path between two nodes:
path = g.shortest_path("alice", "henry")
print(f"Distance: {path['distance']}")
print(f"Path: {' -> '.join(path['path'])}")
Output:
Distance: 4
Path: alice -> carol -> frank -> henry
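Note the convention in this output: the distance counts the nodes on the path, not the edges. A plain-Python breadth-first search over the same FOLLOWS edge list reproduces it (a sketch of the idea, not the extension's implementation):

```python
from collections import deque

def bfs_path(edges, start, goal):
    """Return the first shortest path from start to goal, or None."""
    adj = {}
    for s, t in edges:
        adj.setdefault(s, []).append(t)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

connections = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "carol"), ("bob", "eve"),
    ("carol", "dave"), ("carol", "eve"), ("carol", "frank"),
    ("dave", "frank"),
    ("eve", "frank"), ("eve", "grace"),
    ("frank", "grace"), ("frank", "henry"),
    ("grace", "henry"),
]
path = bfs_path(connections, "alice", "henry")
print(len(path), " -> ".join(path))  # 4 alice -> carol -> frank -> henry
```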
All-Pairs Shortest Paths
Compute distances between all node pairs:
results = g.query("RETURN apsp()")
Connected Components
Weakly Connected Components
Find groups of nodes that are connected (ignoring edge direction):
results = g.connected_components()
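Conceptually, weakly connected components come from a union-find over the edges with direction ignored. A standalone sketch with toy data (not the extension's code):

```python
def connected_components(nodes, edges):
    """Union-find sketch: merge the endpoints of every edge, ignoring direction."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return [sorted(g) for g in groups.values()]

# A chain of four people plus one isolated node -> two components.
nodes = ["alice", "bob", "carol", "dave", "loner"]
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
print(sorted(connected_components(nodes, edges)))
# [['alice', 'bob', 'carol', 'dave'], ['loner']]
```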
Strongly Connected Components
Find groups where every node can reach every other node:
results = g.query("RETURN scc()")
Using Results in Your Application
Algorithm results are returned as lists of dictionaries, making them easy to process:
# Find the top influencers
influencers = g.pagerank()
top_3 = sorted(influencers, key=lambda x: x["score"], reverse=True)[:3]
# Get full node data for top influencers
for inf in top_3:
    node = g.get_node(inf["user_id"])
    print(f"{node['properties']['name']}: PageRank {inf['score']:.4f}")
Combining Algorithms with Cypher
Use algorithm results to guide Cypher queries:
# Find the most central node
pagerank = g.pagerank()
most_central = max(pagerank, key=lambda x: x["score"])["user_id"]
# Query their connections
results = g.query(f"""
MATCH (p:Person {{name: '{most_central.title()}'}})-[:FOLLOWS]->(friend)
RETURN friend.name AS friend
""")
print(f"Top influencer {most_central} follows: {[r['friend'] for r in results]}")
Next Steps
- Graph Algorithms Reference - Complete algorithm documentation
- Performance - Algorithm performance characteristics
Graph Algorithms in SQL
This tutorial shows how to run graph algorithms and work with their results in SQL.
Setup: Citation Network
.load build/graphqlite.dylib
.mode column
.headers on
-- Create papers
SELECT cypher('CREATE (p:Paper {title: "Foundations"})');
SELECT cypher('CREATE (p:Paper {title: "Methods"})');
SELECT cypher('CREATE (p:Paper {title: "Applications"})');
SELECT cypher('CREATE (p:Paper {title: "Survey"})');
SELECT cypher('CREATE (p:Paper {title: "Analysis"})');
SELECT cypher('CREATE (p:Paper {title: "Review"})');
-- Create citations (citing paper -> cited paper)
SELECT cypher('MATCH (a:Paper {title: "Methods"}), (b:Paper {title: "Foundations"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Applications"}), (b:Paper {title: "Foundations"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Applications"}), (b:Paper {title: "Methods"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Survey"}), (b:Paper {title: "Foundations"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Survey"}), (b:Paper {title: "Methods"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Survey"}), (b:Paper {title: "Applications"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Analysis"}), (b:Paper {title: "Methods"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Review"}), (b:Paper {title: "Survey"}) CREATE (a)-[:CITES]->(b)');
PageRank
Find influential papers based on citation structure.
Basic usage
SELECT cypher('RETURN pageRank(0.85, 20)');
Extract as table
SELECT
json_extract(value, '$.node_id') as id,
json_extract(value, '$.user_id') as paper_id,
printf('%.4f', json_extract(value, '$.score')) as score
FROM json_each(cypher('RETURN pageRank(0.85, 20)'))
ORDER BY json_extract(value, '$.score') DESC;
Join with node properties
WITH rankings AS (
SELECT
json_extract(value, '$.node_id') as node_id,
json_extract(value, '$.score') as score
FROM json_each(cypher('RETURN pageRank(0.85, 20)'))
)
SELECT
n.user_id as paper,
printf('%.4f', r.score) as influence
FROM rankings r
JOIN nodes n ON n.id = r.node_id
ORDER BY r.score DESC;
Community Detection
Label Propagation
SELECT cypher('RETURN labelPropagation(10)');
Group by community
WITH communities AS (
SELECT
json_extract(value, '$.node_id') as node_id,
json_extract(value, '$.community') as community
FROM json_each(cypher('RETURN labelPropagation(10)'))
)
SELECT
c.community,
group_concat(n.user_id) as papers
FROM communities c
JOIN nodes n ON n.id = c.node_id
GROUP BY c.community;
Louvain
SELECT cypher('RETURN louvain(1.0)');
Centrality Metrics
Degree Centrality
SELECT
json_extract(value, '$.user_id') as paper,
json_extract(value, '$.in_degree') as cited_by,
json_extract(value, '$.out_degree') as cites
FROM json_each(cypher('RETURN degreeCentrality()'))
ORDER BY json_extract(value, '$.in_degree') DESC;
Betweenness Centrality
SELECT
json_extract(value, '$.user_id') as paper,
printf('%.4f', json_extract(value, '$.score')) as betweenness
FROM json_each(cypher('RETURN betweennessCentrality()'))
ORDER BY json_extract(value, '$.score') DESC;
Path Finding
Shortest Path
SELECT cypher('RETURN dijkstra("Review", "Foundations")');
Result shows path and distance:
{"distance": 3, "path": ["Review", "Survey", "Foundations"]}
Combining Algorithms with Queries
Find most influential paper's citations
-- Get top paper by PageRank
WITH top_paper AS (
SELECT json_extract(value, '$.user_id') as paper_id
FROM json_each(cypher('RETURN pageRank()'))
ORDER BY json_extract(value, '$.score') DESC
LIMIT 1
)
-- Find what it cites
SELECT cypher(
'MATCH (p:Paper {title: "' || paper_id || '"})-[:CITES]->(cited) RETURN cited.title'
)
FROM top_paper;
Export for visualization
-- Export nodes
.mode csv
.output nodes.csv
SELECT
json_extract(value, '$.node_id') as id,
json_extract(value, '$.user_id') as label,
json_extract(value, '$.score') as pagerank
FROM json_each(cypher('RETURN pageRank()'));
-- Export edges
.output edges.csv
SELECT
source_id, target_id, label as type
FROM edges;
.output stdout
Performance Tips
- Limit output for large graphs:
  SELECT * FROM json_each(cypher('RETURN pageRank()')) LIMIT 100;
- Create views for repeated queries:
  CREATE VIEW paper_influence AS
  SELECT
    json_extract(value, '$.node_id') as node_id,
    json_extract(value, '$.score') as score
  FROM json_each(cypher('RETURN pageRank()'));
- Index algorithm results if needed repeatedly:
  CREATE TABLE pagerank_cache AS
  SELECT * FROM json_each(cypher('RETURN pageRank()'));
  CREATE INDEX idx_pagerank ON pagerank_cache(json_extract(value, '$.score'));
Next Steps
- Graph Algorithms Reference - All available algorithms
- Performance Guide - Algorithm performance characteristics
Building a GraphRAG System
This tutorial shows how to build a Graph Retrieval-Augmented Generation (GraphRAG) system using GraphQLite.
What is GraphRAG?
GraphRAG combines:
- Document chunking - Split documents into processable pieces
- Entity extraction - Identify entities and relationships
- Graph storage - Store entities as nodes, relationships as edges
- Vector search - Find relevant chunks by semantic similarity
- Graph traversal - Expand context using graph structure
Prerequisites
pip install graphqlite sentence-transformers sqlite-vec spacy
python -m spacy download en_core_web_sm
Architecture
Query: "Who are the tech leaders?"
│
▼
┌─────────────────────┐
│ 1. Vector Search │ Find chunks similar to query
└─────────┬───────────┘
│
▼
┌─────────────────────┐
│ 2. Graph Lookup │ MATCH (chunk)-[:MENTIONS]->(entity)
└─────────┬───────────┘
│
▼
┌─────────────────────┐
│ 3. Graph Traversal │ MATCH (entity)-[*1..2]-(related)
└─────────┬───────────┘
│
▼
Context for LLM
Step 1: Document Chunking
from dataclasses import dataclass
from typing import List
@dataclass
class Chunk:
    chunk_id: str
    doc_id: str
    text: str
    start_char: int
    end_char: int

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50, doc_id: str = "doc") -> List[Chunk]:
    """Split text into overlapping chunks."""
    words = text.split()
    chunks = []
    start = 0
    chunk_index = 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunk_words = words[start:end]
        chunk_body = " ".join(chunk_words)
        # Calculate character positions
        start_char = len(" ".join(words[:start])) + (1 if start > 0 else 0)
        end_char = start_char + len(chunk_body)
        chunks.append(Chunk(
            chunk_id=f"{doc_id}_chunk_{chunk_index}",
            doc_id=doc_id,
            text=chunk_body,
            start_char=start_char,
            end_char=end_char,
        ))
        start += chunk_size - overlap
        chunk_index += 1
    return chunks
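The sliding-window arithmetic is the core of chunk_text: each new chunk starts chunk_size - overlap words after the previous one, so consecutive chunks share overlap words. Standalone, with toy numbers:

```python
# With chunk_size=5 and overlap=2, window starts advance by 3 words.
words = [f"w{i}" for i in range(12)]
chunk_size, overlap = 5, 2

starts = list(range(0, len(words), chunk_size - overlap))
chunks = [words[s:s + chunk_size] for s in starts]

print(starts)                # [0, 3, 6, 9]
print(chunks[0][-overlap:])  # ['w3', 'w4'] -- shared with the next chunk
print(chunks[1][:overlap])   # ['w3', 'w4']
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from both sides.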
Step 2: Entity Extraction
import spacy
nlp = spacy.load("en_core_web_sm")
def extract_entities(text: str) -> List[dict]:
    """Extract named entities from text."""
    doc = nlp(text)
    entities = []
    for ent in doc.ents:
        entities.append({
            "text": ent.text,
            "label": ent.label_,
            "start": ent.start_char,
            "end": ent.end_char,
        })
    return entities

def extract_relationships(entities: List[dict]) -> List[tuple]:
    """Create co-occurrence relationships between entities."""
    relationships = []
    for i, e1 in enumerate(entities):
        for e2 in entities[i+1:]:
            relationships.append((
                e1["text"],
                e2["text"],
                "CO_OCCURS",
            ))
    return relationships
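The pair loop in extract_relationships is equivalent to itertools.combinations: every entity is paired once with every later entity. The same CO_OCCURS triples can be produced like this (toy entity dicts):

```python
from itertools import combinations

entities = [{"text": "Apple"}, {"text": "Steve Jobs"}, {"text": "California"}]

relationships = [(a["text"], b["text"], "CO_OCCURS")
                 for a, b in combinations(entities, 2)]
for rel in relationships:
    print(rel)
# ('Apple', 'Steve Jobs', 'CO_OCCURS')
# ('Apple', 'California', 'CO_OCCURS')
# ('Steve Jobs', 'California', 'CO_OCCURS')
```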
Step 3: Build the Knowledge Graph
from graphqlite import Graph
g = Graph("knowledge.db")
def ingest_document(doc_id: str, text: str):
    """Process a document and add to knowledge graph."""
    # Chunk the document
    chunks = chunk_text(text, doc_id=doc_id)
    for chunk in chunks:
        # Store chunk as node
        g.upsert_node(
            chunk.chunk_id,
            {"text": chunk.text[:500], "doc_id": doc_id},  # Truncate for storage
            label="Chunk"
        )
        # Extract and store entities
        entities = extract_entities(chunk.text)
        for entity in entities:
            entity_id = entity["text"].lower().replace(" ", "_")
            # Create entity node
            g.upsert_node(
                entity_id,
                {"name": entity["text"], "type": entity["label"]},
                label="Entity"
            )
            # Link chunk to entity
            g.upsert_edge(
                chunk.chunk_id,
                entity_id,
                {},
                rel_type="MENTIONS"
            )
        # Create entity co-occurrence edges
        relationships = extract_relationships(entities)
        for source, target, rel_type in relationships:
            source_id = source.lower().replace(" ", "_")
            target_id = target.lower().replace(" ", "_")
            g.upsert_edge(source_id, target_id, {}, rel_type=rel_type)
Step 4: Add Vector Search
import sqlite3
import sqlite_vec
from sentence_transformers import SentenceTransformer
# Initialize embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
def setup_vector_search(conn: sqlite3.Connection):
    """Set up vector search table."""
    sqlite_vec.load(conn)
    conn.execute("""
        CREATE VIRTUAL TABLE IF NOT EXISTS chunk_embeddings USING vec0(
            chunk_id TEXT PRIMARY KEY,
            embedding FLOAT[384]
        )
    """)

def embed_chunks(conn: sqlite3.Connection, chunks: List[Chunk]):
    """Embed chunks and store vectors."""
    texts = [c.text for c in chunks]
    embeddings = model.encode(texts)
    for chunk, embedding in zip(chunks, embeddings):
        conn.execute(
            "INSERT OR REPLACE INTO chunk_embeddings (chunk_id, embedding) VALUES (?, ?)",
            [chunk.chunk_id, embedding.tobytes()]
        )
    conn.commit()

def vector_search(conn: sqlite3.Connection, query: str, k: int = 5) -> List[str]:
    """Find chunks similar to query."""
    query_embedding = model.encode([query])[0]
    results = conn.execute("""
        SELECT chunk_id
        FROM chunk_embeddings
        WHERE embedding MATCH ?
        LIMIT ?
    """, [query_embedding.tobytes(), k]).fetchall()
    return [r[0] for r in results]
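Under the hood, vector search ranks stored embeddings by their similarity to the query embedding. A dependency-free sketch of that idea with toy 3-dimensional vectors (sqlite-vec does the equivalent in SQL over real 384-dimensional embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy stored embeddings, keyed by chunk id.
chunks = {
    "doc1_chunk_0": [0.9, 0.1, 0.0],
    "doc1_chunk_1": [0.1, 0.9, 0.0],
    "doc2_chunk_0": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

ranked = sorted(chunks, key=lambda cid: cosine(query, chunks[cid]), reverse=True)
print(ranked[:2])  # ['doc1_chunk_0', 'doc2_chunk_0']
```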
Step 5: GraphRAG Retrieval
def graphrag_retrieve(query: str, k_chunks: int = 5, expand_hops: int = 1) -> dict:
    """
    Retrieve context using GraphRAG:
    1. Vector search for relevant chunks
    2. Find entities mentioned in those chunks
    3. Expand to related entities via graph
    """
    # Get underlying connection for vector search
    conn = g.connection.sqlite_connection

    # Step 1: Vector search
    chunk_ids = vector_search(conn, query, k=k_chunks)

    # Step 2: Get entities from chunks
    entities = set()
    for chunk_id in chunk_ids:
        results = g.query(f"""
            MATCH (c:Chunk {{id: '{chunk_id}'}})-[:MENTIONS]->(e:Entity)
            RETURN e.name
        """)
        for r in results:
            entities.add(r["e.name"])

    # Step 3: Expand via graph
    related_entities = set()
    for entity in entities:
        results = g.query(f"""
            MATCH (e:Entity {{name: '{entity}'}})-[*1..{expand_hops}]-(related:Entity)
            RETURN DISTINCT related.name
        """)
        for r in results:
            related_entities.add(r["related.name"])

    # Get chunk texts
    chunk_texts = []
    for chunk_id in chunk_ids:
        node = g.get_node(chunk_id)
        if node:
            chunk_texts.append(node["properties"].get("text", ""))

    return {
        "chunks": chunk_texts,
        "entities": list(entities),
        "related_entities": list(related_entities - entities),
    }
Step 6: Complete Pipeline
# Initialize
g = Graph("graphrag.db")
conn = sqlite3.connect("graphrag.db")
setup_vector_search(conn)
# Ingest documents
documents = [
    {"id": "doc1", "text": "Apple Inc. was founded by Steve Jobs..."},
    {"id": "doc2", "text": "Microsoft, led by Satya Nadella..."},
]

for doc in documents:
    ingest_document(doc["id"], doc["text"])
    chunks = chunk_text(doc["text"], doc_id=doc["id"])
    embed_chunks(conn, chunks)
# Query
context = graphrag_retrieve("Who are the tech industry leaders?")
print("Relevant chunks:", len(context["chunks"]))
print("Entities:", context["entities"])
print("Related:", context["related_entities"])
# Use context with an LLM
# response = llm.generate(query, context=context)
Graph Algorithms for GraphRAG
Use graph algorithms to enhance retrieval:
# Find important entities
important = g.pagerank()
top_entities = sorted(important, key=lambda x: x["score"], reverse=True)[:10]
# Find entity communities
communities = g.community_detection()
# Find central entities (good for summarization)
central = g.query("RETURN betweennessCentrality()")
Example Project
See examples/llm-graphrag/ for a complete GraphRAG implementation using the HotpotQA multi-hop reasoning dataset:
- Graph-based knowledge storage with Cypher queries
- sqlite-vec for vector similarity search
- Ollama integration for local LLM inference
- Community detection for topic-based retrieval
cd examples/llm-graphrag
uv sync
uv run python ingest.py # Ingest HotpotQA dataset
uv run python rag.py # Interactive query mode
Next Steps
- Graph Algorithms - All available algorithms
- Python API - Complete API reference
Installation
Python (Recommended)
pip install graphqlite
This installs pre-built binaries for:
- Linux (x86_64, aarch64)
- macOS (arm64, x86_64)
- Windows (x86_64)
Rust
Add to your Cargo.toml:
[dependencies]
graphqlite = "0.2"
From Source
Building from source requires:
- GCC or Clang
- Bison (3.0+)
- Flex
- SQLite development headers
macOS
brew install bison flex sqlite
export PATH="$(brew --prefix bison)/bin:$PATH"
make extension RELEASE=1
Linux (Debian/Ubuntu)
sudo apt-get install build-essential bison flex libsqlite3-dev
make extension RELEASE=1
Windows (MSYS2)
pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-sqlite3 bison flex make
make extension RELEASE=1
The extension will be built to:
- build/graphqlite.dylib (macOS)
- build/graphqlite.so (Linux)
- build/graphqlite.dll (Windows)
Verifying Installation
Python
import graphqlite
print(graphqlite.__version__)
# Quick test
from graphqlite import Graph
g = Graph(":memory:")
g.upsert_node("test", {"name": "Test"})
print(g.stats()) # {'nodes': 1, 'edges': 0}
SQL
sqlite3
.load /path/to/graphqlite
SELECT cypher('RETURN 1 + 1 AS result');
Troubleshooting
Extension not found
If you get FileNotFoundError: GraphQLite extension not found:
- Build the extension:
  make extension RELEASE=1
- Set the path explicitly:
  from graphqlite import connect
  conn = connect("graph.db", extension_path="/path/to/graphqlite.dylib")
- Or set an environment variable:
  export GRAPHQLITE_EXTENSION_PATH=/path/to/graphqlite.dylib
macOS: Library not loaded
If you see errors about missing SQLite libraries, ensure you're using Homebrew's Python or set DYLD_LIBRARY_PATH:
export DYLD_LIBRARY_PATH="$(brew --prefix sqlite)/lib:$DYLD_LIBRARY_PATH"
Working with Multiple Graphs
GraphQLite can manage multiple graph databases and query across them. This is useful for:
- Separation of concerns: Keep different data domains in separate graphs
- Access control: Different graphs can have different permissions
- Performance: Smaller, focused graphs can be faster to query
- Cross-domain queries: Query relationships across different datasets
Using GraphManager (Python)
The GraphManager class manages multiple graph databases in a directory:
from graphqlite import graphs
# Create a manager for a directory
with graphs("./data") as gm:
    # Create graphs
    social = gm.create("social")
    products = gm.create("products")

    # Add data to each graph
    social.upsert_node("alice", {"name": "Alice", "age": 30}, "Person")
    social.upsert_node("bob", {"name": "Bob", "age": 25}, "Person")
    social.upsert_edge("alice", "bob", {"since": 2020}, "KNOWS")

    products.upsert_node("phone", {"name": "iPhone", "price": 999}, "Product")
    products.upsert_node("laptop", {"name": "MacBook", "price": 1999}, "Product")

    # List all graphs
    print(gm.list())  # ['products', 'social']

    # Check if a graph exists
    if "social" in gm:
        print("Social graph exists")
Opening Existing Graphs
from graphqlite import graphs
with graphs("./data") as gm:
    # Open an existing graph
    social = gm.open("social")

    # Or create if it doesn't exist
    cache = gm.open_or_create("cache")

    # Query the graph
    result = social.query("MATCH (n:Person) RETURN n.name")
    for row in result:
        print(row["n.name"])
Dropping Graphs
from graphqlite import graphs
with graphs("./data") as gm:
# Delete a graph and its file
gm.drop("cache")
Cross-Graph Queries
GraphQLite supports querying across multiple graphs using the FROM clause:
from graphqlite import graphs
with graphs("./data") as gm:
# Create and populate graphs
social = gm.create("social")
social.upsert_node("alice", {"name": "Alice", "user_id": "u1"}, "Person")
purchases = gm.create("purchases")
purchases.upsert_node("order1", {"user_id": "u1", "total": 99.99}, "Order")
# Cross-graph query using FROM clause
result = gm.query(
"""
MATCH (p:Person) FROM social
WHERE p.user_id = 'u1'
RETURN p.name, graph(p) AS source
""",
graphs=["social"]
)
for row in result:
print(f"{row['p.name']} from {row['source']}")
The graph() Function
Use the graph() function to identify which graph a node comes from:
result = gm.query(
"MATCH (n:Person) FROM social RETURN n.name, graph(n) AS source_graph",
graphs=["social"]
)
Raw SQL Cross-Graph Queries
For advanced use cases, you can execute raw SQL across attached graphs:
result = gm.query_sql(
"SELECT COUNT(*) FROM social.nodes",
graphs=["social"]
)
print(f"Node count: {result[0][0]}")
Using GraphManager (Rust)
The Rust API provides similar functionality:
use graphqlite::{graphs, GraphManager};

fn main() -> graphqlite::Result<()> {
    let mut gm = graphs("./data")?;

    // Create graphs
    gm.create("social")?;
    gm.create("products")?;

    // List graphs
    for name in gm.list()? {
        println!("Graph: {}", name);
    }

    // Open and use a graph
    let social = gm.open_graph("social")?;
    social.query("CREATE (n:Person {name: 'Alice'})")?;

    // Cross-graph query
    let result = gm.query(
        "MATCH (n:Person) FROM social RETURN n.name",
        &["social"]
    )?;
    for row in &result {
        println!("{}", row.get::<String>("n.name")?);
    }

    // Drop a graph
    gm.drop("products")?;

    Ok(())
}
Direct SQL with ATTACH
You can also work with multiple graphs directly using SQLite's ATTACH:
import sqlite3
import graphqlite
# Create separate graph databases
conn1 = sqlite3.connect("social.db")
graphqlite.load(conn1)
conn1.execute("SELECT cypher('CREATE (n:Person {name: \"Alice\"})')")
conn1.close()
conn2 = sqlite3.connect("products.db")
graphqlite.load(conn2)
conn2.execute("SELECT cypher('CREATE (n:Product {name: \"Phone\"})')")
conn2.close()
# Query across both
coordinator = sqlite3.connect(":memory:")
graphqlite.load(coordinator)
coordinator.execute("ATTACH DATABASE 'social.db' AS social")
coordinator.execute("ATTACH DATABASE 'products.db' AS products")
result = coordinator.execute(
"SELECT cypher('MATCH (n:Person) FROM social RETURN n.name')"
).fetchone()
print(result[0])
Best Practices
- Use GraphManager for convenience: It handles extension loading, connection caching, and cleanup automatically.
- Commit before cross-graph queries: GraphManager automatically commits open graph connections before cross-graph queries to ensure data visibility.
- Keep graphs focused: Design your graphs around specific domains or use cases for better performance and maintainability.
- Use meaningful names: Graph names become SQLite database aliases, so use valid SQL identifiers.
- Handle errors gracefully: Check for FileNotFoundError when opening graphs that might not exist.
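The error-handling practice can be sketched with stand-in functions (`open_graph` and `create_graph` below are hypothetical placeholders for `gm.open` and `gm.create`; only the try/except pattern is the point):

```python
# Sketch of graceful error handling: fall back to creating a graph when
# opening fails. open_graph/create_graph are hypothetical stand-ins for
# gm.open and gm.create from the GraphManager API.
def open_graph(name):
    raise FileNotFoundError(f"no graph named {name}")

def create_graph(name):
    return f"graph:{name}"

try:
    g = open_graph("archive")
except FileNotFoundError:
    g = create_graph("archive")

print(g)  # graph:archive
```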
Limitations
- Cross-graph queries are read-only for the attached graphs
- The FROM clause only applies to MATCH patterns
- Graph names must be valid SQL identifiers (alphanumeric, underscores)
- Maximum of ~10 attached databases (SQLite limit)
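Since graph names must be valid SQL identifiers, a small pre-flight check can catch bad names early. This is a sketch; the exact regex is an assumption based on the alphanumeric/underscore rule above:

```python
import re

def is_valid_graph_name(name: str) -> bool:
    """Check a graph name against the identifier rule above."""
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name) is not None

print(is_valid_graph_name("social"))    # True
print(is_valid_graph_name("my-graph"))  # False: hyphens are not allowed
```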
Use the gqlite CLI
The gqlite command-line tool provides an interactive shell for executing Cypher queries against a SQLite database.
Building
angreal build app
# or
make graphqlite
This creates build/gqlite.
Usage
# Interactive mode with default database (graphqlite.db)
./build/gqlite
# Specify a database file
./build/gqlite mydata.db
# Initialize a fresh database
./build/gqlite -i mydata.db
# Verbose mode (shows query execution details)
./build/gqlite -v mydata.db
Interactive Shell
When you start gqlite, you'll see an interactive prompt:
GraphQLite Interactive Shell
Type .help for help, .quit to exit
Queries must end with semicolon (;)
graphqlite>
Statement Termination
All Cypher queries must end with a semicolon (;). Multi-line statements are supported:
graphqlite> CREATE (a:Person {name: "Alice"});
Query executed successfully
Nodes created: 1
Properties set: 1
graphqlite> MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
...> CREATE (a)-[:KNOWS]->(b);
Query executed successfully
Relationships created: 1
The ...> prompt indicates you're continuing a multi-line statement.
Dot Commands
| Command | Description |
|---|---|
.help | Show help information |
.schema | Display database schema |
.tables | List all tables |
.stats | Show database statistics |
.quit | Exit the shell |
Script Execution
You can pipe Cypher scripts to gqlite:
# Execute a script file
./build/gqlite mydata.db < script.cypher
# Inline script
echo 'CREATE (n:Test {value: 42});
MATCH (n:Test) RETURN n.value;' | ./build/gqlite mydata.db
Script Format
Scripts should have one statement per line or use multi-line statements ending with semicolons:
-- setup.cypher
CREATE (a:Person {name: "Alice", age: 30});
CREATE (b:Person {name: "Bob", age: 25});
CREATE (c:Person {name: "Charlie", age: 35});
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CREATE (a)-[:KNOWS]->(b);
MATCH (b:Person {name: "Bob"}), (c:Person {name: "Charlie"})
CREATE (b)-[:KNOWS]->(c);
-- Query friend-of-friend
MATCH (a:Person {name: "Alice"})-[:KNOWS]->()-[:KNOWS]->(fof)
RETURN fof.name;
Examples
Create and Query a Social Network
$ ./build/gqlite social.db
graphqlite> CREATE (alice:Person {name: "Alice"});
Query executed successfully
Nodes created: 1
Properties set: 1
graphqlite> CREATE (bob:Person {name: "Bob"});
Query executed successfully
Nodes created: 1
Properties set: 1
graphqlite> MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
...> CREATE (a)-[:FRIENDS_WITH]->(b);
Query executed successfully
Relationships created: 1
graphqlite> MATCH (p:Person)-[:FRIENDS_WITH]->(friend)
...> RETURN p.name, friend.name;
p.name friend.name
---------------
Alice Bob
graphqlite> .quit
Goodbye!
Check Database Statistics
graphqlite> .stats
Database Statistics:
===================
Nodes : 2
Edges : 1
Node Labels : 2
Property Keys : 1
Edge Types : FRIENDS_WITH
Command Line Options
| Option | Description |
|---|---|
-h, --help | Show help message |
-v, --verbose | Enable verbose debug output |
-i, --init | Initialize new database (overwrites existing) |
Use Graph Algorithms
GraphQLite includes 15+ built-in graph algorithms. This guide shows how to use them effectively.
Using Algorithms with the Graph API
The Graph class provides direct methods for common algorithms:
from graphqlite import Graph
g = Graph("my_graph.db")
# Centrality
pagerank = g.pagerank(damping=0.85, iterations=20)
degree = g.degree_centrality()
# Community detection
communities = g.community_detection(iterations=10)
# Path finding
path = g.shortest_path("alice", "bob")
# Components
components = g.connected_components()
Using Algorithms with Cypher
For algorithms not exposed directly, use the RETURN clause:
# Betweenness centrality
results = g.query("RETURN betweennessCentrality()")
# Louvain community detection
results = g.query("RETURN louvain(1.0)")
# A* pathfinding
results = g.query("RETURN astar('start', 'end', 'lat', 'lon')")
Working with Results
All algorithms return JSON results that are parsed into Python dictionaries:
pagerank = g.pagerank()
# Results are a list of dicts
for node in pagerank:
print(f"Node {node['user_id']}: score {node['score']}")
# Sort by score
top_nodes = sorted(pagerank, key=lambda x: x['score'], reverse=True)[:10]
# Filter
high_scores = [n for n in pagerank if n['score'] > 0.1]
Using Results in SQL
When using raw SQL, extract values with json_each and json_extract:
-- Get top 10 PageRank nodes
SELECT
json_extract(value, '$.node_id') as id,
json_extract(value, '$.score') as score
FROM json_each(cypher('RETURN pageRank()'))
ORDER BY score DESC
LIMIT 10;
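The same json_each/json_extract pattern can be tried with plain sqlite3 and a hand-written JSON payload shaped like the algorithm output above (no extension required; the sample scores are made up):

```python
import json
import sqlite3

# Sample payload shaped like the PageRank output (scores are illustrative).
sample = json.dumps([
    {"node_id": 1, "score": 0.42},
    {"node_id": 2, "score": 0.17},
])

conn = sqlite3.connect(":memory:")
rows = conn.execute(
    """
    SELECT json_extract(value, '$.node_id') AS id,
           json_extract(value, '$.score') AS score
    FROM json_each(?)
    ORDER BY score DESC
    """,
    (sample,),
).fetchall()
print(rows)  # [(1, 0.42), (2, 0.17)]
```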
Algorithm Parameters
PageRank
g.pagerank(damping=0.85, iterations=20)
- damping: Probability of following a link (default: 0.85)
- iterations: Number of iterations (default: 20)
Label Propagation
g.community_detection(iterations=10)
- iterations: Maximum iterations before stopping (default: 10)
Shortest Path
g.shortest_path(source_id, target_id)
Returns {"distance": int, "path": [node_ids], "found": bool}. When found is false, distance is None and path is empty.
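A defensive way to consume that result shape (the dict literal below stands in for a real `g.shortest_path` call):

```python
# Stand-in for g.shortest_path("alice", "bob") when no path exists;
# the shape mirrors the return value documented above.
result = {"found": False, "distance": None, "path": []}

if result["found"]:
    print(" -> ".join(str(n) for n in result["path"]))
else:
    print("no path found")
```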
A* Pathfinding
SELECT cypher('RETURN astar("start", "end", "lat", "lon")');
Requires nodes to have coordinate properties for the heuristic.
Performance Considerations
- Small graphs (<10K nodes): All algorithms run instantly
- Medium graphs (10K-100K nodes): Most algorithms complete in under a second
- Large graphs (>100K nodes): Some algorithms (PageRank, community detection) may take several seconds
For large graphs, consider:
- Running algorithms in a background thread
- Caching results if the graph doesn't change frequently
- Using approximate algorithms for real-time queries
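The background-thread suggestion can be sketched as follows; `compute_pagerank` is a stand-in for `g.pagerank()` and the cache is a plain dict:

```python
import threading

results_cache = {}

def compute_pagerank():
    # Stand-in for g.pagerank(); returns an illustrative result.
    return [{"node_id": 1, "score": 1.0}]

def refresh():
    results_cache["pagerank"] = compute_pagerank()

# Run the expensive algorithm off the main thread so queries stay responsive.
worker = threading.Thread(target=refresh)
worker.start()
worker.join()

print(results_cache["pagerank"][0]["score"])  # 1.0
```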
Handle Special Characters
This guide explains how to handle special characters in property values to avoid query issues.
The Problem
Property values containing certain characters can break Cypher query parsing:
# This will cause issues
g.query("CREATE (n:Note {text: 'Line1\nLine2'})")
Characters that need special handling:
- Newlines (\n)
- Carriage returns (\r)
- Tabs (\t)
- Single quotes (')
- Backslashes (\)
Solution 1: Use Parameterized Queries (Recommended)
The safest approach is to use parameterized queries via the Connection.cypher() method:
g.connection.cypher(
"CREATE (n:Note {text: $text})",
{"text": "Line1\nLine2"}
)
Parameters are properly escaped automatically.
Solution 2: Use the Graph API
The high-level Graph API handles escaping for you:
g.upsert_node("note1", {"text": "Line1\nLine2"}, label="Note")
Solution 3: Manual Escaping
If you must build queries manually, escape problematic characters:
def escape_for_cypher(value: str) -> str:
"""Escape a string for use in Cypher property values."""
return (value
.replace("\\", "\\\\") # Backslashes first
.replace("'", "\\'") # Single quotes
.replace("\n", " ") # Newlines
.replace("\r", " ") # Carriage returns
.replace("\t", " ")) # Tabs
text = escape_for_cypher("Line1\nLine2")
g.query(f"CREATE (n:Note {{text: '{text}'}})")
Common Symptoms
Nodes exist but MATCH returns nothing
Symptom: You insert nodes and can verify they exist with raw SQL (SELECT * FROM nodes), but MATCH (n) RETURN n returns empty results.
Cause: Newlines or other control characters in property values break the query.
Solution: Use parameterized queries or escape the values.
Query syntax errors
Symptom: SyntaxError when creating nodes with text content.
Cause: Unescaped single quotes in the value.
Solution: Escape quotes or use parameters:
# Wrong
g.query("CREATE (n:Quote {text: 'It's a test'})")
# Right - escape the quote
g.query("CREATE (n:Quote {text: 'It\\'s a test'})")
# Better - use parameters
g.connection.cypher("CREATE (n:Quote {text: $text})", {"text": "It's a test"})
Best Practices
- Always use parameterized queries for user-provided data
- Use the Graph API for simple CRUD operations
- Validate input before storing if using raw queries
- Consider replacing control characters with spaces or removing them entirely if they're not meaningful
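The last practice (replacing control characters) might look like this; the regex range is an assumption covering the ASCII control characters:

```python
import re

def strip_control_chars(value: str) -> str:
    """Replace ASCII control characters (0x00-0x1f) with single spaces."""
    return re.sub(r"[\x00-\x1f]", " ", value)

print(strip_control_chars("Line1\nLine2\tEnd"))  # 'Line1 Line2 End'
```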
Use with Other Extensions
GraphQLite works alongside other SQLite extensions. This guide shows how to combine them.
Loading Multiple Extensions
Method 1: Use graphqlite.load()
import sqlite3
import graphqlite
conn = sqlite3.connect("combined.db")
graphqlite.load(conn)
# Now load other extensions
conn.enable_load_extension(True)
conn.load_extension("other_extension")
conn.enable_load_extension(False)
Method 2: Access Underlying Connection
import graphqlite
import sqlite_vec # Example: vector search extension
db = graphqlite.connect("combined.db")
sqlite_vec.load(db.sqlite_connection) # Access underlying sqlite3.Connection
Example: GraphQLite + sqlite-vec
Combine graph queries with vector similarity search:
import sqlite3
import graphqlite
import sqlite_vec
# Create connection and load both extensions
conn = sqlite3.connect("knowledge.db")
graphqlite.load(conn)
sqlite_vec.load(conn)
# Create graph nodes
conn.execute("SELECT cypher('CREATE (n:Document {id: \"doc1\", title: \"Introduction\"})')")
# Store embeddings in a vector table
conn.execute("""
CREATE VIRTUAL TABLE IF NOT EXISTS embeddings USING vec0(
doc_id TEXT PRIMARY KEY,
embedding FLOAT[384]
)
""")
# Query: find similar documents, then get their graph neighbors
# (query_embedding is a 384-dimension vector from your embedding model)
similar_docs = conn.execute("""
SELECT doc_id FROM embeddings
WHERE embedding MATCH ?
LIMIT 5
""", [query_embedding]).fetchall()
for (doc_id,) in similar_docs:
# Get related nodes from graph
related = conn.execute(f"""
SELECT cypher('
MATCH (d:Document {{id: "{doc_id}"}})-[:RELATED_TO]->(other)
RETURN other.title
')
""").fetchall()
In-Memory Database Considerations
In-memory databases are connection-specific. All extensions must share the same connection:
# Correct: single connection, multiple extensions
conn = sqlite3.connect(":memory:")
graphqlite.load(conn)
other_extension.load(conn)
# Both extensions share the same in-memory database
# Wrong: separate connections don't share data
conn1 = sqlite3.connect(":memory:")
conn2 = sqlite3.connect(":memory:")
# conn1 and conn2 are completely separate databases!
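The pitfall can be demonstrated with plain sqlite3, no extension needed:

```python
import sqlite3

conn1 = sqlite3.connect(":memory:")
conn2 = sqlite3.connect(":memory:")
conn1.execute("CREATE TABLE t (x)")
conn1.execute("INSERT INTO t VALUES (1)")

# conn2 is a completely separate database and cannot see conn1's table.
try:
    conn2.execute("SELECT * FROM t")
except sqlite3.OperationalError as e:
    print("conn2:", e)  # no such table: t
```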
Extension Loading Order
Generally, load GraphQLite first, then other extensions. This ensures the graph schema is created before any dependent operations.
conn = sqlite3.connect("db.sqlite")
# 1. Load GraphQLite first
graphqlite.load(conn)
# 2. Load other extensions
conn.enable_load_extension(True)
conn.load_extension("extension2")
conn.load_extension("extension3")
conn.enable_load_extension(False)
Troubleshooting
Extension conflicts
If extensions conflict, try loading them in different orders or check for table name collisions.
Missing tables
Ensure GraphQLite is loaded before querying graph tables. The schema is created on first load.
Transaction issues
Some extensions may have different transaction semantics. If you encounter issues, try committing between operations:
graphqlite.load(conn)
conn.commit()
other_extension.load(conn)
conn.commit()
Parameterized Queries
Parameterized queries prevent SQL injection and properly handle special characters. This guide shows how to use them.
Basic Usage
Use $parameter syntax in Cypher and pass a dictionary of parameters to Connection.cypher():
from graphqlite import Graph
g = Graph(":memory:")
# Named parameters via the connection
results = g.connection.cypher(
"MATCH (n:Person {name: $name}) WHERE n.age > $age RETURN n",
{"name": "Alice", "age": 30}
)
With the Connection API
The Connection.cypher() method accepts parameters as a dictionary:
from graphqlite import connect
conn = connect(":memory:")
# Create with parameters
conn.cypher(
"CREATE (n:Person {name: $name, age: $age})",
{"name": "Bob", "age": 25}
)
# Query with parameters
results = conn.cypher(
"MATCH (n:Person) WHERE n.age >= $min_age RETURN n.name",
{"min_age": 21}
)
With Raw SQL
When using the SQLite interface directly:
SELECT cypher(
'MATCH (n:Person {name: $name}) RETURN n',
'{"name": "Alice"}'
);
Parameter Types
Parameters support all JSON types:
import json

params = json.dumps({
"string_val": "hello",
"int_val": 42,
"float_val": 3.14,
"bool_val": True,
"null_val": None,
"array_val": [1, 2, 3]
})
Use Cases
User Input
Always use parameters for user-provided values:
def search_by_name(user_input: str):
# Safe - user input is parameterized
return g.connection.cypher(
"MATCH (n:Person {name: $name}) RETURN n",
{"name": user_input}
)
Batch Operations
people = [
{"name": "Alice", "age": 30},
{"name": "Bob", "age": 25},
{"name": "Carol", "age": 35},
]
for person in people:
g.connection.cypher(
"CREATE (n:Person {name: $name, age: $age})",
person
)
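For larger batches, the per-row loop can often be collapsed into a single UNWIND statement with a list parameter (see the UNWIND clause reference). A sketch against the same connection API, with the actual call left commented:

```python
# Batch insert in one round trip using UNWIND with a list parameter.
people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Carol", "age": 35},
]
query = "UNWIND $people AS p CREATE (n:Person {name: p.name, age: p.age})"
# g.connection.cypher(query, {"people": people})  # one call for all rows
print(f"would create {len(people)} nodes")
```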
Complex Values
Parameters handle special characters automatically:
# This works correctly even with quotes and newlines
text = "He said \"hello\"\nand then left."
g.connection.cypher(
"CREATE (n:Note {content: $text})",
{"text": text}
)
Benefits
- Security: Prevents Cypher injection attacks
- Correctness: Properly handles quotes, newlines, and special characters
- Performance: Query plans can be cached (future optimization)
- Clarity: Separates query logic from data
Common Patterns
Optional Parameters
def search(name: str = None, min_age: int = None):
conditions = []
params = {}
if name:
conditions.append("n.name = $name")
params["name"] = name
if min_age:
conditions.append("n.age >= $min_age")
params["min_age"] = min_age
where = f"WHERE {' AND '.join(conditions)}" if conditions else ""
return g.connection.cypher(
f"MATCH (n:Person) {where} RETURN n",
params if params else None
)
Lists in Parameters
names = ["Alice", "Bob", "Carol"]
results = g.connection.cypher(
"MATCH (n:Person) WHERE n.name IN $names RETURN n",
{"names": names}
)
Cypher Support
GraphQLite implements a substantial subset of the Cypher query language.
Overview
Cypher is a declarative graph query language originally developed by Neo4j. GraphQLite supports the core features needed for most graph operations.
Quick Reference
| Feature | Support |
|---|---|
| Node patterns | ✅ Full |
| Relationship patterns | ✅ Full |
| Variable-length paths | ✅ Full |
| shortestPath/allShortestPaths | ✅ Full |
| Parameterized queries | ✅ Full |
| MATCH/OPTIONAL MATCH | ✅ Full |
| CREATE/MERGE | ✅ Full |
| SET/REMOVE/DELETE | ✅ Full |
| WITH/UNWIND/FOREACH | ✅ Full |
| LOAD CSV | ✅ Full |
| UNION/UNION ALL | ✅ Full |
| RETURN with modifiers | ✅ Full |
| Aggregation functions | ✅ Full |
| CASE expressions | ✅ Full |
| List comprehensions | ✅ Full |
| Pattern comprehensions | ✅ Full |
| Map projections | ✅ Full |
| Multi-graph (FROM clause) | ✅ Full |
| Graph algorithms | ✅ 15+ built-in |
| CALL procedures | ❌ Not supported |
| CREATE INDEX/CONSTRAINT | ❌ Use SQLite |
Pattern Syntax
Nodes
(n) -- Any node
(n:Person) -- Node with label
(n:Person {name: 'Alice'}) -- Node with properties
(:Person) -- Anonymous node with label
Relationships
-[r]-> -- Outgoing relationship
<-[r]- -- Incoming relationship
-[r]- -- Either direction
-[:KNOWS]-> -- Relationship with type
-[r:KNOWS {since: 2020}]-> -- With properties
Variable-Length Paths
-[*]-> -- Any length
-[*2]-> -- Exactly 2 hops
-[*1..3]-> -- 1 to 3 hops
-[:KNOWS*1..5]-> -- Typed, 1 to 5 hops
Clauses
See Clauses Reference for detailed documentation.
Functions
See Functions Reference for the complete function list.
Operators
See Operators Reference for comparison and logical operators.
Implementation Notes
GraphQLite implements standard Cypher with some differences from full implementations:
- No CALL procedures - Use built-in graph algorithm functions instead (e.g., RETURN pageRank())
- No CREATE INDEX/CONSTRAINT - Use SQLite's indexing and constraint mechanisms directly
- EXPLAIN supported - Returns the generated SQL for debugging instead of a query plan
- Multi-graph support - Use the FROM clause to query specific graphs with GraphManager
- Substring indexing - Uses 0-based indexing (Cypher standard), automatically converted for SQLite
Cypher Clauses
Reading Clauses
MATCH
Find patterns in the graph:
MATCH (n:Person) RETURN n
MATCH (a)-[:KNOWS]->(b) RETURN a, b
MATCH (n:Person {name: 'Alice'}) RETURN n
Shortest Path Patterns
Find shortest paths between nodes:
// Find a single shortest path
MATCH p = shortestPath((a:Person {name: 'Alice'})-[*]-(b:Person {name: 'Bob'}))
RETURN p, length(p)
// Find all shortest paths (all paths with minimum length)
MATCH p = allShortestPaths((a:Person)-[*]-(b:Person))
WHERE a.name = 'Alice' AND b.name = 'Bob'
RETURN p
// With relationship type filter
MATCH p = shortestPath((a)-[:KNOWS*]->(b))
RETURN nodes(p), relationships(p)
// With length constraints
MATCH p = shortestPath((a)-[*..10]->(b))
RETURN p
OPTIONAL MATCH
Like MATCH, but returns NULL for non-matches (left join semantics):
MATCH (p:Person)
OPTIONAL MATCH (p)-[:MANAGES]->(e)
RETURN p.name, e.name
WHERE
Filter results:
MATCH (n:Person)
WHERE n.age > 21 AND n.city = 'NYC'
RETURN n
Writing Clauses
CREATE
Create nodes and relationships:
CREATE (n:Person {name: 'Alice', age: 30})
CREATE (a)-[:KNOWS {since: 2020}]->(b)
MERGE
Create if not exists, match if exists:
MERGE (n:Person {name: 'Alice'})
ON CREATE SET n.created = timestamp()
ON MATCH SET n.updated = timestamp()
SET
Update properties:
MATCH (n:Person {name: 'Alice'})
SET n.age = 31, n.city = 'LA'
Add labels:
MATCH (n:Person {name: 'Alice'})
SET n:Employee
REMOVE
Remove properties:
MATCH (n:Person {name: 'Alice'})
REMOVE n.temporary_field
Remove labels:
MATCH (n:Person:Employee {name: 'Alice'})
REMOVE n:Employee
DELETE
Delete nodes (must have no relationships):
MATCH (n:Person {name: 'Alice'})
DELETE n
DETACH DELETE
Delete nodes and all their relationships:
MATCH (n:Person {name: 'Alice'})
DETACH DELETE n
Composing Clauses
WITH
Chain query parts, aggregation, and filtering:
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
WITH c, count(p) as employee_count
WHERE employee_count > 10
RETURN c.name, employee_count
UNWIND
Expand a list into rows:
UNWIND [1, 2, 3] AS x
RETURN x
UNWIND $names AS name
CREATE (n:Person {name: name})
FOREACH
Iterate and perform updates:
MATCH p = (start)-[*]->(end)
FOREACH (n IN nodes(p) | SET n.visited = true)
LOAD CSV
Import data from CSV files:
// With headers (access columns by name)
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
CREATE (n:Person {name: row.name, age: toInteger(row.age)})
// Without headers (access columns by index)
LOAD CSV FROM 'file:///data.csv' AS row
CREATE (n:Item {id: row[0], value: row[1]})
// Custom field terminator
LOAD CSV WITH HEADERS FROM 'file:///data.tsv' AS row FIELDTERMINATOR '\t'
CREATE (n:Record {field1: row.col1})
Note: File paths are relative to the current working directory. Use file:/// prefix for local files.
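A quick way to produce a headers CSV matching the first example, using only the standard library (the path is a temporary directory; the columns mirror the example above):

```python
import csv
import os
import tempfile

# Write a people.csv with the header row the LOAD CSV example expects.
path = os.path.join(tempfile.mkdtemp(), "people.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age"])
    writer.writeheader()
    writer.writerow({"name": "Alice", "age": 30})
    writer.writerow({"name": "Bob", "age": 25})

# Reference it in LOAD CSV via the file:/// prefix plus this path.
print(open(path).read().splitlines()[0])  # name,age
```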
Multi-Graph Queries
FROM Clause
Query specific graphs when using GraphManager (multi-graph support):
// Query a specific graph
MATCH (n:Person) FROM social
RETURN n.name
// Combined with other clauses
MATCH (p:Person) FROM social
WHERE p.age > 21
RETURN p.name, graph(p) AS source_graph
The graph() function returns which graph a node came from.
Combining Results
UNION
Combine results from multiple queries, removing duplicates:
MATCH (n:Person) WHERE n.city = 'NYC' RETURN n.name
UNION
MATCH (n:Person) WHERE n.age > 50 RETURN n.name
UNION ALL
Combine results keeping all rows (including duplicates):
MATCH (a:Person)-[:KNOWS]->(b) RETURN b.name AS connection
UNION ALL
MATCH (a:Person)-[:WORKS_WITH]->(b) RETURN b.name AS connection
Return Clause
RETURN
Specify what to return:
MATCH (n:Person) RETURN n
MATCH (n:Person) RETURN n.name, n.age
MATCH (n:Person) RETURN n.name AS name
DISTINCT
Remove duplicates:
MATCH (n:Person)-[:KNOWS]->(m)
RETURN DISTINCT m.city
ORDER BY
Sort results:
MATCH (n:Person)
RETURN n.name, n.age
ORDER BY n.age DESC, n.name ASC
SKIP and LIMIT
Pagination:
MATCH (n:Person)
RETURN n
ORDER BY n.name
SKIP 10
LIMIT 5
Aggregation
Use aggregate functions in RETURN or WITH:
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN c.name, count(p), avg(p.salary), collect(p.name)
See Functions Reference for all aggregate functions.
Cypher Functions
String Functions
| Function | Description | Example |
|---|---|---|
toLower(s) | Convert to lowercase | toLower('Hello') → 'hello' |
toUpper(s) | Convert to uppercase | toUpper('Hello') → 'HELLO' |
trim(s) | Remove leading/trailing whitespace | trim(' hi ') → 'hi' |
ltrim(s) | Remove leading whitespace | ltrim(' hi') → 'hi' |
rtrim(s) | Remove trailing whitespace | rtrim('hi ') → 'hi' |
replace(s, from, to) | Replace occurrences | replace('hello', 'l', 'x') → 'hexxo' |
substring(s, start, len) | Extract substring | substring('hello', 1, 3) → 'ell' |
left(s, n) | First n characters | left('hello', 2) → 'he' |
right(s, n) | Last n characters | right('hello', 2) → 'lo' |
split(s, delim) | Split into list | split('a,b,c', ',') → ['a','b','c'] |
reverse(s) | Reverse string | reverse('hello') → 'olleh' |
length(s) | String length | length('hello') → 5 |
size(s) | String length (alias) | size('hello') → 5 |
toString(x) | Convert to string | toString(123) → '123' |
String Predicates
| Function | Description | Example |
|---|---|---|
startsWith(s, prefix) | Check prefix | startsWith('hello', 'he') → true |
endsWith(s, suffix) | Check suffix | endsWith('hello', 'lo') → true |
contains(s, sub) | Check substring | contains('hello', 'ell') → true |
Math Functions
| Function | Description | Example |
|---|---|---|
abs(n) | Absolute value | abs(-5) → 5 |
ceil(n) | Round up | ceil(2.3) → 3 |
floor(n) | Round down | floor(2.7) → 2 |
round(n) | Round to nearest | round(2.5) → 3 |
sign(n) | Sign (-1, 0, 1) | sign(-5) → -1 |
sqrt(n) | Square root | sqrt(16) → 4 |
log(n) | Natural logarithm | log(e()) → 1 |
log10(n) | Base-10 logarithm | log10(100) → 2 |
exp(n) | e^n | exp(1) → 2.718... |
rand() | Random 0-1 | rand() → 0.42... |
random() | Random 0-1 (alias) | random() → 0.42... |
pi() | π constant | pi() → 3.14159... |
e() | e constant | e() → 2.71828... |
Trigonometric Functions
| Function | Description |
|---|---|
sin(n) | Sine |
cos(n) | Cosine |
tan(n) | Tangent |
asin(n) | Arc sine |
acos(n) | Arc cosine |
atan(n) | Arc tangent |
List Functions
| Function | Description | Example |
|---|---|---|
head(list) | First element | head([1,2,3]) → 1 |
tail(list) | All but first | tail([1,2,3]) → [2,3] |
last(list) | Last element | last([1,2,3]) → 3 |
size(list) | Length | size([1,2,3]) → 3 |
range(start, end) | Create range | range(1, 5) → [1,2,3,4,5] |
reverse(list) | Reverse list | reverse([1,2,3]) → [3,2,1] |
keys(map) | Get map keys | keys({a:1, b:2}) → ['a','b'] |
Aggregate Functions
| Function | Description | Example |
|---|---|---|
count(x) | Count items | count(n), count(*) |
sum(x) | Sum values | sum(n.amount) |
avg(x) | Average | avg(n.score) |
min(x) | Minimum | min(n.age) |
max(x) | Maximum | max(n.age) |
collect(x) | Collect into list | collect(n.name) |
Entity Functions
| Function | Description | Example |
|---|---|---|
id(node) | Get node/edge ID | id(n) |
labels(node) | Get node labels | labels(n) → ['Person'] |
type(rel) | Get relationship type | type(r) → 'KNOWS' |
properties(x) | Get all properties | properties(n) |
startNode(rel) | Start node of relationship | startNode(r) |
endNode(rel) | End node of relationship | endNode(r) |
Path Functions
| Function | Description | Example |
|---|---|---|
nodes(path) | Get all nodes in path | nodes(p) |
relationships(path) | Get all relationships | relationships(p) |
rels(path) | Get all relationships (alias) | rels(p) |
length(path) | Path length (edges) | length(p) |
Type Conversion
| Function | Description | Example |
|---|---|---|
toInteger(x) | Convert to integer | toInteger('42') → 42 |
toFloat(x) | Convert to float | toFloat('3.14') → 3.14 |
toBoolean(x) | Convert to boolean | toBoolean('true') → true |
coalesce(x, y, ...) | First non-null value | coalesce(n.name, 'Unknown') |
Temporal Functions
| Function | Description | Example |
|---|---|---|
date() | Current date | date() → '2025-01-15' |
datetime() | Current datetime | datetime() |
time() | Current time | time() |
timestamp() | Unix timestamp (ms) | timestamp() |
localdatetime() | Local datetime | localdatetime() |
randomUUID() | Generate random UUID | randomUUID() → '550e8400-e29b-...' |
Predicate Functions
| Function | Description | Example |
|---|---|---|
exists(pattern) | Pattern exists | EXISTS { (n)-[:KNOWS]->() } |
exists(prop) | Property exists | exists(n.email) |
all(x IN list WHERE pred) | All match | all(x IN [1,2,3] WHERE x > 0) |
any(x IN list WHERE pred) | Any match | any(x IN [1,2,3] WHERE x > 2) |
none(x IN list WHERE pred) | None match | none(x IN [1,2,3] WHERE x < 0) |
single(x IN list WHERE pred) | Exactly one | single(x IN [1,2,3] WHERE x = 2) |
Reduce
| Function | Description | Example |
|---|---|---|
reduce(acc = init, x IN list | expr) | Fold/reduce | reduce(s = 0, x IN [1,2,3] | s + x) → 6 |
CASE Expressions
Searched CASE
Evaluates conditions in order and returns the first matching result:
RETURN CASE
WHEN n.age < 18 THEN 'minor'
WHEN n.age < 65 THEN 'adult'
ELSE 'senior'
END AS category
Simple CASE
Compares an expression against values:
RETURN CASE n.status
WHEN 'A' THEN 'Active'
WHEN 'I' THEN 'Inactive'
WHEN 'P' THEN 'Pending'
ELSE 'Unknown'
END AS status_name
Comprehensions
List Comprehension
Create lists by transforming or filtering:
// Transform each element
RETURN [x IN range(1, 5) | x * 2]
// → [2, 4, 6, 8, 10]
// Filter elements
RETURN [x IN range(1, 10) WHERE x % 2 = 0]
// → [2, 4, 6, 8, 10]
// Filter and transform
RETURN [x IN range(1, 10) WHERE x % 2 = 0 | x * x]
// → [4, 16, 36, 64, 100]
Pattern Comprehension
Extract data from pattern matches within an expression:
// Collect names of friends
MATCH (p:Person)
RETURN p.name, [(p)-[:KNOWS]->(friend) | friend.name] AS friends
// With filtering
RETURN [(p)-[:KNOWS]->(f:Person) WHERE f.age > 21 | f.name] AS adult_friends
Map Projection
Create maps by selecting properties from nodes:
// Select specific properties
MATCH (n:Person)
RETURN n {.name, .age}
// → {name: "Alice", age: 30}
// Include computed values
MATCH (n:Person)
RETURN n {.name, status: 'active', upperName: toUpper(n.name)}
Cypher Operators
Comparison Operators
| Operator | Description | Example |
|---|---|---|
= | Equal | n.age = 30 |
<> | Not equal | n.status <> 'deleted' |
< | Less than | n.age < 18 |
> | Greater than | n.age > 65 |
<= | Less than or equal | n.score <= 100 |
>= | Greater than or equal | n.score >= 0 |
Boolean Operators
| Operator | Description | Example |
|---|---|---|
AND | Logical and | n.age > 18 AND n.active = true |
OR | Logical or | n.role = 'admin' OR n.role = 'mod' |
NOT | Logical not | NOT n.deleted |
XOR | Exclusive or | a.flag XOR b.flag |
Null Operators
| Operator | Description | Example |
|---|---|---|
IS NULL | Check for null | n.email IS NULL |
IS NOT NULL | Check for non-null | n.email IS NOT NULL |
String Operators
| Operator | Description | Example |
|---|---|---|
STARTS WITH | Prefix match | n.name STARTS WITH 'A' |
ENDS WITH | Suffix match | n.email ENDS WITH '.com' |
CONTAINS | Substring match | n.bio CONTAINS 'developer' |
=~ | Regex match | n.email =~ '.*@gmail\\.com' |
List Operators
| Operator | Description | Example |
|---|---|---|
IN | List membership | n.status IN ['active', 'pending'] |
+ | List concatenation | [1, 2] + [3, 4] → [1, 2, 3, 4] |
[index] | Index access | list[0] (first element) |
Arithmetic Operators
| Operator | Description | Example |
|---|---|---|
+ | Addition | n.price + tax |
- | Subtraction | n.total - discount |
* | Multiplication | n.quantity * n.price |
/ | Division | n.total / n.count |
% | Modulo | n.id % 10 |
String Concatenation
| Operator | Description | Example |
|---|---|---|
+ | Concatenate strings | n.first + ' ' + n.last |
Property Access
| Operator | Description | Example |
|---|---|---|
. | Property access | n.name |
Operator Precedence
From highest to lowest:
1. . [] - Property/index access
2. * / % - Multiplication, division, modulo
3. + - - Addition, subtraction
4. = <> < > <= >= - Comparison
5. IS NULL IS NOT NULL IN STARTS WITH ENDS WITH CONTAINS =~
6. NOT
7. AND
8. XOR
9. OR
Use parentheses to override precedence:
WHERE (n.age > 18 OR n.verified) AND n.active
Graph Algorithms
GraphQLite includes 15+ built-in graph algorithms.
Centrality Algorithms
PageRank
Measures node importance based on incoming links from important nodes.
RETURN pageRank()
RETURN pageRank(0.85, 20) -- damping, iterations
Returns: [{"node_id": int, "user_id": string, "score": float}, ...]
Parameters:
- damping (default: 0.85) - Probability of following a link
- iterations (default: 20) - Number of iterations
Degree Centrality
Counts incoming and outgoing connections.
RETURN degreeCentrality()
Returns: [{"node_id": int, "user_id": string, "in_degree": int, "out_degree": int, "degree": int}, ...]
Betweenness Centrality
Measures how often a node lies on shortest paths between other nodes.
RETURN betweennessCentrality()
Returns: [{"node_id": int, "user_id": string, "score": float}, ...]
Closeness Centrality
Measures average distance to all other nodes.
RETURN closenessCentrality()
Returns: [{"node_id": int, "user_id": string, "score": float}, ...]
Eigenvector Centrality
Measures influence based on connections to high-scoring nodes.
RETURN eigenvectorCentrality()
RETURN eigenvectorCentrality(100) -- max iterations
Returns: [{"node_id": int, "user_id": string, "score": float}, ...]
Community Detection
Label Propagation
Detects communities by propagating labels through the network.
RETURN labelPropagation()
RETURN labelPropagation(10) -- max iterations
RETURN communities() -- alias
Returns: [{"node_id": int, "user_id": string, "community": int}, ...]
Louvain
Hierarchical community detection optimizing modularity.
RETURN louvain()
RETURN louvain(1.0) -- resolution parameter
Returns: [{"node_id": int, "user_id": string, "community": int}, ...]
Connected Components
Weakly Connected Components (WCC)
Groups nodes reachable by ignoring edge direction.
RETURN wcc()
Returns: [{"node_id": int, "user_id": string, "component": int}, ...]
Strongly Connected Components (SCC)
Groups nodes where every node can reach every other node following edge direction.
RETURN scc()
Returns: [{"node_id": int, "user_id": string, "component": int}, ...]
Path Finding
Dijkstra (Shortest Path)
Finds shortest path between two nodes.
RETURN dijkstra('source_id', 'target_id')
Returns: {"found": bool, "distance": int, "path": [node_ids]}
The found field indicates whether a path exists. When found is false, distance is null and path is empty.
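In application code this means checking found before touching the other fields; a minimal sketch against the documented shape (the result dict is hand-written for illustration, not produced by a live query):

```python
# Hand-written result in the documented dijkstra() shape (no path case)
result = {"found": False, "distance": None, "path": []}

if result["found"]:
    print(f"Shortest distance: {result['distance']} via {result['path']}")
else:
    print("No path between source and target")
```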
A* Search
Shortest path with heuristic. Can use geographic coordinates for distance estimation or fall back to uniform heuristic.
RETURN astar('source_id', 'target_id')
RETURN astar('source_id', 'target_id', 'lat_prop', 'lon_prop')
When lat_prop and lon_prop are provided, A* uses haversine distance as the heuristic. Without these properties, it behaves similarly to Dijkstra but may explore fewer nodes.
Returns: {"found": bool, "distance": float, "path": [node_ids], "nodes_explored": int}
All-Pairs Shortest Paths (APSP)
Computes shortest distances between all node pairs.
RETURN apsp()
Returns: [{"source": string, "target": string, "distance": int}, ...]
Note: O(n²) space and O(n³) time complexity (Floyd-Warshall). Use with caution on large graphs.
Traversal
Breadth-First Search (BFS)
Explores nodes level by level from a starting point.
RETURN bfs('start_id')
RETURN bfs('start_id', 3) -- max depth
Returns: [{"node_id": int, "user_id": string, "depth": int, "order": int}, ...]
The order field indicates the traversal order (0 = starting node, then incrementing).
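The depth and order fields make it easy to regroup the flat result list by BFS level; a sketch over hand-written rows in the documented shape (the values are illustrative):

```python
from collections import defaultdict

# Hand-written rows in the documented bfs() result shape (illustrative values)
rows = [
    {"node_id": 1, "user_id": "alice", "depth": 0, "order": 0},
    {"node_id": 2, "user_id": "bob",   "depth": 1, "order": 1},
    {"node_id": 3, "user_id": "carol", "depth": 1, "order": 2},
]

# Group visited nodes by BFS level
levels = defaultdict(list)
for row in rows:
    levels[row["depth"]].append(row["user_id"])

print(dict(levels))  # {0: ['alice'], 1: ['bob', 'carol']}
```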
Depth-First Search (DFS)
Explores as far as possible along each branch.
RETURN dfs('start_id')
RETURN dfs('start_id', 5) -- max depth
Returns: [{"node_id": int, "user_id": string, "depth": int, "order": int}, ...]
Similarity
Node Similarity (Jaccard)
Computes Jaccard similarity between node neighborhoods.
RETURN nodeSimilarity()
Returns: [{"node1": int, "node2": int, "similarity": float}, ...]
K-Nearest Neighbors (KNN)
Finds k most similar nodes to a given node based on Jaccard similarity of neighborhoods.
RETURN knn('node_id', 10) -- node, k
Returns: [{"neighbor": string, "similarity": float, "rank": int}, ...]
Results are ordered by similarity (highest first), with rank starting at 1.
Triangle Count
Counts triangles and computes clustering coefficient.
RETURN triangleCount()
Returns: [{"node_id": int, "user_id": string, "triangles": int, "clustering_coefficient": float}, ...]
Using Results in SQL
Extract algorithm results using SQLite JSON functions:
SELECT
json_extract(value, '$.node_id') as id,
json_extract(value, '$.score') as score
FROM json_each(cypher('RETURN pageRank()'))
ORDER BY score DESC
LIMIT 10;
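The json_each pattern itself can be tried without the extension by binding a literal JSON string where the cypher() call would go; a sketch using Python's built-in sqlite3 (the scores are made up, and the real query would call cypher() as shown above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Literal JSON standing in for the output of cypher('RETURN pageRank()')
sample = '[{"node_id": 1, "score": 0.4}, {"node_id": 2, "score": 0.6}]'

rows = conn.execute(
    """
    SELECT json_extract(value, '$.node_id') AS id,
           json_extract(value, '$.score') AS score
    FROM json_each(?)
    ORDER BY score DESC
    """,
    (sample,),
).fetchall()
print(rows)  # [(2, 0.6), (1, 0.4)]
```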
Python API Reference
Installation
pip install graphqlite
Module Functions
graphqlite.connect()
Create a connection to an SQLite database with GraphQLite loaded.
from graphqlite import connect
conn = connect(":memory:")
conn = connect("graph.db")
conn = connect("graph.db", extension_path="/path/to/graphqlite.dylib")
Parameters:
- `database` (str) - Database path or `:memory:`
- `extension_path` (str, optional) - Path to extension file
Returns: Connection
graphqlite.load()
Load GraphQLite into an existing sqlite3 connection.
import sqlite3
import graphqlite
conn = sqlite3.connect(":memory:")
graphqlite.load(conn)
Parameters:
- `conn` - sqlite3.Connection or apsw.Connection
- `entry_point` (str, optional) - Extension entry point
graphqlite.loadable_path()
Get the path to the loadable extension.
path = graphqlite.loadable_path()
Returns: str
graphqlite.wrap()
Wrap an existing sqlite3 connection with GraphQLite support.
import sqlite3
import graphqlite
conn = sqlite3.connect(":memory:")
wrapped = graphqlite.wrap(conn)
results = wrapped.cypher("RETURN 1 AS x")
Parameters:
- `conn` - sqlite3.Connection object
- `extension_path` (str, optional) - Path to extension file
Returns: Connection
graphqlite.graph()
Factory function to create a Graph instance.
from graphqlite import graph
g = graph(":memory:")
g = graph("graph.db", namespace="myapp")
Parameters:
- `db_path` (str) - Database path or `:memory:`
- `namespace` (str, optional) - Graph namespace (default: "default")
- `extension_path` (str, optional) - Path to extension file
Returns: Graph
CypherResult Class
Result container returned by cypher() queries.
results = conn.cypher("MATCH (n:Person) RETURN n.name, n.age")
# Length
print(len(results)) # Number of rows
# Indexing
first_row = results[0] # Get first row as dict
# Iteration
for row in results:
print(row["n.name"])
# Column names
print(results.columns) # ["n.name", "n.age"]
# Convert to list
all_rows = results.to_list() # List of dicts
Properties:
columns- List of column names
Methods:
to_list()- Return all rows as a list of dictionaries
Connection Class
Connection.cypher()
Execute a Cypher query with optional parameters.
conn.cypher("CREATE (n:Person {name: 'Alice'})")
results = conn.cypher("MATCH (n) RETURN n.name")
for row in results:
print(row["n.name"])
# With parameters
results = conn.cypher(
"MATCH (n:Person {name: $name}) RETURN n",
{"name": "Alice"}
)
The query parameter is the Cypher query string. The optional params parameter accepts a dictionary that will be converted to JSON for parameter binding.
Returns: CypherResult object (iterable, supports indexing and len())
Connection.execute()
Execute raw SQL.
conn.execute("SELECT * FROM nodes")
Graph Class
High-level API for graph operations.
Constructor
from graphqlite import Graph
g = Graph(":memory:")
g = Graph("graph.db")
Node Operations
upsert_node()
Create or update a node.
g.upsert_node("alice", {"name": "Alice", "age": 30}, label="Person")
Parameters:
- `node_id` (str) - Unique node identifier
- `properties` (dict) - Node properties
- `label` (str, optional) - Node label
get_node()
Get a node by ID.
node = g.get_node("alice")
# {"id": "alice", "label": "Person", "properties": {"name": "Alice", "age": 30}}
Returns: dict or None
has_node()
Check if a node exists.
exists = g.has_node("alice") # True
Returns: bool
delete_node()
Delete a node.
g.delete_node("alice")
get_all_nodes()
Get all nodes, optionally filtered by label.
all_nodes = g.get_all_nodes()
people = g.get_all_nodes(label="Person")
Returns: List of dicts
Edge Operations
upsert_edge()
Create or update an edge.
g.upsert_edge("alice", "bob", {"since": 2020}, rel_type="KNOWS")
Parameters:
- `source_id` (str) - Source node ID
- `target_id` (str) - Target node ID
- `properties` (dict) - Edge properties
- `rel_type` (str, optional) - Relationship type
get_edge()
Get an edge between two nodes.
edge = g.get_edge("alice", "bob")
Returns the first edge found between the source and target nodes, or None if no edge exists.
has_edge()
Check if an edge exists.
exists = g.has_edge("alice", "bob")
Returns: bool
delete_edge()
Delete an edge between two nodes.
g.delete_edge("alice", "bob")
get_all_edges()
Get all edges.
edges = g.get_all_edges()
Returns: List of dicts
Graph Operations
get_neighbors()
Get a node's neighbors (connected by edges in either direction).
neighbors = g.get_neighbors("alice")
Parameters:
- `node_id` (str) - Node ID
Returns: List of neighbor node dicts
node_degree()
Get a node's degree, which is the total number of edges connected to the node (both incoming and outgoing).
degree = g.node_degree("alice") # 5
Returns an integer count of connected edges.
stats()
Get graph statistics.
stats = g.stats()
# {"nodes": 100, "edges": 250}
Returns: dict
Query Methods
query()
Execute a Cypher query and return results as a list of dictionaries.
results = g.query("MATCH (n:Person) RETURN n.name")
for row in results:
print(row["n.name"])
This method is for queries that don't require parameters. For parameterized queries, access the underlying connection:
results = g.connection.cypher(
"MATCH (n:Person {name: $name}) RETURN n",
{"name": "Alice"}
)
Algorithm Methods
Centrality Algorithms
pagerank()
Compute PageRank scores for all nodes.
results = g.pagerank(damping=0.85, iterations=20)
# [{"node_id": "alice", "score": 0.25}, ...]
Parameters:
- `damping` (float, default: 0.85) - Damping factor
- `iterations` (int, default: 20) - Number of iterations
degree_centrality()
Compute in-degree, out-degree, and total degree for all nodes.
results = g.degree_centrality()
# [{"node_id": "alice", "in_degree": 2, "out_degree": 3, "degree": 5}, ...]
betweenness_centrality()
Compute betweenness centrality (how often a node lies on shortest paths).
results = g.betweenness_centrality()
# Alias: g.betweenness()
Returns: List of {"node_id": str, "score": float}
closeness_centrality()
Compute closeness centrality (average distance to all other nodes).
results = g.closeness_centrality()
# Alias: g.closeness()
Returns: List of {"node_id": str, "score": float}
eigenvector_centrality()
Compute eigenvector centrality (influence based on connections to high-scoring nodes).
results = g.eigenvector_centrality(iterations=100)
Parameters:
- `iterations` (int, default: 100) - Maximum iterations
Community Detection
community_detection()
Detect communities using label propagation.
results = g.community_detection(iterations=10)
# [{"node_id": "alice", "community": 1}, ...]
Parameters:
- `iterations` (int, default: 10) - Maximum iterations
louvain()
Detect communities using the Louvain algorithm (modularity optimization).
results = g.louvain(resolution=1.0)
Parameters:
- `resolution` (float, default: 1.0) - Higher values produce more communities
leiden_communities()
Detect communities using the Leiden algorithm.
results = g.leiden_communities(resolution=1.0, random_seed=42)
Parameters:
- `resolution` (float, default: 1.0) - Resolution parameter
- `random_seed` (int, optional) - Random seed for reproducibility
Requires: graspologic>=3.0 (pip install graspologic)
Connected Components
weakly_connected_components()
Find weakly connected components (ignoring edge direction).
results = g.weakly_connected_components()
# Aliases: g.connected_components(), g.wcc()
Returns: List of {"node_id": str, "component": int}
strongly_connected_components()
Find strongly connected components (respecting edge direction).
results = g.strongly_connected_components()
# Alias: g.scc()
Returns: List of {"node_id": str, "component": int}
Path Finding
shortest_path()
Find the shortest path between two nodes using Dijkstra's algorithm.
path = g.shortest_path("alice", "bob", weight_property="distance")
# {"distance": 2, "path": ["alice", "carol", "bob"], "found": True}
# Alias: g.dijkstra()
Parameters:
- `source_id` (str) - Starting node ID
- `target_id` (str) - Ending node ID
- `weight_property` (str, optional) - Edge property to use as weight
Returns: {"path": list, "distance": float|None, "found": bool}
astar()
Find the shortest path using A* algorithm with optional geographic heuristic.
path = g.astar("alice", "bob", lat_prop="latitude", lon_prop="longitude")
# Alias: g.a_star()
Parameters:
- `source_id` (str) - Starting node ID
- `target_id` (str) - Ending node ID
- `lat_prop` (str, optional) - Latitude property name for heuristic
- `lon_prop` (str, optional) - Longitude property name for heuristic
Returns: {"path": list, "distance": float|None, "found": bool, "nodes_explored": int}
all_pairs_shortest_path()
Compute shortest distances between all node pairs (Floyd-Warshall).
results = g.all_pairs_shortest_path()
# Alias: g.apsp()
Returns: List of {"source": str, "target": str, "distance": float}
Note: O(n³) time and O(n²) space complexity. Use with caution on large graphs.
Traversal
bfs()
Breadth-first search from a starting node.
results = g.bfs("alice", max_depth=3)
# Alias: g.breadth_first_search()
Parameters:
- `start_id` (str) - Starting node ID
- `max_depth` (int, default: -1) - Maximum depth (-1 for unlimited)
Returns: List of {"user_id": str, "depth": int, "order": int}
dfs()
Depth-first search from a starting node.
results = g.dfs("alice", max_depth=5)
# Alias: g.depth_first_search()
Parameters:
- `start_id` (str) - Starting node ID
- `max_depth` (int, default: -1) - Maximum depth (-1 for unlimited)
Returns: List of {"user_id": str, "depth": int, "order": int}
Similarity
node_similarity()
Compute Jaccard similarity between node neighborhoods.
# All pairs above threshold
results = g.node_similarity(threshold=0.5)
# Specific pair
results = g.node_similarity(node1_id="alice", node2_id="bob")
# Top-k most similar pairs
results = g.node_similarity(top_k=10)
Parameters:
- `node1_id` (str, optional) - First node ID
- `node2_id` (str, optional) - Second node ID
- `threshold` (float, default: 0.0) - Minimum similarity threshold
- `top_k` (int, default: 0) - Return only top-k pairs (0 for all)
Returns: List of {"node1": str, "node2": str, "similarity": float}
knn()
Find k-nearest neighbors for a node based on Jaccard similarity.
results = g.knn("alice", k=10)
Parameters:
- `node_id` (str) - Node to find neighbors for
- `k` (int, default: 10) - Number of neighbors to return
Returns: List of {"neighbor": str, "similarity": float, "rank": int}
triangle_count()
Count triangles and compute clustering coefficients.
results = g.triangle_count()
# Alias: g.triangles()
Returns: List of {"node_id": str, "triangles": int, "clustering_coefficient": float}
Export
to_rustworkx()
Export the graph to a rustworkx PyDiGraph for use with rustworkx algorithms.
graph, node_map = g.to_rustworkx()
Returns: Tuple of (rustworkx.PyDiGraph, dict mapping node IDs to indices)
Requires: rustworkx>=0.13 (pip install rustworkx)
Batch Operations
upsert_nodes_batch()
nodes = [
("alice", {"name": "Alice"}, "Person"),
("bob", {"name": "Bob"}, "Person"),
]
g.upsert_nodes_batch(nodes)
upsert_edges_batch()
edges = [
("alice", "bob", {"since": 2020}, "KNOWS"),
("bob", "carol", {"since": 2021}, "KNOWS"),
]
g.upsert_edges_batch(edges)
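Batch tuples can be built from any iterable; a sketch converting a plain dict into the (node_id, properties, label) shape expected by upsert_nodes_batch (the call itself is commented out because it needs a live Graph instance):

```python
people = {
    "alice": {"name": "Alice", "age": 30},
    "bob": {"name": "Bob", "age": 25},
}

# Build (node_id, properties, label) tuples for upsert_nodes_batch()
nodes = [(node_id, props, "Person") for node_id, props in people.items()]

# g = Graph(":memory:")
# g.upsert_nodes_batch(nodes)
print(nodes[0])  # ('alice', {'name': 'Alice', 'age': 30}, 'Person')
```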
GraphManager Class
Manages multiple graph databases in a directory with cross-graph query support.
Constructor
from graphqlite import graphs, GraphManager
# Using factory function (recommended)
gm = graphs("./data")
# Or direct instantiation
gm = GraphManager("./data")
gm = GraphManager("./data", extension_path="/path/to/graphqlite.dylib")
Context Manager
with graphs("./data") as gm:
# Work with graphs...
pass # All connections closed automatically
Graph Management
list()
List all graphs in the directory.
names = gm.list() # ["products", "social", "users"]
Returns: List of graph names (sorted)
exists()
Check if a graph exists.
if gm.exists("social"):
print("Graph exists")
Returns: bool
create()
Create a new graph.
g = gm.create("social")
Parameters:
- `name` (str) - Graph name
Returns: Graph instance
Raises: FileExistsError if graph already exists
open()
Open an existing graph.
g = gm.open("social")
Parameters:
- `name` (str) - Graph name
Returns: Graph instance
Raises: FileNotFoundError if graph doesn't exist
open_or_create()
Open a graph, creating it if it doesn't exist.
g = gm.open_or_create("cache")
Returns: Graph instance
drop()
Delete a graph and its database file.
gm.drop("old_graph")
Raises: FileNotFoundError if graph doesn't exist
Cross-Graph Queries
query()
Execute a Cypher query across multiple graphs.
result = gm.query(
"MATCH (n:Person) FROM social RETURN n.name, graph(n) AS source",
graphs=["social"]
)
for row in result:
print(f"{row['n.name']} from {row['source']}")
Parameters:
- `cypher` (str) - Cypher query with FROM clauses
- `graphs` (list) - Graph names to attach
- `params` (dict, optional) - Query parameters
Returns: CypherResult
query_sql()
Execute raw SQL across attached graphs.
result = gm.query_sql(
"SELECT COUNT(*) FROM social.nodes",
graphs=["social"]
)
Parameters:
- `sql` (str) - SQL query with graph-prefixed table names
- `graphs` (list) - Graph names to attach
- `parameters` (tuple, optional) - Query parameters
Returns: List of tuples
Collection Interface
# Length
len(gm) # Number of graphs
# Membership
"social" in gm # True/False
# Iteration
for name in gm:
print(name)
Utility Functions
escape_string()
Escape a string for use in Cypher.
from graphqlite import escape_string
safe = escape_string("It's a test")
sanitize_rel_type()
Sanitize a relationship type name.
from graphqlite import sanitize_rel_type
safe = sanitize_rel_type("has-friend") # "HAS_FRIEND"
CYPHER_RESERVED
A set of reserved Cypher keywords that need special handling in queries.
from graphqlite import CYPHER_RESERVED
if my_label.upper() in CYPHER_RESERVED:
my_label = f"`{my_label}`" # Quote reserved words
Contains keywords like: MATCH, CREATE, RETURN, WHERE, AND, OR, NOT, IN, AS, WITH, ORDER, BY, LIMIT, SKIP, DELETE, SET, REMOVE, MERGE, ON, CASE, WHEN, THEN, ELSE, END, TRUE, FALSE, NULL, etc.
Rust API Reference
Installation
Add to your Cargo.toml:
[dependencies]
graphqlite = "0.2"
Connection
Opening a Connection
use graphqlite::Connection;

// In-memory database
let conn = Connection::open_in_memory()?;

// File-based database
let conn = Connection::open("graph.db")?;

// With custom extension path
let conn = Connection::open_with_extension("graph.db", "/path/to/graphqlite.so")?;
Executing Cypher Queries
// Execute without results
conn.cypher("CREATE (n:Person {name: 'Alice'})")?;

// Execute with results
let rows = conn.cypher("MATCH (n:Person) RETURN n.name")?;
for row in rows {
    let name: String = row.get(0)?;
    println!("{}", name);
}
Parameterized Queries
For parameterized queries, embed parameters in the query string:
use serde_json::json;

let params = json!({"name": "Alice", "age": 30});
let query = format!(
    "CREATE (n:Person {{name: '{}', age: {}}})",
    params["name"].as_str().unwrap(),
    params["age"]
);
conn.cypher(&query)?;
Note: Direct parameter binding is planned for a future release.
Row Access
Access row values by column name using the get() method:
let results = conn.cypher("MATCH (n) RETURN n.name AS name, n.age AS age")?;
for row in &results {
    let name: String = row.get("name")?;
    let age: i32 = row.get("age")?;
    println!("{} is {} years old", name, age);
}
The column name must match the alias in your RETURN clause. Use AS to create readable column names.
Type Conversions
GraphQLite automatically converts between Cypher and Rust types:
| Cypher Type | Rust Type |
|---|---|
| Integer | i32, i64 |
| Float | f64 |
| String | String, &str |
| Boolean | bool |
| Null | Option<T> |
| List | Vec<T> |
| Map | serde_json::Value |
Error Handling
use graphqlite::{Connection, Error};

fn example() -> Result<(), Error> {
    let conn = Connection::open_in_memory()?;
    match conn.cypher("INVALID QUERY") {
        Ok(rows) => { /* process rows */ }
        Err(Error::Cypher(msg)) => {
            eprintln!("Cypher query error: {}", msg);
        }
        Err(Error::Sqlite(e)) => {
            eprintln!("SQLite error: {}", e);
        }
        Err(e) => {
            eprintln!("Other error: {}", e);
        }
    }
    Ok(())
}
Error Variants
The Error enum includes the following variants:
pub enum Error {
    Sqlite(rusqlite::Error),       // SQLite database errors
    Json(serde_json::Error),       // JSON parsing errors
    Cypher(String),                // Cypher query errors
    ExtensionNotFound(String),     // Extension file not found
    TypeError { expected: &'static str, actual: String }, // Type conversion errors
    ColumnNotFound(String),        // Column doesn't exist in result
    GraphExists(String),           // Graph already exists (GraphManager)
    GraphNotFound { name: String, available: Vec<String> }, // Graph not found
    Io(std::io::Error),            // File I/O errors
}
Complete Example
use graphqlite::Connection;

fn main() -> Result<(), graphqlite::Error> {
    // Open connection
    let conn = Connection::open_in_memory()?;

    // Create nodes
    conn.cypher("CREATE (a:Person {name: 'Alice', age: 30})")?;
    conn.cypher("CREATE (b:Person {name: 'Bob', age: 25})")?;

    // Create relationship
    conn.cypher("
        MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
        CREATE (a)-[:KNOWS {since: 2020}]->(b)
    ")?;

    // Query with aliases
    let results = conn.cypher("
        MATCH (a:Person)-[:KNOWS]->(b:Person)
        RETURN a.name AS from_person, b.name AS to_person
    ")?;
    for row in &results {
        let from: String = row.get("from_person")?;
        let to: String = row.get("to_person")?;
        println!("{} knows {}", from, to);
    }

    // Query with filter (embedding values directly)
    let min_age = 26;
    let results = conn.cypher(&format!(
        "MATCH (n:Person) WHERE n.age >= {} RETURN n.name AS name",
        min_age
    ))?;
    for row in &results {
        let name: String = row.get("name")?;
        println!("Adult: {}", name);
    }

    Ok(())
}
Graph Class
High-level API for graph operations, providing ergonomic methods for nodes, edges, and algorithms.
Creating a Graph
use graphqlite::Graph;

// In-memory graph
let g = Graph::open_in_memory()?;

// File-based graph
let g = Graph::open("graph.db")?;

// With custom extension path
let g = Graph::open_with_extension("graph.db", "/path/to/graphqlite.so")?;

// From existing connection
let g = Graph::from_connection(conn)?;
Node Operations
// Create or update a node
g.upsert_node("alice", [("name", "Alice"), ("age", "30")], "Person")?;

// Check if node exists
if g.has_node("alice")? {
    println!("Alice exists");
}

// Get a node
if let Some(node) = g.get_node("alice")? {
    println!("Found: {:?}", node);
}

// Get all nodes (optionally filtered by label)
let all_nodes = g.get_all_nodes(None)?;
let people = g.get_all_nodes(Some("Person"))?;

// Delete a node (also deletes connected edges)
g.delete_node("alice")?;
Edge Operations
// Create or update an edge
g.upsert_edge("alice", "bob", [("since", "2020")], "KNOWS")?;

// Check if edge exists
if g.has_edge("alice", "bob")? {
    println!("Edge exists");
}

// Get an edge
if let Some(edge) = g.get_edge("alice", "bob")? {
    println!("Edge: {:?}", edge);
}

// Get all edges
let edges = g.get_all_edges()?;

// Delete an edge
g.delete_edge("alice", "bob")?;
Query Operations
// Execute Cypher query
let results = g.query("MATCH (n:Person) RETURN n.name")?;

// Get graph statistics
let stats = g.stats()?;
println!("Nodes: {}, Edges: {}", stats.nodes, stats.edges);

// Get node degree (connection count)
let degree = g.node_degree("alice")?;

// Get neighbors
let neighbors = g.get_neighbors("alice")?;
Batch Operations
// Batch insert nodes
let nodes = vec![
    ("alice", vec![("name", "Alice")], "Person"),
    ("bob", vec![("name", "Bob")], "Person"),
];
g.upsert_nodes_batch(nodes)?;

// Batch insert edges
let edges = vec![
    ("alice", "bob", vec![("since", "2020")], "KNOWS"),
    ("bob", "carol", vec![("since", "2021")], "KNOWS"),
];
g.upsert_edges_batch(edges)?;
Algorithm Methods
Centrality
// PageRank
let results = g.pagerank(0.85, 20)?; // damping, iterations
for r in results {
    println!("{}: {}", r.user_id.unwrap_or_default(), r.score);
}

// Degree centrality
let results = g.degree_centrality()?;
for r in results {
    println!("{}: in={}, out={}, total={}",
        r.user_id.unwrap_or_default(), r.in_degree, r.out_degree, r.degree);
}

// Betweenness centrality
let results = g.betweenness_centrality()?;

// Closeness centrality
let results = g.closeness_centrality()?;

// Eigenvector centrality
let results = g.eigenvector_centrality(100)?; // iterations
Community Detection
// Label propagation
let results = g.community_detection(10)?; // iterations
for r in results {
    println!("{} is in community {}", r.user_id.unwrap_or_default(), r.community);
}

// Louvain algorithm
let results = g.louvain(1.0)?; // resolution
Connected Components
// Weakly connected components
let results = g.wcc()?;

// Strongly connected components
let results = g.scc()?;
Path Finding
// Shortest path (Dijkstra)
let result = g.shortest_path("alice", "bob", None)?; // optional weight property
if result.found {
    println!("Path: {:?}, Distance: {:?}", result.path, result.distance);
}

// A* search (with optional lat/lon heuristic)
let result = g.astar("alice", "bob", None, None)?;
println!("Explored {} nodes", result.nodes_explored);

// All-pairs shortest paths
let results = g.apsp()?;
for r in results {
    println!("{} -> {}: {}", r.source, r.target, r.distance);
}
Traversal
// Breadth-first search
let results = g.bfs("alice", Some(3))?; // optional max depth
for r in results {
    println!("{} at depth {} (order {})", r.user_id, r.depth, r.order);
}

// Depth-first search
let results = g.dfs("alice", None)?; // None = unlimited depth
Similarity
// Node similarity (Jaccard)
let results = g.node_similarity(None, None, 0.5, 10)?; // node1, node2, threshold, top_k
for r in results {
    println!("{} <-> {}: {}", r.node1, r.node2, r.similarity);
}

// K-nearest neighbors
let results = g.knn("alice", 5)?;
for r in results {
    println!("#{}: {} (similarity: {})", r.rank, r.neighbor, r.similarity);
}

// Triangle count
let results = g.triangle_count()?;
for r in results {
    println!("{}: {} triangles, clustering={}",
        r.user_id.unwrap_or_default(), r.triangles, r.clustering_coefficient);
}
Algorithm Result Types
All algorithm methods return strongly-typed result structs:
// PageRank, Betweenness, Closeness, Eigenvector
pub struct PageRankResult {
    pub node_id: String,
    pub user_id: Option<String>,
    pub score: f64,
}

// Degree Centrality
pub struct DegreeCentralityResult {
    pub node_id: String,
    pub user_id: Option<String>,
    pub in_degree: i64,
    pub out_degree: i64,
    pub degree: i64,
}

// Community Detection, Louvain
pub struct CommunityResult {
    pub node_id: String,
    pub user_id: Option<String>,
    pub community: i64,
}

// WCC, SCC
pub struct ComponentResult {
    pub node_id: String,
    pub user_id: Option<String>,
    pub component: i64,
}

// Shortest Path
pub struct ShortestPathResult {
    pub path: Vec<String>,
    pub distance: Option<f64>,
    pub found: bool,
}

// A* Search
pub struct AStarResult {
    pub path: Vec<String>,
    pub distance: Option<f64>,
    pub found: bool,
    pub nodes_explored: i64,
}

// All-Pairs Shortest Path
pub struct ApspResult {
    pub source: String,
    pub target: String,
    pub distance: f64,
}

// BFS, DFS
pub struct TraversalResult {
    pub user_id: String,
    pub depth: i64,
    pub order: i64,
}

// Node Similarity
pub struct NodeSimilarityResult {
    pub node1: String,
    pub node2: String,
    pub similarity: f64,
}

// KNN
pub struct KnnResult {
    pub neighbor: String,
    pub similarity: f64,
    pub rank: i64,
}

// Triangle Count
pub struct TriangleCountResult {
    pub node_id: String,
    pub user_id: Option<String>,
    pub triangles: i64,
    pub clustering_coefficient: f64,
}
GraphManager
Manages multiple graph databases in a directory with cross-graph query support.
Creating a GraphManager
use graphqlite::{graphs, GraphManager};

// Using factory function (recommended)
let mut gm = graphs("./data")?;

// Or direct instantiation
let mut gm = GraphManager::open("./data")?;

// With custom extension path
let mut gm = GraphManager::open_with_extension("./data", "/path/to/graphqlite.so")?;
Graph Management
// Create a new graph
let social = gm.create("social")?;

// Open an existing graph
let social = gm.open_graph("social")?;

// Open or create
let cache = gm.open_or_create("cache")?;

// List all graphs
for name in gm.list()? {
    println!("Graph: {}", name);
}

// Check if graph exists
if gm.exists("social") {
    println!("Social graph exists");
}

// Delete a graph
gm.drop("old_graph")?;
Cross-Graph Queries
// Query across multiple graphs using FROM clause
let result = gm.query(
    "MATCH (n:Person) FROM social RETURN n.name, graph(n) AS source",
    &["social"]
)?;

for row in &result {
    let name: String = row.get("n.name")?;
    let source: String = row.get("source")?;
    println!("{} from {}", name, source);
}
Raw SQL Cross-Graph Queries
let results = gm.query_sql(
    "SELECT COUNT(*) FROM social.nodes",
    &["social"]
)?;
Complete Multi-Graph Example
use graphqlite::graphs;

fn main() -> graphqlite::Result<()> {
    let mut gm = graphs("./data")?;

    // Create and populate graphs
    {
        let social = gm.create("social")?;
        social.query("CREATE (n:Person {name: 'Alice', user_id: 'u1'})")?;
        social.query("CREATE (n:Person {name: 'Bob', user_id: 'u2'})")?;
    }
    {
        let products = gm.create("products")?;
        products.query("CREATE (n:Product {name: 'Phone', sku: 'p1'})")?;
    }

    // List graphs
    println!("Graphs: {:?}", gm.list()?); // ["products", "social"]

    // Cross-graph query
    let result = gm.query(
        "MATCH (n:Person) FROM social RETURN n.name ORDER BY n.name",
        &["social"]
    )?;
    for row in &result {
        println!("Person: {}", row.get::<String>("n.name")?);
    }

    // Clean up
    gm.drop("products")?;
    gm.drop("social")?;

    Ok(())
}
Error Handling
use graphqlite::{graphs, Error};

let mut gm = graphs("./data")?;

match gm.open_graph("nonexistent") {
    Ok(g) => { /* use graph */ }
    Err(Error::GraphNotFound { name, available }) => {
        println!("Graph '{}' not found. Available: {:?}", name, available);
    }
    Err(e) => { /* handle other errors */ }
}

match gm.create("existing") {
    Ok(g) => { /* use graph */ }
    Err(Error::GraphExists(name)) => {
        println!("Graph '{}' already exists", name);
    }
    Err(e) => { /* handle other errors */ }
}
Extension Loading
For advanced use cases, wrap an existing rusqlite connection:
use rusqlite::Connection as SqliteConnection;
use graphqlite::Connection;

let sqlite_conn = SqliteConnection::open_in_memory()?;
let conn = Connection::from_rusqlite(sqlite_conn)?;
Or specify a custom extension path:
let conn = Connection::open_with_extension("graph.db", "/path/to/graphqlite.so")?;
SQL Interface
GraphQLite works as a standard SQLite extension, providing the cypher() function.
Loading the Extension
SQLite CLI
sqlite3 graph.db
.load /path/to/graphqlite
Or with automatic extension loading:
sqlite3 -cmd ".load /path/to/graphqlite" graph.db
Programmatically
SELECT load_extension('/path/to/graphqlite');
The cypher() Function
Basic Usage
SELECT cypher('MATCH (n) RETURN n.name');
With Parameters
SELECT cypher(
'MATCH (n:Person {name: $name}) RETURN n',
'{"name": "Alice"}'
);
Return Format
The cypher() function returns results as JSON:
SELECT cypher('MATCH (n:Person) RETURN n.name, n.age');
-- Returns: [{"n.name": "Alice", "n.age": 30}, {"n.name": "Bob", "n.age": 25}]
Working with Results
Extract Values with JSON Functions
SELECT json_extract(value, '$.n.name') AS name
FROM json_each(cypher('MATCH (n:Person) RETURN n'));
Algorithm Results
SELECT
json_extract(value, '$.node_id') AS id,
json_extract(value, '$.score') AS score
FROM json_each(cypher('RETURN pageRank()'))
ORDER BY score DESC
LIMIT 10;
Join with Regular Tables
-- Assuming you have a regular 'users' table
SELECT u.email, json_extract(g.value, '$.degree')
FROM users u
JOIN json_each(cypher('RETURN degreeCentrality()')) g
ON u.id = json_extract(g.value, '$.user_id');
Write Operations
-- Create nodes
SELECT cypher('CREATE (n:Person {name: "Alice", age: 30})');
-- Create relationships
SELECT cypher('
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CREATE (a)-[:KNOWS]->(b)
');
-- Update properties
SELECT cypher('
MATCH (n:Person {name: "Alice"})
SET n.age = 31
');
-- Delete
SELECT cypher('
MATCH (n:Person {name: "Alice"})
DETACH DELETE n
');
Schema Tables
GraphQLite creates these tables automatically. See Storage Model for detailed documentation.
Core Tables
SELECT * FROM nodes;
-- id (auto-increment primary key)
SELECT * FROM node_labels;
-- node_id, label
SELECT * FROM edges;
-- id, source_id, target_id, type
SELECT * FROM property_keys;
-- id, key (normalized property names)
Property Tables
Properties use key_id as a foreign key to property_keys for normalization:
SELECT * FROM node_props_text; -- node_id, key_id, value
SELECT * FROM node_props_int; -- node_id, key_id, value
SELECT * FROM node_props_real; -- node_id, key_id, value
SELECT * FROM node_props_bool; -- node_id, key_id, value
SELECT * FROM edge_props_text; -- edge_id, key_id, value
SELECT * FROM edge_props_int; -- edge_id, key_id, value
SELECT * FROM edge_props_real; -- edge_id, key_id, value
SELECT * FROM edge_props_bool; -- edge_id, key_id, value
Direct SQL Access
You can query the underlying tables directly for debugging or advanced use cases:
-- Count nodes by label
SELECT label, COUNT(*) FROM node_labels GROUP BY label;
-- Find nodes with a specific property (join through property_keys)
SELECT n.id, pk.key, p.value
FROM nodes n
JOIN node_props_text p ON n.id = p.node_id
JOIN property_keys pk ON p.key_id = pk.id
WHERE pk.key = 'name';
-- Find all properties for a specific node
SELECT pk.key, p.value
FROM node_props_text p
JOIN property_keys pk ON p.key_id = pk.id
WHERE p.node_id = 1;
-- Find edges with their endpoint info
SELECT e.id, e.type, e.source_id, e.target_id
FROM edges e
WHERE e.type = 'KNOWS';
Transaction Support
GraphQLite respects SQLite transactions:
BEGIN;
SELECT cypher('CREATE (a:Person {name: "Alice"})');
SELECT cypher('CREATE (b:Person {name: "Bob"})');
SELECT cypher('MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"}) CREATE (a)-[:KNOWS]->(b)');
COMMIT;
Or rollback on error:
BEGIN;
SELECT cypher('CREATE (n:Person {name: "Test"})');
ROLLBACK; -- Node is not created
Architecture
This document explains how GraphQLite is structured and how queries flow through the system.
High-Level Overview
┌─────────────────────────────────────────────────────────────┐
│ SQLite Extension │
├─────────────────────────────────────────────────────────────┤
│ cypher() function │
│ │ │
│ ▼ │
│ ┌─────────┐ ┌───────────┐ ┌──────────┐ │
│ │ Parser │───▶│ Transform │───▶│ Executor │ │
│ └─────────┘ └───────────┘ └──────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Cypher AST SQL Query Results │
└─────────────────────────────────────────────────────────────┘
Components
Parser
The parser converts Cypher query text into an Abstract Syntax Tree (AST).
Implementation: Flex (lexer) + Bison (parser)
- src/backend/parser/cypher_scanner.l - Tokenizer
- src/backend/parser/cypher_gram.y - Grammar
- src/backend/parser/cypher_ast.c - AST construction
Transformer
The transformer converts the Cypher AST into SQL that can be executed against the graph schema.
Key files:
- src/backend/transform/cypher_transform.c - Main entry point
- src/backend/transform/transform_match.c - MATCH clause handling
- src/backend/transform/transform_return.c - RETURN clause handling
- src/backend/transform/sql_builder.c - SQL construction utilities
Executor
The executor runs the generated SQL and handles special cases like graph algorithms.
Key files:
- src/backend/executor/cypher_executor.c - Main entry point
- src/backend/executor/query_dispatch.c - Pattern-based routing
- src/backend/executor/graph_algorithms.c - Algorithm implementations
Query Flow
1. Entry Point
The cypher() SQL function receives the query:
// In extension.c
static void graphqlite_cypher_func(sqlite3_context *context, int argc, sqlite3_value **argv) {
const char *query = (const char *)sqlite3_value_text(argv[0]);
// ...
}
2. Parsing
The query is tokenized and parsed:
cypher_parse_result *parse_result = parse_cypher_query_ext(query);
ast_node *ast = parse_result->root;
3. Pattern Dispatch
Instead of a giant if-else chain, queries are matched against patterns:
clause_flags flags = analyze_query_clauses(ast);
const query_pattern *pattern = find_matching_pattern(flags);
return pattern->handler(executor, ast, result, flags);
4. Transformation
The AST is converted to SQL using the unified SQL builder:
cypher_transform_context *ctx = create_transform_context(db);
transform_query(ctx, ast);
char *sql = sql_builder_to_string(ctx->unified_builder);
5. Execution
The SQL is executed against SQLite:
sqlite3_stmt *stmt;
sqlite3_prepare_v2(db, sql, -1, &stmt, NULL);
while (sqlite3_step(stmt) == SQLITE_ROW) {
// Process results
}
Design Decisions
Why SQLite?
- Zero configuration - single file, no server
- Ubiquitous - available everywhere
- Well-tested - decades of production use
- Extensible - clean extension API
Why Transform to SQL?
Rather than implementing our own storage engine, we transform Cypher to SQL:
- Leverage SQLite's query optimizer
- Benefit from SQLite's transaction handling
- Interop with regular SQL tables
- Simpler implementation
Why Pattern Dispatch?
Replacing if-else chains with table-driven dispatch:
- Easier to add new query patterns
- Clear priority ordering
- Better testability
- Reduced cyclomatic complexity
Extension Loading
When the extension loads:
- Register the cypher() function
- Create schema tables if they don't exist
- Create indexes for efficient lookups
int sqlite3_graphqlite_init(
sqlite3 *db,
char **pzErrMsg,
const sqlite3_api_routines *pApi
) {
SQLITE_EXTENSION_INIT2(pApi);
create_graph_schema(db);
sqlite3_create_function(db, "cypher", -1, SQLITE_UTF8, 0,
graphqlite_cypher_func, 0, 0);
return SQLITE_OK;
}
Storage Model
GraphQLite uses a typed property graph model stored in regular SQLite tables. The schema is designed for query efficiency using an Entity-Attribute-Value (EAV) pattern with property key normalization.
Schema Overview
┌─────────────────────────────────────┐
│ nodes │
│ id (PK, auto-increment) │
├─────────────────────────────────────┤
│ 1 │
│ 2 │
│ 3 │
└─────────────────────────────────────┘
│
│ 1:N
▼
┌─────────────────────────────────────┐
│ node_labels │
│ node_id (FK) │ label │
├───────────────┼─────────────────────┤
│ 1 │ "Person" │
│ 2 │ "Person" │
│ 3 │ "Company" │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ property_keys │
│ id (PK) │ key (UNIQUE) │
├──────────┼──────────────────────────┤
│ 1 │ "name" │
│ 2 │ "age" │
│ 3 │ "id" │
└─────────────────────────────────────┘
│
│ 1:N (via key_id)
▼
┌───────────────────────────────────────────┐
│ node_props_text │
│ node_id (FK) │ key_id (FK) │ value │
├───────────────┼─────────────┼─────────────┤
│ 1 │ 3 │ "alice" │
│ 1 │ 1 │ "Alice" │
│ 2 │ 3 │ "bob" │
│ 2 │ 1 │ "Bob" │
└───────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ edges │
│ id (PK) │ source_id (FK) │ target_id (FK) │ type │
├──────────┼────────────────┼────────────────┼────────────┤
│ 1 │ 1 │ 2 │ "KNOWS" │
│ 2 │ 1 │ 3 │ "WORKS_AT" │
└─────────────────────────────────────────────────────────┘
Core Tables
nodes
The nodes table stores graph nodes with a simple auto-incrementing ID. Node metadata such as labels and properties are stored in separate tables, enabling nodes to have multiple labels and efficient property queries.
| Column | Type | Description |
|---|---|---|
| id | INTEGER PRIMARY KEY AUTOINCREMENT | Internal node identifier |
node_labels
Labels are stored in a separate table allowing nodes to have multiple labels. This normalized design enables efficient label-based filtering through indexed lookups.
| Column | Type | Description |
|---|---|---|
| node_id | INTEGER FK → nodes(id) | References the node |
| label | TEXT | Label name (e.g., "Person") |
The primary key is the composite (node_id, label), which prevents duplicate labels on the same node.
edges
The edges table stores relationships between nodes with a required relationship type.
| Column | Type | Description |
|---|---|---|
| id | INTEGER PRIMARY KEY AUTOINCREMENT | Internal edge identifier |
| source_id | INTEGER FK → nodes(id) | Source node |
| target_id | INTEGER FK → nodes(id) | Target node |
| type | TEXT NOT NULL | Relationship type (e.g., "KNOWS") |
Foreign keys use ON DELETE CASCADE so removing a node automatically removes its edges.
property_keys
Property names are normalized into a lookup table to reduce storage overhead and improve query performance. Instead of storing the property name string with every property value, we store a small integer key ID.
| Column | Type | Description |
|---|---|---|
| id | INTEGER PRIMARY KEY AUTOINCREMENT | Property key identifier |
| key | TEXT UNIQUE | Property name (e.g., "name", "age") |
Property Tables
Properties are stored in separate tables by type. This approach enables type-safe queries, efficient indexing by value, and proper numeric comparisons without type conversion overhead.
Node property tables:
- node_props_text — String values
- node_props_int — Integer values
- node_props_real — Floating-point values
- node_props_bool — Boolean values (stored as 0 or 1)
Edge property tables:
- edge_props_text
- edge_props_int
- edge_props_real
- edge_props_bool
Each property table has the same structure:
| Column | Type | Description |
|---|---|---|
| node_id / edge_id | INTEGER FK | References the owner entity |
| key_id | INTEGER FK → property_keys(id) | References the property name |
| value | (varies by table) | The property value |
The primary key is the composite (node_id, key_id) or (edge_id, key_id), ensuring each entity has at most one value per property.
Indexes
GraphQLite creates indexes optimized for common graph query patterns:
-- Edge traversal (covers both directions and type filtering)
CREATE INDEX idx_edges_source ON edges(source_id, type);
CREATE INDEX idx_edges_target ON edges(target_id, type);
CREATE INDEX idx_edges_type ON edges(type);
-- Label filtering
CREATE INDEX idx_node_labels_label ON node_labels(label, node_id);
-- Property key lookup
CREATE INDEX idx_property_keys_key ON property_keys(key);
-- Property value queries (enables efficient WHERE clauses)
CREATE INDEX idx_node_props_text_key_value ON node_props_text(key_id, value, node_id);
CREATE INDEX idx_node_props_int_key_value ON node_props_int(key_id, value, node_id);
-- ... similar for other property tables
The property indexes are designed "key-first" to efficiently satisfy queries like WHERE n.name = 'Alice', which translate to lookups by key_id and value.
Why This Design?
Typed property tables provide several advantages over storing all properties as JSON or a single TEXT column. Integer comparisons are performed natively rather than through string parsing. Type-specific indexes enable efficient range queries. Storage is more compact since values don't require type metadata.
Property key normalization through the property_keys table reduces storage by replacing repeated property name strings with integer IDs. This also enables efficient property-first queries and simplifies schema introspection.
Separate label table allows nodes to have multiple labels, which is a common requirement in graph modeling. The label index supports efficient label-based filtering without scanning all nodes.
Query Translation
When you write:
MATCH (p:Person {name: 'Alice'})
WHERE p.age > 25
RETURN p.name, p.age
GraphQLite translates this to SQL that joins the appropriate tables:
SELECT
name_prop.value AS "p.name",
age_prop.value AS "p.age"
FROM nodes p
JOIN node_labels p_label ON p.id = p_label.node_id AND p_label.label = 'Person'
LEFT JOIN node_props_text name_prop
ON p.id = name_prop.node_id
AND name_prop.key_id = (SELECT id FROM property_keys WHERE key = 'name')
LEFT JOIN node_props_int age_prop
ON p.id = age_prop.node_id
AND age_prop.key_id = (SELECT id FROM property_keys WHERE key = 'age')
WHERE name_prop.value = 'Alice'
AND age_prop.value > 25
In practice, the query optimizer uses cached prepared statements for property key lookups, making this translation efficient.
Direct SQL Access
You can query the underlying tables directly for advanced use cases:
-- Count nodes by label
SELECT label, COUNT(*) FROM node_labels GROUP BY label;
-- Find all properties of a specific node
SELECT pk.key, 'text' as type, pt.value
FROM node_props_text pt
JOIN property_keys pk ON pt.key_id = pk.id
WHERE pt.node_id = 1
UNION ALL
SELECT pk.key, 'int' as type, CAST(pi.value AS TEXT)
FROM node_props_int pi
JOIN property_keys pk ON pi.key_id = pk.id
WHERE pi.node_id = 1;
-- Find nodes with a specific property value
SELECT nl.node_id, nl.label, pt.value as name
FROM node_props_text pt
JOIN property_keys pk ON pt.key_id = pk.id
JOIN node_labels nl ON pt.node_id = nl.node_id
WHERE pk.key = 'name' AND pt.value = 'Alice';
Query Pattern Dispatch System
GraphQLite uses a table-driven pattern dispatch system to execute Cypher queries. This document describes how the system works and how to extend it.
Overview
Instead of a massive if-else chain checking clause combinations, queries are matched against a registry of patterns. Each pattern defines:
- Required clauses: Must all be present
- Forbidden clauses: Must all be absent
- Priority: Higher priority patterns are checked first
- Handler: Function to execute the query
Supported Query Patterns
| Pattern | Required | Forbidden | Priority | Description |
|---|---|---|---|---|
| UNWIND+CREATE | UNWIND, CREATE | RETURN, MATCH | 100 | Batch node/edge creation |
| WITH+MATCH+RETURN | WITH, MATCH, RETURN | - | 100 | Subquery pipeline |
| MATCH+CREATE+RETURN | MATCH, CREATE, RETURN | - | 100 | Match then create with results |
| MATCH+SET | MATCH, SET | - | 90 | Update matched nodes |
| MATCH+DELETE | MATCH, DELETE | - | 90 | Delete matched nodes |
| MATCH+REMOVE | MATCH, REMOVE | - | 90 | Remove properties/labels |
| MATCH+MERGE | MATCH, MERGE | - | 90 | Conditional create/match |
| MATCH+CREATE | MATCH, CREATE | RETURN | 90 | Match then create |
| OPTIONAL_MATCH+RETURN | MATCH, OPTIONAL, RETURN | CREATE, SET, DELETE, MERGE | 80 | Left join pattern |
| MULTI_MATCH+RETURN | MATCH, MULTI_MATCH, RETURN | CREATE, SET, DELETE, MERGE | 80 | Multiple match clauses |
| MATCH+RETURN | MATCH, RETURN | OPTIONAL, MULTI_MATCH, CREATE, SET, DELETE, MERGE | 70 | Simple query |
| UNWIND+RETURN | UNWIND, RETURN | CREATE | 60 | List processing |
| CREATE | CREATE | MATCH, UNWIND | 50 | Create nodes/edges |
| MERGE | MERGE | MATCH | 50 | Merge nodes/edges |
| SET | SET | MATCH | 50 | Standalone set |
| FOREACH | FOREACH | - | 50 | Iterate and update |
| MATCH | MATCH | RETURN, CREATE, SET, DELETE, MERGE, REMOVE | 40 | Match without return |
| RETURN | RETURN | MATCH, UNWIND, WITH | 10 | Expressions, graph algorithms |
| GENERIC | - | - | 0 | Fallback for any query |
How Pattern Matching Works
- Analyze: Extract clause flags from query AST
- Match: Find highest-priority pattern where:
- All required flags are present
- No forbidden flags are present
- Execute: Call the pattern's handler function
clause_flags flags = analyze_query_clauses(query);
const query_pattern *pattern = find_matching_pattern(flags);
return pattern->handler(executor, query, result, flags);
Debugging
Debug Logging
With GRAPHQLITE_DEBUG defined, pattern matching logs:
[CYPHER_DEBUG] Query clauses: MATCH|RETURN
[CYPHER_DEBUG] Matched pattern: MATCH+RETURN (priority 70)
EXPLAIN Command
Use EXPLAIN to see pattern info without executing:
SELECT cypher('EXPLAIN MATCH (n:Person) RETURN n.name');
Output:
Pattern: MATCH+RETURN
Clauses: MATCH|RETURN
SQL: SELECT ... FROM nodes ...
Adding New Patterns
Step 1: Define the Pattern
Add an entry to the patterns[] array in query_dispatch.c:
{
.name = "MY_PATTERN",
.required = CLAUSE_MATCH | CLAUSE_CUSTOM,
.forbidden = CLAUSE_DELETE,
.handler = handle_my_pattern,
.priority = 85
}
Step 2: Implement the Handler
static int handle_my_pattern(cypher_executor *executor,
cypher_query *query,
cypher_result *result,
clause_flags flags)
{
(void)flags;
CYPHER_DEBUG("Executing MY_PATTERN via pattern dispatch");
// Implementation here
result->success = true;
return 0;
}
Step 3: Add Tests
Add tests to test_query_dispatch.c:
static void test_pattern_my_pattern(void)
{
const query_pattern *p = find_matching_pattern(CLAUSE_MATCH | CLAUSE_CUSTOM);
CU_ASSERT_PTR_NOT_NULL(p);
if (p) {
CU_ASSERT_STRING_EQUAL(p->name, "MY_PATTERN");
CU_ASSERT_EQUAL(p->priority, 85);
}
}
Priority Guidelines
| Priority | Use Case |
|---|---|
| 100 | Most specific multi-clause combinations |
| 90 | MATCH + write operation patterns |
| 80 | Complex read patterns (OPTIONAL, multi-MATCH) |
| 70 | Simple read patterns |
| 50-60 | Standalone clauses with modifiers |
| 40-50 | Standalone write clauses |
| 10 | Expressions and algorithms |
| 0 | Generic fallback |
Files
- src/include/executor/query_patterns.h - Types and API
- src/backend/executor/query_dispatch.c - Pattern registry and handlers
- tests/test_query_dispatch.c - Unit tests
Graph Algorithm Handling
Graph algorithms (PageRank, Dijkstra, etc.) are detected within the RETURN pattern handler. When a RETURN-only query contains a graph algorithm function call, it's executed via the C-based algorithm implementations for performance.
Performance
This document covers GraphQLite's performance characteristics and optimization strategies.
Benchmarks
Benchmarks were run on an Apple M1 Max (10 cores, 64 GB RAM).
Insertion Performance
| Nodes | Edges | Time | Rate |
|---|---|---|---|
| 100K | 500K | 445ms | 1.3M/s |
| 500K | 2.5M | 2.30s | 1.3M/s |
| 1M | 5.0M | 5.16s | 1.1M/s |
Traversal by Topology
| Topology | Nodes | Edges | 1-hop | 2-hop |
|---|---|---|---|---|
| Chain | 100K | 99K | <1ms | <1ms |
| Sparse | 100K | 500K | <1ms | <1ms |
| Moderate | 100K | 2.0M | <1ms | 2ms |
| Dense | 100K | 5.0M | <1ms | 9ms |
| Normal dist. | 100K | 957K | <1ms | 1ms |
| Power-law | 100K | 242K | <1ms | <1ms |
| Moderate | 500K | 10.0M | 1ms | 2ms |
| Moderate | 1M | 20.0M | <1ms | 2ms |
Graph Algorithms
| Algorithm | Nodes | Edges | Time |
|---|---|---|---|
| PageRank | 100K | 500K | 148ms |
| Label Propagation | 100K | 500K | 154ms |
| PageRank | 500K | 2.5M | 953ms |
| Label Propagation | 500K | 2.5M | 811ms |
| PageRank | 1M | 5.0M | 37.81s |
| Label Propagation | 1M | 5.0M | 40.21s |
Cypher Query Performance
| Query Type | G(100K, 500K) | G(500K, 2.5M) | G(1M, 5M) |
|---|---|---|---|
| Node lookup | <1ms | 1ms | <1ms |
| 1-hop | <1ms | <1ms | <1ms |
| 2-hop | <1ms | <1ms | <1ms |
| 3-hop | 1ms | 1ms | 1ms |
| Filter scan | 341ms | 1.98s | 3.79s |
| MATCH all | 360ms | 2.05s | 3.98s |
Optimization Strategies
Use Indexes Effectively
GraphQLite creates indexes on:
- node_labels(label, node_id) - Fast filtering by label
- edges(source_id, type) and edges(target_id, type) - Fast traversal in either direction
- property_keys(key) - Fast property-name lookup
- Property tables on (key_id, value) - Fast property-value access, including lookups of user-facing node IDs stored as properties
Queries that leverage these indexes are fast.
Limit Variable-Length Paths
Variable-length paths can be expensive:
-- Expensive: unlimited depth
MATCH (a)-[*]->(b) RETURN b
-- Better: limit depth
MATCH (a)-[*1..3]->(b) RETURN b
Use Specific Labels
Labels help filter early:
-- Slower: scan all nodes
MATCH (n) WHERE n.type = 'Person' RETURN n
-- Faster: use label
MATCH (n:Person) RETURN n
Batch Operations
For bulk inserts, use batch methods:
# Slow: individual inserts
for person in people:
g.upsert_node(person["id"], person, label="Person")
# Fast: batch insert
nodes = [(p["id"], p, "Person") for p in people]
g.upsert_nodes_batch(nodes)
Algorithm Caching
Graph algorithms scan the entire graph. If your graph doesn't change frequently, cache results:
import functools
@functools.lru_cache(maxsize=1)
def get_pagerank():
return g.pagerank()
Memory Usage
GraphQLite uses SQLite's memory management. Key factors:
- Page cache: SQLite caches database pages in memory
- Algorithm scratch space: Algorithms allocate temporary structures
- Result buffers: Query results are buffered before returning
For large graphs, consider:
# Increase SQLite page cache (default: 2MB)
conn.execute("PRAGMA cache_size = -64000") # 64MB
Running Benchmarks
Run benchmarks on your hardware:
make performance
This runs:
- Insertion benchmarks
- Traversal benchmarks across topologies
- Algorithm benchmarks
- Query benchmarks