GraphQLite

GraphQLite is an SQLite extension that adds graph database capabilities using the Cypher query language.

Store and query graph data directly in SQLite—combining the simplicity of a single-file, zero-config embedded database with Cypher's expressive power for modeling relationships. Perfect for applications that need graph queries without a separate database server, or for local development and learning without standing up additional infrastructure.

Key Features

  • Cypher query language - Use the industry-standard graph query language
  • Zero configuration - Works with any SQLite database
  • Embedded - No separate server process required
  • 15+ graph algorithms - PageRank, shortest paths, community detection, and more
  • Multiple bindings - Python, Rust, and raw SQL interfaces

Quick Example

from graphqlite import Graph

g = Graph(":memory:")
g.upsert_node("alice", {"name": "Alice", "age": 30}, label="Person")
g.upsert_node("bob", {"name": "Bob", "age": 25}, label="Person")
g.upsert_edge("alice", "bob", {"since": 2020}, rel_type="KNOWS")

results = g.query("MATCH (a:Person)-[:KNOWS]->(b) RETURN a.name, b.name")

How This Documentation is Organized

This documentation follows the Diátaxis framework:

  • Tutorials - Step-by-step lessons to get you started
  • How-to Guides - Practical guides for specific tasks
  • Reference - Technical descriptions of Cypher support, APIs, and algorithms
  • Explanation - Background and design decisions

Getting Started

This tutorial walks you through installing GraphQLite and running your first graph queries.

What You'll Learn

  • Install GraphQLite for Python
  • Create nodes and relationships
  • Query the graph with Cypher
  • Use the high-level Graph API

Prerequisites

  • Python 3.8 or later
  • pip package manager

Step 1: Install GraphQLite

pip install graphqlite

Step 2: Create Your First Graph

Open a Python shell and create an in-memory graph:

from graphqlite import Graph

# Create an in-memory graph database
g = Graph(":memory:")

Step 3: Add Nodes

Add some people to your graph:

g.upsert_node("alice", {"name": "Alice", "age": 30}, label="Person")
g.upsert_node("bob", {"name": "Bob", "age": 25}, label="Person")
g.upsert_node("carol", {"name": "Carol", "age": 35}, label="Person")

print(g.stats())  # {'nodes': 3, 'edges': 0}

Each node has the following parts; the sketch after this list shows how to read them back:

  • A unique ID ("alice", "bob", "carol")
  • Properties (key-value pairs like name and age)
  • A label (Person)
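
To read a node back, use get_node; a quick sketch (as used later in these docs, the stored key-value pairs come back nested under a "properties" key):

node = g.get_node("alice")
print(node["properties"]["name"], node["properties"]["age"])  # Alice 30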

Step 4: Create Relationships

Connect the nodes with relationships:

g.upsert_edge("alice", "bob", {"since": 2020}, rel_type="KNOWS")
g.upsert_edge("alice", "carol", {"since": 2018}, rel_type="KNOWS")
g.upsert_edge("bob", "carol", {"since": 2021}, rel_type="KNOWS")

print(g.stats())  # {'nodes': 3, 'edges': 3}

Step 5: Query with Cypher

Find all people that Alice knows:

results = g.query("""
    MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(friend)
    RETURN friend.name AS name, friend.age AS age
""")

for row in results:
    print(f"{row['name']} is {row['age']} years old")

Output:

Bob is 25 years old
Carol is 35 years old

Step 6: Explore the Graph

Use built-in methods to explore:

# Get Alice's neighbors
neighbors = g.get_neighbors("alice")
print([n["id"] for n in neighbors])  # ['bob', 'carol']

# Check if an edge exists
print(g.has_edge("alice", "bob"))  # True
print(g.has_edge("bob", "alice"))  # False (directed edge)

# Get node degree (total connections)
print(g.node_degree("alice"))  # 2

Getting Started with SQL

This tutorial shows how to use GraphQLite directly from the SQLite command line.

Prerequisites

  • SQLite3 CLI installed
  • GraphQLite extension built (make extension)

Step 1: Load the Extension

sqlite3 my_graph.db
.load build/graphqlite.dylib
-- On Linux: .load build/graphqlite.so
-- On Windows: .load build/graphqlite.dll

Step 2: Create Nodes

-- Create people
SELECT cypher('CREATE (a:Person {name: "Alice", age: 30})');
SELECT cypher('CREATE (b:Person {name: "Bob", age: 25})');
SELECT cypher('CREATE (c:Person {name: "Charlie", age: 35})');

Step 3: Create Relationships

-- Alice knows Bob
SELECT cypher('
    MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
    CREATE (a)-[:KNOWS]->(b)
');

-- Bob knows Charlie
SELECT cypher('
    MATCH (b:Person {name: "Bob"}), (c:Person {name: "Charlie"})
    CREATE (b)-[:KNOWS]->(c)
');

Step 4: Query the Graph

-- Find all people
SELECT cypher('MATCH (p:Person) RETURN p.name, p.age');

-- Find relationships
SELECT cypher('MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name');

-- Friends of friends
SELECT cypher('
    MATCH (a:Person {name: "Alice"})-[:KNOWS]->()-[:KNOWS]->(fof)
    RETURN fof.name
');

Step 5: Using Parameters

-- Safer queries with parameters
SELECT cypher(
    'MATCH (p:Person {name: $name}) RETURN p.age',
    '{"name": "Alice"}'
);

Complete Example

Save this as getting_started.sql:

.load build/graphqlite.dylib

-- Create nodes
SELECT cypher('CREATE (a:Person {name: "Alice", age: 30})');
SELECT cypher('CREATE (b:Person {name: "Bob", age: 25})');
SELECT cypher('CREATE (c:Person {name: "Charlie", age: 35})');

-- Create relationships
SELECT cypher('
    MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
    CREATE (a)-[:KNOWS]->(b)
');
SELECT cypher('
    MATCH (b:Person {name: "Bob"}), (c:Person {name: "Charlie"})
    CREATE (b)-[:KNOWS]->(c)
');

-- Query
SELECT 'All people:';
SELECT cypher('MATCH (p:Person) RETURN p.name, p.age');

SELECT '';
SELECT 'Who knows who:';
SELECT cypher('MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name');

SELECT '';
SELECT 'Friends of friends:';
SELECT cypher('
    MATCH (a:Person {name: "Alice"})-[:KNOWS]->()-[:KNOWS]->(fof)
    RETURN fof.name
');

Run it:

sqlite3 < getting_started.sql

Query Patterns in SQL

This tutorial covers common MATCH patterns for traversing graphs using SQL.

Setup

.load build/graphqlite.dylib
.mode column
.headers on

-- Build a social network
SELECT cypher('CREATE (a:Person {name: "Alice"})');
SELECT cypher('CREATE (b:Person {name: "Bob"})');
SELECT cypher('CREATE (c:Person {name: "Charlie"})');
SELECT cypher('CREATE (d:Person {name: "Diana"})');
SELECT cypher('CREATE (e:Person {name: "Eve"})');

SELECT cypher('MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"}) CREATE (a)-[:FOLLOWS]->(b)');
SELECT cypher('MATCH (a:Person {name: "Alice"}), (c:Person {name: "Charlie"}) CREATE (a)-[:FOLLOWS]->(c)');
SELECT cypher('MATCH (b:Person {name: "Bob"}), (c:Person {name: "Charlie"}) CREATE (b)-[:FOLLOWS]->(c)');
SELECT cypher('MATCH (b:Person {name: "Bob"}), (d:Person {name: "Diana"}) CREATE (b)-[:FOLLOWS]->(d)');
SELECT cypher('MATCH (c:Person {name: "Charlie"}), (e:Person {name: "Eve"}) CREATE (c)-[:FOLLOWS]->(e)');
SELECT cypher('MATCH (d:Person {name: "Diana"}), (e:Person {name: "Eve"}) CREATE (d)-[:FOLLOWS]->(e)');

Direct Connections

Outgoing relationships

-- Who does Alice follow?
SELECT cypher('
    MATCH (a:Person {name: "Alice"})-[:FOLLOWS]->(b)
    RETURN b.name
');

Incoming relationships

-- Who follows Charlie?
SELECT cypher('
    MATCH (a)-[:FOLLOWS]->(b:Person {name: "Charlie"})
    RETURN a.name
');

Multi-Hop Patterns

Two hops

-- Friends of friends (people followed by those Alice follows)
SELECT cypher('
    MATCH (a:Person {name: "Alice"})-[:FOLLOWS]->()-[:FOLLOWS]->(c)
    RETURN DISTINCT c.name
');

Variable length paths

-- Everyone reachable from Alice in 1-3 hops
SELECT cypher('
    MATCH (a:Person {name: "Alice"})-[:FOLLOWS*1..3]->(b)
    RETURN DISTINCT b.name
');

Aggregation

Count connections

-- Follower counts
SELECT cypher('
    MATCH (a:Person)-[:FOLLOWS]->(b:Person)
    RETURN b.name, count(a) as followers
    ORDER BY followers DESC
');

Collect into list

-- Group followers by person
SELECT cypher('
    MATCH (a:Person)-[:FOLLOWS]->(b:Person)
    RETURN b.name, collect(a.name) as followed_by
');

Complex Patterns

Mutual relationships

-- Find mutual follows
SELECT cypher('
    MATCH (a:Person)-[:FOLLOWS]->(b:Person)-[:FOLLOWS]->(a)
    RETURN a.name, b.name
');

OPTIONAL MATCH

-- All people and their followers (NULL if none)
SELECT cypher('
    MATCH (p:Person)
    OPTIONAL MATCH (follower)-[:FOLLOWS]->(p)
    RETURN p.name, count(follower) as follower_count
');

Filter with WHERE

-- People followed by more than one person
SELECT cypher('
    MATCH (a:Person)-[:FOLLOWS]->(b:Person)
    WITH b, count(a) as followers
    WHERE followers > 1
    RETURN b.name, followers
');

Working with Results in SQL

Extract JSON fields

SELECT
    json_extract(value, '$.a.name') as person,
    json_extract(value, '$.b.name') as follows
FROM json_each(
    cypher('MATCH (a:Person)-[:FOLLOWS]->(b) RETURN a, b')
);

Join with regular tables

-- Assuming you have a users table
WITH graph_data AS (
    SELECT
        json_extract(value, '$.name') as name,
        json_extract(value, '$.followers') as followers
    FROM json_each(
        cypher('MATCH (a)-[:FOLLOWS]->(b) RETURN b.name as name, count(a) as followers')
    )
)
SELECT u.email, g.followers
FROM users u
JOIN graph_data g ON u.username = g.name;

Building a Knowledge Graph

This tutorial shows how to build a knowledge graph for storing and querying interconnected information.

What You'll Build

A knowledge graph of companies, people, and their relationships—similar to what you might find in a business intelligence system.

What You'll Learn

  • Model complex domains with multiple node types
  • Create various relationship types
  • Write sophisticated Cypher queries
  • Use aggregation and path queries

Step 1: Design the Schema

Our knowledge graph will have:

Node Types (Labels):

  • Company - Organizations
  • Person - Individuals
  • Technology - Products and technologies

Relationship Types:

  • WORKS_AT - Person works at Company
  • FOUNDED - Person founded Company
  • USES - Company uses Technology
  • KNOWS - Person knows Person

Step 2: Create the Graph

from graphqlite import Graph

g = Graph("knowledge.db")  # Persistent database

# Companies
g.upsert_node("acme", {"name": "Acme Corp", "founded": 2010, "industry": "Software"}, label="Company")
g.upsert_node("globex", {"name": "Globex Inc", "founded": 2015, "industry": "AI"}, label="Company")

# People
g.upsert_node("alice", {"name": "Alice Chen", "role": "CEO"}, label="Person")
g.upsert_node("bob", {"name": "Bob Smith", "role": "CTO"}, label="Person")
g.upsert_node("carol", {"name": "Carol Jones", "role": "Engineer"}, label="Person")

# Technologies
g.upsert_node("python", {"name": "Python", "type": "Language"}, label="Technology")
g.upsert_node("graphql", {"name": "GraphQL", "type": "API"}, label="Technology")

Step 3: Add Relationships

# Employment
g.upsert_edge("alice", "acme", {"since": 2010, "title": "CEO"}, rel_type="WORKS_AT")
g.upsert_edge("bob", "acme", {"since": 2012, "title": "CTO"}, rel_type="WORKS_AT")
g.upsert_edge("carol", "globex", {"since": 2020, "title": "Senior Engineer"}, rel_type="WORKS_AT")

# Founding
g.upsert_edge("alice", "acme", {"year": 2010}, rel_type="FOUNDED")

# Technology usage
g.upsert_edge("acme", "python", {"primary": True}, rel_type="USES")
g.upsert_edge("acme", "graphql", {"primary": False}, rel_type="USES")
g.upsert_edge("globex", "python", {"primary": True}, rel_type="USES")

# Personal connections
g.upsert_edge("alice", "bob", {"since": 2010}, rel_type="KNOWS")
g.upsert_edge("bob", "carol", {"since": 2019}, rel_type="KNOWS")

Step 4: Query the Knowledge Graph

Find all employees of a company

results = g.query("""
    MATCH (p:Person)-[r:WORKS_AT]->(c:Company {name: 'Acme Corp'})
    RETURN p.name AS employee, r.title AS title, r.since AS since
    ORDER BY r.since
""")

Find companies using a technology

results = g.query("""
    MATCH (c:Company)-[:USES]->(t:Technology {name: 'Python'})
    RETURN c.name AS company, c.industry AS industry
""")

Find connections between people

results = g.query("""
    MATCH path = (a:Person {name: 'Alice Chen'})-[:KNOWS*1..3]->(b:Person)
    RETURN b.name AS connected_person, length(path) AS distance
""")

Aggregate: Count employees per company

results = g.query("""
    MATCH (p:Person)-[:WORKS_AT]->(c:Company)
    RETURN c.name AS company, count(p) AS employee_count
    ORDER BY employee_count DESC
""")

Find founders who still work at their company

results = g.query("""
    MATCH (p:Person)-[:FOUNDED]->(c:Company),
          (p)-[:WORKS_AT]->(c)
    RETURN p.name AS founder, c.name AS company
""")

Step 5: Update the Graph

Add new information as it becomes available:

# Carol moves to Acme
g.query("""
    MATCH (p:Person {name: 'Carol Jones'})-[r:WORKS_AT]->(:Company)
    DELETE r
""")
g.upsert_edge("carol", "acme", {"since": 2024, "title": "Staff Engineer"}, rel_type="WORKS_AT")

# Add a new technology
g.upsert_node("rust", {"name": "Rust", "type": "Language"}, label="Technology")
g.upsert_edge("globex", "rust", {"primary": False}, rel_type="USES")

Graph Analytics

This tutorial shows how to use GraphQLite's built-in graph algorithms for analysis.

What You'll Learn

  • Run centrality algorithms to find important nodes
  • Detect communities in your graph
  • Find shortest paths between nodes
  • Use algorithm results in your applications

Setup: Create a Social Network

from graphqlite import Graph

g = Graph(":memory:")

# Create a small social network
people = ["alice", "bob", "carol", "dave", "eve", "frank", "grace", "henry"]
for person in people:
    g.upsert_node(person, {"name": person.title()}, label="Person")

# Create connections (who follows whom)
connections = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "carol"), ("bob", "eve"),
    ("carol", "dave"), ("carol", "eve"), ("carol", "frank"),
    ("dave", "frank"),
    ("eve", "frank"), ("eve", "grace"),
    ("frank", "grace"), ("frank", "henry"),
    ("grace", "henry"),
]
for source, target in connections:
    g.upsert_edge(source, target, {}, rel_type="FOLLOWS")

print(g.stats())  # {'nodes': 8, 'edges': 14}

Centrality: Finding Important Nodes

PageRank

PageRank identifies nodes that are linked to by other important nodes:

results = g.pagerank(damping=0.85, iterations=20)
for r in sorted(results, key=lambda x: x["score"], reverse=True)[:3]:
    print(f"{r['user_id']}: {r['score']:.4f}")

Output:

frank: 0.1842
grace: 0.1536
eve: 0.1298

Frank is the most "important" because many well-connected people follow him.

Degree Centrality

Count incoming and outgoing connections:

results = g.degree_centrality()
for r in results:
    print(f"{r['user_id']}: in={r['in_degree']}, out={r['out_degree']}")

Betweenness Centrality

Find nodes that act as bridges between communities:

results = g.query("RETURN betweennessCentrality()")
# Carol and Eve have high betweenness - they connect different groups

Community Detection

Label Propagation

Find clusters of densely connected nodes:

results = g.community_detection(max_iterations=10)
communities = {}
for r in results:
    label = r["community"]
    if label not in communities:
        communities[label] = []
    communities[label].append(r["user_id"])

for label, members in communities.items():
    print(f"Community {label}: {members}")

Louvain Algorithm

For larger graphs, Louvain provides hierarchical community detection:

results = g.query("RETURN louvain(1.0)")

Path Finding

Shortest Path

Find the shortest path between two nodes:

path = g.shortest_path("alice", "henry")
print(f"Distance: {path['distance']}")
print(f"Path: {' -> '.join(path['path'])}")

Output:

Distance: 4
Path: alice -> carol -> frank -> henry

All-Pairs Shortest Paths

Compute distances between all node pairs:

results = g.query("RETURN apsp()")

Connected Components

Weakly Connected Components

Find groups of nodes that are connected (ignoring edge direction):

results = g.connected_components()

Strongly Connected Components

Find groups where every node can reach every other node:

results = g.query("RETURN scc()")

Using Results in Your Application

Algorithm results are returned as lists of dictionaries, making them easy to process:

# Find the top influencers
influencers = g.pagerank()
top_3 = sorted(influencers, key=lambda x: x["score"], reverse=True)[:3]

# Get full node data for top influencers
for inf in top_3:
    node = g.get_node(inf["user_id"])
    print(f"{node['properties']['name']}: PageRank {inf['score']:.4f}")

Combining Algorithms with Cypher

Use algorithm results to guide Cypher queries:

# Find the most central node
pagerank = g.pagerank()
most_central = max(pagerank, key=lambda x: x["score"])["user_id"]

# Query their connections
results = g.query(f"""
    MATCH (p:Person {{name: '{most_central.title()}'}})-[:FOLLOWS]->(friend)
    RETURN friend.name AS friend
""")
print(f"Top influencer {most_central} follows: {[r['friend'] for r in results]}")

Graph Algorithms in SQL

This tutorial shows how to run graph algorithms and work with their results in SQL.

Setup: Citation Network

.load build/graphqlite.dylib
.mode column
.headers on

-- Create papers
SELECT cypher('CREATE (p:Paper {title: "Foundations"})');
SELECT cypher('CREATE (p:Paper {title: "Methods"})');
SELECT cypher('CREATE (p:Paper {title: "Applications"})');
SELECT cypher('CREATE (p:Paper {title: "Survey"})');
SELECT cypher('CREATE (p:Paper {title: "Analysis"})');
SELECT cypher('CREATE (p:Paper {title: "Review"})');

-- Create citations (citing paper -> cited paper)
SELECT cypher('MATCH (a:Paper {title: "Methods"}), (b:Paper {title: "Foundations"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Applications"}), (b:Paper {title: "Foundations"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Applications"}), (b:Paper {title: "Methods"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Survey"}), (b:Paper {title: "Foundations"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Survey"}), (b:Paper {title: "Methods"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Survey"}), (b:Paper {title: "Applications"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Analysis"}), (b:Paper {title: "Methods"}) CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Review"}), (b:Paper {title: "Survey"}) CREATE (a)-[:CITES]->(b)');

PageRank

Find influential papers based on citation structure.

Basic usage

SELECT cypher('RETURN pageRank(0.85, 20)');

Extract as table

SELECT
    json_extract(value, '$.node_id') as id,
    json_extract(value, '$.user_id') as paper_id,
    printf('%.4f', json_extract(value, '$.score')) as score
FROM json_each(cypher('RETURN pageRank(0.85, 20)'))
ORDER BY json_extract(value, '$.score') DESC;

Join with node properties

WITH rankings AS (
    SELECT
        json_extract(value, '$.node_id') as node_id,
        json_extract(value, '$.score') as score
    FROM json_each(cypher('RETURN pageRank(0.85, 20)'))
)
SELECT
    n.user_id as paper,
    printf('%.4f', r.score) as influence
FROM rankings r
JOIN nodes n ON n.id = r.node_id
ORDER BY r.score DESC;

Community Detection

Label Propagation

SELECT cypher('RETURN labelPropagation(10)');

Group by community

WITH communities AS (
    SELECT
        json_extract(value, '$.node_id') as node_id,
        json_extract(value, '$.community') as community
    FROM json_each(cypher('RETURN labelPropagation(10)'))
)
SELECT
    c.community,
    group_concat(n.user_id) as papers
FROM communities c
JOIN nodes n ON n.id = c.node_id
GROUP BY c.community;

Louvain

SELECT cypher('RETURN louvain(1.0)');

Centrality Metrics

Degree Centrality

SELECT
    json_extract(value, '$.user_id') as paper,
    json_extract(value, '$.in_degree') as cited_by,
    json_extract(value, '$.out_degree') as cites
FROM json_each(cypher('RETURN degreeCentrality()'))
ORDER BY json_extract(value, '$.in_degree') DESC;

Betweenness Centrality

SELECT
    json_extract(value, '$.user_id') as paper,
    printf('%.4f', json_extract(value, '$.score')) as betweenness
FROM json_each(cypher('RETURN betweennessCentrality()'))
ORDER BY json_extract(value, '$.score') DESC;

Path Finding

Shortest Path

SELECT cypher('RETURN dijkstra("Review", "Foundations")');

Result shows path and distance:

{"distance": 3, "path": ["Review", "Survey", "Foundations"]}

Combining Algorithms with Queries

Find most influential paper's citations

-- Get top paper by PageRank
WITH top_paper AS (
    SELECT json_extract(value, '$.user_id') as paper_id
    FROM json_each(cypher('RETURN pageRank()'))
    ORDER BY json_extract(value, '$.score') DESC
    LIMIT 1
)
-- Find what it cites
SELECT cypher(
    'MATCH (p:Paper {title: "' || paper_id || '"})-[:CITES]->(cited) RETURN cited.title'
)
FROM top_paper;

Export for visualization

-- Export nodes
.mode csv
.output nodes.csv
SELECT
    json_extract(value, '$.node_id') as id,
    json_extract(value, '$.user_id') as label,
    json_extract(value, '$.score') as pagerank
FROM json_each(cypher('RETURN pageRank()'));

-- Export edges
.output edges.csv
SELECT
    source_id, target_id, label as type
FROM edges;
.output stdout

Performance Tips

  1. Limit output for large graphs:

    SELECT * FROM json_each(cypher('RETURN pageRank()')) LIMIT 100;
    
  2. Create views for repeated queries:

    CREATE VIEW paper_influence AS
    SELECT
        json_extract(value, '$.node_id') as node_id,
        json_extract(value, '$.score') as score
    FROM json_each(cypher('RETURN pageRank()'));
    
  3. Index algorithm results if needed repeatedly:

    CREATE TABLE pagerank_cache AS
    SELECT * FROM json_each(cypher('RETURN pageRank()'));
    CREATE INDEX idx_pagerank ON pagerank_cache(json_extract(value, '$.score'));
    

Building a GraphRAG System

This tutorial shows how to build a Graph Retrieval-Augmented Generation (GraphRAG) system using GraphQLite.

What is GraphRAG?

GraphRAG combines:

  1. Document chunking - Split documents into processable pieces
  2. Entity extraction - Identify entities and relationships
  3. Graph storage - Store entities as nodes, relationships as edges
  4. Vector search - Find relevant chunks by semantic similarity
  5. Graph traversal - Expand context using graph structure

Prerequisites

pip install graphqlite sentence-transformers sqlite-vec spacy
python -m spacy download en_core_web_sm

Architecture

Query: "Who are the tech leaders?"
         │
         ▼
┌─────────────────────┐
│  1. Vector Search   │  Find chunks similar to query
└─────────┬───────────┘
         │
         ▼
┌─────────────────────┐
│  2. Graph Lookup    │  MATCH (chunk)-[:MENTIONS]->(entity)
└─────────┬───────────┘
         │
         ▼
┌─────────────────────┐
│  3. Graph Traversal │  MATCH (entity)-[*1..2]-(related)
└─────────┬───────────┘
         │
         ▼
    Context for LLM

Step 1: Document Chunking

from dataclasses import dataclass
from typing import List

@dataclass
class Chunk:
    chunk_id: str
    doc_id: str
    text: str
    start_char: int
    end_char: int

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50, doc_id: str = "doc") -> List[Chunk]:
    """Split text into overlapping chunks."""
    words = text.split()
    chunks = []
    start = 0
    chunk_index = 0

    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunk_words = words[start:end]
        chunk_text = " ".join(chunk_words)

        # Calculate character positions
        start_char = len(" ".join(words[:start])) + (1 if start > 0 else 0)
        end_char = start_char + len(chunk_text)

        chunks.append(Chunk(
            chunk_id=f"{doc_id}_chunk_{chunk_index}",
            doc_id=doc_id,
            text=chunk_text,
            start_char=start_char,
            end_char=end_char,
        ))

        start += chunk_size - overlap
        chunk_index += 1

    return chunks

Step 2: Entity Extraction

import spacy

nlp = spacy.load("en_core_web_sm")

def extract_entities(text: str) -> List[dict]:
    """Extract named entities from text."""
    doc = nlp(text)
    entities = []

    for ent in doc.ents:
        entities.append({
            "text": ent.text,
            "label": ent.label_,
            "start": ent.start_char,
            "end": ent.end_char,
        })

    return entities

def extract_relationships(entities: List[dict]) -> List[tuple]:
    """Create co-occurrence relationships between entities."""
    relationships = []

    for i, e1 in enumerate(entities):
        for e2 in entities[i+1:]:
            relationships.append((
                e1["text"],
                e2["text"],
                "CO_OCCURS",
            ))

    return relationships

Step 3: Build the Knowledge Graph

from graphqlite import Graph

g = Graph("knowledge.db")

def ingest_document(doc_id: str, text: str):
    """Process a document and add to knowledge graph."""

    # Chunk the document
    chunks = chunk_text(text, doc_id=doc_id)

    for chunk in chunks:
        # Store chunk as node
        g.upsert_node(
            chunk.chunk_id,
            {"text": chunk.text[:500], "doc_id": doc_id},  # Truncate for storage
            label="Chunk"
        )

        # Extract and store entities
        entities = extract_entities(chunk.text)

        for entity in entities:
            entity_id = entity["text"].lower().replace(" ", "_")

            # Create entity node
            g.upsert_node(
                entity_id,
                {"name": entity["text"], "type": entity["label"]},
                label="Entity"
            )

            # Link chunk to entity
            g.upsert_edge(
                chunk.chunk_id,
                entity_id,
                {},
                rel_type="MENTIONS"
            )

        # Create entity co-occurrence edges
        relationships = extract_relationships(entities)
        for source, target, rel_type in relationships:
            source_id = source.lower().replace(" ", "_")
            target_id = target.lower().replace(" ", "_")
            g.upsert_edge(source_id, target_id, {}, rel_type=rel_type)

Step 4: Vector Search with sqlite-vec

import sqlite3
import sqlite_vec
from sentence_transformers import SentenceTransformer

# Initialize embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

def setup_vector_search(conn: sqlite3.Connection):
    """Set up vector search table."""
    sqlite_vec.load(conn)
    conn.execute("""
        CREATE VIRTUAL TABLE IF NOT EXISTS chunk_embeddings USING vec0(
            chunk_id TEXT PRIMARY KEY,
            embedding FLOAT[384]
        )
    """)

def embed_chunks(conn: sqlite3.Connection, chunks: List[Chunk]):
    """Embed chunks and store vectors."""
    texts = [c.text for c in chunks]
    embeddings = model.encode(texts)

    for chunk, embedding in zip(chunks, embeddings):
        conn.execute(
            "INSERT OR REPLACE INTO chunk_embeddings (chunk_id, embedding) VALUES (?, ?)",
            [chunk.chunk_id, embedding.tobytes()]
        )
    conn.commit()

def vector_search(conn: sqlite3.Connection, query: str, k: int = 5) -> List[str]:
    """Find chunks similar to query."""
    query_embedding = model.encode([query])[0]

    results = conn.execute("""
        SELECT chunk_id
        FROM chunk_embeddings
        WHERE embedding MATCH ?
        LIMIT ?
    """, [query_embedding.tobytes(), k]).fetchall()

    return [r[0] for r in results]

Step 5: GraphRAG Retrieval

def graphrag_retrieve(query: str, k_chunks: int = 5, expand_hops: int = 1) -> dict:
    """
    Retrieve context using GraphRAG:
    1. Vector search for relevant chunks
    2. Find entities mentioned in those chunks
    3. Expand to related entities via graph
    """

    # Get underlying connection for vector search
    conn = g.connection.sqlite_connection

    # Step 1: Vector search
    chunk_ids = vector_search(conn, query, k=k_chunks)

    # Step 2: Get entities from chunks
    entities = set()
    for chunk_id in chunk_ids:
        results = g.query(f"""
            MATCH (c:Chunk {{id: '{chunk_id}'}})-[:MENTIONS]->(e:Entity)
            RETURN e.name
        """)
        for r in results:
            entities.add(r["e.name"])

    # Step 3: Expand via graph
    related_entities = set()
    for entity in entities:
        entity_id = entity.lower().replace(" ", "_")
        results = g.query(f"""
            MATCH (e:Entity {{name: '{entity}'}})-[*1..{expand_hops}]-(related:Entity)
            RETURN DISTINCT related.name
        """)
        for r in results:
            related_entities.add(r["related.name"])

    # Get chunk texts
    chunk_texts = []
    for chunk_id in chunk_ids:
        node = g.get_node(chunk_id)
        if node:
            chunk_texts.append(node["properties"].get("text", ""))

    return {
        "chunks": chunk_texts,
        "entities": list(entities),
        "related_entities": list(related_entities - entities),
    }

Step 6: Complete Pipeline

# Initialize
g = Graph("graphrag.db")
conn = sqlite3.connect("graphrag.db")
setup_vector_search(conn)

# Ingest documents
documents = [
    {"id": "doc1", "text": "Apple Inc. was founded by Steve Jobs..."},
    {"id": "doc2", "text": "Microsoft, led by Satya Nadella..."},
]

for doc in documents:
    ingest_document(doc["id"], doc["text"])
    chunks = chunk_text(doc["text"], doc_id=doc["id"])
    embed_chunks(conn, chunks)

# Query
context = graphrag_retrieve("Who are the tech industry leaders?")
print("Relevant chunks:", len(context["chunks"]))
print("Entities:", context["entities"])
print("Related:", context["related_entities"])

# Use context with an LLM
# response = llm.generate(query, context=context)

Graph Algorithms for GraphRAG

Use graph algorithms to enhance retrieval:

# Find important entities
important = g.pagerank()
top_entities = sorted(important, key=lambda x: x["score"], reverse=True)[:10]

# Find entity communities
communities = g.community_detection()

# Find central entities (good for summarization)
central = g.query("RETURN betweennessCentrality()")

Example Project

See examples/llm-graphrag/ for a complete GraphRAG implementation using the HotpotQA multi-hop reasoning dataset:

  • Graph-based knowledge storage with Cypher queries
  • sqlite-vec for vector similarity search
  • Ollama integration for local LLM inference
  • Community detection for topic-based retrieval

cd examples/llm-graphrag
uv sync
uv run python ingest.py      # Ingest HotpotQA dataset
uv run python rag.py          # Interactive query mode

Installation

Python

pip install graphqlite

This installs pre-built binaries for:

  • Linux (x86_64, aarch64)
  • macOS (arm64, x86_64)
  • Windows (x86_64)

Rust

Add to your Cargo.toml:

[dependencies]
graphqlite = "0.2"

From Source

Building from source requires:

  • GCC or Clang
  • Bison (3.0+)
  • Flex
  • SQLite development headers

macOS

brew install bison flex sqlite
export PATH="$(brew --prefix bison)/bin:$PATH"
make extension RELEASE=1

Linux (Debian/Ubuntu)

sudo apt-get install build-essential bison flex libsqlite3-dev
make extension RELEASE=1

Windows (MSYS2)

pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-sqlite3 bison flex make
make extension RELEASE=1

The extension will be built to:

  • build/graphqlite.dylib (macOS)
  • build/graphqlite.so (Linux)
  • build/graphqlite.dll (Windows)

Verifying Installation

Python

import graphqlite
print(graphqlite.__version__)

# Quick test
from graphqlite import Graph
g = Graph(":memory:")
g.upsert_node("test", {"name": "Test"})
print(g.stats())  # {'nodes': 1, 'edges': 0}

SQL

sqlite3
.load /path/to/graphqlite
SELECT cypher('RETURN 1 + 1 AS result');

Troubleshooting

Extension not found

If you get FileNotFoundError: GraphQLite extension not found:

  1. Build the extension: make extension RELEASE=1
  2. Set the path explicitly:
    from graphqlite import connect
    conn = connect("graph.db", extension_path="/path/to/graphqlite.dylib")
    
  3. Or set an environment variable:
    export GRAPHQLITE_EXTENSION_PATH=/path/to/graphqlite.dylib
    

macOS: Library not loaded

If you see errors about missing SQLite libraries, ensure you're using Homebrew's Python or set DYLD_LIBRARY_PATH:

export DYLD_LIBRARY_PATH="$(brew --prefix sqlite)/lib:$DYLD_LIBRARY_PATH"

Working with Multiple Graphs

GraphQLite supports managing and querying across multiple graph databases. This is useful for:

  • Separation of concerns: Keep different data domains in separate graphs
  • Access control: Different graphs can have different permissions
  • Performance: Smaller, focused graphs can be faster to query
  • Cross-domain queries: Query relationships across different datasets

Using GraphManager (Python)

The GraphManager class manages multiple graph databases in a directory:

from graphqlite import graphs

# Create a manager for a directory
with graphs("./data") as gm:
    # Create graphs
    social = gm.create("social")
    products = gm.create("products")

    # Add data to each graph
    social.upsert_node("alice", {"name": "Alice", "age": 30}, "Person")
    social.upsert_node("bob", {"name": "Bob", "age": 25}, "Person")
    social.upsert_edge("alice", "bob", {"since": 2020}, "KNOWS")

    products.upsert_node("phone", {"name": "iPhone", "price": 999}, "Product")
    products.upsert_node("laptop", {"name": "MacBook", "price": 1999}, "Product")

    # List all graphs
    print(gm.list())  # ['products', 'social']

    # Check if a graph exists
    if "social" in gm:
        print("Social graph exists")

Opening Existing Graphs

from graphqlite import graphs

with graphs("./data") as gm:
    # Open an existing graph
    social = gm.open("social")

    # Or create if it doesn't exist
    cache = gm.open_or_create("cache")

    # Query the graph
    result = social.query("MATCH (n:Person) RETURN n.name")
    for row in result:
        print(row["n.name"])

Dropping Graphs

from graphqlite import graphs

with graphs("./data") as gm:
    # Delete a graph and its file
    gm.drop("cache")

Cross-Graph Queries

GraphQLite supports querying across multiple graphs using the FROM clause:

from graphqlite import graphs

with graphs("./data") as gm:
    # Create and populate graphs
    social = gm.create("social")
    social.upsert_node("alice", {"name": "Alice", "user_id": "u1"}, "Person")

    purchases = gm.create("purchases")
    purchases.upsert_node("order1", {"user_id": "u1", "total": 99.99}, "Order")

    # Cross-graph query using FROM clause
    result = gm.query(
        """
        MATCH (p:Person) FROM social
        WHERE p.user_id = 'u1'
        RETURN p.name, graph(p) AS source
        """,
        graphs=["social"]
    )

    for row in result:
        print(f"{row['p.name']} from {row['source']}")

The graph() Function

Use the graph() function to identify which graph a node comes from:

result = gm.query(
    "MATCH (n:Person) FROM social RETURN n.name, graph(n) AS source_graph",
    graphs=["social"]
)

Raw SQL Cross-Graph Queries

For advanced use cases, you can execute raw SQL across attached graphs:

result = gm.query_sql(
    "SELECT COUNT(*) FROM social.nodes",
    graphs=["social"]
)
print(f"Node count: {result[0][0]}")

Using GraphManager (Rust)

The Rust API provides similar functionality:

use graphqlite::{graphs, GraphManager};

fn main() -> graphqlite::Result<()> {
    let mut gm = graphs("./data")?;

    // Create graphs
    gm.create("social")?;
    gm.create("products")?;

    // List graphs
    for name in gm.list()? {
        println!("Graph: {}", name);
    }

    // Open and use a graph
    let social = gm.open_graph("social")?;
    social.query("CREATE (n:Person {name: 'Alice'})")?;

    // Cross-graph query
    let result = gm.query(
        "MATCH (n:Person) FROM social RETURN n.name",
        &["social"]
    )?;

    for row in &result {
        println!("{}", row.get::<String>("n.name")?);
    }

    // Drop a graph
    gm.drop("products")?;

    Ok(())
}

Direct SQL with ATTACH

You can also work with multiple graphs directly using SQLite's ATTACH:

import sqlite3
import graphqlite

# Create separate graph databases
conn1 = sqlite3.connect("social.db")
graphqlite.load(conn1)
conn1.execute("SELECT cypher('CREATE (n:Person {name: \"Alice\"})')")
conn1.close()

conn2 = sqlite3.connect("products.db")
graphqlite.load(conn2)
conn2.execute("SELECT cypher('CREATE (n:Product {name: \"Phone\"})')")
conn2.close()

# Query across both
coordinator = sqlite3.connect(":memory:")
graphqlite.load(coordinator)
coordinator.execute("ATTACH DATABASE 'social.db' AS social")
coordinator.execute("ATTACH DATABASE 'products.db' AS products")

result = coordinator.execute(
    "SELECT cypher('MATCH (n:Person) FROM social RETURN n.name')"
).fetchone()
print(result[0])

Best Practices

  1. Use GraphManager for convenience: It handles extension loading, connection caching, and cleanup automatically.

  2. Commit before cross-graph queries: GraphManager automatically commits open graph connections before cross-graph queries to ensure data visibility.

  3. Keep graphs focused: Design your graphs around specific domains or use cases for better performance and maintainability.

  4. Use meaningful names: Graph names become SQLite database aliases, so use valid SQL identifiers.

  5. Handle errors gracefully: Check for FileNotFoundError when opening graphs that might not exist (see the sketch after this list).
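
As a quick sketch of practices 1 and 5 (assuming a missing graph raises FileNotFoundError, as noted above):

from graphqlite import graphs

with graphs("./data") as gm:
    try:
        social = gm.open("social")
    except FileNotFoundError:
        # The graph does not exist yet, so create it instead
        social = gm.create("social")
    # gm.open_or_create("social") wraps the same pattern in a single call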

Limitations

  • Cross-graph queries are read-only for the attached graphs
  • The FROM clause only applies to MATCH patterns
  • Graph names must be valid SQL identifiers (alphanumeric, underscores)
  • Maximum of 10 attached databases by default (SQLite's SQLITE_MAX_ATTACHED limit)

Use the gqlite CLI

The gqlite command-line tool provides an interactive shell for executing Cypher queries against a SQLite database.

Building

angreal build app
# or
make graphqlite

This creates build/gqlite.

Usage

# Interactive mode with default database (graphqlite.db)
./build/gqlite

# Specify a database file
./build/gqlite mydata.db

# Initialize a fresh database
./build/gqlite -i mydata.db

# Verbose mode (shows query execution details)
./build/gqlite -v mydata.db

Interactive Shell

When you start gqlite, you'll see an interactive prompt:

GraphQLite Interactive Shell
Type .help for help, .quit to exit
Queries must end with semicolon (;)

graphqlite>

Statement Termination

All Cypher queries must end with a semicolon (;). Multi-line statements are supported:

graphqlite> CREATE (a:Person {name: "Alice"});
Query executed successfully
  Nodes created: 1
  Properties set: 1

graphqlite> MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
       ...>     CREATE (a)-[:KNOWS]->(b);
Query executed successfully
  Relationships created: 1

The ...> prompt indicates you're continuing a multi-line statement.

Dot Commands

  • .help - Show help information
  • .schema - Display database schema
  • .tables - List all tables
  • .stats - Show database statistics
  • .quit - Exit the shell

Script Execution

You can pipe Cypher scripts to gqlite:

# Execute a script file
./build/gqlite mydata.db < script.cypher

# Inline script
echo 'CREATE (n:Test {value: 42});
MATCH (n:Test) RETURN n.value;' | ./build/gqlite mydata.db

Script Format

Scripts should have one statement per line or use multi-line statements ending with semicolons:

-- setup.cypher
CREATE (a:Person {name: "Alice", age: 30});
CREATE (b:Person {name: "Bob", age: 25});
CREATE (c:Person {name: "Charlie", age: 35});

MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
    CREATE (a)-[:KNOWS]->(b);

MATCH (b:Person {name: "Bob"}), (c:Person {name: "Charlie"})
    CREATE (b)-[:KNOWS]->(c);

-- Query friend-of-friend
MATCH (a:Person {name: "Alice"})-[:KNOWS]->()-[:KNOWS]->(fof)
    RETURN fof.name;

Examples

Create and Query a Social Network

$ ./build/gqlite social.db
graphqlite> CREATE (alice:Person {name: "Alice"});
Query executed successfully
  Nodes created: 1
  Properties set: 1

graphqlite> CREATE (bob:Person {name: "Bob"});
Query executed successfully
  Nodes created: 1
  Properties set: 1

graphqlite> MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
       ...>     CREATE (a)-[:FRIENDS_WITH]->(b);
Query executed successfully
  Relationships created: 1

graphqlite> MATCH (p:Person)-[:FRIENDS_WITH]->(friend)
       ...>     RETURN p.name, friend.name;
p.name         friend.name
---------------
Alice          Bob

graphqlite> .quit
Goodbye!

Check Database Statistics

graphqlite> .stats

Database Statistics:
===================
  Nodes           : 2
  Edges           : 1
  Node Labels     : 2
  Property Keys   : 1
  Edge Types      : FRIENDS_WITH

Command Line Options

  • -h, --help - Show help message
  • -v, --verbose - Enable verbose debug output
  • -i, --init - Initialize new database (overwrites existing)

Use Graph Algorithms

GraphQLite includes 15+ built-in graph algorithms. This guide shows how to use them effectively.

Using Algorithms with the Graph API

The Graph class provides direct methods for common algorithms:

from graphqlite import Graph

g = Graph("my_graph.db")

# Centrality
pagerank = g.pagerank(damping=0.85, iterations=20)
degree = g.degree_centrality()

# Community detection
communities = g.community_detection(iterations=10)

# Path finding
path = g.shortest_path("alice", "bob")

# Components
components = g.connected_components()

Using Algorithms with Cypher

For algorithms not exposed directly, use the RETURN clause:

# Betweenness centrality
results = g.query("RETURN betweennessCentrality()")

# Louvain community detection
results = g.query("RETURN louvain(1.0)")

# A* pathfinding
results = g.query("RETURN astar('start', 'end', 'lat', 'lon')")

Working with Results

All algorithms return JSON results that are parsed into Python dictionaries:

pagerank = g.pagerank()

# Results are a list of dicts
for node in pagerank:
    print(f"Node {node['user_id']}: score {node['score']}")

# Sort by score
top_nodes = sorted(pagerank, key=lambda x: x['score'], reverse=True)[:10]

# Filter
high_scores = [n for n in pagerank if n['score'] > 0.1]

Using Results in SQL

When using raw SQL, extract values with json_each and json_extract:

-- Get top 10 PageRank nodes
SELECT
    json_extract(value, '$.node_id') as id,
    json_extract(value, '$.score') as score
FROM json_each(cypher('RETURN pageRank()'))
ORDER BY score DESC
LIMIT 10;

Algorithm Parameters

PageRank

g.pagerank(damping=0.85, iterations=20)

  • damping: Probability of following a link (default: 0.85)
  • iterations: Number of iterations (default: 20)

Label Propagation

g.community_detection(iterations=10)

  • iterations: Maximum iterations before stopping (default: 10)

Shortest Path

g.shortest_path(source_id, target_id)

Returns {"distance": int, "path": [node_ids], "found": bool}. When found is false, distance is None and path is empty.
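
For example, a minimal sketch that branches on the found flag:

result = g.shortest_path("alice", "henry")
if result["found"]:
    print(f"{result['distance']}: {' -> '.join(result['path'])}")
else:
    print("No path between the two nodes")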

A* Pathfinding

SELECT cypher('RETURN astar("start", "end", "lat", "lon")');

Requires nodes to have coordinate properties for the heuristic.
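
For instance, a minimal sketch of nodes that carry the coordinates the heuristic needs (the node IDs, Place label, ROAD type, and coordinate values are illustrative):

# Nodes store the coordinate properties named in the astar() call
g.upsert_node("start", {"lat": 40.71, "lon": -74.00}, label="Place")
g.upsert_node("end", {"lat": 40.73, "lon": -73.99}, label="Place")
g.upsert_edge("start", "end", {}, rel_type="ROAD")

result = g.query('RETURN astar("start", "end", "lat", "lon")')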

Performance Considerations

  • Small graphs (<10K nodes): All algorithms run instantly
  • Medium graphs (10K-100K nodes): Most algorithms complete in under a second
  • Large graphs (>100K nodes): Some algorithms (PageRank, community detection) may take several seconds

For large graphs, consider:

  1. Running algorithms in a background thread
  2. Caching results if the graph doesn't change frequently (see the sketch after this list)
  3. Using approximate algorithms for real-time queries
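
A minimal caching sketch for point 2 (the cache file name and invalidation policy are up to you):

import json

def cached_pagerank(g, cache_path="pagerank_cache.json"):
    """Reuse stored PageRank results until the cache file is deleted."""
    try:
        with open(cache_path) as f:
            return json.load(f)
    except FileNotFoundError:
        results = g.pagerank()
        with open(cache_path, "w") as f:
            json.dump(results, f)
        return results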

Handle Special Characters

This guide explains how to handle special characters in property values to avoid query issues.

The Problem

Property values containing certain characters can break Cypher query parsing:

# This will cause issues
g.query("CREATE (n:Note {text: 'Line1\nLine2'})")

Characters that need special handling:

  • Newlines (\n)
  • Carriage returns (\r)
  • Tabs (\t)
  • Single quotes (')
  • Backslashes (\)

Solution 1: Use Parameterized Queries

The safest approach is to use parameterized queries via the Connection.cypher() method:

g.connection.cypher(
    "CREATE (n:Note {text: $text})",
    {"text": "Line1\nLine2"}
)

Parameters are properly escaped automatically.

Solution 2: Use the Graph API

The high-level Graph API handles escaping for you:

g.upsert_node("note1", {"text": "Line1\nLine2"}, label="Note")

Solution 3: Manual Escaping

If you must build queries manually, escape problematic characters:

def escape_for_cypher(value: str) -> str:
    """Escape a string for use in Cypher property values."""
    return (value
        .replace("\\", "\\\\")   # Backslashes first
        .replace("'", "\\'")      # Single quotes
        .replace("\n", " ")       # Newlines
        .replace("\r", " ")       # Carriage returns
        .replace("\t", " "))      # Tabs

text = escape_for_cypher("Line1\nLine2")
g.query(f"CREATE (n:Note {{text: '{text}'}})")

Common Symptoms

Nodes exist but MATCH returns nothing

Symptom: You insert nodes and can verify they exist with raw SQL (SELECT * FROM nodes), but MATCH (n) RETURN n returns empty results.

Cause: Newlines or other control characters in property values break the query.

Solution: Use parameterized queries or escape the values.

Query syntax errors

Symptom: SyntaxError when creating nodes with text content.

Cause: Unescaped single quotes in the value.

Solution: Escape quotes or use parameters:

# Wrong
g.query("CREATE (n:Quote {text: 'It's a test'})")

# Right - escape the quote
g.query("CREATE (n:Quote {text: 'It\\'s a test'})")

# Better - use parameters
g.connection.cypher("CREATE (n:Quote {text: $text})", {"text": "It's a test"})

Best Practices

  1. Always use parameterized queries for user-provided data
  2. Use the Graph API for simple CRUD operations
  3. Validate input before storing if using raw queries
  4. Consider replacing control characters with spaces or removing them entirely if they're not meaningful

Use with Other Extensions

GraphQLite works alongside other SQLite extensions. This guide shows how to combine them.

Loading Multiple Extensions

Method 1: Use graphqlite.load()

import sqlite3
import graphqlite

conn = sqlite3.connect("combined.db")
graphqlite.load(conn)

# Now load other extensions
conn.enable_load_extension(True)
conn.load_extension("other_extension")
conn.enable_load_extension(False)

Method 2: Access Underlying Connection

import graphqlite
import sqlite_vec  # Example: vector search extension

db = graphqlite.connect("combined.db")
sqlite_vec.load(db.sqlite_connection)  # Access underlying sqlite3.Connection

Example: GraphQLite + sqlite-vec

Combine graph queries with vector similarity search:

import sqlite3
import graphqlite
import sqlite_vec

# Create connection and load both extensions
conn = sqlite3.connect("knowledge.db")
graphqlite.load(conn)
sqlite_vec.load(conn)

# Create graph nodes
conn.execute("SELECT cypher('CREATE (n:Document {id: \"doc1\", title: \"Introduction\"})')")

# Store embeddings in a vector table
conn.execute("""
    CREATE VIRTUAL TABLE IF NOT EXISTS embeddings USING vec0(
        doc_id TEXT PRIMARY KEY,
        embedding FLOAT[384]
    )
""")

# Query: find similar documents, then get their graph neighbors
similar_docs = conn.execute("""
    SELECT doc_id FROM embeddings
    WHERE embedding MATCH ?
    LIMIT 5
""", [query_embedding]).fetchall()

for (doc_id,) in similar_docs:
    # Get related nodes from graph
    related = conn.execute(f"""
        SELECT cypher('
            MATCH (d:Document {{id: "{doc_id}"}})-[:RELATED_TO]->(other)
            RETURN other.title
        ')
    """).fetchall()

In-Memory Database Considerations

In-memory databases are connection-specific. All extensions must share the same connection:

# Correct: single connection, multiple extensions
conn = sqlite3.connect(":memory:")
graphqlite.load(conn)
other_extension.load(conn)
# Both extensions share the same in-memory database

# Wrong: separate connections don't share data
conn1 = sqlite3.connect(":memory:")
conn2 = sqlite3.connect(":memory:")
# conn1 and conn2 are completely separate databases!

Extension Loading Order

Generally, load GraphQLite first, then other extensions. This ensures the graph schema is created before any dependent operations.

conn = sqlite3.connect("db.sqlite")

# 1. Load GraphQLite first
graphqlite.load(conn)

# 2. Load other extensions
conn.enable_load_extension(True)
conn.load_extension("extension2")
conn.load_extension("extension3")
conn.enable_load_extension(False)

Troubleshooting

Extension conflicts

If extensions conflict, try loading them in different orders or check for table name collisions.
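
One quick way to check for collisions is to list the tables each extension created; a sketch using sqlite3 directly:

import sqlite3
import graphqlite

conn = sqlite3.connect("combined.db")
graphqlite.load(conn)

# List every table so name collisions between extensions stand out;
# GraphQLite's own tables (e.g. nodes, edges) are created on first load.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(tables)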

Missing tables

Ensure GraphQLite is loaded before querying graph tables. The schema is created on first load.

Transaction issues

Some extensions may have different transaction semantics. If you encounter issues, try committing between operations:

graphqlite.load(conn)
conn.commit()

other_extension.load(conn)
conn.commit()

Parameterized Queries

Parameterized queries prevent SQL injection and properly handle special characters. This guide shows how to use them.

Basic Usage

Use $parameter syntax in Cypher and pass a dictionary of parameters to Connection.cypher():

from graphqlite import Graph

g = Graph(":memory:")

# Named parameters via the connection
results = g.connection.cypher(
    "MATCH (n:Person {name: $name}) WHERE n.age > $age RETURN n",
    {"name": "Alice", "age": 30}
)

With the Connection API

The Connection.cypher() method accepts parameters as a dictionary:

from graphqlite import connect

conn = connect(":memory:")

# Create with parameters
conn.cypher(
    "CREATE (n:Person {name: $name, age: $age})",
    {"name": "Bob", "age": 25}
)

# Query with parameters
results = conn.cypher(
    "MATCH (n:Person) WHERE n.age >= $min_age RETURN n.name",
    {"min_age": 21}
)

With Raw SQL

When using the SQLite interface directly:

SELECT cypher(
    'MATCH (n:Person {name: $name}) RETURN n',
    '{"name": "Alice"}'
);

Parameter Types

Parameters support all JSON types:

import json

params = json.dumps({
    "string_val": "hello",
    "int_val": 42,
    "float_val": 3.14,
    "bool_val": True,
    "null_val": None,
    "array_val": [1, 2, 3]
})
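
When calling the SQL cypher() function yourself, the JSON string is bound as the second argument; a minimal sketch via the sqlite3 module (the Person label and properties are illustrative):

import sqlite3
import json
import graphqlite

conn = sqlite3.connect(":memory:")
graphqlite.load(conn)

params = json.dumps({"name": "Alice", "age": 30})
conn.execute(
    "SELECT cypher('CREATE (n:Person {name: $name, age: $age})', ?)",
    [params],
).fetchone()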

Use Cases

User Input

Always use parameters for user-provided values:

def search_by_name(user_input: str):
    # Safe - user input is parameterized
    return g.connection.cypher(
        "MATCH (n:Person {name: $name}) RETURN n",
        {"name": user_input}
    )

Batch Operations

people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Carol", "age": 35},
]

for person in people:
    g.connection.cypher(
        "CREATE (n:Person {name: $name, age: $age})",
        person
    )

Complex Values

Parameters handle special characters automatically:

# This works correctly even with quotes and newlines
text = "He said \"hello\"\nand then left."
g.connection.cypher(
    "CREATE (n:Note {content: $text})",
    {"text": text}
)

Benefits

  1. Security: Prevents Cypher injection attacks
  2. Correctness: Properly handles quotes, newlines, and special characters
  3. Performance: Query plans can be cached (future optimization)
  4. Clarity: Separates query logic from data

Common Patterns

Optional Parameters

def search(name: str = None, min_age: int = None):
    conditions = []
    params = {}

    if name:
        conditions.append("n.name = $name")
        params["name"] = name
    if min_age:
        conditions.append("n.age >= $min_age")
        params["min_age"] = min_age

    where = f"WHERE {' AND '.join(conditions)}" if conditions else ""

    return g.connection.cypher(
        f"MATCH (n:Person) {where} RETURN n",
        params if params else None
    )

Lists in Parameters

names = ["Alice", "Bob", "Carol"]
results = g.connection.cypher(
    "MATCH (n:Person) WHERE n.name IN $names RETURN n",
    {"names": names}
)

Cypher Support

GraphQLite implements a substantial subset of the Cypher query language.

Overview

Cypher is a declarative graph query language originally developed by Neo4j. GraphQLite supports the core features needed for most graph operations.

Quick Reference

  • Node patterns - ✅ Full
  • Relationship patterns - ✅ Full
  • Variable-length paths - ✅ Full
  • shortestPath/allShortestPaths - ✅ Full
  • Parameterized queries - ✅ Full
  • MATCH/OPTIONAL MATCH - ✅ Full
  • CREATE/MERGE - ✅ Full
  • SET/REMOVE/DELETE - ✅ Full
  • WITH/UNWIND/FOREACH - ✅ Full
  • LOAD CSV - ✅ Full
  • UNION/UNION ALL - ✅ Full
  • RETURN with modifiers - ✅ Full
  • Aggregation functions - ✅ Full
  • CASE expressions - ✅ Full
  • List comprehensions - ✅ Full
  • Pattern comprehensions - ✅ Full
  • Map projections - ✅ Full
  • Multi-graph (FROM clause) - ✅ Full
  • Graph algorithms - ✅ 15+ built-in
  • CALL procedures - ❌ Not supported
  • CREATE INDEX/CONSTRAINT - ❌ Use SQLite

Pattern Syntax

Nodes

(n)                           -- Any node
(n:Person)                    -- Node with label
(n:Person {name: 'Alice'})    -- Node with properties
(:Person)                     -- Anonymous node with label

Relationships

-[r]->                        -- Outgoing relationship
<-[r]-                        -- Incoming relationship
-[r]-                         -- Either direction
-[:KNOWS]->                   -- Relationship with type
-[r:KNOWS {since: 2020}]->    -- With properties

Variable-Length Paths

-[*]->                        -- Any length
-[*2]->                       -- Exactly 2 hops
-[*1..3]->                    -- 1 to 3 hops
-[:KNOWS*1..5]->              -- Typed, 1 to 5 hops

Clauses

See Clauses Reference for detailed documentation.

Functions

See Functions Reference for the complete function list.

Operators

See Operators Reference for comparison and logical operators.

Implementation Notes

GraphQLite implements standard Cypher with some differences from full implementations:

  1. No CALL procedures - Use built-in graph algorithm functions instead (e.g., RETURN pageRank())
  2. No CREATE INDEX/CONSTRAINT - Use SQLite's indexing and constraint mechanisms directly
  3. EXPLAIN supported - Returns the generated SQL for debugging instead of a query plan (see the sketch after this list)
  4. Multi-graph support - Use the FROM clause to query specific graphs with GraphManager
  5. Substring indexing - Uses 0-based indexing (Cypher standard), automatically converted for SQLite
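
For note 3, a hedged sketch using the Python Graph API (assuming an EXPLAIN-prefixed statement is passed through query() like any other Cypher string; the shape of the returned rows may vary between versions):

rows = g.query("EXPLAIN MATCH (n:Person) RETURN n.name")
print(rows)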

Cypher Clauses

Reading Clauses

MATCH

Find patterns in the graph:

MATCH (n:Person) RETURN n
MATCH (a)-[:KNOWS]->(b) RETURN a, b
MATCH (n:Person {name: 'Alice'}) RETURN n

Shortest Path Patterns

Find shortest paths between nodes:

// Find a single shortest path
MATCH p = shortestPath((a:Person {name: 'Alice'})-[*]-(b:Person {name: 'Bob'}))
RETURN p, length(p)

// Find all shortest paths (all paths with minimum length)
MATCH p = allShortestPaths((a:Person)-[*]-(b:Person))
WHERE a.name = 'Alice' AND b.name = 'Bob'
RETURN p

// With relationship type filter
MATCH p = shortestPath((a)-[:KNOWS*]->(b))
RETURN nodes(p), relationships(p)

// With length constraints
MATCH p = shortestPath((a)-[*..10]->(b))
RETURN p

OPTIONAL MATCH

Like MATCH, but returns NULL for non-matches (left join semantics):

MATCH (p:Person)
OPTIONAL MATCH (p)-[:MANAGES]->(e)
RETURN p.name, e.name

WHERE

Filter results:

MATCH (n:Person)
WHERE n.age > 21 AND n.city = 'NYC'
RETURN n

Writing Clauses

CREATE

Create nodes and relationships:

CREATE (n:Person {name: 'Alice', age: 30})
CREATE (a)-[:KNOWS {since: 2020}]->(b)

MERGE

Create if not exists, match if exists:

MERGE (n:Person {name: 'Alice'})
ON CREATE SET n.created = timestamp()
ON MATCH SET n.updated = timestamp()

SET

Update properties:

MATCH (n:Person {name: 'Alice'})
SET n.age = 31, n.city = 'LA'

Add labels:

MATCH (n:Person {name: 'Alice'})
SET n:Employee

REMOVE

Remove properties:

MATCH (n:Person {name: 'Alice'})
REMOVE n.temporary_field

Remove labels:

MATCH (n:Person:Employee {name: 'Alice'})
REMOVE n:Employee

DELETE

Delete nodes (must have no relationships):

MATCH (n:Person {name: 'Alice'})
DELETE n

DETACH DELETE

Delete nodes and all their relationships:

MATCH (n:Person {name: 'Alice'})
DETACH DELETE n

Composing Clauses

WITH

Chain query parts, aggregation, and filtering:

MATCH (p:Person)-[:WORKS_AT]->(c:Company)
WITH c, count(p) as employee_count
WHERE employee_count > 10
RETURN c.name, employee_count

UNWIND

Expand a list into rows:

UNWIND [1, 2, 3] AS x
RETURN x

UNWIND $names AS name
CREATE (n:Person {name: name})

FOREACH

Iterate and perform updates:

MATCH p = (start)-[*]->(end)
FOREACH (n IN nodes(p) | SET n.visited = true)

LOAD CSV

Import data from CSV files:

// With headers (access columns by name)
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
CREATE (n:Person {name: row.name, age: toInteger(row.age)})

// Without headers (access columns by index)
LOAD CSV FROM 'file:///data.csv' AS row
CREATE (n:Item {id: row[0], value: row[1]})

// Custom field terminator
LOAD CSV WITH HEADERS FROM 'file:///data.tsv' AS row FIELDTERMINATOR '\t'
CREATE (n:Record {field1: row.col1})

Note: File paths are relative to the current working directory. Use file:/// prefix for local files.

Multi-Graph Queries

FROM Clause

Query specific graphs when using GraphManager (multi-graph support):

// Query a specific graph
MATCH (n:Person) FROM social
RETURN n.name

// Combined with other clauses
MATCH (p:Person) FROM social
WHERE p.age > 21
RETURN p.name, graph(p) AS source_graph

The graph() function returns the name of the graph a node came from.

Combining Results

UNION

Combine results from multiple queries, removing duplicates:

MATCH (n:Person) WHERE n.city = 'NYC' RETURN n.name
UNION
MATCH (n:Person) WHERE n.age > 50 RETURN n.name

UNION ALL

Combine results keeping all rows (including duplicates):

MATCH (a:Person)-[:KNOWS]->(b) RETURN b.name AS connection
UNION ALL
MATCH (a:Person)-[:WORKS_WITH]->(b) RETURN b.name AS connection

Return Clause

RETURN

Specify what to return:

MATCH (n:Person) RETURN n
MATCH (n:Person) RETURN n.name, n.age
MATCH (n:Person) RETURN n.name AS name

DISTINCT

Remove duplicates:

MATCH (n:Person)-[:KNOWS]->(m)
RETURN DISTINCT m.city

ORDER BY

Sort results:

MATCH (n:Person)
RETURN n.name, n.age
ORDER BY n.age DESC, n.name ASC

SKIP and LIMIT

Pagination:

MATCH (n:Person)
RETURN n
ORDER BY n.name
SKIP 10
LIMIT 5

Aggregation

Use aggregate functions in RETURN or WITH:

MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN c.name, count(p), avg(p.salary), collect(p.name)

See Functions Reference for all aggregate functions.

Cypher Functions

String Functions

| Function | Description | Example |
|---|---|---|
| toLower(s) | Convert to lowercase | toLower('Hello') → 'hello' |
| toUpper(s) | Convert to uppercase | toUpper('Hello') → 'HELLO' |
| trim(s) | Remove leading/trailing whitespace | trim(' hi ') → 'hi' |
| ltrim(s) | Remove leading whitespace | ltrim(' hi') → 'hi' |
| rtrim(s) | Remove trailing whitespace | rtrim('hi ') → 'hi' |
| replace(s, from, to) | Replace occurrences | replace('hello', 'l', 'x') → 'hexxo' |
| substring(s, start, len) | Extract substring | substring('hello', 1, 3) → 'ell' |
| left(s, n) | First n characters | left('hello', 2) → 'he' |
| right(s, n) | Last n characters | right('hello', 2) → 'lo' |
| split(s, delim) | Split into list | split('a,b,c', ',') → ['a','b','c'] |
| reverse(s) | Reverse string | reverse('hello') → 'olleh' |
| length(s) | String length | length('hello') → 5 |
| size(s) | String length (alias) | size('hello') → 5 |
| toString(x) | Convert to string | toString(123) → '123' |

String Predicates

| Function | Description | Example |
|---|---|---|
| startsWith(s, prefix) | Check prefix | startsWith('hello', 'he') → true |
| endsWith(s, suffix) | Check suffix | endsWith('hello', 'lo') → true |
| contains(s, sub) | Check substring | contains('hello', 'ell') → true |

Math Functions

| Function | Description | Example |
|---|---|---|
| abs(n) | Absolute value | abs(-5) → 5 |
| ceil(n) | Round up | ceil(2.3) → 3 |
| floor(n) | Round down | floor(2.7) → 2 |
| round(n) | Round to nearest | round(2.5) → 3 |
| sign(n) | Sign (-1, 0, 1) | sign(-5) → -1 |
| sqrt(n) | Square root | sqrt(16) → 4 |
| log(n) | Natural logarithm | log(e()) → 1 |
| log10(n) | Base-10 logarithm | log10(100) → 2 |
| exp(n) | e^n | exp(1) → 2.718... |
| rand() | Random 0-1 | rand() → 0.42... |
| random() | Random 0-1 (alias) | random() → 0.42... |
| pi() | π constant | pi() → 3.14159... |
| e() | e constant | e() → 2.71828... |

Trigonometric Functions

| Function | Description |
|---|---|
| sin(n) | Sine |
| cos(n) | Cosine |
| tan(n) | Tangent |
| asin(n) | Arc sine |
| acos(n) | Arc cosine |
| atan(n) | Arc tangent |

List Functions

| Function | Description | Example |
|---|---|---|
| head(list) | First element | head([1,2,3]) → 1 |
| tail(list) | All but first | tail([1,2,3]) → [2,3] |
| last(list) | Last element | last([1,2,3]) → 3 |
| size(list) | Length | size([1,2,3]) → 3 |
| range(start, end) | Create range | range(1, 5) → [1,2,3,4,5] |
| reverse(list) | Reverse list | reverse([1,2,3]) → [3,2,1] |
| keys(map) | Get map keys | keys({a:1, b:2}) → ['a','b'] |

Aggregate Functions

| Function | Description | Example |
|---|---|---|
| count(x) | Count items | count(n), count(*) |
| sum(x) | Sum values | sum(n.amount) |
| avg(x) | Average | avg(n.score) |
| min(x) | Minimum | min(n.age) |
| max(x) | Maximum | max(n.age) |
| collect(x) | Collect into list | collect(n.name) |

Entity Functions

| Function | Description | Example |
|---|---|---|
| id(node) | Get node/edge ID | id(n) |
| labels(node) | Get node labels | labels(n) → ['Person'] |
| type(rel) | Get relationship type | type(r) → 'KNOWS' |
| properties(x) | Get all properties | properties(n) |
| startNode(rel) | Start node of relationship | startNode(r) |
| endNode(rel) | End node of relationship | endNode(r) |

Path Functions

| Function | Description | Example |
|---|---|---|
| nodes(path) | Get all nodes in path | nodes(p) |
| relationships(path) | Get all relationships | relationships(p) |
| rels(path) | Get all relationships (alias) | rels(p) |
| length(path) | Path length (edges) | length(p) |

Type Conversion

| Function | Description | Example |
|---|---|---|
| toInteger(x) | Convert to integer | toInteger('42') → 42 |
| toFloat(x) | Convert to float | toFloat('3.14') → 3.14 |
| toBoolean(x) | Convert to boolean | toBoolean('true') → true |
| coalesce(x, y, ...) | First non-null value | coalesce(n.name, 'Unknown') |

Temporal Functions

| Function | Description | Example |
|---|---|---|
| date() | Current date | date() → '2025-01-15' |
| datetime() | Current datetime | datetime() |
| time() | Current time | time() |
| timestamp() | Unix timestamp (ms) | timestamp() |
| localdatetime() | Local datetime | localdatetime() |
| randomUUID() | Generate random UUID | randomUUID() → '550e8400-e29b-...' |

Predicate Functions

| Function | Description | Example |
|---|---|---|
| exists(pattern) | Pattern exists | EXISTS { (n)-[:KNOWS]->() } |
| exists(prop) | Property exists | exists(n.email) |
| all(x IN list WHERE pred) | All match | all(x IN [1,2,3] WHERE x > 0) |
| any(x IN list WHERE pred) | Any match | any(x IN [1,2,3] WHERE x > 2) |
| none(x IN list WHERE pred) | None match | none(x IN [1,2,3] WHERE x < 0) |
| single(x IN list WHERE pred) | Exactly one | single(x IN [1,2,3] WHERE x = 2) |

Reduce

| Function | Description | Example |
|---|---|---|
| reduce(acc = init, x IN list \| expr) | Fold/reduce | reduce(s = 0, x IN [1,2,3] \| s + x) → 6 |
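
Predicate functions are typically used inside WHERE, while reduce is an ordinary expression; a short sketch running both through the Python Graph API (column names follow the AS aliases):

# People that have an email property
rows = g.query("MATCH (n:Person) WHERE exists(n.email) RETURN n.name AS name")

# Fold a list into a single value
rows = g.query("RETURN reduce(s = 0, x IN [1, 2, 3] | s + x) AS total")
print(rows[0]["total"])  # expected: 6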

CASE Expressions

Searched CASE

Evaluates conditions in order and returns the first matching result:

RETURN CASE
    WHEN n.age < 18 THEN 'minor'
    WHEN n.age < 65 THEN 'adult'
    ELSE 'senior'
END AS category

Simple CASE

Compares an expression against values:

RETURN CASE n.status
    WHEN 'A' THEN 'Active'
    WHEN 'I' THEN 'Inactive'
    WHEN 'P' THEN 'Pending'
    ELSE 'Unknown'
END AS status_name

Comprehensions

List Comprehension

Create lists by transforming or filtering:

// Transform each element
RETURN [x IN range(1, 5) | x * 2]
// → [2, 4, 6, 8, 10]

// Filter elements
RETURN [x IN range(1, 10) WHERE x % 2 = 0]
// → [2, 4, 6, 8, 10]

// Filter and transform
RETURN [x IN range(1, 10) WHERE x % 2 = 0 | x * x]
// → [4, 16, 36, 64, 100]

Pattern Comprehension

Extract data from pattern matches within an expression:

// Collect names of friends
MATCH (p:Person)
RETURN p.name, [(p)-[:KNOWS]->(friend) | friend.name] AS friends

// With filtering
RETURN [(p)-[:KNOWS]->(f:Person) WHERE f.age > 21 | f.name] AS adult_friends

Map Projection

Create maps by selecting properties from nodes:

// Select specific properties
MATCH (n:Person)
RETURN n {.name, .age}
// → {name: "Alice", age: 30}

// Include computed values
MATCH (n:Person)
RETURN n {.name, status: 'active', upperName: toUpper(n.name)}

Cypher Operators

Comparison Operators

| Operator | Description | Example |
|---|---|---|
| = | Equal | n.age = 30 |
| <> | Not equal | n.status <> 'deleted' |
| < | Less than | n.age < 18 |
| > | Greater than | n.age > 65 |
| <= | Less than or equal | n.score <= 100 |
| >= | Greater than or equal | n.score >= 0 |

Boolean Operators

| Operator | Description | Example |
|---|---|---|
| AND | Logical and | n.age > 18 AND n.active = true |
| OR | Logical or | n.role = 'admin' OR n.role = 'mod' |
| NOT | Logical not | NOT n.deleted |
| XOR | Exclusive or | a.flag XOR b.flag |

Null Operators

| Operator | Description | Example |
|---|---|---|
| IS NULL | Check for null | n.email IS NULL |
| IS NOT NULL | Check for non-null | n.email IS NOT NULL |

String Operators

| Operator | Description | Example |
|---|---|---|
| STARTS WITH | Prefix match | n.name STARTS WITH 'A' |
| ENDS WITH | Suffix match | n.email ENDS WITH '.com' |
| CONTAINS | Substring match | n.bio CONTAINS 'developer' |
| =~ | Regex match | n.email =~ '.*@gmail\\.com' |

List Operators

| Operator | Description | Example |
|---|---|---|
| IN | List membership | n.status IN ['active', 'pending'] |
| + | List concatenation | [1, 2] + [3, 4] → [1, 2, 3, 4] |
| [index] | Index access | list[0] (first element) |

Arithmetic Operators

| Operator | Description | Example |
|---|---|---|
| + | Addition | n.price + tax |
| - | Subtraction | n.total - discount |
| * | Multiplication | n.quantity * n.price |
| / | Division | n.total / n.count |
| % | Modulo | n.id % 10 |

String Concatenation

| Operator | Description | Example |
|---|---|---|
| + | Concatenate strings | n.first + ' ' + n.last |

Property Access

| Operator | Description | Example |
|---|---|---|
| . | Property access | n.name |

Operator Precedence

From highest to lowest:

  1. . [] - Property/index access
  2. * / % - Multiplication, division, modulo
  3. + - - Addition, subtraction
  4. = <> < > <= >= - Comparison
  5. IS NULL IS NOT NULL
  6. IN STARTS WITH ENDS WITH CONTAINS =~
  7. NOT
  8. AND
  9. XOR
  10. OR

Use parentheses to override precedence:

WHERE (n.age > 18 OR n.verified) AND n.active

Graph Algorithms

GraphQLite includes 15+ built-in graph algorithms.

Centrality Algorithms

PageRank

Measures node importance based on incoming links from important nodes.

RETURN pageRank()
RETURN pageRank(0.85, 20)  -- damping, iterations

Returns: [{"node_id": int, "user_id": string, "score": float}, ...]

Parameters:

  • damping (default: 0.85) - Probability of following a link
  • iterations (default: 20) - Number of iterations

Degree Centrality

Counts incoming and outgoing connections.

RETURN degreeCentrality()

Returns: [{"node_id": int, "user_id": string, "in_degree": int, "out_degree": int, "degree": int}, ...]

Betweenness Centrality

Measures how often a node lies on shortest paths between other nodes.

RETURN betweennessCentrality()

Returns: [{"node_id": int, "user_id": string, "score": float}, ...]

Closeness Centrality

Measures average distance to all other nodes.

RETURN closenessCentrality()

Returns: [{"node_id": int, "user_id": string, "score": float}, ...]

Eigenvector Centrality

Measures influence based on connections to high-scoring nodes.

RETURN eigenvectorCentrality()
RETURN eigenvectorCentrality(100)  -- max iterations

Returns: [{"node_id": int, "user_id": string, "score": float}, ...]

Community Detection

Label Propagation

Detects communities by propagating labels through the network.

RETURN labelPropagation()
RETURN labelPropagation(10)  -- max iterations
RETURN communities()         -- alias

Returns: [{"node_id": int, "user_id": string, "community": int}, ...]

Louvain

Hierarchical community detection optimizing modularity.

RETURN louvain()
RETURN louvain(1.0)  -- resolution parameter

Returns: [{"node_id": int, "user_id": string, "community": int}, ...]

Connected Components

Weakly Connected Components (WCC)

Groups nodes reachable by ignoring edge direction.

RETURN wcc()

Returns: [{"node_id": int, "user_id": string, "component": int}, ...]

Strongly Connected Components (SCC)

Groups nodes where every node can reach every other node following edge direction.

RETURN scc()

Returns: [{"node_id": int, "user_id": string, "component": int}, ...]

Path Finding

Dijkstra (Shortest Path)

Finds shortest path between two nodes.

RETURN dijkstra('source_id', 'target_id')

Returns: {"found": bool, "distance": int, "path": [node_ids]}

The found field indicates whether a path exists. When found is false, distance is null and path is empty.

A* Search

Shortest path with a heuristic. Can use geographic coordinates for distance estimation or fall back to a uniform heuristic.

RETURN astar('source_id', 'target_id')
RETURN astar('source_id', 'target_id', 'lat_prop', 'lon_prop')

When lat_prop and lon_prop are provided, A* uses haversine distance as the heuristic. Without these properties, it behaves similarly to Dijkstra but may explore fewer nodes.

Returns: {"found": bool, "distance": float, "path": [node_ids], "nodes_explored": int}

All-Pairs Shortest Paths (APSP)

Computes shortest distances between all node pairs.

RETURN apsp()

Returns: [{"source": string, "target": string, "distance": int}, ...]

Note: Produces O(n²) node pairs and requires O(n²) space; the Floyd-Warshall computation itself takes O(n³) time. Use with caution on large graphs.

Traversal

Breadth-First Search (BFS)

Explores nodes level by level from a starting point.

RETURN bfs('start_id')
RETURN bfs('start_id', 3)  -- max depth

Returns: [{"node_id": int, "user_id": string, "depth": int, "order": int}, ...]

The order field indicates the traversal order (0 = starting node, then incrementing).

Depth-First Search (DFS)

Explores as far as possible along each branch.

RETURN dfs('start_id')
RETURN dfs('start_id', 5)  -- max depth

Returns: [{"node_id": int, "user_id": string, "depth": int, "order": int}, ...]

Similarity

Node Similarity (Jaccard)

Computes Jaccard similarity between node neighborhoods.

RETURN nodeSimilarity()

Returns: [{"node1": int, "node2": int, "similarity": float}, ...]

K-Nearest Neighbors (KNN)

Finds k most similar nodes to a given node based on Jaccard similarity of neighborhoods.

RETURN knn('node_id', 10)  -- node, k

Returns: [{"neighbor": string, "similarity": float, "rank": int}, ...]

Results are ordered by similarity (highest first), with rank starting at 1.

Triangle Count

Counts triangles and computes clustering coefficient.

RETURN triangleCount()

Returns: [{"node_id": int, "user_id": string, "triangles": int, "clustering_coefficient": float}, ...]

Using Results in SQL

Extract algorithm results using SQLite JSON functions:

SELECT
    json_extract(value, '$.node_id') as id,
    json_extract(value, '$.score') as score
FROM json_each(cypher('RETURN pageRank()'))
ORDER BY score DESC
LIMIT 10;

Python API Reference

Installation

pip install graphqlite

Module Functions

graphqlite.connect()

Create a connection to a SQLite database with GraphQLite loaded.

from graphqlite import connect

conn = connect(":memory:")
conn = connect("graph.db")
conn = connect("graph.db", extension_path="/path/to/graphqlite.dylib")

Parameters:

  • database (str) - Database path or :memory:
  • extension_path (str, optional) - Path to extension file

Returns: Connection

graphqlite.load()

Load GraphQLite into an existing sqlite3 connection.

import sqlite3
import graphqlite

conn = sqlite3.connect(":memory:")
graphqlite.load(conn)

Parameters:

  • conn - sqlite3.Connection or apsw.Connection
  • entry_point (str, optional) - Extension entry point

graphqlite.loadable_path()

Get the path to the loadable extension.

path = graphqlite.loadable_path()

Returns: str

graphqlite.wrap()

Wrap an existing sqlite3 connection with GraphQLite support.

import sqlite3
import graphqlite

conn = sqlite3.connect(":memory:")
wrapped = graphqlite.wrap(conn)
results = wrapped.cypher("RETURN 1 AS x")

Parameters:

  • conn - sqlite3.Connection object
  • extension_path (str, optional) - Path to extension file

Returns: Connection

graphqlite.graph()

Factory function to create a Graph instance.

from graphqlite import graph

g = graph(":memory:")
g = graph("graph.db", namespace="myapp")

Parameters:

  • db_path (str) - Database path or :memory:
  • namespace (str, optional) - Graph namespace (default: "default")
  • extension_path (str, optional) - Path to extension file

Returns: Graph

CypherResult Class

Result container returned by cypher() queries.

results = conn.cypher("MATCH (n:Person) RETURN n.name, n.age")

# Length
print(len(results))  # Number of rows

# Indexing
first_row = results[0]  # Get first row as dict

# Iteration
for row in results:
    print(row["n.name"])

# Column names
print(results.columns)  # ["n.name", "n.age"]

# Convert to list
all_rows = results.to_list()  # List of dicts

Properties:

  • columns - List of column names

Methods:

  • to_list() - Return all rows as a list of dictionaries

Connection Class

Connection.cypher()

Execute a Cypher query with optional parameters.

conn.cypher("CREATE (n:Person {name: 'Alice'})")
results = conn.cypher("MATCH (n) RETURN n.name")
for row in results:
    print(row["n.name"])

# With parameters
results = conn.cypher(
    "MATCH (n:Person {name: $name}) RETURN n",
    {"name": "Alice"}
)

The query parameter is the Cypher query string. The optional params parameter accepts a dictionary that will be converted to JSON for parameter binding.

Returns: CypherResult object (iterable, supports indexing and len())

Connection.execute()

Execute raw SQL.

conn.execute("SELECT * FROM nodes")

Graph Class

High-level API for graph operations.

Constructor

from graphqlite import Graph

g = Graph(":memory:")
g = Graph("graph.db")

Node Operations

upsert_node()

Create or update a node.

g.upsert_node("alice", {"name": "Alice", "age": 30}, label="Person")

Parameters:

  • node_id (str) - Unique node identifier
  • properties (dict) - Node properties
  • label (str, optional) - Node label

get_node()

Get a node by ID.

node = g.get_node("alice")
# {"id": "alice", "label": "Person", "properties": {"name": "Alice", "age": 30}}

Returns: dict or None

has_node()

Check if a node exists.

exists = g.has_node("alice")  # True

Returns: bool

delete_node()

Delete a node.

g.delete_node("alice")

get_all_nodes()

Get all nodes, optionally filtered by label.

all_nodes = g.get_all_nodes()
people = g.get_all_nodes(label="Person")

Returns: List of dicts

Edge Operations

upsert_edge()

Create or update an edge.

g.upsert_edge("alice", "bob", {"since": 2020}, rel_type="KNOWS")

Parameters:

  • source_id (str) - Source node ID
  • target_id (str) - Target node ID
  • properties (dict) - Edge properties
  • rel_type (str, optional) - Relationship type

get_edge()

Get an edge between two nodes.

edge = g.get_edge("alice", "bob")

Returns the first edge found between the source and target nodes, or None if no edge exists.

has_edge()

Check if an edge exists.

exists = g.has_edge("alice", "bob")

Returns: bool

delete_edge()

Delete an edge between two nodes.

g.delete_edge("alice", "bob")

get_all_edges()

Get all edges.

edges = g.get_all_edges()

Returns: List of dicts

Graph Operations

get_neighbors()

Get a node's neighbors (connected by edges in either direction).

neighbors = g.get_neighbors("alice")

Parameters:

  • node_id (str) - Node ID

Returns: List of neighbor node dicts

node_degree()

Get a node's degree, which is the total number of edges connected to the node (both incoming and outgoing).

degree = g.node_degree("alice")  # 5

Returns an integer count of connected edges.

stats()

Get graph statistics.

stats = g.stats()
# {"nodes": 100, "edges": 250}

Returns: dict

Query Methods

query()

Execute a Cypher query and return results as a list of dictionaries.

results = g.query("MATCH (n:Person) RETURN n.name")
for row in results:
    print(row["n.name"])

This method is for queries that don't require parameters. For parameterized queries, access the underlying connection:

results = g.connection.cypher(
    "MATCH (n:Person {name: $name}) RETURN n",
    {"name": "Alice"}
)

Algorithm Methods

Centrality Algorithms

pagerank()

Compute PageRank scores for all nodes.

results = g.pagerank(damping=0.85, iterations=20)
# [{"node_id": "alice", "score": 0.25}, ...]

Parameters:

  • damping (float, default: 0.85) - Damping factor
  • iterations (int, default: 20) - Number of iterations

degree_centrality()

Compute in-degree, out-degree, and total degree for all nodes.

results = g.degree_centrality()
# [{"node_id": "alice", "in_degree": 2, "out_degree": 3, "degree": 5}, ...]

betweenness_centrality()

Compute betweenness centrality (how often a node lies on shortest paths).

results = g.betweenness_centrality()
# Alias: g.betweenness()

Returns: List of {"node_id": str, "score": float}

closeness_centrality()

Compute closeness centrality (average distance to all other nodes).

results = g.closeness_centrality()
# Alias: g.closeness()

Returns: List of {"node_id": str, "score": float}

eigenvector_centrality()

Compute eigenvector centrality (influence based on connections to high-scoring nodes).

results = g.eigenvector_centrality(iterations=100)

Parameters:

  • iterations (int, default: 100) - Maximum iterations

Community Detection

community_detection()

Detect communities using label propagation.

results = g.community_detection(iterations=10)
# [{"node_id": "alice", "community": 1}, ...]

Parameters:

  • iterations (int, default: 10) - Maximum iterations

louvain()

Detect communities using the Louvain algorithm (modularity optimization).

results = g.louvain(resolution=1.0)

Parameters:

  • resolution (float, default: 1.0) - Higher values produce more communities

leiden_communities()

Detect communities using the Leiden algorithm.

results = g.leiden_communities(resolution=1.0, random_seed=42)

Parameters:

  • resolution (float, default: 1.0) - Resolution parameter
  • random_seed (int, optional) - Random seed for reproducibility

Requires: graspologic>=3.0 (pip install graspologic)

Connected Components

weakly_connected_components()

Find weakly connected components (ignoring edge direction).

results = g.weakly_connected_components()
# Aliases: g.connected_components(), g.wcc()

Returns: List of {"node_id": str, "component": int}

strongly_connected_components()

Find strongly connected components (respecting edge direction).

results = g.strongly_connected_components()
# Alias: g.scc()

Returns: List of {"node_id": str, "component": int}

Path Finding

shortest_path()

Find the shortest path between two nodes using Dijkstra's algorithm.

path = g.shortest_path("alice", "bob", weight_property="distance")
# {"distance": 2, "path": ["alice", "carol", "bob"], "found": True}
# Alias: g.dijkstra()

Parameters:

  • source_id (str) - Starting node ID
  • target_id (str) - Ending node ID
  • weight_property (str, optional) - Edge property to use as weight

Returns: {"path": list, "distance": float|None, "found": bool}

astar()

Find the shortest path using A* algorithm with optional geographic heuristic.

path = g.astar("alice", "bob", lat_prop="latitude", lon_prop="longitude")
# Alias: g.a_star()

Parameters:

  • source_id (str) - Starting node ID
  • target_id (str) - Ending node ID
  • lat_prop (str, optional) - Latitude property name for heuristic
  • lon_prop (str, optional) - Longitude property name for heuristic

Returns: {"path": list, "distance": float|None, "found": bool, "nodes_explored": int}

all_pairs_shortest_path()

Compute shortest distances between all node pairs (Floyd-Warshall).

results = g.all_pairs_shortest_path()
# Alias: g.apsp()

Returns: List of {"source": str, "target": str, "distance": float}

Note: Floyd-Warshall takes O(n³) time and produces O(n²) result rows. Use with caution on large graphs.

Traversal

bfs()

Breadth-first search from a starting node.

results = g.bfs("alice", max_depth=3)
# Alias: g.breadth_first_search()

Parameters:

  • start_id (str) - Starting node ID
  • max_depth (int, default: -1) - Maximum depth (-1 for unlimited)

Returns: List of {"user_id": str, "depth": int, "order": int}

dfs()

Depth-first search from a starting node.

results = g.dfs("alice", max_depth=5)
# Alias: g.depth_first_search()

Parameters:

  • start_id (str) - Starting node ID
  • max_depth (int, default: -1) - Maximum depth (-1 for unlimited)

Returns: List of {"user_id": str, "depth": int, "order": int}

Similarity

node_similarity()

Compute Jaccard similarity between node neighborhoods.

# All pairs above threshold
results = g.node_similarity(threshold=0.5)

# Specific pair
results = g.node_similarity(node1_id="alice", node2_id="bob")

# Top-k most similar pairs
results = g.node_similarity(top_k=10)

Parameters:

  • node1_id (str, optional) - First node ID
  • node2_id (str, optional) - Second node ID
  • threshold (float, default: 0.0) - Minimum similarity threshold
  • top_k (int, default: 0) - Return only top-k pairs (0 for all)

Returns: List of {"node1": str, "node2": str, "similarity": float}

knn()

Find k-nearest neighbors for a node based on Jaccard similarity.

results = g.knn("alice", k=10)

Parameters:

  • node_id (str) - Node to find neighbors for
  • k (int, default: 10) - Number of neighbors to return

Returns: List of {"neighbor": str, "similarity": float, "rank": int}

triangle_count()

Count triangles and compute clustering coefficients.

results = g.triangle_count()
# Alias: g.triangles()

Returns: List of {"node_id": str, "triangles": int, "clustering_coefficient": float}

Export

to_rustworkx()

Export the graph to a rustworkx PyDiGraph for use with rustworkx algorithms.

graph, node_map = g.to_rustworkx()

Returns: Tuple of (rustworkx.PyDiGraph, dict mapping node IDs to indices)

Requires: rustworkx>=0.13 (pip install rustworkx)
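
A minimal sketch of working with the exported graph; only basic PyDiGraph accessors are used here, and the payload stored on each rustworkx node is not assumed:

graph, node_map = g.to_rustworkx()

print(graph.num_nodes(), graph.num_edges())

# node_map maps GraphQLite node IDs to rustworkx node indices
alice_index = node_map["alice"]
print(graph.out_degree(alice_index))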

Batch Operations

upsert_nodes_batch()

nodes = [
    ("alice", {"name": "Alice"}, "Person"),
    ("bob", {"name": "Bob"}, "Person"),
]
g.upsert_nodes_batch(nodes)

upsert_edges_batch()

edges = [
    ("alice", "bob", {"since": 2020}, "KNOWS"),
    ("bob", "carol", {"since": 2021}, "KNOWS"),
]
g.upsert_edges_batch(edges)

GraphManager Class

Manages multiple graph databases in a directory with cross-graph query support.

Constructor

from graphqlite import graphs, GraphManager

# Using factory function (recommended)
gm = graphs("./data")

# Or direct instantiation
gm = GraphManager("./data")
gm = GraphManager("./data", extension_path="/path/to/graphqlite.dylib")

Context Manager

with graphs("./data") as gm:
    # Work with graphs...
    pass  # All connections closed automatically

Graph Management

list()

List all graphs in the directory.

names = gm.list()  # ["products", "social", "users"]

Returns: List of graph names (sorted)

exists()

Check if a graph exists.

if gm.exists("social"):
    print("Graph exists")

Returns: bool

create()

Create a new graph.

g = gm.create("social")

Parameters:

  • name (str) - Graph name

Returns: Graph instance

Raises: FileExistsError if graph already exists

open()

Open an existing graph.

g = gm.open("social")

Parameters:

  • name (str) - Graph name

Returns: Graph instance

Raises: FileNotFoundError if graph doesn't exist

open_or_create()

Open a graph, creating it if it doesn't exist.

g = gm.open_or_create("cache")

Returns: Graph instance

drop()

Delete a graph and its database file.

gm.drop("old_graph")

Raises: FileNotFoundError if graph doesn't exist

Cross-Graph Queries

query()

Execute a Cypher query across multiple graphs.

result = gm.query(
    "MATCH (n:Person) FROM social RETURN n.name, graph(n) AS source",
    graphs=["social"]
)
for row in result:
    print(f"{row['n.name']} from {row['source']}")

Parameters:

  • cypher (str) - Cypher query with FROM clauses
  • graphs (list) - Graph names to attach
  • params (dict, optional) - Query parameters

Returns: CypherResult

query_sql()

Execute raw SQL across attached graphs.

result = gm.query_sql(
    "SELECT COUNT(*) FROM social.nodes",
    graphs=["social"]
)

Parameters:

  • sql (str) - SQL query with graph-prefixed table names
  • graphs (list) - Graph names to attach
  • parameters (tuple, optional) - Query parameters

Returns: List of tuples

Collection Interface

# Length
len(gm)  # Number of graphs

# Membership
"social" in gm  # True/False

# Iteration
for name in gm:
    print(name)
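
Putting the pieces together, a small end-to-end sketch using only the methods documented above (graph names and data are illustrative):

from graphqlite import graphs

with graphs("./data") as gm:
    social = gm.open_or_create("social")
    social.upsert_node("alice", {"name": "Alice"}, label="Person")

    result = gm.query(
        "MATCH (n:Person) FROM social RETURN n.name AS name, graph(n) AS source",
        graphs=["social"],
    )
    for row in result:
        print(row["name"], "from", row["source"])

    if "scratch" in gm:
        gm.drop("scratch")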

Utility Functions

escape_string()

Escape a string for use in Cypher.

from graphqlite import escape_string

safe = escape_string("It's a test")

sanitize_rel_type()

Sanitize a relationship type name.

from graphqlite import sanitize_rel_type

safe = sanitize_rel_type("has-friend")  # "HAS_FRIEND"

CYPHER_RESERVED

A set of reserved Cypher keywords that need special handling in queries.

from graphqlite import CYPHER_RESERVED

if my_label.upper() in CYPHER_RESERVED:
    my_label = f"`{my_label}`"  # Quote reserved words

Contains keywords like: MATCH, CREATE, RETURN, WHERE, AND, OR, NOT, IN, AS, WITH, ORDER, BY, LIMIT, SKIP, DELETE, SET, REMOVE, MERGE, ON, CASE, WHEN, THEN, ELSE, END, TRUE, FALSE, NULL, etc.

Rust API Reference

Installation

Add to your Cargo.toml:

[dependencies]
graphqlite = "0.2"

Connection

Opening a Connection

#![allow(unused)]
fn main() {
use graphqlite::Connection;

// In-memory database
let conn = Connection::open_in_memory()?;

// File-based database
let conn = Connection::open("graph.db")?;

// With custom extension path
let conn = Connection::open_with_extension("graph.db", "/path/to/graphqlite.so")?;
}

Executing Cypher Queries

#![allow(unused)]
fn main() {
// Execute without results
conn.cypher("CREATE (n:Person {name: 'Alice'})")?;

// Execute with results
let rows = conn.cypher("MATCH (n:Person) RETURN n.name")?;
for row in rows {
    let name: String = row.get(0)?;
    println!("{}", name);
}
}

Parameterized Queries

For parameterized queries, embed parameters in the query string:

#![allow(unused)]
fn main() {
use serde_json::json;

let params = json!({"name": "Alice", "age": 30});
let query = format!(
    "CREATE (n:Person {{name: '{}', age: {}}})",
    params["name"].as_str().unwrap(),
    params["age"]
);
conn.cypher(&query)?;
}

Note: Direct parameter binding is planned for a future release.

Row Access

Access row values by column name using the get() method:

#![allow(unused)]
fn main() {
let results = conn.cypher("MATCH (n) RETURN n.name AS name, n.age AS age")?;
for row in &results {
    let name: String = row.get("name")?;
    let age: i32 = row.get("age")?;
    println!("{} is {} years old", name, age);
}
}

The column name must match the alias in your RETURN clause. Use AS to create readable column names.

Type Conversions

GraphQLite automatically converts between Cypher and Rust types:

| Cypher Type | Rust Type |
|---|---|
| Integer | i32, i64 |
| Float | f64 |
| String | String, &str |
| Boolean | bool |
| Null | Option<T> |
| List | Vec<T> |
| Map | serde_json::Value |

Error Handling

#![allow(unused)]
fn main() {
use graphqlite::{Connection, Error};

fn example() -> Result<(), Error> {
    let conn = Connection::open_in_memory()?;

    match conn.cypher("INVALID QUERY") {
        Ok(rows) => { /* process rows */ }
        Err(Error::Cypher(msg)) => {
            eprintln!("Cypher query error: {}", msg);
        }
        Err(Error::Sqlite(e)) => {
            eprintln!("SQLite error: {}", e);
        }
        Err(e) => {
            eprintln!("Other error: {}", e);
        }
    }

    Ok(())
}
}

Error Variants

The Error enum includes the following variants:

#![allow(unused)]
fn main() {
pub enum Error {
    Sqlite(rusqlite::Error),           // SQLite database errors
    Json(serde_json::Error),           // JSON parsing errors
    Cypher(String),                    // Cypher query errors
    ExtensionNotFound(String),         // Extension file not found
    TypeError { expected: &'static str, actual: String }, // Type conversion errors
    ColumnNotFound(String),            // Column doesn't exist in result
    GraphExists(String),               // Graph already exists (GraphManager)
    GraphNotFound { name: String, available: Vec<String> }, // Graph not found
    Io(std::io::Error),                // File I/O errors
}
}

Complete Example

use graphqlite::Connection;

fn main() -> Result<(), graphqlite::Error> {
    // Open connection
    let conn = Connection::open_in_memory()?;

    // Create nodes
    conn.cypher("CREATE (a:Person {name: 'Alice', age: 30})")?;
    conn.cypher("CREATE (b:Person {name: 'Bob', age: 25})")?;

    // Create relationship
    conn.cypher("
        MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
        CREATE (a)-[:KNOWS {since: 2020}]->(b)
    ")?;

    // Query with aliases
    let results = conn.cypher("
        MATCH (a:Person)-[:KNOWS]->(b:Person)
        RETURN a.name AS from_person, b.name AS to_person
    ")?;

    for row in &results {
        let from: String = row.get("from_person")?;
        let to: String = row.get("to_person")?;
        println!("{} knows {}", from, to);
    }

    // Query with filter (embedding values directly)
    let min_age = 26;
    let results = conn.cypher(&format!(
        "MATCH (n:Person) WHERE n.age >= {} RETURN n.name AS name",
        min_age
    ))?;

    for row in &results {
        let name: String = row.get("name")?;
        println!("Adult: {}", name);
    }

    Ok(())
}

Graph Class

High-level API for graph operations, providing ergonomic methods for nodes, edges, and algorithms.

Creating a Graph

#![allow(unused)]
fn main() {
use graphqlite::Graph;

// In-memory graph
let g = Graph::open_in_memory()?;

// File-based graph
let g = Graph::open("graph.db")?;

// With custom extension path
let g = Graph::open_with_extension("graph.db", "/path/to/graphqlite.so")?;

// From existing connection
let g = Graph::from_connection(conn)?;
}

Node Operations

#![allow(unused)]
fn main() {
// Create or update a node
g.upsert_node("alice", [("name", "Alice"), ("age", "30")], "Person")?;

// Check if node exists
if g.has_node("alice")? {
    println!("Alice exists");
}

// Get a node
if let Some(node) = g.get_node("alice")? {
    println!("Found: {:?}", node);
}

// Get all nodes (optionally filtered by label)
let all_nodes = g.get_all_nodes(None)?;
let people = g.get_all_nodes(Some("Person"))?;

// Delete a node (also deletes connected edges)
g.delete_node("alice")?;
}

Edge Operations

#![allow(unused)]
fn main() {
// Create or update an edge
g.upsert_edge("alice", "bob", [("since", "2020")], "KNOWS")?;

// Check if edge exists
if g.has_edge("alice", "bob")? {
    println!("Edge exists");
}

// Get an edge
if let Some(edge) = g.get_edge("alice", "bob")? {
    println!("Edge: {:?}", edge);
}

// Get all edges
let edges = g.get_all_edges()?;

// Delete an edge
g.delete_edge("alice", "bob")?;
}

Query Operations

#![allow(unused)]
fn main() {
// Execute Cypher query
let results = g.query("MATCH (n:Person) RETURN n.name")?;

// Get graph statistics
let stats = g.stats()?;
println!("Nodes: {}, Edges: {}", stats.nodes, stats.edges);

// Get node degree (connection count)
let degree = g.node_degree("alice")?;

// Get neighbors
let neighbors = g.get_neighbors("alice")?;
}

Batch Operations

#![allow(unused)]
fn main() {
// Batch insert nodes
let nodes = vec![
    ("alice", vec![("name", "Alice")], "Person"),
    ("bob", vec![("name", "Bob")], "Person"),
];
g.upsert_nodes_batch(nodes)?;

// Batch insert edges
let edges = vec![
    ("alice", "bob", vec![("since", "2020")], "KNOWS"),
    ("bob", "carol", vec![("since", "2021")], "KNOWS"),
];
g.upsert_edges_batch(edges)?;
}

Algorithm Methods

Centrality

#![allow(unused)]
fn main() {
// PageRank
let results = g.pagerank(0.85, 20)?;  // damping, iterations
for r in results {
    println!("{}: {}", r.user_id.unwrap_or_default(), r.score);
}

// Degree centrality
let results = g.degree_centrality()?;
for r in results {
    println!("{}: in={}, out={}, total={}",
        r.user_id.unwrap_or_default(), r.in_degree, r.out_degree, r.degree);
}

// Betweenness centrality
let results = g.betweenness_centrality()?;

// Closeness centrality
let results = g.closeness_centrality()?;

// Eigenvector centrality
let results = g.eigenvector_centrality(100)?;  // iterations
}

Community Detection

#![allow(unused)]
fn main() {
// Label propagation
let results = g.community_detection(10)?;  // iterations
for r in results {
    println!("{} is in community {}", r.user_id.unwrap_or_default(), r.community);
}

// Louvain algorithm
let results = g.louvain(1.0)?;  // resolution
}

Connected Components

#![allow(unused)]
fn main() {
// Weakly connected components
let results = g.wcc()?;

// Strongly connected components
let results = g.scc()?;
}

Path Finding

#![allow(unused)]
fn main() {
// Shortest path (Dijkstra)
let result = g.shortest_path("alice", "bob", None)?;  // optional weight property
if result.found {
    println!("Path: {:?}, Distance: {:?}", result.path, result.distance);
}

// A* search (with optional lat/lon heuristic)
let result = g.astar("alice", "bob", None, None)?;
println!("Explored {} nodes", result.nodes_explored);

// All-pairs shortest paths
let results = g.apsp()?;
for r in results {
    println!("{} -> {}: {}", r.source, r.target, r.distance);
}
}

Traversal

#![allow(unused)]
fn main() {
// Breadth-first search
let results = g.bfs("alice", Some(3))?;  // optional max depth
for r in results {
    println!("{} at depth {} (order {})", r.user_id, r.depth, r.order);
}

// Depth-first search
let results = g.dfs("alice", None)?;  // None = unlimited depth
}

Similarity

#![allow(unused)]
fn main() {
// Node similarity (Jaccard)
let results = g.node_similarity(None, None, 0.5, 10)?;  // node1, node2, threshold, top_k
for r in results {
    println!("{} <-> {}: {}", r.node1, r.node2, r.similarity);
}

// K-nearest neighbors
let results = g.knn("alice", 5)?;
for r in results {
    println!("#{}: {} (similarity: {})", r.rank, r.neighbor, r.similarity);
}

// Triangle count
let results = g.triangle_count()?;
for r in results {
    println!("{}: {} triangles, clustering={}",
        r.user_id.unwrap_or_default(), r.triangles, r.clustering_coefficient);
}
}

Algorithm Result Types

All algorithm methods return strongly-typed result structs:

#![allow(unused)]
fn main() {
// PageRank, Betweenness, Closeness, Eigenvector
pub struct PageRankResult {
    pub node_id: String,
    pub user_id: Option<String>,
    pub score: f64,
}

// Degree Centrality
pub struct DegreeCentralityResult {
    pub node_id: String,
    pub user_id: Option<String>,
    pub in_degree: i64,
    pub out_degree: i64,
    pub degree: i64,
}

// Community Detection, Louvain
pub struct CommunityResult {
    pub node_id: String,
    pub user_id: Option<String>,
    pub community: i64,
}

// WCC, SCC
pub struct ComponentResult {
    pub node_id: String,
    pub user_id: Option<String>,
    pub component: i64,
}

// Shortest Path
pub struct ShortestPathResult {
    pub path: Vec<String>,
    pub distance: Option<f64>,
    pub found: bool,
}

// A* Search
pub struct AStarResult {
    pub path: Vec<String>,
    pub distance: Option<f64>,
    pub found: bool,
    pub nodes_explored: i64,
}

// All-Pairs Shortest Path
pub struct ApspResult {
    pub source: String,
    pub target: String,
    pub distance: f64,
}

// BFS, DFS
pub struct TraversalResult {
    pub user_id: String,
    pub depth: i64,
    pub order: i64,
}

// Node Similarity
pub struct NodeSimilarityResult {
    pub node1: String,
    pub node2: String,
    pub similarity: f64,
}

// KNN
pub struct KnnResult {
    pub neighbor: String,
    pub similarity: f64,
    pub rank: i64,
}

// Triangle Count
pub struct TriangleCountResult {
    pub node_id: String,
    pub user_id: Option<String>,
    pub triangles: i64,
    pub clustering_coefficient: f64,
}
}

GraphManager

Manages multiple graph databases in a directory with cross-graph query support.

Creating a GraphManager

#![allow(unused)]
fn main() {
use graphqlite::{graphs, GraphManager};

// Using factory function (recommended)
let mut gm = graphs("./data")?;

// Or direct instantiation
let mut gm = GraphManager::open("./data")?;

// With custom extension path
let mut gm = GraphManager::open_with_extension("./data", "/path/to/graphqlite.so")?;
}

Graph Management

#![allow(unused)]
fn main() {
// Create a new graph
let social = gm.create("social")?;

// Open an existing graph
let social = gm.open_graph("social")?;

// Open or create
let cache = gm.open_or_create("cache")?;

// List all graphs
for name in gm.list()? {
    println!("Graph: {}", name);
}

// Check if graph exists
if gm.exists("social") {
    println!("Social graph exists");
}

// Delete a graph
gm.drop("old_graph")?;
}

Cross-Graph Queries

#![allow(unused)]
fn main() {
// Query across multiple graphs using FROM clause
let result = gm.query(
    "MATCH (n:Person) FROM social RETURN n.name, graph(n) AS source",
    &["social"]
)?;

for row in &result {
    let name: String = row.get("n.name")?;
    let source: String = row.get("source")?;
    println!("{} from {}", name, source);
}
}

Raw SQL Cross-Graph Queries

#![allow(unused)]
fn main() {
let results = gm.query_sql(
    "SELECT COUNT(*) FROM social.nodes",
    &["social"]
)?;
}

Complete Multi-Graph Example

use graphqlite::graphs;

fn main() -> graphqlite::Result<()> {
    let mut gm = graphs("./data")?;

    // Create and populate graphs
    {
        let social = gm.create("social")?;
        social.query("CREATE (n:Person {name: 'Alice', user_id: 'u1'})")?;
        social.query("CREATE (n:Person {name: 'Bob', user_id: 'u2'})")?;
    }

    {
        let products = gm.create("products")?;
        products.query("CREATE (n:Product {name: 'Phone', sku: 'p1'})")?;
    }

    // List graphs
    println!("Graphs: {:?}", gm.list()?);  // ["products", "social"]

    // Cross-graph query
    let result = gm.query(
        "MATCH (n:Person) FROM social RETURN n.name ORDER BY n.name",
        &["social"]
    )?;

    for row in &result {
        println!("Person: {}", row.get::<String>("n.name")?);
    }

    // Clean up
    gm.drop("products")?;
    gm.drop("social")?;

    Ok(())
}

Error Handling

#![allow(unused)]
fn main() {
use graphqlite::{graphs, Error};

let mut gm = graphs("./data")?;

match gm.open_graph("nonexistent") {
    Ok(g) => { /* use graph */ }
    Err(Error::GraphNotFound { name, available }) => {
        println!("Graph '{}' not found. Available: {:?}", name, available);
    }
    Err(e) => { /* handle other errors */ }
}

match gm.create("existing") {
    Ok(g) => { /* use graph */ }
    Err(Error::GraphExists(name)) => {
        println!("Graph '{}' already exists", name);
    }
    Err(e) => { /* handle other errors */ }
}
}

Extension Loading

For advanced use cases, wrap an existing rusqlite connection:

#![allow(unused)]
fn main() {
use rusqlite::Connection as SqliteConnection;
use graphqlite::Connection;

let sqlite_conn = SqliteConnection::open_in_memory()?;
let conn = Connection::from_rusqlite(sqlite_conn)?;
}

Or specify a custom extension path:

#![allow(unused)]
fn main() {
let conn = Connection::open_with_extension("graph.db", "/path/to/graphqlite.so")?;
}

SQL Interface

GraphQLite works as a standard SQLite extension, providing the cypher() function.

Loading the Extension

SQLite CLI

sqlite3 graph.db
.load /path/to/graphqlite

Or with automatic extension loading:

sqlite3 -cmd ".load /path/to/graphqlite" graph.db

Programmatically

SELECT load_extension('/path/to/graphqlite');

The cypher() Function

Basic Usage

SELECT cypher('MATCH (n) RETURN n.name');

With Parameters

SELECT cypher(
    'MATCH (n:Person {name: $name}) RETURN n',
    '{"name": "Alice"}'
);

Return Format

The cypher() function returns results as JSON:

SELECT cypher('MATCH (n:Person) RETURN n.name, n.age');
-- Returns: [{"n.name": "Alice", "n.age": 30}, {"n.name": "Bob", "n.age": 25}]

Working with Results

Extract Values with JSON Functions

SELECT json_extract(value, '$.n.name') AS name
FROM json_each(cypher('MATCH (n:Person) RETURN n'));

Algorithm Results

SELECT
    json_extract(value, '$.node_id') AS id,
    json_extract(value, '$.score') AS score
FROM json_each(cypher('RETURN pageRank()'))
ORDER BY score DESC
LIMIT 10;

Join with Regular Tables

-- Assuming you have a regular 'users' table
SELECT u.email, json_extract(g.value, '$.degree')
FROM users u
JOIN json_each(cypher('RETURN degreeCentrality()')) g
    ON u.id = json_extract(g.value, '$.user_id');

Write Operations

-- Create nodes
SELECT cypher('CREATE (n:Person {name: "Alice", age: 30})');

-- Create relationships
SELECT cypher('
    MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
    CREATE (a)-[:KNOWS]->(b)
');

-- Update properties
SELECT cypher('
    MATCH (n:Person {name: "Alice"})
    SET n.age = 31
');

-- Delete
SELECT cypher('
    MATCH (n:Person {name: "Alice"})
    DETACH DELETE n
');

Schema Tables

GraphQLite creates these tables automatically. See Storage Model for detailed documentation.

Core Tables

SELECT * FROM nodes;
-- id (auto-increment primary key)

SELECT * FROM node_labels;
-- node_id, label

SELECT * FROM edges;
-- id, source_id, target_id, type

SELECT * FROM property_keys;
-- id, key (normalized property names)

Property Tables

Properties use key_id as a foreign key to property_keys for normalization:

SELECT * FROM node_props_text;   -- node_id, key_id, value
SELECT * FROM node_props_int;    -- node_id, key_id, value
SELECT * FROM node_props_real;   -- node_id, key_id, value
SELECT * FROM node_props_bool;   -- node_id, key_id, value

SELECT * FROM edge_props_text;   -- edge_id, key_id, value
SELECT * FROM edge_props_int;    -- edge_id, key_id, value
SELECT * FROM edge_props_real;   -- edge_id, key_id, value
SELECT * FROM edge_props_bool;   -- edge_id, key_id, value

Direct SQL Access

You can query the underlying tables directly for debugging or advanced use cases:

-- Count nodes by label
SELECT label, COUNT(*) FROM node_labels GROUP BY label;

-- Find nodes with a specific property (join through property_keys)
SELECT n.id, pk.key, p.value
FROM nodes n
JOIN node_props_text p ON n.id = p.node_id
JOIN property_keys pk ON p.key_id = pk.id
WHERE pk.key = 'name';

-- Find all properties for a specific node
SELECT pk.key, p.value
FROM node_props_text p
JOIN property_keys pk ON p.key_id = pk.id
WHERE p.node_id = 1;

-- Find edges with their endpoint info
SELECT e.id, e.type, e.source_id, e.target_id
FROM edges e
WHERE e.type = 'KNOWS';

Transaction Support

GraphQLite respects SQLite transactions:

BEGIN;
SELECT cypher('CREATE (a:Person {name: "Alice"})');
SELECT cypher('CREATE (b:Person {name: "Bob"})');
SELECT cypher('MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"}) CREATE (a)-[:KNOWS]->(b)');
COMMIT;

Or rollback on error:

BEGIN;
SELECT cypher('CREATE (n:Person {name: "Test"})');
ROLLBACK;  -- Node is not created

Architecture

This document explains how GraphQLite is structured and how queries flow through the system.

High-Level Overview

┌─────────────────────────────────────────────────────────────┐
│                     SQLite Extension                        │
├─────────────────────────────────────────────────────────────┤
│  cypher() function                                          │
│      │                                                      │
│      ▼                                                      │
│  ┌─────────┐    ┌───────────┐    ┌──────────┐              │
│  │ Parser  │───▶│ Transform │───▶│ Executor │              │
│  └─────────┘    └───────────┘    └──────────┘              │
│      │              │                 │                     │
│      ▼              ▼                 ▼                     │
│  Cypher AST     SQL Query        Results                    │
└─────────────────────────────────────────────────────────────┘

Components

Parser

The parser converts Cypher query text into an Abstract Syntax Tree (AST).

Implementation: Flex (lexer) + Bison (parser)

  • src/backend/parser/cypher_scanner.l - Tokenizer
  • src/backend/parser/cypher_gram.y - Grammar
  • src/backend/parser/cypher_ast.c - AST construction

Transformer

The transformer converts the Cypher AST into SQL that can be executed against the graph schema.

Key files:

  • src/backend/transform/cypher_transform.c - Main entry point
  • src/backend/transform/transform_match.c - MATCH clause handling
  • src/backend/transform/transform_return.c - RETURN clause handling
  • src/backend/transform/sql_builder.c - SQL construction utilities

Executor

The executor runs the generated SQL and handles special cases like graph algorithms.

Key files:

  • src/backend/executor/cypher_executor.c - Main entry point
  • src/backend/executor/query_dispatch.c - Pattern-based routing
  • src/backend/executor/graph_algorithms.c - Algorithm implementations

Query Flow

1. Entry Point

The cypher() SQL function receives the query:

// In extension.c
static void graphqlite_cypher_func(sqlite3_context *context, int argc, sqlite3_value **argv) {
    const char *query = (const char *)sqlite3_value_text(argv[0]);
    // ...
}

2. Parsing

The query is tokenized and parsed:

cypher_parse_result *parse_result = parse_cypher_query_ext(query);
ast_node *ast = parse_result->root;

3. Pattern Dispatch

Instead of a giant if-else chain, queries are matched against patterns:

clause_flags flags = analyze_query_clauses(ast);
const query_pattern *pattern = find_matching_pattern(flags);
return pattern->handler(executor, ast, result, flags);

4. Transformation

The AST is converted to SQL using the unified SQL builder:

cypher_transform_context *ctx = create_transform_context(db);
transform_query(ctx, ast);
char *sql = sql_builder_to_string(ctx->unified_builder);

5. Execution

The SQL is executed against SQLite:

sqlite3_stmt *stmt;
sqlite3_prepare_v2(db, sql, -1, &stmt, NULL);
while (sqlite3_step(stmt) == SQLITE_ROW) {
    // Process results
}

Design Decisions

Why SQLite?

  • Zero configuration - single file, no server
  • Ubiquitous - available everywhere
  • Well-tested - decades of production use
  • Extensible - clean extension API

Why Transform to SQL?

Rather than implementing our own storage engine, we transform Cypher to SQL:

  • Leverage SQLite's query optimizer
  • Benefit from SQLite's transaction handling
  • Interop with regular SQL tables
  • Simpler implementation

Why Pattern Dispatch?

Replacing if-else chains with table-driven dispatch:

  • Easier to add new query patterns
  • Clear priority ordering
  • Better testability
  • Reduced cyclomatic complexity

Extension Loading

When the extension loads:

  1. Register the cypher() function
  2. Create schema tables if they don't exist
  3. Create indexes for efficient lookups

int sqlite3_graphqlite_init(
    sqlite3 *db,
    char **pzErrMsg,
    const sqlite3_api_routines *pApi
) {
    SQLITE_EXTENSION_INIT2(pApi);
    create_graph_schema(db);
    sqlite3_create_function(db, "cypher", -1, SQLITE_UTF8, 0,
                           graphqlite_cypher_func, 0, 0);
    return SQLITE_OK;
}

Storage Model

GraphQLite uses a typed property graph model stored in regular SQLite tables. The schema is designed for query efficiency using an Entity-Attribute-Value (EAV) pattern with property key normalization.

Schema Overview

┌─────────────────────────────────────┐
│              nodes                   │
│  id (PK, auto-increment)            │
├─────────────────────────────────────┤
│  1                                  │
│  2                                  │
│  3                                  │
└─────────────────────────────────────┘
           │
           │ 1:N
           ▼
┌─────────────────────────────────────┐
│           node_labels                │
│  node_id (FK) │ label               │
├───────────────┼─────────────────────┤
│  1            │ "Person"            │
│  2            │ "Person"            │
│  3            │ "Company"           │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│           property_keys              │
│  id (PK) │ key (UNIQUE)             │
├──────────┼──────────────────────────┤
│  1       │ "name"                   │
│  2       │ "age"                    │
│  3       │ "id"                     │
└─────────────────────────────────────┘
           │
           │ 1:N (via key_id)
           ▼
┌───────────────────────────────────────────┐
│            node_props_text                 │
│  node_id (FK) │ key_id (FK) │ value       │
├───────────────┼─────────────┼─────────────┤
│  1            │ 3           │ "alice"     │
│  1            │ 1           │ "Alice"     │
│  2            │ 3           │ "bob"       │
│  2            │ 1           │ "Bob"       │
└───────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│                         edges                            │
│  id (PK) │ source_id (FK) │ target_id (FK) │ type       │
├──────────┼────────────────┼────────────────┼────────────┤
│  1       │ 1              │ 2              │ "KNOWS"    │
│  2       │ 1              │ 3              │ "WORKS_AT" │
└─────────────────────────────────────────────────────────┘

Core Tables

nodes

The nodes table stores graph nodes with a simple auto-incrementing ID. Node metadata such as labels and properties are stored in separate tables, enabling nodes to have multiple labels and efficient property queries.

| Column | Type | Description |
|---|---|---|
| id | INTEGER PRIMARY KEY AUTOINCREMENT | Internal node identifier |

node_labels

Labels are stored in a separate table allowing nodes to have multiple labels. This normalized design enables efficient label-based filtering through indexed lookups.

| Column | Type | Description |
|---|---|---|
| node_id | INTEGER FK → nodes(id) | References the node |
| label | TEXT | Label name (e.g., "Person") |

The primary key is the composite (node_id, label), which prevents duplicate labels on the same node.

edges

The edges table stores relationships between nodes with a required relationship type.

| Column | Type | Description |
|---|---|---|
| id | INTEGER PRIMARY KEY AUTOINCREMENT | Internal edge identifier |
| source_id | INTEGER FK → nodes(id) | Source node |
| target_id | INTEGER FK → nodes(id) | Target node |
| type | TEXT NOT NULL | Relationship type (e.g., "KNOWS") |

Foreign keys use ON DELETE CASCADE so removing a node automatically removes its edges.

property_keys

Property names are normalized into a lookup table to reduce storage overhead and improve query performance. Instead of storing the property name string with every property value, we store a small integer key ID.

| Column | Type | Description |
|---|---|---|
| id | INTEGER PRIMARY KEY AUTOINCREMENT | Property key identifier |
| key | TEXT UNIQUE | Property name (e.g., "name", "age") |

Property Tables

Properties are stored in separate tables by type. This approach enables type-safe queries, efficient indexing by value, and proper numeric comparisons without type conversion overhead.

Node property tables:

  • node_props_text — String values
  • node_props_int — Integer values
  • node_props_real — Floating-point values
  • node_props_bool — Boolean values (stored as 0 or 1)

Edge property tables:

  • edge_props_text
  • edge_props_int
  • edge_props_real
  • edge_props_bool

Each property table has the same structure:

| Column | Type | Description |
|---|---|---|
| node_id / edge_id | INTEGER FK | References the owner entity |
| key_id | INTEGER FK → property_keys(id) | References the property name |
| value | (varies by table) | The property value |

The primary key is the composite (node_id, key_id) or (edge_id, key_id), ensuring each entity has at most one value per property.

Indexes

GraphQLite creates indexes optimized for common graph query patterns:

-- Edge traversal (covers both directions and type filtering)
CREATE INDEX idx_edges_source ON edges(source_id, type);
CREATE INDEX idx_edges_target ON edges(target_id, type);
CREATE INDEX idx_edges_type ON edges(type);

-- Label filtering
CREATE INDEX idx_node_labels_label ON node_labels(label, node_id);

-- Property key lookup
CREATE INDEX idx_property_keys_key ON property_keys(key);

-- Property value queries (enables efficient WHERE clauses)
CREATE INDEX idx_node_props_text_key_value ON node_props_text(key_id, value, node_id);
CREATE INDEX idx_node_props_int_key_value ON node_props_int(key_id, value, node_id);
-- ... similar for other property tables

The property indexes are designed "key-first" to efficiently satisfy queries like WHERE n.name = 'Alice', which translate to lookups by key_id and value.

Why This Design?

Typed property tables provide several advantages over storing all properties as JSON or a single TEXT column. Integer comparisons are performed natively rather than through string parsing. Type-specific indexes enable efficient range queries. Storage is more compact since values don't require type metadata.

Property key normalization through the property_keys table reduces storage by replacing repeated property name strings with integer IDs. This also enables efficient property-first queries and simplifies schema introspection.

Separate label table allows nodes to have multiple labels, which is a common requirement in graph modeling. The label index supports efficient label-based filtering without scanning all nodes.

Query Translation

When you write:

MATCH (p:Person {name: 'Alice'})
WHERE p.age > 25
RETURN p.name, p.age

GraphQLite translates this to SQL that joins the appropriate tables:

SELECT
    name_prop.value AS "p.name",
    age_prop.value AS "p.age"
FROM nodes p
JOIN node_labels p_label ON p.id = p_label.node_id AND p_label.label = 'Person'
LEFT JOIN node_props_text name_prop
    ON p.id = name_prop.node_id
    AND name_prop.key_id = (SELECT id FROM property_keys WHERE key = 'name')
LEFT JOIN node_props_int age_prop
    ON p.id = age_prop.node_id
    AND age_prop.key_id = (SELECT id FROM property_keys WHERE key = 'age')
WHERE name_prop.value = 'Alice'
    AND age_prop.value > 25

In practice, the query optimizer uses cached prepared statements for property key lookups, making this translation efficient.

Direct SQL Access

You can query the underlying tables directly for advanced use cases:

-- Count nodes by label
SELECT label, COUNT(*) FROM node_labels GROUP BY label;

-- Find all properties of a specific node
SELECT pk.key, 'text' as type, pt.value
FROM node_props_text pt
JOIN property_keys pk ON pt.key_id = pk.id
WHERE pt.node_id = 1
UNION ALL
SELECT pk.key, 'int' as type, CAST(pi.value AS TEXT)
FROM node_props_int pi
JOIN property_keys pk ON pi.key_id = pk.id
WHERE pi.node_id = 1;

-- Find nodes with a specific property value
SELECT nl.node_id, nl.label, pt.value as name
FROM node_props_text pt
JOIN property_keys pk ON pt.key_id = pk.id
JOIN node_labels nl ON pt.node_id = nl.node_id
WHERE pk.key = 'name' AND pt.value = 'Alice';

Query Pattern Dispatch System

GraphQLite uses a table-driven pattern dispatch system to execute Cypher queries. This document describes how the system works and how to extend it.

Overview

Instead of a massive if-else chain checking clause combinations, queries are matched against a registry of patterns. Each pattern defines:

  • Required clauses: Must all be present
  • Forbidden clauses: Must all be absent
  • Priority: Higher priority patterns are checked first
  • Handler: Function to execute the query

Supported Query Patterns

| Pattern | Required | Forbidden | Priority | Description |
|---------|----------|-----------|----------|-------------|
| UNWIND+CREATE | UNWIND, CREATE | RETURN, MATCH | 100 | Batch node/edge creation |
| WITH+MATCH+RETURN | WITH, MATCH, RETURN | - | 100 | Subquery pipeline |
| MATCH+CREATE+RETURN | MATCH, CREATE, RETURN | - | 100 | Match then create with results |
| MATCH+SET | MATCH, SET | - | 90 | Update matched nodes |
| MATCH+DELETE | MATCH, DELETE | - | 90 | Delete matched nodes |
| MATCH+REMOVE | MATCH, REMOVE | - | 90 | Remove properties/labels |
| MATCH+MERGE | MATCH, MERGE | - | 90 | Conditional create/match |
| MATCH+CREATE | MATCH, CREATE | RETURN | 90 | Match then create |
| OPTIONAL_MATCH+RETURN | MATCH, OPTIONAL, RETURN | CREATE, SET, DELETE, MERGE | 80 | Left join pattern |
| MULTI_MATCH+RETURN | MATCH, MULTI_MATCH, RETURN | CREATE, SET, DELETE, MERGE | 80 | Multiple match clauses |
| MATCH+RETURN | MATCH, RETURN | OPTIONAL, MULTI_MATCH, CREATE, SET, DELETE, MERGE | 70 | Simple query |
| UNWIND+RETURN | UNWIND, RETURN | CREATE | 60 | List processing |
| CREATE | CREATE | MATCH, UNWIND | 50 | Create nodes/edges |
| MERGE | MERGE | MATCH | 50 | Merge nodes/edges |
| SET | SET | MATCH | 50 | Standalone set |
| FOREACH | FOREACH | - | 50 | Iterate and update |
| MATCH | MATCH | RETURN, CREATE, SET, DELETE, MERGE, REMOVE | 40 | Match without return |
| RETURN | RETURN | MATCH, UNWIND, WITH | 10 | Expressions, graph algorithms |
| GENERIC | - | - | 0 | Fallback for any query |

How Pattern Matching Works

  1. Analyze: Extract clause flags from query AST
  2. Match: Find highest-priority pattern where:
    • All required flags are present
    • No forbidden flags are present
  3. Execute: Call the pattern's handler function

clause_flags flags = analyze_query_clauses(query);
const query_pattern *pattern = find_matching_pattern(flags);
return pattern->handler(executor, query, result, flags);
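
The matching rule itself is just bitmask tests plus a priority scan. The following Python sketch is not GraphQLite's implementation, but it mirrors the semantics described above, using a reduced set of clause flags and priorities taken from the pattern table:

from dataclasses import dataclass

# Illustrative clause flags; the real flags live in the C executor.
MATCH, RETURN, CREATE, SET = 1, 2, 4, 8

@dataclass
class Pattern:
    name: str
    required: int   # bitmask: all of these clauses must be present
    forbidden: int  # bitmask: none of these clauses may be present
    priority: int

PATTERNS = [
    Pattern("MATCH+SET", MATCH | SET, 0, 90),
    Pattern("MATCH+RETURN", MATCH | RETURN, CREATE | SET, 70),
    Pattern("GENERIC", 0, 0, 0),
]

def find_matching_pattern(flags):
    # Highest priority wins; GENERIC (priority 0) always matches.
    for p in sorted(PATTERNS, key=lambda p: p.priority, reverse=True):
        if (flags & p.required) == p.required and not (flags & p.forbidden):
            return p

print(find_matching_pattern(MATCH | RETURN).name)  # MATCH+RETURN
print(find_matching_pattern(CREATE).name)          # GENERIC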

Debugging

Debug Logging

When GraphQLite is built with GRAPHQLITE_DEBUG defined, the pattern matcher logs its decisions:

[CYPHER_DEBUG] Query clauses: MATCH|RETURN
[CYPHER_DEBUG] Matched pattern: MATCH+RETURN (priority 70)

EXPLAIN Command

Use EXPLAIN to see which pattern a query matches and the SQL it generates, without executing it:

SELECT cypher('EXPLAIN MATCH (n:Person) RETURN n.name');

Output:

Pattern: MATCH+RETURN
Clauses: MATCH|RETURN
SQL: SELECT ... FROM nodes ...

Adding New Patterns

Step 1: Define the Pattern

Add an entry to the patterns[] array in query_dispatch.c:

{
    .name = "MY_PATTERN",
    .required = CLAUSE_MATCH | CLAUSE_CUSTOM,
    .forbidden = CLAUSE_DELETE,
    .handler = handle_my_pattern,
    .priority = 85
}

Step 2: Implement the Handler

static int handle_my_pattern(cypher_executor *executor,
                             cypher_query *query,
                             cypher_result *result,
                             clause_flags flags)
{
    (void)flags;
    CYPHER_DEBUG("Executing MY_PATTERN via pattern dispatch");

    // Implementation here

    result->success = true;
    return 0;
}

Step 3: Add Tests

Add tests to test_query_dispatch.c:

static void test_pattern_my_pattern(void)
{
    const query_pattern *p = find_matching_pattern(CLAUSE_MATCH | CLAUSE_CUSTOM);
    CU_ASSERT_PTR_NOT_NULL(p);
    if (p) {
        CU_ASSERT_STRING_EQUAL(p->name, "MY_PATTERN");
        CU_ASSERT_EQUAL(p->priority, 85);
    }
}

Priority Guidelines

| Priority | Use Case |
|----------|----------|
| 100 | Most specific multi-clause combinations |
| 90 | MATCH + write operation patterns |
| 80 | Complex read patterns (OPTIONAL, multi-MATCH) |
| 70 | Simple read patterns |
| 50-60 | Standalone clauses with modifiers |
| 40-50 | Standalone write clauses |
| 10 | Expressions and algorithms |
| 0 | Generic fallback |

Files

  • src/include/executor/query_patterns.h - Types and API
  • src/backend/executor/query_dispatch.c - Pattern registry and handlers
  • tests/test_query_dispatch.c - Unit tests

Graph Algorithm Handling

Graph algorithms (PageRank, Dijkstra, etc.) are detected within the RETURN pattern handler. When a RETURN-only query contains a graph algorithm function call, it's executed via the C-based algorithm implementations for performance.
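
For example, a standalone RETURN that calls an algorithm function is routed to the native implementation. The sketch below assumes PageRank is exposed to Cypher under the name pagerank(); check the reference documentation for the exact function names and signatures:

from graphqlite import Graph

g = Graph(":memory:")

# Assumed Cypher spelling of the algorithm call; a RETURN-only query like
# this is dispatched to the C algorithm code rather than the SQL translator.
results = g.query("RETURN pagerank()")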

Performance

This document covers GraphQLite's performance characteristics and optimization strategies.

Benchmarks

Benchmarks were run on an Apple M1 Max (10 cores, 64 GB RAM).

Insertion Performance

| Nodes | Edges | Time | Rate |
|-------|-------|------|------|
| 100K | 500K | 445ms | 1.3M/s |
| 500K | 2.5M | 2.30s | 1.3M/s |
| 1M | 5.0M | 5.16s | 1.1M/s |

Traversal by Topology

| Topology | Nodes | Edges | 1-hop | 2-hop |
|----------|-------|-------|-------|-------|
| Chain | 100K | 99K | <1ms | <1ms |
| Sparse | 100K | 500K | <1ms | <1ms |
| Moderate | 100K | 2.0M | <1ms | 2ms |
| Dense | 100K | 5.0M | <1ms | 9ms |
| Normal dist. | 100K | 957K | <1ms | 1ms |
| Power-law | 100K | 242K | <1ms | <1ms |
| Moderate | 500K | 10.0M | 1ms | 2ms |
| Moderate | 1M | 20.0M | <1ms | 2ms |

Graph Algorithms

| Algorithm | Nodes | Edges | Time |
|-----------|-------|-------|------|
| PageRank | 100K | 500K | 148ms |
| Label Propagation | 100K | 500K | 154ms |
| PageRank | 500K | 2.5M | 953ms |
| Label Propagation | 500K | 2.5M | 811ms |
| PageRank | 1M | 5.0M | 37.81s |
| Label Propagation | 1M | 5.0M | 40.21s |

Cypher Query Performance

Each column shows per-query time on a graph written as G(nodes, edges).

| Query Type | G(100K, 500K) | G(500K, 2.5M) | G(1M, 5M) |
|------------|---------------|---------------|-----------|
| Node lookup | <1ms | 1ms | <1ms |
| 1-hop | <1ms | <1ms | <1ms |
| 2-hop | <1ms | <1ms | <1ms |
| 3-hop | 1ms | 1ms | 1ms |
| Filter scan | 341ms | 1.98s | 3.79s |
| MATCH all | 360ms | 2.05s | 3.98s |

Optimization Strategies

Use Indexes Effectively

GraphQLite creates indexes on:

  • nodes(user_id) - Fast node lookup by ID
  • node_labels(label) - Fast filtering by label
  • edges(source_id), edges(target_id) - Fast traversal
  • Property tables on (key_id, value) and the (node_id, key_id) primary key - Fast property filtering and access

Queries whose filters line up with these indexes avoid full scans; queries that cannot (see Filter scan in the benchmarks above) must examine every node.

Limit Variable-Length Paths

Variable-length paths can be expensive:

// Expensive: unlimited depth
MATCH (a)-[*]->(b) RETURN b

// Better: limit depth
MATCH (a)-[*1..3]->(b) RETURN b

Use Specific Labels

Labels help filter early:

// Slower: scan all nodes
MATCH (n) WHERE n.type = 'Person' RETURN n

// Faster: use a label
MATCH (n:Person) RETURN n

Batch Operations

For bulk inserts, use batch methods:

# Slow: individual inserts
for person in people:
    g.upsert_node(person["id"], person, label="Person")

# Fast: batch insert
nodes = [(p["id"], p, "Person") for p in people]
g.upsert_nodes_batch(nodes)

Algorithm Caching

Graph algorithms scan the entire graph. If your graph doesn't change frequently, cache results:

import functools

@functools.lru_cache(maxsize=1)
def get_pagerank():
    # Call get_pagerank.cache_clear() after the graph changes
    # so the next call recomputes the scores.
    return g.pagerank()

Memory Usage

GraphQLite uses SQLite's memory management. Key factors:

  • Page cache: SQLite caches database pages in memory
  • Algorithm scratch space: Algorithms allocate temporary structures
  • Result buffers: Query results are buffered before returning

For large graphs, consider:

# Increase SQLite page cache (default: 2MB); `conn` here is a
# sqlite3 connection to the graph database file.
conn.execute("PRAGMA cache_size = -64000")  # 64MB

Running Benchmarks

Run benchmarks on your hardware:

make performance

This runs:

  • Insertion benchmarks
  • Traversal benchmarks across topologies
  • Algorithm benchmarks
  • Query benchmarks