GraphQLite
GraphQLite is a SQLite extension that brings graph database capabilities to SQLite using the Cypher query language. Load it into any SQLite connection and immediately start creating nodes, traversing relationships, running graph algorithms, and expressing complex graph patterns — all without a separate database server, network configuration, or migration scripts. Your graph lives in a single .db file alongside the rest of your application data.
Quick Example
Python
from graphqlite import Graph
g = Graph(":memory:")
# Add nodes
g.upsert_node("alice", {"name": "Alice", "age": 30}, label="Person")
g.upsert_node("bob", {"name": "Bob", "age": 25}, label="Person")
g.upsert_node("carol", {"name": "Carol", "age": 35}, label="Person")
# Add relationships
g.upsert_edge("alice", "bob", {"since": 2020}, rel_type="KNOWS")
g.upsert_edge("alice", "carol", {"since": 2018}, rel_type="KNOWS")
# Query with Cypher (parameterized)
results = g.connection.cypher(
    "MATCH (a:Person {name: $name})-[:KNOWS]->(friend) RETURN friend.name, friend.age",
    {"name": "Alice"}
)
for row in results:
    print(f"{row['friend.name']} — age {row['friend.age']}")
# Bob — age 25
# Carol — age 35
# Run a graph algorithm
for r in sorted(g.pagerank(), key=lambda x: x["score"], reverse=True):
    print(f"{r['user_id']}: {r['score']:.4f}")
SQL (sqlite3 CLI)
.load build/graphqlite
SELECT cypher('CREATE (a:Person {name: "Alice", age: 30})');
SELECT cypher('CREATE (b:Person {name: "Bob", age: 25})');
SELECT cypher('
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CREATE (a)-[:KNOWS {since: 2020}]->(b)
');
-- Query
SELECT cypher('MATCH (a:Person)-[:KNOWS]->(b) RETURN a.name, b.name');
-- Run PageRank
SELECT
json_extract(value, '$.user_id') AS person,
printf('%.4f', json_extract(value, '$.score')) AS score
FROM json_each(cypher('RETURN pageRank(0.85, 20)'))
ORDER BY score DESC;
Feature Highlights
| Feature | Details |
|---|---|
| Cypher query language | MATCH, CREATE, MERGE, SET, DELETE, WITH, UNWIND, FOREACH, UNION, LOAD CSV, OPTIONAL MATCH, variable-length paths, pattern predicates, and more |
| 15+ graph algorithms | PageRank, degree/betweenness/closeness/eigenvector centrality, label propagation, Louvain, Dijkstra, A*, APSP, BFS, DFS, WCC, SCC, node similarity, KNN, triangle count |
| Three interfaces | Python (pip install graphqlite), Rust (graphqlite crate), and raw SQL via the cypher() function |
| Zero dependencies | Only requires SQLite — no server, no daemon, no Docker |
| Embedded operation | Graphs live in .db files; no network, no port, no configuration |
| Typed property storage | EAV model with separate tables for text, integer, real, boolean, and JSON properties |
| Parameterized queries | First-class support for $param substitution — safe by design |
| Transactions | Full SQLite transaction support; reads and writes are ACID |
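The typed EAV storage model can be pictured with a stdlib-only sketch: one properties table per value type, keyed by node and property name. The table names, the set_prop helper, and the routing logic below are illustrative only, not GraphQLite's actual schema.

```python
import sqlite3

# Illustrative EAV layout: one properties table per value type.
# Table/column names here are hypothetical, not GraphQLite's real schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE nodes (id TEXT PRIMARY KEY, label TEXT);
CREATE TABLE props_text (node_id TEXT, key TEXT, value TEXT,
                         PRIMARY KEY (node_id, key));
CREATE TABLE props_int  (node_id TEXT, key TEXT, value INTEGER,
                         PRIMARY KEY (node_id, key));
""")

def set_prop(node_id, key, value):
    # Route each property to the table matching its Python type.
    table = "props_int" if isinstance(value, int) else "props_text"
    con.execute(f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?)",
                (node_id, key, value))

con.execute("INSERT INTO nodes VALUES ('alice', 'Person')")
set_prop("alice", "name", "Alice")
set_prop("alice", "age", 30)

# Reassemble one node's properties across the typed tables.
rows = con.execute("""
    SELECT key, value FROM props_text WHERE node_id = 'alice'
    UNION ALL
    SELECT key, value FROM props_int WHERE node_id = 'alice'
    ORDER BY key
""").fetchall()
print(rows)  # [('age', 30), ('name', 'Alice')]
```

The payoff of per-type tables is that each value keeps its native SQLite affinity, so comparisons and indexes behave correctly without casting.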
How This Documentation Is Organized
This documentation follows the Diátaxis framework:
- Tutorials — Step-by-step lessons that build something real. Start here if you are new to GraphQLite.
- How-to Guides — Practical guides for specific tasks such as installation, multi-graph management, parameterized queries, and using GraphQLite alongside other SQLite extensions.
- Reference — Complete technical descriptions of supported Cypher syntax, built-in functions and operators, all 15+ graph algorithms, and the Python and Rust APIs.
- Explanation — Background reading on architecture, the storage model, query dispatch, and performance characteristics.
Version and License
Current version: 0.4.3 — MIT License
Source code and issue tracker: https://github.com/colliery-io/graphqlite
Getting Started with Python
This tutorial walks you through installing GraphQLite and building a small social network graph from scratch. By the end you will know how to create nodes and relationships, query the graph with Cypher, explore the graph using built-in API methods, run a graph algorithm, and persist your work to a file.
What You Will Learn
- Install GraphQLite for Python
- Create nodes and relationships using the Graph class
- Inspect graph statistics and explore connections
- Run Cypher queries with parameters
- Compute PageRank to find influential nodes
- Save the graph to a file and reopen it
Prerequisites
- Python 3.8 or later
- pip package manager
Step 1: Install GraphQLite
pip install graphqlite
Verify the installation:
import graphqlite
print(graphqlite.__version__) # 0.4.3
Step 2: Create an In-Memory Graph
from graphqlite import Graph
# ':memory:' creates a temporary in-memory database
g = Graph(":memory:")
print(g.stats())
# {'nodes': 0, 'edges': 0}
The Graph class is the high-level API. It manages the SQLite connection, loads the extension, and initialises the schema for you.
Step 3: Add Person Nodes
Add three people. Each node has a unique string ID, a dictionary of properties, and an optional label.
g.upsert_node("alice", {"name": "Alice", "age": 30, "city": "London"}, label="Person")
g.upsert_node("bob", {"name": "Bob", "age": 25, "city": "Paris"}, label="Person")
g.upsert_node("carol", {"name": "Carol", "age": 35, "city": "London"}, label="Person")
print(g.stats())
# {'nodes': 3, 'edges': 0}
upsert_node creates the node if the ID is new, or updates its properties if it already exists (merge semantics). This makes it safe to call repeatedly.
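The merge behaviour can be pictured with SQLite's own ON CONFLICT clause. This is a hypothetical single-table sketch of the same insert-or-update semantics, not GraphQLite's internals; the table layout and the local upsert_node helper are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, name TEXT, age INTEGER)")

def upsert_node(node_id, name, age):
    # Insert if the id is new, update in place if it already exists --
    # the same "merge" behaviour described above.
    con.execute("""
        INSERT INTO nodes (id, name, age) VALUES (?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET name = excluded.name, age = excluded.age
    """, (node_id, name, age))

upsert_node("alice", "Alice", 30)
upsert_node("alice", "Alice", 31)   # safe to repeat; updates age in place
print(con.execute("SELECT COUNT(*), MAX(age) FROM nodes").fetchone())  # (1, 31)
```

Because repeated calls converge to the same row rather than erroring or duplicating, upserts are a good fit for idempotent data-loading scripts.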
Verify a node was stored:
node = g.get_node("alice")
print(node)
# {'id': 'alice', 'label': 'Person', 'properties': {'name': 'Alice', 'age': 30, 'city': 'London'}}
Step 4: Add KNOWS Relationships
Connect the people:
g.upsert_edge("alice", "bob", {"since": 2020, "strength": "close"}, rel_type="KNOWS")
g.upsert_edge("alice", "carol", {"since": 2018, "strength": "close"}, rel_type="KNOWS")
g.upsert_edge("bob", "carol", {"since": 2021, "strength": "casual"}, rel_type="KNOWS")
print(g.stats())
# {'nodes': 3, 'edges': 3}
Relationships are directed. alice -> bob is not the same as bob -> alice.
Step 5: Inspect the Graph
Use built-in methods to explore the graph without writing Cypher:
# Check if specific connections exist
print(g.has_edge("alice", "bob")) # True
print(g.has_edge("bob", "alice")) # False (directed)
# Get Alice's outgoing neighbors
neighbors = g.get_neighbors("alice")
print([n["id"] for n in neighbors])
# ['bob', 'carol']
# Get degree (total connected edges, both directions)
print(g.node_degree("alice")) # 2
print(g.node_degree("carol")) # 2 (two incoming, zero outgoing)
List all nodes filtered by label:
people = g.get_all_nodes(label="Person")
for p in people:
    print(p["id"], p["properties"]["name"])
# alice Alice
# bob Bob
# carol Carol
Step 6: Query with Cypher
The query() method runs a Cypher string and returns a list of dictionaries. For queries without user-supplied data, this is convenient:
results = g.query("""
MATCH (a:Person)-[:KNOWS]->(b:Person)
RETURN a.name AS from, b.name AS to, a.city AS city
ORDER BY a.name
""")
for row in results:
    print(f"{row['from']} ({row['city']}) knows {row['to']}")
# Alice (London) knows Bob
# Alice (London) knows Carol
# Bob (Paris) knows Carol
Step 7: Parameterized Queries
When any part of the query comes from user input, use parameterized queries. Access the underlying Connection object via g.connection:
# Find everyone in a specific city — city name comes from user input
city = "London"
results = g.connection.cypher(
    "MATCH (p:Person {city: $city}) RETURN p.name AS name, p.age AS age ORDER BY p.age",
    {"city": city}
)
for row in results:
    print(f"{row['name']}, age {row['age']}")
# Alice, age 30
# Carol, age 35
Parameters are passed as a Python dictionary and serialised to JSON internally. This protects against injection and handles special characters cleanly. See the parameterized queries guide for more detail.
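To see why this matters, compare naive string interpolation with JSON-serialised parameters. This sketch uses only the standard library; the city value is deliberately contrived.

```python
import json

# A value containing a quote corrupts a hand-built query string...
city = 'Lon"don'
naive = f'MATCH (p:Person {{city: "{city}"}}) RETURN p.name'
# naive now contains {city: "Lon"don"} -- the quotes are unbalanced

# ...but survives parameter passing, because the whole dictionary is
# serialised to JSON with proper escaping before it reaches the engine.
params = json.dumps({"city": city})
assert json.loads(params)["city"] == 'Lon"don'
```

The same reasoning applies to backslashes, unicode, and anything else a user might type: serialisation handles it once, centrally, instead of in every query you write.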
Multi-hop traversal using parameters:
# Who does a given person know transitively (up to 2 hops)?
results = g.connection.cypher(
    """
    MATCH (start:Person {name: $name})-[:KNOWS*1..2]->(other:Person)
    RETURN DISTINCT other.name AS name
    """,
    {"name": "Alice"}
)
print([r["name"] for r in results])
# ['Bob', 'Carol']
Step 8: Run PageRank
Load the graph into the algorithm cache, then run PageRank. PageRank scores nodes by how many high-scoring nodes point to them.
# Load the graph cache (required before algorithms)
g.connection.cypher("RETURN gql_load_graph()")
results = g.pagerank(damping=0.85, iterations=20)
print("PageRank scores:")
for r in sorted(results, key=lambda x: x["score"], reverse=True):
    print(f"  {r['user_id']}: {r['score']:.4f}")
# PageRank scores:
# carol: 0.2282
# bob: 0.1847
# alice: 0.1471
Carol scores highest because two people point to her. See Graph Analytics for a full walkthrough of all 15+ algorithms.
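To build intuition for that ranking, here is a toy implementation of the textbook PageRank update on the same three edges. GraphQLite's absolute scores may be normalised differently, so the numbers above will not match this sketch, but the ordering should.

```python
# Toy PageRank on the tutorial graph: alice->bob, alice->carol, bob->carol.
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
nodes = ["alice", "bob", "carol"]
out_deg = {n: sum(1 for s, _ in edges if s == n) for n in nodes}

d = 0.85                                  # damping factor
pr = {n: 1 / len(nodes) for n in nodes}   # uniform start
for _ in range(20):
    new = {}
    for n in nodes:
        # Each node inherits rank from its in-neighbours, split by out-degree.
        incoming = sum(pr[s] / out_deg[s] for s, t in edges if t == n)
        new[n] = (1 - d) / len(nodes) + d * incoming
    pr = new

ranking = sorted(pr, key=pr.get, reverse=True)
print(ranking)  # ['carol', 'bob', 'alice']
```

Carol receives rank from both Alice and Bob, Bob only from Alice, and Alice from nobody, which is exactly the ordering the extension reports.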
Step 9: Persist to a File
Switching from :memory: to a file path makes the graph persistent:
# Save to a file
g_file = Graph("social.db")
g_file.upsert_node("alice", {"name": "Alice", "age": 30, "city": "London"}, label="Person")
g_file.upsert_node("bob", {"name": "Bob", "age": 25, "city": "Paris"}, label="Person")
g_file.upsert_node("carol", {"name": "Carol", "age": 35, "city": "London"}, label="Person")
g_file.upsert_edge("alice", "bob", {"since": 2020}, rel_type="KNOWS")
g_file.upsert_edge("alice", "carol", {"since": 2018}, rel_type="KNOWS")
g_file.upsert_edge("bob", "carol", {"since": 2021}, rel_type="KNOWS")
print(f"Saved: {g_file.stats()}")
# Saved: {'nodes': 3, 'edges': 3}
Reopen it later:
g_reopen = Graph("social.db")
print(g_reopen.stats())
# {'nodes': 3, 'edges': 3}
node = g_reopen.get_node("alice")
print(node["properties"]["name"])
# Alice
The database is a standard SQLite file. You can inspect it with any SQLite tool.
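For example, the standard library's sqlite3 module can list the tables in any SQLite file by querying sqlite_master. This sketch uses a throwaway in-memory database so it is self-contained, but the same query works against social.db.

```python
import sqlite3

# Any SQLite tool can inspect a GraphQLite .db file. Here we create a
# throwaway database with one table just to demonstrate the query.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE example (id INTEGER PRIMARY KEY)")

tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['example']
```

Running the same query against social.db shows the schema tables GraphQLite maintains under the hood.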
Summary
In this tutorial you:
- Installed GraphQLite with pip install graphqlite
- Created an in-memory graph using the Graph class
- Added Person nodes with upsert_node()
- Added KNOWS relationships with upsert_edge()
- Explored the graph with get_neighbors(), node_degree(), and stats()
- Queried with Cypher via g.query() and g.connection.cypher()
- Used parameterized queries to safely handle user input
- Ran PageRank with g.pagerank()
- Persisted the graph to a .db file
Next Steps
- Building a Knowledge Graph — A more complex domain with multiple node and relationship types
- Graph Analytics — All 15+ algorithms with worked examples
- Query Patterns (SQL) — Advanced Cypher patterns: UNWIND, WITH pipelines, CASE, UNION
- Python API Reference — Complete method documentation for Graph, Connection, and GraphManager
- Parameterized Queries Guide — Best practices for safe query construction
Getting Started with SQL
This tutorial shows how to use GraphQLite directly from the SQLite command-line interface. No Python or Rust required — just sqlite3 and the compiled extension.
What You Will Learn
- Build the extension from source
- Load it into the SQLite CLI
- Create nodes and relationships with Cypher
- Query patterns with MATCH and WHERE
- Use parameterized queries
- Run a graph algorithm and read the results
Prerequisites
- SQLite 3.x CLI (sqlite3 --version)
- A C compiler, Bison, and Flex (for building from source), or a pre-built binary from the Python wheel
Step 1: Get the Extension
Option A — Build from source
# Clone the repository
git clone https://github.com/colliery-io/graphqlite.git
cd graphqlite
# macOS
brew install bison flex sqlite
export PATH="$(brew --prefix bison)/bin:$PATH"
make extension RELEASE=1
# Linux (Debian/Ubuntu)
sudo apt-get install build-essential bison flex libsqlite3-dev
make extension RELEASE=1
The compiled extension lands in:
- build/graphqlite.dylib (macOS)
- build/graphqlite.so (Linux)
- build/graphqlite.dll (Windows)
Option B — Extract from the Python package
pip install graphqlite
python -c "import graphqlite; print(graphqlite.loadable_path())"
# /path/to/site-packages/graphqlite/graphqlite.dylib
Use that path anywhere this tutorial says build/graphqlite.
Step 2: Open SQLite and Load the Extension
sqlite3 social.db
Inside the SQLite prompt:
-- Load the extension (adjust extension for your platform)
.load build/graphqlite
-- Confirm it loaded
SELECT cypher('RETURN 1 + 1 AS result');
-- [{"result":2}]
Enable column headers for readable output:
.mode column
.headers on
Step 3: Create Nodes
Create a small social network. Each CREATE call returns an empty JSON array [] — that is normal for write operations.
SELECT cypher('CREATE (a:Person {name: "Alice", age: 30, city: "London"})');
SELECT cypher('CREATE (b:Person {name: "Bob", age: 25, city: "Paris"})');
SELECT cypher('CREATE (c:Person {name: "Carol", age: 35, city: "London"})');
SELECT cypher('CREATE (d:Person {name: "Dave", age: 28, city: "Berlin"})');
Verify the nodes exist:
SELECT cypher('MATCH (p:Person) RETURN p.name, p.age, p.city ORDER BY p.name');
Output:
[{"p.name":"Alice","p.age":30,"p.city":"London"},
{"p.name":"Bob","p.age":25,"p.city":"Paris"},
{"p.name":"Carol","p.age":35,"p.city":"London"},
{"p.name":"Dave","p.age":28,"p.city":"Berlin"}]
Use json_each() to get one row per result:
SELECT
json_extract(value, '$.p.name') AS name,
json_extract(value, '$.p.age') AS age,
json_extract(value, '$.p.city') AS city
FROM json_each(cypher('MATCH (p:Person) RETURN p.name, p.age, p.city ORDER BY p.name'));
Output:
name age city
----- --- ------
Alice 30 London
Bob 25 Paris
Carol 35 London
Dave 28 Berlin
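The json_each()/json_extract() pattern comes from SQLite's built-in JSON1 functions, not from the extension, so you can practise it on a literal JSON array first. A stdlib-only sketch (assumes your SQLite build includes the JSON functions, which has been standard since 3.38):

```python
import sqlite3

# json_each() flattens a JSON array into rows; json_extract() pulls fields.
# The same pattern applies unchanged to the array cypher() returns.
con = sqlite3.connect(":memory:")
rows = con.execute("""
    SELECT json_extract(value, '$.name') AS name,
           json_extract(value, '$.age')  AS age
    FROM json_each('[{"name":"Alice","age":30},{"name":"Bob","age":25}]')
    ORDER BY name
""").fetchall()
print(rows)  # [('Alice', 30), ('Bob', 25)]
```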
Step 4: Create Relationships
-- Alice knows Bob
SELECT cypher('
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CREATE (a)-[:KNOWS {since: 2020}]->(b)
');
-- Alice knows Carol
SELECT cypher('
MATCH (a:Person {name: "Alice"}), (c:Person {name: "Carol"})
CREATE (a)-[:KNOWS {since: 2018}]->(c)
');
-- Bob knows Dave
SELECT cypher('
MATCH (b:Person {name: "Bob"}), (d:Person {name: "Dave"})
CREATE (b)-[:KNOWS {since: 2022}]->(d)
');
-- Carol knows Dave
SELECT cypher('
MATCH (c:Person {name: "Carol"}), (d:Person {name: "Dave"})
CREATE (c)-[:KNOWS {since: 2021}]->(d)
');
Step 5: Query Patterns with MATCH and WHERE
Who does Alice know?
SELECT
json_extract(value, '$.friend.name') AS friend,
json_extract(value, '$.r.since') AS since
FROM json_each(cypher('
MATCH (a:Person {name: "Alice"})-[r:KNOWS]->(friend)
RETURN friend, r
'));
Output:
friend since
------ -----
Bob 2020
Carol 2018
People in a specific city
SELECT
json_extract(value, '$.name') AS name
FROM json_each(cypher('
MATCH (p:Person)
WHERE p.city = "London"
RETURN p.name AS name
'));
Output:
name
-----
Alice
Carol
Friends of friends (two hops)
Who can Alice reach through two KNOWS relationships?
SELECT
json_extract(value, '$.fof') AS friend_of_friend
FROM json_each(cypher('
MATCH (a:Person {name: "Alice"})-[:KNOWS]->()-[:KNOWS]->(fof)
RETURN DISTINCT fof.name AS fof
'));
Output:
friend_of_friend
----------------
Dave
Filter with WHERE and aggregation
People who know more than one person:
SELECT
json_extract(value, '$.name') AS person,
json_extract(value, '$.count') AS knows_count
FROM json_each(cypher('
MATCH (a:Person)-[:KNOWS]->(b)
WITH a.name AS name, count(b) AS count
WHERE count > 1
RETURN name, count
ORDER BY count DESC
'));
Output:
person knows_count
------ -----------
Alice 2
Step 6: Use Parameterized Queries
For any value that comes from outside the query — user input, a variable, application data — use the $param syntax with a JSON parameters string as the second argument to cypher():
-- Find a person by name using a parameter
SELECT cypher(
'MATCH (p:Person {name: $name}) RETURN p.name, p.age, p.city',
'{"name": "Carol"}'
);
-- [{"p.name":"Carol","p.age":35,"p.city":"London"}]
Multiple parameters:
SELECT cypher(
'MATCH (p:Person) WHERE p.age >= $min_age AND p.city = $city RETURN p.name, p.age',
'{"min_age": 28, "city": "London"}'
);
-- [{"p.name":"Alice","p.age":30},{"p.name":"Carol","p.age":35}]
Parameters protect against injection and correctly handle special characters in string values.
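When calling cypher() from a host language, build that JSON parameters string with a serializer rather than by hand, so quoting and escaping are handled for you. A stdlib sketch with a deliberately awkward illustrative value:

```python
import json

# Build the JSON string for cypher()'s second argument with json.dumps,
# which escapes quotes, backslashes, and control characters automatically.
params = json.dumps({"name": 'O"Brien', "min_age": 28})
print(params)

# The values round-trip intact, quote and all:
assert json.loads(params) == {"name": 'O"Brien', "min_age": 28}
```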
Step 7: Run PageRank
GraphQLite includes 15+ graph algorithms. Load the graph cache first, then call the algorithm function inside a RETURN clause.
-- Load the graph into the algorithm cache
SELECT cypher('RETURN gql_load_graph()');
Run PageRank and display the results as a table:
SELECT
json_extract(value, '$.user_id') AS person,
printf('%.4f', json_extract(value, '$.score')) AS pagerank
FROM json_each(cypher('RETURN pageRank(0.85, 20)'))
ORDER BY json_extract(value, '$.score') DESC;
Output:
person pagerank
------ --------
carol 0.2000
dave 0.1800
bob 0.1600
alice 0.1200
Carol and Dave score highest because they sit at the receiving end of the graph's KNOWS edges; Alice, with no incoming edges, scores lowest.
Complete Script
Save the following as social.sql and run it with sqlite3 < social.sql:
.load build/graphqlite
.mode column
.headers on
-- Nodes
SELECT cypher('CREATE (a:Person {name: "Alice", age: 30, city: "London"})');
SELECT cypher('CREATE (b:Person {name: "Bob", age: 25, city: "Paris"})');
SELECT cypher('CREATE (c:Person {name: "Carol", age: 35, city: "London"})');
SELECT cypher('CREATE (d:Person {name: "Dave", age: 28, city: "Berlin"})');
-- Relationships
SELECT cypher('MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"}) CREATE (a)-[:KNOWS {since: 2020}]->(b)');
SELECT cypher('MATCH (a:Person {name: "Alice"}), (c:Person {name: "Carol"}) CREATE (a)-[:KNOWS {since: 2018}]->(c)');
SELECT cypher('MATCH (b:Person {name: "Bob"}), (d:Person {name: "Dave"}) CREATE (b)-[:KNOWS {since: 2022}]->(d)');
SELECT cypher('MATCH (c:Person {name: "Carol"}), (d:Person {name: "Dave"}) CREATE (c)-[:KNOWS {since: 2021}]->(d)');
SELECT '--- All people ---';
SELECT json_extract(value, '$.p.name') AS name,
json_extract(value, '$.p.age') AS age
FROM json_each(cypher('MATCH (p:Person) RETURN p.name, p.age ORDER BY p.name'));
SELECT '--- Who Alice knows ---';
SELECT json_extract(value, '$.friend') AS friend
FROM json_each(cypher('MATCH (:Person {name: "Alice"})-[:KNOWS]->(f) RETURN f.name AS friend'));
SELECT '--- PageRank ---';
SELECT cypher('RETURN gql_load_graph()');
SELECT json_extract(value, '$.user_id') AS person,
printf('%.4f', json_extract(value, '$.score')) AS score
FROM json_each(cypher('RETURN pageRank(0.85, 20)'))
ORDER BY json_extract(value, '$.score') DESC;
Next Steps
- Query Patterns (SQL) — Variable-length paths, OPTIONAL MATCH, WITH, UNWIND, CASE, UNION
- Graph Algorithms (SQL) — All 15+ algorithms with SQL extraction patterns
- SQL Interface Reference — Complete cypher() function documentation and schema tables
Query Patterns in SQL
This tutorial covers intermediate and advanced Cypher patterns using GraphQLite from the SQLite CLI. The examples use a movie database: movies, actors, directors, and the relationships between them.
Setup: Build the Movie Database
Save this block as movie_setup.sql and run it with sqlite3 movies.db < movie_setup.sql, or paste it into an interactive session after .load build/graphqlite.
.load build/graphqlite
.mode column
.headers on
-- Movies
SELECT cypher('CREATE (m:Movie {title: "Inception", year: 2010, rating: 8.8})');
SELECT cypher('CREATE (m:Movie {title: "The Dark Knight", year: 2008, rating: 9.0})');
SELECT cypher('CREATE (m:Movie {title: "Interstellar", year: 2014, rating: 8.6})');
SELECT cypher('CREATE (m:Movie {title: "Memento", year: 2000, rating: 8.4})');
SELECT cypher('CREATE (m:Movie {title: "Dunkirk", year: 2017, rating: 7.9})');
-- Actors
SELECT cypher('CREATE (a:Actor {name: "Leonardo DiCaprio", born: 1974})');
SELECT cypher('CREATE (a:Actor {name: "Christian Bale", born: 1974})');
SELECT cypher('CREATE (a:Actor {name: "Tom Hardy", born: 1977})');
SELECT cypher('CREATE (a:Actor {name: "Cillian Murphy", born: 1976})');
SELECT cypher('CREATE (a:Actor {name: "Ken Watanabe", born: 1959})');
SELECT cypher('CREATE (a:Actor {name: "Guy Pearce", born: 1967})');
SELECT cypher('CREATE (a:Actor {name: "Mark Rylance", born: 1960})');
-- Directors
SELECT cypher('CREATE (d:Director {name: "Christopher Nolan", born: 1970})');
-- ACTED_IN
SELECT cypher('MATCH (a:Actor {name: "Leonardo DiCaprio"}), (m:Movie {title: "Inception"}) CREATE (a)-[:ACTED_IN {role: "Cobb"}]->(m)');
SELECT cypher('MATCH (a:Actor {name: "Ken Watanabe"}), (m:Movie {title: "Inception"}) CREATE (a)-[:ACTED_IN {role: "Saito"}]->(m)');
SELECT cypher('MATCH (a:Actor {name: "Tom Hardy"}), (m:Movie {title: "Inception"}) CREATE (a)-[:ACTED_IN {role: "Eames"}]->(m)');
SELECT cypher('MATCH (a:Actor {name: "Christian Bale"}), (m:Movie {title: "The Dark Knight"}) CREATE (a)-[:ACTED_IN {role: "Bruce Wayne"}]->(m)');
SELECT cypher('MATCH (a:Actor {name: "Tom Hardy"}), (m:Movie {title: "The Dark Knight"}) CREATE (a)-[:ACTED_IN {role: "Bane"}]->(m)');
SELECT cypher('MATCH (a:Actor {name: "Cillian Murphy"}), (m:Movie {title: "The Dark Knight"}) CREATE (a)-[:ACTED_IN {role: "Scarecrow"}]->(m)');
SELECT cypher('MATCH (a:Actor {name: "Cillian Murphy"}), (m:Movie {title: "Inception"}) CREATE (a)-[:ACTED_IN {role: "Fischer"}]->(m)');
SELECT cypher('MATCH (a:Actor {name: "Guy Pearce"}), (m:Movie {title: "Memento"}) CREATE (a)-[:ACTED_IN {role: "Leonard"}]->(m)');
SELECT cypher('MATCH (a:Actor {name: "Cillian Murphy"}), (m:Movie {title: "Dunkirk"}) CREATE (a)-[:ACTED_IN {role: "Shivering Soldier"}]->(m)');
SELECT cypher('MATCH (a:Actor {name: "Tom Hardy"}), (m:Movie {title: "Dunkirk"}) CREATE (a)-[:ACTED_IN {role: "Farrier"}]->(m)');
SELECT cypher('MATCH (a:Actor {name: "Mark Rylance"}), (m:Movie {title: "Dunkirk"}) CREATE (a)-[:ACTED_IN {role: "Mr. Dawson"}]->(m)');
-- DIRECTED
SELECT cypher('MATCH (d:Director {name: "Christopher Nolan"}), (m:Movie {title: "Inception"}) CREATE (d)-[:DIRECTED]->(m)');
SELECT cypher('MATCH (d:Director {name: "Christopher Nolan"}), (m:Movie {title: "The Dark Knight"}) CREATE (d)-[:DIRECTED]->(m)');
SELECT cypher('MATCH (d:Director {name: "Christopher Nolan"}), (m:Movie {title: "Interstellar"}) CREATE (d)-[:DIRECTED]->(m)');
SELECT cypher('MATCH (d:Director {name: "Christopher Nolan"}), (m:Movie {title: "Memento"}) CREATE (d)-[:DIRECTED]->(m)');
SELECT cypher('MATCH (d:Director {name: "Christopher Nolan"}), (m:Movie {title: "Dunkirk"}) CREATE (d)-[:DIRECTED]->(m)');
1. Multi-Hop Traversals
Walk across more than one relationship in a single pattern.
Two-hop: actors who appeared in movies by the same director
SELECT
json_extract(value, '$.actor1') AS actor1,
json_extract(value, '$.actor2') AS actor2,
json_extract(value, '$.movie') AS shared_movie
FROM json_each(cypher('
MATCH (a1:Actor)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(a2:Actor)
WHERE a1.name < a2.name
RETURN a1.name AS actor1, a2.name AS actor2, m.title AS movie
ORDER BY m.title, actor1
'));
Output:
actor1 actor2 shared_movie
---------------- ------------------ ------------
Christian Bale Cillian Murphy The Dark Knight
Christian Bale Tom Hardy The Dark Knight
Cillian Murphy Tom Hardy The Dark Knight
...
Three-hop: director -> movie -> actor -> movie
Which other movies have actors from a director's film appeared in?
SELECT
json_extract(value, '$.director') AS director,
json_extract(value, '$.actor') AS actor,
json_extract(value, '$.other') AS other_movie
FROM json_each(cypher('
MATCH (d:Director)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Actor)-[:ACTED_IN]->(other:Movie)
WHERE other.title <> m.title
RETURN DISTINCT d.name AS director, a.name AS actor, other.title AS other
ORDER BY actor
'));
2. Variable-Length Paths
Use [*min..max] to traverse a variable number of hops.
Nodes reachable from an actor within 1–2 ACTED_IN hops
SELECT
json_extract(value, '$.reach') AS reachable_node
FROM json_each(cypher('
MATCH (a:Actor {name: "Tom Hardy"})-[:ACTED_IN*1..2]->(m)
RETURN DISTINCT m.title AS reach
'));
The pattern [:ACTED_IN*1..2] matches one or two consecutive ACTED_IN edges: movies Tom Hardy acted in directly, plus anything one further ACTED_IN edge from those Movie nodes would reach. In this dataset ACTED_IN always points from Actor to Movie, so only the direct films match.
Shortest path between two actors (Bacon-number style)
Variable-length paths with any relationship type:
SELECT cypher('
MATCH path = (a:Actor {name: "Leonardo DiCaprio"})-[*1..4]-(b:Actor {name: "Mark Rylance"})
RETURN length(path) AS hops
ORDER BY hops
LIMIT 1
');
3. OPTIONAL MATCH
OPTIONAL MATCH returns null for missing parts of the pattern instead of dropping the row entirely. This is equivalent to a SQL LEFT JOIN.
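For readers coming from SQL, the LEFT JOIN analogy can be made concrete with two throwaway tables (names illustrative) using the stdlib sqlite3 module:

```python
import sqlite3

# OPTIONAL MATCH behaves like SQL's LEFT JOIN: rows on the left survive
# even when nothing matches on the right, with NULL filling the gap.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE movies (title TEXT);
    CREATE TABLE directed (director TEXT, title TEXT);
    INSERT INTO movies VALUES ('Inception'), ('Interstellar');
    INSERT INTO directed VALUES ('Christopher Nolan', 'Inception');
""")
rows = con.execute("""
    SELECT m.title, d.director
    FROM movies m LEFT JOIN directed d ON d.title = m.title
    ORDER BY m.title
""").fetchall()
print(rows)  # [('Inception', 'Christopher Nolan'), ('Interstellar', None)]
```

A plain MATCH would be the INNER JOIN: Interstellar would vanish from the results entirely.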
List all movies with their director (null if not yet recorded)
SELECT
json_extract(value, '$.title') AS title,
json_extract(value, '$.year') AS year,
json_extract(value, '$.director') AS director
FROM json_each(cypher('
MATCH (m:Movie)
OPTIONAL MATCH (d:Director)-[:DIRECTED]->(m)
RETURN m.title AS title, m.year AS year, d.name AS director
ORDER BY m.year
'));
Output:
title year director
---------------- ---- -----------------
Memento 2000 Christopher Nolan
The Dark Knight 2008 Christopher Nolan
Inception 2010 Christopher Nolan
Interstellar 2014 Christopher Nolan
Dunkirk 2017 Christopher Nolan
Count cast size, including movies with no recorded cast
SELECT
json_extract(value, '$.title') AS title,
json_extract(value, '$.cast') AS cast_size
FROM json_each(cypher('
MATCH (m:Movie)
OPTIONAL MATCH (a:Actor)-[:ACTED_IN]->(m)
RETURN m.title AS title, count(a) AS cast
ORDER BY cast DESC
'));
Output:
title cast_size
---------------- ---------
Dunkirk 3
The Dark Knight 3
Inception 3
Memento 1
Interstellar 0
Interstellar has 0 because no ACTED_IN relationships were created for it.
4. WITH Clause for Pipelining
WITH passes the results of one query stage as input to the next, similar to a SQL CTE. It lets you filter and reshape data mid-query.
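The CTE analogy in plain SQL, sketched on an illustrative acted_in table with the stdlib sqlite3 module:

```python
import sqlite3

# WITH in Cypher plays the role of a SQL CTE: aggregate in one stage,
# then filter on the aggregate in the next.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE acted_in (actor TEXT, movie TEXT);
    INSERT INTO acted_in VALUES
        ('Tom Hardy', 'Inception'), ('Tom Hardy', 'Dunkirk'),
        ('Guy Pearce', 'Memento');
""")
rows = con.execute("""
    WITH counts AS (
        SELECT actor, COUNT(*) AS films FROM acted_in GROUP BY actor
    )
    SELECT actor, films FROM counts WHERE films > 1
""").fetchall()
print(rows)  # [('Tom Hardy', 2)]
```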
Find actors who appeared in more than one film, then get their earliest role
SELECT
json_extract(value, '$.actor') AS actor,
json_extract(value, '$.film_count') AS films,
json_extract(value, '$.first_movie') AS first_movie
FROM json_each(cypher('
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WITH a, count(m) AS film_count, min(m.year) AS earliest_year
WHERE film_count > 1
MATCH (a)-[:ACTED_IN]->(first:Movie {year: earliest_year})
RETURN a.name AS actor, film_count, first.title AS first_movie
ORDER BY film_count DESC
'));
Output:
actor films first_movie
-------------- ----- ---------------
Tom Hardy 3 The Dark Knight
Cillian Murphy 3 The Dark Knight
Ranked movies by rating, then filter to top tier
SELECT
json_extract(value, '$.title') AS title,
json_extract(value, '$.rating') AS rating,
json_extract(value, '$.tier') AS tier
FROM json_each(cypher('
MATCH (m:Movie)
WITH m.title AS title, m.rating AS rating
ORDER BY rating DESC
WITH title, rating,
CASE
WHEN rating >= 9.0 THEN "Masterpiece"
WHEN rating >= 8.5 THEN "Excellent"
ELSE "Good"
END AS tier
RETURN title, rating, tier
'));
5. UNWIND for List Processing
UNWIND flattens a list into individual rows — useful for batch creation and list comprehensions.
Create multiple nodes from an inline list
SELECT cypher('
UNWIND ["Action", "Thriller", "Sci-Fi", "Drama"] AS genre_name
CREATE (g:Genre {name: genre_name})
');
Batch-tag movies using UNWIND
SELECT cypher('
UNWIND [
{movie: "Inception", genre: "Sci-Fi"},
{movie: "Inception", genre: "Action"},
{movie: "The Dark Knight", genre: "Action"},
{movie: "The Dark Knight", genre: "Drama"},
{movie: "Memento", genre: "Thriller"},
{movie: "Dunkirk", genre: "Drama"}
] AS row
MATCH (m:Movie {title: row.movie}), (g:Genre {name: row.genre})
CREATE (m)-[:IN_GENRE]->(g)
');
Expand a collected list back to rows
SELECT
json_extract(value, '$.genre') AS genre,
json_extract(value, '$.movie') AS movie
FROM json_each(cypher('
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)
WITH g.name AS genre, collect(m.title) AS movies
UNWIND movies AS movie
RETURN genre, movie
ORDER BY genre, movie
'));
6. Aggregation
Count, sum, average, collect
SELECT
json_extract(value, '$.genre') AS genre,
json_extract(value, '$.count') AS film_count,
json_extract(value, '$.avg_rating') AS avg_rating,
json_extract(value, '$.titles') AS titles
FROM json_each(cypher('
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)
RETURN g.name AS genre,
count(m) AS count,
round(avg(m.rating), 2) AS avg_rating,
collect(m.title) AS titles
ORDER BY count DESC
'));
Output:
genre film_count avg_rating titles
------- ---------- ---------- ------------------------------------
Action 2 8.9 ["Inception","The Dark Knight"]
Drama 2 8.45 ["The Dark Knight","Dunkirk"]
Sci-Fi 1 8.8 ["Inception"]
Thriller 1 8.4 ["Memento"]
Top-N with ORDER BY and LIMIT
SELECT
json_extract(value, '$.actor') AS actor,
json_extract(value, '$.films') AS film_count
FROM json_each(cypher('
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
RETURN a.name AS actor, count(m) AS films
ORDER BY films DESC
LIMIT 3
'));
7. CASE Expressions
CASE works both inline and in return clauses.
Label movie quality tier
SELECT
json_extract(value, '$.title') AS title,
json_extract(value, '$.rating') AS rating,
json_extract(value, '$.label') AS quality
FROM json_each(cypher('
MATCH (m:Movie)
RETURN m.title AS title, m.rating AS rating,
CASE
WHEN m.rating >= 9.0 THEN "Masterpiece"
WHEN m.rating >= 8.5 THEN "Excellent"
WHEN m.rating >= 8.0 THEN "Very Good"
ELSE "Good"
END AS label
ORDER BY m.rating DESC
'));
Output:
title rating quality
---------------- ------ -----------
The Dark Knight 9.0 Masterpiece
Inception 8.8 Excellent
Interstellar 8.6 Excellent
Memento 8.4 Very Good
Dunkirk 7.9 Good
Conditional aggregation
SELECT
json_extract(value, '$.actor') AS actor,
json_extract(value, '$.high_rated') AS high_rated_count,
json_extract(value, '$.lower_rated') AS lower_rated_count
FROM json_each(cypher('
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
RETURN a.name AS actor,
count(CASE WHEN m.rating >= 8.5 THEN 1 END) AS high_rated,
count(CASE WHEN m.rating < 8.5 THEN 1 END) AS lower_rated
ORDER BY actor
'));
8. UNION Queries
UNION ALL combines result sets, keeping duplicates. UNION deduplicates.
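The deduplication difference is the same as in SQL, which a two-line stdlib experiment confirms:

```python
import sqlite3

# UNION deduplicates, UNION ALL keeps every row -- identical semantics to SQL.
con = sqlite3.connect(":memory:")
union_all = con.execute(
    "SELECT 'Inception' AS t UNION ALL SELECT 'Inception'").fetchall()
union = con.execute(
    "SELECT 'Inception' AS t UNION SELECT 'Inception'").fetchall()
print(len(union_all), len(union))  # 2 1
```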
List actors and directors in a unified "people" result
SELECT
json_extract(value, '$.name') AS name,
json_extract(value, '$.role') AS role
FROM json_each(cypher('
MATCH (a:Actor)
RETURN a.name AS name, "Actor" AS role
UNION ALL
MATCH (d:Director)
RETURN d.name AS name, "Director" AS role
ORDER BY name
'));
Combine movies that were either highly rated OR recent
SELECT
json_extract(value, '$.title') AS title,
json_extract(value, '$.reason') AS reason
FROM json_each(cypher('
MATCH (m:Movie) WHERE m.rating >= 9.0
RETURN m.title AS title, "Top rated" AS reason
UNION
MATCH (m:Movie) WHERE m.year >= 2014
RETURN m.title AS title, "Recent release" AS reason
ORDER BY title
'));
9. Working with Results via json_each()
The cypher() function returns a JSON array. Use SQLite's json_each() and json_extract() to integrate results with the rest of your schema.
Join Cypher results with a regular SQL table
Suppose you have a standard reviews table tracking critic scores:
CREATE TABLE IF NOT EXISTS reviews (
title TEXT PRIMARY KEY,
critic_score REAL,
review_count INTEGER
);
INSERT OR IGNORE INTO reviews VALUES ('Inception', 95, 290);
INSERT OR IGNORE INTO reviews VALUES ('The Dark Knight', 94, 285);
INSERT OR IGNORE INTO reviews VALUES ('Interstellar', 72, 259);
INSERT OR IGNORE INTO reviews VALUES ('Memento', 93, 180);
INSERT OR IGNORE INTO reviews VALUES ('Dunkirk', 92, 259);
Now join graph data with SQL data:
WITH movie_ratings AS (
SELECT
json_extract(value, '$.title') AS title,
json_extract(value, '$.audience') AS audience_score
FROM json_each(cypher('
MATCH (m:Movie)
RETURN m.title AS title, m.rating AS audience
'))
)
SELECT
mr.title,
mr.audience_score,
r.critic_score,
r.review_count,
ROUND(ABS(mr.audience_score * 10 - r.critic_score), 1) AS gap
FROM movie_ratings mr
JOIN reviews r ON r.title = mr.title
ORDER BY gap DESC;
Output:
title audience_score critic_score review_count gap
--------------- -------------- ------------ ------------ ----
Interstellar 8.6 72 259 14.0
Dunkirk 7.9 92 259 13.0
Memento 8.4 93 180 9.0
Inception 8.8 95 290 7.0
The Dark Knight 9.0 94 285 4.0
Create a SQL view over a Cypher query
CREATE VIEW IF NOT EXISTS actor_filmography AS
SELECT
json_extract(value, '$.actor') AS actor,
json_extract(value, '$.movie') AS movie,
json_extract(value, '$.year') AS year,
json_extract(value, '$.role') AS role
FROM json_each(cypher('
MATCH (a:Actor)-[r:ACTED_IN]->(m:Movie)
RETURN a.name AS actor, m.title AS movie, m.year AS year, r.role AS role
ORDER BY m.year
'));
Use it like any table:
SELECT * FROM actor_filmography WHERE actor = 'Tom Hardy' ORDER BY year;
Output:
actor movie year role
--------- ---------------- ---- -------
Tom Hardy The Dark Knight 2008 Bane
Tom Hardy Inception 2010 Eames
Tom Hardy Dunkirk 2017 Farrier
Next Steps
- Graph Algorithms (SQL) — Run PageRank, community detection, and path finding from SQL
- SQL Interface Reference — Full cypher() documentation and schema tables
- Cypher Functions Reference — All built-in string, math, list, and aggregate functions
Building a Knowledge Graph
This tutorial builds a research publication knowledge graph in Python. The domain includes researchers, academic papers, and research topics, connected by authorship, citation, and topic membership. You will model the schema, populate it with parameterized writes, query it with Cypher, apply graph algorithms to discover influential papers and research clusters, and maintain the graph over time.
What You Will Build
A knowledge graph with:
Node labels
- Researcher — scientists and academics
- Paper — published works
- Topic — research areas
Relationship types
- AUTHORED — Researcher authored Paper
- CITES — Paper cites Paper
- IN_TOPIC — Paper belongs to a Topic
- COLLABORATES — Researcher co-authored with Researcher (derived)
What You Will Learn
- Design a multi-type node schema
- Use parameterized queries for all writes
- Find co-authors, citation chains, and research clusters
- Apply PageRank and community detection
- Update and delete graph elements
- Persist the graph to a file and reopen it
Prerequisites
pip install graphqlite
Step 1: Create the Graph
from graphqlite import Graph
g = Graph("research.db")
Step 2: Add Researchers
Use parameterized CREATE statements via g.connection.cypher() for all writes. This ensures special characters in names or affiliations never break the query.
researchers = [
{"id": "r_alice", "name": "Alice Nakamura", "affiliation": "MIT", "h_index": 28},
{"id": "r_bob", "name": "Bob Osei", "affiliation": "Stanford", "h_index": 19},
{"id": "r_carol", "name": "Carol Petrov", "affiliation": "Cambridge", "h_index": 35},
{"id": "r_dave", "name": "Dave Fontaine", "affiliation": "MIT", "h_index": 12},
{"id": "r_eve", "name": "Eve Svensson", "affiliation": "ETH Zurich", "h_index": 22},
]
for r in researchers:
g.connection.cypher(
"""
CREATE (r:Researcher {
id: $id,
name: $name,
affiliation: $affiliation,
h_index: $h_index
})
""",
r
)
print(g.stats())
# {'nodes': 5, 'edges': 0}
Step 3: Add Topics
topics = [
{"id": "t_ml", "name": "Machine Learning", "field": "Computer Science"},
{"id": "t_nlp", "name": "Natural Language Processing","field": "Computer Science"},
{"id": "t_bio", "name": "Computational Biology", "field": "Biology"},
{"id": "t_graphs", "name": "Graph Theory", "field": "Mathematics"},
]
for t in topics:
g.connection.cypher(
"CREATE (t:Topic {id: $id, name: $name, field: $field})",
t
)
Step 4: Add Papers
papers = [
{"id": "p_attention", "title": "Attention Is All You Need", "year": 2017, "citations": 80000},
{"id": "p_bert", "title": "BERT: Pre-training of Deep Bidirectional Transformers", "year": 2018, "citations": 50000},
{"id": "p_gnn", "title": "Semi-Supervised Classification with GCN", "year": 2017, "citations": 18000},
{"id": "p_pagerank", "title": "The PageRank Citation Ranking", "year": 1998, "citations": 15000},
{"id": "p_word2vec", "title": "Distributed Representations of Words", "year": 2013, "citations": 28000},
{"id": "p_alphafold", "title": "Highly Accurate Protein Structure Prediction", "year": 2021, "citations": 12000},
{"id": "p_graphsage", "title": "Inductive Representation Learning on Large Graphs", "year": 2017, "citations": 9000},
]
for p in papers:
g.connection.cypher(
"""
CREATE (p:Paper {
id: $id,
title: $title,
year: $year,
citations: $citations
})
""",
p
)
print(g.stats())
# {'nodes': 16, 'edges': 0}
Step 5: Add Relationships
Authorship
authorships = [
("r_alice", "p_attention"),
("r_alice", "p_bert"),
("r_bob", "p_bert"),
("r_bob", "p_gnn"),
("r_carol", "p_pagerank"),
("r_carol", "p_graphsage"),
("r_dave", "p_gnn"),
("r_dave", "p_graphsage"),
("r_eve", "p_word2vec"),
("r_eve", "p_alphafold"),
("r_alice", "p_word2vec"),
]
for researcher_id, paper_id in authorships:
g.connection.cypher(
"""
MATCH (r:Researcher {id: $researcher_id}), (p:Paper {id: $paper_id})
CREATE (r)-[:AUTHORED]->(p)
""",
{"researcher_id": researcher_id, "paper_id": paper_id}
)
Citation graph
citations = [
("p_bert", "p_attention"), # BERT cites Attention
("p_gnn", "p_pagerank"), # GCN cites PageRank
("p_graphsage", "p_gnn"), # GraphSAGE cites GCN
("p_graphsage", "p_pagerank"), # GraphSAGE cites PageRank
("p_alphafold", "p_attention"), # AlphaFold cites Attention
("p_bert", "p_word2vec"), # BERT cites Word2Vec
("p_attention", "p_word2vec"), # Attention cites Word2Vec
]
for citing_id, cited_id in citations:
g.connection.cypher(
"""
MATCH (a:Paper {id: $citing_id}), (b:Paper {id: $cited_id})
CREATE (a)-[:CITES]->(b)
""",
{"citing_id": citing_id, "cited_id": cited_id}
)
Topic membership
topic_memberships = [
("p_attention", "t_nlp"),
("p_attention", "t_ml"),
("p_bert", "t_nlp"),
("p_bert", "t_ml"),
("p_gnn", "t_ml"),
("p_gnn", "t_graphs"),
("p_pagerank", "t_graphs"),
("p_word2vec", "t_nlp"),
("p_alphafold", "t_bio"),
("p_alphafold", "t_ml"),
("p_graphsage", "t_ml"),
("p_graphsage", "t_graphs"),
]
for paper_id, topic_id in topic_memberships:
g.connection.cypher(
"""
MATCH (p:Paper {id: $paper_id}), (t:Topic {id: $topic_id})
CREATE (p)-[:IN_TOPIC]->(t)
""",
{"paper_id": paper_id, "topic_id": topic_id}
)
print(g.stats())
# {'nodes': 16, 'edges': 30}
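The schema lists COLLABORATES as a derived relationship, and it is never materialized above. One way to derive it is from shared AUTHORED edges. The following is a pure-Python sketch over the same authorships list; the commented-out write-back uses MERGE in the parameterized style from the rest of this tutorial and is an assumption about how you would persist the result, not output from a run.

```python
from collections import defaultdict
from itertools import combinations

# Same authorship pairs as above
authorships = [
    ("r_alice", "p_attention"), ("r_alice", "p_bert"), ("r_bob", "p_bert"),
    ("r_bob", "p_gnn"), ("r_carol", "p_pagerank"), ("r_carol", "p_graphsage"),
    ("r_dave", "p_gnn"), ("r_dave", "p_graphsage"), ("r_eve", "p_word2vec"),
    ("r_eve", "p_alphafold"), ("r_alice", "p_word2vec"),
]

# Group authors by paper, then emit each unordered co-author pair once
authors_by_paper = defaultdict(list)
for researcher_id, paper_id in authorships:
    authors_by_paper[paper_id].append(researcher_id)

collaborations = sorted({
    tuple(sorted(pair))
    for authors in authors_by_paper.values()
    for pair in combinations(authors, 2)
})
print(collaborations)
# [('r_alice', 'r_bob'), ('r_alice', 'r_eve'), ('r_bob', 'r_dave'), ('r_carol', 'r_dave')]

# Each pair could then be written back with a parameterized MERGE, e.g.:
# g.connection.cypher(
#     "MATCH (a:Researcher {id: $a}), (b:Researcher {id: $b}) "
#     "MERGE (a)-[:COLLABORATES]->(b)",
#     {"a": a, "b": b},
# )
```

MERGE rather than CREATE keeps the derivation idempotent if you re-run it after adding papers.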
Step 6: Query Patterns
Find co-authors of a researcher
results = g.connection.cypher(
"""
MATCH (r:Researcher {id: $researcher_id})-[:AUTHORED]->(p:Paper)<-[:AUTHORED]-(coauthor:Researcher)
WHERE coauthor.id <> $researcher_id
RETURN DISTINCT coauthor.name AS coauthor, collect(p.title) AS shared_papers
ORDER BY coauthor
""",
{"researcher_id": "r_alice"}
)
for row in results:
print(f"{row['coauthor']}: {row['shared_papers']}")
# Bob Osei: ['BERT: Pre-training of Deep Bidirectional Transformers']
# Eve Svensson: ['Distributed Representations of Words']
Follow the citation chain from a paper
results = g.connection.cypher(
"""
MATCH (p:Paper {id: $paper_id})-[:CITES*1..3]->(cited:Paper)
RETURN DISTINCT cited.title AS title, cited.year AS year, cited.citations AS citation_count
ORDER BY citation_count DESC
""",
{"paper_id": "p_bert"}
)
print("Papers cited by BERT (up to 3 hops):")
for row in results:
print(f" {row['title']} ({row['year']}) — {row['citation_count']:,} citations")
# Papers cited by BERT (up to 3 hops):
# Attention Is All You Need (2017) — 80,000 citations
# Distributed Representations of Words (2013) — 28,000 citations
Find all papers in a research topic
results = g.connection.cypher(
"""
MATCH (p:Paper)-[:IN_TOPIC]->(t:Topic {name: $topic})
RETURN p.title AS title, p.year AS year, p.citations AS citations
ORDER BY p.citations DESC
""",
{"topic": "Machine Learning"}
)
for row in results:
print(f"{row['title']} — {row['citations']:,}")
# Attention Is All You Need — 80,000
# BERT: Pre-training of Deep Bidirectional Transformers — 50,000
# ...
Find researchers working on overlapping topics
results = g.connection.cypher(
"""
MATCH (r1:Researcher)-[:AUTHORED]->(p1:Paper)-[:IN_TOPIC]->(t:Topic)<-[:IN_TOPIC]-(p2:Paper)<-[:AUTHORED]-(r2:Researcher)
WHERE r1.id < r2.id
RETURN DISTINCT r1.name AS researcher1, r2.name AS researcher2, collect(DISTINCT t.name) AS shared_topics
ORDER BY r1.name
"""
)
for row in results:
print(f"{row['researcher1']} & {row['researcher2']}: {row['shared_topics']}")
Aggregate: papers per topic with average citation count
results = g.connection.cypher(
"""
MATCH (p:Paper)-[:IN_TOPIC]->(t:Topic)
RETURN t.name AS topic, count(p) AS paper_count, round(avg(p.citations), 0) AS avg_citations
ORDER BY avg_citations DESC
"""
)
for row in results:
print(f"{row['topic']}: {row['paper_count']} papers, avg {row['avg_citations']:,.0f} citations")
Step 7: Graph Algorithms
PageRank — find influential papers
PageRank on a citation graph surfaces papers that are frequently cited by other high-impact papers.
# Load the algorithm cache
g.connection.cypher("RETURN gql_load_graph()")
results = g.pagerank(damping=0.85, iterations=20)
print("Most influential papers by PageRank:")
for r in sorted(results, key=lambda x: x["score"], reverse=True)[:5]:
node = g.get_node(r["user_id"])
if node and node["label"] == "Paper":
title = node["properties"].get("title", r["user_id"])
print(f" {title}: {r['score']:.4f}")
Expected top results: "Attention Is All You Need", "The PageRank Citation Ranking", and "Distributed Representations of Words" score highest because they are cited by multiple downstream papers.
Community detection — discover research clusters
results = g.community_detection(iterations=10)
communities: dict[int, list] = {}
for r in results:
node = g.get_node(r["user_id"])
if node:
label = r["community"]
communities.setdefault(label, []).append(
(node["label"], node["properties"].get("name") or node["properties"].get("title", r["user_id"]))
)
print("\nResearch communities:")
for community_id, members in sorted(communities.items()):
print(f"\nCommunity {community_id}:")
for label, name in sorted(members):
print(f" [{label}] {name}")
The NLP/ML cluster (Attention, BERT, Word2Vec) and the graph algorithms cluster (GCN, GraphSAGE, PageRank) typically separate into distinct communities.
Betweenness centrality — find bridging papers
Papers with high betweenness centrality link different research areas.
import json

raw = g.connection.cypher("RETURN betweennessCentrality()")
centrality = json.loads(raw[0]["betweennessCentrality()"])
print("\nTop bridging papers/researchers (betweenness centrality):")
for r in sorted(centrality, key=lambda x: x["score"], reverse=True)[:4]:
node = g.get_node(r["user_id"])
if node:
name = node["properties"].get("name") or node["properties"].get("title", r["user_id"])
print(f" {name}: {r['score']:.4f}")
Step 8: Update Properties
Use SET to update existing node and relationship properties:
# Update a researcher's h-index
g.connection.cypher(
"MATCH (r:Researcher {id: $id}) SET r.h_index = $h_index",
{"id": "r_alice", "h_index": 31}
)
# Update a paper's citation count
g.connection.cypher(
"MATCH (p:Paper {id: $id}) SET p.citations = $citations, p.last_updated = $date",
{"id": "p_attention", "citations": 85000, "date": "2025-03-01"}
)
# Verify
node = g.get_node("r_alice")
print(node["properties"]["h_index"]) # 31
Step 9: Delete Relationships
Remove a relationship without deleting the nodes:
# A paper is reclassified out of a topic
g.connection.cypher(
"""
MATCH (p:Paper {id: $paper_id})-[r:IN_TOPIC]->(t:Topic {id: $topic_id})
DELETE r
""",
{"paper_id": "p_gnn", "topic_id": "t_graphs"}
)
# Re-add to a different topic
g.connection.cypher(
"""
MATCH (p:Paper {id: $paper_id}), (t:Topic {id: $topic_id})
CREATE (p)-[:IN_TOPIC]->(t)
""",
{"paper_id": "p_gnn", "topic_id": "t_nlp"}
)
Delete a node and all its relationships:
g.connection.cypher(
"MATCH (p:Paper {id: $id}) DETACH DELETE p",
{"id": "p_graphsage"}
)
print(g.stats())
# {'nodes': 15, 'edges': ...} (reduced)
Step 10: Persist and Reopen
The graph is already persisted to research.db because that is what you passed to Graph(). Close and reopen:
# Close the graph (optional — Python closes on garbage collection)
del g
# Reopen
g2 = Graph("research.db")
print(g2.stats())
results = g2.connection.cypher(
"MATCH (r:Researcher {name: $name})-[:AUTHORED]->(p:Paper) RETURN p.title AS title",
{"name": "Alice Nakamura"}
)
for row in results:
print(row["title"])
The database is a standard SQLite file. You can attach it to other SQLite databases, back it up with cp, or inspect it with any SQLite browser.
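Because it is a plain SQLite file, Python's standard library can open it without GraphQLite loaded at all. The extension's backing table names vary by version, so this sketch simply lists whatever tables exist rather than assuming a schema:

```python
import sqlite3

# Open the same file the tutorial has been writing to
con = sqlite3.connect("research.db")
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(tables)  # names of the extension's backing tables, version-dependent
con.close()
```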
Next Steps
- Graph Analytics — A full walkthrough of all 15+ algorithms on a dense social network
- Graph Algorithms Reference — Complete algorithm parameter documentation
- Python API Reference — Graph, Connection, and GraphManager API reference
- Parameterized Queries Guide — Why and how to always use parameters
Graph Analytics
This tutorial is a complete walkthrough of GraphQLite's built-in graph algorithms. You will build a dense social network, run every algorithm category — centrality, community detection, path finding, components, traversal, and similarity — and combine algorithm results with Cypher queries to answer analytical questions.
What You Will Learn
- Load the graph cache required by algorithms
- Run all 15+ algorithms with real output
- Combine algorithm output with Cypher pattern matching
Prerequisites
pip install graphqlite
Step 1: Build a Social Network
This graph is deliberately dense to produce interesting algorithm output. It represents a professional network: people follow each other and belong to teams.
from graphqlite import Graph
g = Graph(":memory:")
# 10 people
people = [
("alice", {"name": "Alice", "role": "Engineer", "team": "A"}),
("bob", {"name": "Bob", "role": "Manager", "team": "A"}),
("carol", {"name": "Carol", "role": "Engineer", "team": "A"}),
("dave", {"name": "Dave", "role": "Engineer", "team": "B"}),
("eve", {"name": "Eve", "role": "Manager", "team": "B"}),
("frank", {"name": "Frank", "role": "Engineer", "team": "B"}),
("grace", {"name": "Grace", "role": "Director", "team": "C"}),
("henry", {"name": "Henry", "role": "Engineer", "team": "C"}),
("iris", {"name": "Iris", "role": "Engineer", "team": "C"}),
("james", {"name": "James", "role": "Manager", "team": "A"}),
]
for node_id, props in people:
g.upsert_node(node_id, props, label="Person")
# 18 directed FOLLOWS edges
connections = [
("alice", "bob"), ("alice", "carol"), ("alice", "james"),
("bob", "carol"), ("bob", "dave"), ("bob", "grace"),
("carol", "dave"), ("carol", "eve"),
("dave", "eve"), ("dave", "frank"),
("eve", "frank"), ("eve", "grace"),
("frank", "grace"), ("frank", "henry"),
("grace", "henry"), ("grace", "iris"),
("henry", "iris"), ("iris", "james"),
]
for source, target in connections:
g.upsert_edge(source, target, {}, rel_type="FOLLOWS")
print(g.stats())
# {'nodes': 10, 'edges': 18}
Step 2: Load the Graph Cache
Graph algorithms require the graph to be loaded into an in-memory CSR (Compressed Sparse Row) cache. Call gql_load_graph() once after building the graph, and again after making structural changes (adding or deleting nodes or edges).
g.connection.cypher("RETURN gql_load_graph()")
print("Graph cache loaded")
You can check whether the cache is current:
status = g.connection.cypher("RETURN gql_graph_loaded()")
print(status[0]["gql_graph_loaded()"]) # 1
Step 3: Centrality Algorithms
Centrality measures answer the question: who is the most important node? Different algorithms define "important" differently.
PageRank
PageRank scores a node by the quality and quantity of nodes pointing to it. A node followed by many high-scoring nodes gets a high PageRank.
In this social network, PageRank captures the notion of professional reputation: being followed by well-connected people (like a manager followed by their team) matters more than raw follower count.
results = g.pagerank(damping=0.85, iterations=20)
print("PageRank (top 5):")
for r in sorted(results, key=lambda x: x["score"], reverse=True)[:5]:
print(f" {r['user_id']:8s}: {r['score']:.4f}")
Output:
PageRank (top 5):
grace : 0.2041
henry : 0.1593
iris : 0.1312
frank : 0.1211
james : 0.1009
Grace scores highest: she is followed by Bob, Eve, and Frank — all of whom have significant in-links themselves.
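To see roughly what the extension computes, here is a minimal power-iteration sketch of PageRank over the same 18-edge list. It uses the textbook formulation with dangling-node mass redistributed evenly; the extension's normalization and dangling-node handling may differ, so exact scores and even rankings can deviate from the table above.

```python
# Same 18 directed FOLLOWS edges as above
edges = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "james"),
    ("bob", "carol"), ("bob", "dave"), ("bob", "grace"),
    ("carol", "dave"), ("carol", "eve"),
    ("dave", "eve"), ("dave", "frank"),
    ("eve", "frank"), ("eve", "grace"),
    ("frank", "grace"), ("frank", "henry"),
    ("grace", "henry"), ("grace", "iris"),
    ("henry", "iris"), ("iris", "james"),
]
nodes = sorted({n for edge in edges for n in edge})
out = {n: [t for s, t in edges if s == n] for n in nodes}
n_nodes, damping = len(nodes), 0.85

scores = {n: 1.0 / n_nodes for n in nodes}
for _ in range(20):
    new = {n: (1 - damping) / n_nodes for n in nodes}
    for n in nodes:
        if out[n]:
            share = damping * scores[n] / len(out[n])
            for t in out[n]:
                new[t] += share
        else:
            # Dangling node (no outgoing edges): spread its mass evenly
            for t in nodes:
                new[t] += damping * scores[n] / n_nodes
    scores = new

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{name}: {score:.4f}")
```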
Degree Centrality
Counts the raw number of incoming and outgoing edges.
Degree centrality is a fast first look at the network's shape. Here, Grace has the highest total degree (three people follow her), while Alice and Bob are tied for the highest out-degree (each follows three people). No graph cache is required — it reads directly from the database.
results = g.degree_centrality()
print("Degree centrality:")
for r in sorted(results, key=lambda x: x["degree"], reverse=True)[:5]:
print(f" {r['user_id']:8s}: in={r['in_degree']}, out={r['out_degree']}, total={r['degree']}")
Output:
Degree centrality:
  grace   : in=3, out=2, total=5
  bob     : in=1, out=3, total=4
  carol   : in=2, out=2, total=4
  dave    : in=2, out=2, total=4
  eve     : in=2, out=2, total=4
Betweenness Centrality
Measures how often a node lies on the shortest path between two other nodes. High betweenness nodes are bottlenecks or brokers.
In a professional network, high-betweenness nodes are the connectors who bridge different teams. Removing Carol or Eve would lengthen the path between Team A and Team B nodes — making them critical to cross-team information flow.
results = g.betweenness_centrality()
print("Betweenness centrality (top 5):")
for r in sorted(results, key=lambda x: x["score"], reverse=True)[:5]:
print(f" {r['user_id']:8s}: {r['score']:.4f}")
Output:
Betweenness centrality (top 5):
carol : 0.2333
eve : 0.2000
bob : 0.1778
frank : 0.1333
grace : 0.1111
Carol and Eve are bridges: many shortest paths between Team A and Team B nodes pass through them.
Closeness Centrality
Measures the average shortest distance from a node to all others. A node with high closeness can reach everyone quickly.
Closeness centrality tells us who is best positioned to spread news quickly across the whole organisation. A high-closeness person reaches everyone in the fewest hops — useful for identifying who to brief first when rolling out a cross-team announcement.
results = g.closeness_centrality()
print("Closeness centrality (top 5):")
for r in sorted(results, key=lambda x: x["score"], reverse=True)[:5]:
print(f" {r['user_id']:8s}: {r['score']:.4f}")
Eigenvector Centrality
Like PageRank for undirected graphs: a node is important if its neighbors are important.
Eigenvector centrality amplifies the PageRank idea: in this follow network, engineers who work alongside high-scoring managers accumulate reflected influence even if they have fewer direct followers.
results = g.eigenvector_centrality(iterations=100)
print("Eigenvector centrality (top 5):")
for r in sorted(results, key=lambda x: x["score"], reverse=True)[:5]:
print(f" {r['user_id']:8s}: {r['score']:.4f}")
Step 4: Community Detection
Community detection algorithms answer: which nodes form natural clusters?
Label Propagation
Nodes adopt the most common label of their neighbors iteratively until stable. Fast and works well on large graphs.
We use label propagation here as a fast first pass to confirm that our three-team structure emerges organically from the follow graph. The result is non-deterministic, but for a dense graph like this the team boundaries are clear enough that the algorithm reliably recovers them.
results = g.community_detection(iterations=10)
communities: dict[int, list] = {}
for r in results:
communities.setdefault(r["community"], []).append(r["user_id"])
print("Label propagation communities:")
for cid, members in sorted(communities.items()):
print(f" Community {cid}: {sorted(members)}")
Output (community assignments vary by run):
Label propagation communities:
Community 0: ['alice', 'bob', 'carol', 'james']
Community 1: ['dave', 'eve', 'frank']
Community 2: ['grace', 'henry', 'iris']
The three teams emerge as communities because the team members are densely connected to each other.
Louvain
Hierarchical modularity-based community detection. More deterministic than label propagation and produces higher-quality communities on most graphs.
Louvain gives us a more stable partition than label propagation. With resolution=1.0 it recovers the three teams; raising the resolution splits the larger teams into smaller clusters, which could map to sub-teams or project groups within the organisation.
results = g.louvain(resolution=1.0)
communities = {}
for r in results:
communities.setdefault(r["community"], []).append(r["user_id"])
print("Louvain communities:")
for cid, members in sorted(communities.items()):
print(f" Community {cid}: {sorted(members)}")
Try resolution=2.0 to get more, smaller communities; resolution=0.5 to get fewer, larger ones.
Step 5: Path Finding
Shortest Path (Dijkstra)
Finds the minimum-hop (or minimum-weight) path between two nodes.
Shortest path answers the "how are these people connected?" question. In a professional network, the path length gives a rough measure of relationship distance — a direct follow is one hop, a mutual contact is two.
path = g.shortest_path("alice", "james")
print(f"Distance: {path['distance']}")
print(f"Path: {' -> '.join(path['path'])}")
print(f"Found: {path['found']}")
Output:
Distance: 1
Path: alice -> james
Found: True
Try a longer path:
path = g.shortest_path("alice", "iris")
print(f"alice -> iris: distance={path['distance']}, path={path['path']}")
# alice -> iris: distance=3, path=['alice', 'bob', 'grace', 'iris']
A* (A-Star)
A* uses a heuristic to guide the search, exploring promising directions first. With latitude/longitude properties it uses haversine distance; without them it falls back to a uniform-cost heuristic similar to Dijkstra.
We've assigned real European city coordinates to each person here to demonstrate geographic routing. A* uses the haversine distance to the target city as a heuristic, pruning distant branches early and exploring fewer nodes than Dijkstra on a geographically spread graph.
Add coordinates to the nodes to demonstrate the geographic heuristic:
coords = {
"alice": (51.5, -0.1), # London
"bob": (48.9, 2.3), # Paris
"carol": (52.4, 13.4), # Berlin
"dave": (41.9, 12.5), # Rome
"eve": (40.4, -3.7), # Madrid
"frank": (59.9, 10.7), # Oslo
"grace": (55.7, 12.6), # Copenhagen
"henry": (52.2, 21.0), # Warsaw
"iris": (47.5, 19.0), # Budapest
"james": (50.1, 8.7), # Frankfurt
}
for node_id, (lat, lon) in coords.items():
g.connection.cypher(
"MATCH (p:Person {name: $name}) SET p.lat = $lat, p.lon = $lon",
{"name": node_id.title(), "lat": lat, "lon": lon}
)
# Reload cache after property updates
g.connection.cypher("RETURN gql_load_graph()")
path = g.astar("alice", "iris", lat_prop="lat", lon_prop="lon")
print(f"A* alice -> iris: distance={path['distance']}, nodes_explored={path['nodes_explored']}")
print(f"Path: {path['path']}")
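The haversine distance the heuristic relies on is easy to sketch in pure Python. This is an illustration of the quantity being estimated, not the extension's internal code; it reuses the coordinates assigned above.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Straight-line estimate from alice (London) to iris (Budapest)
print(f"{haversine_km(51.5, -0.1, 47.5, 19.0):.0f} km")
```

A* uses this as a lower bound on remaining travel, which is why it can discard branches that head geographically away from the target.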
All-Pairs Shortest Paths (APSP)
Computes shortest distances between every pair of nodes using Floyd-Warshall.
With only 10 people in the network, APSP is cheap and gives us global metrics — the diameter tells us the most "socially distant" pair, and the average path length tells us how tight-knit the network is overall.
results = g.all_pairs_shortest_path()
# Find the diameter (longest shortest path)
reachable = [r for r in results if r["distance"] is not None]
diameter_row = max(reachable, key=lambda x: x["distance"])
print(f"Graph diameter: {diameter_row['distance']} ({diameter_row['source']} -> {diameter_row['target']})")
# Average path length
avg = sum(r["distance"] for r in reachable) / len(reachable)
print(f"Average shortest path length: {avg:.2f}")
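On a graph this small you can cross-check the diameter with a plain BFS from every node. This sketch runs over the same 18 directed edges and should agree with the extension on hop counts:

```python
from collections import deque

# Same 18 directed FOLLOWS edges as above
connections = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "james"),
    ("bob", "carol"), ("bob", "dave"), ("bob", "grace"),
    ("carol", "dave"), ("carol", "eve"),
    ("dave", "eve"), ("dave", "frank"),
    ("eve", "frank"), ("eve", "grace"),
    ("frank", "grace"), ("frank", "henry"),
    ("grace", "henry"), ("grace", "iris"),
    ("henry", "iris"), ("iris", "james"),
]
adjacency = {}
for source, target in connections:
    adjacency.setdefault(source, []).append(target)

def hops_from(start):
    """Directed BFS distances from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

nodes = {n for edge in connections for n in edge}
longest = max(
    (d, src, dst)
    for src in nodes
    for dst, d in hops_from(src).items()
    if dst != src
)
print(f"diameter={longest[0]} ({longest[1]} -> {longest[2]})")
```

On this edge list the directed diameter is 4, realised by pairs like carol -> james.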
Step 6: Connected Components
Weakly Connected Components (WCC)
Groups nodes that are reachable from one another if edge direction is ignored.
In our follow network, WCC confirms that all 10 people are connected — there are no isolated individuals who cannot be reached from the rest of the organisation even via indirect paths.
results = g.weakly_connected_components()
components: dict[int, list] = {}
for r in results:
components.setdefault(r["component"], []).append(r["user_id"])
print(f"Weakly connected components: {len(components)}")
for cid, members in sorted(components.items()):
print(f" Component {cid}: {sorted(members)}")
# Weakly connected components: 1
# Component 0: ['alice', 'bob', 'carol', ...] (all 10 nodes in one component)
Strongly Connected Components (SCC)
Groups nodes where every node can reach every other node following edge direction.
SCC detects mutual follow relationships — if Alice follows Bob and Bob follows Alice, they form a 2-node SCC. In our directed follow graph this reveals whether any subsets of colleagues have genuinely reciprocal connections rather than one-directional follows.
results = g.strongly_connected_components()
components = {}
for r in results:
components.setdefault(r["component"], []).append(r["user_id"])
print(f"Strongly connected components: {len(components)}")
for cid, members in sorted(components.items()):
if len(members) > 1:
print(f" Multi-node SCC: {sorted(members)}")
else:
print(f" Singleton: {members[0]}")
In a DAG (directed acyclic graph) every node is its own SCC. If there are mutual edges (A -> B and B -> A), those nodes form a multi-node SCC.
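You can verify the reciprocity claim directly on the edge list: two nodes form a 2-node SCC exactly when both directions of the edge are present. A quick sketch over the same connections list:

```python
# Same 18 directed FOLLOWS edges as above
connections = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "james"),
    ("bob", "carol"), ("bob", "dave"), ("bob", "grace"),
    ("carol", "dave"), ("carol", "eve"),
    ("dave", "eve"), ("dave", "frank"),
    ("eve", "frank"), ("eve", "grace"),
    ("frank", "grace"), ("frank", "henry"),
    ("grace", "henry"), ("grace", "iris"),
    ("henry", "iris"), ("iris", "james"),
]
edge_set = set(connections)
mutual_pairs = sorted(
    {tuple(sorted(edge)) for edge in edge_set if (edge[1], edge[0]) in edge_set}
)
print(mutual_pairs)
# [] : every follow in this network is one-directional, so each node is its own SCC
```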
Step 7: Traversal
Breadth-First Search (BFS)
Explores nodes level by level from a starting point.
BFS shows Alice's immediate and second-degree network — the people she directly follows (depth 1) and the people they follow (depth 2). This maps naturally to "first-degree connections" and "people you might know" in a professional network.
results = g.bfs("alice", max_depth=2)
print("BFS from alice (depth <= 2):")
for r in sorted(results, key=lambda x: (x["depth"], x["order"])):
print(f" depth={r['depth']}, order={r['order']}: {r['user_id']}")
Output:
BFS from alice (depth <= 2):
depth=0, order=0: alice
depth=1, order=1: bob
depth=1, order=2: carol
depth=1, order=3: james
depth=2, order=4: dave
depth=2, order=5: eve
depth=2, order=6: grace
Depth-First Search (DFS)
Follows each branch as far as possible before backtracking.
DFS explores each follow chain to its end before backtracking — useful here for tracing the full chain of influence from Alice through each branch of the follow graph.
results = g.dfs("alice", max_depth=3)
print("DFS from alice (depth <= 3):")
for r in sorted(results, key=lambda x: x["order"]):
indent = " " * r["depth"]
print(f" order={r['order']}: {indent}{r['user_id']}")
Step 8: Similarity
Node Similarity (Jaccard)
Computes Jaccard similarity between the neighbor sets of two nodes. Two nodes are similar if they share many of the same neighbors.
Node similarity surfaces people who follow a similar set of colleagues. In this network, two Team A engineers who both follow the same set of managers will have a high Jaccard score — a signal they may not yet know each other but would benefit from connecting.
# All pairs above threshold 0.3
results = g.node_similarity(threshold=0.3)
print("Similar node pairs (Jaccard >= 0.3):")
for r in sorted(results, key=lambda x: x["similarity"], reverse=True):
print(f" {r['node1']:8s} <-> {r['node2']:8s}: {r['similarity']:.3f}")
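The score itself is plain set arithmetic: the size of the intersection of two neighbor sets divided by the size of their union. This sketch uses out-neighbor sets from the connections list; the extension may define a node's neighbor set differently (for example, including in-neighbors), so treat it as an illustration of the formula rather than a reimplementation.

```python
# Same 18 directed FOLLOWS edges as above
connections = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "james"),
    ("bob", "carol"), ("bob", "dave"), ("bob", "grace"),
    ("carol", "dave"), ("carol", "eve"),
    ("dave", "eve"), ("dave", "frank"),
    ("eve", "frank"), ("eve", "grace"),
    ("frank", "grace"), ("frank", "henry"),
    ("grace", "henry"), ("grace", "iris"),
    ("henry", "iris"), ("iris", "james"),
]
out_neighbors = {}
for source, target in connections:
    out_neighbors.setdefault(source, set()).add(target)

def jaccard(a, b):
    """|A & B| / |A | B| over out-neighbor sets."""
    na = out_neighbors.get(a, set())
    nb = out_neighbors.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

print(jaccard("alice", "bob"))  # both follow carol: 1 shared of 5 distinct = 0.2
```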
K-Nearest Neighbors (KNN)
Finds the k most similar nodes to a given node, ranked by Jaccard similarity.
KNN narrows node similarity to a single starting node. Here we find the five people whose follow patterns most closely resemble Alice's — a ranked "people you may know" list personalised to her position in the network.
results = g.knn("alice", k=5)
print("Alice's 5 nearest neighbors:")
for r in results:
print(f" rank={r['rank']}: {r['neighbor']:8s} (similarity={r['similarity']:.3f})")
Triangle Count
Counts how many triangles (3-cycles) each node participates in, and computes the local clustering coefficient (fraction of possible triangles that actually exist).
Triangle count measures how cliquey a node's neighbourhood is. In our professional network, a high clustering coefficient around a manager suggests their direct reports also follow each other — a tight sub-team. A low coefficient suggests a hub connecting otherwise separate groups.
results = g.triangle_count()
print("Triangle count and clustering coefficient:")
for r in sorted(results, key=lambda x: x["clustering_coefficient"], reverse=True)[:5]:
print(f" {r['user_id']:8s}: triangles={r['triangles']}, cc={r['clustering_coefficient']:.3f}")
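Triangle counting treats the graph as undirected. You can cross-check the totals with a small sketch over the same connections list; the per-node count is the number of neighbor pairs that are themselves connected, and dividing it by the number of possible neighbor pairs gives the clustering coefficient.

```python
from itertools import combinations

# Same 18 directed FOLLOWS edges as above, treated as undirected
connections = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "james"),
    ("bob", "carol"), ("bob", "dave"), ("bob", "grace"),
    ("carol", "dave"), ("carol", "eve"),
    ("dave", "eve"), ("dave", "frank"),
    ("eve", "frank"), ("eve", "grace"),
    ("frank", "grace"), ("frank", "henry"),
    ("grace", "henry"), ("grace", "iris"),
    ("henry", "iris"), ("iris", "james"),
]
neighbors = {}
for source, target in connections:
    neighbors.setdefault(source, set()).add(target)
    neighbors.setdefault(target, set()).add(source)

# Triangles at a node = neighbor pairs that share an edge with each other
triangles = {
    node: sum(1 for a, b in combinations(sorted(neighbors[node]), 2)
              if b in neighbors[a])
    for node in neighbors
}
total = sum(triangles.values()) // 3  # each triangle is counted at all three corners
print(f"total triangles: {total}")
```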
Step 9: Combine Algorithms with Cypher Queries
Algorithm output is just a list of dictionaries — feed it back into Cypher queries to enrich analysis.
Find the highest-PageRank node's community
pagerank_results = g.pagerank()
top_node = max(pagerank_results, key=lambda x: x["score"])
# What community does the top node belong to?
community_results = g.community_detection()
community_map = {r["user_id"]: r["community"] for r in community_results}
top_community = community_map[top_node["user_id"]]
# Who else is in that community?
same_community = [uid for uid, cid in community_map.items() if cid == top_community]
print(f"Top PageRank node: {top_node['user_id']} (score={top_node['score']:.4f})")
print(f"Community {top_community} members: {same_community}")
# Now query those community members
results = g.connection.cypher(
"""
MATCH (p:Person)
WHERE p.name IN $names
RETURN p.name AS name, p.role AS role, p.team AS team
ORDER BY p.name
""",
{"names": [n.title() for n in same_community]}
)
for row in results:
print(f" {row['name']} — {row['role']} (Team {row['team']})")
Rank nodes by betweenness and query their shortest paths
betweenness = g.betweenness_centrality()
top_bridge = max(betweenness, key=lambda x: x["score"])["user_id"]
# Find everyone this bridge connects
path_result = g.shortest_path("alice", "james")
print(f"Bridge node: {top_bridge}")
print(f"alice -> james shortest path: {path_result['path']}")
For a complete comparison of when to use each algorithm, see the Graph Algorithms Reference.
Next Steps
- Graph Algorithms Reference — Complete parameter documentation for every algorithm
- Graph Algorithms (SQL) — Run the same algorithms directly from SQL
- Use Graph Algorithms — How-to guide with performance tips
- Performance — Complexity and scalability notes per algorithm
Graph Algorithms in SQL
This tutorial shows how to run every graph algorithm directly from the SQLite CLI and integrate the results with regular SQL. The domain is a citation network of computer science papers.
What You Will Learn
- Build a citation network in SQL
- Load the graph algorithm cache
- Run all 15+ algorithms via cypher() calls
- Extract structured results with json_each() and json_extract()
- Create SQL views over algorithm results
- Join algorithm output with regular SQL tables
- Cache algorithm results for repeated queries
Prerequisites
- SQLite 3.x CLI
- GraphQLite extension built or extracted — see Getting Started (SQL) for setup
Step 1: Build the Citation Network
Save this block as citation_setup.sql:
.load build/graphqlite
.mode column
.headers on
-- Papers (using descriptive IDs as user-facing identifiers)
SELECT cypher('CREATE (p:Paper {title: "Deep Residual Learning", year: 2016, venue: "CVPR", field: "Vision"})');
SELECT cypher('CREATE (p:Paper {title: "Attention Is All You Need", year: 2017, venue: "NeurIPS", field: "NLP"})');
SELECT cypher('CREATE (p:Paper {title: "BERT", year: 2018, venue: "NAACL", field: "NLP"})');
SELECT cypher('CREATE (p:Paper {title: "ImageNet Classification with CNN", year: 2012, venue: "NeurIPS", field: "Vision"})');
SELECT cypher('CREATE (p:Paper {title: "Generative Adversarial Networks", year: 2014, venue: "NeurIPS", field: "Vision"})');
SELECT cypher('CREATE (p:Paper {title: "Word2Vec", year: 2013, venue: "NIPS", field: "NLP"})');
SELECT cypher('CREATE (p:Paper {title: "Graph Convolutional Networks", year: 2017, venue: "ICLR", field: "Graphs"})');
SELECT cypher('CREATE (p:Paper {title: "GraphSAGE", year: 2017, venue: "NeurIPS", field: "Graphs"})');
-- Citations (citing -> cited)
SELECT cypher('MATCH (a:Paper {title: "Deep Residual Learning"}),
(b:Paper {title: "ImageNet Classification with CNN"})
CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Attention Is All You Need"}),
(b:Paper {title: "Word2Vec"})
CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "BERT"}),
(b:Paper {title: "Attention Is All You Need"})
CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "BERT"}),
(b:Paper {title: "Word2Vec"})
CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Graph Convolutional Networks"}),
(b:Paper {title: "Word2Vec"})
CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "GraphSAGE"}),
(b:Paper {title: "Graph Convolutional Networks"})
CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "GraphSAGE"}),
(b:Paper {title: "Word2Vec"})
CREATE (a)-[:CITES]->(b)');
SELECT cypher('MATCH (a:Paper {title: "Generative Adversarial Networks"}),
(b:Paper {title: "ImageNet Classification with CNN"})
CREATE (a)-[:CITES]->(b)');
Run it:
sqlite3 citations.db < citation_setup.sql
Step 2: Load the Graph Cache
All algorithms require the graph to be loaded into memory as a CSR structure. Run this once per session, and again after structural changes (node or edge additions/deletions).
-- Load graph into algorithm cache
SELECT cypher('RETURN gql_load_graph()');
-- [{"gql_load_graph()":1}]
-- Confirm it is loaded
SELECT cypher('RETURN gql_graph_loaded()');
-- [{"gql_graph_loaded()":1}]
Reload after changes:
SELECT cypher('RETURN gql_reload_graph()');
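Under the hood, a CSR structure packs adjacency into two arrays: a prefix-sum offsets array and a flat neighbors array, so each node's neighbours are a contiguous slice. A pure-Python sketch of the layout (illustrative only, not GraphQLite's actual internals):

```python
# Build a CSR (offsets + neighbors) layout from an edge list.
edges = [(0, 1), (0, 2), (1, 2), (3, 0)]  # (source, target) pairs
n = 4

# Count out-degree per node, then prefix-sum into offsets.
offsets = [0] * (n + 1)
for src, _ in edges:
    offsets[src + 1] += 1
for i in range(n):
    offsets[i + 1] += offsets[i]

# Fill the flat neighbors array using a moving cursor per node.
neighbors = [0] * len(edges)
cursor = offsets[:-1].copy()
for src, dst in sorted(edges):
    neighbors[cursor[src]] = dst
    cursor[src] += 1

# Neighbors of node 0 are the slice offsets[0]:offsets[1].
print(neighbors[offsets[0]:offsets[1]])  # [1, 2]
```

This layout is why a reload is needed after structural changes: the packed arrays are rebuilt from scratch rather than updated in place.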
Step 3: Centrality Algorithms
PageRank
PageRank ranks papers by the importance of papers that cite them. A paper cited by many well-cited papers scores high.
-- Raw result
SELECT cypher('RETURN pageRank(0.85, 20)');
Extract as a table:
SELECT
json_extract(value, '$.user_id') AS paper,
printf('%.4f', json_extract(value, '$.score')) AS pagerank_score,
json_extract(value, '$.node_id') AS internal_id
FROM json_each(cypher('RETURN pageRank(0.85, 20)'))
ORDER BY json_extract(value, '$.score') DESC;
Output:
paper pagerank_score internal_id
-------------------------------- -------------- -----------
Word2Vec 0.2104 6
ImageNet Classification with CNN 0.1837 4
Attention Is All You Need 0.1512 2
Graph Convolutional Networks 0.1244 7
...
Word2Vec and ImageNet score highest — they are cited by multiple downstream papers.
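To see why, here is a minimal power-iteration PageRank over the same citation edges (short titles for brevity; an illustrative sketch, not GraphQLite's implementation):

```python
# Tiny power-iteration PageRank over the tutorial's citation edges.
edges = [
    ("ResNet", "ImageNet"), ("Transformer", "Word2Vec"),
    ("BERT", "Transformer"), ("BERT", "Word2Vec"),
    ("GCN", "Word2Vec"), ("GraphSAGE", "GCN"),
    ("GraphSAGE", "Word2Vec"), ("GAN", "ImageNet"),
]
nodes = sorted({n for e in edges for n in e})
out = {n: [d for s, d in edges if s == n] for n in nodes}
d, score = 0.85, {n: 1 / len(nodes) for n in nodes}

for _ in range(20):
    nxt = {n: (1 - d) / len(nodes) for n in nodes}
    for n in nodes:
        if out[n]:  # distribute rank along outgoing citations
            for m in out[n]:
                nxt[m] += d * score[n] / len(out[n])
        else:       # dangling node: spread rank uniformly
            for m in nodes:
                nxt[m] += d * score[n] / len(nodes)
    score = nxt

top = max(score, key=score.get)
print(top)  # Word2Vec
```

Word2Vec wins because it collects rank from four citing papers, one of which (the Transformer paper) is itself well cited.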
Degree Centrality
Counts how many papers cite each paper (in-degree) and how many papers each paper cites (out-degree).
SELECT
json_extract(value, '$.user_id') AS paper,
json_extract(value, '$.in_degree') AS cited_by,
json_extract(value, '$.out_degree') AS cites,
json_extract(value, '$.degree') AS total
FROM json_each(cypher('RETURN degreeCentrality()'))
ORDER BY json_extract(value, '$.in_degree') DESC;
Output:
paper cited_by cites total
-------------------------------- -------- ----- -----
Word2Vec                         4        0     4
ImageNet Classification with CNN 2 0 2
Attention Is All You Need 1 1 2
...
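Degree centrality is just edge counting, which is easy to verify by hand. A quick pure-Python check over the same edge list (short titles; the four CITES edges pointing at Word2Vec come straight from Step 1):

```python
from collections import Counter

# In-degree = times cited, out-degree = papers cited.
edges = [
    ("ResNet", "ImageNet"), ("Transformer", "Word2Vec"),
    ("BERT", "Transformer"), ("BERT", "Word2Vec"),
    ("GCN", "Word2Vec"), ("GraphSAGE", "GCN"),
    ("GraphSAGE", "Word2Vec"), ("GAN", "ImageNet"),
]
cited_by = Counter(dst for _, dst in edges)  # in-degree
cites = Counter(src for src, _ in edges)     # out-degree

print(cited_by["Word2Vec"], cites["Word2Vec"])  # 4 0
print(cited_by["ImageNet"], cites["ImageNet"])  # 2 0
```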
Betweenness Centrality
Measures how often a paper lies on the shortest path between two other papers — a proxy for "gateway" or "bridge" papers.
SELECT
json_extract(value, '$.user_id') AS paper,
printf('%.4f', json_extract(value, '$.score')) AS betweenness
FROM json_each(cypher('RETURN betweennessCentrality()'))
ORDER BY json_extract(value, '$.score') DESC;
Closeness Centrality
SELECT
json_extract(value, '$.user_id') AS paper,
printf('%.4f', json_extract(value, '$.score')) AS closeness
FROM json_each(cypher('RETURN closenessCentrality()'))
ORDER BY json_extract(value, '$.score') DESC;
Eigenvector Centrality
SELECT
json_extract(value, '$.user_id') AS paper,
printf('%.4f', json_extract(value, '$.score')) AS eigenvector
FROM json_each(cypher('RETURN eigenvectorCentrality(100)'))
ORDER BY json_extract(value, '$.score') DESC;
Step 4: Community Detection
Label Propagation
Detects clusters by iteratively propagating labels through the network.
-- Raw
SELECT cypher('RETURN labelPropagation(10)');
Group by community:
SELECT
json_extract(value, '$.community') AS community,
json_extract(value, '$.user_id') AS paper
FROM json_each(cypher('RETURN labelPropagation(10)'))
ORDER BY json_extract(value, '$.community'), paper;
Using group_concat to show communities on one line each:
WITH community_data AS (
SELECT
json_extract(value, '$.community') AS community,
json_extract(value, '$.user_id') AS paper
FROM json_each(cypher('RETURN labelPropagation(10)'))
)
SELECT
community,
group_concat(paper, ', ') AS papers,
count(*) AS size
FROM community_data
GROUP BY community
ORDER BY size DESC;
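The mechanics of label propagation fit in a few lines. A deterministic sketch on a tiny undirected graph (GraphQLite's variant may use a different update order and tie-breaking, so community IDs can differ run to run):

```python
# Label propagation: each node repeatedly adopts the most common
# label among its neighbours until labels stabilise.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("x", "y")]
nodes = sorted({n for e in edges for n in e})
nbrs = {n: set() for n in nodes}
for u, v in edges:
    nbrs[u].add(v)
    nbrs[v].add(u)

label = {n: n for n in nodes}  # every node starts in its own community
for _ in range(10):
    for n in nodes:  # in-place updates avoid two-node oscillation
        counts = {}
        for m in nbrs[n]:
            counts[label[m]] = counts.get(label[m], 0) + 1
        # alphabetical tie-break keeps the run deterministic
        label[n] = max(sorted(counts), key=counts.get)

print(label)  # {'a': 'b', 'b': 'b', 'c': 'b', 'x': 'y', 'y': 'y'}
```

The triangle a-b-c converges to one label and the isolated pair x-y to another, which is exactly the clustering behaviour the SQL queries above surface.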
Louvain
Higher-quality community detection using modularity optimisation.
WITH louvain_data AS (
SELECT
json_extract(value, '$.community') AS community,
json_extract(value, '$.user_id') AS paper
FROM json_each(cypher('RETURN louvain(1.0)'))
)
SELECT
community,
group_concat(paper, ', ') AS papers,
count(*) AS size
FROM louvain_data
GROUP BY community
ORDER BY size DESC;
Try different resolutions to tune granularity:
-- More communities (finer granularity)
SELECT cypher('RETURN louvain(2.0)');
-- Fewer communities (coarser granularity)
SELECT cypher('RETURN louvain(0.5)');
Step 5: Connected Components
Weakly Connected Components (WCC)
Finds groups of nodes reachable from one another, ignoring edge direction.
SELECT
json_extract(value, '$.component') AS component,
json_extract(value, '$.user_id') AS paper
FROM json_each(cypher('RETURN wcc()'))
ORDER BY json_extract(value, '$.component'), paper;
Count the number of components and their sizes:
WITH wcc_data AS (
SELECT
json_extract(value, '$.component') AS component,
json_extract(value, '$.user_id') AS paper
FROM json_each(cypher('RETURN wcc()'))
)
SELECT
component,
count(*) AS size,
group_concat(paper, ', ') AS members
FROM wcc_data
GROUP BY component
ORDER BY size DESC;
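WCC is typically computed with union-find. A pure-Python sketch over the citation edges (short titles; illustrative, not GraphQLite's code) shows that this toy network actually splits into two weak components: the vision papers and the NLP/graph papers:

```python
# Union-find WCC over the citation edges, direction ignored.
edges = [
    ("ResNet", "ImageNet"), ("Transformer", "Word2Vec"),
    ("BERT", "Transformer"), ("BERT", "Word2Vec"),
    ("GCN", "Word2Vec"), ("GraphSAGE", "GCN"),
    ("GraphSAGE", "Word2Vec"), ("GAN", "ImageNet"),
]
parent = {n: n for e in edges for n in e}

def find(n):
    while parent[n] != n:              # walk to the root,
        parent[n] = parent[parent[n]]  # halving the path as we go
        n = parent[n]
    return n

for u, v in edges:  # union the two endpoints of every edge
    parent[find(u)] = find(v)

components = {}
for n in parent:
    components.setdefault(find(n), []).append(n)
print(len(components))  # 2
```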
Strongly Connected Components (SCC)
WITH scc_data AS (
SELECT
json_extract(value, '$.component') AS component,
json_extract(value, '$.user_id') AS paper
FROM json_each(cypher('RETURN scc()'))
)
SELECT component, count(*) AS size, group_concat(paper, ', ') AS members
FROM scc_data
GROUP BY component
ORDER BY size DESC;
In an acyclic citation graph, every node is its own SCC: a paper can never be reached from any paper it cites, directly or through a chain of citations. If you add mutual edges, multi-node SCCs appear.
Step 6: Path Finding
Dijkstra (Shortest Path)
Find the shortest citation path between two papers.
SELECT cypher('RETURN dijkstra("BERT", "Word2Vec")');
Output:
[{"dijkstra(\"BERT\", \"Word2Vec\")":
"{\"found\":true,\"distance\":1,\"path\":[\"BERT\",\"Word2Vec\"]}"}]
Extract fields:
WITH path_result AS (
SELECT json_extract(value, '$.dijkstra("BERT", "Word2Vec")') AS raw
FROM json_each(cypher('RETURN dijkstra("BERT", "Word2Vec")'))
)
SELECT
json_extract(raw, '$.found') AS found,
json_extract(raw, '$.distance') AS hops,
json_extract(raw, '$.path') AS path
FROM path_result;
In this small network every reachable pair happens to be one citation apart; on larger graphs the path array grows accordingly.
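On an unweighted graph, Dijkstra reduces to breadth-first search. A pure-Python sketch over the directed citation edges (short titles; illustrative, not GraphQLite's implementation):

```python
from collections import deque

# BFS shortest path over the unweighted, directed citation edges.
edges = [
    ("ResNet", "ImageNet"), ("Transformer", "Word2Vec"),
    ("BERT", "Transformer"), ("BERT", "Word2Vec"),
    ("GCN", "Word2Vec"), ("GraphSAGE", "GCN"),
    ("GraphSAGE", "Word2Vec"), ("GAN", "ImageNet"),
]
adj = {}
for s, t in edges:
    adj.setdefault(s, []).append(t)

def shortest_path(src, dst):
    prev, queue = {src: None}, deque([src])
    while queue:
        n = queue.popleft()
        if n == dst:  # reconstruct the path by walking predecessors
            path = []
            while n is not None:
                path.append(n)
                n = prev[n]
            return path[::-1]
        for m in adj.get(n, []):
            if m not in prev:
                prev[m] = n
                queue.append(m)
    return None  # dst unreachable from src

print(shortest_path("GraphSAGE", "Word2Vec"))  # ['GraphSAGE', 'Word2Vec']
```

Note that direction matters: a paper with no outgoing citations (like Word2Vec here) cannot reach anything, so the reverse query returns no path.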
A* Search
SELECT cypher('RETURN astar("BERT", "Word2Vec")');
With geographic properties for the heuristic:
SELECT cypher('RETURN astar("node_a", "node_b", "lat", "lon")');
All-Pairs Shortest Paths (APSP)
Computes shortest distances between every pair of nodes. The result set alone is O(n²) in the number of nodes, so use it with caution on large graphs.
-- Compute all pairs
SELECT
json_extract(value, '$.source') AS source,
json_extract(value, '$.target') AS target,
json_extract(value, '$.distance') AS distance
FROM json_each(cypher('RETURN apsp()'))
WHERE json_extract(value, '$.source') <> json_extract(value, '$.target')
ORDER BY json_extract(value, '$.distance') DESC
LIMIT 10;
Step 7: Traversal
BFS
-- BFS from "BERT" up to depth 3
SELECT
json_extract(value, '$.user_id') AS paper,
json_extract(value, '$.depth') AS depth,
json_extract(value, '$.order') AS visit_order
FROM json_each(cypher('RETURN bfs("BERT", 3)'))
ORDER BY json_extract(value, '$.order');
DFS
SELECT
json_extract(value, '$.user_id') AS paper,
json_extract(value, '$.depth') AS depth,
json_extract(value, '$.order') AS visit_order
FROM json_each(cypher('RETURN dfs("BERT", 4)'))
ORDER BY json_extract(value, '$.order');
Step 8: Similarity
Node Similarity (Jaccard)
Two papers are similar if they cite many of the same papers.
SELECT
json_extract(value, '$.node1') AS paper1,
json_extract(value, '$.node2') AS paper2,
printf('%.3f', json_extract(value, '$.similarity')) AS jaccard
FROM json_each(cypher('RETURN nodeSimilarity()'))
ORDER BY json_extract(value, '$.similarity') DESC;
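The Jaccard score is the overlap of two papers' citation sets divided by their union. A quick check with the citation sets from Step 1 (short titles; a sketch of the definition, not GraphQLite's code):

```python
# Jaccard similarity of outgoing citation sets.
cites = {
    "BERT": {"Transformer", "Word2Vec"},
    "GCN": {"Word2Vec"},
    "GraphSAGE": {"GCN", "Word2Vec"},
}

def jaccard(a, b):
    inter = cites[a] & cites[b]   # shared citations
    union = cites[a] | cites[b]   # all citations of either paper
    return len(inter) / len(union)

print(jaccard("BERT", "GCN"))                   # 0.5
print(round(jaccard("BERT", "GraphSAGE"), 3))   # 0.333
```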
KNN
-- Top 3 papers most similar to BERT
SELECT
json_extract(value, '$.neighbor') AS similar_paper,
printf('%.3f', json_extract(value, '$.similarity')) AS similarity,
json_extract(value, '$.rank') AS rank
FROM json_each(cypher('RETURN knn("BERT", 3)'))
ORDER BY json_extract(value, '$.rank');
Triangle Count
SELECT
json_extract(value, '$.user_id') AS paper,
json_extract(value, '$.triangles') AS triangles,
printf('%.3f', json_extract(value, '$.clustering_coefficient')) AS clustering_coeff
FROM json_each(cypher('RETURN triangleCount()'))
ORDER BY json_extract(value, '$.triangles') DESC;
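A triangle is three papers that are pairwise connected once edge direction is ignored. Counting them per node is straightforward; a pure-Python sketch on the citation edges (short titles; illustrative only):

```python
from itertools import combinations

# Per-node triangle count on the undirected citation graph.
edges = [
    ("ResNet", "ImageNet"), ("Transformer", "Word2Vec"),
    ("BERT", "Transformer"), ("BERT", "Word2Vec"),
    ("GCN", "Word2Vec"), ("GraphSAGE", "GCN"),
    ("GraphSAGE", "Word2Vec"), ("GAN", "ImageNet"),
]
nbrs = {}
for u, v in edges:
    nbrs.setdefault(u, set()).add(v)
    nbrs.setdefault(v, set()).add(u)

triangles = {n: 0 for n in nbrs}
for n, ns in nbrs.items():
    for a, b in combinations(sorted(ns), 2):
        if b in nbrs[a]:  # two neighbours that are themselves connected
            triangles[n] += 1

print(triangles["Word2Vec"])  # 2
```

Word2Vec sits in two triangles (BERT/Transformer and GCN/GraphSAGE), which is why it also has the highest clustering signal in this dataset.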
Step 9: Create SQL Views Over Algorithm Results
Views give algorithm output a stable, table-like interface. Note that SQLite views are not materialized: the underlying cypher() call re-runs every time the view is queried. To avoid recomputation, cache results in a real table as shown in Step 11.
-- PageRank view
CREATE VIEW IF NOT EXISTS v_pagerank AS
SELECT
json_extract(value, '$.node_id') AS node_id,
json_extract(value, '$.user_id') AS paper,
json_extract(value, '$.score') AS score
FROM json_each(cypher('RETURN pageRank(0.85, 20)'));
-- Community view
CREATE VIEW IF NOT EXISTS v_communities AS
SELECT
json_extract(value, '$.node_id') AS node_id,
json_extract(value, '$.user_id') AS paper,
json_extract(value, '$.community') AS community
FROM json_each(cypher('RETURN labelPropagation(10)'));
-- Degree view
CREATE VIEW IF NOT EXISTS v_degree AS
SELECT
json_extract(value, '$.user_id') AS paper,
json_extract(value, '$.in_degree') AS in_degree,
json_extract(value, '$.out_degree') AS out_degree,
json_extract(value, '$.degree') AS degree
FROM json_each(cypher('RETURN degreeCentrality()'));
Query the views:
SELECT paper, score FROM v_pagerank ORDER BY score DESC;
SELECT community, count(*) AS size FROM v_communities GROUP BY community;
SELECT p.paper, p.score AS pagerank, d.in_degree AS cited_by
FROM v_pagerank p
JOIN v_degree d ON d.paper = p.paper
ORDER BY p.score DESC;
Step 10: Combine Algorithm Output with Regular SQL Tables
Create a regular metadata table for papers:
CREATE TABLE IF NOT EXISTS paper_metadata (
title TEXT PRIMARY KEY,
authors TEXT,
impact_factor REAL
);
INSERT OR IGNORE INTO paper_metadata VALUES
('Deep Residual Learning', 'He, Zhang, Ren, Sun', 9.8),
('Attention Is All You Need', 'Vaswani et al.', 9.5),
('BERT', 'Devlin et al.', 9.2),
('ImageNet Classification with CNN','Krizhevsky et al.', 9.7),
('Generative Adversarial Networks', 'Goodfellow et al.', 9.6),
('Word2Vec', 'Mikolov et al.', 9.0),
('Graph Convolutional Networks', 'Kipf, Welling', 8.8),
('GraphSAGE', 'Hamilton et al.', 8.5);
Join PageRank with metadata:
WITH rankings AS (
SELECT
json_extract(value, '$.user_id') AS title,
json_extract(value, '$.score') AS pagerank
FROM json_each(cypher('RETURN pageRank(0.85, 20)'))
)
SELECT
r.title,
pm.authors,
printf('%.4f', r.pagerank) AS pagerank_score,
pm.impact_factor,
ROUND(r.pagerank * 10 + pm.impact_factor, 2) AS combined_score
FROM rankings r
JOIN paper_metadata pm ON pm.title = r.title
ORDER BY combined_score DESC;
Add community labels to metadata:
WITH comm AS (
SELECT
json_extract(value, '$.user_id') AS title,
json_extract(value, '$.community') AS community
FROM json_each(cypher('RETURN louvain(1.0)'))
)
SELECT
pm.title,
pm.authors,
c.community AS research_cluster
FROM paper_metadata pm
JOIN comm c ON c.title = pm.title
ORDER BY c.community, pm.title;
Step 11: Cache Algorithm Results
For expensive algorithms on large graphs, cache results in a real table:
-- Cache PageRank results
DROP TABLE IF EXISTS pagerank_cache;
CREATE TABLE pagerank_cache AS
SELECT
json_extract(value, '$.node_id') AS node_id,
json_extract(value, '$.user_id') AS paper,
json_extract(value, '$.score') AS score,
datetime('now') AS computed_at
FROM json_each(cypher('RETURN pageRank(0.85, 20)'));
CREATE INDEX IF NOT EXISTS idx_pagerank_score ON pagerank_cache(score DESC);
-- Fast repeated queries
SELECT paper, score FROM pagerank_cache ORDER BY score DESC LIMIT 5;
Export results to CSV for visualisation:
.mode csv
.output pagerank_results.csv
SELECT paper, score FROM pagerank_cache ORDER BY score DESC;
.output stdout
.mode column
Next Steps
- Graph Algorithms Reference — Full parameter documentation for every algorithm
- SQL Interface Reference — cypher() function, json_each() patterns, and schema tables
- Performance — Algorithm complexity and scaling guidance
Building a GraphRAG System
This tutorial builds a complete Graph Retrieval-Augmented Generation (GraphRAG) system by combining GraphQLite with sqlite-vec for vector search and sentence-transformers for text embeddings.
GraphRAG enriches standard vector retrieval with graph traversal — finding not just semantically similar passages, but also the entities they mention and the graph neighbourhood around those entities. The result is richer, more contextual input for language models.
Note: This tutorial requires additional dependencies beyond GraphQLite itself. The complete working example with HotpotQA multi-hop reasoning is in
examples/llm-graphrag/.
What is GraphRAG?
Traditional RAG:
- Embed the query
- Find the most similar document chunks by vector distance
- Send those chunks to the LLM
GraphRAG adds:
4. Extract entities mentioned in the retrieved chunks
5. Traverse the knowledge graph to find related entities
6. Include the graph neighbourhood as additional context
This matters when answers require connecting information across multiple documents — "Who co-authored papers with the researcher who invented Word2Vec?" requires knowing both the entity graph (researchers, papers) and the text content of specific chunks.
User query: "What work influenced the Transformer architecture?"
│
▼
┌──────────────────┐
│ Vector Search │ Find chunks similar to query
└────────┬─────────┘
│ chunk_ids: ["doc1_chunk_3", "doc2_chunk_1"]
▼
┌──────────────────┐
│ Entity Lookup │ MATCH (chunk)-[:MENTIONS]->(entity)
└────────┬─────────┘
│ entities: ["Word2Vec", "RNN", "attention mechanism"]
▼
┌──────────────────┐
│ Graph Expand │ MATCH (entity)-[*1..2]-(related)
└────────┬─────────┘
│ related: ["LSTM", "Bahdanau", "Seq2Seq", ...]
▼
LLM context
Prerequisites
pip install graphqlite sentence-transformers sqlite-vec spacy
python -m spacy download en_core_web_sm
Verify:
import graphqlite, sqlite_vec, spacy
from sentence_transformers import SentenceTransformer
print("All dependencies available")
Step 1: Document Ingestion Helpers
Define the data structures and chunking logic:
from __future__ import annotations
from dataclasses import dataclass
from typing import List
@dataclass
class Chunk:
chunk_id: str
doc_id: str
text: str
def chunk_text(text: str, doc_id: str, chunk_size: int = 200, overlap: int = 40) -> List[Chunk]:
"""Split text into overlapping word-based chunks."""
words = text.split()
chunks = []
start = 0
index = 0
while start < len(words):
end = min(start + chunk_size, len(words))
chunk_words = words[start:end]
chunks.append(Chunk(
chunk_id=f"{doc_id}_chunk_{index}",
doc_id=doc_id,
text=" ".join(chunk_words),
))
start += chunk_size - overlap
index += 1
return chunks
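Because each chunk starts chunk_size - overlap words after the previous one, the number of chunks is easy to predict. A quick check of the stride arithmetic (mirroring chunk_text's loop, with an assumed 500-word document):

```python
# chunk_text advances by chunk_size - overlap words per chunk,
# so a 500-word document with size 200 / overlap 40 yields 4 chunks.
chunk_size, overlap = 200, 40
n_words = 500
starts = list(range(0, n_words, chunk_size - overlap))
print(starts)       # [0, 160, 320, 480]
print(len(starts))  # 4
```

The 40-word overlap ensures sentences spanning a chunk boundary appear in full in at least one chunk.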
Step 2: Entity Extraction
Use spaCy to extract named entities and create co-occurrence relationships:
import spacy
nlp = spacy.load("en_core_web_sm")
def extract_entities(text: str) -> List[dict]:
"""Return named entities with their type."""
doc = nlp(text)
seen = set()
entities = []
for ent in doc.ents:
key = (ent.text.strip(), ent.label_)
if key not in seen:
seen.add(key)
entities.append({"text": ent.text.strip(), "label": ent.label_})
return entities
def entity_node_id(name: str) -> str:
"""Normalise entity name to a stable node ID."""
return "ent_" + name.lower().replace(" ", "_").replace(".", "").replace(",", "")
Step 3: Build the Knowledge Graph
import sqlite3
import sqlite_vec
from sentence_transformers import SentenceTransformer
from graphqlite import Graph
EMBEDDING_DIM = 384
model = SentenceTransformer("all-MiniLM-L6-v2")
def setup(db_path: str):
"""Initialise GraphQLite and the vector table."""
g = Graph(db_path)
conn = sqlite3.connect(db_path)
sqlite_vec.load(conn)
conn.execute(f"""
CREATE VIRTUAL TABLE IF NOT EXISTS chunk_embeddings USING vec0(
chunk_id TEXT PRIMARY KEY,
embedding FLOAT[{EMBEDDING_DIM}]
)
""")
conn.commit()
return g, conn
def ingest_document(g: Graph, conn: sqlite3.Connection, doc_id: str, text: str):
"""
Chunk a document, extract entities, store graph nodes and edges,
and compute embeddings.
"""
chunks = chunk_text(text, doc_id=doc_id)
# Embed all chunks in one batch
texts = [c.text for c in chunks]
embeddings = model.encode(texts, show_progress_bar=False)
for chunk, embedding in zip(chunks, embeddings):
# Store chunk as a graph node (truncated text for storage efficiency)
g.upsert_node(
chunk.chunk_id,
{"doc_id": doc_id, "text": chunk.text[:1000]},
label="Chunk"
)
# Store embedding
conn.execute(
"INSERT OR REPLACE INTO chunk_embeddings (chunk_id, embedding) VALUES (?, ?)",
[chunk.chunk_id, embedding.tobytes()]
)
# Extract entities from this chunk
entities = extract_entities(chunk.text)
for ent in entities:
ent_id = entity_node_id(ent["text"])
g.upsert_node(ent_id, {"name": ent["text"], "type": ent["label"]}, label="Entity")
g.upsert_edge(chunk.chunk_id, ent_id, {}, rel_type="MENTIONS")
# Entity co-occurrence: connect entities that appear in the same chunk
for i, ent_a in enumerate(entities):
for ent_b in entities[i + 1:]:
id_a = entity_node_id(ent_a["text"])
id_b = entity_node_id(ent_b["text"])
g.upsert_edge(id_a, id_b, {"source_chunk": chunk.chunk_id}, rel_type="CO_OCCURS")
conn.commit()
print(f"Ingested {doc_id}: {len(chunks)} chunks, graph now has {g.stats()['nodes']} nodes")
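The co-occurrence loop above links every unordered pair of entities found in a chunk, which is exactly what itertools.combinations produces. A small illustration (hypothetical entity names):

```python
from itertools import combinations

# Every unordered pair of co-mentioned entities becomes a CO_OCCURS edge:
# n entities in a chunk yield n*(n-1)/2 pairs.
entities = ["Transformer", "Google Brain", "WMT 2014"]
pairs = list(combinations(entities, 2))
print(len(pairs))  # 3
```

This quadratic growth is worth keeping in mind: a chunk mentioning 20 entities produces 190 edges, so aggressive entity filtering keeps the graph manageable.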
Step 4: Vector Search
def vector_search(conn: sqlite3.Connection, query: str, k: int = 5) -> List[str]:
"""Return the k most semantically similar chunk IDs."""
query_embedding = model.encode([query])[0]
rows = conn.execute(
"""
SELECT chunk_id
FROM chunk_embeddings
WHERE embedding MATCH ?
AND k = ?
""",
[query_embedding.tobytes(), k]
).fetchall()
return [row[0] for row in rows]
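Conceptually, the vec0 MATCH query ranks every stored vector by distance to the query embedding and keeps the k closest. A pure-Python sketch of that brute-force search (toy 2-dimensional vectors; sqlite-vec handles the real 384-dimensional case):

```python
import math

# Brute-force k-nearest-neighbour search over stored embeddings.
store = {
    "doc1_chunk_0": [0.1, 0.9],
    "doc1_chunk_1": [0.8, 0.2],
    "doc2_chunk_0": [0.2, 0.8],
}
query = [0.0, 1.0]

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Sort chunk IDs by distance to the query, keep the closest two.
top2 = sorted(store, key=lambda cid: l2(store[cid], query))[:2]
print(top2)  # ['doc1_chunk_0', 'doc2_chunk_0']
```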
Step 5: GraphRAG Retrieval
def graphrag_retrieve(
g: Graph,
conn: sqlite3.Connection,
query: str,
k_chunks: int = 5,
expand_hops: int = 2,
) -> dict:
"""
Hybrid retrieval:
1. Vector search for semantically similar chunks
2. Find entities mentioned in those chunks (graph lookup)
3. Expand to related entities via graph traversal
4. Return chunks + entity context
"""
# 1. Vector search
chunk_ids = vector_search(conn, query, k=k_chunks)
if not chunk_ids:
return {"chunks": [], "entities": [], "related_entities": [], "graph_paths": []}
# 2. Entity lookup via graph — parameterized query
entities: set[str] = set()
for chunk_id in chunk_ids:
rows = g.connection.cypher(
"MATCH (c:Chunk {id: $chunk_id})-[:MENTIONS]->(e:Entity) RETURN e.name AS name",
{"chunk_id": chunk_id}
)
for row in rows:
entities.add(row["name"])
# 3. Graph expansion — find related entities
related_entities: set[str] = set()
for entity_name in entities:
ent_id = entity_node_id(entity_name)
rows = g.connection.cypher(
"""
MATCH (e:Entity {id: $ent_id})-[*1..$hops]-(related:Entity)
WHERE related.id <> $ent_id
RETURN DISTINCT related.name AS name
""",
{"ent_id": ent_id, "hops": expand_hops}
)
for row in rows:
related_entities.add(row["name"])
# 4. Retrieve chunk texts
chunk_texts = []
for chunk_id in chunk_ids:
node = g.get_node(chunk_id)
if node:
chunk_texts.append({
"chunk_id": chunk_id,
"doc_id": node["properties"].get("doc_id", ""),
"text": node["properties"].get("text", ""),
})
return {
"chunks": chunk_texts,
"entities": sorted(entities),
"related_entities": sorted(related_entities - entities),
}
Step 6: Graph Algorithms for Retrieval Enhancement
Graph algorithms improve retrieval quality in several ways.
PageRank for entity importance
Entities that are heavily co-mentioned across documents are likely central to the corpus topics. Use PageRank to weight entity importance during retrieval.
def get_important_entities(g: Graph, top_k: int = 20) -> List[str]:
"""Return top-k entity node IDs by PageRank."""
g.connection.cypher("RETURN gql_load_graph()")
results = g.pagerank(damping=0.85, iterations=20)
important = []
for r in sorted(results, key=lambda x: x["score"], reverse=True):
node = g.get_node(r["user_id"])
if node and node["label"] == "Entity":
important.append(r["user_id"])
if len(important) >= top_k:
break
return important
def graphrag_retrieve_ranked(
g: Graph,
conn: sqlite3.Connection,
query: str,
k_chunks: int = 5,
) -> dict:
"""Retrieval that boosts chunks mentioning high-PageRank entities."""
base = graphrag_retrieve(g, conn, query, k_chunks=k_chunks)
important_entities = set(get_important_entities(g, top_k=20))
# Score chunks by how many important entities they mention
scored_chunks = []
for chunk in base["chunks"]:
rows = g.connection.cypher(
"MATCH (c:Chunk {id: $chunk_id})-[:MENTIONS]->(e:Entity) RETURN e.id AS eid",
{"chunk_id": chunk["chunk_id"]}
)
ent_ids = {r["eid"] for r in rows}
boost = len(ent_ids & important_entities)
scored_chunks.append({**chunk, "importance_boost": boost})
scored_chunks.sort(key=lambda x: x["importance_boost"], reverse=True)
base["chunks"] = scored_chunks
return base
Community detection for topic-aware retrieval
Detect research clusters to route queries to the right part of the graph:
def get_entity_communities(g: Graph) -> dict[str, int]:
"""Return a mapping of entity node_id -> community ID."""
g.connection.cypher("RETURN gql_load_graph()")
results = g.community_detection(iterations=10)
return {r["user_id"]: r["community"] for r in results}
def retrieve_by_community(g: Graph, entity_name: str) -> List[str]:
"""Find all entity names in the same community as a given entity."""
communities = get_entity_communities(g)
ent_id = entity_node_id(entity_name)
if ent_id not in communities:
return []
target_community = communities[ent_id]
return [
uid for uid, cid in communities.items()
if cid == target_community and uid != ent_id
]
Step 7: Complete Pipeline
# ---- Initialise ----
g, conn = setup("graphrag.db")
# ---- Ingest Documents ----
documents = [
{
"id": "vaswani2017",
"text": (
"Attention Is All You Need. Vaswani et al., Google Brain, 2017. "
"We propose the Transformer, a model architecture eschewing recurrence "
"and instead relying entirely on an attention mechanism to draw global "
"dependencies between input and output. The Transformer allows for "
"significantly more parallelization than recurrent architectures. "
"Our model is trained on the WMT 2014 English-German and English-French "
"translation tasks and achieves state-of-the-art results."
)
},
{
"id": "mikolov2013",
"text": (
"Distributed Representations of Words and Phrases and their Compositionality. "
"Mikolov et al., Google, 2013. "
"We present several extensions of the original Word2Vec model that improve "
"quality of the vectors and training speed. We show that subsampling of "
"frequent words during training results in significant speedup."
)
},
{
"id": "devlin2018",
"text": (
"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. "
"Devlin, Chang, Lee, Toutanova, Google AI Language, 2018. "
"We introduce BERT, a new language representation model. BERT stands for "
"Bidirectional Encoder Representations from Transformers. BERT is designed "
"to pre-train deep bidirectional representations from unlabeled text by "
"jointly conditioning on both left and right context in all layers."
)
},
]
for doc in documents:
ingest_document(g, conn, doc["id"], doc["text"])
print(f"\nFinal graph: {g.stats()}")
# ---- Retrieve ----
query = "What mechanisms replaced recurrence in language models?"
context = graphrag_retrieve(g, conn, query, k_chunks=3, expand_hops=2)
print(f"\nQuery: {query}")
print(f"Retrieved {len(context['chunks'])} chunks")
print(f"Direct entities: {context['entities']}")
print(f"Related entities: {context['related_entities']}")
print("\nChunk texts:")
for chunk in context["chunks"]:
print(f" [{chunk['doc_id']}] {chunk['text'][:120]}...")
# ---- Build prompt ----
def build_prompt(query: str, context: dict) -> str:
chunk_text = "\n\n".join(
f"[{c['doc_id']}] {c['text']}" for c in context["chunks"]
)
entity_context = ", ".join(context["entities"])
related_context = ", ".join(context["related_entities"])
return f"""Answer the following question using only the provided context.
Question: {query}
Relevant text passages:
{chunk_text}
Key entities mentioned: {entity_context}
Related entities from knowledge graph: {related_context}
Answer:"""
prompt = build_prompt(query, context)
print("\n--- Prompt ---")
print(prompt[:600], "...")
Step 8: Working with the Example Project
The examples/llm-graphrag/ directory contains a complete production-grade implementation:
- Ingests the HotpotQA multi-hop reasoning dataset
- Uses Ollama for local LLM inference (no API keys required)
- Demonstrates multi-hop question answering where the answer requires connecting information across multiple documents
cd examples/llm-graphrag
# Install dependencies (uses uv)
uv sync
# Ingest the HotpotQA dataset
uv run python ingest.py
# Interactive query mode
uv run python rag.py
The ingest script builds a graph of ~10,000 Wikipedia article chunks with entity co-occurrence relationships. The rag script accepts questions and returns answers with citations.
For a deeper discussion of when to use GraphRAG vs plain RAG, graph schema design trade-offs, and embedding model selection, see the Architecture explanation.
Next Steps
- Graph Analytics — Deep dive into PageRank, community detection, and other algorithms used here
- Graph Algorithms Reference — Full algorithm parameter documentation
- Python API Reference — Graph, Connection, and GraphManager API
- Use with Other Extensions — Loading sqlite-vec alongside GraphQLite
Installation
GraphQLite is a SQLite extension that adds Cypher graph query support. It can be used from Python, Rust, the command line, or any language that can load SQLite extensions.
Python (Recommended)
Install from PyPI:
pip install graphqlite
This installs pre-built binaries for:
- macOS (arm64, x86_64)
- Linux (x86_64, aarch64)
- Windows (x86_64)
Optional Dependencies
The Leiden community detection algorithm requires graspologic:
pip install graphqlite[leiden]
For rustworkx-powered graph export:
pip install graphqlite[rustworkx]
Install everything:
pip install graphqlite[leiden,rustworkx]
Rust
Add to your Cargo.toml:
[dependencies]
graphqlite = "0.3"
The bundled-extension feature is enabled by default. It bundles the compiled extension directly into your binary so no separate .so/.dylib file is needed at runtime:
[dependencies]
graphqlite = { version = "0.3", features = ["bundled-extension"] }
To use an external extension at runtime instead, disable the default features:
[dependencies]
graphqlite = { version = "0.3", default-features = false }
From Source
Building from source produces the raw extension file you can load into any SQLite environment.
Prerequisites
| Tool | Minimum Version | Purpose |
|---|---|---|
| GCC or Clang | 9+ | C compiler |
| Bison | 3.0+ | Parser generator |
| Flex | 2.6+ | Lexer generator |
| SQLite development headers | 3.35+ | SQLite API |
| CUnit | 2.1+ | Unit tests (optional) |
macOS
brew install bison flex sqlite cunit
export PATH="$(brew --prefix bison)/bin:$PATH"
git clone https://github.com/colliery-io/graphqlite
cd graphqlite
make extension
Homebrew installs a newer Bison than the one shipped with macOS. The PATH export ensures the build system picks up the correct version.
Linux (Debian/Ubuntu)
sudo apt-get install build-essential bison flex libsqlite3-dev libcunit1-dev
git clone https://github.com/colliery-io/graphqlite
cd graphqlite
make extension
Windows (MSYS2)
pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-sqlite3 bison flex make
git clone https://github.com/colliery-io/graphqlite
cd graphqlite
make extension
Debug vs. Release Build
The default build includes debug symbols and assertions. For a release build:
make extension RELEASE=1
Release builds enable -O2 optimization and strip assertions. Use them for production deployments.
Output Files
After building, the extension appears in build/:
| Platform | File |
|---|---|
| macOS | build/graphqlite.dylib |
| Linux | build/graphqlite.so |
| Windows | build/graphqlite.dll |
See Building from Source for the full list of build targets and how to run tests.
Verifying Installation
Python
import graphqlite
# Check version
print(graphqlite.__version__)
# Quick smoke test
from graphqlite import Graph
g = Graph(":memory:")
g.upsert_node("alice", {"name": "Alice", "age": 30}, label="Person")
g.upsert_node("bob", {"name": "Bob", "age": 25}, label="Person")
g.upsert_edge("alice", "bob", {"since": 2020}, rel_type="KNOWS")
print(g.stats()) # {'nodes': 2, 'edges': 1}
print(g.query("MATCH (n:Person) RETURN n.name ORDER BY n.name"))
Rust
use graphqlite::Connection;

fn main() -> graphqlite::Result<()> {
    let conn = Connection::open_in_memory()?;
    conn.cypher("CREATE (n:Person {name: 'Alice'})")?;
    let rows = conn.cypher("MATCH (n:Person) RETURN n.name")?;
    for row in &rows {
        println!("{}", row.get::<String>("n.name")?);
    }
    Ok(())
}
SQL (raw SQLite)
sqlite3
.load /path/to/build/graphqlite
SELECT cypher('RETURN 1 + 1 AS result');
SELECT cypher('CREATE (n:Test {value: 42})');
SELECT cypher('MATCH (n:Test) RETURN n.value');
Extension Path Configuration
When GraphQLite cannot find the extension automatically, specify the path explicitly.
Environment Variable
Set GRAPHQLITE_EXTENSION_PATH before running your application:
export GRAPHQLITE_EXTENSION_PATH=/path/to/build/graphqlite.dylib
python my_app.py
This works for Python, the CLI (gqlite), and any program that reads environment variables before loading the extension.
Python: Explicit Path
from graphqlite import connect, Graph, GraphManager
# connect()
conn = connect("graph.db", extension_path="/path/to/graphqlite.dylib")
# Graph()
g = Graph("graph.db", extension_path="/path/to/graphqlite.dylib")
# GraphManager / graphs()
from graphqlite import graphs
with graphs("./data", extension_path="/path/to/graphqlite.dylib") as gm:
social = gm.open_or_create("social")
Finding the Bundled Path
When using the Python package, GraphQLite ships a bundled extension. Get its path:
import graphqlite
path = graphqlite.loadable_path()
print(path) # e.g. /usr/local/lib/python3.12/site-packages/graphqlite/graphqlite.dylib
You can pass this path to other tools or languages that need to load the extension manually.
Platform Notes
macOS
- The extension is a Mach-O shared library with the .dylib suffix.
- If you see Library not loaded errors, ensure you are using Homebrew's SQLite rather than the system SQLite: export DYLD_LIBRARY_PATH="$(brew --prefix sqlite)/lib:$DYLD_LIBRARY_PATH"
- Apple Silicon (M1/M2/M3) and Intel builds are separate when building from source; the PyPI wheel is universal and contains both.
Linux
- The extension is an ELF shared object with the .so suffix.
- Make sure libsqlite3 is installed at runtime (not just libsqlite3-dev).
- On musl-based systems (Alpine Linux), build from source.
Windows
- The extension is a DLL with the .dll suffix.
- Load it with the full path: .load C:/path/to/graphqlite
- MSYS2 or WSL are the supported build environments.
Troubleshooting
Extension not found (Python)
FileNotFoundError: GraphQLite extension not found
Either the package was installed without wheels for your platform, or the environment variable points to the wrong file. Try:
import graphqlite
print(graphqlite.loadable_path()) # See where Python looks
Then set GRAPHQLITE_EXTENSION_PATH or pass extension_path explicitly.
Python sqlite3 extension support disabled
Some Python distributions compile sqlite3 without extension loading support (e.g., certain Docker images):
AttributeError: 'sqlite3.Connection' object has no attribute 'enable_load_extension'
Rebuild Python from source with --enable-loadable-sqlite-extensions, or switch to a distribution that includes it (standard CPython builds from python.org do support it).
Version mismatch
If you see unexpected query errors after upgrading, ensure the Python package and the loaded extension are from the same version:
import graphqlite
print(graphqlite.__version__) # Python package version
When building from source for use with the Rust crate, run make install-bundled after make extension to replace the bundled binary in the Rust crate's source tree.
SQLite version too old
GraphQLite requires SQLite 3.35 or later for JSON function support. Check your SQLite version:
import sqlite3
print(sqlite3.sqlite_version) # Should be 3.35.0 or higher
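A small startup guard makes this check automatic. This is an illustrative helper (the function name and threshold constant are ours, not part of the GraphQLite API); it uses sqlite3.sqlite_version_info, which is already a comparable tuple:

```python
import sqlite3

MIN_SQLITE = (3, 35, 0)  # required by GraphQLite for JSON function support

def check_sqlite_version(min_version=MIN_SQLITE):
    """Return True if the linked SQLite library is new enough for GraphQLite."""
    return sqlite3.sqlite_version_info >= min_version

if not check_sqlite_version():
    # In a real application you would likely raise instead of printing.
    print(f"SQLite {sqlite3.sqlite_version} is too old for GraphQLite")
```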
Working with Multiple Graphs
GraphQLite supports managing and querying across multiple graph databases. This is useful for:
- Separation of concerns: Keep different data domains in separate graphs.
- Access control: Different graphs can have different file-level permissions.
- Performance: Smaller, focused graphs are faster to query and analyze.
- Cross-domain queries: Join relationships that span different datasets.
Python: GraphManager
The GraphManager class (accessed via the graphs() factory function) manages multiple graph databases stored as separate SQLite files in a directory.
Creating and Opening Graphs
from graphqlite import graphs
with graphs("./data") as gm:
# Create new graphs (raises FileExistsError if already exists)
social = gm.create("social")
products = gm.create("products")
# Populate them
social.upsert_node("alice", {"name": "Alice", "age": 30, "user_id": "u1"}, "Person")
social.upsert_node("bob", {"name": "Bob", "age": 25, "user_id": "u2"}, "Person")
social.upsert_edge("alice", "bob", {"since": 2020}, "KNOWS")
products.upsert_node("phone", {"name": "iPhone 15", "price": 999}, "Product")
products.upsert_node("laptop", {"name": "MacBook Pro", "price": 1999}, "Product")
print(gm.list()) # ['products', 'social']
print(len(gm)) # 2
print("social" in gm) # True
All connections are closed automatically when the with block exits.
Opening Existing Graphs
from graphqlite import graphs
with graphs("./data") as gm:
# Open an existing graph (raises FileNotFoundError if missing)
social = gm.open("social")
# Open, creating if it doesn't exist
cache = gm.open_or_create("cache")
for row in social.query("MATCH (n:Person) RETURN n.name ORDER BY n.name"):
print(row["n.name"])
Listing and Dropping Graphs
from graphqlite import graphs
with graphs("./data") as gm:
# List all graph names in the directory
for name in gm.list():
g = gm.open(name)
print(f"{name}: {g.stats()}")
# Delete a graph and its database file permanently
gm.drop("cache")
drop() deletes the .db file on disk. There is no undo.
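Because drop() is irreversible, it can be worth snapshotting the file first. The sketch below assumes each graph is stored as <name>.db inside the managed directory; both the helper and that file-layout assumption are ours, not part of the GraphManager API:

```python
import shutil
from pathlib import Path

def backup_graph(data_dir, name, backup_dir="backups"):
    """Copy <data_dir>/<name>.db into <data_dir>/<backup_dir> before a drop()."""
    src = Path(data_dir) / f"{name}.db"
    dest_dir = Path(data_dir) / backup_dir
    dest_dir.mkdir(exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy2 preserves file timestamps
    return dest

# Hypothetical usage:
#   backup_graph("./data", "cache")
#   gm.drop("cache")
```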
Cross-Graph Queries
GraphQLite can query across multiple graphs in a single Cypher statement using the FROM clause.
The FROM Clause
Attach one or more graphs and reference them by name in MATCH patterns:
from graphqlite import graphs
with graphs("./data") as gm:
social = gm.open_or_create("social")
social.upsert_node("alice", {"name": "Alice", "user_id": "u1"}, "Person")
purchases = gm.open_or_create("purchases")
purchases.upsert_node("order1", {"user_id": "u1", "total": 99.99, "item": "Phone"}, "Order")
# GraphManager commits open graphs before running cross-graph queries
result = gm.query(
"""
MATCH (p:Person) FROM social
WHERE p.user_id = 'u1'
RETURN p.name, graph(p) AS source
""",
graphs=["social"]
)
for row in result:
print(f"{row['p.name']} is from graph: {row['source']}")
The graphs parameter tells gm.query() which databases to attach before running the query.
The graph() Function
graph(node) returns the name of the graph that the node comes from. Use it to identify results in multi-graph queries:
result = gm.query(
"""
MATCH (n) FROM social
RETURN n.name, graph(n) AS source_graph
""",
graphs=["social"]
)
for row in result:
print(f"{row['n.name']} lives in {row['source_graph']}")
Cross-Graph Queries with Parameters
Pass parameters to cross-graph queries the same way as single-graph queries:
result = gm.query(
"MATCH (n:Person {user_id: $uid}) FROM social RETURN n.name",
graphs=["social"],
params={"uid": "u1"}
)
Raw SQL Cross-Graph Queries
For low-level access, query_sql() attaches the named graphs and runs raw SQL. The attached graph's tables are prefixed with the graph name:
# Count nodes in the social graph
result = gm.query_sql(
"SELECT COUNT(*) AS node_count FROM social.nodes",
graphs=["social"]
)
print(f"Social graph has {result[0][0]} nodes")
# Join across graphs with raw SQL
result = gm.query_sql(
"""
SELECT s.user_id, COUNT(p.rowid) AS order_count
FROM social.node_props_text s
JOIN purchases.node_props_text p ON s.value = p.value
WHERE s.key = 'user_id' AND p.key = 'user_id'
GROUP BY s.user_id
""",
graphs=["social", "purchases"]
)
query_sql() is useful for analytics that go beyond what Cypher exposes, such as aggregations across multiple graph schemas at once.
Important: Commit Before Cross-Graph Queries
GraphManager automatically commits all open graph connections before running cross-graph queries with query() or query_sql(). If you are using the underlying connections directly, commit first:
social.connection.execute("COMMIT")
result = gm.query("MATCH (n) FROM social RETURN n", graphs=["social"])
Rust: GraphManager
The Rust API mirrors the Python API closely.
use graphqlite::graphs;

fn main() -> graphqlite::Result<()> {
    let mut gm = graphs("./data")?;

    // Create graphs
    {
        let social = gm.create("social")?;
        social.query("CREATE (n:Person {name: 'Alice', user_id: 'u1'})")?;
        social.query("CREATE (n:Person {name: 'Bob', user_id: 'u2'})")?;
        social.query(
            "MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'}) \
             CREATE (a)-[:KNOWS {since: 2020}]->(b)",
        )?;
    }
    {
        let products = gm.create("products")?;
        products.query("CREATE (n:Product {name: 'Phone', sku: 'p1', price: 999})")?;
    }

    // List all graphs
    for name in gm.list()? {
        println!("Graph: {}", name);
    }

    // Open an existing graph
    let social = gm.open_graph("social")?;
    let stats = social.stats()?;
    println!("Social: {} nodes, {} edges", stats.nodes, stats.edges);

    // Cross-graph query using FROM clause
    let result = gm.query(
        "MATCH (n:Person) FROM social RETURN n.name AS name ORDER BY n.name",
        &["social"],
    )?;
    for row in &result {
        println!("Person: {}", row.get::<String>("name")?);
    }

    // Raw SQL cross-graph query
    let counts = gm.query_sql("SELECT COUNT(*) FROM social.nodes", &["social"])?;

    // Open or create (idempotent)
    let _cache = gm.open_or_create("cache")?;

    // Drop a graph
    gm.drop("products")?;

    // Error handling for missing graphs
    match gm.open_graph("nonexistent") {
        Err(graphqlite::Error::GraphNotFound { name, available }) => {
            println!("'{}' not found. Available: {:?}", name, available);
        }
        _ => {}
    }

    Ok(())
}
Rust GraphManager Methods
| Method | Description |
|---|---|
gm.create("name") | Create a new graph; errors if it already exists |
gm.open_graph("name") | Open an existing graph; errors if missing |
gm.open_or_create("name") | Open or create idempotently |
gm.list()? | Returns Vec<String> of graph names |
gm.exists("name") | Returns bool |
gm.drop("name")? | Delete graph and its file |
gm.query(cypher, graphs)? | Cross-graph Cypher query |
gm.query_sql(sql, graphs)? | Cross-graph raw SQL query |
Using ATTACH Directly
For complete control, attach databases manually using SQLite's ATTACH mechanism:
import sqlite3
import graphqlite
# Build each graph
conn1 = sqlite3.connect("social.db")
graphqlite.load(conn1)
conn1.execute("SELECT cypher('CREATE (n:Person {name: \"Alice\"})')")
conn1.commit()
conn1.close()
conn2 = sqlite3.connect("products.db")
graphqlite.load(conn2)
conn2.execute("SELECT cypher('CREATE (n:Product {name: \"Phone\"})')")
conn2.commit()
conn2.close()
# Query across both
coordinator = sqlite3.connect(":memory:")
graphqlite.load(coordinator)
coordinator.execute("ATTACH DATABASE 'social.db' AS social")
coordinator.execute("ATTACH DATABASE 'products.db' AS products")
result = coordinator.execute(
"SELECT cypher('MATCH (n:Person) FROM social RETURN n.name')"
).fetchall()
print(result)
Best Practices
- Use the context manager. with graphs(...) as gm: ensures all connections are closed and any pending transactions are flushed.
- Commit before cross-graph queries. GraphManager handles this automatically, but manual connections do not.
- Use valid SQL identifiers for graph names. Graph names become SQLite database aliases (ATTACH ... AS name). Use lowercase letters, digits, and underscores only, with no hyphens or spaces.
- Keep graphs focused. Design each graph around a single domain or service boundary. Cross-graph queries are read-only for the attached graphs.
- Use the same extension version. All attached graphs should be queried with the same version of the GraphQLite extension to avoid schema incompatibilities.
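Since graph names end up as ATTACH aliases, validating them early gives clearer errors than a failed ATTACH later. A minimal checker reflecting the naming rules above (the function itself is illustrative; the library may enforce its own validation):

```python
import re

# Lowercase letters, digits, underscores; must not start with a digit.
_VALID_GRAPH_NAME = re.compile(r"[a-z_][a-z0-9_]*\Z")

def is_valid_graph_name(name: str) -> bool:
    """True if name is safe to use as an ATTACH ... AS alias."""
    return bool(_VALID_GRAPH_NAME.match(name))
```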
Limitations
- Cross-graph FROM clause queries are read-only for attached graphs.
- The FROM clause is only supported inside MATCH patterns.
- SQLite limits the number of simultaneously attached databases (the default SQLITE_MAX_ATTACHED is 10).
- Graph names must be valid SQL identifiers (alphanumeric and underscores).
Using the gqlite CLI
gqlite is an interactive Cypher shell for querying GraphQLite databases from the command line. It supports interactive mode, multi-line statements, dot commands, and script execution.
Building
Build the CLI from source:
make graphqlite
For a release build with optimizations:
make graphqlite RELEASE=1
The binary is placed at build/gqlite.
See Building from Source for prerequisites.
Command Line Options
Usage: build/gqlite [OPTIONS] [DATABASE_FILE]
| Option | Description |
|---|---|
-h, --help | Show help message and exit |
-v, --verbose | Enable verbose debug output (shows query execution details) |
-i, --init | Initialize a fresh database (overwrites existing) |
DATABASE_FILE | Path to the SQLite database file (default: graphqlite.db) |
Examples
# Open the default database (graphqlite.db in the current directory)
./build/gqlite
# Open a specific file
./build/gqlite social.db
# Initialize a fresh database, discarding any existing content
./build/gqlite -i social.db
# Verbose mode — shows row counts and execution details
./build/gqlite -v social.db
Interactive Mode
Start gqlite without piping stdin to enter interactive mode:
GraphQLite Interactive Shell
Type .help for help, .quit to exit
Queries must end with semicolon (;)
graphqlite>
Statement Termination
All Cypher statements must end with a semicolon (;). gqlite buffers input across multiple lines until it sees a ;.
graphqlite> CREATE (a:Person {name: "Alice", age: 30});
Query executed successfully
Nodes created: 1
Properties set: 2
graphqlite> MATCH (n:Person) RETURN n.name, n.age;
n.name n.age
----------
Alice 30
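The buffering rule can be mimicked in a few lines of Python, which also shows why an unterminated statement just produces the continuation prompt (illustrative only; this is not gqlite's actual implementation):

```python
def split_statements(lines):
    """Accumulate input lines into semicolon-terminated statements, gqlite-style."""
    buffer = []
    for line in lines:
        buffer.append(line)
        if line.rstrip().endswith(";"):
            yield " ".join(buffer).strip()
            buffer = []
    # Anything left over is an unterminated statement: the shell
    # would keep showing the ...> prompt instead of executing it.

stmts = list(split_statements([
    'MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})',
    "CREATE (a)-[:KNOWS {since: 2021}]->(b);",
]))
# stmts holds one complete two-line statement
```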
Multi-Line Statements
Press Enter to continue a statement on the next line. A ...> prompt indicates continuation:
graphqlite> MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
...> CREATE (a)-[:KNOWS {since: 2021}]->(b);
Query executed successfully
Relationships created: 1
Dot Commands
Dot commands control the shell itself. They do not end with a semicolon.
| Command | Description |
|---|---|
.help | Show all available commands |
.schema | Display the full database schema |
.tables | List all tables in the database |
.stats | Show graph statistics (node count, edge count, labels, types) |
.quit | Exit the shell |
.exit | Alias for .quit |
Example: .stats
graphqlite> .stats
Database Statistics:
===================
Nodes : 3
Edges : 2
Node Labels : 1
Property Keys : 2
Edge Types : KNOWS
Example: .schema
graphqlite> .schema
CREATE TABLE nodes (rowid INTEGER PRIMARY KEY, user_id TEXT UNIQUE, label TEXT);
CREATE TABLE edges (rowid INTEGER PRIMARY KEY, source_id INTEGER, target_id INTEGER, ...);
...
Script Mode
Pipe a file or inline text to gqlite to run a script non-interactively:
# Execute a script file
./build/gqlite social.db < setup.cypher
# Inline heredoc
./build/gqlite social.db <<'EOF'
CREATE (alice:Person {name: "Alice", age: 30});
CREATE (bob:Person {name: "Bob", age: 25});
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CREATE (a)-[:KNOWS {since: 2020}]->(b);
MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name;
EOF
# Inline echo
echo 'MATCH (n) RETURN n.name;' | ./build/gqlite social.db
Script Format
Scripts use the same semicolon-terminated syntax as the interactive shell. Use -- for comments:
-- setup.cypher
-- Create people
CREATE (alice:Person {name: "Alice", age: 30, city: "London"});
CREATE (bob:Person {name: "Bob", age: 25, city: "Paris"});
CREATE (carol:Person {name: "Carol", age: 35, city: "Berlin"});
-- Create relationships
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CREATE (a)-[:KNOWS {since: 2018}]->(b);
MATCH (b:Person {name: "Bob"}), (c:Person {name: "Carol"})
CREATE (b)-[:KNOWS {since: 2021}]->(c);
-- Query friend-of-friend
MATCH (a:Person {name: "Alice"})-[:KNOWS]->()-[:KNOWS]->(fof)
RETURN fof.name AS friend_of_friend;
Worked Example: Build and Query a Graph
The following session creates a small knowledge graph, queries it, and inspects statistics.
Step 1: Initialize the Database
./build/gqlite -i company.db
GraphQLite Interactive Shell
Type .help for help, .quit to exit
Queries must end with semicolon (;)
graphqlite>
Step 2: Add People
graphqlite> CREATE (alice:Person {name: "Alice", title: "Engineer"});
Query executed successfully
Nodes created: 1
Properties set: 2
graphqlite> CREATE (bob:Person {name: "Bob", title: "Manager"});
Query executed successfully
Nodes created: 1
Properties set: 2
graphqlite> CREATE (carol:Person {name: "Carol", title: "Engineer"});
Query executed successfully
Nodes created: 1
Properties set: 2
Step 3: Add a Project Node
graphqlite> CREATE (proj:Project {name: "GraphQLite", status: "active"});
Query executed successfully
Nodes created: 1
Properties set: 2
Step 4: Create Relationships
graphqlite> MATCH (alice:Person {name: "Alice"}), (proj:Project {name: "GraphQLite"})
...> CREATE (alice)-[:WORKS_ON {since: 2023}]->(proj);
Query executed successfully
Relationships created: 1
graphqlite> MATCH (carol:Person {name: "Carol"}), (proj:Project {name: "GraphQLite"})
...> CREATE (carol)-[:WORKS_ON {since: 2024}]->(proj);
Query executed successfully
Relationships created: 1
graphqlite> MATCH (bob:Person {name: "Bob"}), (alice:Person {name: "Alice"})
...> CREATE (bob)-[:MANAGES]->(alice);
Query executed successfully
Relationships created: 1
Step 5: Query the Graph
graphqlite> MATCH (mgr:Person)-[:MANAGES]->(eng:Person)-[:WORKS_ON]->(p:Project)
...> RETURN mgr.name AS manager, eng.name AS engineer, p.name AS project;
manager engineer project
-----------------------------------
Bob Alice GraphQLite
Step 6: Check Statistics
graphqlite> .stats
Database Statistics:
===================
Nodes : 4
Edges : 3
Node Labels : 2
Property Keys : 6
Edge Types : MANAGES, WORKS_ON
graphqlite> .quit
Goodbye!
Tips
- Use .tables to see all underlying SQLite tables, which is useful for debugging the graph schema.
- In verbose mode (-v), each query prints timing and row count information.
- The CLI loads the bundled extension automatically; no separate GRAPHQLITE_EXTENSION_PATH configuration is needed.
- For large imports, use Bulk Import from Python rather than piping thousands of CREATE statements through the CLI.
Using Graph Algorithms
GraphQLite includes 18 built-in graph algorithms covering centrality, community detection, path finding, connectivity, traversal, and similarity. All algorithms run inside the SQLite process — no external graph engine is required.
Setup
All examples in this guide use the same sample graph:
from graphqlite import Graph
g = Graph(":memory:")
# Nodes
for name, role in [
("alice", "engineer"), ("bob", "manager"), ("carol", "engineer"),
("dave", "analyst"), ("eve", "engineer"), ("frank", "manager"),
]:
g.upsert_node(name, {"name": name.capitalize(), "role": role}, "Person")
# Edges
for src, dst, weight in [
("alice", "bob", 1), ("alice", "carol", 2), ("bob", "dave", 1),
("carol", "dave", 3), ("dave", "eve", 1), ("eve", "frank", 2),
("frank", "alice", 4), ("bob", "eve", 1),
]:
g.upsert_edge(src, dst, {"weight": weight}, "KNOWS")
Graph Cache Management
Before running algorithms on large graphs, load the graph into the in-memory cache. This avoids redundant disk reads and speeds up repeated algorithm calls significantly.
Python
# Load into cache
g.load_graph()
# Check whether the cache is populated
print(g.graph_loaded()) # True
# Run algorithms — all use the cached graph
pr = g.pagerank()
cc = g.community_detection()
# Reload cache after data changes
g.upsert_node("grace", {"name": "Grace"}, "Person")
g.reload_graph()
# Free memory when done
g.unload_graph()
print(g.graph_loaded()) # False
SQL
-- Load cache
SELECT gql_load_graph();
-- Check status
SELECT gql_graph_loaded(); -- 1 or 0
-- Reload after changes
SELECT gql_reload_graph();
-- Unload
SELECT gql_unload_graph();
For small graphs (under ~10 000 nodes) the cache provides modest gains. For larger graphs, always call load_graph() before running multiple algorithms.
Centrality Algorithms
Centrality algorithms measure the relative importance of nodes in the graph.
PageRank
Assigns scores based on the number and quality of incoming edges. Nodes linked from highly-scored nodes receive higher scores.
When to use: Ranking nodes by overall influence or authority (web pages, citations, recommendation networks).
Python:
results = g.pagerank(damping=0.85, iterations=20)
# [{"node_id": "alice", "score": 0.21}, ...]
top5 = sorted(results, key=lambda r: r["score"], reverse=True)[:5]
for r in top5:
print(f"{r['node_id']}: {r['score']:.4f}")
Rust:
let results = g.pagerank(0.85, 20)?;
for r in &results {
    println!("{}: {:.4}", r.node_id, r.score);
}
SQL:
SELECT
json_extract(value, '$.node_id') AS node,
json_extract(value, '$.score') AS score
FROM json_each(cypher('RETURN pageRank(0.85, 20)'))
ORDER BY score DESC
LIMIT 5;
Parameters: damping (default 0.85), iterations (default 20).
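To make the two parameters concrete, here is the textbook power-iteration form of PageRank in plain Python. This is a sketch of the algorithm in general, not GraphQLite's implementation; for real workloads use g.pagerank():

```python
def pagerank(edges, damping=0.85, iterations=20):
    """edges: list of (src, dst) pairs. Returns {node: score}, scores sum to 1."""
    nodes = {n for e in edges for n in e}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node gets a base (1 - d) / N, then receives shares from in-edges.
        nxt = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, targets in out.items():
            if targets:
                share = damping * score[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
            else:
                # Dangling node: spread its score evenly over all nodes.
                for n in nodes:
                    nxt[n] += damping * score[src] / len(nodes)
        score = nxt
    return score

scores = pagerank([("alice", "bob"), ("bob", "carol"), ("carol", "alice")])
# In a symmetric 3-cycle every node ends up with score 1/3.
```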
Degree Centrality
Counts in-degree, out-degree, and total degree for every node.
When to use: Quick identification of hubs and leaf nodes; baseline for other analyses.
Python:
results = g.degree_centrality()
# [{"node_id": "alice", "in_degree": 1, "out_degree": 2, "degree": 3}, ...]
# Find the highest out-degree node
hub = max(results, key=lambda r: r["out_degree"])
print(f"Top hub: {hub['node_id']} with {hub['out_degree']} outgoing edges")
Rust:
let results = g.degree_centrality()?;
for r in &results {
    println!("{}: in={}, out={}, total={}", r.node_id, r.in_degree, r.out_degree, r.degree);
}
SQL:
SELECT
json_extract(value, '$.node_id') AS node,
json_extract(value, '$.out_degree') AS out_degree
FROM json_each(cypher('RETURN degreeCentrality()'))
ORDER BY out_degree DESC;
Betweenness Centrality
Measures how often a node appears on shortest paths between other node pairs.
When to use: Identifying bottlenecks, bridges, or brokers in a network (infrastructure nodes, information brokers).
Python:
results = g.betweenness_centrality()
# [{"node_id": "dave", "score": 0.45}, ...]
for r in sorted(results, key=lambda r: r["score"], reverse=True):
print(f"{r['node_id']}: {r['score']:.4f}")
Rust:
let results = g.betweenness_centrality()?;
for r in &results {
    println!("{}: {:.4}", r.node_id, r.score);
}
SQL:
SELECT
json_extract(value, '$.node_id') AS node,
json_extract(value, '$.score') AS score
FROM json_each(cypher('RETURN betweennessCentrality()'))
ORDER BY score DESC;
Betweenness is O(n * m) and can be slow on graphs with hundreds of thousands of edges.
Closeness Centrality
Measures how quickly a node can reach all other nodes (inverse of average shortest path length).
When to use: Finding nodes that can spread information fastest, or nodes most central to communication.
Python:
results = g.closeness_centrality()
# [{"node_id": "alice", "score": 0.71}, ...]
Rust:
let results = g.closeness_centrality()?;
SQL:
SELECT
json_extract(value, '$.node_id') AS node,
json_extract(value, '$.score') AS score
FROM json_each(cypher('RETURN closenessCentrality()'))
ORDER BY score DESC;
Eigenvector Centrality
Assigns scores iteratively: a node is important if it is connected to other important nodes.
When to use: Influence scoring in social networks; pages linked from authoritative sources.
Python:
results = g.eigenvector_centrality(iterations=100)
# [{"node_id": "alice", "score": 0.55}, ...]
Rust:
let results = g.eigenvector_centrality(100)?;
SQL:
SELECT
json_extract(value, '$.node_id') AS node,
json_extract(value, '$.score') AS score
FROM json_each(cypher('RETURN eigenvectorCentrality(100)'))
ORDER BY score DESC;
Parameters: iterations (default 100) — the algorithm stops when scores converge or iterations are exhausted.
Community Detection
Community detection partitions nodes into clusters based on edge density.
Label Propagation (community_detection)
Nodes iteratively adopt the label most common among their neighbors. Fast and approximate.
When to use: Quick community discovery on large graphs; when exact modularity optimization is not required.
Python:
results = g.community_detection(iterations=10)
# [{"node_id": "alice", "community": 1}, ...]
# Group nodes by community
from collections import defaultdict
communities = defaultdict(list)
for r in results:
communities[r["community"]].append(r["node_id"])
for cid, members in communities.items():
print(f"Community {cid}: {members}")
Rust:
let results = g.community_detection(10)?;
for r in &results {
    println!("{} -> community {}", r.node_id, r.community);
}
SQL:
SELECT
json_extract(value, '$.node_id') AS node,
json_extract(value, '$.community') AS community
FROM json_each(cypher('RETURN labelPropagation(10)'))
ORDER BY community;
Louvain
Hierarchical community detection that optimizes modularity. Produces higher-quality communities than label propagation at the cost of more computation.
When to use: When community quality matters more than speed; medium-sized graphs (under ~100 000 nodes).
Python:
results = g.louvain(resolution=1.0)
# [{"node_id": "alice", "community": 0}, ...]
# Higher resolution = more, smaller communities
fine_grained = g.louvain(resolution=2.0)
Rust:
let results = g.louvain(1.0)?;
SQL:
SELECT
json_extract(value, '$.node_id') AS node,
json_extract(value, '$.community') AS community
FROM json_each(cypher('RETURN louvain(1.0)'))
ORDER BY community;
Parameters: resolution (default 1.0) — increase to find more communities; decrease to merge communities.
Leiden Communities
An improved variant of Louvain that guarantees well-connected communities and avoids the resolution limit problem.
When to use: When Louvain produces communities that seem internally disconnected, or for publication-quality community detection.
Requires: pip install graspologic (or pip install graphqlite[leiden]).
Python:
# Install: pip install graphqlite[leiden]
results = g.leiden_communities(resolution=1.0, random_seed=42)
# [{"node_id": "alice", "community": 0}, ...]
Leiden is not available via the Cypher RETURN interface; use the Python or Rust Graph API.
Path Finding
Shortest Path (Dijkstra)
Finds the minimum-weight path between two nodes.
When to use: Navigation, network routing, social distance ("degrees of separation").
Python:
result = g.shortest_path("alice", "frank")
# {"distance": 3, "path": ["alice", "bob", "dave", "eve", "frank"], "found": True}
if result["found"]:
print(f"Distance: {result['distance']}")
print(f"Path: {' -> '.join(result['path'])}")
else:
print("No path found")
# With weighted edges
result = g.shortest_path("alice", "frank", weight_property="weight")
Rust:
// None = unweighted
let result = g.shortest_path("alice", "frank", None)?;
if result.found {
    println!("Distance: {:?}", result.distance);
    println!("Path: {:?}", result.path);
}

// Weighted
let result = g.shortest_path("alice", "frank", Some("weight"))?;
SQL:
SELECT cypher('RETURN dijkstra(''alice'', ''frank'')');
A* Search
Shortest path with a heuristic to guide the search. When node latitude/longitude properties are available, uses haversine distance as the heuristic, which can dramatically reduce nodes explored.
When to use: Geographic routing, maps, spatial graphs where coordinates are available.
Python:
# With geographic coordinates
result = g.astar("city_a", "city_b", lat_prop="latitude", lon_prop="longitude")
# {"found": True, "distance": 412.5, "path": [...], "nodes_explored": 18}
print(f"Explored {result['nodes_explored']} nodes (vs full BFS)")
# Without coordinates (falls back to uniform heuristic)
result = g.astar("alice", "frank")
Rust:
let result = g.astar("city_a", "city_b", Some("latitude"), Some("longitude"))?;
println!("Explored {} nodes", result.nodes_explored);
SQL:
SELECT cypher('RETURN astar(''city_a'', ''city_b'', ''latitude'', ''longitude'')');
All-Pairs Shortest Path
Computes shortest distances between every pair of nodes using Floyd-Warshall.
When to use: Building distance matrices, computing graph diameter, small dense graphs.
Python:
results = g.all_pairs_shortest_path()
# [{"source": "alice", "target": "carol", "distance": 1.0}, ...]
# Build a distance matrix
import numpy as np
nodes = list({r["source"] for r in results})
n = len(nodes)
idx = {name: i for i, name in enumerate(nodes)}
D = np.full((n, n), float("inf"))
for r in results:
D[idx[r["source"]], idx[r["target"]]] = r["distance"]
print(f"Graph diameter: {D[D < float('inf')].max()}")
Rust:
let results = g.apsp()?;
for r in &results {
    println!("{} -> {}: {}", r.source, r.target, r.distance);
}
SQL:
SELECT
json_extract(value, '$.source') AS src,
json_extract(value, '$.target') AS tgt,
json_extract(value, '$.distance') AS dist
FROM json_each(cypher('RETURN apsp()'))
ORDER BY dist;
All-pairs shortest path (Floyd-Warshall) is O(n²) in space and O(n³) in time. Avoid it on graphs with more than a few thousand nodes.
Connected Components
Weakly Connected Components
Groups nodes that are reachable from each other when edge direction is ignored.
When to use: Finding isolated subgraphs, checking overall graph connectivity, data quality checks.
Python:
results = g.weakly_connected_components()
# [{"node_id": "alice", "component": 0}, ...]
# Count components
components = {r["component"] for r in results}
print(f"Graph has {len(components)} weakly connected component(s)")
# Find isolated nodes (singletons)
from collections import Counter
counts = Counter(r["component"] for r in results)
singletons = [cid for cid, cnt in counts.items() if cnt == 1]
print(f"{len(singletons)} isolated node(s)")
Rust:
let results = g.wcc()?;
SQL:
SELECT
json_extract(value, '$.component') AS component,
COUNT(*) AS size
FROM json_each(cypher('RETURN wcc()'))
GROUP BY component
ORDER BY size DESC;
Strongly Connected Components
Groups nodes where every node can reach every other node following edge direction.
When to use: Detecting cycles, finding strongly coupled subsystems, directed network analysis.
Python:
results = g.strongly_connected_components()
# [{"node_id": "alice", "component": 0}, ...]
from collections import Counter
counts = Counter(r["component"] for r in results)
largest = max(counts, key=counts.get)
print(f"Largest SCC has {counts[largest]} nodes")
Rust:
let results = g.scc()?;
SQL:
SELECT
json_extract(value, '$.component') AS component,
COUNT(*) AS size
FROM json_each(cypher('RETURN scc()'))
GROUP BY component
ORDER BY size DESC;
Traversal
Breadth-First Search (BFS)
Explores nodes level by level outward from a starting node.
When to use: Finding all nodes within N hops, shortest-hop paths, social network analysis.
Python:
results = g.bfs("alice", max_depth=2)
# [{"user_id": "alice", "depth": 0, "order": 0},
# {"user_id": "bob", "depth": 1, "order": 1},
# {"user_id": "carol", "depth": 1, "order": 2}, ...]
# Print reachable nodes within 2 hops
for r in results:
print(f" {' ' * r['depth']}{r['user_id']} (depth {r['depth']})")
Rust:
let results = g.bfs("alice", Some(2))?;
for r in &results {
    println!("depth {}: {}", r.depth, r.user_id);
}
SQL:
SELECT
json_extract(value, '$.user_id') AS node,
json_extract(value, '$.depth') AS depth
FROM json_each(cypher('RETURN bfs(''alice'', 2)'))
ORDER BY depth, json_extract(value, '$.order');
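The depth and order fields follow standard BFS semantics. A minimal queue-based version over an adjacency dict makes them concrete (illustrative only, not the extension's code):

```python
from collections import deque

def bfs(adj, start, max_depth=None):
    """adj: {node: [neighbors]}. Yields (node, depth, visit_order) tuples."""
    seen = {start}
    queue = deque([(start, 0)])
    order = 0
    while queue:
        node, depth = queue.popleft()
        yield node, depth, order
        order += 1
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand past the depth limit
        for nbr in adj.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, depth + 1))
```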
Depth-First Search (DFS)
Explores as deep as possible along each branch before backtracking.
When to use: Cycle detection, topological ordering exploration, tree-like structures.
Python:
results = g.dfs("alice", max_depth=3)
# [{"user_id": "alice", "depth": 0, "order": 0}, ...]
for r in sorted(results, key=lambda r: r["order"]):
indent = " " * r["depth"]
print(f"{indent}{r['user_id']}")
Rust:
// None = unlimited depth
let results = g.dfs("alice", None)?;
for r in &results {
    println!("order {}: {} (depth {})", r.order, r.user_id, r.depth);
}
SQL:
SELECT
json_extract(value, '$.user_id') AS node,
json_extract(value, '$.depth') AS depth,
json_extract(value, '$.order') AS visit_order
FROM json_each(cypher('RETURN dfs(''alice'', 5)'))
ORDER BY visit_order;
Similarity
Node Similarity (Jaccard)
Computes Jaccard similarity between the neighborhoods of nodes: |intersection| / |union|.
When to use: Collaborative filtering, finding structurally similar entities, de-duplication.
Python:
# All pairs above a threshold
results = g.node_similarity(threshold=0.3)
# [{"node1": "alice", "node2": "carol", "similarity": 0.5}, ...]
# Between two specific nodes
result = g.node_similarity(node1_id="alice", node2_id="bob")
# Top-10 most similar pairs
results = g.node_similarity(top_k=10)
for r in sorted(results, key=lambda r: r["similarity"], reverse=True):
print(f"{r['node1']} <-> {r['node2']}: {r['similarity']:.3f}")
Rust:
// threshold=0.3, top_k=0 (all pairs)
let results = g.node_similarity(None, None, 0.3, 0)?;
for r in &results {
    println!("{} <-> {}: {:.3}", r.node1, r.node2, r.similarity);
}
SQL:
SELECT
json_extract(value, '$.node1') AS n1,
json_extract(value, '$.node2') AS n2,
json_extract(value, '$.similarity') AS sim
FROM json_each(cypher('RETURN nodeSimilarity()'))
WHERE json_extract(value, '$.similarity') > 0.3
ORDER BY sim DESC;
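The underlying measure is easy to state in code. A pure-Python sketch of what the algorithm computes for each pair of neighbor sets (illustrative, not the extension's implementation):

```python
def jaccard(neighbors_a, neighbors_b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B| over two neighbor sets."""
    a, b = set(neighbors_a), set(neighbors_b)
    union = a | b
    if not union:
        return 0.0  # two isolated nodes: define similarity as 0
    return len(a & b) / len(union)
```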
K-Nearest Neighbors (KNN)
Finds the k most similar nodes to a given node based on Jaccard neighborhood similarity.
When to use: Recommendation systems ("users like you also connected to..."), suggestion features.
Python:
results = g.knn("alice", k=3)
# [{"neighbor": "carol", "similarity": 0.5, "rank": 1},
# {"neighbor": "dave", "similarity": 0.33, "rank": 2}, ...]
print("Alice's most similar neighbors:")
for r in results:
print(f" #{r['rank']}: {r['neighbor']} (similarity {r['similarity']:.3f})")
Rust:
let results = g.knn("alice", 3)?;
for r in &results {
    println!("#{}: {} ({:.3})", r.rank, r.neighbor, r.similarity);
}
SQL:
SELECT
json_extract(value, '$.neighbor') AS neighbor,
json_extract(value, '$.similarity') AS similarity,
json_extract(value, '$.rank') AS rank
FROM json_each(cypher('RETURN knn(''alice'', 5)'))
ORDER BY rank;
Triangle Count
Counts the number of triangles each node participates in and computes the local clustering coefficient.
When to use: Measuring graph density and cliquishness; social network cohesion analysis.
Python:
results = g.triangle_count()
# [{"node_id": "alice", "triangles": 2, "clustering_coefficient": 0.67}, ...]
# Nodes with high clustering (tight local clusters)
high_cluster = [r for r in results if r["clustering_coefficient"] > 0.5]
total_triangles = sum(r["triangles"] for r in results) // 3 # each triangle counted 3 times
print(f"Total triangles in graph: {total_triangles}")
Rust:
let results = g.triangle_count()?;
for r in &results {
    println!("{}: {} triangles, clustering={:.3}", r.node_id, r.triangles, r.clustering_coefficient);
}
SQL:
SELECT
json_extract(value, '$.node_id') AS node,
json_extract(value, '$.triangles') AS triangles,
json_extract(value, '$.clustering_coefficient') AS clustering
FROM json_each(cypher('RETURN triangleCount()'))
ORDER BY triangles DESC;
Algorithm Selection Guide
| Goal | Recommended Algorithm |
|---|---|
| Rank nodes by influence | PageRank |
| Find hubs and leaf nodes | Degree Centrality |
| Find brokers / bridges | Betweenness Centrality |
| Find information spreaders | Closeness Centrality |
| Fast community discovery | Label Propagation |
| High-quality communities | Louvain or Leiden |
| Shortest path (unweighted) | Dijkstra / shortest_path |
| Shortest path (geographic) | A* with lat/lon properties |
| Full distance matrix | APSP (small graphs only) |
| Check connectivity | Weakly Connected Components |
| Find cycles | Strongly Connected Components |
| Find N-hop neighbors | BFS |
| Explore deep paths | DFS |
| Similar-neighbor pairs | Node Similarity |
| "People you may know" | KNN |
| Measure clustering | Triangle Count |
Performance Tips
- Load the cache first. Call g.load_graph() before running multiple algorithms. This populates an in-memory representation that avoids re-reading the database for each algorithm call.
- Reload after writes. After inserting or updating nodes and edges, call g.reload_graph() so algorithms see the new data.
- Avoid APSP on large graphs. all_pairs_shortest_path() is O(n²) in both time and memory. It is practical up to roughly 5 000 nodes; beyond that, use shortest_path() for specific pairs.
- Use max_depth for BFS/DFS. Without a depth limit, traversal may visit the entire graph. Always pass a max_depth when you only need local neighborhood information.
- Betweenness scales as O(n * m). On graphs with millions of edges this can take minutes. For approximate betweenness, consider sampling a subset of source nodes.
- Leiden requires graspologic. Install it with pip install graphqlite[leiden]. If graspologic is not installed, leiden_communities() raises an ImportError.
Handling Special Characters
Certain characters in property values or identifiers require special treatment in Cypher queries. This guide covers the three main categories — property values, relationship types, and property names — and the right approach for each.
Property Values
The Problem
When property values are interpolated directly into a Cypher string, control characters and punctuation can break parsing or produce silently wrong results:
# This will cause a syntax error or corrupt the query
g.query("CREATE (n:Note {text: 'It's a lovely day'})") # SyntaxError: unmatched quote
# This may parse but produce no results
g.query("CREATE (n:Note {text: 'Line1\nLine2'})") # newline inside string literal
Characters that need special handling:
| Character | Risk |
|---|---|
| ' single quote | Terminates the string literal early |
| \ backslash | Starts an escape sequence |
| \n newline | Splits the literal across lines; breaks parsing |
| \r carriage return | Same as newline |
| \t tab | Less common; can cause issues in some parsers |
| " double quote | Less common in Cypher but still problematic |
Solution 1: Parameterized Queries (Recommended)
Parameters bypass the Cypher string parser entirely. The value is passed as JSON outside the query text, so no escaping is required:
from graphqlite import connect
conn = connect(":memory:")
# Single quotes, newlines, backslashes — all handled automatically
conn.cypher(
"CREATE (n:Note {title: $title, content: $content})",
{
"title": "Alice's Report",
"content": "Line 1\nLine 2\nBackslash: \\ done.",
}
)
# Retrieve and verify
results = conn.cypher(
"MATCH (n:Note {title: $title}) RETURN n.content",
{"title": "Alice's Report"}
)
print(results[0]["n.content"])
# Line 1
# Line 2
# Backslash: \ done.
See Parameterized Queries for the full guide.
Solution 2: The Graph API
The high-level Graph API handles escaping internally for upsert_node and upsert_edge. Pass raw Python strings; no escaping is needed:
from graphqlite import Graph
g = Graph(":memory:")
g.upsert_node("note1", {
"title": "Alice's Report",
"content": "Line 1\nLine 2\nBackslash: \\",
"author": 'Bob said "hello"',
}, label="Note")
node = g.get_node("note1")
print(node["properties"]["content"]) # Line 1\nLine 2\nBackslash: \
Solution 3: escape_string() for Manual Queries
If you must build a Cypher string by hand, use the escape_string() utility from the graphqlite package:
from graphqlite import Graph, escape_string
g = Graph(":memory:")
raw_text = "Alice's note:\nLine 1\nLine 2"
safe_text = escape_string(raw_text) # Escapes quotes, newlines, backslashes
g.query(f"CREATE (n:Note {{content: '{safe_text}'}})")
escape_string() applies these transformations in order:
- \ → \\ (backslashes first, so they are not double-escaped)
- ' → \' (single quotes)
- \n → a space (newlines replaced with a space)
- \r → a space (carriage returns replaced with a space)
- \t → a space (tabs replaced with a space)
If preserving newlines in stored values is important, use parameterized queries instead — escape_string() converts them to spaces.
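To make the transformation order concrete, here is a plain-Python sketch of the documented rules. This is an illustrative re-implementation for clarity, not the actual code of graphqlite's escape_string():

```python
# Illustrative re-implementation of the documented transformation order.
# Not the library's actual escape_string() implementation.
def escape_string_sketch(value: str) -> str:
    value = value.replace("\\", "\\\\")  # backslashes first, so they are not double-escaped
    value = value.replace("'", "\\'")    # then single quotes
    value = value.replace("\n", " ")     # newlines become spaces
    value = value.replace("\r", " ")     # carriage returns become spaces
    value = value.replace("\t", " ")     # tabs become spaces
    return value

print(escape_string_sketch("It's a test\nwith C:\\path"))
# It\'s a test with C:\\path
```

Note how applying the backslash rule first means the backslash introduced by the quote rule is never itself re-escaped.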
Relationship Types
Relationship type names must be valid identifiers. The sanitize_rel_type() utility converts arbitrary strings into safe type names:
from graphqlite import sanitize_rel_type
print(sanitize_rel_type("has-friend")) # HAS_FRIEND
print(sanitize_rel_type("works with")) # WORKS_WITH
print(sanitize_rel_type("type/1")) # TYPE_1
The function:
- Converts to uppercase
- Replaces hyphens, spaces, and slashes with underscores
- Strips other non-alphanumeric characters
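The three rules above can be sketched in plain Python. This is an illustrative re-implementation of the documented behavior, not the library's actual code:

```python
import re

# Sketch of the documented sanitization rules; not graphqlite's exact implementation.
def sanitize_rel_type_sketch(name: str) -> str:
    name = name.upper()
    name = re.sub(r"[-/ ]", "_", name)      # hyphens, slashes, spaces -> underscores
    name = re.sub(r"[^A-Z0-9_]", "", name)  # strip remaining non-alphanumerics
    return name

print(sanitize_rel_type_sketch("has-friend"))  # HAS_FRIEND
print(sanitize_rel_type_sketch("works with"))  # WORKS_WITH
print(sanitize_rel_type_sketch("type/1"))      # TYPE_1
```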
Use it whenever relationship types come from user input or external data:
from graphqlite import Graph, sanitize_rel_type
g = Graph(":memory:")
user_provided_type = "works-with"
safe_type = sanitize_rel_type(user_provided_type) # WORKS_WITH
g.upsert_edge("alice", "bob", {"project": "Apollo"}, rel_type=safe_type)
Property Names and Identifiers
Backtick Quoting
Property names that conflict with Cypher keywords or contain special characters can be quoted with backticks:
# "type", "end", "order" are reserved Cypher keywords
g.connection.cypher("MATCH (n) WHERE n.`type` = 'A' RETURN n.`order`")
# Property names with spaces or hyphens
g.connection.cypher("CREATE (n:Item {`item-code`: 'XYZ-001', `display name`: 'Widget'})")
Using CYPHER_RESERVED
The CYPHER_RESERVED set contains all reserved Cypher keywords. Check before using a string as a label or property name:
from graphqlite import CYPHER_RESERVED
def safe_label(name: str) -> str:
if name.upper() in CYPHER_RESERVED:
return f"`{name}`"
return name
label = safe_label("order") # "`order`"
label = safe_label("Person") # "Person"
g.connection.cypher(f"CREATE (n:{label} {{id: 'o1'}})")
Common Pitfalls
Symptom: MATCH returns nothing after CREATE
Cause: Newlines or carriage returns in property values broke the inline Cypher string during creation. The node was stored but its properties are corrupt or missing.
Fix: Use parameterized queries for any value that may contain whitespace control characters.
# Broken
name_with_newline = "Alice\nMitchell"
g.query(f"CREATE (n:Person {{name: '{name_with_newline}'}})")
# Fixed
g.connection.cypher("CREATE (n:Person {name: $name})", {"name": name_with_newline})
Symptom: SyntaxError on CREATE
Cause: Unescaped single quotes in the value.
# Broken
g.query("CREATE (n:Quote {text: 'It's a test'})") # SyntaxError
# Fixed: use parameters
g.connection.cypher("CREATE (n:Quote {text: $text})", {"text": "It's a test"})
# Or: escape manually (less preferred)
g.query("CREATE (n:Quote {text: 'It\\'s a test'})")
Symptom: Relationship type with hyphens not found
Cause: Hyphens are not valid in unquoted Cypher identifiers. CREATE (a)-[:has-friend]->(b) is parsed as has minus friend.
Fix: Use underscores or sanitize the type name:
from graphqlite import sanitize_rel_type
# Broken
g.query("CREATE (a:Person {name: 'A'})-[:has-friend]->(b:Person {name: 'B'})")
# Fixed: use sanitized type
rel_type = sanitize_rel_type("has-friend") # HAS_FRIEND
g.upsert_edge("alice", "bob", {}, rel_type=rel_type)
Symptom: Property access on reserved keyword property name
Cause: n.type is parsed as n followed by the keyword type, not as property access.
Fix: Quote with backticks.
# Broken
results = g.query("MATCH (n) WHERE n.type = 'Product' RETURN n")
# Fixed
results = g.query("MATCH (n) WHERE n.`type` = 'Product' RETURN n")
Best Practices
- Always use parameterized queries for user-supplied data. This is the only safe approach for arbitrary values.
- Use the Graph API (upsert_node, upsert_edge) for CRUD operations. It handles escaping automatically.
- Call sanitize_rel_type() for dynamic relationship types. Any type name derived from external input needs sanitization.
- Backtick-quote property names that are reserved words. Check against CYPHER_RESERVED when property names come from a schema or API response.
- Validate and strip control characters at ingestion time if your data comes from sources that may embed nulls or other non-printable characters.
Using GraphQLite with Other SQLite Extensions
GraphQLite is a standard SQLite extension and can share a connection with other SQLite extensions, including sqlite-vec (vector search), sqlite-fts5 (full-text search), and others.
Loading GraphQLite into an Existing Connection
If you already have a sqlite3.Connection, use graphqlite.load() to add GraphQLite functions to it:
import sqlite3
import graphqlite
conn = sqlite3.connect("combined.db")
graphqlite.load(conn)
# Both the graph functions and the raw database are now available on conn
conn.execute("SELECT cypher('CREATE (n:Page {title: \"Home\"})')")
conn.execute("SELECT * FROM nodes")
Wrapping an Existing Connection
graphqlite.wrap() loads GraphQLite into an existing sqlite3.Connection and returns a Connection object that exposes the cypher() method:
import sqlite3
import graphqlite
raw_conn = sqlite3.connect("combined.db")
conn = graphqlite.wrap(raw_conn)
# Use the graphqlite Connection API
conn.cypher("CREATE (n:Page {title: 'Home'})")
results = conn.cypher("MATCH (n:Page) RETURN n.title")
# Access the underlying sqlite3.Connection for raw SQL
raw_conn.execute("SELECT COUNT(*) FROM nodes").fetchone()
Loading Multiple Extensions
Load GraphQLite first (it creates the graph schema tables), then load other extensions:
import sqlite3
import graphqlite
conn = sqlite3.connect("combined.db")
# 1. Load GraphQLite (creates schema tables)
graphqlite.load(conn)
# 2. Load other extensions
conn.enable_load_extension(True)
conn.load_extension("/path/to/other_extension")
conn.enable_load_extension(False)
This order matters: GraphQLite creates nodes, edges, and related tables on first load. If another extension has conflicting table names, loading GraphQLite first lets you detect the conflict early.
Example: GraphQLite + sqlite-vec
Combine graph traversal with vector similarity search. This pattern is the foundation of GraphRAG systems: find semantically similar documents with vectors, then expand to related content via graph edges.
import sqlite3
import graphqlite
import sqlite_vec
import json
# Create and configure the connection
conn = sqlite3.connect("knowledge.db")
graphqlite.load(conn)
sqlite_vec.load(conn)
# Create graph nodes (documents)
conn.execute("SELECT cypher('CREATE (n:Document {doc_id: \"doc1\", title: \"Introduction to Graphs\"})')")
conn.execute("SELECT cypher('CREATE (n:Document {doc_id: \"doc2\", title: \"Graph Algorithms\"})')")
conn.execute("SELECT cypher('CREATE (n:Document {doc_id: \"doc3\", title: \"Vector Search\"})')")
# Link related documents in the graph
conn.execute("""
SELECT cypher('
MATCH (a:Document {doc_id: "doc1"}), (b:Document {doc_id: "doc2"})
CREATE (a)-[:RELATED_TO {strength: 0.9}]->(b)
')
""")
conn.execute("""
SELECT cypher('
MATCH (b:Document {doc_id: "doc2"}), (c:Document {doc_id: "doc3"})
CREATE (b)-[:RELATED_TO {strength: 0.7}]->(c)
')
""")
# Create a vector table for document embeddings
conn.execute("""
CREATE VIRTUAL TABLE IF NOT EXISTS doc_embeddings
USING vec0(
doc_id TEXT PRIMARY KEY,
embedding FLOAT[4]
)
""")
# Insert mock embeddings (replace with real model output)
embeddings = {
"doc1": [0.1, 0.2, 0.9, 0.3],
"doc2": [0.1, 0.3, 0.8, 0.4],
"doc3": [0.9, 0.1, 0.1, 0.8],
}
for doc_id, emb in embeddings.items():
conn.execute(
"INSERT INTO doc_embeddings(doc_id, embedding) VALUES (?, ?)",
[doc_id, json.dumps(emb)]
)
conn.commit()
# --- Query: vector search + graph expansion ---
# Step 1: Find the most similar document to a query vector
query_embedding = json.dumps([0.1, 0.25, 0.85, 0.35])
similar = conn.execute("""
SELECT doc_id
FROM doc_embeddings
WHERE embedding MATCH ?
AND k = 2
ORDER BY distance
""", [query_embedding]).fetchall()
print("Similar documents:", [row[0] for row in similar])
# Step 2: Expand to graph neighbors for each similar document
for (doc_id,) in similar:
related = conn.execute(f"""
SELECT cypher('
MATCH (d:Document {{doc_id: "{doc_id}"}})-[:RELATED_TO]->(other:Document)
RETURN other.doc_id AS id, other.title AS title
')
""").fetchall()
if related:
print(f" {doc_id} is related to:")
for (row_json,) in related:
row = json.loads(row_json)
print(f" - {row['id']}: {row['title']}")
Sharing Connections: In-Memory Databases
In-memory SQLite databases are private to a single connection. All extensions that need to share data must use the same conn object:
# Correct: one connection, multiple extensions
conn = sqlite3.connect(":memory:")
graphqlite.load(conn)
sqlite_vec.load(conn)
# Both extensions operate on the same in-memory database
# Wrong: two separate connections, two separate databases
conn1 = sqlite3.connect(":memory:")
conn2 = sqlite3.connect(":memory:")
# conn1 and conn2 cannot see each other's data
For file-based databases, this restriction does not apply — separate connections can open the same file, subject to SQLite locking rules.
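The file-based case can be demonstrated with plain sqlite3: a write committed on one connection is visible to a second connection that opens the same file. The temporary path here is just for the example:

```python
import os
import sqlite3
import tempfile

# A file-backed database is visible to multiple independent connections,
# subject to SQLite's locking rules.
path = os.path.join(tempfile.mkdtemp(), "shared.db")

conn1 = sqlite3.connect(path)
conn1.execute("CREATE TABLE t (v INTEGER)")
conn1.execute("INSERT INTO t VALUES (42)")
conn1.commit()  # commit so other connections can see the write

conn2 = sqlite3.connect(path)
print(conn2.execute("SELECT v FROM t").fetchone())  # (42,)
```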
Extension Loading Order
In general:
- GraphQLite first. It creates the graph schema (nodes, edges, property tables) on first load. Other extensions that reference these tables will find them already present.
- Other extensions next. Load them with enable_load_extension(True) / load_extension() / enable_load_extension(False).
- Commit between loads if any extension creates tables, to ensure schema visibility:

graphqlite.load(conn)
conn.commit()
conn.enable_load_extension(True)
conn.load_extension("my_extension")
conn.enable_load_extension(False)
conn.commit()
Using GraphQLite with the Python Graph API Alongside Other Extensions
When you use graphqlite.Graph or graphqlite.connect(), access the underlying sqlite3.Connection to load additional extensions:
import graphqlite
import sqlite_vec
g = graphqlite.Graph("knowledge.db")
# Access the raw sqlite3.Connection
raw_conn = g.connection.sqlite_connection
sqlite_vec.load(raw_conn)
# Now use both APIs on the same connection
g.upsert_node("doc1", {"title": "Introduction"}, "Document")
raw_conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS vecs USING vec0(embedding FLOAT[4])")
Troubleshooting
Extension conflicts
If two extensions register functions with the same name, the later-loaded extension wins. Check for collisions between GraphQLite's functions (cypher, regexp, gql_load_graph, etc.) and the functions registered by your other extension.
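On SQLite 3.30 and later you can list every registered SQL function with the pragma_function_list table-valued function, which makes collision checks straightforward. A sketch with plain sqlite3 (the GraphQLite function names below are taken from this guide):

```python
import sqlite3

# List all registered SQL functions on the connection (SQLite 3.30+).
conn = sqlite3.connect(":memory:")
names = {row[0] for row in conn.execute("SELECT name FROM pragma_function_list")}

# Functions GraphQLite registers, per this guide. Any name already present
# would be shadowed by whichever extension loads last.
graphqlite_funcs = {"cypher", "regexp", "gql_load_graph"}
collisions = graphqlite_funcs & names
print("potential collisions:", collisions)
```

Run the check before loading each extension to see exactly which load introduced a conflicting name.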
Missing tables after loading
Ensure GraphQLite was loaded before any query that references graph tables. The schema is created lazily on first load; if the connection is closed and reopened, graphqlite.load() must be called again.
Transaction isolation
Some extensions use their own transaction management. If you encounter "table is locked" errors, commit between extension operations:
graphqlite.load(conn)
conn.execute("SELECT cypher('CREATE (n:Test {v: 1})')")
conn.commit() # Flush GraphQLite writes
# Now safe to use the other extension
conn.execute("INSERT INTO vec_table VALUES (?)", ["..."])
conn.commit()
Parameterized Queries
Parameterized queries pass values separately from the Cypher query text. They are the recommended approach for any query that incorporates user input, external data, or strings that may contain special characters.
Why Use Parameters
- Security. Parameters prevent Cypher injection — the same class of attack as SQL injection. A value like ' OR 1=1 -- in a parameter is treated as a literal string, not query syntax.
- Correctness. Values with single quotes, backslashes, newlines, or Unicode characters work without manual escaping.
- Clarity. Query logic and data stay separate, making queries easier to read and reuse.
Named Parameters with $
GraphQLite uses $name syntax for named parameters. Parameter names map to keys in the dictionary (Python) or JSON object (SQL) you pass alongside the query:
MATCH (n:Person {name: $name}) WHERE n.age > $min_age RETURN n
Parameters can appear anywhere a literal value is valid: in property predicates, WHERE clauses, SET assignments, and CREATE property maps.
Python: Connection.cypher()
Pass a dictionary as the second argument to Connection.cypher():
from graphqlite import connect
conn = connect(":memory:")
# CREATE with parameters
conn.cypher(
"CREATE (n:Person {name: $name, age: $age, city: $city})",
{"name": "Alice", "age": 30, "city": "London"}
)
# MATCH with parameters
results = conn.cypher(
"MATCH (n:Person) WHERE n.age >= $min_age AND n.city = $city RETURN n.name, n.age",
{"min_age": 25, "city": "London"}
)
for row in results:
print(f"{row['n.name']}, age {row['n.age']}")
# SET with parameters
conn.cypher(
"MATCH (n:Person {name: $name}) SET n.status = $status",
{"name": "Alice", "status": "active"}
)
Python: Graph.query() with Parameters
Graph.query() accepts an optional params argument:
from graphqlite import Graph
g = Graph(":memory:")
g.upsert_node("alice", {"name": "Alice", "age": 30}, "Person")
g.upsert_node("bob", {"name": "Bob", "age": 25}, "Person")
results = g.query(
"MATCH (n:Person) WHERE n.age >= $min_age RETURN n.name ORDER BY n.name",
params={"min_age": 26}
)
for row in results:
print(row["n.name"]) # Alice
SQL Interface
Pass the parameters as a JSON string in the second argument to cypher():
-- Single parameter
SELECT cypher(
'MATCH (n:Person {name: $name}) RETURN n.age',
'{"name": "Alice"}'
);
-- Multiple parameters
SELECT cypher(
'MATCH (n:Person) WHERE n.age >= $min AND n.age <= $max RETURN n.name',
'{"min": 25, "max": 35}'
);
-- CREATE with parameters
SELECT cypher(
'CREATE (n:Event {title: $title, year: $year})',
'{"title": "Graph Summit", "year": 2025}'
);
In Python with a raw sqlite3 connection:
import sqlite3, json, graphqlite
conn = sqlite3.connect(":memory:")
graphqlite.load(conn)
params = json.dumps({"name": "Alice", "age": 30})
conn.execute("SELECT cypher('CREATE (n:Person {name: $name, age: $age})', ?)", [params])
conn.commit()
params = json.dumps({"min_age": 25})
rows = conn.execute(
"SELECT cypher('MATCH (n:Person) WHERE n.age >= $min_age RETURN n.name', ?)",
[params]
).fetchall()
Rust
In Rust, embed parameter values directly into the query string using format!. Full parameterized binding is planned for a future release.
use graphqlite::Connection;

fn main() -> graphqlite::Result<()> {
    let conn = Connection::open_in_memory()?;

    // Safe integer embedding
    let min_age: i32 = 25;
    let results = conn.cypher(&format!(
        "MATCH (n:Person) WHERE n.age >= {} RETURN n.name AS name",
        min_age
    ))?;
    for row in &results {
        println!("{}", row.get::<String>("name")?);
    }

    // For strings, pass via JSON through the SQL cypher() function
    let name = "Alice";
    let params = serde_json::json!({"name": name, "age": 30});
    conn.execute_sql(
        "SELECT cypher('CREATE (n:Person {name: $name, age: $age})', ?)",
        &[&params.to_string()],
    )?;
    Ok(())
}
Supported Parameter Types
Parameters map to JSON types, which GraphQLite converts to Cypher-compatible values:
| JSON Type | Cypher Type | Python Example | Rust Type |
|---|---|---|---|
| String | String | "hello" | String, &str |
| Integer | Integer | 42 | i32, i64 |
| Float | Float | 3.14 | f64 |
| Boolean | Boolean | True / False | bool |
| Null | Null | None | Option<T> |
| Array | List | [1, 2, 3] | Vec<T> |
| Object | Map | {"k": "v"} | serde_json::Value |
conn.cypher(
"CREATE (n:Record {label: $label, count: $count, ratio: $ratio, active: $active, tags: $tags})",
{
"label": "alpha",
"count": 100,
"ratio": 0.75,
"active": True,
"tags": ["graph", "database", "cypher"],
}
)
Common Patterns
User Input Safety
Always parameterize user-provided values:
def find_person(user_input: str):
return conn.cypher(
"MATCH (n:Person {name: $name}) RETURN n",
{"name": user_input} # Safe regardless of what user_input contains
)
# These all work correctly and safely:
find_person("Alice")
find_person("O'Brien")
find_person("Robert'); DROP TABLE nodes;--")
Dynamic Filtering
Build the parameter dictionary dynamically; keep the query shape stable:
def search_people(name=None, min_age=None, city=None):
conditions = []
params = {}
if name is not None:
conditions.append("n.name = $name")
params["name"] = name
if min_age is not None:
conditions.append("n.age >= $min_age")
params["min_age"] = min_age
if city is not None:
conditions.append("n.city = $city")
params["city"] = city
where = f"WHERE {' AND '.join(conditions)}" if conditions else ""
query = f"MATCH (n:Person) {where} RETURN n.name, n.age, n.city ORDER BY n.name"
return conn.cypher(query, params if params else None)
IN Clause with Lists
Pass a list parameter and use IN:
names = ["Alice", "Bob", "Carol"]
results = conn.cypher(
"MATCH (n:Person) WHERE n.name IN $names RETURN n.name, n.age",
{"names": names}
)
Batch Inserts
Loop over a dataset and reuse the same parameterized query:
people = [
{"name": "Alice", "age": 30, "city": "London"},
{"name": "Bob", "age": 25, "city": "Paris"},
{"name": "Carol", "age": 35, "city": "Berlin"},
{"name": "Dave", "age": 28, "city": "London"},
]
for person in people:
conn.cypher(
"CREATE (n:Person {name: $name, age: $age, city: $city})",
person
)
For very large datasets (thousands of nodes), the Bulk Import API is significantly faster.
Values with Special Characters
Parameters handle all special characters automatically — no need for escape_string():
documents = [
{"id": "d1", "text": "It's a lovely day.\nThe sun is shining."},
{"id": "d2", "text": 'He said "hello" and left.'},
{"id": "d3", "text": "Path: C:\\Users\\alice\\documents"},
]
for doc in documents:
conn.cypher(
"CREATE (n:Document {doc_id: $id, content: $text})",
doc
)
Optional / Nullable Values
Pass None for parameters that may be absent:
conn.cypher(
"CREATE (n:Person {name: $name, nickname: $nickname})",
{"name": "Alice", "nickname": None} # nickname will be stored as null
)
Parameters vs. String Interpolation
Avoid building queries by string formatting or concatenation:
# Dangerous — susceptible to injection and escaping bugs
name = user_input
conn.cypher(f"MATCH (n {{name: '{name}'}}) RETURN n")
# Correct
conn.cypher("MATCH (n {name: $name}) RETURN n", {"name": name})
The only case where string formatting is appropriate is for structural parts of a query that cannot be parameterized, such as label names or property names. Even then, validate the value against an allowlist before interpolating it.
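An allowlist check can be as simple as a set membership test before interpolation. The label names here are hypothetical, chosen for illustration:

```python
# Structural parts of a query (labels, property names) cannot be parameterized,
# so validate them against a fixed allowlist before interpolating.
ALLOWED_LABELS = {"Person", "Document", "Note"}  # hypothetical allowlist

def build_match_query(label: str) -> str:
    if label not in ALLOWED_LABELS:
        raise ValueError(f"label not allowed: {label!r}")
    return f"MATCH (n:{label}) RETURN n"

print(build_match_query("Person"))  # MATCH (n:Person) RETURN n
try:
    build_match_query("Person) DETACH DELETE (m")  # injection attempt
except ValueError as e:
    print("rejected:", e)
```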
Bulk Importing Data
For loading large datasets into a GraphQLite graph, the bulk import API is 100–500x faster than issuing individual Cypher CREATE statements. This guide covers when to use it, how it works, and a complete worked example.
When to Use Bulk Import
| Scenario | Recommended approach |
|---|---|
| Interactive graph building, <1 000 nodes | upsert_node / upsert_edge or Cypher CREATE |
| Importing CSV / JSON datasets, >1 000 nodes | insert_graph_bulk or insert_nodes_bulk + insert_edges_bulk |
| Incremental updates to an existing graph | upsert_nodes_batch / upsert_edges_batch |
| Connecting new edges to existing nodes | insert_edges_bulk with resolve_node_ids |
The bulk API bypasses the Cypher parser and writes directly to the graph tables in a single transaction, eliminating per-query overhead.
Python Bulk API
insert_nodes_bulk
Insert a list of nodes in a single transaction. Returns an id_map dictionary mapping each external node ID to its internal SQLite rowid.
from graphqlite import Graph
g = Graph("company.db")
nodes = [
("emp_1", {"name": "Alice", "role": "engineer", "age": 30}, "Employee"),
("emp_2", {"name": "Bob", "role": "manager", "age": 40}, "Employee"),
("emp_3", {"name": "Carol", "role": "engineer", "age": 28}, "Employee"),
("dept_1", {"name": "Engineering", "budget": 500000}, "Department"),
("dept_2", {"name": "Product", "budget": 300000}, "Department"),
]
id_map = g.insert_nodes_bulk(nodes)
# id_map = {"emp_1": 1, "emp_2": 2, "emp_3": 3, "dept_1": 4, "dept_2": 5}
print(f"Inserted {len(id_map)} nodes")
Each entry in the nodes list is a tuple of (external_id, properties, label):
| Position | Type | Description |
|---|---|---|
| 0 | str | External identifier (any string) |
| 1 | dict | Dictionary of property key-value pairs |
| 2 | str | Node label |
insert_edges_bulk
Insert edges after nodes have been loaded. Uses the id_map returned by insert_nodes_bulk to resolve external IDs to internal rowids:
edges = [
("emp_1", "dept_1", {"since": 2020}, "WORKS_IN"),
("emp_2", "dept_1", {"since": 2018}, "WORKS_IN"),
("emp_3", "dept_2", {"since": 2022}, "WORKS_IN"),
("emp_2", "emp_1", {}, "MANAGES"),
("emp_2", "emp_3", {}, "MANAGES"),
]
g.insert_edges_bulk(edges, id_map)
print(f"Inserted {len(edges)} edges")
Each entry in the edges list is a tuple of (source_id, target_id, properties, rel_type):
| Position | Type | Description |
|---|---|---|
| 0 | str | External ID of the source node |
| 1 | str | External ID of the target node |
| 2 | dict | Dictionary of property key-value pairs (may be empty {}) |
| 3 | str | Relationship type |
insert_graph_bulk
Insert nodes and edges together in a single call. Internally calls insert_nodes_bulk then insert_edges_bulk:
nodes = [
("a", {"name": "Alice", "age": 30}, "Person"),
("b", {"name": "Bob", "age": 25}, "Person"),
("c", {"name": "Carol", "age": 35}, "Person"),
]
edges = [
("a", "b", {"since": 2019}, "KNOWS"),
("b", "c", {"since": 2021}, "KNOWS"),
]
result = g.insert_graph_bulk(nodes, edges)
print(f"Inserted {result.nodes_inserted} nodes, {result.edges_inserted} edges")
print(result.id_map) # {"a": 1, "b": 2, "c": 3}
The id_map Pattern
The id_map bridges the gap between your external IDs (strings like "emp_1") and the internal SQLite rowids that GraphQLite uses for edge resolution.
- insert_nodes_bulk returns {"emp_1": 1, "emp_2": 2, ...}.
- insert_edges_bulk uses this map to look up source and target rowids before writing.
- The external IDs are also stored as the user_id column on each node, so Cypher queries can still reference them by string: MATCH (n {id: 'emp_1'}).
If an external ID in an edge list is not present in id_map, insert_edges_bulk raises a KeyError. Validate your edge list against your node list before calling bulk insert.
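The validation is a pure-Python check over the edge tuples described above; a sketch of a helper you might write (missing_edge_ids is not part of the GraphQLite API):

```python
# Check an edge list against the id_map before calling insert_edges_bulk,
# so a missing ID is reported up front instead of raising KeyError mid-insert.
def missing_edge_ids(edges, id_map):
    """Return external IDs referenced by edges but absent from id_map."""
    known = set(id_map)
    return sorted(
        node_id
        for src, dst, _props, _rel in edges
        for node_id in (src, dst)
        if node_id not in known
    )

id_map = {"emp_1": 1, "emp_2": 2}
edges = [
    ("emp_1", "emp_2", {}, "MANAGES"),
    ("emp_1", "emp_9", {}, "MANAGES"),  # emp_9 was never inserted
]
print(missing_edge_ids(edges, id_map))  # ['emp_9']
```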
Connecting New Edges to Existing Nodes
When importing edges that reference nodes already in the database (from a previous import), use resolve_node_ids to build the id_map without re-inserting the nodes:
existing_ids = ["emp_1", "emp_2", "emp_3"]
id_map = g.resolve_node_ids(existing_ids)
# id_map = {"emp_1": 1, "emp_2": 2, "emp_3": 3}
# Now add new edges referencing those existing nodes
new_edges = [
("emp_1", "emp_3", {"project": "Phoenix"}, "COLLABORATES"),
]
g.insert_edges_bulk(new_edges, id_map)
This is the correct pattern when you import nodes in one batch and edges in a separate batch, or when you are enriching an existing graph with new relationships.
Batch Upsert
For incremental updates — adding or updating nodes and edges that may already exist — use the batch upsert methods. These use Cypher MERGE semantics (update if exists, create if not) and are slower than bulk insert but handle conflicts gracefully.
Non-atomicity warning: Batch upsert methods call upsert_node / upsert_edge in a loop. If an operation fails partway through, earlier operations will have already completed. For atomic batch inserts, use the bulk insert methods instead, or wrap the call in an explicit transaction via g.connection.sqlite_connection.
# Upsert multiple nodes
nodes_to_update = [
("emp_1", {"name": "Alice", "role": "senior engineer", "age": 31}, "Employee"),
("emp_4", {"name": "Dave", "role": "analyst", "age": 27}, "Employee"),
]
g.upsert_nodes_batch(nodes_to_update)
# Upsert multiple edges
edges_to_update = [
("emp_4", "dept_1", {"since": 2023}, "WORKS_IN"),
]
g.upsert_edges_batch(edges_to_update)
Each tuple in upsert_nodes_batch is (node_id, properties, label). Each tuple in upsert_edges_batch is (source_id, target_id, properties, rel_type).
Performance Comparison
Approximate timings for inserting 100 000 nodes + 200 000 edges on a modern laptop:
| Method | Time (approx.) |
|---|---|
| Cypher CREATE (one per statement) | 90–180 seconds |
| upsert_node / upsert_edge in a loop | 30–60 seconds |
| upsert_nodes_batch / upsert_edges_batch | 10–20 seconds |
| insert_nodes_bulk + insert_edges_bulk | 0.5–2 seconds |
Bulk insert achieves its speed by:
- Writing all rows in a single SQLite transaction.
- Bypassing the Cypher parser entirely.
- Preparing INSERT statements once and reusing them.
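The same three levers apply to plain SQLite, and can be demonstrated with the stdlib alone: one transaction for the whole batch, and one prepared statement reused for every row via executemany(). This is a sketch of the general pattern, not GraphQLite's internal implementation:

```python
import sqlite3

# Batch insert: one transaction, one prepared statement reused per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes_demo (id TEXT, name TEXT)")

rows = [(f"n{i}", f"name{i}") for i in range(10_000)]
with conn:  # single transaction for the whole batch
    conn.executemany("INSERT INTO nodes_demo VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM nodes_demo").fetchone()[0]
print(count)  # 10000
```

Compared with committing after every row, batching like this avoids one fsync per insert, which is where most of the per-statement overhead goes.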
Complete Example: Importing a CSV
This example imports a CSV of employees and a CSV of manager relationships:
import csv
from graphqlite import Graph
g = Graph("hr.db")
# --- Load employees.csv ---
# Columns: id, name, department, age, salary
nodes = []
dept_set = set()
with open("employees.csv") as f:
for row in csv.DictReader(f):
nodes.append((
row["id"],
{
"name": row["name"],
"department": row["department"],
"age": int(row["age"]),
"salary": float(row["salary"]),
},
"Employee",
))
dept_set.add(row["department"])
print(f"Loading {len(nodes)} employees ...")
id_map = g.insert_nodes_bulk(nodes)
# --- Load department nodes (deduplicated from employee data) ---
dept_nodes = [
(f"dept_{d}", {"name": d}, "Department")
for d in dept_set
]
dept_id_map = g.insert_nodes_bulk(dept_nodes)
# Combine id_maps for edge resolution
full_id_map = {**id_map, **dept_id_map}
# Add WORKS_IN edges from employees to departments
dept_edges = [
(ext_id, f"dept_{props['department']}", {}, "WORKS_IN")
for ext_id, props, _label in nodes
]
g.insert_edges_bulk(dept_edges, full_id_map)
# --- Load managers.csv ---
# Columns: employee_id, manager_id
edges = []
with open("managers.csv") as f:
for row in csv.DictReader(f):
edges.append((
row["manager_id"],
row["employee_id"],
{},
"MANAGES",
))
print(f"Loading {len(edges)} manager relationships ...")
g.insert_edges_bulk(edges, id_map)
print(g.stats())
# Query to verify
results = g.query("""
MATCH (mgr:Employee)-[:MANAGES]->(emp:Employee)
RETURN mgr.name AS manager, emp.name AS report
ORDER BY mgr.name, emp.name
LIMIT 10
""")
for row in results:
print(f" {row['manager']} manages {row['report']}")
Rust Bulk Import
The Rust Graph API exposes equivalent batch methods:
use graphqlite::Graph;

fn main() -> graphqlite::Result<()> {
    let g = Graph::open("company.db")?;

    // Bulk insert nodes
    let nodes = vec![
        ("emp_1", vec![("name", "Alice"), ("role", "engineer"), ("age", "30")], "Employee"),
        ("emp_2", vec![("name", "Bob"), ("role", "manager"), ("age", "40")], "Employee"),
        ("emp_3", vec![("name", "Carol"), ("role", "engineer"), ("age", "28")], "Employee"),
    ];
    g.upsert_nodes_batch(nodes)?;

    // Bulk insert edges
    let edges = vec![
        ("emp_2", "emp_1", vec![("since", "2020")], "MANAGES"),
        ("emp_2", "emp_3", vec![("since", "2021")], "MANAGES"),
    ];
    g.upsert_edges_batch(edges)?;

    let stats = g.stats()?;
    println!("Nodes: {}, Edges: {}", stats.nodes, stats.edges);

    // Verify
    let results = g.query("MATCH (m:Employee)-[:MANAGES]->(e:Employee) RETURN m.name, e.name")?;
    for row in &results {
        println!(
            "{} manages {}",
            row.get::<String>("m.name")?,
            row.get::<String>("e.name")?
        );
    }
    Ok(())
}
The Rust API currently exposes upsert_nodes_batch and upsert_edges_batch (which use INSERT OR REPLACE). For maximum throughput on very large imports, call the Python bulk API via the Python bindings, or build the graph in Python and use it from Rust.
Tips
- Wrap bulk inserts in a transaction if you call insert_nodes_bulk and insert_edges_bulk separately, to ensure atomicity:

with g.connection.sqlite_connection:
    id_map = g.insert_nodes_bulk(nodes)
    g.insert_edges_bulk(edges, id_map)

- Validate before inserting. Check that all edge source/target IDs exist in your node list before calling insert_edges_bulk. Missing IDs raise a KeyError mid-insert, which can leave the database in a partial state.
- Reload the graph cache after a bulk import if you plan to run algorithms immediately:

g.insert_graph_bulk(nodes, edges)
g.reload_graph()
results = g.pagerank()
Building from Source
This guide covers building the GraphQLite SQLite extension and CLI from source, running tests, checking code quality, and installing the result for use with the Rust crate.
Prerequisites
| Tool | Minimum Version | macOS | Linux (Debian/Ubuntu) | Windows (MSYS2) |
|---|---|---|---|---|
| GCC or Clang | 9 / 11 | Xcode CLI tools | build-essential | mingw-w64-x86_64-gcc |
| Bison | 3.0+ | brew install bison | bison | bison |
| Flex | 2.6+ | brew install flex | flex | flex |
| SQLite dev headers | 3.35+ | brew install sqlite | libsqlite3-dev | mingw-w64-x86_64-sqlite3 |
| CUnit | 2.1+ | brew install cunit | libcunit1-dev | (optional) |
| make | 4.0+ | Xcode CLI tools | make | make |
macOS
xcode-select --install
brew install bison flex sqlite cunit
# Homebrew Bison must precede the system Bison on PATH
export PATH="$(brew --prefix bison)/bin:$PATH"
Add the PATH export to your shell profile (~/.zshrc or ~/.bash_profile) to make it persistent.
Linux (Debian/Ubuntu)
sudo apt-get update
sudo apt-get install build-essential bison flex libsqlite3-dev libcunit1-dev
Windows (MSYS2)
pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-sqlite3 bison flex make
Run all commands from the MSYS2 MinGW 64-bit shell.
Clone and Build
git clone https://github.com/your-org/graphqlite
cd graphqlite
make extension
This produces:
| Platform | Output file |
|---|---|
| macOS | build/graphqlite.dylib |
| Linux | build/graphqlite.so |
| Windows | build/graphqlite.dll |
Debug vs. Release
The default build includes debug symbols and C assertions. For a production build:
make extension RELEASE=1
RELEASE=1 adds -O2 optimization and strips assertions. Always use release builds for benchmarking.
Build Targets
Run make help to see all available targets. The most commonly used ones are:
Core Targets
| Target | Description |
|---|---|
make extension | Build the SQLite extension (.dylib/.so/.dll) |
make extension RELEASE=1 | Build optimized release extension |
make graphqlite | Build the gqlite interactive CLI |
make graphqlite RELEASE=1 | Build optimized release CLI |
make all | Build everything (extension + CLI) |
Test Targets
| Target | Description |
|---|---|
make test unit | Run CUnit unit tests (770 tests) |
make test functional | Run SQL-based functional tests |
make test python | Run Python binding tests |
make test rust | Build and run Rust binding tests |
make test-all | Run all test suites |
Quality Targets
| Target | Description |
|---|---|
make lint | Strict C11 compliance check |
make coverage | Build with coverage instrumentation and run unit tests |
make performance | Run performance benchmarks |
make performance-quick | Quick performance smoke test |
Install Targets
| Target | Description |
|---|---|
make install-bundled | Copy extension into the Rust crate's source tree |
Running Tests
Unit Tests
make test unit
Builds and runs the CUnit test suite. Output shows pass/fail counts per suite. All 770 tests must pass on a clean build.
Functional Tests
make test functional
Runs SQL script tests against the built extension. Each test file is a .sql script in tests/functional/ that exercises a specific feature:
# Run a single functional test manually
sqlite3 :memory: < tests/functional/01_create_match.sql
Python Tests
make test python
Requires Python 3.8+ and pip install graphqlite[dev] (or install the dev dependencies from requirements-dev.txt).
Rust Tests
make test rust
This target first calls make install-bundled to copy the freshly built extension into the Rust crate, then runs cargo test.
All Tests
make test-all
Runs unit, functional, CLI, and binding tests in sequence. All suites must pass before a release.
Linting
make lint
The lint target compiles all C source files with strict C11 flags:
-std=c11 -Wall -Wextra -Wpedantic -Werror -Wshadow -Wformat=2
No warnings are tolerated (-Werror). Fix all lint warnings before submitting changes. The lint target does not link the extension; it only checks the source.
Coverage
make coverage
Builds the unit test runner with --coverage instrumentation (GCC gcov/lcov) and runs the tests. Coverage data appears in build/coverage/. Open build/coverage/index.html in a browser to browse line-by-line coverage.
Requirements: lcov must be installed (brew install lcov / sudo apt-get install lcov).
Performance Benchmarks
# Full benchmark suite
make performance
# Quick smoke test (faster, fewer iterations)
make performance-quick
# Extended benchmark with larger graphs
make performance-full
Benchmark results are printed to stdout. Run with RELEASE=1 for meaningful numbers:
make performance RELEASE=1
Installing for Rust Bundled Builds
The Rust crate defaults to the bundled-extension feature, which embeds the extension binary at compile time. After building from source, copy the extension into the Rust crate:
make install-bundled
This copies build/graphqlite.{dylib,so,dll} into the correct location inside bindings/rust/ so that cargo build picks it up. Run this after every source change when developing the Rust crate.
Platform-Specific Notes
macOS
- The system bison (in /usr/bin) is version 2.x, which is too old. Always install via Homebrew and prepend it to PATH.
- If you see library not loaded: libsqlite3.dylib when loading the built extension, set DYLD_LIBRARY_PATH:

  export DYLD_LIBRARY_PATH="$(brew --prefix sqlite)/lib:$DYLD_LIBRARY_PATH"

- On Apple Silicon (M1/M2), the Homebrew prefix is /opt/homebrew rather than /usr/local.
Linux
- The libcunit1-dev package is only needed for unit tests. The extension itself (make extension) does not require CUnit.
- On systems without lcov, make coverage will fail at the report generation step. Install lcov or skip coverage.
- On Alpine Linux (musl libc), replace apt-get with apk add:

  apk add build-base bison flex sqlite-dev
Windows (MSYS2)
- All commands must be run from the MSYS2 MinGW 64-bit shell, not the default MSYS shell. The MinGW shell sets the correct compiler paths.
- Python tests require a Windows Python 3.8+ installation accessible from the MSYS2 PATH.
- The extension output is build/graphqlite.dll.
Build Directory Layout
After a full build:
build/
├── graphqlite.dylib # SQLite extension (macOS)
├── gqlite # Interactive CLI binary
├── test_runner # CUnit test runner binary
├── parser/ # Compiled parser objects
├── transform/ # Compiled transform objects
├── executor/ # Compiled executor objects
└── coverage/ # Coverage HTML report (after make coverage)
Troubleshooting
bison: syntax error during build
The system Bison is too old. Confirm the version:
bison --version
If it shows 2.x, install Homebrew Bison (brew install bison) and prepend it to PATH.
flex: command not found
Install flex: brew install flex / sudo apt-get install flex.
cannot open shared object file: libcunit.so
CUnit is not installed or not on LD_LIBRARY_PATH. Install it (libcunit1-dev) or run make extension (which does not need CUnit) instead of make test unit.
Python tests fail with No module named graphqlite
Install the Python package in development mode:
pip install -e bindings/python/
or install it from PyPI and use GRAPHQLITE_EXTENSION_PATH to point to your locally built extension:
export GRAPHQLITE_EXTENSION_PATH=$(pwd)/build/graphqlite.dylib
make test python
Cypher Support Reference
GraphQLite implements a substantial subset of openCypher. This page is a quick-reference index; details are in the sub-pages.
Clauses
| Clause | Status | Notes |
|---|---|---|
MATCH | Supported | Node, relationship, variable-length, named path patterns |
OPTIONAL MATCH | Supported | Left outer join semantics |
CREATE | Supported | Nodes and relationships |
MERGE | Supported | ON CREATE SET, ON MATCH SET |
SET | Supported | Property assign, map replace (=), map merge (+=), label add |
REMOVE | Supported | Property removal, label removal |
DELETE | Supported | Nodes and relationships |
DETACH DELETE | Supported | Cascading edge removal |
RETURN | Supported | AS, DISTINCT, ORDER BY, LIMIT, SKIP, * |
WITH | Supported | Aggregation, filtering, projection between clauses |
WHERE | Supported | All predicates; pattern predicates |
UNWIND | Supported | List expansion |
FOREACH | Supported | Mutation inside list iteration |
UNION / UNION ALL | Supported | |
LOAD CSV WITH HEADERS FROM | Supported | Local file paths |
FROM | Supported | Multi-graph queries (GraphQLite extension) |
CALL {} subqueries | Not supported | |
CALL procedure | Not supported | No procedure registry |
CREATE INDEX | Not supported | Schema is managed automatically |
CASE in SET | Not supported | Use CASE in RETURN/WITH instead |
Nested FOREACH | Not supported | |
EXPLAIN / PROFILE | Not supported |
Functions
| Category | Functions |
|---|---|
| String | toUpper, toLower, trim, ltrim, rtrim, btrim, substring, replace, reverse, left, right, split, toString, size, isEmpty, char_length, character_length |
| Math | abs, ceil, floor, round, sqrt, sign, log, log10, exp, e, pi, rand, toInteger, toFloat |
| Trigonometry | sin, cos, tan, asin, acos, atan, atan2, degrees, radians, cot, haversin, sinh, cosh, tanh, coth, isNaN |
| List | size, head, tail, last, range, collect, keys, reduce, [expr FOR x IN list [WHERE cond]] |
| Aggregation | count, sum, avg, min, max, collect, stdev, stdevp |
| Entity | id, elementId, labels, type, properties, startNode, endNode, nodes, relationships, length |
| Type conversion | toString, toInteger, toFloat, toBoolean, toStringOrNull, toIntegerOrNull, toFloatOrNull, toBooleanOrNull, valueType |
| Temporal | date, time, datetime, localdatetime, duration, datetime.fromepoch, datetime.fromepochmillis, duration.inDays, duration.inSeconds, date.truncate |
| Spatial | point, distance, point.withinBBox |
| Predicate | exists, coalesce, nullIf |
| CASE | CASE WHEN … THEN … END, CASE expr WHEN v THEN … END |
| Graph algorithms | pageRank, labelPropagation, louvain, dijkstra, astar, degreeCentrality, betweennessCentrality, closenessCentrality, eigenvectorCentrality, weaklyConnectedComponents, stronglyConnectedComponents, bfs, dfs, nodeSimilarity, knn, triangleCount, apsp, shortestPath |
Operators
| Category | Operators |
|---|---|
| Arithmetic | +, -, *, /, % |
| Comparison | =, <>, <, >, <=, >= |
| Boolean | AND, OR, NOT, XOR |
| String | STARTS WITH, ENDS WITH, CONTAINS, =~ (regex) |
| List | IN, + (concat), [index], [start..end] (slice) |
| Null | IS NULL, IS NOT NULL |
| Property access | . (dot notation), ['key'] (subscript) |
Not Supported
- CALL {} correlated subqueries
- CALL procedure(...) procedure invocations
- CREATE INDEX ON :Label(prop)
- EXPLAIN / PROFILE
- CASE expressions on the left-hand side of SET
- Nested FOREACH
Sub-pages
- Clauses — syntax, description, and examples for every clause
- Functions — signature, return type, and example for every function
- Operators — all operators with precedence table
Cypher Clauses
MATCH
Syntax
MATCH pattern [WHERE condition]
Reads nodes and relationships matching the given pattern and binds variables to the matched entities. Multiple comma-separated patterns in one MATCH form a cross-product constrained by shared variables. A WHERE clause immediately following MATCH filters the matched rows before subsequent clauses run.
Pattern forms
| Pattern | Example |
|---|---|
| Node | (n:Label) |
| Relationship | (a)-[r:TYPE]->(b) |
| Undirected relationship | (a)-[r:TYPE]-(b) |
| Variable-length | (a)-[*1..3]->(b) |
| Named path | p = (a)-[*]->(b) |
| Multiple labels | (n:Person:Employee) |
| Property filter | (n:Person {name: 'Alice'}) |
Examples
MATCH (n:Person) RETURN n.name
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.name = 'Alice'
RETURN b.name
MATCH p = (a)-[*1..3]->(b)
RETURN nodes(p)
OPTIONAL MATCH
Syntax
OPTIONAL MATCH pattern [WHERE condition]
Left outer join. Rows from preceding clauses are preserved even when no match exists. Unmatched variables are bound to null.
Example
MATCH (n:Person)
OPTIONAL MATCH (n)-[:HAS_PET]->(p:Pet)
RETURN n.name, p.name
CREATE
Syntax
CREATE pattern
Creates nodes and/or relationships. Variables introduced in CREATE are available in subsequent clauses.
Examples
CREATE (n:Person {name: 'Alice', age: 30})
MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
CREATE (a)-[:KNOWS {since: 2020}]->(b)
MERGE
Syntax
MERGE pattern
[ON CREATE SET assignment [, assignment ...]]
[ON MATCH SET assignment [, assignment ...]]
Matches the pattern or creates it if no match exists. ON CREATE SET executes only on creation; ON MATCH SET executes only when the pattern already exists. Both subclauses are optional and independent.
Examples
MERGE (n:Person {name: 'Alice'})
ON CREATE SET n.created = datetime()
ON MATCH SET n.updated = datetime()
MERGE (a:Person {name: 'Bob'})-[:KNOWS]->(b:Person {name: 'Carol'})
SET
Syntax
SET item [, item ...]
Assignment forms
| Form | Behavior |
|---|---|
n.prop = expr | Set or overwrite one property |
n = {map} | Replace all properties with the map; unlisted properties are removed |
n += {map} | Merge map into existing properties; unlisted properties are kept |
n:Label | Add a label to a node |
Examples
MATCH (n:Person {name: 'Alice'}) SET n.age = 31
MATCH (n:Person {name: 'Alice'}) SET n = {name: 'Alice', age: 31}
MATCH (n:Person {name: 'Alice'}) SET n += {age: 31, city: 'NYC'}
MATCH (n:Person {name: 'Alice'}) SET n:Employee
REMOVE
Syntax
REMOVE item [, item ...]
Item forms
| Form | Behavior |
|---|---|
n.prop | Delete a property from a node or relationship |
n:Label | Remove a label from a node |
Examples
MATCH (n:Person {name: 'Alice'}) REMOVE n.age
MATCH (n:Employee) REMOVE n:Employee
DELETE
Syntax
DELETE expr [, expr ...]
Deletes nodes or relationships. Attempting to delete a node that still has relationships raises an error. Use DETACH DELETE to cascade.
Examples
MATCH (n:Temp) DELETE n
MATCH (a)-[r:OLD]->(b) DELETE r
DETACH DELETE
Syntax
DETACH DELETE expr [, expr ...]
Deletes a node and all its incident relationships in one operation.
Example
MATCH (n:Person {name: 'Alice'}) DETACH DELETE n
RETURN
Syntax
RETURN [DISTINCT] expr [AS alias] [, ...]
[ORDER BY expr [ASC|DESC] [, ...]]
[SKIP expr]
[LIMIT expr]
Projects query results into the result set. * expands to all in-scope variables.
Modifiers
| Modifier | Description |
|---|---|
DISTINCT | Remove duplicate rows from output |
AS alias | Assign a column name |
ORDER BY expr [ASC|DESC] | Sort; default direction is ASC |
SKIP n | Skip the first n rows |
LIMIT n | Return at most n rows |
* | Return all variables in scope |
Examples
MATCH (n:Person)
RETURN n.name AS name, n.age
ORDER BY n.age DESC
LIMIT 10
MATCH (n:Person) RETURN DISTINCT n.city
MATCH (n) RETURN *
WITH
Syntax
WITH [DISTINCT] expr [AS alias] [, ...]
[ORDER BY expr [ASC|DESC] [, ...]]
[SKIP expr]
[LIMIT expr]
[WHERE condition]
Pipelines intermediate results between query parts. Variables not listed in WITH go out of scope. Aggregation in WITH collapses rows; WHERE after WITH filters the aggregated result.
Examples
MATCH (n:Person)-[:KNOWS]->(m)
WITH n, count(m) AS friends
WHERE friends > 3
RETURN n.name, friends
MATCH (n:Person)
WITH n ORDER BY n.age LIMIT 5
MATCH (n)-[:KNOWS]->(m)
RETURN n.name, m.name
WHERE
Syntax
WHERE condition
Filters rows. Appears after MATCH, OPTIONAL MATCH, or WITH. Supports all comparison operators, boolean operators, string predicates, IS NULL, IS NOT NULL, IN, and pattern predicates.
Predicate forms
| Predicate | Example |
|---|---|
| Comparison | n.age > 25 |
| String | n.name STARTS WITH 'Al' |
| Regex | n.name =~ 'Al.*' |
| Null check | n.email IS NOT NULL |
| List membership | n.role IN ['admin', 'mod'] |
| Pattern existence | (n)-[:KNOWS]->(m) |
| Negated pattern | NOT (n)-[:BLOCKED]->(m) |
| exists{} | exists{(n)-[:KNOWS]->(m)} |
Examples
MATCH (n:Person)
WHERE n.age > 25 AND n.city = 'NYC'
RETURN n.name
MATCH (n:Person)
WHERE (n)-[:KNOWS]->(:Person {name: 'Bob'})
RETURN n.name
UNWIND
Syntax
UNWIND expr AS variable
Expands a list into one row per element. A null or empty list produces zero rows.
Examples
UNWIND [1, 2, 3] AS x RETURN x
MATCH (n:Person)
UNWIND labels(n) AS lbl
RETURN n.name, lbl
FOREACH
Syntax
FOREACH (variable IN list | update_clause [update_clause ...])
Executes mutation clauses (CREATE, MERGE, SET, REMOVE, DELETE) for each element of a list. Variables introduced inside FOREACH are scoped to the body and not available outside. Nested FOREACH is not supported.
Examples
FOREACH (name IN ['Alice', 'Bob', 'Carol'] |
CREATE (:Person {name: name})
)
MATCH p = (a:Person)-[:KNOWS*]->(b:Person)
FOREACH (n IN nodes(p) | SET n.visited = true)
UNION / UNION ALL
Syntax
query
UNION [ALL]
query
Combines results from two queries. Column names and count must match. UNION deduplicates rows; UNION ALL preserves all rows including duplicates.
Example
MATCH (n:Person) RETURN n.name AS name
UNION
MATCH (n:Company) RETURN n.name AS name
LOAD CSV WITH HEADERS FROM
Syntax
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
[FIELDTERMINATOR char]
Reads a CSV file and binds each row as a map keyed by header names. All values are strings; use toInteger(), toFloat(), or toBoolean() as needed.
Example
LOAD CSV WITH HEADERS FROM 'file:///data/people.csv' AS row
CREATE (:Person {name: row.name, age: toInteger(row.age)})
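The row-as-string-map behavior has a direct plain-Python analogue in the standard library's csv.DictReader, which also yields one dict per row keyed by the header line, with every value as a string:

```python
# Plain-Python analogue of LOAD CSV WITH HEADERS: each row is a map keyed
# by the header names, and all values arrive as strings.
import csv
import io

data = "name,age\nAlice,30\nBob,25\n"
rows = list(csv.DictReader(io.StringIO(data)))
assert rows[0] == {"name": "Alice", "age": "30"}  # age is still a string
assert int(rows[0]["age"]) == 30                  # convert explicitly, like toInteger()
```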
FROM (Multi-graph)
Syntax
FROM 'db_path'
MATCH ...
GraphQLite extension clause. Queries a different SQLite database file. The target database must have the GraphQLite schema initialized.
Example
FROM 'social.db'
MATCH (n:Person)
RETURN n.name
Pattern Predicates in WHERE
A pattern used as a boolean expression inside WHERE evaluates to true if at least one match exists. Supports full pattern syntax including property filters, relationship type filters, and variable-length paths.
WHERE (a)-[:KNOWS]->(b)
WHERE NOT (a)-[:BLOCKED]->(b)
WHERE (a)-[:KNOWS*1..3]->(b)
WHERE exists{(a)-[:KNOWS]->(:Person {active: true})}
Pattern predicates can reference variables bound in preceding MATCH clauses.
Cypher Functions
Every function available in GraphQLite Cypher queries, organized by category.
String Functions
| Signature | Returns | Description |
|---|---|---|
toUpper(s) | String | Convert to uppercase |
toLower(s) | String | Convert to lowercase |
trim(s) | String | Remove leading and trailing whitespace |
ltrim(s) | String | Remove leading whitespace |
rtrim(s) | String | Remove trailing whitespace |
btrim(s) | String | Remove leading and trailing whitespace (alias of trim) |
substring(s, start) | String | Substring from start (0-based) to end |
substring(s, start, len) | String | Substring of length len from start |
replace(s, search, replacement) | String | Replace all occurrences of search with replacement |
reverse(s) | String | Reverse characters |
left(s, n) | String | First n characters |
right(s, n) | String | Last n characters |
split(s, delimiter) | List<String> | Split string into list of substrings |
toString(val) | String | Convert any value to its string representation |
size(s) | Integer | Number of characters in string |
isEmpty(s) | Boolean | true if string has length zero or is null |
char_length(s) | Integer | Number of characters (alias of size) |
character_length(s) | Integer | Number of characters (alias of size) |
Examples
RETURN toUpper('hello') -- 'HELLO'
RETURN substring('abcdef', 2, 3) -- 'cde'
RETURN split('a,b,c', ',') -- ['a', 'b', 'c']
RETURN left('abcdef', 3) -- 'abc'
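As a quick mental model, these string functions map closely onto Python's built-in str methods; the one trap is that substring(s, start, len) takes a length, while a Python slice takes an end index:

```python
# Plain-Python equivalents of the string functions above.
s = "abcdef"
assert s.upper() == "ABCDEF"                  # toUpper(s)
assert s[2:2 + 3] == "cde"                    # substring(s, 2, 3): start + length
assert "a,b,c".split(",") == ["a", "b", "c"]  # split(s, ',')
assert s[:3] == "abc" and s[-3:] == "def"     # left(s, 3) / right(s, 3)
```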
Math Functions
| Signature | Returns | Description |
|---|---|---|
abs(n) | Number | Absolute value |
ceil(n) | Integer | Ceiling (smallest integer >= n) |
floor(n) | Integer | Floor (largest integer <= n) |
round(n) | Integer | Round to nearest integer |
round(n, precision) | Float | Round to precision decimal places |
sqrt(n) | Float | Square root |
sign(n) | Integer | -1, 0, or 1 |
log(n) | Float | Natural logarithm |
log10(n) | Float | Base-10 logarithm |
exp(n) | Float | e raised to the power n |
e() | Float | Euler's number (2.718…) |
pi() | Float | Pi (3.141…) |
rand() | Float | Random float in [0, 1) |
toInteger(val) | Integer | Convert to integer; null on failure |
toFloat(val) | Float | Convert to float; null on failure |
Examples
RETURN abs(-5) -- 5
RETURN round(3.567, 2) -- 3.57
RETURN sqrt(16) -- 4.0
RETURN rand() -- e.g. 0.7341...
Trigonometric Functions
| Signature | Returns | Description |
|---|---|---|
sin(n) | Float | Sine (radians) |
cos(n) | Float | Cosine (radians) |
tan(n) | Float | Tangent (radians) |
asin(n) | Float | Arcsine; result in radians |
acos(n) | Float | Arccosine; result in radians |
atan(n) | Float | Arctangent; result in radians |
atan2(y, x) | Float | Two-argument arctangent |
degrees(n) | Float | Radians to degrees |
radians(n) | Float | Degrees to radians |
cot(n) | Float | Cotangent |
haversin(n) | Float | Half the versine of n |
sinh(n) | Float | Hyperbolic sine |
cosh(n) | Float | Hyperbolic cosine |
tanh(n) | Float | Hyperbolic tangent |
coth(n) | Float | Hyperbolic cotangent |
isNaN(n) | Boolean | true if n is NaN |
Examples
RETURN degrees(pi()) -- 180.0
RETURN atan2(1.0, 1.0) -- 0.7853...
List Functions
| Signature | Returns | Description |
|---|---|---|
size(list) | Integer | Number of elements |
head(list) | Any | First element; null if empty |
tail(list) | List | All elements except the first; empty list if input has 0 or 1 elements |
last(list) | Any | Last element; null if empty |
range(start, end) | List<Integer> | Inclusive integer range with step 1 |
range(start, end, step) | List<Integer> | Inclusive integer range with given step |
collect(expr) | List | Aggregate: collect non-null values into a list |
keys(node_or_map) | List<String> | Property key names of a node, relationship, or map |
reduce(acc = init, x IN list | expr) | Any | Fold list into single value |
[expr FOR x IN list] | List | List comprehension without filter |
[expr FOR x IN list WHERE cond] | List | List comprehension with filter |
Examples
RETURN range(1, 5) -- [1, 2, 3, 4, 5]
RETURN range(0, 10, 2) -- [0, 2, 4, 6, 8, 10]
RETURN head([1, 2, 3]) -- 1
RETURN tail([1, 2, 3]) -- [2, 3]
RETURN reduce(total = 0, x IN [1,2,3] | total + x) -- 6
RETURN [x * 2 FOR x IN [1,2,3] WHERE x > 1] -- [4, 6]
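reduce and the list comprehension form behave like their Python counterparts, with one difference worth remembering: Cypher's range() is end-inclusive, while Python's is end-exclusive:

```python
# Plain-Python equivalents of the list-function examples above.
from functools import reduce

assert list(range(1, 5 + 1)) == [1, 2, 3, 4, 5]            # Cypher range(1, 5) is inclusive
assert reduce(lambda total, x: total + x, [1, 2, 3], 0) == 6  # reduce(total = 0, x IN [1,2,3] | total + x)
assert [x * 2 for x in [1, 2, 3] if x > 1] == [4, 6]       # [x * 2 FOR x IN [1,2,3] WHERE x > 1]
```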
Aggregation Functions
Aggregation functions collapse multiple rows into one. They are valid in RETURN and WITH.
| Signature | Returns | Description |
|---|---|---|
count(expr) | Integer | Count of non-null values |
count(*) | Integer | Count of rows |
sum(expr) | Number | Sum of numeric values |
avg(expr) | Float | Arithmetic mean of numeric values |
min(expr) | Any | Minimum value |
max(expr) | Any | Maximum value |
collect(expr) | List | List of non-null values |
stdev(expr) | Float | Sample standard deviation |
stdevp(expr) | Float | Population standard deviation |
Examples
MATCH (n:Person) RETURN count(n), avg(n.age), collect(n.name)
MATCH (n:Person) RETURN count(*) AS total
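The stdev / stdevp distinction is the usual sample-versus-population one, matching Python's statistics.stdev (divide by n - 1) and statistics.pstdev (divide by n):

```python
# Sample vs. population standard deviation, mirroring stdev / stdevp.
import statistics

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
assert statistics.pstdev(xs) == 2.0              # stdevp: sqrt(sum((x - mean)^2) / n)
assert round(statistics.stdev(xs), 4) == 2.1381  # stdev: sqrt(sum((x - mean)^2) / (n - 1))
```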
Entity Functions
| Signature | Returns | Description |
|---|---|---|
id(entity) | Integer | Internal numeric ID of a node or relationship |
elementId(entity) | String | String form of internal ID |
labels(node) | List<String> | All labels of a node |
type(rel) | String | Relationship type name |
properties(entity) | Map | All properties as a map |
startNode(rel) | Node | Source node of a relationship |
endNode(rel) | Node | Target node of a relationship |
nodes(path) | List<Node> | Ordered list of nodes in a path |
relationships(path) | List<Relationship> | Ordered list of relationships in a path |
length(path) | Integer | Number of relationships in a path |
Examples
MATCH (n:Person) RETURN id(n), labels(n)
MATCH ()-[r]->() RETURN type(r)
MATCH p = (a)-[*]->(b) RETURN length(p), nodes(p)
Type Conversion Functions
| Signature | Returns | Description |
|---|---|---|
toString(val) | String | Convert to string; error on unconvertible types |
toInteger(val) | Integer | Convert to integer; error on unconvertible types |
toFloat(val) | Float | Convert to float; error on unconvertible types |
toBoolean(val) | Boolean | Convert to boolean; error on unconvertible types |
toStringOrNull(val) | String | null | Convert to string; null on failure |
toIntegerOrNull(val) | Integer | null | Convert to integer; null on failure |
toFloatOrNull(val) | Float | null | Convert to float; null on failure |
toBooleanOrNull(val) | Boolean | null | Convert to boolean; null on failure |
valueType(val) | String | Returns a string naming the Cypher type: "INTEGER", "FLOAT", "STRING", "BOOLEAN", "NULL", "LIST", "MAP", "NODE", "RELATIONSHIP", "PATH" |
Examples
RETURN toInteger('42') -- 42
RETURN toFloatOrNull('abc') -- null
RETURN valueType(3.14) -- 'FLOAT'
RETURN toBoolean('true') -- true
Temporal Functions
| Signature | Returns | Description |
|---|---|---|
date({year, month, day}) | Date | Construct a date |
time({hour, minute, second}) | Time | Construct a time |
datetime({year, month, day, hour, minute, second}) | DateTime | Construct a datetime |
localdatetime({year, month, day, hour, minute, second}) | LocalDateTime | Construct a local datetime (no timezone) |
duration({days, hours, minutes, seconds}) | Duration | Construct a duration; all fields optional |
datetime.fromepoch(seconds) | DateTime | DateTime from Unix epoch seconds |
datetime.fromepochmillis(ms) | DateTime | DateTime from Unix epoch milliseconds |
duration.inDays(d1, d2) | Duration | Duration between two dates in days |
duration.inSeconds(d1, d2) | Duration | Duration between two datetimes in seconds |
date.truncate(unit, date) | Date | Truncate date to unit: 'year', 'month', 'week', 'day' |
Examples
RETURN date({year: 2024, month: 3, day: 15})
RETURN datetime.fromepoch(1700000000)
RETURN duration({days: 7, hours: 12})
RETURN date.truncate('month', date({year: 2024, month: 3, day: 15}))
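The epoch constructors correspond to interpreting a Unix timestamp; Python's datetime.fromtimestamp with an explicit UTC zone gives a deterministic plain-Python equivalent:

```python
# Plain-Python analogue of datetime.fromepoch / datetime.fromepochmillis.
from datetime import datetime, timezone

dt = datetime.fromtimestamp(1700000000, tz=timezone.utc)
assert dt.isoformat() == "2023-11-14T22:13:20+00:00"

# fromepochmillis: divide milliseconds by 1000 first.
dt_ms = datetime.fromtimestamp(1700000000123 / 1000, tz=timezone.utc)
assert dt_ms.year == 2023
```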
Spatial Functions
| Signature | Returns | Description |
|---|---|---|
point({x, y}) | Point | 2D Cartesian point |
point({x, y, z}) | Point | 3D Cartesian point |
point({latitude, longitude}) | Point | 2D geographic point (WGS-84) |
point({latitude, longitude, height}) | Point | 3D geographic point (WGS-84) |
distance(p1, p2) | Float | Distance between two points (meters for geographic, units for Cartesian) |
point.withinBBox(point, lowerLeft, upperRight) | Boolean | true if point is inside bounding box |
Examples
RETURN point({x: 1.0, y: 2.0})
RETURN distance(point({latitude: 48.8, longitude: 2.3}), point({latitude: 51.5, longitude: -0.1}))
RETURN point.withinBBox(
point({x: 5, y: 5}),
point({x: 0, y: 0}),
point({x: 10, y: 10})
) -- true
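For geographic points, the great-circle computation behind distance() can be sketched with the haversine formula, assuming a spherical Earth of radius 6371 km; GraphQLite's exact formula and constants may differ slightly:

```python
# Haversine sketch of great-circle distance in meters (spherical-Earth
# assumption; illustrative, not the extension's exact implementation).
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lon1, lat2, lon2, r=6_371_000.0):
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))

d = haversine_m(48.8, 2.3, 51.5, -0.1)  # roughly Paris to London
assert 330_000 < d < 360_000            # about 345 km
```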
Predicate Functions
| Signature | Returns | Description |
|---|---|---|
exists(expr) | Boolean | true if the property or pattern exists and is not null |
exists{pattern} | Boolean | true if the pattern matches at least one result (full pattern syntax) |
coalesce(v1, v2, ...) | Any | First non-null argument; null if all arguments are null |
nullIf(v1, v2) | Any | null | Returns null if v1 = v2; otherwise returns v1 |
Examples
MATCH (n:Person) WHERE exists(n.email) RETURN n.name
MATCH (n:Person) WHERE exists{(n)-[:KNOWS]->(:Person)} RETURN n.name
RETURN coalesce(null, null, 'default') -- 'default'
RETURN nullIf(5, 5) -- null
RETURN nullIf(5, 6) -- 5
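coalesce and nullIf translate directly into short Python helpers, with None standing in for Cypher null:

```python
# Plain-Python equivalents of coalesce and nullIf (None plays the role of null).
def coalesce(*vals):
    """Return the first non-None argument, or None if all are None."""
    return next((v for v in vals if v is not None), None)

def null_if(a, b):
    """Return None if a == b, otherwise a."""
    return None if a == b else a

assert coalesce(None, None, "default") == "default"
assert null_if(5, 5) is None
assert null_if(5, 6) == 5
```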
CASE Expressions
Simple form
CASE expr
WHEN value1 THEN result1
WHEN value2 THEN result2
ELSE default
END
Generic form
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
ELSE default
END
ELSE is optional; omitting it returns null for unmatched rows.
Examples
MATCH (n:Person)
RETURN CASE n.role
WHEN 'admin' THEN 'Administrator'
WHEN 'mod' THEN 'Moderator'
ELSE 'User'
END AS roleLabel
MATCH (n:Person)
RETURN CASE
WHEN n.age < 18 THEN 'minor'
WHEN n.age < 65 THEN 'adult'
ELSE 'senior'
END AS ageGroup
Graph Algorithm Functions
Called inside Cypher queries using CALL syntax or inline. See Graph Algorithms for full parameter and return type documentation.
| Signature | Description |
|---|---|
pageRank([damping, iterations]) | PageRank centrality |
labelPropagation([iterations]) | Label propagation community detection |
louvain([resolution]) | Louvain modularity community detection |
dijkstra(source, target[, weight_property]) | Weighted shortest path |
astar(source, target[, lat_prop, lon_prop]) | A* heuristic shortest path |
degreeCentrality() | In/out/total degree per node |
betweennessCentrality() | Betweenness centrality per node |
closenessCentrality() | Closeness centrality per node |
eigenvectorCentrality([iterations]) | Eigenvector centrality per node |
weaklyConnectedComponents() | WCC component assignment |
stronglyConnectedComponents() | SCC component assignment |
bfs(start[, max_depth]) | Breadth-first traversal |
dfs(start[, max_depth]) | Depth-first traversal |
nodeSimilarity([node1, node2, threshold, top_k]) | Jaccard similarity |
knn(node, k) | k-nearest neighbors by similarity |
triangleCount() | Triangle count and clustering coefficient per node |
apsp() | All-pairs shortest paths |
shortestPath(pattern) | Shortest path as a path value |
Cypher Operators
Precedence
Operators are listed from highest to lowest precedence. Operators at the same level associate left-to-right unless noted.
| Level | Operator(s) | Associativity |
|---|---|---|
| 1 (highest) | . (property access), [ (subscript/slice) | Left |
| 2 | Unary - (negation), NOT | Right |
| 3 | *, /, % | Left |
| 4 | +, - | Left |
| 5 | =, <>, <, >, <=, >= | Left |
| 6 | IS NULL, IS NOT NULL | — |
| 7 | STARTS WITH, ENDS WITH, CONTAINS, =~, IN | Left |
| 8 | AND | Left |
| 9 | XOR | Left |
| 10 (lowest) | OR | Left |
Use parentheses to override precedence.
Arithmetic Operators
| Operator | Syntax | Description | Example | Result |
|---|---|---|---|---|
| Addition | a + b | Numeric addition; string concatenation | 2 + 3 | 5 |
| Subtraction | a - b | Numeric subtraction | 10 - 4 | 6 |
| Multiplication | a * b | Numeric multiplication | 3 * 4 | 12 |
| Division | a / b | Numeric division; integer inputs yield float | 7 / 2 | 3.5 |
| Modulo | a % b | Remainder | 10 % 3 | 1 |
| Unary negation | -a | Negate a number | -n.age | negated |
Type behavior
Integer + Integer→ IntegerInteger + Float→ FloatString + String→ String concatenation- Any arithmetic with
null→null
Comparison Operators
| Operator | Syntax | Description |
|---|---|---|
| Equals | a = b | Value equality |
| Not equals | a <> b | Value inequality |
| Less than | a < b | |
| Greater than | a > b | |
| Less or equal | a <= b | |
| Greater or equal | a >= b |
Type behavior
- Comparing
nullwith any operator returnsnull(nottrueorfalse). - Comparisons between incompatible types return
null. - Strings compare lexicographically.
Examples
WHERE n.age >= 18
WHERE n.name <> 'Alice'
WHERE n.score = 100
Boolean Operators
| Operator | Syntax | Description |
|---|---|---|
| AND | a AND b | true if both operands are true |
| OR | a OR b | true if at least one operand is true |
| NOT | NOT a | Logical negation |
| XOR | a XOR b | true if exactly one operand is true |
Three-valued logic (null behavior)
| a | b | a AND b | a OR b |
|---|---|---|---|
| true | null | null | true |
| false | null | false | null |
| null | null | null | null |
NOT null → null
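This truth table can be modeled in Python with None as the unknown value (Kleene three-valued logic). Note that Python's own and/or operators do not behave this way, so explicit helpers make the semantics concrete:

```python
# Kleene three-valued AND/OR with None as "unknown" (illustrative sketch).
def and3(a, b):
    if a is False or b is False:
        return False           # false dominates AND
    if a is None or b is None:
        return None            # otherwise unknown propagates
    return True

def or3(a, b):
    if a is True or b is True:
        return True            # true dominates OR
    if a is None or b is None:
        return None            # otherwise unknown propagates
    return False

assert and3(True, None) is None and or3(True, None) is True
assert and3(False, None) is False and or3(False, None) is None
assert and3(None, None) is None and or3(None, None) is None
```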
Examples
WHERE n.age > 18 AND n.active = true
WHERE n.role = 'admin' OR n.role = 'mod'
WHERE NOT n.deleted
WHERE (n.a = 1) XOR (n.b = 1)
String Operators
| Operator | Syntax | Description | Example |
|---|---|---|---|
| Starts with | s STARTS WITH prefix | Prefix match | n.name STARTS WITH 'Al' |
| Ends with | s ENDS WITH suffix | Suffix match | n.email ENDS WITH '.com' |
| Contains | s CONTAINS sub | Substring search | n.bio CONTAINS 'engineer' |
| Regex match | s =~ pattern | PCRE regex; full-string match | n.name =~ 'Al.*' |
| Concatenation | s1 + s2 | Join two strings | n.first + ' ' + n.last |
All string operators are case-sensitive. =~ uses PCRE syntax via the regexp() SQL function registered by the extension.
Examples
WHERE n.name STARTS WITH 'A'
WHERE n.email ENDS WITH '.org'
WHERE n.bio CONTAINS 'graph'
WHERE n.code =~ '[A-Z]{3}[0-9]+'
List Operators
| Operator | Syntax | Description | Example |
|---|---|---|---|
| Membership | x IN list | true if x is an element of list | n.role IN ['admin', 'mod'] |
| Concatenation | list1 + list2 | Combine two lists | [1,2] + [3,4] → [1,2,3,4] |
| Index access | list[index] | Element at 0-based index; negative index counts from end | list[0], list[-1] |
| Slice | list[start..end] | Sublist from start (inclusive) to end (exclusive) | list[1..3] |
Examples
WHERE n.status IN ['active', 'pending']
RETURN [1, 2] + [3, 4] -- [1, 2, 3, 4]
RETURN [10, 20, 30][1] -- 20
RETURN [10, 20, 30, 40][1..3] -- [20, 30]
Null Operators
| Operator | Syntax | Description |
|---|---|---|
| Is null | expr IS NULL | true if expression is null |
| Is not null | expr IS NOT NULL | true if expression is not null |
Examples
WHERE n.email IS NOT NULL
WHERE n.deletedAt IS NULL
Property Access Operators
| Operator | Syntax | Description |
|---|---|---|
| Dot notation | entity.property | Access a property by name |
| String subscript | entity['property'] | Access a property by string key |
| Nested dot | n.a.b | Access nested JSON field (requires a to be a JSON-type property) |
String-key subscript (n['key']) is normalized at transform time to the same SQL as dot notation. Both forms are equivalent.
Examples
RETURN n.name
RETURN n['name']
RETURN n.address.city -- requires address stored as JSON
Graph Algorithms Reference
GraphQLite provides 18 built-in graph algorithms accessible via Cypher functions, the Python Graph API, and the Rust Graph API.
For guidance on choosing the right algorithm for your use case, see Using Graph Algorithms.
PageRank
Cypher
CALL pageRank([damping, iterations]) YIELD node, score
Python
graph.pagerank(damping=0.85, iterations=20)
Rust
graph.pagerank(damping: f64, iterations: usize) -> Result<Vec<PageRankResult>>
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
damping | Float | 0.85 | Damping factor |
iterations | Integer | 20 | Number of power iterations |
Return shape
Python: list[dict] with keys node_id, user_id, score
Rust: Vec<PageRankResult> — fields: node_id: i64, user_id: String, score: f64
Complexity: O(iterations × (V + E))
Example
results = graph.pagerank(damping=0.85, iterations=30)
for r in sorted(results, key=lambda x: x['score'], reverse=True)[:5]:
print(r['user_id'], r['score'])
Degree Centrality
Cypher
CALL degreeCentrality() YIELD node, in_degree, out_degree, degree
Python
graph.degree_centrality()
Rust
graph.degree_centrality() -> Result<Vec<DegreeCentralityResult>>
Parameters: none
Return shape
Python: list[dict] with keys node_id, user_id, in_degree, out_degree, degree
Rust: Vec<DegreeCentralityResult> — fields: node_id: i64, user_id: String, in_degree: usize, out_degree: usize, degree: usize
Complexity: O(V + E)
Example
for r in graph.degree_centrality():
print(r['user_id'], 'in:', r['in_degree'], 'out:', r['out_degree'])
Betweenness Centrality
Cypher
CALL betweennessCentrality() YIELD node, score
Python
graph.betweenness_centrality()
Rust
graph.betweenness_centrality() -> Result<Vec<BetweennessCentralityResult>>
Parameters: none
Return shape
Python: list[dict] with keys node_id, user_id, score
Rust: Vec<BetweennessCentralityResult> — fields: node_id: i64, user_id: String, score: f64
Complexity: O(V × E)
Example
results = graph.betweenness_centrality()
Closeness Centrality
Cypher
CALL closenessCentrality() YIELD node, score
Python
graph.closeness_centrality()
Rust
graph.closeness_centrality() -> Result<Vec<ClosenessCentralityResult>>
Parameters: none
Return shape
Python: list[dict] with keys node_id, user_id, score
Rust: Vec<ClosenessCentralityResult> — fields: node_id: i64, user_id: String, score: f64
Complexity: O(V × (V + E))
Example
results = graph.closeness_centrality()
Eigenvector Centrality
Cypher
CALL eigenvectorCentrality([iterations]) YIELD node, score
Python
graph.eigenvector_centrality(iterations=100)
Rust
graph.eigenvector_centrality(iterations: usize) -> Result<Vec<EigenvectorCentralityResult>>
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
iterations | Integer | 100 | Power iteration count |
Return shape
Python: list[dict] with keys node_id, user_id, score
Rust: Vec<EigenvectorCentralityResult> — fields: node_id: i64, user_id: String, score: f64
Complexity: O(iterations × E)
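The score-dict return shape above is shared by all the centrality algorithms, so one small ranking helper covers them all. A hedged sketch: the commented call assumes the Python API documented here, while the helper itself is plain Python and runs on any list of score dicts.

```python
def rank_by_score(results, top=5):
    """Sort algorithm result rows (dicts with a 'score' key) by score, descending."""
    return sorted(results, key=lambda r: r["score"], reverse=True)[:top]

# Hypothetical usage against a loaded Graph instance:
#   for r in rank_by_score(graph.eigenvector_centrality(iterations=100)):
#       print(r["user_id"], round(r["score"], 4))

sample = [
    {"user_id": "alice", "score": 0.42},
    {"user_id": "bob", "score": 0.91},
    {"user_id": "carol", "score": 0.17},
]
print(rank_by_score(sample, top=2))  # bob first, then alice
```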
Louvain Community Detection
Cypher
CALL louvain([resolution]) YIELD node, community
Python
graph.louvain(resolution=1.0)
Rust
graph.louvain(resolution: f64) -> Result<Vec<CommunityResult>>
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
resolution | Float | 1.0 | Resolution parameter controlling community granularity |
Return shape
Python: list[dict] with keys node_id, user_id, community
Rust: Vec<CommunityResult> — fields: node_id: i64, user_id: String, community: i64
Example
communities = graph.louvain(resolution=0.5)
Leiden Community Detection
Python only
graph.leiden_communities(resolution=1.0, random_seed=None)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
resolution | Float | 1.0 | Resolution parameter |
random_seed | Integer \| None | None | Seed for reproducibility
Return shape: list[dict] with keys node_id, user_id, community
Label Propagation
Cypher
CALL labelPropagation([iterations]) YIELD node, community
Python
graph.community_detection(iterations=10)
Rust
graph.community_detection(iterations: usize) -> Result<Vec<CommunityResult>>
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
iterations | Integer | 10 | Maximum iterations |
Return shape
Python: list[dict] with keys node_id, user_id, community
Rust: Vec<CommunityResult> — fields: node_id: i64, user_id: String, community: i64
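Community results are flat rows of (node, community id); grouping them into member lists is a common first step. A hedged sketch (the commented call assumes the API above; the grouping helper is plain Python):

```python
from collections import defaultdict

def communities_to_members(results):
    """Map each community id to the list of member user_ids."""
    groups = defaultdict(list)
    for r in results:
        groups[r["community"]].append(r["user_id"])
    return dict(groups)

# Hypothetical usage: results = graph.community_detection(iterations=10)
sample = [
    {"node_id": 1, "user_id": "alice", "community": 0},
    {"node_id": 2, "user_id": "bob", "community": 0},
    {"node_id": 3, "user_id": "carol", "community": 1},
]
print(communities_to_members(sample))  # {0: ['alice', 'bob'], 1: ['carol']}
```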
Weakly Connected Components
Cypher
CALL weaklyConnectedComponents() YIELD node, component
Python
graph.weakly_connected_components()
Rust
graph.weakly_connected_components() -> Result<Vec<ComponentResult>>
Parameters: none
Return shape
Python: list[dict] with keys node_id, user_id, component
Rust: Vec<ComponentResult> — fields: node_id: i64, user_id: String, component: i64
Complexity: O(V + E)
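A frequent follow-up question is how big each component is, e.g. to find isolated islands. A hedged sketch (the commented call assumes the API above; the counting is plain Python):

```python
from collections import Counter

def component_sizes(results):
    """Count how many nodes fall in each component."""
    return Counter(r["component"] for r in results)

# Hypothetical usage: results = graph.weakly_connected_components()
sample = [
    {"user_id": "a", "component": 0},
    {"user_id": "b", "component": 0},
    {"user_id": "c", "component": 1},
]
sizes = component_sizes(sample)
largest, size = sizes.most_common(1)[0]
print(largest, size)  # component 0 has 2 nodes
```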
Strongly Connected Components
Cypher
CALL stronglyConnectedComponents() YIELD node, component
Python
graph.strongly_connected_components()
Rust
graph.strongly_connected_components() -> Result<Vec<ComponentResult>>
Parameters: none
Return shape
Python: list[dict] with keys node_id, user_id, component
Rust: Vec<ComponentResult> — fields: node_id: i64, user_id: String, component: i64
Complexity: O(V + E) (Tarjan or Kosaraju)
Shortest Path
Cypher (path function)
MATCH p = shortestPath((a)-[*]->(b))
RETURN p
Cypher (Dijkstra)
CALL dijkstra(source, target[, weight_property]) YIELD path, distance
Python
graph.shortest_path(source, target, weight_property=None)
Rust
graph.shortest_path(source: &str, target: &str, weight_property: Option<&str>) -> Result<ShortestPathResult>
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
source | String | required | Source node user ID |
target | String | required | Target node user ID |
weight_property | String \| None | None | Edge property to use as weight; unweighted BFS if None
Return shape
Python: dict with keys path (list of user IDs), distance (float), found (bool)
Rust: ShortestPathResult — fields: path: Vec<String>, distance: f64, found: bool
Example
result = graph.shortest_path('alice', 'bob', weight_property='cost')
if result['found']:
print(result['path'], result['distance'])
A* (A-Star)
Cypher
CALL astar(source, target[, lat_prop, lon_prop]) YIELD path, distance
Python
graph.astar(source, target, lat_prop=None, lon_prop=None)
Rust
graph.astar(source: &str, target: &str, lat_prop: Option<&str>, lon_prop: Option<&str>) -> Result<AStarResult>
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
source | String | required | Source node user ID |
target | String | required | Target node user ID |
lat_prop | String \| None | None | Node property for latitude
lon_prop | String \| None | None | Node property for longitude
Return shape
Python: dict with keys path, distance, found, nodes_explored
Rust: AStarResult — fields: path: Vec<String>, distance: f64, found: bool, nodes_explored: usize
All-Pairs Shortest Path
Cypher
CALL apsp() YIELD source, target, distance
Python
graph.all_pairs_shortest_path()
Rust
graph.all_pairs_shortest_path() -> Result<Vec<ApspResult>>
Parameters: none
Return shape
Python: list[dict] with keys source, target, distance
Rust: Vec<ApspResult> — fields: source: String, target: String, distance: f64
Complexity: O(V × (V + E))
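One thing the APSP row format makes easy is computing the graph diameter (the longest finite shortest-path distance over all reachable pairs). A hedged sketch on sample rows shaped like the documented return:

```python
def graph_diameter(apsp_rows):
    """Longest finite shortest-path distance across all reachable pairs."""
    return max((r["distance"] for r in apsp_rows), default=0.0)

# Hypothetical usage: rows = graph.all_pairs_shortest_path()
sample = [
    {"source": "a", "target": "b", "distance": 1.0},
    {"source": "a", "target": "c", "distance": 2.0},
]
print(graph_diameter(sample))  # 2.0
```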
BFS (Breadth-First Search)
Cypher
CALL bfs(start[, max_depth]) YIELD node, depth, order
Python
graph.bfs(start, max_depth=-1)
Rust
graph.bfs(start: &str, max_depth: i64) -> Result<Vec<TraversalResult>>
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
start | String | required | Starting node user ID |
max_depth | Integer | -1 | Maximum depth; -1 means unlimited |
Return shape
Python: list[dict] with keys user_id, depth, order
Rust: Vec<TraversalResult> — fields: user_id: String, depth: usize, order: usize
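The depth and order fields support post-hoc filtering, e.g. "everything within two hops, in visit order". A hedged sketch (the commented call assumes the API above):

```python
def within_depth(results, limit):
    """Keep traversal rows at or below a given depth, in visit order."""
    return [r["user_id"]
            for r in sorted(results, key=lambda r: r["order"])
            if r["depth"] <= limit]

# Hypothetical usage: results = graph.bfs("alice", max_depth=-1)
sample = [
    {"user_id": "alice", "depth": 0, "order": 0},
    {"user_id": "bob", "depth": 1, "order": 1},
    {"user_id": "dave", "depth": 2, "order": 2},
]
print(within_depth(sample, 1))  # ['alice', 'bob']
```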
DFS (Depth-First Search)
Cypher
CALL dfs(start[, max_depth]) YIELD node, depth, order
Python
graph.dfs(start, max_depth=-1)
Rust
graph.dfs(start: &str, max_depth: i64) -> Result<Vec<TraversalResult>>
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
start | String | required | Starting node user ID |
max_depth | Integer | -1 | Maximum depth; -1 means unlimited |
Return shape: same as BFS — list[dict] / Vec<TraversalResult> with user_id, depth, order
Node Similarity
Cypher
CALL nodeSimilarity([node1, node2, threshold, top_k]) YIELD node1, node2, similarity
Python
graph.node_similarity(node1_id=None, node2_id=None, threshold=0.0, top_k=0)
Rust
graph.node_similarity(node1_id: Option<i64>, node2_id: Option<i64>, threshold: f64, top_k: usize) -> Result<Vec<NodeSimilarityResult>>
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
node1_id | Integer \| None | None | Fix first node; None means all pairs
node2_id | Integer \| None | None | Fix second node; None means all pairs
threshold | Float | 0.0 | Minimum similarity to include |
top_k | Integer | 0 | Return at most top_k results; 0 means all |
Algorithm: Jaccard similarity based on shared neighbors.
Return shape
Python: list[dict] with keys node1, node2, similarity
Rust: Vec<NodeSimilarityResult> — fields: node1: String, node2: String, similarity: f64
KNN (k-Nearest Neighbors)
Cypher
CALL knn(node, k) YIELD neighbor, similarity, rank
Python
graph.knn(node_id, k=10)
Rust
graph.knn(node_id: i64, k: usize) -> Result<Vec<KnnResult>>
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
node_id | Integer | required | Source node internal ID |
k | Integer | 10 | Number of neighbors to return |
Return shape
Python: list[dict] with keys neighbor, similarity, rank
Rust: Vec<KnnResult> — fields: neighbor: String, similarity: f64, rank: usize
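Note that knn takes the internal integer ID rather than the user-ID string used by most other methods. A hedged sketch: the commented lines resolve the ID via the "_id" field that get_node returns (per Graph.get_node below); the formatting helper is plain Python.

```python
# Hedged sketch: knn() takes the internal integer ID, not the user-ID string.
# get_node() returns a dict that includes "_id", so resolve it first:
#
#   node = graph.get_node("alice")
#   if node is not None:
#       for r in graph.knn(node["_id"], k=5):
#           print(r["rank"], r["neighbor"], r["similarity"])

def format_knn(results):
    """Render knn rows as 'rank. neighbor (similarity)' strings."""
    return [f"{r['rank']}. {r['neighbor']} ({r['similarity']:.2f})" for r in results]

sample = [{"neighbor": "bob", "similarity": 0.5, "rank": 1}]
print(format_knn(sample))  # ['1. bob (0.50)']
```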
Triangle Count
Cypher
CALL triangleCount() YIELD node, triangles, clustering_coefficient
Python
graph.triangle_count()
Rust
graph.triangle_count() -> Result<Vec<TriangleCountResult>>
Parameters: none
Return shape
Python: list[dict] with keys node_id, user_id, triangles, clustering_coefficient
Rust: Vec<TriangleCountResult> — fields: node_id: i64, user_id: String, triangles: usize, clustering_coefficient: f64
Complexity: O(V × degree²)
Example
for r in graph.triangle_count():
print(r['user_id'], r['triangles'], r['clustering_coefficient'])
Python API Reference
Version: 0.4.3
Module-level Functions
graphqlite.connect
graphqlite.connect(database=":memory:", extension_path=None) -> Connection
Open a new SQLite connection with GraphQLite loaded.
| Parameter | Type | Default | Description |
|---|---|---|---|
database | str | ":memory:" | SQLite database path or ":memory:" |
| extension_path | str \| None | None | Path to the .dylib/.so/.dll; auto-detected if None |
Returns: Connection
graphqlite.wrap
graphqlite.wrap(conn: sqlite3.Connection, extension_path=None) -> Connection
Wrap an existing sqlite3.Connection with GraphQLite loaded into it.
| Parameter | Type | Default | Description |
|---|---|---|---|
conn | sqlite3.Connection | required | An open SQLite connection |
extension_path | str \| None | None | Path to extension; auto-detected if None
Returns: Connection
graphqlite.load
graphqlite.load(conn, entry_point=None) -> None
Load the GraphQLite extension into conn without wrapping. Useful when you want to keep a plain sqlite3.Connection.
| Parameter | Type | Default | Description |
|---|---|---|---|
conn | sqlite3.Connection | required | Connection to load into |
entry_point | str \| None | None | Extension entry point symbol; auto-detected if None
graphqlite.loadable_path
graphqlite.loadable_path() -> str
Return the filesystem path of the bundled extension library, suitable for passing to conn.load_extension() manually.
Connection
A thin wrapper around sqlite3.Connection that adds Cypher query support.
Connection.cypher
conn.cypher(query: str, params=None) -> CypherResult
Execute a Cypher query.
| Parameter | Type | Default | Description |
|---|---|---|---|
query | str | required | Cypher query string |
params | dict \| None | None | Parameter map; values substituted for $name placeholders
Returns: CypherResult
Raises: sqlite3.Error on parse or execution failure.
Example
result = conn.cypher("MATCH (n:Person) WHERE n.age > $min RETURN n.name", {"min": 25})
for row in result:
print(row["n.name"])
Connection.execute
conn.execute(sql: str, parameters=()) -> sqlite3.Cursor
Execute raw SQL. Passes through to the underlying sqlite3.Connection.
Connection.commit
conn.commit() -> None
Commit the current transaction.
Connection.rollback
conn.rollback() -> None
Roll back the current transaction.
Connection.close
conn.close() -> None
Close the connection and release all resources.
Connection.sqlite_connection
conn.sqlite_connection -> sqlite3.Connection
The underlying sqlite3.Connection object.
CypherResult
Returned by Connection.cypher(). Represents the result set of a Cypher query.
Properties
| Property | Type | Description |
|---|---|---|
.columns | list[str] | Ordered list of column names |
Methods
| Method | Signature | Description |
|---|---|---|
len | len(result) -> int | Number of rows |
iter | for row in result | Iterate rows as dict |
index | result[i] | Access row by 0-based index; returns dict |
to_list | result.to_list() -> list[dict] | Return all rows as a list of dicts |
Example
result = conn.cypher("MATCH (n:Person) RETURN n.name, n.age")
print(result.columns) # ['n.name', 'n.age']
print(len(result)) # row count
for row in result:
print(row["n.name"])
rows = result.to_list() # list of dicts
Graph
High-level graph API built on top of Connection. Manages a single named graph in a SQLite database.
Constructor
graphqlite.Graph(db_path=":memory:", namespace="default", extension_path=None)
graphqlite.graph(db_path=":memory:", namespace="default", extension_path=None) -> Graph
| Parameter | Type | Default | Description |
|---|---|---|---|
db_path | str | ":memory:" | SQLite database path |
namespace | str | "default" | Graph namespace identifier |
extension_path | str \| None | None | Path to extension; auto-detected if None
Node Operations
Graph.upsert_node
graph.upsert_node(node_id: str, props: dict, label: str = "Entity") -> int
Insert or update a node. node_id is a user-defined string identifier. Returns the internal integer ID.
Graph.get_node
graph.get_node(id: str) -> dict | None
Return all properties of the node with user ID id, or None if not found. The returned dict includes "_id" (internal) and "_label".
Graph.has_node
graph.has_node(id: str) -> bool
Return True if a node with user ID id exists.
Graph.delete_node
graph.delete_node(id: str) -> None
Delete the node and all its incident edges.
Graph.get_all_nodes
graph.get_all_nodes(label: str = None) -> list[dict]
Return all nodes. If label is given, filter to that label only.
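The node operations above follow a simple upsert/get/has/delete contract. The stub below is an illustrative in-memory stand-in (not GraphQLite code) so the example runs without the extension loaded; the real Graph exposes the same method shapes and the same "_id"/"_label" keys.

```python
class FakeGraph:
    """Illustrative in-memory stand-in for Graph's node operations."""
    def __init__(self):
        self._nodes = {}

    def upsert_node(self, node_id, props, label="Entity"):
        # Keep the internal id stable across updates, as upsert implies.
        node = self._nodes.get(node_id, {"_id": len(self._nodes) + 1})
        node.update(props)
        node["_label"] = label
        self._nodes[node_id] = node
        return node["_id"]

    def get_node(self, id):
        return self._nodes.get(id)

    def has_node(self, id):
        return id in self._nodes

    def delete_node(self, id):
        self._nodes.pop(id, None)

g = FakeGraph()
first_id = g.upsert_node("alice", {"name": "Alice", "age": 30}, label="Person")
same_id = g.upsert_node("alice", {"age": 31}, label="Person")  # update keeps the id
print(first_id == same_id, g.get_node("alice")["age"])  # True 31
g.delete_node("alice")
print(g.has_node("alice"))  # False
```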
Edge Operations
Graph.upsert_edge
graph.upsert_edge(source: str, target: str, props: dict, rel_type: str = "RELATED") -> int
Insert or update an edge from source to target of type rel_type. Returns internal edge ID.
Note: Uses merge semantics — existing properties not included in props are preserved, not removed.
Graph.get_edge
graph.get_edge(src: str, dst: str, rel_type: str = None) -> dict | None
Return edge properties, or None. If rel_type is None, returns the first matching edge.
Graph.has_edge
graph.has_edge(src: str, dst: str, rel_type: str = None) -> bool
Return True if an edge from src to dst (optionally of rel_type) exists.
Graph.delete_edge
graph.delete_edge(src: str, dst: str, rel_type: str = None) -> None
Delete the matching edge(s).
Graph.get_all_edges
graph.get_all_edges() -> list[dict]
Return all edges with their properties.
Query Methods
Graph.node_degree
graph.node_degree(id: str) -> int
Total degree (in + out) of the node.
Graph.get_neighbors
graph.get_neighbors(id: str) -> list[dict]
Return nodes connected to id in either direction (undirected — includes both incoming and outgoing edges).
Graph.get_node_edges
graph.get_node_edges(id: str) -> list[dict]
Return all edges incident to id (in and out).
Graph.get_edges_from
graph.get_edges_from(id: str) -> list[dict]
Return outgoing edges from id.
Graph.get_edges_to
graph.get_edges_to(id: str) -> list[dict]
Return incoming edges to id.
Graph.get_edges_by_type
graph.get_edges_by_type(id: str, rel_type: str) -> list[dict]
Return edges of a specific type incident to id.
Graph.stats
graph.stats() -> dict
Return graph statistics. Keys: node_count, edge_count.
Graph.query
graph.query(cypher: str, params: dict = None) -> list[dict]
Execute a Cypher query and return all rows as a list of dicts.
Graph Cache
Graph.load_graph
graph.load_graph() -> dict
Load the graph into the in-memory adjacency cache for algorithm use. Returns status dict.
Graph.unload_graph
graph.unload_graph() -> dict
Release the in-memory adjacency cache.
Graph.reload_graph
graph.reload_graph() -> dict
Unload and reload the cache.
Graph.graph_loaded
graph.graph_loaded() -> bool
Return True if the adjacency cache is currently loaded.
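When running several algorithms back to back, it can help to load the cache once up front and release it when done. A hedged sketch of that pattern; the _Stub class is an illustrative stand-in (not GraphQLite code) so the example is runnable, but the real Graph has the same two methods.

```python
def ensure_loaded(graph):
    """Load the adjacency cache once before running several algorithms."""
    if not graph.graph_loaded():
        graph.load_graph()

# Hypothetical usage:
#   ensure_loaded(g)
#   pr = g.pagerank()
#   wcc = g.weakly_connected_components()
#   g.unload_graph()  # release memory when done

class _Stub:
    """Records the load state, mimicking Graph's cache methods."""
    def __init__(self):
        self.loaded = False
    def graph_loaded(self):
        return self.loaded
    def load_graph(self):
        self.loaded = True
        return {"status": "loaded"}

s = _Stub()
ensure_loaded(s)
ensure_loaded(s)  # second call is a no-op
print(s.loaded)  # True
```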
Graph Algorithms
All algorithm methods return lists of dicts. See Graph Algorithms for full parameter and return field documentation.
| Method | Signature |
|---|---|
| PageRank | graph.pagerank(damping=0.85, iterations=20) |
| Degree centrality | graph.degree_centrality() |
| Betweenness centrality | graph.betweenness_centrality() |
| Closeness centrality | graph.closeness_centrality() |
| Eigenvector centrality | graph.eigenvector_centrality(iterations=100) |
| Label propagation | graph.community_detection(iterations=10) |
| Louvain | graph.louvain(resolution=1.0) |
| Leiden | graph.leiden_communities(resolution=1.0, random_seed=None) |
| Weakly connected components | graph.weakly_connected_components() |
| Strongly connected components | graph.strongly_connected_components() |
| Shortest path | graph.shortest_path(source, target, weight_property=None) |
| A* | graph.astar(source, target, lat_prop=None, lon_prop=None) |
| All-pairs shortest path | graph.all_pairs_shortest_path() |
| BFS | graph.bfs(start, max_depth=-1) |
| DFS | graph.dfs(start, max_depth=-1) |
| Node similarity | graph.node_similarity(node1_id=None, node2_id=None, threshold=0.0, top_k=0) |
| KNN | graph.knn(node_id, k=10) |
| Triangle count | graph.triangle_count() |
Method Aliases
The following aliases are available on Graph for convenience:
| Alias | Canonical Method |
|---|---|
dijkstra | shortest_path |
a_star | astar |
apsp | all_pairs_shortest_path |
breadth_first_search | bfs |
depth_first_search | dfs |
triangles | triangle_count |
Bulk Operations
Graph.insert_nodes_bulk
graph.insert_nodes_bulk(
nodes: list[tuple[str, dict[str, Any], str]]
) -> dict[str, int]
Insert multiple nodes in a single transaction, bypassing Cypher. Each tuple is (external_id, properties, label). Returns a dict mapping each external ID to its internal SQLite rowid.
id_map = g.insert_nodes_bulk([
("alice", {"name": "Alice", "age": 30}, "Person"),
("bob", {"name": "Bob", "age": 25}, "Person"),
])
# id_map = {"alice": 1, "bob": 2}
Graph.insert_edges_bulk
graph.insert_edges_bulk(
edges: list[tuple[str, str, dict[str, Any], str]],
id_map: dict[str, int] = None
) -> int
Insert multiple edges in a single transaction. Each tuple is (source_id, target_id, properties, rel_type). The optional id_map (from insert_nodes_bulk) maps external IDs to internal rowids for fast resolution. If id_map is None, IDs are looked up from the database. Returns the number of edges inserted. Raises KeyError if an external ID is not found in id_map.
Graph.insert_graph_bulk
graph.insert_graph_bulk(
nodes: list[tuple[str, dict[str, Any], str]],
edges: list[tuple[str, str, dict[str, Any], str]]
) -> BulkInsertResult
Insert nodes and edges together. Internally calls insert_nodes_bulk then insert_edges_bulk. Returns a BulkInsertResult dataclass:
| Field | Type | Description |
|---|---|---|
nodes_inserted | int | Number of nodes inserted |
edges_inserted | int | Number of edges inserted |
id_map | dict[str, int] | Mapping from external IDs to internal rowids |
Graph.resolve_node_ids
graph.resolve_node_ids(ids: list[str]) -> dict[str, int]
Look up internal rowids for existing nodes by their external IDs. Returns {external_id: internal_rowid}. Use this to build an id_map for insert_edges_bulk when connecting to nodes that were inserted in a previous operation.
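Since insert_edges_bulk raises KeyError for any external ID missing from id_map, it can be worth validating edges against the map first. A hedged sketch: the commented calls assume the API above, and the check helper is plain Python.

```python
# Hedged sketch: connect new edges to nodes inserted earlier. resolve_node_ids
# rebuilds the id_map so insert_edges_bulk can skip per-edge lookups:
#
#   id_map = g.resolve_node_ids(["alice", "bob", "carol"])
#   n = g.insert_edges_bulk(
#       [("alice", "bob", {"since": 2020}, "KNOWS"),
#        ("bob", "carol", {}, "KNOWS")],
#       id_map=id_map,
#   )

def check_edges_resolvable(edges, id_map):
    """Report external IDs referenced by edges but missing from id_map
    (insert_edges_bulk would raise KeyError for these)."""
    referenced = {e[0] for e in edges} | {e[1] for e in edges}
    return sorted(referenced - id_map.keys())

edges = [("alice", "bob", {}, "KNOWS"), ("bob", "dave", {}, "KNOWS")]
print(check_edges_resolvable(edges, {"alice": 1, "bob": 2}))  # ['dave']
```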
Batch Operations
Non-atomicity warning: Batch upsert methods call upsert_node/upsert_edge in a loop. If an operation fails partway through, earlier operations will have already completed. For atomic batch inserts, use the bulk insert methods instead, or wrap the call in an explicit transaction.
Graph.upsert_nodes_batch
graph.upsert_nodes_batch(
nodes: list[tuple[str, dict[str, Any], str]]
) -> None
Upsert multiple nodes. Each tuple is (node_id, properties, label). Uses MERGE semantics (update if exists, create if not).
Graph.upsert_edges_batch
graph.upsert_edges_batch(
edges: list[tuple[str, str, dict[str, Any], str]]
) -> None
Upsert multiple edges. Each tuple is (source_id, target_id, properties, rel_type). Uses MERGE semantics.
Export
Graph.to_rustworkx
graph.to_rustworkx() -> PyDiGraph
Export the graph to a rustworkx.PyDiGraph. Requires rustworkx to be installed.
GraphManager
Manages multiple named graphs stored as separate SQLite files under a base directory.
Constructor
graphqlite.GraphManager(base_path: str, extension_path: str = None)
graphqlite.graphs(base_path: str, extension_path: str = None) -> GraphManager
| Parameter | Type | Description |
|---|---|---|
base_path | str | Directory containing graph database files |
extension_path | str \| None | Path to extension; auto-detected if None
Methods
| Method | Signature | Description |
|---|---|---|
list | manager.list() -> list[str] | Names of all graphs in base_path |
exists | manager.exists(name: str) -> bool | True if a graph named name exists |
create | manager.create(name: str) -> Graph | Create and return a new graph |
open | manager.open(name: str) -> Graph | Open existing graph; raises if not found |
open_or_create | manager.open_or_create(name: str) -> Graph | Open or create |
drop | manager.drop(name: str) -> None | Delete the graph database file |
query | manager.query(cypher: str, graphs: list[str] = None, params: dict = None) -> list | Query across multiple graphs; graphs=None queries all |
query_sql | manager.query_sql(sql: str, graphs: list[str], parameters: tuple = ()) -> list | Raw SQL across multiple graphs |
close | manager.close() -> None | Close all open connections |
Dunder Methods
| Method | Description |
|---|---|
__iter__ | Iterate over all graph names in base_path (same as list()) |
__contains__ | name in manager — True if a graph named name exists (same as exists()) |
__len__ | len(manager) — Number of graphs in base_path |
__enter__ / __exit__ | Context manager support; calls close() on exit |
Utilities
graphqlite.escape_string
graphqlite.escape_string(s: str) -> str
Escape a string for safe embedding in a Cypher query literal (single-quote escaping).
graphqlite.sanitize_rel_type
graphqlite.sanitize_rel_type(type: str) -> str
Normalize a relationship type string to a safe identifier (uppercase, underscores only).
graphqlite.format_props
graphqlite.format_props(props: dict, escape_fn=escape_string) -> str
Format a properties dict as a Cypher property string. For example, {"name": "Alice", "age": 30} becomes {name: 'Alice', age: 30}. The escape_fn is applied to string values; defaults to escape_string.
graphqlite.CYPHER_RESERVED
graphqlite.CYPHER_RESERVED -> set[str]
Set of all Cypher reserved keywords. Use to check whether an identifier needs quoting.
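These helpers matter whenever values are interpolated into Cypher text rather than passed as $name parameters (parameters remain the safer default). As a minimal illustration of what single-quote escaping accomplishes, here is a simplified local reimplementation; the library's actual escape_string may differ in detail.

```python
def escape_sq(s):
    """Illustrative single-quote escaping (backslash style), not the library's
    actual implementation; prefer $param placeholders where possible."""
    return s.replace("\\", "\\\\").replace("'", "\\'")

name = "O'Brien"
print(f"CREATE (n:Person {{name: '{escape_sq(name)}'}})")
```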
Rust API Reference
Version: 0.4.3
Crate: graphqlite
Connection
Low-level connection type wrapping rusqlite::Connection with Cypher support.
Constructors
Connection::open<P: AsRef<Path>>(path: P) -> Result<Connection>
Connection::open_in_memory() -> Result<Connection>
Connection::from_rusqlite(conn: rusqlite::Connection) -> Result<Connection>
Connection::open_with_extension<P: AsRef<Path>>(path: P, ext_path: &str) -> Result<Connection>
| Method | Description |
|---|---|
open | Open or create a database at path |
open_in_memory | Open an in-memory database |
from_rusqlite | Wrap an existing rusqlite::Connection; loads the extension into it |
open_with_extension | Open database and load extension from explicit path |
Methods
Connection::cypher
fn cypher(&self, query: &str) -> Result<CypherResult>
Execute a Cypher query with no parameters.
Connection::cypher_with_params
Deprecated since 0.4.0. Use cypher_builder() instead.
fn cypher_with_params(&self, query: &str, params: &serde_json::Value) -> Result<CypherResult>
Execute a Cypher query with parameters. params must be a JSON object; keys correspond to $name placeholders.
Connection::cypher_builder
fn cypher_builder(&self, query: &str) -> CypherQuery
Return a CypherQuery builder for chaining parameter additions before execution.
Connection::execute
fn execute(&self, sql: &str) -> Result<usize>
Execute raw SQL. Returns the number of rows changed.
Connection::sqlite_connection
fn sqlite_connection(&self) -> &rusqlite::Connection
Borrow the underlying rusqlite::Connection.
CypherQuery
Builder returned by Connection::cypher_builder.
cypher_query
    .param("name", "Alice")
    .param("age", 30)
    .run() -> Result<CypherResult>
CypherResult
Represents the rows returned by a Cypher query.
Methods
| Method | Signature | Description |
|---|---|---|
len | fn len(&self) -> usize | Number of rows |
is_empty | fn is_empty(&self) -> bool | True if zero rows |
columns | fn columns(&self) -> &[String] | Ordered column names |
| Index | result[i] | Returns &Row at index i |
| Iterate | for row in &result | Iterates &Row |
Row
A single result row.
Row::get
fn get<T: FromSql>(&self, column: &str) -> Result<T>
Get a column value by name. T must implement FromSql.
Supported types for T
| Rust type | Cypher type |
|---|---|
String | String, converted from any scalar |
i64 | Integer |
f64 | Float |
bool | Boolean |
Option<String> | String or null |
Option<i64> | Integer or null |
Option<f64> | Float or null |
Option<bool> | Boolean or null |
Example
let name: String = row.get("n.name")?;
let age: Option<i64> = row.get("n.age")?;
Graph
High-level graph API mirroring the Python Graph.
Constructors
Graph::open(path: &str) -> Result<Graph>
Graph::open_in_memory() -> Result<Graph>
Node Operations
fn upsert_node(&self, node_id: &str, props: &serde_json::Value, label: &str) -> Result<i64>
fn get_node(&self, id: &str) -> Result<Option<serde_json::Value>>
fn has_node(&self, id: &str) -> Result<bool>
fn delete_node(&self, id: &str) -> Result<()>
fn get_all_nodes(&self, label: Option<&str>) -> Result<Vec<serde_json::Value>>
Edge Operations
fn upsert_edge(&self, source: &str, target: &str, props: &serde_json::Value, rel_type: &str) -> Result<i64>
fn get_edge(&self, src: &str, dst: &str, rel_type: Option<&str>) -> Result<Option<serde_json::Value>>
fn has_edge(&self, src: &str, dst: &str, rel_type: Option<&str>) -> Result<bool>
fn delete_edge(&self, src: &str, dst: &str, rel_type: Option<&str>) -> Result<()>
fn get_all_edges(&self) -> Result<Vec<serde_json::Value>>
Query
fn query(&self, cypher: &str, params: Option<&serde_json::Value>) -> Result<Vec<serde_json::Value>>
fn stats(&self) -> Result<serde_json::Value>
fn node_degree(&self, id: &str) -> Result<i64>
fn get_neighbors(&self, id: &str) -> Result<Vec<serde_json::Value>>
fn get_node_edges(&self, id: &str) -> Result<Vec<serde_json::Value>>
fn get_edges_from(&self, id: &str) -> Result<Vec<serde_json::Value>>
fn get_edges_to(&self, id: &str) -> Result<Vec<serde_json::Value>>
fn get_edges_by_type(&self, id: &str, rel_type: &str) -> Result<Vec<serde_json::Value>>
Graph Cache
fn load_graph(&self) -> Result<serde_json::Value>
fn unload_graph(&self) -> Result<serde_json::Value>
fn reload_graph(&self) -> Result<serde_json::Value>
fn graph_loaded(&self) -> Result<bool>
Graph Algorithms
All return Result<Vec<T>> where T is a typed result struct. See Graph Algorithms for parameter defaults.
| Method | Signature | Returns |
|---|---|---|
| PageRank | fn pagerank(&self, damping: f64, iterations: usize) -> Result<Vec<PageRankResult>> | PageRankResult |
| Degree centrality | fn degree_centrality(&self) -> Result<Vec<DegreeCentralityResult>> | DegreeCentralityResult |
| Betweenness | fn betweenness_centrality(&self) -> Result<Vec<BetweennessCentralityResult>> | BetweennessCentralityResult |
| Closeness | fn closeness_centrality(&self) -> Result<Vec<ClosenessCentralityResult>> | ClosenessCentralityResult |
| Eigenvector | fn eigenvector_centrality(&self, iterations: usize) -> Result<Vec<EigenvectorCentralityResult>> | EigenvectorCentralityResult |
| Community (label prop) | fn community_detection(&self, iterations: usize) -> Result<Vec<CommunityResult>> | CommunityResult |
| Louvain | fn louvain(&self, resolution: f64) -> Result<Vec<CommunityResult>> | CommunityResult |
| WCC | fn weakly_connected_components(&self) -> Result<Vec<ComponentResult>> | ComponentResult |
| SCC | fn strongly_connected_components(&self) -> Result<Vec<ComponentResult>> | ComponentResult |
| Shortest path | fn shortest_path(&self, source: &str, target: &str, weight_property: Option<&str>) -> Result<ShortestPathResult> | ShortestPathResult |
| A* | fn astar(&self, source: &str, target: &str, lat_prop: Option<&str>, lon_prop: Option<&str>) -> Result<AStarResult> | AStarResult |
| APSP | fn all_pairs_shortest_path(&self) -> Result<Vec<ApspResult>> | ApspResult |
| BFS | fn bfs(&self, start: &str, max_depth: i64) -> Result<Vec<TraversalResult>> | TraversalResult |
| DFS | fn dfs(&self, start: &str, max_depth: i64) -> Result<Vec<TraversalResult>> | TraversalResult |
| Node similarity | fn node_similarity(&self, node1_id: Option<i64>, node2_id: Option<i64>, threshold: f64, top_k: usize) -> Result<Vec<NodeSimilarityResult>> | NodeSimilarityResult |
| KNN | fn knn(&self, node_id: i64, k: usize) -> Result<Vec<KnnResult>> | KnnResult |
| Triangle count | fn triangle_count(&self) -> Result<Vec<TriangleCountResult>> | TriangleCountResult |
GraphManager
Manages named graphs stored as SQLite files in a directory.
Constructor
GraphManager::open(path: &str) -> Result<GraphManager>
Methods
fn list(&self) -> Result<Vec<String>>
fn exists(&self, name: &str) -> bool
fn create(&self, name: &str) -> Result<Graph>
fn open(&self, name: &str) -> Result<Graph>
fn open_or_create(&self, name: &str) -> Result<Graph>
fn drop(&self, name: &str) -> Result<()>
fn query(&self, cypher: &str, graphs: Option<&[&str]>, params: Option<&serde_json::Value>) -> Result<Vec<serde_json::Value>>
fn query_sql(&self, sql: &str, graphs: &[&str], parameters: &[&dyn rusqlite::ToSql]) -> Result<Vec<serde_json::Value>>
fn close(self) -> Result<()>
Result Types
All are plain structs deriving Debug, Clone, serde::Serialize, serde::Deserialize.
PageRankResult
pub struct PageRankResult {
    pub node_id: i64,
    pub user_id: String,
    pub score: f64,
}
DegreeCentralityResult
pub struct DegreeCentralityResult {
    pub node_id: i64,
    pub user_id: String,
    pub in_degree: usize,
    pub out_degree: usize,
    pub degree: usize,
}
BetweennessCentralityResult
pub struct BetweennessCentralityResult {
    pub node_id: i64,
    pub user_id: String,
    pub score: f64,
}
ClosenessCentralityResult
pub struct ClosenessCentralityResult {
    pub node_id: i64,
    pub user_id: String,
    pub score: f64,
}
EigenvectorCentralityResult
pub struct EigenvectorCentralityResult {
    pub node_id: i64,
    pub user_id: String,
    pub score: f64,
}
CommunityResult
pub struct CommunityResult {
    pub node_id: i64,
    pub user_id: String,
    pub community: i64,
}
ComponentResult
pub struct ComponentResult {
    pub node_id: i64,
    pub user_id: String,
    pub component: i64,
}
ShortestPathResult
pub struct ShortestPathResult {
    pub path: Vec<String>,
    pub distance: f64,
    pub found: bool,
}
AStarResult
pub struct AStarResult {
    pub path: Vec<String>,
    pub distance: f64,
    pub found: bool,
    pub nodes_explored: usize,
}
ApspResult
pub struct ApspResult {
    pub source: String,
    pub target: String,
    pub distance: f64,
}
TraversalResult
pub struct TraversalResult {
    pub user_id: String,
    pub depth: usize,
    pub order: usize,
}
NodeSimilarityResult
pub struct NodeSimilarityResult {
    pub node1: String,
    pub node2: String,
    pub similarity: f64,
}
KnnResult
pub struct KnnResult {
    pub neighbor: String,
    pub similarity: f64,
    pub rank: usize,
}
TriangleCountResult
pub struct TriangleCountResult {
    pub node_id: i64,
    pub user_id: String,
    pub triangles: usize,
    pub clustering_coefficient: f64,
}
Error Enum
pub enum Error {
    Sqlite(rusqlite::Error),
    Json(serde_json::Error),
    Cypher(String),
    ExtensionNotFound(String),
    TypeError(String),
    ColumnNotFound(String),
    GraphError(String),
}
| Variant | Cause |
|---|---|
| Sqlite | SQLite or rusqlite error |
| Json | JSON serialization/deserialization failure |
| Cypher | Cypher parse or execution error |
| ExtensionNotFound | Could not locate the extension library |
| TypeError | Type mismatch when reading a column value |
| ColumnNotFound | Column name not present in result |
| GraphError | General graph operation failure |
Error implements std::error::Error and std::fmt::Display.
Example
use graphqlite::{Connection, Graph};

fn main() -> graphqlite::Result<()> {
    let conn = Connection::open_in_memory()?;
    conn.cypher("CREATE (:Person {name: 'Alice', age: 30})")?;
    conn.cypher("CREATE (:Person {name: 'Bob', age: 25})")?;
    conn.cypher("MATCH (a:Person {name:'Alice'}), (b:Person {name:'Bob'}) CREATE (a)-[:KNOWS]->(b)")?;

    let result = conn.cypher("MATCH (n:Person) RETURN n.name, n.age ORDER BY n.age")?;
    for row in &result {
        let name: String = row.get("n.name")?;
        let age: i64 = row.get("n.age")?;
        println!("{} is {}", name, age);
    }

    let graph = Graph::open_in_memory()?;
    graph.upsert_node("alice", &serde_json::json!({"age": 30}), "Person")?;
    let scores = graph.pagerank(0.85, 20)?;
    for r in &scores {
        println!("{}: {:.4}", r.user_id, r.score);
    }
    Ok(())
}
SQL Interface Reference
GraphQLite is a standard SQLite extension. Once loaded, it registers SQL scalar functions and creates the graph schema in the current database.
Loading the Extension
SQLite shell
.load ./libgraphqlite
SELECT graphqlite_test();
Python (manual)
import sqlite3
conn = sqlite3.connect(":memory:")
conn.enable_load_extension(True)
conn.load_extension("./libgraphqlite")
Python (via graphqlite)
import graphqlite
conn = graphqlite.connect(":memory:")
Rust
let conn = graphqlite::Connection::open_in_memory()?;
Entry point symbol: sqlite3_graphqlite_init
On initialization the extension:
- Creates all schema tables (if not already present).
- Creates all indexes.
- Registers the SQL functions listed below.
Registered SQL Functions
cypher(query [, params_json])
SELECT cypher('MATCH (n:Person) RETURN n.name, n.age');
SELECT cypher('MATCH (n:Person) WHERE n.age > $min RETURN n.name', '{"min": 25}');
Arguments
| Argument | Type | Required | Description |
|---|---|---|---|
| query | TEXT | Yes | Cypher query string |
| params_json | TEXT (JSON) | No | JSON object; keys map to $name placeholders |
Returns: TEXT — a JSON array of objects. Each object represents one result row. Keys are the column names from the RETURN clause. A query with no results returns [].
Result format
[
{"n.name": "Alice", "n.age": 30},
{"n.name": "Bob", "n.age": 25}
]
For a single-column result the key is the expression text or alias from RETURN. For write queries with no RETURN clause, the result is a plain text status string such as "Query executed successfully - nodes created: N, relationships created: M". The empty array [] is only returned when a RETURN clause produced zero matching rows.
Error handling: Sets SQLite error text and returns an error result on parse failure or execution failure.
cypher_validate(query)
SELECT cypher_validate('MATCH (n:Person) RETURN n.name');
Validates a Cypher query without executing it.
Returns: TEXT — a JSON object:
{"valid": true}
or
{"valid": false, "error": "...", "line": 1, "column": 15}
regexp(pattern, string)
SELECT regexp('^Al.*', 'Alice'); -- 1
SELECT regexp('^Al.*', 'Bob'); -- 0
POSIX extended regular expression (ERE) match. Used internally to implement the =~ operator. Returns 1 if string matches pattern, 0 otherwise. The (?i) prefix enables case-insensitive matching.
Arguments
| Argument | Type | Description |
|---|---|---|
| pattern | TEXT | POSIX extended regular expression (ERE) |
| string | TEXT | String to test |
Returns: INTEGER (1 or 0)
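The calling convention can be reproduced against a stock sqlite3 connection by registering a two-argument regexp UDF in Python. This is a stand-in sketch only — Python's re module is not POSIX ERE — but the argument order and the 1/0 return convention shown above are the same.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the extension's regexp(): 1 on match, 0 otherwise.
conn.create_function("regexp", 2, lambda pattern, s: 1 if re.search(pattern, s) else 0)

print(conn.execute("SELECT regexp('^Al.*', 'Alice')").fetchone()[0])  # 1
print(conn.execute("SELECT regexp('^Al.*', 'Bob')").fetchone()[0])    # 0
```

As a side effect, registering a function named regexp is also how stock SQLite resolves the REGEXP operator, so WHERE name REGEXP '^Al' becomes usable on the same connection.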
gql_load_graph()
SELECT gql_load_graph();
Load the graph adjacency structure into an in-memory cache for algorithm execution. Must be called before running graph algorithm functions.
Returns: TEXT — JSON status object: {"status": "loaded", "nodes": N, "edges": M}. If the graph is already loaded, returns {"status": "already_loaded", "nodes": N, "edges": M} instead.
gql_unload_graph()
SELECT gql_unload_graph();
Release the in-memory adjacency cache.
Returns: TEXT — JSON status object: {"status": "unloaded"}
gql_reload_graph()
SELECT gql_reload_graph();
Unload and reload the cache. Use after bulk data changes to refresh the algorithm cache.
Returns: TEXT — JSON status object: {"status": "reloaded", "nodes": N, "edges": M}
gql_graph_loaded()
SELECT gql_graph_loaded();
Check whether the adjacency cache is currently loaded.
Returns: TEXT — JSON object: {"loaded": true, "nodes": N, "edges": M} if loaded, {"loaded": false, "nodes": 0, "edges": 0} if not.
graphqlite_test()
SELECT graphqlite_test();
Smoke-test function. Returns a success string if the extension is loaded and functioning.
Returns: TEXT — "GraphQLite extension loaded successfully!"
Query Patterns
Read and iterate rows in Python
import json, sqlite3, graphqlite
conn = graphqlite.connect("graph.db")
raw = conn.execute("SELECT cypher('MATCH (n:Person) RETURN n.name, n.age')").fetchone()[0]
rows = json.loads(raw)
for row in rows:
print(row["n.name"], row["n.age"])
Parameterized query via SQL
SELECT cypher(
'MATCH (n:Person) WHERE n.age > $min RETURN n.name',
json_object('min', 25)
);
Write query
SELECT cypher('CREATE (:Person {name: ''Alice'', age: 30})');
String literals inside Cypher must use single quotes. To embed a literal single quote in a SQL string, double it: ''.
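The doubling rule is ordinary SQL string-literal escaping, so it can be verified with a stock sqlite3 connection and no extension loaded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# '' inside a SQL string literal collapses to one literal single quote,
# so the Cypher text that cypher() would receive contains 'Alice'.
(text,) = conn.execute("SELECT 'CREATE (:Person {name: ''Alice'', age: 30})'").fetchone()
print(text)  # CREATE (:Person {name: 'Alice', age: 30})
```

Binding the Cypher string as a parameter (SELECT cypher(?)) sidesteps the doubling entirely; this is what the Python and Rust bindings do.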
Transaction Behavior
- The cypher() function participates in the current SQLite transaction.
- Write operations (CREATE, MERGE, SET, DELETE, etc.) are not auto-committed; wrap in BEGIN/COMMIT for explicit control.
- gql_load_graph() reads a snapshot at call time; subsequent writes are not reflected until gql_reload_graph() is called.
Example
BEGIN;
SELECT cypher('CREATE (:Person {name: ''Alice''})');
SELECT cypher('CREATE (:Person {name: ''Bob''})');
COMMIT;
Direct Schema Access
The graph schema tables are ordinary SQLite tables. You can query them directly for inspection or integration.
-- Count nodes by label
SELECT label, count(*) FROM node_labels GROUP BY label;
-- List all property keys
SELECT key FROM property_keys ORDER BY key;
-- Find all text properties for node 1
SELECT pk.key, np.value
FROM node_props_text np
JOIN property_keys pk ON pk.id = np.key_id
WHERE np.node_id = 1;
Direct writes to schema tables bypass Cypher validation and the property key cache. Prefer cypher() for mutations.
Database Schema Reference
GraphQLite uses an Entity-Attribute-Value (EAV) schema stored in plain SQLite tables. All tables are created with CREATE TABLE IF NOT EXISTS during extension initialization, so schema setup is idempotent and safe to run repeatedly.
Core Tables
nodes
Stores graph nodes. Each node has an auto-assigned integer primary key.
CREATE TABLE IF NOT EXISTS nodes (
id INTEGER PRIMARY KEY AUTOINCREMENT
);
| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Internal node identifier; auto-incremented |
User-facing node IDs (strings) are stored as text properties and looked up by the higher-level API. The internal id is used in all join operations.
edges
Stores directed graph edges.
CREATE TABLE IF NOT EXISTS edges (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
target_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
type TEXT NOT NULL
);
| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Internal edge identifier |
| source_id | INTEGER FK → nodes.id | Source node; cascades on delete |
| target_id | INTEGER FK → nodes.id | Target node; cascades on delete |
| type | TEXT | Relationship type (e.g. "KNOWS") |
node_labels
Maps nodes to their labels. A node may have multiple labels.
CREATE TABLE IF NOT EXISTS node_labels (
node_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
label TEXT NOT NULL,
PRIMARY KEY (node_id, label)
);
| Column | Type | Description |
|---|---|---|
| node_id | INTEGER FK → nodes.id | References nodes.id; cascades on delete |
| label | TEXT | Label string (e.g. "Person") |
The composite primary key (node_id, label) enforces uniqueness.
property_keys
Normalized lookup table for property key names. Shared by all node and edge property tables.
CREATE TABLE IF NOT EXISTS property_keys (
id INTEGER PRIMARY KEY AUTOINCREMENT,
key TEXT UNIQUE NOT NULL
);
| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Numeric key identifier |
| key | TEXT UNIQUE | Property name string (e.g. "name", "age") |
All property value tables reference property_keys.id rather than storing key strings directly. An in-memory hash-map cache (property_key_cache) avoids repeated lookups during query execution.
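The interning pattern is plain SQL; here is a sketch of a lookup-or-insert helper against a stock sqlite3 connection, assuming only the property_keys DDL above (the key_id helper name is illustrative, not the extension's API):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE property_keys (
    id  INTEGER PRIMARY KEY AUTOINCREMENT,
    key TEXT UNIQUE NOT NULL)""")

def key_id(conn, key):
    # INSERT OR IGNORE is a no-op when the key already exists (UNIQUE constraint),
    # so repeated calls always resolve to the same id.
    conn.execute("INSERT OR IGNORE INTO property_keys(key) VALUES (?)", (key,))
    (kid,) = conn.execute("SELECT id FROM property_keys WHERE key = ?", (key,)).fetchone()
    return kid

print(key_id(conn, "name"), key_id(conn, "age"), key_id(conn, "name"))  # 1 2 1
```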
Node Property Tables
One table per Cypher type. A property is stored in exactly one table, determined at write time by the value type.
node_props_int
CREATE TABLE IF NOT EXISTS node_props_int (
node_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
key_id INTEGER NOT NULL REFERENCES property_keys(id),
value INTEGER NOT NULL,
PRIMARY KEY (node_id, key_id)
);
node_props_real
CREATE TABLE IF NOT EXISTS node_props_real (
node_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
key_id INTEGER NOT NULL REFERENCES property_keys(id),
value REAL NOT NULL,
PRIMARY KEY (node_id, key_id)
);
node_props_text
CREATE TABLE IF NOT EXISTS node_props_text (
node_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
key_id INTEGER NOT NULL REFERENCES property_keys(id),
value TEXT NOT NULL,
PRIMARY KEY (node_id, key_id)
);
node_props_bool
CREATE TABLE IF NOT EXISTS node_props_bool (
node_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
key_id INTEGER NOT NULL REFERENCES property_keys(id),
value INTEGER NOT NULL CHECK (value IN (0, 1)),
PRIMARY KEY (node_id, key_id)
);
Stores 0 (false) or 1 (true). The CHECK constraint enforces this.
node_props_json
CREATE TABLE IF NOT EXISTS node_props_json (
node_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
key_id INTEGER NOT NULL REFERENCES property_keys(id),
value TEXT NOT NULL CHECK (json_valid(value)),
PRIMARY KEY (node_id, key_id)
);
Stores JSON objects and arrays as text. The CHECK constraint enforces valid JSON via SQLite's json_valid().
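The CHECK behaviour can be exercised directly with Python's bundled SQLite, which includes the JSON1 functions in current builds (foreign key columns omitted here for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE node_props_json (
    node_id INTEGER NOT NULL,
    key_id  INTEGER NOT NULL,
    value   TEXT NOT NULL CHECK (json_valid(value)),
    PRIMARY KEY (node_id, key_id))""")

conn.execute("INSERT INTO node_props_json VALUES (1, 1, ?)", ('{"city": "Oslo"}',))

try:
    # Invalid JSON trips the CHECK constraint and the row is rejected.
    conn.execute("INSERT INTO node_props_json VALUES (1, 2, ?)", ("not json",))
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```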
Edge Property Tables
Identical structure to node property tables, with edge_id replacing node_id.
edge_props_int
CREATE TABLE IF NOT EXISTS edge_props_int (
edge_id INTEGER NOT NULL REFERENCES edges(id) ON DELETE CASCADE,
key_id INTEGER NOT NULL REFERENCES property_keys(id),
value INTEGER NOT NULL,
PRIMARY KEY (edge_id, key_id)
);
edge_props_real
CREATE TABLE IF NOT EXISTS edge_props_real (
edge_id INTEGER NOT NULL REFERENCES edges(id) ON DELETE CASCADE,
key_id INTEGER NOT NULL REFERENCES property_keys(id),
value REAL NOT NULL,
PRIMARY KEY (edge_id, key_id)
);
edge_props_text
CREATE TABLE IF NOT EXISTS edge_props_text (
edge_id INTEGER NOT NULL REFERENCES edges(id) ON DELETE CASCADE,
key_id INTEGER NOT NULL REFERENCES property_keys(id),
value TEXT NOT NULL,
PRIMARY KEY (edge_id, key_id)
);
edge_props_bool
CREATE TABLE IF NOT EXISTS edge_props_bool (
edge_id INTEGER NOT NULL REFERENCES edges(id) ON DELETE CASCADE,
key_id INTEGER NOT NULL REFERENCES property_keys(id),
value INTEGER NOT NULL CHECK (value IN (0, 1)),
PRIMARY KEY (edge_id, key_id)
);
edge_props_json
CREATE TABLE IF NOT EXISTS edge_props_json (
edge_id INTEGER NOT NULL REFERENCES edges(id) ON DELETE CASCADE,
key_id INTEGER NOT NULL REFERENCES property_keys(id),
value TEXT NOT NULL CHECK (json_valid(value)),
PRIMARY KEY (edge_id, key_id)
);
Indexes
All indexes use CREATE INDEX IF NOT EXISTS.
Edge traversal indexes
CREATE INDEX IF NOT EXISTS idx_edges_source ON edges(source_id, type);
CREATE INDEX IF NOT EXISTS idx_edges_target ON edges(target_id, type);
CREATE INDEX IF NOT EXISTS idx_edges_type ON edges(type);
- idx_edges_source: supports outgoing edge lookups and type-filtered traversals.
- idx_edges_target: supports incoming edge lookups and type-filtered traversals.
- idx_edges_type: supports edge type scans (e.g. MATCH ()-[:TYPE]-()).
Label index
CREATE INDEX IF NOT EXISTS idx_node_labels_label ON node_labels(label, node_id);
Supports label-filtered MATCH patterns (e.g. MATCH (n:Person)).
Property key index
CREATE INDEX IF NOT EXISTS idx_property_keys_key ON property_keys(key);
Speeds up property key lookups by name when the in-memory cache is cold.
Node property indexes
CREATE INDEX IF NOT EXISTS idx_node_props_int_key_value ON node_props_int(key_id, value, node_id);
CREATE INDEX IF NOT EXISTS idx_node_props_text_key_value ON node_props_text(key_id, value, node_id);
CREATE INDEX IF NOT EXISTS idx_node_props_real_key_value ON node_props_real(key_id, value, node_id);
CREATE INDEX IF NOT EXISTS idx_node_props_bool_key_value ON node_props_bool(key_id, value, node_id);
CREATE INDEX IF NOT EXISTS idx_node_props_json_key_value ON node_props_json(key_id, node_id);
Covering indexes for WHERE predicates on node properties. The (key_id, value, node_id) column order supports equality and range filters without a table scan.
The JSON index omits value (JSON columns are not range-indexed) but indexes (key_id, node_id) for existence checks.
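Whether a query actually hits one of these indexes can be checked with EXPLAIN QUERY PLAN on a stock connection; a sketch using just node_props_int and its index (the exact detail text varies across SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE node_props_int (
    node_id INTEGER NOT NULL, key_id INTEGER NOT NULL,
    value INTEGER NOT NULL, PRIMARY KEY (node_id, key_id))""")
conn.execute("""CREATE INDEX idx_node_props_int_key_value
    ON node_props_int(key_id, value, node_id)""")

# An equality filter on key_id plus a range filter on value matches the
# (key_id, value, node_id) column order, so the planner picks the index.
plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT node_id FROM node_props_int WHERE key_id = 2 AND value > 25""").fetchall()
for row in plan:
    print(row[-1])  # e.g. SEARCH ... USING COVERING INDEX idx_node_props_int_key_value ...
```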
Edge property indexes
CREATE INDEX IF NOT EXISTS idx_edge_props_int_key_value ON edge_props_int(key_id, value, edge_id);
CREATE INDEX IF NOT EXISTS idx_edge_props_text_key_value ON edge_props_text(key_id, value, edge_id);
CREATE INDEX IF NOT EXISTS idx_edge_props_real_key_value ON edge_props_real(key_id, value, edge_id);
CREATE INDEX IF NOT EXISTS idx_edge_props_bool_key_value ON edge_props_bool(key_id, value, edge_id);
CREATE INDEX IF NOT EXISTS idx_edge_props_json_key_value ON edge_props_json(key_id, edge_id);
Same structure as node property indexes.
Property Type Inference Rules
When a Cypher write operation stores a property value, the type is inferred from the value and determines which table receives the row.
| Condition | Table |
|---|---|
| Value is a Cypher integer literal or Python int | *_props_int |
| Value is a Cypher float literal or Python float | *_props_real |
| Value is the string 'true' or 'false' (case-insensitive), or Python bool | *_props_bool |
| Value is a JSON object ({…}) or JSON array ([…]) | *_props_json |
| All other values | *_props_text |
A property key may appear in only one type table per entity at a time. Updating a property with a different type removes the old row and inserts into the new table.
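The routing rules above can be sketched as a small Python function. The table-name suffixes are the real ones, but the function itself is illustrative, not the extension's actual code path:

```python
def props_table(value):
    """Pick the *_props_* table a value would be routed to (sketch of the rules above)."""
    if isinstance(value, bool):        # test bool before int: in Python, bool subclasses int
        return "props_bool"
    if isinstance(value, int):
        return "props_int"
    if isinstance(value, float):
        return "props_real"
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return "props_bool"
    if isinstance(value, (dict, list)):  # JSON object or array
        return "props_json"
    return "props_text"                  # everything else falls through to text

print(props_table(30), props_table(2.5), props_table("TRUE"), props_table({"a": 1}))
# props_int props_real props_bool props_json
```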
Property Key Cache
The property_key_cache is an in-process hash map (djb2 hash, chained buckets) that caches property_keys.id lookups by key string. It is created per connection during cypher_executor_create() and freed when the connection closes. The cache avoids a SELECT id FROM property_keys WHERE key = ? round-trip for each property access during query execution.
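The hash in question is the classic djb2. A sketch of the hash and bucket computation follows — the multiply-and-add variant of djb2 is assumed, and the bucket count here is hypothetical, not the extension's actual size:

```python
def djb2(key: str) -> int:
    # Classic djb2: h = h * 33 + byte, seeded with 5381, kept to 32 bits.
    h = 5381
    for b in key.encode("utf-8"):
        h = (h * 33 + b) & 0xFFFFFFFF
    return h

NUM_BUCKETS = 64  # hypothetical chained-bucket count for illustration

def bucket(key: str) -> int:
    return djb2(key) % NUM_BUCKETS

print(djb2("a"))  # 177670  (5381 * 33 + 97)
```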
Cascade Delete Behavior
All REFERENCES nodes(id) and REFERENCES edges(id) foreign keys include ON DELETE CASCADE. This means:
- Deleting a row from nodes automatically removes all rows in node_labels, node_props_*, and all edges that reference it.
- Deleting a row from edges automatically removes all rows in edge_props_*.
SQLite foreign key enforcement must be enabled: PRAGMA foreign_keys = ON; (GraphQLite enables this automatically at connection open).
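The cascade can be observed with a stock sqlite3 connection using the core DDL from this page (property tables elided for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # cascades are inert without this
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE edges (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
    target_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
    type TEXT NOT NULL);
INSERT INTO nodes DEFAULT VALUES;
INSERT INTO nodes DEFAULT VALUES;
INSERT INTO edges (source_id, target_id, type) VALUES (1, 2, 'KNOWS');
""")

conn.execute("DELETE FROM nodes WHERE id = 1")
print(conn.execute("SELECT count(*) FROM edges").fetchone()[0])  # 0 — the edge went with it
```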
Summary Table List
| Table | Rows represent |
|---|---|
| nodes | Graph nodes |
| edges | Graph edges (directed) |
| node_labels | Node-to-label assignments |
| property_keys | Property name registry |
| node_props_int | Integer node properties |
| node_props_real | Float node properties |
| node_props_text | String node properties |
| node_props_bool | Boolean node properties |
| node_props_json | JSON object/array node properties |
| edge_props_int | Integer edge properties |
| edge_props_real | Float edge properties |
| edge_props_text | String edge properties |
| edge_props_bool | Boolean edge properties |
| edge_props_json | JSON object/array edge properties |
Architecture Overview
GraphQLite adds a Cypher query language interface to SQLite by functioning as a transpiler: it parses Cypher, translates it to SQL, and executes the resulting SQL against a set of tables that represent a property graph. Understanding this pipeline helps you reason about query behaviour, error messages, and performance.
Why a Transpiler, Not a Custom Engine
The most important architectural decision in GraphQLite is what it chose not to build: a dedicated graph storage engine or query runtime.
A purpose-built graph engine would require implementing disk layout, buffer management, query optimisation, transaction handling, concurrency control, and crash recovery. SQLite already provides all of this, and provides it correctly across a wide range of platforms, with a track record of reliability stretching back to 2000.
The transpiler approach means:
- Durability and atomicity come for free. Every write goes through SQLite's WAL and journalling machinery.
- Standard tooling works. The underlying tables are plain SQLite tables. You can inspect them with the SQLite CLI, use SQLite backup APIs, and attach the database to other tools.
- Query execution is handled by a proven optimiser. The generated SQL benefits from SQLite's query planner, covering indexes, and prepared statement caching.
The cost of this approach is translation overhead on every query, and the impedance mismatch between graph patterns and relational joins. Both are manageable: the translation is fast (typically under 1ms for simple queries), and the join structure is deterministic once you understand the EAV schema.
The Query Pipeline
A Cypher query passes through four stages before results are returned:
Cypher string
│
▼
┌─────────────────────────────────┐
│ 1. PARSER │
│ cypher_gram.y + cypher_scanner.l│
│ ────────────────────────────── │
│ Cypher string → AST nodes │
└──────────────┬──────────────────┘
│ ast_node tree
▼
┌─────────────────────────────────┐
│ 2. TRANSFORMER │
│ transform_match.c, etc. │
│ ────────────────────────────── │
│ AST → SQL string │
└──────────────┬──────────────────┘
│ SQL string
▼
┌─────────────────────────────────┐
│ 3. EXECUTOR │
│ cypher_executor.c │
│ ────────────────────────────── │
│ sqlite3_prepare + step │
└──────────────┬──────────────────┘
│ raw SQLite rows
▼
┌─────────────────────────────────┐
│ 4. RESULT FORMATTER │
│ executor_result.c, agtype.c │
│ ────────────────────────────── │
│ rows → JSON text returned │
│ by the cypher() SQL function │
└─────────────────────────────────┘
Stage 1: The Parser
The parser is a Bison GLR grammar (cypher_gram.y) with a Flex scanner (cypher_scanner.l). It produces a typed AST: ast_node structs that include cypher_query, cypher_match, cypher_create, cypher_return, cypher_node_pattern, cypher_rel_pattern, and expression types like cypher_binary_op, cypher_property, cypher_identifier, and cypher_literal_*.
Why GLR? Cypher has syntactic ambiguities that a standard LALR(1) parser cannot resolve. The most visible example is that (n) is simultaneously valid as a parenthesised expression and as a node pattern. GLR allows the parser to pursue both interpretations in parallel and resolve the ambiguity once more context is available. The grammar currently declares %expect 4 shift/reduce conflicts and %expect-rr 3 reduce/reduce conflicts — these are known, documented, and intentional.
Identifiers can be regular alphanumeric names or backtick-quoted names (BQIDENT), which the scanner strips to their bare text. The END_P keyword is also permitted as an identifier through the grammar's identifier rule, allowing queries like MATCH (n) RETURN n.end.
Error recovery is handled at this stage. When parsing fails, parse_cypher_query_ext() returns a cypher_parse_result with a populated error_message containing position information, which propagates back to the cypher() SQL function as a SQLite error.
Stage 2: The Transformer
The transformer walks the AST and emits SQL strings. It is not a general-purpose SQL generator: it knows the exact schema of GraphQLite's EAV tables and generates SQL specifically against those tables.
Key files and responsibilities:
| File | Responsibility |
|---|---|
| cypher_transform.c | Entry point; creates transform context |
| transform_match.c | MATCH patterns → SQL FROM/JOIN/WHERE |
| transform_return.c | RETURN items → SQL SELECT list |
| transform_expr_ops.c | Expression operators and property access |
| transform_create.c | CREATE → INSERT INTO nodes/edges/properties |
| transform_set.c | SET → UPDATE on property tables |
| transform_delete.c | DELETE → DELETE FROM |
| transform_variables.c | Variable-to-alias tracking across clauses |
| sql_builder.c | Dynamic string buffer for SQL construction |
| transform_func_*.c | Function dispatch (string, math, path, etc.) |
The transform context (cypher_transform_context) carries the SQL buffer being built, a variable context (var_ctx) that maps Cypher variable names to SQL table aliases, and flags like in_comparison that alter how property access is generated.
Concrete translation example. Consider:
MATCH (a:Person)-[:KNOWS]->(b)
WHERE a.name = 'Alice'
RETURN b.name
The transformer produces SQL roughly equivalent to:
SELECT
(SELECT COALESCE(
(SELECT npt.value FROM node_props_text npt
JOIN property_keys pk ON npt.key_id = pk.id
WHERE npt.node_id = n2.id AND pk.key = 'name'),
(SELECT CAST(npi.value AS TEXT) FROM node_props_int npi
JOIN property_keys pk ON npi.key_id = pk.id
WHERE npi.node_id = n2.id AND pk.key = 'name'),
...
)) AS "b.name"
FROM nodes n1
JOIN node_labels nl1 ON nl1.node_id = n1.id AND nl1.label = 'Person'
JOIN edges e1 ON e1.source_id = n1.id AND e1.type = 'KNOWS'
JOIN nodes n2 ON n2.id = e1.target_id
WHERE (SELECT COALESCE(
(SELECT npt.value FROM node_props_text npt
JOIN property_keys pk ON npt.key_id = pk.id
WHERE npt.node_id = n1.id AND pk.key = 'name'),
...
)) = 'Alice'
Each property access becomes a correlated subquery that fans out across all five typed property tables (node_props_text, node_props_int, node_props_real, node_props_bool, node_props_json) using COALESCE to return whichever type holds the value. In comparison contexts (WHERE clauses) the types are preserved natively; in RETURN contexts everything is cast to text.
Nested property access. For expressions like n.metadata.city — where metadata is stored as a JSON blob — the transformer generates json_extract(n_metadata_subquery, '$.city'), recursively building the extraction path.
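The json_extract() call the transformer emits is the standard SQLite JSON1 function, so the extraction step can be tried in isolation; here the JSON blob that the inner subquery would fetch is supplied inline:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# '$.address.city' is the recursively built extraction path for n.address.city.
(city,) = conn.execute(
    "SELECT json_extract(?, '$.address.city')",
    ('{"address": {"city": "Oslo", "zip": "0150"}}',),
).fetchone()
print(city)  # Oslo
```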
Stage 3: The Executor
The executor orchestrates the full pipeline and manages the SQLite connection state. Its entry point is cypher_executor_execute(), which:
- Calls parse_cypher_query_ext() to get the AST.
- Calls cypher_executor_execute_ast() to dispatch on AST type.
- For AST_NODE_QUERY and AST_NODE_SINGLE_QUERY, delegates to dispatch_query_pattern().
- dispatch_query_pattern() analyses the clause combination present in the query (MATCH, RETURN, CREATE, SET, DELETE, etc.) and selects the best-matching handler from the pattern registry.
- The selected handler calls the appropriate transformer functions to produce SQL, then calls sqlite3_prepare_v2() and sqlite3_step() to execute it.
- UNION queries bypass the pattern dispatcher and go directly through the transform layer, which handles them as a special AST_NODE_UNION case.
EXPLAIN mode. If the query starts with EXPLAIN, the executor runs the transformer but does not execute the SQL. Instead it returns a text result containing the matched pattern name, the clause flags, and the generated SQL string. This is useful for debugging unexpected behaviour.
Stage 4: Result Formatting
Raw SQLite column values are formatted into JSON by executor_result.c and agtype.c. The cypher() SQL function always returns a JSON array of row objects:
[{"b.name": "Bob"}, {"b.name": "Carol"}]
For rich graph objects (nodes and relationships returned as entities rather than scalar properties), the AGE-compatible agtype system serialises them with type annotations. Modification queries without a RETURN clause return a plain-text statistics string.
Extension Architecture
GraphQLite loads into SQLite as a shared library extension. The entry point sqlite3_graphqlite_init() registers several SQL functions on the current database connection.
Per-Connection Caching
The most important structural detail is the connection_cache:
typedef struct {
sqlite3 *db;
cypher_executor *executor;
csr_graph *cached_graph;
} connection_cache;
This struct is allocated once per database connection and registered via sqlite3_create_function's user-data pointer. It holds:
- A cypher_executor instance, which in turn holds the schema manager and the property key cache. Because executors are expensive to create (schema initialisation, prepared statement allocation), they are created on the first call to cypher() and reused for all subsequent calls on the same connection.
- A csr_graph pointer for the in-memory graph needed by algorithm functions. This is NULL until the user explicitly calls gql_load_graph().
When the database connection closes, SQLite calls the destructor registered with the function, which frees both the executor and any cached graph.
Registered SQL Functions
The extension registers:
| Function | Purpose |
|---|---|
| cypher(query) | Execute a Cypher query, return JSON |
| cypher(query, params_json) | Execute with parameters |
| graphqlite_test() | Health check |
| gql_load_graph() | Build CSR from current tables, cache it |
| gql_unload_graph() | Free cached CSR graph |
| gql_reload_graph() | Invalidate and rebuild CSR cache |
Schema initialisation (CREATE TABLE IF NOT EXISTS ...) happens inside cypher_executor_create(), which is called on the first cypher() invocation; this one-time step takes approximately 5ms.
Language Bindings
Both the Python and Rust bindings wrap the cypher() SQL function rather than linking directly against GraphQLite's C API.
Python. Connection._load_extension() calls sqlite3.Connection.load_extension() with the path to graphqlite.dylib (or .so/.dll). After loading, every connection.cypher(query, params) call issues SELECT cypher(?, ?) against the underlying sqlite3.Connection. The JSON result is parsed and returned as a CypherResult object (a list of dicts). This means Python adds one round-trip through sqlite3_exec but no C-level coupling beyond the SQLite extension API.
Rust. The Rust binding uses rusqlite and similarly loads the extension via Connection::load_extension(). Cypher queries are executed as SELECT cypher(?) statements. Higher-level helpers in src/ (graph operations, algorithm wrappers) build Cypher strings and parse the JSON results. The Graph struct maintains an open rusqlite::Connection with the extension already loaded.
Graph Algorithm Integration
Graph algorithms (PageRank, Betweenness Centrality, Dijkstra, Louvain, etc.) operate on the CSR graph cache rather than on the EAV tables directly. The integration path is:
- User calls SELECT gql_load_graph(). This reads all rows from nodes and edges, builds a CSR (Compressed Sparse Row) representation in heap memory, and stores it in connection_cache.cached_graph.
- Each subsequent cypher() call syncs the cached_graph pointer into the current executor: executor->cached_graph = cache->cached_graph.
- When the query dispatcher processes a RETURN-only query and finds a function name like pageRank() or dijkstra(), it dispatches to the graph algorithm subsystem instead of the normal SQL path.
- The algorithm reads the CSR structure, runs its computation in C, and returns results as a JSON array, which is formatted and returned by the normal result formatter.
The CSR provides O(1) access to a node's neighbours (via the row_ptr and col_idx arrays), which is critical for iterative algorithms that traverse the graph many times. The EAV tables do not have this property — following an edge via SQL requires at minimum a B-tree lookup on edges.source_id.
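The row_ptr/col_idx layout is the textbook CSR encoding. A minimal Python sketch (not the extension's C implementation) of building it and walking a node's neighbours:

```python
def build_csr(num_nodes, edges):
    """edges: list of (source, target) pairs with 0-based node ids."""
    row_ptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1
    for i in range(num_nodes):      # prefix sums: row_ptr[u]..row_ptr[u+1] brackets u's edges
        row_ptr[i + 1] += row_ptr[i]
    col_idx = [0] * len(edges)
    fill = row_ptr[:-1].copy()      # next free slot per row
    for src, dst in edges:
        col_idx[fill[src]] = dst
        fill[src] += 1
    return row_ptr, col_idx

def neighbors(row_ptr, col_idx, u):
    # Slice bounds come straight from row_ptr: no search, no B-tree lookup.
    return col_idx[row_ptr[u]:row_ptr[u + 1]]

row_ptr, col_idx = build_csr(3, [(0, 1), (0, 2), (1, 2)])
print(neighbors(row_ptr, col_idx, 0))  # [1, 2]
```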
If gql_load_graph() has not been called, algorithm functions will fail with an error indicating the graph is not loaded. After bulk inserts or other modifications, the cache must be explicitly refreshed with gql_reload_graph().
Storage Model
GraphQLite stores property graphs in SQLite using an Entity-Attribute-Value (EAV) schema. This document explains why that design was chosen, how the tables are structured, and what the trade-offs look like in practice.
Why EAV?
A property graph has two requirements that are in tension with standard relational design:
- Schema flexibility. Different nodes can have completely different properties. A Person node might have name, age, and email. A Document node might have title, content, and created_at. You cannot know the full set of property names at schema creation time.
- Type heterogeneity. A property named score might be an integer on one node and a float on another. Cypher does not enforce types on property keys.
Three storage strategies are common:
| Strategy | Approach | Problem |
|---|---|---|
| Fixed schema | One table per node type | Requires schema migration for every new property; hard to query across types |
| JSON blob | Single properties TEXT column | No index support on property values; comparisons require full-table scans |
| EAV | Separate row per property | Flexible schema, indexable values, but more joins per query |
GraphQLite uses EAV. This trades query complexity (more JOIN operations per Cypher query) for schema flexibility and efficient indexed lookups on property values.
Table Structure
Core Tables
nodes is intentionally minimal:
CREATE TABLE nodes (
id INTEGER PRIMARY KEY AUTOINCREMENT
);
A node is just an identity. All semantic content lives in the label and property tables. This allows the core table to stay compact and allows the autoincrement sequence to serve as a reliable surrogate key.
edges carries connectivity and type:
CREATE TABLE edges (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
target_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
type TEXT NOT NULL
);
The relationship type (KNOWS, FOLLOWS, WORKS_AT, etc.) is stored inline because it is always present and is the primary filter when traversing the graph. The ON DELETE CASCADE constraints mean deleting a node automatically removes all its incident edges without requiring explicit cleanup in Cypher.
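The cascade behaviour can be demonstrated with the stdlib sqlite3 module alone; this is a minimal sketch using the `nodes` and `edges` DDL from above (note that SQLite leaves foreign-key enforcement off by default, so `PRAGMA foreign_keys = ON` is required per connection):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE to fire
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY AUTOINCREMENT)")
conn.execute("""
    CREATE TABLE edges (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        source_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
        target_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
        type TEXT NOT NULL
    )
""")
alice = conn.execute("INSERT INTO nodes DEFAULT VALUES").lastrowid
bob = conn.execute("INSERT INTO nodes DEFAULT VALUES").lastrowid
conn.execute(
    "INSERT INTO edges (source_id, target_id, type) VALUES (?, ?, 'KNOWS')",
    (alice, bob))

# Deleting a node removes its incident edges automatically.
conn.execute("DELETE FROM nodes WHERE id = ?", (alice,))
remaining = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
print(remaining)  # 0
```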
node_labels is a many-to-many table between nodes and labels:
CREATE TABLE node_labels (
node_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
label TEXT NOT NULL,
PRIMARY KEY (node_id, label)
);
A node can carry multiple labels (e.g., Person and Employee). The composite primary key prevents duplicate labels and serves as the natural index for label lookups.
property_keys is a normalisation table:
CREATE TABLE property_keys (
id INTEGER PRIMARY KEY AUTOINCREMENT,
key TEXT UNIQUE NOT NULL
);
Rather than storing the property key string (e.g., "name") directly in every property row, GraphQLite stores an integer key_id and looks up the string once. This reduces storage for graphs with many nodes sharing the same property names, and enables the property key cache described below.
Property Tables
There are ten property tables in total: five for nodes and five for edges. They follow the same pattern:
-- Node properties, integer values
CREATE TABLE node_props_int (
node_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
key_id INTEGER NOT NULL REFERENCES property_keys(id),
value INTEGER NOT NULL,
PRIMARY KEY (node_id, key_id)
);
-- Node properties, text values
CREATE TABLE node_props_text (
node_id INTEGER NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
key_id INTEGER NOT NULL REFERENCES property_keys(id),
value TEXT NOT NULL,
PRIMARY KEY (node_id, key_id)
);
-- node_props_real, node_props_bool, node_props_json follow the same shape
-- edge_props_int, edge_props_text, edge_props_real, edge_props_bool, edge_props_json likewise
Booleans are stored as INTEGER CHECK (value IN (0, 1)) because SQLite has no native boolean type. JSON values are stored as TEXT CHECK (json_valid(value)) — the constraint ensures the stored bytes are parseable JSON.
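Both CHECK constraints can be exercised directly with the stdlib sqlite3 module. A minimal sketch (table shapes copied from above, data invented; assumes a SQLite build with the JSON1 functions, which ships with modern Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE node_props_bool (
        node_id INTEGER NOT NULL,
        key_id  INTEGER NOT NULL,
        value   INTEGER NOT NULL CHECK (value IN (0, 1)),
        PRIMARY KEY (node_id, key_id)
    )
""")
conn.execute("""
    CREATE TABLE node_props_json (
        node_id INTEGER NOT NULL,
        key_id  INTEGER NOT NULL,
        value   TEXT NOT NULL CHECK (json_valid(value)),
        PRIMARY KEY (node_id, key_id)
    )
""")
conn.execute("INSERT INTO node_props_bool VALUES (1, 1, 1)")             # accepted
conn.execute('INSERT INTO node_props_json VALUES (1, 2, \'{"a": 1}\')')  # accepted

for bad_sql, params in [
    ("INSERT INTO node_props_bool VALUES (2, 1, ?)", (5,)),        # not 0/1
    ("INSERT INTO node_props_json VALUES (2, 2, ?)", ("{oops",)),  # invalid JSON
]:
    try:
        conn.execute(bad_sql, params)
        raise AssertionError("CHECK constraint should have rejected this row")
    except sqlite3.IntegrityError:
        pass  # rejected by the CHECK constraint, as expected
```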
Why Separate Tables per Type?
The type-per-table design might seem verbose. Why not a single node_props table with a type discriminator column?
Efficient indexes. SQLite's B-tree indexes work best when a column contains values of a single type. An index on node_props_int(key_id, value, node_id) allows the query planner to use a range scan when evaluating a.age > 30. If integers and strings were mixed in one column, comparisons would degrade to text comparisons, silently changing semantics.
No type coercion surprises. SQLite's flexible type affinity means that storing 42 as text and later comparing it to the integer 42 would require careful CAST. Keeping types in separate tables makes the column's affinity unambiguous.
COALESCE fan-out. The transformer generates a COALESCE(...) that tries each type table in sequence and returns the first non-null result. This works correctly because a given (node_id, key_id) pair can exist in at most one type table — a property cannot simultaneously be an integer and a string.
Property Type Inference
When Cypher writes a property, the type is inferred from the value:
| Value | Table |
|---|---|
| Integer literal (`42`) | `node_props_int` |
| Float literal (`3.14`) | `node_props_real` |
| `true` / `false` | `node_props_bool` |
| JSON object or array | `node_props_json` |
| Everything else | `node_props_text` |
Python's bool type is checked before int (since bool is a subclass of int in Python), so True goes to _bool rather than _int.
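The inference order matters only for the bool-before-int check. A hypothetical sketch of the rule (the table names come from this document; the function itself is illustrative, not GraphQLite's code):

```python
def infer_property_table(value):
    """Map a Python value to the node property table it would land in."""
    # bool must be tested before int: True and False are int subclasses,
    # so isinstance(True, int) is also True.
    if isinstance(value, bool):
        return "node_props_bool"
    if isinstance(value, int):
        return "node_props_int"
    if isinstance(value, float):
        return "node_props_real"
    if isinstance(value, (dict, list)):
        return "node_props_json"
    return "node_props_text"

print(infer_property_table(True))      # node_props_bool, not node_props_int
print(infer_property_table(42))        # node_props_int
print(infer_property_table(3.14))      # node_props_real
print(infer_property_table({"a": 1}))  # node_props_json
print(infer_property_table("hello"))   # node_props_text
```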
Property Key Cache
Looking up a key's integer ID in property_keys is necessary for every property read or write. Without caching, a simple MATCH (n) RETURN n.name, n.age would issue two SELECT id FROM property_keys WHERE key = ? queries per result row, which becomes expensive when returning thousands of rows.
The schema manager maintains a hash table of 1024 slots using the djb2 algorithm:
static unsigned long hash_string(const char *str) {
unsigned long hash = 5381;
int c;
while ((c = *str++)) {
hash = ((hash << 5) + hash) + c; // hash * 33 + c
}
return hash;
}
Each slot holds a property_key_entry with the string and its integer ID. On a cache hit, no SQL is issued. On a miss, the key is looked up (or inserted) and the result is stored in the cache.
1024 slots is enough to cover most graphs without collision chains. A graph with 50 distinct property keys will have a load factor under 5%, meaning nearly every lookup resolves in O(1) without a collision. The cache is per-executor, which means per-connection — multiple connections to the same database file each maintain their own independent cache.
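The cache's hit/miss behaviour can be sketched in Python. This is a toy re-implementation for illustration (the real cache is a C hash table inside the schema manager; the `KeyCache` class and its counters are invented here), but the djb2 slot computation and the miss path mirror the description above:

```python
import sqlite3

NUM_SLOTS = 1024

def djb2(s: str) -> int:
    """djb2 string hash: hash = hash * 33 + c, starting from 5381."""
    h = 5381
    for ch in s.encode():
        h = (h * 33 + ch) & 0xFFFFFFFFFFFFFFFF  # emulate unsigned long wraparound
    return h

class KeyCache:
    def __init__(self, conn):
        self.conn = conn
        self.slots = [None] * NUM_SLOTS  # each slot holds (key_string, key_id)
        self.sql_lookups = 0

    def key_id(self, key: str) -> int:
        slot = djb2(key) % NUM_SLOTS
        entry = self.slots[slot]
        if entry is not None and entry[0] == key:
            return entry[1]                      # cache hit: zero SQL
        self.sql_lookups += 1                    # cache miss: look up or insert
        self.conn.execute(
            "INSERT OR IGNORE INTO property_keys (key) VALUES (?)", (key,))
        kid = self.conn.execute(
            "SELECT id FROM property_keys WHERE key = ?", (key,)).fetchone()[0]
        self.slots[slot] = (key, kid)
        return kid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE property_keys "
             "(id INTEGER PRIMARY KEY AUTOINCREMENT, key TEXT UNIQUE NOT NULL)")
cache = KeyCache(conn)
first = cache.key_id("name")   # miss: one SQL roundtrip
second = cache.key_id("name")  # hit: no SQL issued
print(first == second, cache.sql_lookups)  # True 1
```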
How Property Access Translates to SQL
Consider RETURN a.age where a is a node. The full translation chain:
- The parser produces an `AST_NODE_PROPERTY` node with `expr = identifier("a")` and `property_name = "age"`.
- `transform_property_access()` checks whether the context is a comparison (`WHERE a.age > 30`) or a projection (`RETURN a.age`). This matters because comparisons need the native type, while projections cast everything to text for uniform JSON serialisation.
- The transformer emits a correlated `SELECT COALESCE(...)` subquery that queries all five type tables:
(SELECT COALESCE(
(SELECT npt.value
FROM node_props_text npt
JOIN property_keys pk ON npt.key_id = pk.id
WHERE npt.node_id = a.id AND pk.key = 'age'),
(SELECT CAST(npi.value AS TEXT)
FROM node_props_int npi
JOIN property_keys pk ON npi.key_id = pk.id
WHERE npi.node_id = a.id AND pk.key = 'age'),
(SELECT CAST(npr.value AS TEXT)
FROM node_props_real npr
JOIN property_keys pk ON npr.key_id = pk.id
WHERE npr.node_id = a.id AND pk.key = 'age'),
(SELECT CASE WHEN npb.value THEN 'true' ELSE 'false' END
FROM node_props_bool npb
JOIN property_keys pk ON npb.key_id = pk.id
WHERE npb.node_id = a.id AND pk.key = 'age'),
(SELECT npj.value
FROM node_props_json npj
JOIN property_keys pk ON npj.key_id = pk.id
WHERE npj.node_id = a.id AND pk.key = 'age')
))
- SQLite executes this subquery. Because the composite indexes on each property table include `(key_id, value, node_id)`, and `property_keys` has an index on `key`, the join between `property_keys` and each property table resolves via index lookup. SQLite evaluates the COALESCE branches lazily: once a branch returns a non-null value, the rest are skipped.
The result is that each property access is effectively two index lookups (one for the key ID, one for the value) in the common case.
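The fan-out can be reproduced with the stdlib sqlite3 module. This sketch reduces the COALESCE to two of the five type tables (table and column names follow the document; the data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property_keys (id INTEGER PRIMARY KEY, key TEXT UNIQUE);
    CREATE TABLE node_props_int  (node_id INTEGER, key_id INTEGER, value INTEGER);
    CREATE TABLE node_props_text (node_id INTEGER, key_id INTEGER, value TEXT);
    INSERT INTO property_keys VALUES (1, 'age');
    INSERT INTO node_props_int VALUES (7, 1, 30);  -- node 7 has age = 30 (int)
""")
row = conn.execute("""
    SELECT COALESCE(
        (SELECT npt.value FROM node_props_text npt
          JOIN property_keys pk ON npt.key_id = pk.id
         WHERE npt.node_id = ? AND pk.key = 'age'),
        (SELECT CAST(npi.value AS TEXT) FROM node_props_int npi
          JOIN property_keys pk ON npi.key_id = pk.id
         WHERE npi.node_id = ? AND pk.key = 'age')
    )
""", (7, 7)).fetchone()
print(row[0])  # '30' -- the text branch is NULL, so the int branch wins
```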
JSON and Nested Property Storage
Properties whose values are JSON objects or arrays are stored in node_props_json (or edge_props_json). The CHECK (json_valid(value)) constraint on those tables ensures that only valid JSON is stored.
Nested access — n.metadata.city — is handled at transform time. When transform_property_access() sees that the base of a property access is itself a property access (i.e., the AST has AST_NODE_PROPERTY nested inside another AST_NODE_PROPERTY), it generates a json_extract() call:
json_extract(
(SELECT COALESCE(...) WHERE pk.key = 'metadata'),
'$.city'
)
This means metadata is fetched from node_props_json as a JSON text value, and json_extract then navigates into it. Deeper nesting (n.a.b.c) produces nested json_extract calls.
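The `json_extract()` navigation itself is plain SQLite and can be tried directly (assuming a build with the JSON1 functions; the JSON value here is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
doc = '{"city": "London", "geo": {"lat": 51.5}}'
# Single-level path, as generated for n.metadata.city:
city = conn.execute("SELECT json_extract(?, '$.city')", (doc,)).fetchone()[0]
# Deeper nesting navigates further into the same JSON text:
lat = conn.execute("SELECT json_extract(?, '$.geo.lat')", (doc,)).fetchone()[0]
print(city, lat)  # London 51.5
```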
String-keyed subscripts like n['metadata'] are normalised at transform time to behave identically to n.metadata. The AST_NODE_SUBSCRIPT case in transform_expression() checks whether the key is a string literal and, if so, converts it to a property access before generating SQL.
Index Strategy
GraphQLite creates the following indexes at schema initialisation time:
| Index | Columns | Purpose |
|---|---|---|
| `idx_edges_source` | `edges(source_id, type)` | Outgoing traversal with type filter |
| `idx_edges_target` | `edges(target_id, type)` | Incoming traversal with type filter |
| `idx_edges_type` | `edges(type)` | Full-graph type scans |
| `idx_node_labels_label` | `node_labels(label, node_id)` | Label-to-node lookup |
| `idx_property_keys_key` | `property_keys(key)` | Key name to ID lookup |
| `idx_node_props_int_key_value` | `node_props_int(key_id, value, node_id)` | Covering index for int property filters |
| `idx_node_props_text_key_value` | `node_props_text(key_id, value, node_id)` | Covering index for text property filters |
| `idx_node_props_real_key_value` | `node_props_real(key_id, value, node_id)` | Covering index for real property filters |
| `idx_node_props_bool_key_value` | `node_props_bool(key_id, value, node_id)` | Covering index for bool property filters |
| `idx_node_props_json_key_value` | `node_props_json(key_id, node_id)` | JSON key scans (value omitted; not comparable) |

The same pattern is repeated for the `edge_props_*` tables.
The property indexes use a covering index pattern: (key_id, value, node_id). When SQLite evaluates WHERE a.age = 42, it can satisfy the entire lookup from the index without touching the table heap — key_id filters to the right property, value satisfies the predicate, and node_id is the output needed to join back to the nodes table.
The JSON index omits the value because JSON blobs are not comparable as a unit; individual JSON paths are accessed via json_extract() at query time.
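Whether a lookup is satisfied entirely from the index can be checked with SQLite's `EXPLAIN QUERY PLAN`. A sketch with the stdlib sqlite3 module (table and index DDL follow the document; the plan text is produced by SQLite itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node_props_int (
        node_id INTEGER, key_id INTEGER, value INTEGER,
        PRIMARY KEY (node_id, key_id)
    );
    CREATE INDEX idx_node_props_int_key_value
        ON node_props_int (key_id, value, node_id);
""")
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT node_id FROM node_props_int WHERE key_id = 1 AND value = 42"
).fetchall()
# Each plan row is (id, parent, notused, detail); the detail string reports
# COVERING INDEX when no table-heap read is needed.
detail = " ".join(row[3] for row in plan)
print(detail)
```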
Trade-offs
Read performance. A simple MATCH (n:Person) RETURN n.name requires a label scan via idx_node_labels_label plus one correlated subquery per returned property per row. For small graphs (under 100K nodes), this is fast. As the result set grows, the correlated subqueries become the bottleneck. The optimizer cannot always lift them into a join, though covering indexes mitigate this significantly.
Write overhead. Creating a single node with three properties requires:

- 1 insert into `nodes`
- 1 insert into `node_labels`
- up to 3 inserts into `property_keys` (or cache hits)
- 3 inserts into the appropriate `node_props_*` tables

That is five to eight inserts per node. For bulk loading, the Python and Rust bindings provide `insert_nodes_bulk()` and `insert_edges_bulk()` methods that bypass the Cypher parser and use direct SQL within a single `BEGIN IMMEDIATE` transaction. This is 100–500x faster than issuing CREATE queries through `cypher()`.
Schema flexibility. Adding new properties to existing nodes requires no migration. A Person node created yesterday with name and age can have email added tomorrow with no schema change. This is a significant advantage for evolving data models.
Query complexity. The generated SQL for even simple Cypher queries is verbose. This makes the EXPLAIN prefix particularly valuable: EXPLAIN MATCH (a)-[:KNOWS]->(b) RETURN b.name returns the generated SQL so you can understand exactly what SQLite will execute.
Query Dispatch
When GraphQLite receives a Cypher query, the executor does not immediately start generating SQL. First it determines what kind of query it is — a MATCH+RETURN? A CREATE? A MATCH+SET? — and routes it to a handler that knows how to process that specific combination of clauses. This routing mechanism is the query dispatch system.
The Pattern Registry
The dispatch system maintains a static table of query_pattern entries. Each entry describes:
- A name (used in EXPLAIN output and debug logs).
- A bitmask of required clauses that must all be present.
- A bitmask of forbidden clauses that must all be absent.
- A handler function pointer.
- A priority integer used to break ties when multiple patterns match.
The full pattern table, ordered from highest to lowest priority:
| Priority | Pattern Name | Required | Forbidden |
|---|---|---|---|
| 100 | UNWIND+CREATE | UNWIND, CREATE | RETURN, MATCH |
| 100 | WITH+MATCH+RETURN | WITH, MATCH, RETURN | — |
| 100 | MATCH+CREATE+RETURN | MATCH, CREATE, RETURN | — |
| 90 | MATCH+SET | MATCH, SET | — |
| 90 | MATCH+DELETE | MATCH, DELETE | — |
| 90 | MATCH+REMOVE | MATCH, REMOVE | — |
| 90 | MATCH+MERGE | MATCH, MERGE | — |
| 90 | MATCH+CREATE | MATCH, CREATE | RETURN |
| 80 | OPTIONAL_MATCH+RETURN | MATCH, OPTIONAL, RETURN | CREATE, SET, DELETE, MERGE |
| 80 | MULTI_MATCH+RETURN | MATCH, MULTI_MATCH, RETURN | CREATE, SET, DELETE, MERGE |
| 70 | MATCH+RETURN | MATCH, RETURN | OPTIONAL, MULTI_MATCH, CREATE, SET, DELETE, MERGE |
| 60 | UNWIND+RETURN | UNWIND, RETURN | CREATE |
| 50 | CREATE | CREATE | MATCH, UNWIND |
| 50 | MERGE | MERGE | MATCH |
| 50 | SET | SET | MATCH |
| 50 | FOREACH | FOREACH | — |
| 40 | MATCH | MATCH | RETURN, CREATE, SET, DELETE, MERGE, REMOVE |
| 10 | RETURN | RETURN | MATCH, UNWIND, WITH |
| 0 | GENERIC | — | — |
The GENERIC pattern at priority 0 is the catch-all: it has no required or forbidden clauses, so it matches any query. It uses the full transform pipeline, which handles complex multi-clause queries including WITH chains.
How Pattern Matching Works
The dispatch function dispatch_query_pattern() follows three steps:
Step 1: Analyse. analyze_query_clauses() walks the clause list of the parsed query and sets bits in a clause_flags integer. Each clause type maps to one bit:
CLAUSE_MATCH CLAUSE_RETURN CLAUSE_CREATE
CLAUSE_MERGE CLAUSE_SET CLAUSE_DELETE
CLAUSE_REMOVE CLAUSE_WITH CLAUSE_UNWIND
CLAUSE_FOREACH CLAUSE_LOAD_CSV CLAUSE_EXPLAIN
CLAUSE_OPTIONAL CLAUSE_MULTI_MATCH CLAUSE_UNION
CLAUSE_CALL
CLAUSE_OPTIONAL is set when any MATCH clause has optional = true. CLAUSE_MULTI_MATCH is set when more than one MATCH clause is present.
Step 2: Find best match. find_matching_pattern() iterates the pattern table and evaluates each entry against the flags:
// Required clauses must all be present
if ((present & p->required) != p->required) continue;
// Forbidden clauses must all be absent
if (present & p->forbidden) continue;
// Higher priority wins
if (!best || p->priority > best->priority) best = p;
Step 3: Dispatch. The winning pattern's handler is called with the executor, query AST, result, and flags.
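The three steps can be condensed into a toy Python simulation. The bitmask checks mirror the C snippet above; the pattern entries are a small subset of the registry table (the priority-80 patterns are omitted for brevity, so OPTIONAL falls through to GENERIC here):

```python
# Clause bits, analogous to the CLAUSE_* flags.
MATCH, RETURN, CREATE, SET, OPTIONAL = (1 << i for i in range(5))

PATTERNS = [
    # (priority, name, required, forbidden)
    (100, "MATCH+CREATE+RETURN", MATCH | CREATE | RETURN, 0),
    (90,  "MATCH+CREATE",        MATCH | CREATE,          RETURN),
    (70,  "MATCH+RETURN",        MATCH | RETURN,          OPTIONAL | CREATE | SET),
    (0,   "GENERIC",             0,                       0),
]

def dispatch(present: int) -> str:
    best = None
    for prio, name, required, forbidden in PATTERNS:
        if (present & required) != required:
            continue  # a required clause is missing
        if present & forbidden:
            continue  # a forbidden clause is present
        if best is None or prio > best[0]:
            best = (prio, name)  # higher priority wins
    return best[1]

print(dispatch(MATCH | RETURN))             # MATCH+RETURN
print(dispatch(MATCH | CREATE))             # MATCH+CREATE
print(dispatch(MATCH | CREATE | RETURN))    # MATCH+CREATE+RETURN
print(dispatch(OPTIONAL | MATCH | RETURN))  # GENERIC (OPTIONAL forbidden at 70)
```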
Priority Ordering Rationale
The priorities reflect increasing specificity:
Priority 100 entries are the most specific multi-clause combinations. A WITH+MATCH+RETURN query is unambiguous — it must use the generic transform pipeline which understands how WITH propagates variables into subsequent MATCH clauses. Putting it at the top prevents it from being matched by MATCH+RETURN (priority 70), which uses a simpler, more direct execution path that does not handle WITH.
Priority 90 covers the common write patterns: MATCH followed by SET, DELETE, REMOVE, MERGE, or CREATE. These all require first finding graph elements (via MATCH) and then modifying them. The forbidden clauses on MATCH+CREATE (priority 90) exclude RETURN, because MATCH+CREATE+RETURN has its own handler at priority 100 with different result-formatting behaviour.
Priority 80 handles OPTIONAL MATCH and multi-MATCH queries. These require the generic transform pipeline for correct LEFT JOIN generation. They are kept separate from the simpler MATCH+RETURN (priority 70) because the naive MATCH+RETURN handler assumes a single, non-optional MATCH clause.
Priority 70 is the hot path for the most common read query: MATCH ... RETURN. Its forbidden clause list is broad — it excludes OPTIONAL, MULTI_MATCH, and all write operations — ensuring only genuinely simple single-MATCH queries reach this handler.
Priority 50 covers standalone write operations. A bare CREATE (n:Person) or MERGE without a preceding MATCH is simpler than the MATCH+write variants; the handler can skip the join generation step.
Priority 10 covers standalone RETURN, which is how graph algorithms are invoked: RETURN pageRank(). The forbidden list excludes MATCH and WITH to prevent this from matching a MATCH ... RETURN accidentally.
Priority 0 is the GENERIC fallback. Any query that reaches this point is handled by the full transform pipeline, which is the most capable but least optimised path.
The GENERIC Fallback
The GENERIC handler creates a transform context and passes the entire query AST to cypher_transform_generate_sql(). The transform layer processes clauses sequentially — MATCH generates JOINs, WITH generates a subquery boundary, RETURN generates the SELECT list. This handles:
- WITH chains (`MATCH ... WITH ... MATCH ... RETURN`)
- OPTIONAL MATCH
- Multiple MATCH clauses
- UNWIND with complex logic
- Any combination not covered by a specific handler
The specific handlers at higher priorities exist as optimisations over GENERIC — they take shortcuts that are only valid for their specific clause combinations. If you add a new clause type or combination that GENERIC handles incorrectly, you add a new specific pattern rather than modifying GENERIC.
Multi-Clause Queries and WITH Chains
A query like:
MATCH (a:Person)
WITH a, count(*) AS c
WHERE c > 2
MATCH (a)-[:KNOWS]->(b)
RETURN a.name, b.name
contains clauses: MATCH, WITH, MATCH, RETURN. analyze_query_clauses() sets CLAUSE_MATCH | CLAUSE_WITH | CLAUSE_RETURN | CLAUSE_MULTI_MATCH. The pattern WITH+MATCH+RETURN (priority 100) matches because it requires MATCH, WITH, and RETURN with no forbidden clauses.
The GENERIC transform pipeline processes this by generating a subquery for the first MATCH+WITH block, then joining the second MATCH into that subquery's output. The WITH clause acts as a boundary: variables named in the WITH are projected out and remain available to subsequent clauses; variables not named are out of scope.
UNION Queries
UNION queries are handled outside the pattern dispatch system. When cypher_executor_execute_ast() receives an AST_NODE_UNION node, it passes it directly to the transform layer, bypassing dispatch_query_pattern() entirely. The transform layer handles UNION by generating SELECT ... UNION ALL SELECT ... (or UNION for UNION DISTINCT) SQL.
Algorithm Detection
The RETURN-only pattern (priority 10) handles standalone RETURN clauses. When the handler processes the RETURN items and finds a function call whose name is a known graph algorithm — pageRank, dijkstra, betweenness, louvain, etc. — it dispatches to the graph algorithm subsystem rather than generating SQL.
Detection happens by name in the RETURN handler. If the function name is registered as a graph algorithm, the executor checks whether a CSR graph is loaded (executor->cached_graph != NULL) and calls the appropriate algorithm function. If no graph is loaded, an error is returned indicating that gql_load_graph() must be called first.
This means graph algorithm calls look syntactically identical to scalar function calls:
RETURN pageRank()
RETURN dijkstra('alice', 'bob')
RETURN betweennessCentrality()
They are distinguished from SQL functions only inside the executor.
Adding New Patterns
To add a new execution path for a clause combination not currently covered:
1. **Identify the clause combination.** Determine which clauses must be present and which must be absent. Be careful not to create ambiguity with existing patterns.

2. **Write the handler.** Handler functions have the signature:

   static int handle_my_pattern(
       cypher_executor *executor,
       cypher_query *query,
       cypher_result *result,
       clause_flags flags
   );

   The handler is responsible for setting `result->success` or calling `set_result_error()` on failure. It returns 0 on success and -1 on error.

3. **Add the pattern to the registry** in `query_dispatch.c`, choosing a priority that correctly orders it relative to existing patterns. The registry is scanned sequentially, with the highest-priority match winning, so placement within the array matters only for readability; the priority field determines the winner.

4. **Forward-declare the handler** in the declarations block at the top of `query_dispatch.c`.

5. **Test.** Use `EXPLAIN` to verify that the new pattern is selected for your target queries: `EXPLAIN MATCH (n) ... RETURN n` returns `Pattern: <matched_name>` in its output.
The EXPLAIN output also shows the clause flags string (e.g., MATCH|RETURN|MULTI_MATCH), which makes it easy to debug pattern selection issues.
Performance Characteristics
GraphQLite's performance profile is shaped by three factors: the overhead of the Cypher-to-SQL translation pipeline, the cost of the EAV schema's join-heavy queries, and the behaviour of the in-memory CSR cache for graph algorithms. Understanding these factors lets you make informed decisions about data loading, query structure, and when to reach for the bulk APIs.
Benchmark Reference Numbers
These figures come from a single-core MacBook workload with an in-memory SQLite database (:memory:). Disk-backed databases will be faster with WAL mode enabled and slower when the OS page cache is cold.
| Operation | Typical latency |
|---|---|
| Extension loading (schema init) | ~5ms (once per connection) |
| Simple CREATE (`(:Person {name: 'Alice'})`) | 0.5–1ms |
| Simple MATCH (`(n:Person) RETURN n.name`, 10 nodes) | 0.5–2ms |
| `MATCH (a)-[:KNOWS]->(b) RETURN b.name` (100 relationships) | 1–5ms |
| Bulk insert via Python `insert_nodes_bulk()` | 100–500x faster than Cypher CREATE |
| `gql_load_graph()` on 100K nodes/edges | ~50–100ms |
| PageRank on 100K nodes | ~180ms |
| PageRank on 1M nodes | ~38s |
| Property key cache lookup (hit) | O(1), no SQL |
| Property key cache lookup (miss) | 1 SQL roundtrip to property_keys |
The 5ms extension loading cost is a one-time expense per connection. After the first cypher() call, the executor is cached and reused for all subsequent calls on the same connection.
The CSR Graph Cache
Graph algorithms (PageRank, Dijkstra, Betweenness Centrality, Louvain, etc.) cannot run efficiently against the EAV tables. Finding a node's neighbours requires a B-tree lookup on edges.source_id, and iterative algorithms like PageRank traverse the full graph hundreds of times. At 1M nodes that would be hundreds of millions of B-tree lookups.
The CSR (Compressed Sparse Row) representation solves this. After SELECT gql_load_graph(), GraphQLite:
- Reads all rows from `nodes` to build the node ID array.
- Builds a hash table mapping node IDs to CSR array indices.
- Reads all rows from `edges` twice: first to count out-edges per node (to compute row pointer offsets), then to fill the column index arrays.
- Builds a parallel in-edges structure for algorithms that need reverse traversal.
The result is two arrays (row_ptr and col_idx) where row_ptr[i] is the offset in col_idx where node i's neighbours begin, and row_ptr[i+1] - row_ptr[i] is its degree. Neighbour access is O(1): col_idx[row_ptr[i] .. row_ptr[i+1]].
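The two-pass build (count degrees, prefix-sum into offsets, then place neighbours) can be sketched over a toy edge list. This is an illustrative Python version, not GraphQLite's C implementation:

```python
def build_csr(num_nodes, edges):
    """Build (row_ptr, col_idx) from a list of (src, dst) pairs."""
    row_ptr = [0] * (num_nodes + 1)
    for src, _dst in edges:            # pass 1: count out-degrees
        row_ptr[src + 1] += 1
    for i in range(num_nodes):         # prefix sums turn counts into offsets
        row_ptr[i + 1] += row_ptr[i]
    col_idx = [0] * len(edges)
    fill = row_ptr[:-1].copy()         # next free slot per node
    for src, dst in edges:             # pass 2: place neighbours
        col_idx[fill[src]] = dst
        fill[src] += 1
    return row_ptr, col_idx

def neighbours(row_ptr, col_idx, i):
    # Slice bounds are O(1): node i's neighbours sit in
    # col_idx[row_ptr[i]:row_ptr[i + 1]].
    return col_idx[row_ptr[i]:row_ptr[i + 1]]

row_ptr, col_idx = build_csr(4, [(0, 1), (0, 2), (1, 2), (3, 0)])
print(neighbours(row_ptr, col_idx, 0))  # [1, 2]
print(row_ptr[1] - row_ptr[0])          # degree of node 0 -> 2
```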
When to Load
The cache must be loaded before running any algorithm, and must be refreshed after structural changes (adding or deleting nodes or edges). The cache persists for the lifetime of the connection; a stale cache causes algorithms to operate on the graph state at the time of the last load, silently ignoring newer data.
For cache management instructions — when and how to call gql_load_graph(), gql_reload_graph(), and gql_unload_graph() — see Using Graph Algorithms.
Memory Implications
The CSR graph holds two integer arrays of length edge_count (for `col_idx` and `in_col_idx`) and two arrays of length node_count + 1 (for `row_ptr` and `in_row_ptr`). For a graph with N nodes and E edges:

- `row_ptr` arrays: 2 × (N+1) × 4 bytes ≈ 8N bytes
- `col_idx` arrays: 2 × E × 4 bytes ≈ 8E bytes
- Node ID array: N × 4 bytes ≈ 4N bytes
- Hash table: ~4 × N × 4 bytes ≈ 16N bytes (open addressing, load factor ~25%)

A graph with 1M nodes and 5M edges uses approximately 60MB of heap memory for the CSR structure. The user-defined ID strings (if present) add additional allocation per node.
If memory is constrained, `gql_unload_graph()` can be called after algorithm runs to free the CSR heap allocation. The trade-off is that the next algorithm call will require a full reload from the database tables. On a 100K-node graph, `gql_load_graph()` takes approximately 50–100ms (see the benchmark table above), so unloading between algorithm calls is only worthwhile when memory pressure is severe.
Property Key Cache
Every property read or write needs the integer key_id for the property name. The property key cache uses djb2 hashing over 1024 slots, held in the cypher_schema_manager (which is per-executor, so per-connection).
A typical graph with 20–50 distinct property keys will have a cache load factor well under 10%, meaning:
- **Cache hit:** hash the key string, index into the slot array, compare the stored string, return the `key_id`. Zero SQL.
- **Cache miss:** the hash lookup fails; issue `SELECT id FROM property_keys WHERE key = ?` (covered by `idx_property_keys_key`) and store the result in the cache for future lookups.
Cache misses only occur for property keys not yet seen in this connection's session. After the first query that touches a given key, all subsequent accesses to that key on the same connection are cache hits.
The cache has no eviction policy: it grows monotonically within its fixed 1024 slots. For graphs with more than a few hundred distinct property key names, hash collisions become likely. Collisions do not cause correctness problems (misses fall through to SQL), but they degrade performance toward one SQL lookup per colliding property access.
Bulk Insert
The most important performance optimisation available to users is bypassing the Cypher pipeline entirely for bulk data loading.
Issuing SELECT cypher('CREATE (:Person {name: "Alice", age: 30})') for each node in a large graph is slow because each call:
1. Parses the Cypher string (Bison GLR parse).
2. Transforms the AST to SQL insert statements.
3. Executes three or more SQL statements (insert node, insert label, insert properties).
4. Formats the result.
The Python insert_nodes_bulk() and insert_edges_bulk() methods skip steps 1 and 2 entirely and batch all of step 3 inside a single BEGIN IMMEDIATE transaction:
id_map = g.insert_nodes_bulk([
("alice", {"name": "Alice", "age": 30}, "Person"),
("bob", {"name": "Bob", "age": 25}, "Person"),
])
g.insert_edges_bulk([
("alice", "bob", {"since": 2020}, "KNOWS"),
], id_map)
The transaction amortises the cost of page writes across thousands of rows. The id_map dictionary eliminates the need for a SELECT node_id FROM node_props_text WHERE value = ? lookup per edge source and target.
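The effect of batching inside one transaction can be sketched with the stdlib sqlite3 module alone (toy table, invented data; the real bulk methods write to all the EAV tables, but the transaction principle is the same):

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so we control
# the transaction boundaries explicitly, as the bulk loaders do.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE node_props_text "
             "(node_id INTEGER, key_id INTEGER, value TEXT)")

rows = [(i, 1, f"user{i}") for i in range(10_000)]
conn.execute("BEGIN IMMEDIATE")  # one write transaction for the whole batch
conn.executemany("INSERT INTO node_props_text VALUES (?, ?, ?)", rows)
conn.execute("COMMIT")           # a single commit amortised over 10K rows

count = conn.execute("SELECT COUNT(*) FROM node_props_text").fetchone()[0]
print(count)  # 10000
```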
Benchmarks show 100–500x throughput improvement for bulk loads compared to equivalent Cypher CREATE queries. For a graph with 100K nodes and 500K edges, bulk insert completes in seconds; Cypher CREATE would take minutes.
The Rust binding offers the equivalent Graph::insert_nodes_bulk() method with the same semantics.
Index Utilisation
SQLite's query planner makes decisions based on index statistics. For GraphQLite's EAV schema, the most important index patterns are:
Label filtering (MATCH (n:Person)): Uses idx_node_labels_label which is (label, node_id). The query WHERE label = 'Person' is a single B-tree range scan that returns all node IDs with that label. This is the primary entry point for most read queries.
Property equality (WHERE n.age = 30): The generated SQL contains a correlated subquery that filters node_props_int with key_id = ? AND value = 30. The covering index idx_node_props_int_key_value on (key_id, value, node_id) allows this to be satisfied entirely from the index, returning the node_id without a table heap read.
Edge traversal (MATCH (a)-[:KNOWS]->(b)): Uses idx_edges_source on (source_id, type) for outgoing traversal. The type filter is folded into the index scan. Incoming traversal uses idx_edges_target on (target_id, type).
When indexes are not used: Range predicates on JSON properties (e.g., WHERE n.metadata.city = 'London') require evaluating json_extract() for every row that passes the outer filter. SQLite cannot use a B-tree index to accelerate json_extract() comparisons without a generated column or expression index.
SQLite-Specific Optimisations
WAL mode. For disk-backed databases with any level of concurrent access (even one writer, one reader), WAL mode allows readers to proceed while a writer is active. This is the most impactful single setting for mixed read/write workloads of Cypher queries.
Prepared statements. The cypher_executor caches the executor per connection. Within each query execution, the generated SQL is prepared and executed, but the prepared statement is finalised after each use because the generated SQL changes with each Cypher query. For repeated identical queries, this means the SQL planning cost is paid each time. If you are calling the same parameterised Cypher query many times (e.g., a lookup by ID), issuing the same query string with different parameter values — rather than constructing slightly different Cypher strings — allows SQLite's prepared statement cache to reuse the plan.
Page cache. SQLite's default page cache is 2MB (512 pages × 4KB). For graphs with many nodes, the EAV tables span many pages. A larger cache reduces I/O on repeated queries, with the trade-off of higher baseline memory use per connection.
Synchronous mode. Reducing the synchronous setting eliminates fsync calls and can roughly double write throughput for bulk loads. The trade-off is reduced durability: a crash during a write can leave the database in an inconsistent state. This setting is appropriate for analytics workloads on expendable data, but should never be used for production data without explicit acceptance of that risk.
For the specific PRAGMA values to use and when to apply each setting, see the how-to guides.
Scaling Characteristics
Under 10K nodes: Performance is dominated by connection overhead and query parsing. The EAV join pattern is fast because the tables fit in the SQLite page cache. Simple MATCH+RETURN queries complete in under 1ms.
10K–100K nodes: Property lookup correlated subqueries become noticeable. Queries that return many rows with many properties per row can take 5–50ms. The covering indexes keep most lookups out of the table heap, but the sheer number of subquery evaluations adds up. Bulk insert becomes worthwhile at this scale.
100K–1M nodes: The EAV fan-out is the dominant cost. A full-graph scan (no label filter, no index pushdown) requires visiting every row in the relevant label and property tables. Graph algorithms should always operate via the CSR cache at this scale, not via Cypher queries that generate SQL. PageRank on 100K nodes via CSR takes ~180ms; the equivalent SQL-based traversal would be orders of magnitude slower.
Above 1M nodes: Memory usage for the CSR cache becomes significant (60MB+ for 1M nodes, 5M edges). Disk-backed databases benefit strongly from WAL mode and a large page cache. PageRank on 1M nodes takes ~38s on a single core. For workloads at this scale, consider whether batching algorithm results, pre-computing centrality scores and storing them as properties, or partitioning the graph into sub-graphs makes sense for your use case.
Memory Usage Guidelines
| Component | Approximate size |
|---|---|
| `cypher_executor` struct | ~1KB (plus schema manager) |
| Property key cache (1024 slots) | ~50KB empty, grows with distinct keys |
| CSR graph (N nodes, E edges) | ~(20N + 8E) bytes |
| SQL buffer per query | 1–50KB depending on query complexity |
| Result data | proportional to row count × column count |
For a graph with 100K nodes and 500K edges, the CSR cache uses approximately 6MB. The property key cache for a graph with 100 distinct property names uses approximately 100KB. The executor and schema manager overhead is negligible.
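As a back-of-envelope check, the ~(20N + 8E) bytes figure from the table above reproduces both worked numbers in this section (the helper function is illustrative, not part of GraphQLite):

```python
def csr_bytes(num_nodes: int, num_edges: int) -> int:
    """Approximate CSR heap usage per the ~(20N + 8E) bytes figure."""
    return 20 * num_nodes + 8 * num_edges

print(csr_bytes(100_000, 500_000) / 1e6)      # 6.0  -> ~6MB
print(csr_bytes(1_000_000, 5_000_000) / 1e6)  # 60.0 -> ~60MB
```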