How Parsing Works¶
plissken uses different parsing strategies for Rust and Python, each optimized for the language's tooling ecosystem.
Rust Parsing¶
Rust files are parsed using the syn crate, which
provides a complete abstract syntax tree (AST). This is the same parser
used by procedural macros in the Rust ecosystem.
Why syn?¶
syn is the standard Rust AST parser — the same library used by procedural
macros. Using it means plissken parses Rust code with the same fidelity as
the Rust compiler itself. This was a deliberate choice over simpler
approaches like regex or tree-sitter-rust because:
- It handles all Rust syntax correctly, including complex generics, lifetimes, and where clauses
- It provides typed access to attributes like
#[pyclass(name = "...")], which is essential for cross-reference extraction - It's battle-tested across the entire Rust ecosystem
The parser extracts all public item types (structs, enums, functions, traits, impl blocks, consts, type aliases). For the full list of extracted fields, see the Data Model Reference.
Doc Comment Design¶
Doc comments (/// and //!) are parsed into structured ParsedDocstring
objects. The parser maps Rust's conventional # Section headers to
structured fields — for example, # Arguments becomes a params list and
# Errors becomes a raises list. This means Rust doc comments and Python
docstrings produce the same structured output, enabling consistent rendering
regardless of the source language.
The # Panics section is treated as a variant of # Errors (with type
"panic") because both represent failure modes that users need to know about.
# Safety is folded into the description rather than getting its own field,
since it's prose rather than structured data.
Generic Parameters¶
Generics are preserved as strings, including:
- Type parameters:
<T: Clone> - Lifetime parameters:
<'a> - Const generics:
<const N: usize> - Complex bounds:
<T: Iterator<Item = &'a str> + Send>
PyO3 Detection¶
The Rust parser doesn't just extract documentation — it also recognizes
PyO3 attributes (#[pyclass], #[pyfunction], #[pymethods]) and
captures their arguments (like name and module). This metadata is what
enables the cross-reference system to link Python items back to their Rust
implementations. The detection happens at parse time rather than as a
separate pass, because syn makes attribute inspection trivial when you
already have the full AST.
Module Path Construction¶
File paths are converted to Rust module paths:
| File Path | Module Path |
|---|---|
src/lib.rs |
mycrate |
src/utils.rs |
mycrate::utils |
src/net/mod.rs |
mycrate::net |
src/net/tcp.rs |
mycrate::net::tcp |
The crate name is read from the [package].name field in the crate's
Cargo.toml.
Python Parsing¶
Python files are parsed using
tree-sitter with the
tree-sitter-python grammar. tree-sitter produces a concrete syntax tree
(CST) that preserves all syntactic detail.
Why tree-sitter?¶
Python can't be parsed with syn (obviously), so plissken needs a
different parser. tree-sitter was chosen over Python's built-in ast
module for a key reason: plissken is a Rust binary and doesn't require a
Python runtime to be installed. Using Python's ast module would mean
either embedding a Python interpreter or shelling out to a Python process,
both of which add complexity and a runtime dependency.
tree-sitter provides a fast, dependency-free parser implemented in C with Rust bindings. It produces a concrete syntax tree (CST) that preserves enough syntactic detail to extract classes, functions, decorators, type annotations, and docstrings. The trade-off is that tree-sitter doesn't do semantic analysis (like resolving imports), but plissken doesn't need that — it only needs syntactic structure.
Docstring Format Detection¶
plissken supports two docstring formats:
Google style (detected by Args:, Returns:, Raises:, Example: markers):
def func(x):
"""Summary line.
Extended description.
Args:
x: Description of x.
Returns:
Description of return value.
Raises:
ValueError: When x is invalid.
Examples:
>>> func(1)
2
"""
NumPy style (detected by underlined section headers):
def func(x):
"""
Summary line.
Parameters
----------
x : int
Description of x.
Returns
-------
int
Description of return value.
"""
If neither format is detected, the entire docstring is treated as the summary/description with no structured sections.
Type Information Strategy¶
Types come from two sources: function signature annotations and docstring type annotations. Signature annotations take precedence because they're machine-checked (by mypy, pyright, etc.) and therefore more reliable. Docstring types are a fallback for legacy code that predates PEP 484.
This dual-source approach means plissken produces useful type information for both modern annotated code and older codebases that only have types in docstrings.
Module Path Construction¶
File paths are converted to dotted Python module paths:
| File Path | Module Path |
|---|---|
mypackage/__init__.py |
mypackage |
mypackage/utils.py |
mypackage.utils |
mypackage/net/__init__.py |
mypackage.net |
mypackage/net/tcp.py |
mypackage.net.tcp |
Error Handling¶
Both parsers are designed to be resilient:
-
Non-fatal errors: If a file fails to parse, plissken emits a structured warning and continues with the remaining files. This means a single syntax error doesn't block the entire documentation build.
-
Partial extraction: Even within a successfully parsed file, if a specific item can't be fully extracted (e.g., a complex generic bound), the item is included with whatever information was captured.
-
Warning format: Parse warnings follow a consistent format:
Performance¶
-
Rust parsing is single-threaded and processes files sequentially.
synis fast enough that even large crates (thousands of lines) parse in milliseconds. -
Python parsing uses tree-sitter's incremental parser, though plissken currently parses each file independently. tree-sitter is implemented in C with Rust bindings, making it very fast.
-
File discovery uses
walkdirfor efficient directory traversal with configurable skip lists to avoid scanning virtual environments, caches, and build artifacts.