Performance Tuning

This guide explains how to tune Cloacina’s configuration for different workload profiles. All values referenced here come from DefaultRunnerConfig and the scheduler internals – nothing is fabricated.

1. Concurrency Tuning

The max_concurrent_tasks setting (default: 4) controls how many tasks can execute simultaneously across all active workflows. This is the single most impactful knob for throughput.

How to size based on workload

Workload Type	Recommended Setting	Rationale
CPU-bound (ML inference, image processing)	CPU cores	Exceeding core count causes context switching without gain
IO-bound (API calls, DB queries, file reads)	CPU cores x 2	Tasks spend most time waiting, so oversubscription is beneficial
Mixed	CPU cores x 1.5	Balance between CPU saturation and IO wait utilization

Slot management and deferred tasks

The executor uses a semaphore-based slot system. When a task completes (success or failure), its slot is released immediately. Tasks in Ready state wait for available slots before dispatching. The scheduler pushes tasks via the dispatcher as soon as dependencies are satisfied and a slot is available.

Key insight: if you have workflows with sequential task chains (A -> B -> C), a single workflow only occupies one slot at a time. Concurrency helps when you run many workflows simultaneously or when workflows have parallel fan-out stages.

Practical example

use std::time::Duration;
use cloacina::runner::DefaultRunnerConfig;

// For an 8-core machine running IO-bound tasks:
let config = DefaultRunnerConfig::builder()
    .max_concurrent_tasks(16)  // 8 cores x 2
    .build();

// For an 8-core machine running CPU-bound tasks:
let config = DefaultRunnerConfig::builder()
    .max_concurrent_tasks(8)   // match core count
    .build();

Observed scaling behavior

From the performance test suite (SQLite backend):

Near-linear scaling up to 8 workers with ~89% efficiency
16.6x throughput improvement at 32x concurrency increase (diminishing returns)
Performance degradation at 64 workers due to database contention
Optimal efficiency in the 4-8 range for SQLite; higher for PostgreSQL

2. Scheduler Optimization

The scheduler_poll_interval (default: 100ms) controls how frequently the scheduler loop checks for active pipelines, evaluates task readiness, and dispatches ready tasks to executors.

The latency vs CPU tradeoff

Interval	Use Case	Tradeoff
10-50ms	Real-time/interactive workflows	Lower latency, higher CPU and DB query load
100ms (default)	General purpose	Good balance for most workloads
250-500ms	Batch processing, low-priority jobs	Minimal overhead, higher scheduling latency
1000ms+	Background maintenance workflows	Very low overhead, seconds of scheduling delay

What happens each poll cycle

Each scheduler tick:

Queries for all active pipeline executions
Batch-loads pending tasks across all active pipelines
Groups tasks by pipeline and evaluates trigger rules
Marks tasks as Ready when dependencies are satisfied
Dispatches all Ready tasks to executors via the dispatcher

The scheduler implements exponential backoff on errors (up to 30 seconds max) with a circuit-breaker pattern that logs a warning after 5 consecutive failures.

Configuration

use std::time::Duration;
use cloacina::runner::DefaultRunnerConfig;

// Low-latency: check every 50ms
let config = DefaultRunnerConfig::builder()
    .scheduler_poll_interval(Duration::from_millis(50))
    .build();

// Batch processing: check every 500ms
let config = DefaultRunnerConfig::builder()
    .scheduler_poll_interval(Duration::from_millis(500))
    .build();

Impact on database query frequency

At 100ms poll interval with N active pipelines, expect roughly 10 query batches per second. Each batch involves:

1 query for active executions
1 batch query for pending tasks across all pipelines
N completion checks (one per pipeline)
Variable Ready-task dispatch queries

Increasing the interval proportionally reduces this load.

3. Database Performance

Pool sizing

The db_pool_size (default: 10) controls the number of connections in the pool. Size this based on:

db_pool_size >= max_concurrent_tasks + background_services + headroom

Background services that consume connections:

Scheduler loop: 1-2 connections per tick
Cron scheduler: 1 connection per poll
Registry reconciler: 1 connection per reconcile cycle
Stale claim sweeper: 1 connection per sweep
Recovery manager: 1 connection during recovery

A practical formula:

use cloacina::runner::DefaultRunnerConfig;

// For 16 concurrent tasks:
let config = DefaultRunnerConfig::builder()
    .max_concurrent_tasks(16)
    .db_pool_size(25)  // 16 tasks + 5 background + 4 headroom
    .build();

PostgreSQL vs SQLite performance

Characteristic	SQLite	PostgreSQL
Concurrent writes	Serialized (single-writer lock)	MVCC (multiple concurrent writers)
Optimal concurrency	4-8 workers	16-64 workers
Connection overhead	Minimal (file-based)	Per-connection memory (~5-10MB)
Network latency	None (in-process)	Present (even on localhost)
WAL mode	Enabled by default in URLs	N/A (uses MVCC)
Best for	Development, single-node, low concurrency	Production, multi-node, high concurrency

SQLite WAL configuration

Cloacina SQLite URLs should include WAL mode and busy timeout:

sqlite:///path/to/cloacina.db?_journal_mode=WAL&_busy_timeout=5000

WAL mode allows concurrent readers during writes. The busy timeout (5000ms) prevents immediate SQLITE_BUSY errors under contention.

Connection pool behavior

The pool uses deadpool with the following behavior:

Connections are created lazily up to db_pool_size
Idle connections are reused (no connection-per-query overhead)
Pool exhaustion blocks the caller until a connection is returned
PostgreSQL connections support schema-based isolation (SET search_path)

If you see pool exhaustion warnings in logs, increase db_pool_size or reduce max_concurrent_tasks.

4. Timeout Configuration

Task timeout

The task_timeout (default: 300 seconds / 5 minutes) is the maximum wall-clock time for a single task execution. If exceeded, the task is marked as failed and eligible for retry according to its retry policy.

use std::time::Duration;
use cloacina::runner::DefaultRunnerConfig;

// Short timeout for fast tasks (API calls)
let config = DefaultRunnerConfig::builder()
    .task_timeout(Duration::from_secs(30))
    .build();

// Long timeout for ML training tasks
let config = DefaultRunnerConfig::builder()
    .task_timeout(Duration::from_secs(1800))  // 30 minutes
    .build();

Pipeline timeout

The pipeline_timeout (default: Some(3600 seconds / 1 hour)) is the maximum time for an entire workflow execution from start to finish. Set to None to disable.

use std::time::Duration;
use cloacina::runner::DefaultRunnerConfig;

// Strict pipeline timeout for SLA-bound workflows
let config = DefaultRunnerConfig::builder()
    .pipeline_timeout(Some(Duration::from_secs(600)))  // 10 minutes total
    .build();

// No pipeline timeout (tasks still have individual timeouts)
let config = DefaultRunnerConfig::builder()
    .pipeline_timeout(None)
    .build();

Interaction with retries

When a task times out:

The task is marked as Failed
If the retry policy allows (attempt < max_attempts), it transitions back to Ready with an exponential backoff delay (retry_at timestamp)
The scheduler picks it up on the next poll after retry_at passes
Each retry attempt counts toward the pipeline timeout

If pipeline_timeout expires while a task is waiting for retry, the entire pipeline is failed. Size your pipeline timeout to accommodate worst-case retry scenarios:

pipeline_timeout >= (task_timeout * max_attempts * num_sequential_tasks) + scheduling_overhead

5. Cron Scheduling Performance

Poll interval

The cron_poll_interval (default: 30 seconds) controls how often the cron scheduler evaluates all registered schedules and launches due executions.

use std::time::Duration;
use cloacina::runner::DefaultRunnerConfig;

// Sub-minute precision needed
let config = DefaultRunnerConfig::builder()
    .cron_poll_interval(Duration::from_secs(10))
    .build();

// Only hourly schedules, minimize DB load
let config = DefaultRunnerConfig::builder()
    .cron_poll_interval(Duration::from_secs(60))
    .build();

Catchup execution limits

The cron_max_catchup_executions (default: usize::MAX, effectively unlimited) limits how many missed executions are launched after downtime. Without a limit, a schedule that runs every minute would launch 1,440 catchup executions after 24 hours of downtime.

use std::time::Duration;
use cloacina::runner::DefaultRunnerConfig;

// Limit catchup to prevent recovery storms
let config = DefaultRunnerConfig::builder()
    .cron_max_catchup_executions(5)
    .build();

// Disable catchup entirely (only run future occurrences)
let config = DefaultRunnerConfig::builder()
    .cron_max_catchup_executions(0)
    .build();

Cron recovery settings

The cron recovery system detects and re-runs lost executions:

Parameter	Default	Purpose
`cron_enable_recovery`	`true`	Enable/disable recovery entirely
`cron_recovery_interval`	300s (5 min)	How often to scan for lost executions
`cron_lost_threshold_minutes`	10	Minutes before an execution is considered lost
`cron_max_recovery_age`	86400s (24h)	Ignore executions older than this
`cron_max_recovery_attempts`	3	Give up after this many recovery retries

For high-frequency schedules (every minute or less), ensure cron_lost_threshold_minutes is larger than the expected execution duration to avoid false positives.

Database impact of high-frequency schedules

Each cron poll:

Queries all registered schedules
Checks last execution time for each
Creates new workflow executions for due schedules
Writes catchup records if behind

With 100 registered schedules at 30-second poll interval, expect ~3-4 queries per second. Scale db_pool_size accordingly.

6. Registry and Reconciler

Reconciliation interval

The registry_reconcile_interval (default: 60 seconds) controls how often the reconciler scans for new or updated workflow packages.

use std::time::Duration;
use cloacina::runner::DefaultRunnerConfig;

// Faster hot-reload during development
let config = DefaultRunnerConfig::builder()
    .registry_reconcile_interval(Duration::from_secs(10))
    .build();

// Production: packages change infrequently
let config = DefaultRunnerConfig::builder()
    .registry_reconcile_interval(Duration::from_secs(300))  // 5 minutes
    .build();

Startup reconciliation

The registry_enable_startup_reconciliation (default: true) causes the runner to scan and load all packages on startup before accepting work. This ensures all workflows are available immediately but adds startup time proportional to the number of packages.

To reduce startup time in environments where packages are loaded lazily:

use cloacina::runner::DefaultRunnerConfig;

let config = DefaultRunnerConfig::builder()
    .registry_enable_startup_reconciliation(false)
    .build();

Storage backend performance

Backend	Read Perf	Write Perf	Best For
`"filesystem"` (default)	Fast (direct I/O)	Fast (file write)	Single-node, simple deployments
`"sqlite"`	Fast (indexed)	Moderate (WAL writes)	Single-node with transactional guarantees
`"postgres"`	Moderate (network)	Moderate (network)	Multi-node shared registry

For filesystem storage, set registry_storage_path to a fast local disk (SSD/NVMe), not a network mount:

use std::path::PathBuf;
use cloacina::runner::DefaultRunnerConfig;

let config = DefaultRunnerConfig::builder()
    .registry_storage_backend("filesystem")
    .registry_storage_path(Some(PathBuf::from("/var/lib/cloacina/registry")))
    .build();

7. Computation Graph Performance

Computation graphs run as long-lived reactive pipelines separate from the workflow scheduler. Their performance characteristics differ from request-response workflows. Workflows are batch-oriented with database-backed state; computation graphs are streaming with in-memory state. Workflow tuning focuses on database throughput; CG tuning focuses on channel capacity and reactor latency.

Boundary channel capacity

The boundary channel between accumulators and the reactor defaults to 256 slots (set in the Graph Scheduler’s load_graph method). The merge channel within each accumulator defaults to 1024 slots (AccumulatorRuntimeConfig::default()).

If accumulators produce data faster than the reactor processes it, the channel fills and backpressure slows the producers. Signs of channel saturation:

Accumulator health transitioning to Disconnected
Increasing memory usage under sustained load
Events arriving late at the reactor

To handle high-throughput sources, increase merge_channel_capacity:

use cloacina::computation_graph::accumulator::AccumulatorRuntimeConfig;

let config = AccumulatorRuntimeConfig {
    merge_channel_capacity: 4096,  // 4x default for high-volume sources
};

Batch accumulator tuning

Batch accumulators buffer events and flush on signal, timer, or size threshold:

Parameter	Default	Impact
`flush_interval`	None	Timer-based flush (e.g., every 100ms)
`max_buffer_size`	None	Size-triggered flush (e.g., every 1000 events)

For high-throughput streams, set both to bound memory and latency:

use std::time::Duration;
use cloacina::computation_graph::accumulator::BatchAccumulatorConfig;

let config = BatchAccumulatorConfig {
    flush_interval: Some(Duration::from_millis(100)),
    max_buffer_size: Some(1000),
};

Reactor execution strategy

The InputStrategy affects how the reactor processes boundaries:

Latest: One slot per source, overwritten on each update. Fires with the freshest data. Best for real-time dashboards where only the latest value matters.
Sequential: Boundaries preserved in order, one execution per boundary. Best for event sourcing where every boundary must be processed.

Latest is more performant under high load because it coalesces rapid updates. Sequential guarantees no boundary is skipped but can fall behind if the graph function is slower than the incoming rate.

Graph compilation overhead

Graph compilation (converting the DSL into a CompiledGraphFn) is a one-time cost at load time. The compiled function is a simple closure over the input cache – there is no per-execution compilation overhead. For packaged graphs, compilation happens during reconciliation, not at runtime.

Reactor health polling

The reactor polls accumulator health every 100ms during startup gating (waiting for all accumulators to become Live). After going Live, the degraded-mode monitor polls every 1 second to detect disconnected accumulators. These intervals are not configurable but have negligible overhead.

8. Monitoring Performance

Key metrics to watch

Cloacina emits the following metrics via the metrics crate:

Metric	Type	Labels	What it tells you
`cloacina_tasks_total`	Counter	`status=completed\|failed`	Task completion rates
`cloacina_pipelines_total`	Counter	`status=completed\|failed`	Workflow completion rates

Monitor these alongside system metrics:

Queue depth: Number of workflows in Pending + tasks in Ready state (query the database)
Execution latency: Time from workflow submission to completion
Pool utilization: Connection pool wait times (available via deadpool metrics)
Scheduler loop duration: Time spent in each poll cycle (visible at debug log level)

Log levels for performance diagnosis

# Production: minimal overhead
RUST_LOG=cloacina=info

# Diagnose scheduling delays
RUST_LOG=cloacina::execution_planner=debug

# Diagnose task execution issues
RUST_LOG=cloacina::executor=debug

# Full detail (significant overhead -- never in production)
RUST_LOG=cloacina=trace

The debug level on the scheduler loop logs each cycle’s outcome (“Scheduling loop completed successfully”), which is useful for measuring actual poll timing but adds ~1KB/s of log volume at 100ms intervals.

OpenTelemetry overhead

If using the metrics crate with an OpenTelemetry exporter:

Counter increments are effectively free (atomic add)
Histogram observations add minor allocation for bucket sorting
Export intervals control network overhead – use 10-30 second export intervals in production
The tracing spans on the runner (runner_task) add ~200ns per span creation

9. Production Recommendations

Low-latency profile (target: <100ms scheduling delay)

For interactive workflows where users wait for results:

use std::time::Duration;
use cloacina::runner::DefaultRunnerConfig;

let config = DefaultRunnerConfig::builder()
    .max_concurrent_tasks(16)
    .scheduler_poll_interval(Duration::from_millis(25))
    .task_timeout(Duration::from_secs(30))
    .pipeline_timeout(Some(Duration::from_secs(120)))
    .db_pool_size(30)
    .enable_cron_scheduling(false)       // disable if not needed
    .enable_registry_reconciler(false)   // load workflows at startup only
    .enable_claiming(true)
    .heartbeat_interval(Duration::from_secs(5))
    .build();

Requirements:

PostgreSQL (SQLite serializes writes, adding latency under load)
SSD-backed storage for the database
Network latency to PostgreSQL < 1ms (co-located or same availability zone)

High-throughput profile (target: >1000 workflows/min)

For batch-style processing with many concurrent workflows:

use std::time::Duration;
use cloacina::runner::DefaultRunnerConfig;

let config = DefaultRunnerConfig::builder()
    .max_concurrent_tasks(32)
    .scheduler_poll_interval(Duration::from_millis(50))
    .task_timeout(Duration::from_secs(300))
    .pipeline_timeout(Some(Duration::from_secs(3600)))
    .db_pool_size(50)
    .cron_poll_interval(Duration::from_secs(30))
    .cron_max_catchup_executions(10)
    .registry_reconcile_interval(Duration::from_secs(120))
    .enable_claiming(true)
    .heartbeat_interval(Duration::from_secs(10))
    .build();
// Note: stale_claim_sweep_interval (default 30s) and stale_claim_threshold
// (default 60s) are struct defaults, not builder methods.

Requirements:

PostgreSQL with connection pooling (PgBouncer in transaction mode)
Database connection limit >= (num_runners x db_pool_size) + application connections
Horizontal scaling: run multiple runner instances with enable_claiming(true) for work distribution

Batch processing profile (minimize resource usage)

For nightly/weekly batch jobs where latency does not matter:

use std::time::Duration;
use cloacina::runner::DefaultRunnerConfig;

let config = DefaultRunnerConfig::builder()
    .max_concurrent_tasks(4)
    .scheduler_poll_interval(Duration::from_millis(500))
    .task_timeout(Duration::from_secs(1800))       // 30 min per task
    .pipeline_timeout(Some(Duration::from_secs(14400)))  // 4 hours total
    .db_pool_size(10)
    .cron_poll_interval(Duration::from_secs(60))
    .cron_max_catchup_executions(1)     // only catch up one missed run
    .registry_reconcile_interval(Duration::from_secs(300))
    .enable_claiming(false)             // single node, no claiming overhead
    .build();

Requirements:

SQLite is acceptable for single-node batch deployments
Minimal memory footprint (~50MB baseline)
Can run on smaller instances (2-4 cores)

Database hardware recommendations

Deployment	Database	CPU	Memory	Storage
Development	SQLite	Any	Any	Any
Single-node production	PostgreSQL	2+ cores	4GB+	SSD
High-throughput production	PostgreSQL	4+ cores	16GB+	NVMe SSD
Multi-tenant production	PostgreSQL	8+ cores	32GB+	NVMe SSD, IOPS provisioned

PostgreSQL tuning for Cloacina workloads

Key PostgreSQL settings to consider:

# Connection handling
max_connections = 200              # >= sum of all runner pool sizes + overhead
idle_in_transaction_session_timeout = '30s'  # prevent leaked connections

# Write performance
synchronous_commit = on            # keep on for durability; off only for benchmarks
wal_level = replica                # allows streaming replication
checkpoint_completion_target = 0.9

# Query performance
effective_cache_size = 12GB        # ~75% of available RAM
shared_buffers = 4GB               # ~25% of available RAM
work_mem = 64MB                    # per-operation sort/hash memory

Connection pooling with PgBouncer

PgBouncer is a lightweight PostgreSQL connection pooler that sits between your application and PostgreSQL, multiplexing many client connections over fewer server connections.

For deployments with multiple runner instances, place PgBouncer between runners and PostgreSQL:

[pgbouncer]
pool_mode = transaction    # required -- Cloacina uses SET search_path per-transaction
max_client_conn = 400      # total clients across all runners
default_pool_size = 40     # connections to PostgreSQL per database
reserve_pool_size = 5      # extra connections for burst load
reserve_pool_timeout = 3   # seconds before using reserve pool
server_idle_timeout = 300  # close idle server connections after 5 min

Use transaction mode (not session mode) because Cloacina issues SET search_path within transactions for schema-based multi-tenancy. Session mode would leak schema settings between clients.

Horizontal scaling with task claiming

When running multiple runner instances for horizontal scaling:

use std::time::Duration;
use cloacina::runner::DefaultRunnerConfig;

let config = DefaultRunnerConfig::builder()
    .max_concurrent_tasks(16)
    .enable_claiming(true)
    .heartbeat_interval(Duration::from_secs(10))
    // stale_claim_sweep_interval (30s) and stale_claim_threshold (60s)
    // use struct defaults — not available as builder methods
    .runner_id(Some("runner-east-1".to_string()))
    .runner_name(Some("East Region Runner 1".to_string()))
    .build();

The claiming system ensures:

Each task is executed by exactly one runner
Heartbeats prove liveness (default: every 10 seconds)
Stale claims are reclaimed after the threshold (default: 60 seconds, must be > heartbeat interval)
The sweep interval (default: 30 seconds) controls how quickly a crashed runner’s tasks are redistributed

Tune stale_claim_threshold based on your longest expected network partition or GC pause. Too short and you get duplicate executions; too long and failover is slow.

Summary of Defaults

Parameter	Default	Unit
`max_concurrent_tasks`	4	tasks
`scheduler_poll_interval`	100	ms
`task_timeout`	300	seconds
`pipeline_timeout`	3600	seconds
`db_pool_size`	10	connections
`cron_poll_interval`	30	seconds
`cron_max_catchup_executions`	unlimited	executions
`cron_recovery_interval`	300	seconds
`cron_lost_threshold_minutes`	10	minutes
`cron_max_recovery_age`	86400	seconds
`cron_max_recovery_attempts`	3	attempts
`trigger_base_poll_interval`	1	seconds
`trigger_poll_timeout`	30	seconds
`registry_reconcile_interval`	60	seconds
`heartbeat_interval`	10	seconds
`stale_claim_sweep_interval`	30	seconds
`stale_claim_threshold`	60	seconds

Performance Tuning

1. Concurrency Tuning

How to size based on workload

Slot management and deferred tasks

Practical example

Observed scaling behavior

2. Scheduler Optimization

The latency vs CPU tradeoff

What happens each poll cycle

Configuration

Impact on database query frequency

3. Database Performance

Pool sizing

PostgreSQL vs SQLite performance

SQLite WAL configuration

Connection pool behavior

4. Timeout Configuration

Task timeout

Pipeline timeout

Interaction with retries

5. Cron Scheduling Performance

Poll interval

Catchup execution limits

Cron recovery settings

Database impact of high-frequency schedules

6. Registry and Reconciler

Reconciliation interval

Startup reconciliation

Storage backend performance

7. Computation Graph Performance

Boundary channel capacity

Batch accumulator tuning

Reactor execution strategy

Graph compilation overhead

Reactor health polling

8. Monitoring Performance

Key metrics to watch

Log levels for performance diagnosis

OpenTelemetry overhead

9. Production Recommendations

Low-latency profile (target: <100ms scheduling delay)

High-throughput profile (target: >1000 workflows/min)

Batch processing profile (minimize resource usage)

Database hardware recommendations

PostgreSQL tuning for Cloacina workloads

Connection pooling with PgBouncer

Horizontal scaling with task claiming

Summary of Defaults

Related Resources