Run cloacina-compiler in Production
This guide covers operating the long-running cloacina-compiler
service: the binary that claims pending build rows from the database,
compiles user-supplied package source into .so / .dylib
artifacts, and writes them back for runner instances to load.
This is the service path. For laptop / CI use of
cloacinactl package build + pack, see
Use cloacina-compiler Locally.
State this plainly so there are no illusions: a malicious
build.rs in a submitted package is code execution on the compiler
host. The compiler runs cargo build on attacker-supplied source.
Cargo executes build.rs as part of that build. Anything build.rs
can do — read files the compiler can read, contact endpoints
reachable from the compiler, exhaust resources the kernel doesn’t
bound — happens on your infrastructure.
Phase 1 mitigations (this guide) bound what and how much. They do not prevent code execution. Phase 2 (CLOACI-I-0105) adds a kernel-enforced process sandbox (bubblewrap + landlock) that confines the cargo subprocess to a tmpfs root with no host filesystem access and no outbound network. Until Phase 2 lands, the operator responsibilities below are how you keep blast radius bounded.
The compiler trusts the operator’s configuration over the submitter’s source. Five things you must get right:
Create a cloacina-compiler system user with no shell, no sudo, and
no membership in groups that grant filesystem or service access. The
compiler needs:
- Read/write to its --home directory (logs, build tmp).
- Read/write to its CARGO_TARGET_DIR (shared target cache, if set).
- Read access to the configured --vendor-dir (curated CARGO_HOME).
- A database role with SELECT/UPDATE on workflow_packages and INSERT on the audit-event sink, nothing more. In particular, the compiler must not share the admin DB role that the server uses: a malicious build.rs reading DATABASE_URL from the process environment would otherwise gain admin DB access to every tenant.
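The account requirements above can be spot-checked with a small preflight. This is a minimal sketch, not part of cloacina-compiler: account_findings is a hypothetical helper, and the list of groups it treats as privileged is an assumption to adjust for your hosts.

```shell
# Sketch only: prints one line per finding for a service account whose
# login shell or group memberships look too permissive.
# Assumption: sudo/wheel/adm/disk/docker are the groups worth flagging.
account_findings() {
  login_shell=$1
  group_list=$2
  case "$login_shell" in
    */nologin|*/false) ;;
    *) echo "login shell is $login_shell, expected nologin or false" ;;
  esac
  for g in $group_list; do
    case "$g" in
      sudo|wheel|adm|disk|docker) echo "member of privileged group: $g" ;;
    esac
  done
}

# Usage against a live host (assumes the account is named cloacina-compiler):
#   account_findings "$(getent passwd cloacina-compiler | cut -d: -f7)" \
#                    "$(id -nG cloacina-compiler)"
```

No output means neither check fired. It does not cover sudoers entries or the database role, which still need a manual review.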
--frozen --offline is the default. The cargo subprocess fails fast
on any dep that isn’t in the vendor dir. Pair that with a network
namespace (or, until Phase 2, a host firewall) that drops outbound
connections from the cloacina-compiler UID — defense in depth
against any future cargo flag change.
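One way to express the host-firewall option is the iptables owner match. The sketch below only prints the rules for review and does not apply them. The helper name egress_rules, the database address 10.0.0.5 and port 5432 are all assumptions for illustration, and the compiler must still be allowed to reach its database.

```shell
# Sketch only: prints iptables rules that allow loopback and the database,
# then reject every other outbound packet from the given user.
# Assumptions: iptables (not nftables) is in use; the DB address and port
# passed in are placeholders for your own.
egress_rules() {
  user=$1
  db_ip=$2
  db_port=$3
  echo "iptables -A OUTPUT -m owner --uid-owner $user -o lo -j ACCEPT"
  echo "iptables -A OUTPUT -m owner --uid-owner $user -p tcp -d $db_ip --dport $db_port -j ACCEPT"
  echo "iptables -A OUTPUT -m owner --uid-owner $user -j REJECT"
}

egress_rules cloacina-compiler 10.0.0.5 5432
```

Because the cargo subprocess runs under the same UID, a build.rs can still reach the database through the allowed rule, which is why the least-privilege database role matters.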
--build-timeout-s defaults to 600s. Tune per workload:
- Small workspaces (1-2 crate trees, ~50 dep crates): 300s is plenty.
- Large workspaces (workspace member graphs, big proc-macro chains): 900s+ may be needed for cold-cache builds.
Builds that exceed the timeout are SIGKILL’d; the row’s heartbeat
stops and the existing stale-build sweeper resets it to pending.
The build can be retried, but the operator should investigate why a
package needs >600s before raising the cap blindly.
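Before raising the cap, it helps to see which packages are hitting it. A minimal sketch, assuming the audit events land as one JSON object per line in a log file and use the outcome value timeout_killed; the helper name timeout_killed_builds is hypothetical.

```shell
# Sketch only: counts timeout-killed builds per package name.
# Assumption: one JSON event per line, with the field names used in the
# audit log section of this guide.
timeout_killed_builds() {
  grep '"event_type":"compiler.build.finished"' "$1" \
    | grep '"outcome":"timeout_killed"' \
    | sed -n 's/.*"package_name":"\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn
}

# Usage:
#   timeout_killed_builds /var/log/cloacina/compiler.log
```

A package that appears repeatedly is a better candidate for investigation than for a higher timeout.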
Four kernel-enforced ceilings, Linux-only. Defaults are conservative starting points; tune per workload.
| Flag | Default | What it bounds |
|---|---|---|
| --build-rlimit-cpu | --build-timeout-s (600s) | CPU-seconds. Tracks the wall-clock as a generous upper bound. |
| --build-rlimit-mem | 4G | Virtual address space, bytes. Accepts K/M/G suffixes. |
| --build-rlimit-files | 1024 | Open file descriptors. |
| --build-rlimit-procs | 256 | User processes (bounds fork bombs). |
Tuning notes:
- Memory: release builds of crates with heavy generics can peak at 4 GiB or more. If a known-good package fails for no obvious reason with exit_signal=SIGKILL/SIGSEGV in the audit log, raise --build-rlimit-mem first.
- Procs: parallel cargo on many cores spawns hundreds of rustc invocations. If you see builds hang on a -j32 host, raise this.
- No “disabled” sentinel. To remove a ceiling, pass a large value (the kernel cap on most systems is RLIM_INFINITY = u64::MAX). There is no --build-rlimit-mem=none syntax.
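To reproduce a suspected memory-ceiling failure by hand, a comparable limit can be applied with the shell's ulimit builtin, which takes virtual memory in KiB rather than bytes. A minimal sketch of the conversion, assuming the K/M/G suffixes are binary multiples (this guide does not say); size_to_kib is a hypothetical helper, not part of cloacina-compiler.

```shell
# Sketch only: converts a flag-style size (plain bytes, or K/M/G suffix)
# into KiB, the unit that `ulimit -v` expects.
# Assumption: suffixes are powers of 1024.
size_to_kib() {
  value=$1
  case "$value" in
    *K) echo $(( ${value%K} )) ;;
    *M) echo $(( ${value%M} * 1024 )) ;;
    *G) echo $(( ${value%G} * 1024 * 1024 )) ;;
    *)  echo $(( $value / 1024 )) ;;
  esac
}

# Usage: run one build in a subshell under a 4G address-space ceiling.
#   ( ulimit -v "$(size_to_kib 4G)" && cargo build --release --lib --frozen --offline )
```

The subshell keeps the limit from leaking into the operator's own session.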
The compiler reads cargo deps from --vendor-dir (env
CLOACINA_COMPILER_VENDOR_DIR), default ~/.cargo. The operator
populates it via cargo vendor; see the next section.
The operator’s vendor dir is the allowlist of crates submitted
packages can resolve under --offline. Curate it explicitly.
- Start with a known-good source tree that lists every crate you want to permit (typically the cloacina workspace itself + any in-house workflow crates your authors share):

      git clone https://github.com/colliery-io/cloacina /tmp/cloacina
      cd /tmp/cloacina

- Run cargo vendor with output pointing at the compiler’s vendor dir:

      cargo vendor --locked /var/lib/cloacina-compiler/cargo/registry

  This populates <vendor-dir>/registry/{cache,src} with every transitive dep referenced in Cargo.lock. The output emits a .cargo/config.toml snippet that tells cargo to use the vendored sources; copy that into <vendor-dir>/config.toml.

- Point the compiler at it:

      cloacina-compiler \
        --vendor-dir /var/lib/cloacina-compiler/cargo \
        --database-url postgres://cloacina_compiler:...@db/cloacina

  Or via env:

      export CLOACINA_COMPILER_VENDOR_DIR=/var/lib/cloacina-compiler/cargo
      export CLOACINA_COMPILER_BUILD_TIMEOUT_S=600
      cloacina-compiler --database-url $DATABASE_URL

- Adding an in-house crate. Vendor it into a sibling source tree, then re-run cargo vendor against the union, and replace the compiler’s vendor dir. Restart the compiler.
Most compiler flags accept env equivalents (CLOACINA_COMPILER_*); a dash in the Env column marks a flag with no documented env variable.
| Flag | Env | Default | Source |
|---|---|---|---|
| --build-timeout-s | CLOACINA_COMPILER_BUILD_TIMEOUT_S | 600 | T-0573 |
| --vendor-dir | CLOACINA_COMPILER_VENDOR_DIR | unset (cargo ~/.cargo) | T-0574 |
| --cargo-flag (repeatable) | - | build --release --lib --frozen --offline | T-0574 |
| --build-rlimit-cpu | CLOACINA_COMPILER_BUILD_RLIMIT_CPU | = --build-timeout-s | T-0575 |
| --build-rlimit-mem | CLOACINA_COMPILER_BUILD_RLIMIT_MEM | 4G | T-0575 |
| --build-rlimit-files | CLOACINA_COMPILER_BUILD_RLIMIT_FILES | 1024 | T-0575 |
| --build-rlimit-procs | CLOACINA_COMPILER_BUILD_RLIMIT_PROCS | 256 | T-0575 |
| --cargo-target-dir | - | unset (per-build target/) | - |
| --home | - | $HOME/.cloacina | - |
| --database-url | DATABASE_URL | required | - |
Setting --cargo-flag replaces the entire default list. If you override to add a flag, include --frozen and --offline explicitly or you’ll lose the offline posture.
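Since an override silently drops the defaults, a wrapper script can refuse to start the compiler when the offline flags are missing. A minimal sketch of such a guard; check_cargo_flags is a hypothetical helper that lives in the operator's wrapper, not a cloacina-compiler feature.

```shell
# Sketch only: succeeds when the intended cargo flag list still contains
# both --frozen and --offline, and fails with a message otherwise.
check_cargo_flags() {
  has_frozen=0
  has_offline=0
  for flag in "$@"; do
    case "$flag" in
      --frozen)  has_frozen=1 ;;
      --offline) has_offline=1 ;;
    esac
  done
  if [ "$has_frozen" -eq 1 ] && [ "$has_offline" -eq 1 ]; then
    return 0
  fi
  echo "cargo flag override drops --frozen and/or --offline" >&2
  return 1
}

# Usage: validate the list before turning it into --cargo-flag arguments.
#   check_cargo_flags build --release --lib --frozen --offline --verbose || exit 1
```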
The compiler emits two structured events per build via tracing:
- compiler.build.started: emitted after the source archive unpacks and content hashes are computed, just before the cargo subprocess fires.
- compiler.build.finished: emitted exactly once per build on every outcome path (success, failure, timeout-kill).
Pipe tracing to your sink of choice (file, journald, Loki, SIEM).
compiler.build.started fields:
- build_claim_id: UUID of the build row.
- package_name, package_version: what was submitted.
- cargo_toml_hash: SHA-256 hex of the unpacked Cargo.toml, or <absent> for non-Rust packages.
- cargo_lock_hash: SHA-256 hex of the unpacked Cargo.lock, or <none>.
- compiler_instance_id: UUID of the compiler process. Generated at startup and stamped on the “cloacina-compiler starting” log line, so you can correlate every build to a specific compiler instance.
compiler.build.finished carries all of the started fields, plus:
- outcome: one of success, failed, timeout_killed, internal_error.
- exit_status: cargo’s exit code if it exited normally, else <none>.
- exit_signal: signal name (SIGKILL, SIGSEGV, SIGABRT, …) if cargo was signal-terminated, else <none>.
- wall_clock_ms: total time from run_build entry to emit.
- failure_reason: operator-actionable message on failure (e.g. dependencies not available offline: foo, bar), else <none>.
Reconstruct a build by claim id:
grep '"build_claim_id":"<uuid>"' /var/log/cloacina/compiler.log
Find all rlimit-like kills (point the command at the last day's log to limit the window):
grep '"event_type":"compiler.build.finished"' /var/log/cloacina/compiler.log \
| grep '"outcome":"failed"' \
| grep -E '"exit_signal":"(SIGKILL|SIGSEGV|SIGABRT)"'
(rlimit_killed collapses into outcome=failed with a signal-based
exit_signal. A dedicated outcome bucket is intentionally not added
in Phase 1, because a heuristic based on the signal alone is fragile across kernels.)
Find every build of a specific package version:
grep '"event_type":"compiler.build.finished"' /var/log/cloacina/compiler.log \
| grep '"package_name":"my-workflow"' \
| grep '"package_version":"1.2.3"'
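The same log can also be summarised rather than searched. A minimal sketch that tallies finished builds by outcome, assuming one JSON event per line as in the queries above; outcome_counts is a hypothetical helper.

```shell
# Sketch only: prints "outcome=count" for every outcome seen in
# compiler.build.finished events, sorted by outcome name.
# Assumption: one JSON event per line with an "outcome" string field.
outcome_counts() {
  grep '"event_type":"compiler.build.finished"' "$1" \
    | sed -n 's/.*"outcome":"\([^"]*\)".*/\1/p' \
    | sort | uniq -c | awk '{print $2 "=" $1}'
}

# Usage:
#   outcome_counts /var/log/cloacina/compiler.log
```

A rising share of failed or timeout_killed builds is a cue to run the more specific queries above.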
Phase 1 bounds resources and confines the registry. Phase 2 (CLOACI-I-0105) closes the gap with a process sandbox:
- bubblewrap for namespace isolation: the cargo subprocess sees only a tmpfs build root, the curated vendor dir mounted RO, and the bare minimum of /usr for the toolchain. The host filesystem is invisible.
- landlock as defense-in-depth where the kernel supports it (Linux 5.13+).
- Network: closed by default, with no outbound connections at all. The vendor dir bind-mount supersedes any registry the build script might attempt to reach.
Until that lands, the Phase 1 posture in this guide is your bound.
- Production Deployment: TLS termination for the cloacinactl serve server. Separate concern from the compiler.
- Use cloacina-compiler Locally: local laptop / CI path, no service.
- ADR-0005 (Deployment-mode trust model): why the compiler is Linux-only, single-tenant build.
- CLOACI-I-0104: Phase 1 hardening initiative.
- CLOACI-I-0105: Phase 2 sandbox initiative.