Shared side-channel primitives — silentops
The silentops crate is the single source of truth for the low-level
side-channel primitives used by quantica (and by arcana on the
classical side). Keeping these primitives in a separate crate means:
- a single audit surface for CT correctness, independent of any particular algorithm;
- architecture-specific assembly backends selected at compile time via Cargo features, so a downstream crate never embeds per-arch asm in its own source;
- the same primitives are used by the statistical (dudect) and the client-request (ctgrind) side-channel verifiers, keeping test coverage coherent.
This chapter is a reference for those primitives. The threats they mitigate and the algorithmic uses live in Threat model and ML-KEM — countermeasures, ML-DSA — countermeasures, SLH-DSA — countermeasures.
Module layout
| Module | Role |
|---|---|
| | Branchless constant-time primitives with architecture-specific assembly backends. |
| | Valgrind memcheck client-request helpers ( |
| | Dudect-style timing-leak detector ( |
Constant-time primitives — silentops::ct
Surface
| Function | Signature (logical) | Purpose |
|---|---|---|
| | | Return |
| | | Same, for NTT-domain coefficients in |
| | | Same, for ML-DSA coefficients in |
| | | Constant-time byte-slice equality. No early exit; returns |
| | | Conditional in-place copy. Always reads both buffers; writes to |
| | | Volatile zeroization resistant to dead-store elimination ( |
| | | Same for polynomial coefficient arrays. |
Calling convention
- condition: u8 must be exactly 0 or 1. The primitives compute the mask via 0u8.wrapping_sub(condition); passing 0xFF or any other non-0/1 value breaks the CT invariant and the functional result.
- ct_eq always processes the full buffer length; it is O(n) in n = a.len() with a fixed per-byte cost. Buffer length itself is considered public.
- The loop-based primitives (ct_eq, ct_copy, ct_zeroize) are marked #[inline(never)] so that LLVM does not inline the loop into caller contexts where it might re-optimise it into variable-time code.
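The convention above can be sketched in portable Rust. This is an illustration only, not the silentops source: the names `mask_from_condition`, `sketch_ct_eq`, `sketch_ct_copy` and `sketch_ct_zeroize`, their signatures, and the 1/0 return value of the equality check are all assumptions made for this example. A pure-Rust loop like this also carries no constant-time guarantee once the optimiser has processed it.

```rust
use core::sync::atomic::{compiler_fence, Ordering};

/// Expand a 0/1 condition into an all-zeros or all-ones byte mask.
/// (Hypothetical helper; mirrors the `0u8.wrapping_sub(condition)` rule.)
pub fn mask_from_condition(condition: u8) -> u8 {
    0u8.wrapping_sub(condition)
}

/// Full-length equality sketch: accumulates differences, never exits early.
/// Returns 1 when equal and 0 otherwise (an assumed encoding).
#[inline(never)]
pub fn sketch_ct_eq(a: &[u8], b: &[u8]) -> u8 {
    if a.len() != b.len() {
        return 0; // buffer length is treated as public, so this branch is allowed
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    // diff == 0 maps to 1, any non-zero diff maps to 0, without branching.
    ((diff as u16).wrapping_sub(1) >> 8) as u8 & 1
}

/// Conditional copy sketch: reads both buffers and rewrites `dst` on every call.
#[inline(never)]
pub fn sketch_ct_copy(condition: u8, dst: &mut [u8], src: &[u8]) {
    let mask = mask_from_condition(condition);
    for (d, s) in dst.iter_mut().zip(src.iter()) {
        let current = *d;
        *d = current ^ (mask & (current ^ *s));
    }
}

/// Zeroization sketch: volatile stores so the writes are not removed as dead stores.
#[inline(never)]
pub fn sketch_ct_zeroize(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, exclusive reference to one byte.
        unsafe { core::ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}
```

With this mask rule, a condition of 0xFF produces the mask 0x01, so only the lowest bit of each byte would be selected. That is the functional breakage the first bullet warns about.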
Architecture dispatch
The silentops/src/ct/mod.rs file selects exactly one backend at
compile time based on target_arch and the cargo features listed
below.
| Target | Feature | Implementation technique |
|---|---|---|
| | | Inline |
| | | |
| | | |
| | | No |
| | | No conditional move; uses AND/OR/XOR with a mask derived from |
| any (default) | none | Pure Rust bitwise fallback. Not recommended for production CT builds; see the warning below. |
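As a rough illustration of compile-time backend selection, the sketch below gates two implementations of one function behind `cfg`. Only the feature name `asm-x86_64` comes from this chapter. The module name `backend`, the function `select_u64`, and the specific instruction sequence are assumptions for the example, and the real dispatch in silentops may be organised differently.

```rust
// Hypothetical x86_64 backend: a register-to-register CMOV on values,
// so no address ever depends on the condition.
#[cfg(all(target_arch = "x86_64", feature = "asm-x86_64"))]
mod backend {
    /// Returns `a` when `condition == 1`, otherwise `b`.
    pub fn select_u64(condition: u8, a: u64, b: u64) -> u64 {
        let mut out = b;
        // SAFETY: operates on registers only; no memory or stack access.
        unsafe {
            core::arch::asm!(
                "test {c:e}, {c:e}",
                "cmovne {o}, {a}",
                c = in(reg) condition as u32,
                o = inout(reg) out,
                a = in(reg) a,
                options(pure, nomem, nostack),
            );
        }
        out
    }
}

// Fallback used on every other target or when the feature is off:
// the mask-based idiom, which the optimiser is free to rewrite.
#[cfg(not(all(target_arch = "x86_64", feature = "asm-x86_64")))]
mod backend {
    /// Returns `a` when `condition == 1`, otherwise `b`.
    pub fn select_u64(condition: u8, a: u64, b: u64) -> u64 {
        let mask = 0u64.wrapping_sub(condition as u64);
        b ^ (mask & (a ^ b))
    }
}
```

Because exactly one `backend` module survives configuration, callers write `backend::select_u64(...)` with no `cfg` attributes of their own.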
Why the pure-Rust fallback is dangerous at opt-level >= 2
The generic fallback writes each primitive as b ^ (mask & (a ^ b)).
The LLVM back-end recognises this pattern. At opt-level = 2 or
above it will frequently rewrite the ct_select wrapper (e.g. the
32-byte select in ml_kem::kem::ct_select) into:
    test    ecx, ecx
    cmovne  rdx, rsi        ; pointer CMOV
    cmovne  r8, rax
    movups  xmm0, [rdx]     ; load from the selected address
    movups  xmm1, [r8]
This is a secret-dependent pointer CMOV followed by a load. The cache
line fetched then depends on the secret cond, which is a
classical cache-timing leak recoverable by a local attacker.
This behaviour was confirmed in ctgrind runs against an early
build of quantica and is the entire reason the asm-x86_64
backend exists. See Verification methodology for the ctgrind trace.
Recommended build profile
On x86_64 hosts, build with at minimum:
    cargo build --release \
        -p quantica \
        --features asm-x86_64
The quantica_bench/ct-grind cargo feature forwards
silentops/asm-x86_64 automatically, so builds intended for
side-channel verification always get the asm backend.
core::hint::black_box shielding — design choice
The workspace SECURITY.md (Section 4.1) lists
core::hint::black_box shielding as a workspace-wide rule
“wherever a CT mask is derived from a secret”, because without it
LLVM (rustc 1.84+) is known to recover branches over the
b ^ (mask & (a ^ b)) idiom — exactly the pattern documented
above as the failure mode of the pure-Rust fallback.
In the quantica crate this rule is satisfied structurally by
delegating every CT decision to silentops::ct_*, whose asm
backends (asm-x86_64, asm-aarch64, asm-thumbv7,
asm-thumbv6m, asm-riscv32) bypass the LLVM optimiser
entirely. Consequently quantica/src/ does not call
core::hint::black_box directly anywhere — the asm backends are
the stronger fix mentioned in the same SECURITY.md row.
Caveat — non-asm targets
On architectures without an asm backend (notably WebAssembly
through the quantica_wasm crate), the CT path falls back to
silentops::ct::generic and the LLVM-recovers-branch hazard
does apply. A planned hardening pass (no roadmap ID assigned
yet — flagged here as a workspace residual) will add explicit
core::hint::black_box calls inside
silentops::ct::generic so every consumer (quantica + arcana)
inherits the shielding regardless of target. Until that lands,
WebAssembly builds of quantica should be considered best-effort
on the CT axis.
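To make the planned hardening concrete, here is one possible shape of a `black_box`-shielded generic select. It is a sketch under assumptions, not the planned implementation: the function name `shielded_select32` and its signature are invented for illustration. The Rust documentation describes `core::hint::black_box` as a best-effort barrier, so this lowers the chance of the mask idiom being turned back into a branch or pointer select but does not guarantee it.

```rust
use core::hint::black_box;

/// Hypothetical shielded select over 32-byte values.
/// Returns a copy of `a` when `condition == 1`, otherwise a copy of `b`.
#[inline(never)]
pub fn shielded_select32(condition: u8, a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    // Hide both the condition and the derived mask from the optimiser so it
    // cannot easily prove that the mask is only ever 0x00 or 0xFF.
    let mask = black_box(0u8.wrapping_sub(black_box(condition)));
    let mut out = [0u8; 32];
    for i in 0..32 {
        // Both inputs are read at every index, whatever the condition is.
        out[i] = b[i] ^ (mask & (a[i] ^ b[i]));
    }
    out
}
```

Any build relying on such a function would still need to be checked with the verifiers described in this chapter, since the emitted code can change between compiler versions.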
Source pointers
Item |
File |
|---|---|
Public API & re-exports |
|
Module dispatch |
|
Generic (bit-twiddling) fallback |
|
x86_64 asm backend |
|
aarch64 asm backend |
|
thumbv7 asm backend |
|
thumbv6m asm backend |
|
riscv32 asm backend |
|
CT unit tests (run on every arch) |
|
ctgrind instrumentation — silentops::ct_grind
ct_grind provides the three-function API needed to drive
Valgrind/memcheck-based CT verification:
    silentops::ct_grind::poison(buf);    // mark as secret
    silentops::ct_grind::unpoison(buf);  // mark as public again
    silentops::ct_grind::is_active();    // true only when the feature
                                         // is enabled AND the target is
                                         // x86_64-linux or aarch64-linux
The implementation emits the Valgrind client-request magic sequence
via stable core::arch::asm!, with no C shim or third-party crate.
Surrounding compiler_fence(SeqCst) calls prevent LLVM from
reordering subsequent memory reads past a poison / unpoison
call — a subtle but critical detail first identified during the
initial quantica_bench ctgrind bring-up.
When the ct-grind feature is disabled, or on unsupported
targets, all three functions compile to zero-cost no-ops, so call
sites can stay unconditional (no #[cfg] walls in consumer
code).
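The unconditional call-site pattern might look like the sketch below. The module `ct_grind_stub` stands in for `silentops::ct_grind` with the feature disabled. Its `&[u8]` parameter types are an assumption, since this chapter only shows the calls as `poison(buf)` and `unpoison(buf)`. `compare_tag` and `checked_compare` are hypothetical names for a toy routine under test.

```rust
/// Stand-in for `silentops::ct_grind` with the feature off: all no-ops.
/// Signatures are assumed for this sketch.
mod ct_grind_stub {
    pub fn poison(_buf: &[u8]) {}
    pub fn unpoison(_buf: &[u8]) {}
    pub fn is_active() -> bool {
        false
    }
}

/// Toy secret-dependent routine: full-scan tag comparison, returns 1 or 0.
fn compare_tag(secret_tag: &[u8; 16], candidate: &[u8; 16]) -> u8 {
    let mut diff = 0u8;
    for i in 0..16 {
        diff |= secret_tag[i] ^ candidate[i];
    }
    ((diff as u16).wrapping_sub(1) >> 8) as u8 & 1
}

/// Call site with no `cfg` attributes: the same code builds with or
/// without instrumentation.
pub fn checked_compare(secret_tag: &[u8; 16], candidate: &[u8; 16]) -> bool {
    // Mark the secret so memcheck can flag any branch or index derived from it.
    ct_grind_stub::poison(secret_tag);
    let verdict = [compare_tag(secret_tag, candidate)];
    // The verdict is public by design, so declassify it before branching on it.
    ct_grind_stub::unpoison(&verdict);
    ct_grind_stub::unpoison(secret_tag);
    verdict[0] == 1
}
```

Declassifying the verdict before the final comparison matters under instrumentation: a value computed from poisoned bytes stays tainted, and branching on it would be reported even though the result is meant to be public.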
The full methodology, the demo binary that validates the plumbing, and the interpretation rules for memcheck output are covered in Verification methodology.
Statistical timing verification — silentops::verify
The verify module packages the Reparaz–Balasch–Verbauwhede
methodology [RBV17] as a library — a tiny
Xorshift64 for class selection, an incremental Welch t-test
(TTest), a measure_ns sampler, and a report helper that
prints PASS / FAIL against T_THRESHOLD = 4.5
(p < 10⁻⁵).
Consumers write their own measurement loops on top of this API. The
canonical example is silentops/examples/ct_verify_pqc.rs, which
exercises ML-KEM-768 Decaps, the ML-KEM Barrett reduction, and
ML-DSA-44 Sign / Verify.
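A measurement loop of this kind could be sketched as follows. The names `Xorshift64`, `TTest`, `measure_ns` and `T_THRESHOLD` are taken from this chapter, but the fields, method names and signatures below are assumptions rather than the crate's actual API, and `run_two_classes` is a hypothetical driver added for illustration. The t-test uses Welford's incremental update for the mean and variance of each class.

```rust
use std::time::Instant;

pub const T_THRESHOLD: f64 = 4.5;

/// Minimal xorshift64 generator, used only to pick the input class.
pub struct Xorshift64(u64);

impl Xorshift64 {
    pub fn new(seed: u64) -> Self {
        // The all-zero state is a fixed point, so replace it.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Incremental Welch t-test over two classes (index 0 and 1).
#[derive(Default)]
pub struct TTest {
    n: [f64; 2],
    mean: [f64; 2],
    m2: [f64; 2], // running sum of squared deviations
}

impl TTest {
    pub fn push(&mut self, class: usize, x: f64) {
        self.n[class] += 1.0;
        let delta = x - self.mean[class];
        self.mean[class] += delta / self.n[class];
        self.m2[class] += delta * (x - self.mean[class]);
    }
    pub fn t_value(&self) -> f64 {
        if self.n[0] < 2.0 || self.n[1] < 2.0 {
            return 0.0; // not enough samples to estimate a variance
        }
        let var0 = self.m2[0] / (self.n[0] - 1.0);
        let var1 = self.m2[1] / (self.n[1] - 1.0);
        let diff = self.mean[0] - self.mean[1];
        let denom = (var0 / self.n[0] + var1 / self.n[1]).sqrt();
        if denom == 0.0 {
            return if diff == 0.0 { 0.0 } else { diff.signum() * f64::INFINITY };
        }
        diff / denom
    }
}

/// Wall-clock duration of one call, in nanoseconds.
pub fn measure_ns<F: FnMut()>(mut f: F) -> f64 {
    let start = Instant::now();
    f();
    start.elapsed().as_nanos() as f64
}

/// Hypothetical driver: interleaves the two classes at random and
/// returns |t|, to be compared against `T_THRESHOLD`.
pub fn run_two_classes<F: FnMut(u8)>(samples: usize, mut target: F) -> f64 {
    let mut rng = Xorshift64::new(1);
    let mut test = TTest::default();
    for _ in 0..samples {
        let class = (rng.next_u64() & 1) as usize;
        let ns = measure_ns(|| target(class as u8));
        test.push(class, ns);
    }
    test.t_value().abs()
}
```

A caller would pass a closure that runs the operation under test on a fixed input for class 0 and a fresh random input for class 1, then report FAIL when the returned value exceeds `T_THRESHOLD`.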
verify is the complement of ctgrind — it runs on real
hardware and catches timing leaks that depend on microarchitectural
state rather than pure control flow. A typical high-assurance run
uses both: ctgrind on the CI host for control-flow CT correctness,
dudect on the target hardware for timing-on-device evidence.