###################################################################
Shared side-channel primitives — ``silentops``
###################################################################

The ``silentops`` crate is the single source of truth for the low-level
side-channel primitives used by ``quantica`` (and by ``arcana`` on the
classical side). Keeping these primitives in a separate crate means:

* a single audit surface for CT correctness, independent of any
  particular algorithm;
* architecture-specific assembly backends selected at compile time via
  Cargo features, so a downstream crate never embeds per-arch ``asm`` in
  its own source;
* the same primitives are used by the statistical (``dudect``) and the
  client-request (``ctgrind``) side-channel verifiers, keeping test
  coverage coherent.

This chapter is a reference for those primitives. The threats they
mitigate and the algorithmic uses live in :doc:`threat_model` and
:doc:`countermeasures/ml_kem`, :doc:`countermeasures/ml_dsa`,
:doc:`countermeasures/slh_dsa`.

Module layout
=============

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Module
     - Role
   * - ``silentops::ct``
     - Branchless constant-time primitives with architecture-specific
       assembly backends. ``no_std``. Public functions are re-exported at
       the crate root so call sites write ``silentops::ct_eq(...)``.
   * - ``silentops::ct_grind``
     - Valgrind memcheck client-request helpers (``poison`` /
       ``unpoison``) for constant-time verification. See
       :doc:`verification`.
   * - ``silentops::verify``
     - Dudect-style timing-leak detector (``TTest``, ``Xorshift64``,
       ``measure_ns``, ``report``). ``std`` only.

Constant-time primitives — ``silentops::ct``
============================================

Surface
-------

.. list-table::
   :header-rows: 1
   :widths: 30 20 50

   * - Function
     - Signature (logical)
     - Purpose
   * - ``ct_select_u8``
     - ``(a: u8, b: u8, cond: u8) -> u8``
     - Return ``a`` if ``cond != 0`` else ``b``. Core branchless select.
   * - ``ct_select_i16``
     - ``(a: i16, b: i16, cond: u8) -> i16``
     - Same, for NTT-domain coefficients in ``i16``.
   * - ``ct_select_i32``
     - ``(a: i32, b: i32, cond: u8) -> i32``
     - Same, for ML-DSA coefficients in ``i32``.
   * - ``ct_eq``
     - ``(a: &[u8], b: &[u8]) -> u8``
     - Constant-time byte-slice equality. No early exit; returns ``1`` on
       equality, ``0`` otherwise (including different lengths).
   * - ``ct_copy``
     - ``(dst: &mut [u8], src: &[u8], cond: u8)``
     - Conditional in-place copy. Always reads both buffers; writes to
       ``dst`` are branch-free XOR-mask updates.
   * - ``ct_zeroize``
     - ``(buf: &mut [u8])``
     - Volatile zeroization resistant to dead-store elimination
       (``write_volatile`` + ``compiler_fence(SeqCst)``).
   * - ``ct_zeroize_i16``
     - ``(buf: &mut [i16])``
     - Same for polynomial coefficient arrays.

Calling convention
------------------

* ``cond: u8`` must be exactly ``0`` or ``1``. The primitives compute the
  mask via ``0u8.wrapping_sub(cond)``; passing ``0xFF`` or any other
  non-``0/1`` value breaks the CT invariant **and** the functional result
  (``0xFF`` yields the mask ``0x01``, not ``0xFF``).
* ``ct_eq`` always processes the full buffer length; it is ``O(n)`` in
  ``n = a.len()`` with a fixed per-byte cost. Buffer length itself is
  considered public.
* The loop-based primitives (``ct_eq``, ``ct_copy``, ``ct_zeroize``) are
  marked ``#[inline(never)]`` so that LLVM does not re-inline the loop
  into caller contexts where it might re-optimise it into variable-time
  code.

Architecture dispatch
=====================

The ``silentops/src/ct/mod.rs`` file selects exactly one backend at
compile time based on ``target_arch`` and the Cargo features listed
below.

.. list-table::
   :header-rows: 1
   :widths: 28 22 50

   * - Target
     - Feature
     - Implementation technique
   * - ``x86_64``
     - ``asm-x86_64``
     - Inline ``cmovne`` on values held in GPRs. Each call compiles to
       ``test`` + ``cmov`` that LLVM cannot introspect or rewrite.
   * - ``aarch64``
     - ``asm-aarch64``
     - ``csel`` (one cycle, branch-free, unconditional in the AArch64
       architecture).
   * - ``thumbv7em`` / ``thumbv7m``
     - ``asm-thumbv7``
     - ``IT`` blocks + conditional execution; Cortex-M4/M7/M33 guarantee
       fixed timing inside an ``IT`` block.
   * - ``thumbv6m`` (Cortex-M0 / M0+)
     - ``asm-thumbv6m``
     - No ``IT``, no ``cmov``; falls back to AND/OR/XOR bitwise mask (same
       as the generic fallback) but written as inline asm so the compiler
       cannot regenerate a branch.
   * - ``riscv32``
     - ``asm-riscv32``
     - No conditional move; uses AND/OR/XOR with a mask derived from
       ``neg``, hand-written in asm.
   * - any (default)
     - *none*
     - Pure Rust bitwise fallback. **Not recommended for production CT
       builds**; see the warning below.

Why the pure-Rust fallback is dangerous at ``opt-level >= 2``
-------------------------------------------------------------

The generic fallback writes each primitive as ``b ^ (mask & (a ^ b))``.
The LLVM back-end recognises this pattern. At ``opt-level = 2`` or above
it will frequently rewrite the ``ct_select`` wrapper (e.g. the 32-byte
select in ``ml_kem::kem::ct_select``) into::

   test    ecx, ecx
   cmovne  rdx, rsi          ; pointer CMOV
   cmovne  r8, rax
   movups  xmm0, [rdx]       ; load from the selected address
   movups  xmm1, [r8]

That is a **secret-dependent pointer CMOV followed by a load**: the cache
line fetched depends on the secret ``cond``, which is a classical
cache-timing leak recoverable by a local attacker. This behaviour was
confirmed in ``ctgrind`` runs against an early build of ``quantica`` and
is the entire reason the ``asm-x86_64`` backend exists. See
:doc:`verification` for the ctgrind trace.

Recommended build profile
-------------------------

On ``x86_64`` hosts, build with, at a minimum::

   cargo build --release \
       -p quantica \
       --features asm-x86_64

The ``quantica_bench/ct-grind`` Cargo feature forwards
``silentops/asm-x86_64`` automatically, so builds intended for
side-channel verification always get the asm backend.
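
As a hedged illustration of the arithmetic behind the generic fallback and
the ``cond`` convention, the sketch below re-implements the mask idiom in
plain Rust. The function names (``select_u8_sketch``, ``eq_sketch``,
``copy_sketch``) are hypothetical and this is not the ``silentops`` source.
It is exactly the pure-Rust form that the optimiser may rewrite, so it
demonstrates the functional behaviour only and makes no timing claim.

.. code-block:: rust

   // Illustrative sketch only: hypothetical names, not the silentops source.

   /// Branchless select: returns `a` when `cond == 1`, `b` when `cond == 0`.
   fn select_u8_sketch(a: u8, b: u8, cond: u8) -> u8 {
       // cond = 1 gives mask = 0xFF; cond = 0 gives mask = 0x00.
       let mask = 0u8.wrapping_sub(cond);
       b ^ (mask & (a ^ b))
   }

   /// Slice equality with no early exit: 1 if equal, 0 otherwise.
   fn eq_sketch(a: &[u8], b: &[u8]) -> u8 {
       if a.len() != b.len() {
           return 0; // buffer length is treated as public
       }
       let mut diff = 0u8;
       for (x, y) in a.iter().zip(b.iter()) {
           diff |= x ^ y; // accumulate every differing bit
       }
       // diff == 0 maps to 1, anything else to 0, without a comparison.
       (((diff as u16).wrapping_sub(1)) >> 8) as u8 & 1
   }

   /// Conditional copy: always reads both buffers, XOR-mask update of `dst`.
   fn copy_sketch(dst: &mut [u8], src: &[u8], cond: u8) {
       let mask = 0u8.wrapping_sub(cond);
       for (d, s) in dst.iter_mut().zip(src.iter()) {
           *d ^= mask & (*d ^ *s);
       }
   }

   fn main() {
       assert_eq!(select_u8_sketch(0xAA, 0x55, 1), 0xAA);
       assert_eq!(select_u8_sketch(0xAA, 0x55, 0), 0x55);
       // Violating the 0/1 convention: the mask becomes 0x01.
       assert_eq!(select_u8_sketch(0xAA, 0x55, 0xFF), 0x54);

       assert_eq!(eq_sketch(b"abc", b"abc"), 1);
       assert_eq!(eq_sketch(b"abc", b"abd"), 0);

       let mut dst = [1u8, 2, 3];
       copy_sketch(&mut dst, &[9, 9, 9], 0);
       assert_eq!(dst, [1, 2, 3]);
       copy_sketch(&mut dst, &[9, 9, 9], 1);
       assert_eq!(dst, [9, 9, 9]);
       println!("mask idiom sketch ok");
   }

The third assertion shows why ``cond`` must be ``0`` or ``1``:
``0u8.wrapping_sub(0xFF)`` is ``0x01``, so the result is neither input.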
``core::hint::black_box`` shielding — design choice
---------------------------------------------------

The workspace ``SECURITY.md`` (Section 4.1) lists
``core::hint::black_box`` shielding as a workspace-wide rule "wherever a
CT mask is derived from a secret", because without it LLVM (rustc 1.84+)
is known to recover branches over the ``b ^ (mask & (a ^ b))`` idiom,
exactly the pattern documented above as the failure mode of the pure-Rust
fallback.

In the quantica crate this rule is satisfied **structurally** by
delegating every CT decision to ``silentops::ct_*``, whose asm backends
(``asm-x86_64``, ``asm-aarch64``, ``asm-thumbv7``, ``asm-thumbv6m``,
``asm-riscv32``) bypass the LLVM optimiser entirely. Consequently
``quantica/src/`` does not call ``core::hint::black_box`` directly
anywhere; the asm backends are the *stronger fix* mentioned in the same
SECURITY.md row.

.. admonition:: Caveat — non-asm targets
   :class: important

   On architectures without an asm backend (notably WebAssembly through
   the ``quantica_wasm`` crate), the CT path falls back to
   ``silentops::ct::generic`` and the LLVM-recovers-branch hazard **does**
   apply. A planned hardening pass (no roadmap ID assigned yet; flagged
   here as a workspace residual) will add explicit
   ``core::hint::black_box`` calls inside ``silentops::ct::generic`` so
   every consumer (quantica + arcana) inherits the shielding regardless of
   target. Until that lands, WebAssembly builds of quantica should be
   considered *best-effort* on the CT axis.

Source pointers
---------------

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Item
     - File
   * - Public API & re-exports
     - ``silentops/src/lib.rs``
   * - Module dispatch
     - ``silentops/src/ct/mod.rs``
   * - Generic (bit-twiddling) fallback
     - ``silentops/src/ct/generic.rs``
   * - x86_64 asm backend
     - ``silentops/src/ct/x86_64.rs``
   * - aarch64 asm backend
     - ``silentops/src/ct/aarch64.rs``
   * - thumbv7 asm backend
     - ``silentops/src/ct/thumbv7.rs``
   * - thumbv6m asm backend
     - ``silentops/src/ct/thumbv6m.rs``
   * - riscv32 asm backend
     - ``silentops/src/ct/riscv32.rs``
   * - CT unit tests (run on every arch)
     - ``silentops/src/ct/tests.rs``

ctgrind instrumentation — ``silentops::ct_grind``
=================================================

``ct_grind`` provides the ``poison`` / ``unpoison`` pair, plus an
``is_active`` query, needed to drive Valgrind/memcheck-based CT
verification:

.. code-block:: rust

   silentops::ct_grind::poison(buf);     // mark as secret
   silentops::ct_grind::unpoison(buf);   // mark as public again
   silentops::ct_grind::is_active();     // true only when the feature
                                         // is enabled AND the target is
                                         // x86_64-linux or aarch64-linux

The implementation emits the Valgrind client-request magic sequence via
stable ``core::arch::asm!``, with no C shim or third-party crate.
Surrounding ``compiler_fence(SeqCst)`` calls prevent LLVM from reordering
subsequent memory reads past a ``poison`` / ``unpoison`` call, a subtle
but critical detail first identified during the initial
``quantica_bench`` ctgrind bring-up.

When the ``ct-grind`` feature is disabled, or on non-supported targets,
all three functions compile to zero-cost no-ops so call sites can stay
unconditional (no ``#[cfg]`` walls in consumer code).

The full methodology, the demo binary that validates the plumbing, and
the interpretation rules for memcheck output are covered in
:doc:`verification`.
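
A minimal sketch of how a call site might bracket an operation with the
``poison`` / ``unpoison`` calls described above. So that it compiles on
its own, it uses a local stand-in module (``ct_grind_stub``) whose
functions are no-ops, mirroring the feature-disabled behaviour. The
``&[u8]`` parameter type, the ``tags_match`` helper and the
``check_under_memcheck`` wrapper are assumptions made for illustration,
not the crate's documented signatures.

.. code-block:: rust

   // Stand-in for `silentops::ct_grind` with the feature disabled.
   // Signatures are assumed; the real ones are not shown in this chapter.
   mod ct_grind_stub {
       pub fn poison(_buf: &[u8]) {}
       pub fn unpoison(_buf: &[u8]) {}
       pub fn is_active() -> bool {
           false
       }
   }

   /// Hypothetical operation under test: branch-free 16-byte comparison.
   fn tags_match(expected: &[u8; 16], received: &[u8; 16]) -> u8 {
       let mut diff = 0u8;
       for i in 0..16 {
           diff |= expected[i] ^ received[i];
       }
       // 1 when diff == 0, else 0, without comparing on secret data.
       (((diff as u16).wrapping_sub(1)) >> 8) as u8 & 1
   }

   fn check_under_memcheck(expected: &[u8; 16], received: &[u8; 16]) -> bool {
       // Mark the secret input before the code under test touches it.
       ct_grind_stub::poison(expected);
       let verdict = [tags_match(expected, received)];
       // The verdict is declared public before the caller branches on it.
       ct_grind_stub::unpoison(&verdict);
       ct_grind_stub::unpoison(expected);
       verdict[0] == 1
   }

   fn main() {
       let tag = [7u8; 16];
       let mut forged = tag;
       forged[3] ^= 0x80;
       println!("instrumentation active: {}", ct_grind_stub::is_active());
       println!("match:  {}", check_under_memcheck(&tag, &tag));
       println!("forged: {}", check_under_memcheck(&tag, &forged));
   }

Because the stand-in functions are no-ops, the wrapper behaves the same
with or without instrumentation, which is the property that lets call
sites stay unconditional.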
Statistical timing verification — ``silentops::verify``
=======================================================

The ``verify`` module packages the Reparaz–Balasch–Verbauwhede
methodology :cite:`reparaz2017dudect` as a library: a tiny ``Xorshift64``
for class selection, an incremental Welch t-test (``TTest``), a
``measure_ns`` sampler, and a ``report`` helper that prints ``PASS`` /
``FAIL`` against ``T_THRESHOLD = 4.5`` (``p < 10⁻⁵``).

Consumers write their own measurement loops on top of this API. The
canonical example is ``silentops/examples/ct_verify_pqc.rs``, which
exercises ML-KEM-768 Decaps, the ML-KEM Barrett reduction, and ML-DSA-44
Sign / Verify.

``verify`` is the complement of ``ctgrind``: it runs on real hardware and
catches timing leaks that depend on microarchitectural state rather than
pure control flow. A typical high-assurance run uses both: ctgrind on the
CI host for control-flow CT correctness, dudect on the target hardware
for timing-on-device evidence.
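
To make the incremental Welch t-test concrete, here is a small
self-contained sketch. It is not the crate's ``TTest`` type: the
``WelchSketch`` struct, its methods and the ``t_for`` helper are
hypothetical, and the "timings" are synthetic numbers rather than
``measure_ns`` samples. Only the ``4.5`` threshold is taken from the text
above. Per-class mean and variance are updated with Welford's online
algorithm, so no sample buffer is kept.

.. code-block:: rust

   /// Hypothetical incremental Welch t-test over two input classes.
   #[derive(Default)]
   struct WelchSketch {
       n: [f64; 2],    // sample count per class
       mean: [f64; 2], // running mean per class
       m2: [f64; 2],   // running sum of squared deviations per class
   }

   impl WelchSketch {
       /// Welford update for one measurement in `class` (0 or 1).
       fn push(&mut self, class: usize, x: f64) {
           self.n[class] += 1.0;
           let delta = x - self.mean[class];
           self.mean[class] += delta / self.n[class];
           self.m2[class] += delta * (x - self.mean[class]);
       }

       /// Welch t statistic; 0.0 until both classes have two samples.
       fn t(&self) -> f64 {
           if self.n[0] < 2.0 || self.n[1] < 2.0 {
               return 0.0;
           }
           let var0 = self.m2[0] / (self.n[0] - 1.0);
           let var1 = self.m2[1] / (self.n[1] - 1.0);
           let se = (var0 / self.n[0] + var1 / self.n[1]).sqrt();
           let diff = self.mean[0] - self.mean[1];
           if se == 0.0 {
               // Zero variance: any difference in means is unbounded evidence.
               return if diff == 0.0 { 0.0 } else { diff.signum() * f64::INFINITY };
           }
           diff / se
       }
   }

   /// Synthetic experiment: class 1 is slower than class 0 by `offset`.
   fn t_for(offset: f64) -> f64 {
       let mut test = WelchSketch::default();
       for i in 0..1000u32 {
           let jitter = (i % 5) as f64;
           test.push(0, 100.0 + jitter);
           test.push(1, 100.0 + offset + jitter);
       }
       test.t()
   }

   fn main() {
       const T_THRESHOLD: f64 = 4.5; // threshold quoted in this chapter
       for offset in [0.0, 10.0] {
           let t = t_for(offset);
           let verdict = if t.abs() > T_THRESHOLD { "FAIL" } else { "PASS" };
           println!("offset {offset}: |t| = {:.2} -> {verdict}", t.abs());
       }
   }

In a real measurement loop the class of each sample would be chosen at
random and the values would come from timing the operation under test;
the statistic and the comparison against the threshold stay the same.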