################################################################### Verification methodology ################################################################### Claims in the previous chapters are only as strong as the evidence that the code in ``quantica`` actually behaves the way those chapters describe. This chapter catalogues the **evidence artefacts** the project maintains and how they are produced. Two complementary techniques are used: * **ctgrind** — Valgrind memcheck client-requests applied to a dedicated test binary. Covers *control-flow* and *memory-access* constant-time properties, run on the host CI. * **silentops::verify** (dudect) — statistical timing test, run on the target hardware. Covers *microarchitectural* timing properties (cache, branch predictor, variable-latency instructions). Neither alone is sufficient; both are required for a credible CT assurance story. .. contents:: :local: :depth: 2 ctgrind (Valgrind memcheck) — control-flow CT ============================================= Methodology ----------- The technique was first described by Adam Langley in 2010 :cite:`langley2010ctgrind`. A memory buffer holding a secret is marked **undefined** via the Valgrind ``make_mem_undefined`` client request. Any subsequent conditional branch, pointer dereference, or syscall argument that consumes bits of the undefined buffer raises a memcheck error — i.e. any control-flow or addressing decision that depends on the secret is reported. Because memcheck propagates definedness through pure arithmetic, masked bitwise selects (``b ^ (mask & (a ^ b))``) do not raise errors; only the downstream *use* of their result in a branch or load does. A correctly written CT primitive therefore passes ctgrind cleanly; a bug such as a pointer-CMOV → load (as seen on the early ``quantica`` ``ct_select``) triggers an error on the load. Implementation -------------- * ``silentops::ct_grind::poison`` / ``unpoison`` emit the Valgrind client requests via inline ``asm!`` — no C shim, no crate dependency. Surrounded by ``compiler_fence(SeqCst)`` to prevent LLVM from reordering memory reads past a poison boundary. * ``quantica_bench/src/bin/ctgrind.rs`` is the driver binary: it poisons only the truly secret portions of each ``sk`` (skipping the public components that FIPS layouts embed inline), runs keygen + encaps/decaps or sign/verify for each parameter set of each algorithm, then unpoisons before the correctness round-trip. * ``tools/ctgrind.sh`` wraps ``valgrind --error-exitcode=1 --track-origins=yes`` with the suppression file documented below. * ``tools/ctgrind.supp`` holds named, individually justified suppressions — one stanza per legitimate finding, each with a paragraph of threat-model reasoning. Demo / self-test ---------------- ``silentops/examples/ct_grind_demo.rs`` exercises both a CT path (``silentops::ct_eq`` on a poisoned buffer, should pass) and an intentional leak (``if secret[0] > 0x80``, should be reported). Running ``tools/ctgrind.sh demo`` validates the entire plumbing in one command — any failure on the demo means the infrastructure is broken and subsequent algorithm runs are not trustworthy. Running against the production algorithms ----------------------------------------- :: # all three algorithms, all parameter sets tools/ctgrind.sh all # individual runs tools/ctgrind.sh mlkem tools/ctgrind.sh mldsa tools/ctgrind.sh slh Exit code ``0`` means "no unsuppressed findings". Exit code ``1`` means memcheck reported a new finding that is not justified in the suppression file — this is a CT regression and must be fixed or explicitly added to the suppression file **with a new threat-model paragraph in this document**. Current suppressions -------------------- The file ``tools/ctgrind.supp`` lists the known, documented findings for each algorithm. They fall in two families: 1. **Branches on data that becomes publicly emitted** — ML-DSA ``decompose::high_bits_vec``, ``encode::w1_encode``, ``decompose::make_hint_vec``, ``encode::sig_encode``, and the inlined rejection-loop intermediates inside ``MlDsa::
::sign``. SLH-DSA ``fors::fors_pk_from_sig``,
``wots::chain_iter``, ``xmss::xmss_sign_into``,
``xmss::xmss_pk_from_sig``.
The leaked bits are byte-for-byte identical to bytes present in
the emitted signature, so timing observation reveals no
information beyond what the signature itself reveals.
2. **Rejection sampling on a PRF-derived stream** — ML-DSA
``shuffle::generate_permutation``. The rejection condition
``if r < limit`` branches on one byte of a SHAKE256 output
computationally indistinguishable from uniform without ``K``;
leaking SHAKE output bits is at least as hard as breaking
SHAKE256's PRF property.
Each suppression has an inline rationale in ``tools/ctgrind.supp``;
they are reproduced in the per-algorithm chapters as well
(:doc:`countermeasures/ml_dsa`, :doc:`countermeasures/slh_dsa`).
.. admonition:: Roadmap — all suppressions closed before release
:class: important
Each suppression corresponds to a **T4-x** item in the Tier-4
hardening roadmap. A release tree is built from a state where
each item has been implemented and the corresponding suppression
removed. The present draft documents both postures so that the
reviewer sees the *current code* and the *intended end state*
side by side.
dudect (``silentops::verify``) — microarchitectural timing
==========================================================
Methodology
-----------
The Reparaz–Balasch–Verbauwhede methodology
:cite:`reparaz2017dudect` proceeds as follows:
1. Build two *input classes* — typically a "fixed" (worst-case)
input and a "random" input.
2. Interleave measurements of the two classes randomly to
decorrelate from slow system drifts.
3. Feed each measurement into a Welch t-test maintained
incrementally with Welford's algorithm.
4. If ``|t| > 4.5`` after a reasonable sample count, reject the
null hypothesis "the two classes have the same mean timing"
(``p < 10⁻⁵``).
``silentops::verify`` packages the statistical machinery as a
library — ``TTest``, ``Xorshift64``, ``measure_ns``, ``report`` —
so consumers write their own class builders and harness.
Reference harness
-----------------
``silentops/examples/ct_verify_pqc.rs`` provides a canonical
harness exercising:
* ML-KEM-768 Decaps (valid vs. random ciphertext) — expected PASS;
* ML-KEM Barrett reduce (small vs. large input) — expected PASS;
* ML-DSA-44 Verify (valid vs. invalid signature) — expected PASS;
* ML-DSA-44 Sign (two messages) — expected FAIL *on timing
variation* because the FIPS 204 rejection loop is inherently
non-constant-time in the number of iterations. This is not a
vulnerability (the variation is message-dependent but not
secret-key-dependent) but dudect cannot distinguish the two —
the test comment makes this explicit.
Role in the verification pipeline
---------------------------------
ctgrind catches *control-flow* leaks — any branch or addressing
decision that depends on a secret. dudect catches *timing* leaks
that are invisible to ctgrind because they arise from
microarchitectural state: branch-predictor warming, port
contention on CPUs with µ-ops that exhibit input-dependent
latency, and cache eviction patterns that manifest only on real
hardware.
The two are not redundant — they look at disjoint failure modes.
A high-assurance verification posture runs both:
* ctgrind on the CI host for every merge (blocking);
* dudect on the target platform (representative MCU / SE dev
board) on a nightly or release-candidate cadence.
Integration plan (Gitea workflows, upcoming)
============================================
The Gitea workflows being set up on the home-lab runner (``turtle``)
will wire both verification techniques into the merge gate:
* **Per-PR** — fast ctgrind run (mlkem + mldsa + slh) against the
``ct-grind`` feature build; failure blocks merge.
* **Nightly** — full dudect harness on the Cortex-M4 dev board,
with trace counts sized for statistical power at
``p < 10⁻⁵``; report uploaded as a build artefact.
* **Pre-release tag** — stack-sizes per-function analysis,
cross-architecture build (``thumbv6m``, ``thumbv7em``,
``thumbv8m.main``, ``riscv32imc``), and the QEMU-user ctgrind
run covering ``aarch64`` / ``armv7`` / ``riscv64``.
The CI definitions themselves live in ``.gitea/workflows/`` once
committed. This chapter will be revised to reference them directly
as soon as they land.
release-asm branch counting — methodology and current state
===========================================================
A complementary day-to-day CT regression indicator, called for in
the workspace-level ``SECURITY.md`` (Section 4.4), is **expected
branch count per CT-critical function** measured on the release
build. The methodology is:
#. Compile the function under ``cargo rustc --release -p quantica
--emit=asm``.
#. Filter the resulting ``.s`` for conditional jump opcodes
(``je`` / ``jne`` / ``jz`` / ``jnz`` / ``jl`` / ``jle`` /
``jg`` / ``jge`` / ``ja`` / ``jb`` on x86_64, the equivalent
``b.eq`` / ``b.ne`` / … on aarch64, ``cbz`` / ``cbnz`` / ``b.cs``
on Cortex-M).
#. Compare the count against an **expected reference** baked into
the relevant ``countermeasures/