################################################################### Verification methodology ################################################################### Claims in the previous chapters are only as strong as the evidence that the code in ``quantica`` actually behaves the way those chapters describe. This chapter catalogues the **evidence artefacts** the project maintains and how they are produced. Two complementary techniques are used: * **ctgrind** — Valgrind memcheck client-requests applied to a dedicated test binary. Covers *control-flow* and *memory-access* constant-time properties, run on the host CI. * **silentops::verify** (dudect) — statistical timing test, run on the target hardware. Covers *microarchitectural* timing properties (cache, branch predictor, variable-latency instructions). Neither alone is sufficient; both are required for a credible CT assurance story. .. contents:: :local: :depth: 2 ctgrind (Valgrind memcheck) — control-flow CT ============================================= Methodology ----------- The technique was first described by Adam Langley in 2010 :cite:`langley2010ctgrind`. A memory buffer holding a secret is marked **undefined** via the Valgrind ``make_mem_undefined`` client request. Any subsequent conditional branch, pointer dereference, or syscall argument that consumes bits of the undefined buffer raises a memcheck error — i.e. any control-flow or addressing decision that depends on the secret is reported. Because memcheck propagates definedness through pure arithmetic, masked bitwise selects (``b ^ (mask & (a ^ b))``) do not raise errors; only the downstream *use* of their result in a branch or load does. A correctly written CT primitive therefore passes ctgrind cleanly; a bug such as a pointer-CMOV → load (as seen on the early ``quantica`` ``ct_select``) triggers an error on the load. Implementation -------------- * ``silentops::ct_grind::poison`` / ``unpoison`` emit the Valgrind client requests via inline ``asm!`` — no C shim, no crate dependency. Surrounded by ``compiler_fence(SeqCst)`` to prevent LLVM from reordering memory reads past a poison boundary. * ``quantica_bench/src/bin/ctgrind.rs`` is the driver binary: it poisons only the truly secret portions of each ``sk`` (skipping the public components that FIPS layouts embed inline), runs keygen + encaps/decaps or sign/verify for each parameter set of each algorithm, then unpoisons before the correctness round-trip. * ``tools/ctgrind.sh`` wraps ``valgrind --error-exitcode=1 --track-origins=yes`` with the suppression file documented below. * ``tools/ctgrind.supp`` holds named, individually justified suppressions — one stanza per legitimate finding, each with a paragraph of threat-model reasoning. Demo / self-test ---------------- ``silentops/examples/ct_grind_demo.rs`` exercises both a CT path (``silentops::ct_eq`` on a poisoned buffer, should pass) and an intentional leak (``if secret[0] > 0x80``, should be reported). Running ``tools/ctgrind.sh demo`` validates the entire plumbing in one command — any failure on the demo means the infrastructure is broken and subsequent algorithm runs are not trustworthy. Running against the production algorithms ----------------------------------------- :: # all three algorithms, all parameter sets tools/ctgrind.sh all # individual runs tools/ctgrind.sh mlkem tools/ctgrind.sh mldsa tools/ctgrind.sh slh Exit code ``0`` means "no unsuppressed findings". Exit code ``1`` means memcheck reported a new finding that is not justified in the suppression file — this is a CT regression and must be fixed or explicitly added to the suppression file **with a new threat-model paragraph in this document**. Current suppressions -------------------- The file ``tools/ctgrind.supp`` lists the known, documented findings for each algorithm. They fall in two families: 1. **Branches on data that becomes publicly emitted** — ML-DSA ``decompose::high_bits_vec``, ``encode::w1_encode``, ``decompose::make_hint_vec``, ``encode::sig_encode``, and the inlined rejection-loop intermediates inside ``MlDsa::

::sign``. SLH-DSA ``fors::fors_pk_from_sig``, ``wots::chain_iter``, ``xmss::xmss_sign_into``, ``xmss::xmss_pk_from_sig``. The leaked bits are byte-for-byte identical to bytes present in the emitted signature, so timing observation reveals no information beyond what the signature itself reveals. 2. **Rejection sampling on a PRF-derived stream** — ML-DSA ``shuffle::generate_permutation``. The rejection condition ``if r < limit`` branches on one byte of a SHAKE256 output computationally indistinguishable from uniform without ``K``; leaking SHAKE output bits is at least as hard as breaking SHAKE256's PRF property. Each suppression has an inline rationale in ``tools/ctgrind.supp``; they are reproduced in the per-algorithm chapters as well (:doc:`countermeasures/ml_dsa`, :doc:`countermeasures/slh_dsa`). .. admonition:: Roadmap — all suppressions closed before release :class: important Each suppression corresponds to a **T4-x** item in the Tier-4 hardening roadmap. A release tree is built from a state where each item has been implemented and the corresponding suppression removed. The present draft documents both postures so that the reviewer sees the *current code* and the *intended end state* side by side. dudect (``silentops::verify``) — microarchitectural timing ========================================================== Methodology ----------- The Reparaz–Balasch–Verbauwhede methodology :cite:`reparaz2017dudect` proceeds as follows: 1. Build two *input classes* — typically a "fixed" (worst-case) input and a "random" input. 2. Interleave measurements of the two classes randomly to decorrelate from slow system drifts. 3. Feed each measurement into a Welch t-test maintained incrementally with Welford's algorithm. 4. If ``|t| > 4.5`` after a reasonable sample count, reject the null hypothesis "the two classes have the same mean timing" (``p < 10⁻⁵``). ``silentops::verify`` packages the statistical machinery as a library — ``TTest``, ``Xorshift64``, ``measure_ns``, ``report`` — so consumers write their own class builders and harness. Reference harness ----------------- ``silentops/examples/ct_verify_pqc.rs`` provides a canonical harness exercising: * ML-KEM-768 Decaps (valid vs. random ciphertext) — expected PASS; * ML-KEM Barrett reduce (small vs. large input) — expected PASS; * ML-DSA-44 Verify (valid vs. invalid signature) — expected PASS; * ML-DSA-44 Sign (two messages) — expected FAIL *on timing variation* because the FIPS 204 rejection loop is inherently non-constant-time in the number of iterations. This is not a vulnerability (the variation is message-dependent but not secret-key-dependent) but dudect cannot distinguish the two — the test comment makes this explicit. Role in the verification pipeline --------------------------------- ctgrind catches *control-flow* leaks — any branch or addressing decision that depends on a secret. dudect catches *timing* leaks that are invisible to ctgrind because they arise from microarchitectural state: branch-predictor warming, port contention on CPUs with µ-ops that exhibit input-dependent latency, and cache eviction patterns that manifest only on real hardware. The two are not redundant — they look at disjoint failure modes. A high-assurance verification posture runs both: * ctgrind on the CI host for every merge (blocking); * dudect on the target platform (representative MCU / SE dev board) on a nightly or release-candidate cadence. Integration plan (Gitea workflows, upcoming) ============================================ The Gitea workflows being set up on the home-lab runner (``turtle``) will wire both verification techniques into the merge gate: * **Per-PR** — fast ctgrind run (mlkem + mldsa + slh) against the ``ct-grind`` feature build; failure blocks merge. * **Nightly** — full dudect harness on the Cortex-M4 dev board, with trace counts sized for statistical power at ``p < 10⁻⁵``; report uploaded as a build artefact. * **Pre-release tag** — stack-sizes per-function analysis, cross-architecture build (``thumbv6m``, ``thumbv7em``, ``thumbv8m.main``, ``riscv32imc``), and the QEMU-user ctgrind run covering ``aarch64`` / ``armv7`` / ``riscv64``. The CI definitions themselves live in ``.gitea/workflows/`` once committed. This chapter will be revised to reference them directly as soon as they land. release-asm branch counting — methodology and current state =========================================================== A complementary day-to-day CT regression indicator, called for in the workspace-level ``SECURITY.md`` (Section 4.4), is **expected branch count per CT-critical function** measured on the release build. The methodology is: #. Compile the function under ``cargo rustc --release -p quantica --emit=asm``. #. Filter the resulting ``.s`` for conditional jump opcodes (``je`` / ``jne`` / ``jz`` / ``jnz`` / ``jl`` / ``jle`` / ``jg`` / ``jge`` / ``ja`` / ``jb`` on x86_64, the equivalent ``b.eq`` / ``b.ne`` / … on aarch64, ``cbz`` / ``cbnz`` / ``b.cs`` on Cortex-M). #. Compare the count against an **expected reference** baked into the relevant ``countermeasures/.rst`` chapter. Any regression — say, an unintended branch reintroduced by a compiler upgrade or by a refactor — surfaces as a count delta on the next CI run. .. admonition:: Current gap (honest disclosure) :class: important The methodology above is documented but the per-countermeasure *reference branch counts* are **not yet populated** in :doc:`countermeasures/ml_kem`, :doc:`countermeasures/ml_dsa`, or :doc:`countermeasures/slh_dsa` as of the current draft. The workspace-level ``SECURITY.md`` row that mentions documented branch counts therefore describes the **target** posture, not the current state. **How the gap will be closed:** each Tier-1 / Tier-2 item from the roadmap (see ``quantica/README.md`` "Roadmap" chapter) produces a delta on one or more CT-critical functions. The commit that lands the item also adds an "Expected branch count" sub-table to the relevant ``countermeasures/*.rst`` file with columns ``Function`` / ``Branches before`` / ``Branches after``. By the end of the Tier-1 + Tier-2 sprints, every CT-critical function in quantica has a reference baseline, and CI compares subsequent builds against it. **Day-to-day discipline today:** ctgrind (control-flow CT) and dudect (microarchitectural timing) are the two production indicators currently relied on. They cover the same failure class with stronger statistical guarantees, at the cost of a slower feedback loop than a simple ``grep`` over an asm dump.