###################################################################
Verification methodology
###################################################################

Claims in the previous chapters are only as strong as the evidence
that the code in ``quantica`` actually behaves the way those chapters
describe. This chapter catalogues the **evidence artefacts** the
project maintains and how they are produced.

Two complementary techniques are used:

* **ctgrind** — Valgrind memcheck client-requests applied to a
  dedicated test binary. Covers *control-flow* and *memory-access*
  constant-time properties, run on the host CI.
* **silentops::verify** (dudect) — statistical timing test, run on
  the target hardware. Covers *microarchitectural* timing properties
  (cache, branch predictor, variable-latency instructions).

Neither alone is sufficient; both are required for a credible CT
assurance story.

.. contents::
   :local:
   :depth: 2

ctgrind (Valgrind memcheck) — control-flow CT
=============================================

Methodology
-----------

The technique was first described by Adam Langley in 2010
:cite:`langley2010ctgrind`. A memory buffer holding a secret is
marked **undefined** via the Valgrind ``make_mem_undefined``
client request. Any subsequent conditional branch, pointer
dereference, or syscall argument that consumes bits of the
undefined buffer raises a memcheck error — i.e. any
control-flow or addressing decision that depends on the secret is
reported.

Because memcheck propagates definedness through pure arithmetic,
masked bitwise selects (``b ^ (mask & (a ^ b))``) do not raise
errors; only the downstream *use* of their result in a branch or
load does. A correctly written CT primitive therefore passes
ctgrind cleanly; a bug such as a pointer-CMOV → load (as seen on
the early ``quantica`` ``ct_select``) triggers an error on the
load.

Implementation
--------------

* ``silentops::ct_grind::poison`` / ``unpoison`` emit the Valgrind
  client requests via inline ``asm!`` — no C shim, no crate
  dependency. Surrounded by ``compiler_fence(SeqCst)`` to prevent
  LLVM from reordering memory reads past a poison boundary.
* ``quantica_bench/src/bin/ctgrind.rs`` is the driver binary: it
  poisons only the truly secret portions of each ``sk`` (skipping
  the public components that FIPS layouts embed inline), runs
  keygen + encaps/decaps or sign/verify for each parameter set of
  each algorithm, then unpoisons before the correctness round-trip.
* ``tools/ctgrind.sh`` wraps ``valgrind --error-exitcode=1
  --track-origins=yes`` with the suppression file documented below.
* ``tools/ctgrind.supp`` holds named, individually justified
  suppressions — one stanza per legitimate finding, each with a
  paragraph of threat-model reasoning.

Demo / self-test
----------------

``silentops/examples/ct_grind_demo.rs`` exercises both a CT path
(``silentops::ct_eq`` on a poisoned buffer, should pass) and an
intentional leak (``if secret[0] > 0x80``, should be reported).
Running ``tools/ctgrind.sh demo`` validates the entire plumbing in
one command — any failure on the demo means the infrastructure is
broken and subsequent algorithm runs are not trustworthy.

Running against the production algorithms
-----------------------------------------

::

    # all three algorithms, all parameter sets
    tools/ctgrind.sh all

    # individual runs
    tools/ctgrind.sh mlkem
    tools/ctgrind.sh mldsa
    tools/ctgrind.sh slh

Exit code ``0`` means "no unsuppressed findings". Exit code ``1``
means memcheck reported a new finding that is not justified in the
suppression file — this is a CT regression and must be fixed or
explicitly added to the suppression file **with a new threat-model
paragraph in this document**.

Current suppressions
--------------------

The file ``tools/ctgrind.supp`` lists the known, documented
findings for each algorithm. They fall in two families:

1. **Branches on data that becomes publicly emitted** — ML-DSA
   ``decompose::high_bits_vec``, ``encode::w1_encode``,
   ``decompose::make_hint_vec``, ``encode::sig_encode``, and the
   inlined rejection-loop intermediates inside
   ``MlDsa::<P>::sign``. SLH-DSA ``fors::fors_pk_from_sig``,
   ``wots::chain_iter``, ``xmss::xmss_sign_into``,
   ``xmss::xmss_pk_from_sig``.
   The leaked bits are byte-for-byte identical to bytes present in
   the emitted signature, so timing observation reveals no
   information beyond what the signature itself reveals.

2. **Rejection sampling on a PRF-derived stream** — ML-DSA
   ``shuffle::generate_permutation``. The rejection condition
   ``if r < limit`` branches on one byte of a SHAKE256 output
   computationally indistinguishable from uniform without ``K``;
   leaking SHAKE output bits is at least as hard as breaking
   SHAKE256's PRF property.

Each suppression has an inline rationale in ``tools/ctgrind.supp``;
they are reproduced in the per-algorithm chapters as well
(:doc:`countermeasures/ml_dsa`, :doc:`countermeasures/slh_dsa`).

.. admonition:: Roadmap — all suppressions closed before release
   :class: important

   Each suppression corresponds to a **T4-x** item in the Tier-4
   hardening roadmap. A release tree is built from a state where
   each item has been implemented and the corresponding suppression
   removed. The present draft documents both postures so that the
   reviewer sees the *current code* and the *intended end state*
   side by side.

dudect (``silentops::verify``) — microarchitectural timing
==========================================================

Methodology
-----------

The Reparaz–Balasch–Verbauwhede methodology
:cite:`reparaz2017dudect` proceeds as follows:

1. Build two *input classes* — typically a "fixed" (worst-case)
   input and a "random" input.
2. Interleave measurements of the two classes randomly to
   decorrelate from slow system drifts.
3. Feed each measurement into a Welch t-test maintained
   incrementally with Welford's algorithm.
4. If ``|t| > 4.5`` after a reasonable sample count, reject the
   null hypothesis "the two classes have the same mean timing"
   (``p < 10⁻⁵``).

``silentops::verify`` packages the statistical machinery as a
library — ``TTest``, ``Xorshift64``, ``measure_ns``, ``report`` —
so consumers write their own class builders and harness.

Reference harness
-----------------

``silentops/examples/ct_verify_pqc.rs`` provides a canonical
harness exercising:

* ML-KEM-768 Decaps (valid vs. random ciphertext) — expected PASS;
* ML-KEM Barrett reduce (small vs. large input) — expected PASS;
* ML-DSA-44 Verify (valid vs. invalid signature) — expected PASS;
* ML-DSA-44 Sign (two messages) — expected FAIL *on timing
  variation* because the FIPS 204 rejection loop is inherently
  non-constant-time in the number of iterations. This is not a
  vulnerability (the variation is message-dependent but not
  secret-key-dependent) but dudect cannot distinguish the two —
  the test comment makes this explicit.

Role in the verification pipeline
---------------------------------

ctgrind catches *control-flow* leaks — any branch or addressing
decision that depends on a secret. dudect catches *timing* leaks
that are invisible to ctgrind because they arise from
microarchitectural state: branch-predictor warming, port
contention on CPUs with µ-ops that exhibit input-dependent
latency, and cache eviction patterns that manifest only on real
hardware.

The two are not redundant — they look at disjoint failure modes.
A high-assurance verification posture runs both:

* ctgrind on the CI host for every merge (blocking);
* dudect on the target platform (representative MCU / SE dev
  board) on a nightly or release-candidate cadence.

Integration plan (Gitea workflows, upcoming)
============================================

The Gitea workflows being set up on the home-lab runner (``turtle``)
will wire both verification techniques into the merge gate:

* **Per-PR** — fast ctgrind run (mlkem + mldsa + slh) against the
  ``ct-grind`` feature build; failure blocks merge.
* **Nightly** — full dudect harness on the Cortex-M4 dev board,
  with trace counts sized for statistical power at
  ``p < 10⁻⁵``; report uploaded as a build artefact.
* **Pre-release tag** — stack-sizes per-function analysis,
  cross-architecture build (``thumbv6m``, ``thumbv7em``,
  ``thumbv8m.main``, ``riscv32imc``), and the QEMU-user ctgrind
  run covering ``aarch64`` / ``armv7`` / ``riscv64``.

The CI definitions themselves live in ``.gitea/workflows/`` once
committed. This chapter will be revised to reference them directly
as soon as they land.

release-asm branch counting — methodology and current state
===========================================================

A complementary day-to-day CT regression indicator, called for in
the workspace-level ``SECURITY.md`` (Section 4.4), is **expected
branch count per CT-critical function** measured on the release
build. The methodology is:

#. Compile the function under ``cargo rustc --release -p quantica
   --emit=asm``.
#. Filter the resulting ``.s`` for conditional jump opcodes
   (``je`` / ``jne`` / ``jz`` / ``jnz`` / ``jl`` / ``jle`` /
   ``jg`` / ``jge`` / ``ja`` / ``jb`` on x86_64, the equivalent
   ``b.eq`` / ``b.ne`` / … on aarch64, ``cbz`` / ``cbnz`` / ``b.cs``
   on Cortex-M).
#. Compare the count against an **expected reference** baked into
   the relevant ``countermeasures/<algo>.rst`` chapter. Any
   regression — say, an unintended branch reintroduced by a
   compiler upgrade or by a refactor — surfaces as a count delta on
   the next CI run.

.. admonition:: Current gap (honest disclosure)
   :class: important

   The methodology above is documented but the per-countermeasure
   *reference branch counts* are **not yet populated** in
   :doc:`countermeasures/ml_kem`, :doc:`countermeasures/ml_dsa`, or
   :doc:`countermeasures/slh_dsa` as of the current draft. The
   workspace-level ``SECURITY.md`` row that mentions documented
   branch counts therefore describes the **target** posture, not
   the current state.

   **How the gap will be closed:** each Tier-1 / Tier-2 item from
   the roadmap (see ``quantica/README.md`` "Roadmap" chapter)
   produces a delta on one or more CT-critical functions. The
   commit that lands the item also adds an "Expected branch count"
   sub-table to the relevant ``countermeasures/*.rst`` file with
   columns ``Function`` / ``Branches before`` / ``Branches after``.
   By the end of the Tier-1 + Tier-2 sprints, every CT-critical
   function in quantica has a reference baseline, and CI compares
   subsequent builds against it.

   **Day-to-day discipline today:** ctgrind (control-flow CT) and
   dudect (microarchitectural timing) are the two production
   indicators currently relied on. They cover the same failure
   class with stronger statistical guarantees, at the cost of a
   slower feedback loop than a simple ``grep`` over an asm dump.