Verification methodology

Claims in the previous chapters are only as strong as the evidence that the code in quantica actually behaves the way those chapters describe. This chapter catalogues the evidence artefacts the project maintains and how they are produced.

Two complementary techniques are used:

ctgrind — Valgrind memcheck client-requests applied to a dedicated test binary. Covers control-flow and memory-access constant-time properties, run on the host CI.
silentops::verify (dudect) — statistical timing test, run on the target hardware. Covers microarchitectural timing properties (cache, branch predictor, variable-latency instructions).

Neither alone is sufficient; both are required for a credible CT assurance story.

ctgrind (Valgrind memcheck) — control-flow CT 

Methodology 

The technique was first described by Adam Langley in 2010 [Lan10]. A memory buffer holding a secret is marked undefined via the Valgrind make_mem_undefined client request. Any subsequent conditional branch, pointer dereference, or syscall argument that consumes bits of the undefined buffer raises a memcheck error — i.e. any control-flow or addressing decision that depends on the secret is reported.

Because memcheck propagates definedness through pure arithmetic, masked bitwise selects (b ^ (mask & (a ^ b))) do not raise errors; only the downstream use of their result in a branch or load does. A correctly written CT primitive therefore passes ctgrind cleanly; a bug such as a pointer-CMOV → load (as seen on the early quantica ct_select) triggers an error on the load.

Implementation 

silentops::ct_grind::poison / unpoison emit the Valgrind client requests via inline asm! — no C shim, no crate dependency. Surrounded by compiler_fence(SeqCst) to prevent LLVM from reordering memory reads past a poison boundary.
quantica_bench/src/bin/ctgrind.rs is the driver binary: it poisons only the truly secret portions of each sk (skipping the public components that FIPS layouts embed inline), runs keygen + encaps/decaps or sign/verify for each parameter set of each algorithm, then unpoisons before the correctness round-trip.
tools/ctgrind.sh wraps valgrind --error-exitcode=1 --track-origins=yes with the suppression file documented below.
tools/ctgrind.supp holds named, individually justified suppressions — one stanza per legitimate finding, each with a paragraph of threat-model reasoning.

Demo / self-test 

silentops/examples/ct_grind_demo.rs exercises both a CT path (silentops::ct_eq on a poisoned buffer, should pass) and an intentional leak (if secret[0] > 0x80, should be reported). Running tools/ctgrind.sh demo validates the entire plumbing in one command — any failure on the demo means the infrastructure is broken and subsequent algorithm runs are not trustworthy.

Running against the production algorithms 

# all three algorithms, all parameter sets
tools/ctgrind.sh all

# individual runs
tools/ctgrind.sh mlkem
tools/ctgrind.sh mldsa
tools/ctgrind.sh slh

Exit code 0 means “no unsuppressed findings”. Exit code 1 means memcheck reported a new finding that is not justified in the suppression file — this is a CT regression and must be fixed or explicitly added to the suppression file with a new threat-model paragraph in this document.

Current suppressions 

The file tools/ctgrind.supp lists the known, documented findings for each algorithm. They fall in two families:

Branches on data that becomes publicly emitted — ML-DSA decompose::high_bits_vec, encode::w1_encode, decompose::make_hint_vec, encode::sig_encode, and the inlined rejection-loop intermediates inside MlDsa::<P>::sign. SLH-DSA fors::fors_pk_from_sig, wots::chain_iter, xmss::xmss_sign_into, xmss::xmss_pk_from_sig. The leaked bits are byte-for-byte identical to bytes present in the emitted signature, so timing observation reveals no information beyond what the signature itself reveals.
Rejection sampling on a PRF-derived stream — ML-DSA shuffle::generate_permutation. The rejection condition if r < limit branches on one byte of a SHAKE256 output computationally indistinguishable from uniform without K; leaking SHAKE output bits is at least as hard as breaking SHAKE256’s PRF property.

Each suppression has an inline rationale in tools/ctgrind.supp; they are reproduced in the per-algorithm chapters as well (ML-DSA — countermeasures, SLH-DSA — countermeasures).

Roadmap — all suppressions closed before release

Each suppression corresponds to a T4-x item in the Tier-4 hardening roadmap. A release tree is built from a state where each item has been implemented and the corresponding suppression removed. The present draft documents both postures so that the reviewer sees the current code and the intended end state side by side.

dudect (`silentops::verify`) — microarchitectural timing 

Methodology 

The Reparaz–Balasch–Verbauwhede methodology [RBV17] proceeds as follows:

Build two input classes — typically a “fixed” (worst-case) input and a “random” input.
Interleave measurements of the two classes randomly to decorrelate from slow system drifts.
Feed each measurement into a Welch t-test maintained incrementally with Welford’s algorithm.
If |t| > 4.5 after a reasonable sample count, reject the null hypothesis “the two classes have the same mean timing” (p < 10⁻⁵).

silentops::verify packages the statistical machinery as a library — TTest, Xorshift64, measure_ns, report — so consumers write their own class builders and harness.

Reference harness 

silentops/examples/ct_verify_pqc.rs provides a canonical harness exercising:

ML-KEM-768 Decaps (valid vs. random ciphertext) — expected PASS;
ML-KEM Barrett reduce (small vs. large input) — expected PASS;
ML-DSA-44 Verify (valid vs. invalid signature) — expected PASS;
ML-DSA-44 Sign (two messages) — expected FAIL on timing variation because the FIPS 204 rejection loop is inherently non-constant-time in the number of iterations. This is not a vulnerability (the variation is message-dependent but not secret-key-dependent) but dudect cannot distinguish the two — the test comment makes this explicit.

Role in the verification pipeline 

ctgrind catches control-flow leaks — any branch or addressing decision that depends on a secret. dudect catches timing leaks that are invisible to ctgrind because they arise from microarchitectural state: branch-predictor warming, port contention on CPUs with µ-ops that exhibit input-dependent latency, and cache eviction patterns that manifest only on real hardware.

The two are not redundant — they look at disjoint failure modes. A high-assurance verification posture runs both:

ctgrind on the CI host for every merge (blocking);
dudect on the target platform (representative MCU / SE dev board) on a nightly or release-candidate cadence.

Integration plan (Gitea workflows, upcoming)

The Gitea workflows being set up on the home-lab runner (turtle) will wire both verification techniques into the merge gate:

Per-PR — fast ctgrind run (mlkem + mldsa + slh) against the ct-grind feature build; failure blocks merge.
Nightly — full dudect harness on the Cortex-M4 dev board, with trace counts sized for statistical power at p < 10⁻⁵; report uploaded as a build artefact.
Pre-release tag — stack-sizes per-function analysis, cross-architecture build (thumbv6m, thumbv7em, thumbv8m.main, riscv32imc), and the QEMU-user ctgrind run covering aarch64 / armv7 / riscv64.

The CI definitions themselves live in .gitea/workflows/ once committed. This chapter will be revised to reference them directly as soon as they land.

release-asm branch counting — methodology and current state 

A complementary day-to-day CT regression indicator, called for in the workspace-level SECURITY.md (Section 4.4), is expected branch count per CT-critical function measured on the release build. The methodology is:

Compile the function under cargo rustc --release -p quantica --emit=asm.
Filter the resulting .s for conditional jump opcodes (je / jne / jz / jnz / jl / jle / jg / jge / ja / jb on x86_64, the equivalent b.eq / b.ne / … on aarch64, cbz / cbnz / b.cs on Cortex-M).
Compare the count against an expected reference baked into the relevant countermeasures/<algo>.rst chapter. Any regression — say, an unintended branch reintroduced by a compiler upgrade or by a refactor — surfaces as a count delta on the next CI run.

Current gap (honest disclosure)

The methodology above is documented but the per-countermeasure reference branch counts are not yet populated in ML-KEM — countermeasures, ML-DSA — countermeasures, or SLH-DSA — countermeasures as of the current draft. The workspace-level SECURITY.md row that mentions documented branch counts therefore describes the target posture, not the current state.

How the gap will be closed: each Tier-1 / Tier-2 item from the roadmap (see quantica/README.md “Roadmap” chapter) produces a delta on one or more CT-critical functions. The commit that lands the item also adds an “Expected branch count” sub-table to the relevant countermeasures/*.rst file with columns Function / Branches before / Branches after. By the end of the Tier-1 + Tier-2 sprints, every CT-critical function in quantica has a reference baseline, and CI compares subsequent builds against it.

Day-to-day discipline today: ctgrind (control-flow CT) and dudect (microarchitectural timing) are the two production indicators currently relied on. They cover the same failure class with stronger statistical guarantees, at the cost of a slower feedback loop than a simple grep over an asm dump.