###################################################################
SLH-DSA — countermeasures
###################################################################

:FIPS spec: :cite:`fips205`
:Crate path: ``quantica::slh_dsa``
:Cargo feature: ``slh-dsa`` (on by default).

SLH-DSA (SPHINCS+) is hash-based. It has no rejection sampling, no
secret polynomial arithmetic and no NTT, only a large tree of SHAKE /
SHA2 calls. That removes most of the classical PQC side-channel
concerns and concentrates the remaining risks on three well-studied
attack surfaces:

* **Fault injection anywhere in the hash-tree construction.**
  Corrupting any intermediate hash (FORS, WOTS+, or an XMSS
  authentication node) makes the one-time WOTS+ key one layer up sign
  a faulty value in addition to the correct one. Combining a handful of
  faulted signatures with valid ones yields a universal forgery
  (:cite:`castelnovi2018_grafting_trees`). A practical voltage-glitch
  realisation is documented in :cite:`genet2018_practical_fault_sphincs`.
  Recent work (:cite:`adiletta2025_slashdsa_rowhammer`) extends the
  threat to a purely software attacker via Rowhammer, which makes
  **fault redundancy** the central hardening axis for hash-based
  signatures.
* **DPA on the PRF that expands** ``SK.seed`` into WOTS+ and FORS leaf
  secrets. The same seed is reused across every leaf of every tree, so
  a DPA attacker accumulates arbitrarily many traces on a controllable
  input (:cite:`kannwischer2018_dpa_xmss_sphincs`). Countermeasure
  directions are a masked PRF (:cite:`fluhrer2024_sca_resistant_sphincs`)
  or a threshold-implementation Keccak core
  (:cite:`saarinen2024_sloth_slhdsa`).
* **Template / SPA on FORS index extraction and sibling ordering.** The
  FORS digit drives which sibling subtrees are computed and at which
  addresses. It is derived from the message digest, whose randomizer
  ``R`` is computed from ``SK.prf``, so it stays secret until the
  signature is released. Template matching on the PRF absorption
  patterns can recover the digit bit by bit (discussion in
  :cite:`kannwischer2018_dpa_xmss_sphincs`, SoK update in
  :cite:`dobias2025_sok_pqc_sca`).

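
FIPS 205 extracts the ``K`` FORS digits from the message digest with
``base_2b`` (Algorithm 4). The routine reads ``A`` bits at a time in
big-endian order. The following standalone Rust sketch of that routine
illustrates what "FORS digit" means above. It follows the
specification's pseudocode and is not the ``quantica`` implementation.
``base_2b`` here is simply the specification's name.

.. code-block:: rust

   /// FIPS 205 Algorithm 4 (`base_2b`): read `out_len` big-endian `b`-bit
   /// digits from the byte string `x`. Illustrative sketch, not quantica code.
   pub fn base_2b(x: &[u8], b: u32, out_len: usize) -> Vec<u32> {
       // FIPS 205 only uses b = a (FORS) or b = lg_w; capping b keeps the
       // bits we actually read inside the low 32 bits of `total`.
       assert!((1..=24).contains(&b), "sketch assumes 1 <= b <= 24");
       assert!(x.len() * 8 >= out_len * b as usize, "input too short");
       let mut pos = 0usize; // next input byte
       let mut bits = 0u32; // unread bits buffered in `total`
       let mut total: u64 = 0; // bit buffer; only its low bits are used
       let mut digits = Vec::with_capacity(out_len);
       for _ in 0..out_len {
           // Pull whole bytes until at least `b` unread bits are buffered.
           while bits < b {
               total = (total << 8) | u64::from(x[pos]);
               pos += 1;
               bits += 8;
           }
           bits -= b;
           // Spec: baseb[out] = (total >> bits) mod 2^b.
           digits.push(((total >> bits) & ((1u64 << b) - 1)) as u32);
       }
       digits
   }

With the SHAKE-128f parameters (``A = 6``, ``K = 33``),
``base_2b(&md[..25], 6, 33)`` yields the 33 FORS indices, each in
``[0, 64)``. Here ``25 = ceil(33 * 6 / 8)`` is the number of digest
bytes that carry them.
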

This chapter lists what is implemented today and what is scheduled
for the next hardening round. Threat classes: :doc:`../threat_model`.

.. contents::
   :local:
   :depth: 2

Coverage matrix
===============

.. list-table:: SLH-DSA countermeasure / threat matrix
   :header-rows: 1
   :widths: 25 18 57

   * - Threat
     - Status
     - Countermeasure(s)
   * - Fault on FORS / WOTS+ / XMSS (grafting-tree forgery)
     - implemented for the FORS path (``sca-fors-redundancy``)
     - Recompute-and-compare redundancy on the FORS signature path
       (``T1-C``, opt-in feature ``sca-fors-redundancy``), aligned with
       :cite:`genet2023_protecting_sphincs_faults`. Addresses both
       physical fault injection (:cite:`castelnovi2018_grafting_trees`,
       :cite:`genet2018_practical_fault_sphincs`) and software Rowhammer
       (:cite:`adiletta2025_slashdsa_rowhammer`). Consumes the
       constant-time ``fors_pk_from_sig`` (``T1-F``).
   * - DPA on the master PRF (``SK.seed`` → leaf secrets)
     - **planned** (tier 4, ``T4-B``)
     - First-order masking of the PRF call that derives WOTS+ and FORS
       leaf secrets, following the 3-share SHAKE posture of
       :cite:`fluhrer2024_sca_resistant_sphincs`. The long-term
       alternative is a TI Keccak core (:cite:`saarinen2024_sloth_slhdsa`).
   * - SPA / template on FORS sibling PRF addresses (leaks ``idx`` bits)
     - implemented (``sca-fors-dummy-siblings``)
     - Full-tree streaming FORS sign (``T1-D``, opt-in feature
       ``sca-fors-dummy-siblings``). All ``2^A`` leaves of each FORS tree
       are absorbed in the fixed order ``[base, base + 2^A)``. The leaf
       secret and the auth-path siblings are extracted branchlessly via
       ``silentops::ct_copy`` / ``silentops::ct_eq_u32``. The PRF address
       sequence becomes ``idx``-independent, which closes the per-bit
       template oracle of :cite:`kannwischer2018_dpa_xmss_sphincs`.
       Output bytes are unchanged. Measured KAT wall-time is ~1.7× the
       default.
   * - Fault on digest → FORS indices
     - implemented (``sca-fors-indices-check``)
     - Recompute-and-compare check at the tail of ``fors_sign_into``
       (``T1-E``, opt-in feature ``sca-fors-indices-check``). The index
       vector is re-derived from ``md`` and CT-compared via
       ``silentops::ct_eq`` to the vector consumed during signing. On
       mismatch the function returns ``Err(SlhDsaError::FaultDetected)``
       before the hypertree step runs. Output bytes are unchanged and
       the cost is negligible.
   * - SPA on hypertree walk / memory-stack SPA on FORS
     - implemented (tier 2 RAM)
     - Iterative BDS treehash (``fors_node``) with data-independent
       stack depth. Streaming signature emission avoids heap-allocator
       side channels.
   * - Software / remote timing
     - implemented
     - No secret-dependent early exit in the public signing path, and
       all intermediate comparisons use ``silentops::ct_eq``. The
       branches ctgrind still flags during signing are on values that
       are either part of the emitted signature (``R``) or recomputable
       from it, the message and the public key (``digest``, indices).
       The interim suppression is documented under
       :doc:`../verification`, and its removal is scheduled via ``T2-D``.
   * - Template attacks on WOTS+ chain values
     - implemented
     - Same ``ct_*`` routing plus fixed-length chain iteration
       (``chain_iter`` executes a constant number of F-hashes per chain).

Memory / stack-timing — iterative treehash + streaming signature
================================================================

Principle
---------

The FORS treehash, if written recursively, allocates ~256 KiB of stack
in the worst parameter set. A recursive trace exposes a memory-access
envelope that matches the tree geometry and indirectly leaks the FORS
digit. The iterative variant keeps a BDS-style stack of ``z + 1`` nodes
(~448 B) for a subtree of height ``z``. It walks the tree with a loop
counter that is independent of the secret.

Streaming signature emission complements this. The final signature is
allocated once at the top level, and sub-slices are passed to
``fors_sign_into`` / ``ht_sign_into`` / ``xmss_sign_into`` /
``wots_sign_into``. No intermediate heap buffer is ever resized, so the
allocator state cannot leak intermediate component sizes.

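
The stack discipline of the iterative variant fits in a few lines. The
Rust sketch below is a toy model, not ``fors_node``. Nodes are ``u64``
values from a non-cryptographic mixer, and ``mix``, ``leaf``,
``parent``, ``treehash_recursive`` and ``treehash_iterative`` are
hypothetical stand-ins. The sketch checks the iterative root against a
recursive reference and records the largest stack size reached for a
height-``z`` subtree.

.. code-block:: rust

   /// Toy 64-bit mixer (splitmix64 finaliser) standing in for the tweakable
   /// hashes F and H. NOT cryptographic: it only makes node values distinct.
   fn mix(x: u64) -> u64 {
       let mut z = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
       z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
       z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
       z ^ (z >> 31)
   }

   /// Hypothetical leaf value for absolute leaf position `pos`.
   fn leaf(pos: u64) -> u64 {
       mix(pos)
   }

   /// Hypothetical order-sensitive parent hash.
   fn parent(left: u64, right: u64) -> u64 {
       mix(left ^ mix(right).rotate_left(17))
   }

   /// Recursive reference: root of the height-`z` subtree starting at `start`.
   fn treehash_recursive(start: u64, z: u32) -> u64 {
       if z == 0 {
           return leaf(start);
       }
       let half = 1u64 << (z - 1);
       parent(treehash_recursive(start, z - 1), treehash_recursive(start + half, z - 1))
   }

   /// Iterative treehash: one pass over the 2^z leaves with an explicit stack
   /// of (node, height) pairs. Control flow depends only on `z` and on the
   /// public leaf counter `k`. Returns (root, largest stack length observed).
   fn treehash_iterative(start: u64, z: u32) -> (u64, usize) {
       let mut stack: Vec<(u64, u32)> = Vec::with_capacity(z as usize + 1);
       let mut max_len = 0usize;
       for k in 0..(1u64 << z) {
           let mut node = leaf(start + k);
           let mut height = 0u32;
           // Merge with the stack top while it has the same height.
           while let Some(&(top, top_h)) = stack.last() {
               if top_h != height {
                   break;
               }
               stack.pop();
               node = parent(top, node);
               height += 1;
           }
           stack.push((node, height));
           max_len = max_len.max(stack.len());
       }
       (stack[0].0, max_len)
   }

After leaf ``k`` the stack holds one node per set bit of ``k + 1``,
because two nodes are merged as soon as they reach the same height. The
stack therefore stays within ``z + 1`` nodes, and its size depends only
on the public leaf counter.
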

Published basis
---------------

* :cite:`bos2022_dilithium_memory_constrained`: the memory-constrained
  methodology used as inspiration. It was written for Dilithium, the
  precursor of ML-DSA, but transfers to SLH-DSA.
* :cite:`eprint2025_sphincslet`: a compact SLH-DSA variant whose
  engineering choices match our streaming approach.

Code pointers
-------------

.. list-table::
   :header-rows: 1
   :widths: 50 50

   * - Item
     - Location
   * - Iterative FORS treehash
     - ``quantica/src/slh_dsa/fors.rs`` ``fors_node``
   * - Streaming sign entry point
     - ``quantica/src/slh_dsa/slh.rs`` ``slh_sign_internal`` (passes
       sub-slices of the output buffer to each layer's ``*_sign_into``
       function).
   * - Per-layer streaming variants
     - ``quantica/src/slh_dsa/wots.rs`` ``wots_sign_into``;
       ``quantica/src/slh_dsa/xmss.rs`` ``xmss_sign_into``;
       ``quantica/src/slh_dsa/hypertree.rs`` ``ht_sign_into``;
       ``quantica/src/slh_dsa/fors.rs`` ``fors_sign_into``

Timing — no secret-dependent branches in the public path
========================================================

Principle
---------

The SLH-DSA public-signing path (``slh_sign_internal``) is
deterministic modulo the randomizer ``R`` and does not take a
secret-dependent early exit. Three inner functions do contain
conditional branches during signing: ``fors::fors_pk_from_sig``,
``wots::chain_iter`` and ``xmss::xmss_pk_from_sig``. They branch on
``(md, idx_tree, idx_leaf, digits)``, which are derived from ``R``, the
message and the public key (``PK.seed``, ``PK.root``). ``R`` is the
first ``n`` bytes of the emitted signature. Once the signature is
transmitted, an observer can therefore recompute these values, and
leaking them via timing is information-theoretically equivalent to
reading the signature.

This is formally documented as the SLH-DSA block of the ctgrind
suppression file (:doc:`../verification` has the full threat-model
paragraph). The suppression is scheduled to be closed by item ``T2-D``
below.

Code pointers
-------------

.. list-table::
   :header-rows: 1
   :widths: 50 50

   * - Item
     - Location
   * - Signing entry + component layout
     - ``quantica/src/slh_dsa/slh.rs`` ``slh_sign_internal``
   * - Constant-time helpers used by verify
     - ``quantica/src/slh_dsa/slh.rs`` (``slh_verify_internal`` uses
       ``silentops::ct_eq`` for the final PK equality check).

DFA / fault injection — current posture
=======================================

SLH-DSA has no rejection sampling and no double representation of
intermediates, so there is no built-in consistency check that fault
detection could reuse. The default build therefore does **not**
include a DFA hardening layer. This is known to be the dominant
residual risk for hash-based signatures. Single-fault universal forgery
has been the canonical attack class since
:cite:`castelnovi2018_grafting_trees`, with a practical voltage-glitch
realisation in :cite:`genet2018_practical_fault_sphincs`. More
recently, :cite:`adiletta2025_slashdsa_rowhammer` gave a purely
software Rowhammer realisation. That result removes the "needs a lab"
argument that previously justified deferring this layer.

The fault-hardening items now ship as opt-in features: ``T1-C`` (the
canonical recompute-and-compare redundancy), its CT prerequisite
``T1-F``, and ``T1-E`` (digest → FORS-indices integrity check). All
three are described below.

Hardening items: shipped and planned
====================================

Items marked **shipped** are implemented behind the opt-in cargo
features named in each entry. The remaining items are planned for the
next hardening round. For planned items the signatures are given as
rustdoc sketches ahead of implementation. Their bodies are deliberately
left out so that the API surface can be reviewed before implementation
starts.

T1-C — FORS signature redundancy — **shipped**
----------------------------------------------

**Addresses:** grafting-tree universal forgery
(:cite:`castelnovi2018_grafting_trees`,
:cite:`genet2018_practical_fault_sphincs`,
:cite:`adiletta2025_slashdsa_rowhammer`).

Canonical recommendation of :cite:`genet2023_protecting_sphincs_faults`:
sign the FORS component twice, compare the results in constant time,
and abort on divergence before the signature can leave the device.

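
The following is a minimal Rust sketch of the recompute-and-compare
pattern with an abort posture. The names ``sign_redundant`` and
``SignError`` and the local ``ct_eq`` are hypothetical and are not the
``quantica`` / ``silentops`` API. The ``sign`` closure stands in for one
FORS signing run. If the two runs diverge, the caller's buffer is left
untouched.

.. code-block:: rust

   /// Hypothetical stand-in for `SlhDsaError::FaultDetected`.
   #[derive(Debug, PartialEq, Eq)]
   pub enum SignError {
       FaultDetected,
   }

   /// Byte-wise equality without an early exit on the first difference.
   /// Only the (public) lengths can short-circuit.
   fn ct_eq(a: &[u8], b: &[u8]) -> bool {
       if a.len() != b.len() {
           return false;
       }
       let mut diff = 0u8;
       for (x, y) in a.iter().zip(b) {
           diff |= x ^ y;
       }
       // `black_box` discourages the optimiser from reintroducing an early exit.
       std::hint::black_box(diff) == 0
   }

   /// Recompute-and-compare: run `sign` twice into private scratch buffers
   /// and copy the result into `out` only when both runs agree byte for byte.
   pub fn sign_redundant<F>(mut sign: F, out: &mut [u8]) -> Result<(), SignError>
   where
       F: FnMut(&mut [u8]),
   {
       let mut first = vec![0u8; out.len()];
       let mut second = vec![0u8; out.len()];
       sign(first.as_mut_slice());
       sign(second.as_mut_slice());
       let agree = ct_eq(&first, &second);
       if agree {
           out.copy_from_slice(&first);
       }
       // Best-effort wipe; production code should use a zeroize-on-drop type.
       first.fill(0);
       second.fill(0);
       if agree {
           Ok(())
       } else {
           Err(SignError::FaultDetected)
       }
   }

A closure that flips one output byte on its second invocation models a
single transient fault. In that case ``sign_redundant`` returns
``Err(SignError::FaultDetected)`` and ``out`` keeps its previous
contents.
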

**Implementation:** ``fors::fors_sign_into_redundant`` in
``quantica/src/slh_dsa/fors.rs``, gated by the ``sca-fors-redundancy``
cargo feature. The routine signs FORS twice into independent
heap-backed ``SecretBytes`` scratch buffers. It derives the FORS public
key from each signature via the constant-time ``fors_pk_from_sig``
(``T1-F``), then compares **both signatures and both derived public
keys** under ``silentops::ct_eq``. On any mismatch it returns
``Err(SlhDsaError::FaultDetected)`` *without* writing anything into the
caller's signature buffer, so the faulted signature never propagates.
On a clean run it copies the validated signature into ``out`` and
returns the FORS pk. The caller (``slh::slh_sign_internal_redundant``)
feeds that pk straight into the hypertree signer.

.. code-block:: rust

   /// Recompute-and-compare FORS signing (T1-C). Returns the validated
   /// FORS public key, or `Err(FaultDetected)` on a single-fault attack
   /// against the FORS hash chain.
   pub fn fors_sign_into_redundant(
       md: &[u8],
       sk_seed: &[u8],
       pk_seed: &[u8],
       adrs_template: &Adrs,
       out: &mut [u8],
   ) -> Result<Vec<u8>, SlhDsaError>;

Comparing both surfaces (signature bytes *and* derived pk) is
defence-in-depth. A fault that corrupts auth-path bytes might
round-trip to the same FORS root under the verifier path, and the
byte-level ``ct_eq`` catches that case. Symmetrically, the pk
``ct_eq`` catches a fault inside the second ``fors_pk_from_sig``
derivation. The second comparison costs one extra ``ct_eq``, paid only
on the slow path that already runs the FORS signer twice.

**Abort posture.** Unlike ML-KEM's double-decaps + branchless
fault-fallback (:doc:`ml_kem`), this routine aborts rather than
substituting a fault-derived value. The asymmetry is deliberate. A KEM
must always return a shared secret. A signer that detects a fault
must, per :cite:`genet2023_protecting_sphincs_faults`, refuse to emit
so that the faulted signature does not propagate.

**Dispatch.** The public ``SlhDsa::<P>::sign`` switches between the
redundant path (``slh::slh_sign_internal_redundant``) and the
historical non-redundant path (``slh::slh_sign_internal``) at compile
time via ``#[cfg(feature = "sca-fors-redundancy")]``. The non-redundant
path stays publicly re-exported as the CAVP / KAT deterministic entry
point.

**Validation.** Three module tests in ``fors.rs``:

* ``fors_sign_into_redundant_matches_reference_shake128s`` /
  ``…_shake128f``: drive the redundant path on multiple
  ``seed × message`` permutations and assert two things. First, the
  validated signature is byte-identical to the non-redundant
  ``fors_sign_into`` output. Second, the returned FORS pk matches the
  standalone ``fors_pk_from_sig`` derivation from the produced
  signature.
* ``fors_redundancy_compare_detects_divergence``: exercises the
  internal ``fors_redundancy_compare`` helper with synthetically
  divergent buffers (signature mismatch, pk mismatch, both). It asserts
  that each case surfaces ``Err(FaultDetected)`` and that the all-equal
  case surfaces ``Ok``. This validates the abort logic without
  injecting a real fault into the FORS signer.

**Cost.** One extra ``fors_sign_into`` (roughly the FORS signing time
again), plus two ``fors_pk_from_sig`` derivations and two
``silentops::ct_eq`` checks. The bulk is the second signing, which
mirrors the double-decaps posture of ML-KEM in spirit.

**Memory.** Each ``SecretBytes`` scratch buffer holds
``fors_sig_len = K * (1 + A) * N`` bytes: 10 560 B ≈ 10.3 KiB for
SHAKE-256s, 11 200 B ≈ 10.9 KiB for SHAKE-256f, and 3 696 B ≈ 3.6 KiB
for SHAKE-128f. The buffers are heap-allocated so the M0 baseline stack
budget stays honest. They are zeroized on drop on both the success path
and the abort path.

T1-D — full-tree streaming FORS sign — **shipped**
--------------------------------------------------

**Addresses:** template attack on FORS sibling PRF addresses
(:cite:`kannwischer2018_dpa_xmss_sphincs`).
In the FIPS-205 default path, level ``j`` of the authentication-path
loop calls ``fors_node`` on a subtree that starts at leaf
``base + s * 2^j``, where ``s = floor(idx / 2^j) XOR 1``. In other
words, ``s`` is the upper ``(A - j)`` bits of the secret FORS digit
``idx`` with the lowest of those bits flipped. The set of addresses
absorbed by Keccak across ``j ∈ [0, A)`` therefore reveals ``idx`` bit
by bit to a template attacker.

**Implementation:** the per-FORS-tree inner loop of
``fors::fors_sign_into`` (gated by the ``sca-fors-dummy-siblings``
cargo feature) is replaced by a single **BDS-style full-tree streaming
traversal**:

1. Iterate ``k`` from ``0`` to ``2^A - 1`` in fixed order.
2. For each leaf at position ``leaf_idx = base + k``:

   * Generate the leaf secret via ``fors_sk_gen``. This absorbs the
     ``idx``-independent address ``set_tree_index(leaf_idx)``.
   * Branchlessly save the leaf secret into the signature's "leaf
     secret" slot if ``k == idx``, via ``silentops::ct_copy`` guarded
     by ``silentops::ct_eq_u32``.
   * Hash the leaf via ``f_hash``. Branchlessly save the result to
     ``auth_path[0]`` if ``k == (idx XOR 1)``, then push the height-0
     node onto a BDS stack.
   * Iteratively merge same-height stack tops via ``hash_h``. Each merge
     absorbs the ``idx``-independent ``set_tree_index(absolute_pos)``,
     where ``absolute_pos`` depends only on the FORS-tree index ``i``
     and on ``k``.
   * At each merge to height ``h``, branchlessly save the resulting
     node to ``auth_path[h]`` if ``(k >> h) == ((idx >> h) XOR 1)``.

After all ``2^A`` leaves have been streamed, the BDS stack contains
exactly one node. That node is the FORS root, and it is discarded
because the caller re-derives it via ``fors_pk_from_sig``. Both the
leaf secret and the ``A`` auth-path siblings are populated in the
signature slot.

**Signature stays unchanged.** The output bytes are byte-identical to
the FIPS-205 default path on every input. This was KAT-verified across
all six SHAKE parameter sets, with and without ``sca-fors-redundancy``
composed.

.. code-block:: rust

   // `fors_sign_into` under `sca-fors-dummy-siblings` (sketch: `P`, `base`,
   // `idx`, `adrs`, `leaf_slot`, `auth_slot` and `stack` come from the
   // enclosing function).
   for k in 0..(1u32 << P::A) {
       let leaf_idx = base + k;
       // Leaf secret for every leaf, at an idx-independent address.
       let sk = fors_sk_gen::<P>(sk_seed, pk_seed, &mut adrs, leaf_idx);
       // Commit it to the leaf-secret slot only when k == idx, branch-free.
       silentops::ct_copy(leaf_slot, &sk, silentops::ct_eq_u32(k, idx));
       let mut node = hash::f_hash::<P>(pk_seed, &mut adrs, &sk);
       let mut height = 0u32;
       let mut local_pos = k;
       // Height-0 auth node: the sibling leaf idx ^ 1.
       silentops::ct_copy(
           &mut auth_slot[0..P::N],
           &node,
           silentops::ct_eq_u32(local_pos, idx ^ 1),
       );
       while let Some(&(_, top_h)) = stack.last() {
           if top_h != height {
               break;
           }
           // ... pop, merge via hash_h, bump `height`, shift `local_pos`,
           // and save auth_slot[height] branchlessly ...
       }
       stack.push((node, height));
   }

**What this kills.** The Keccak absorption sequence becomes a
deterministic function of the public FORS-tree index ``i`` only, and no
``idx``-dependent address ever reaches the PRF. This closes the template
oracle of :cite:`kannwischer2018_dpa_xmss_sphincs` for FORS signing. The
same reasoning applies to the leaf-secret PRF (``fors_sk_gen``). Its
address argument is likewise ``idx``-independent in the streamed path,
so those absorptions no longer reveal ``idx`` either. Recovering
``SK.seed`` itself by DPA on that PRF is a separate threat, tracked
under ``T4-B``.

**Cost.** Per FORS tree, the default path computes
``sum_{j=0..A-1} 2^j = 2^A - 1`` leaves and ``2^A - 1 - A`` internal
merges across the auth-path subtrees. The full-tree stream computes
``2^A`` leaves and ``2^A - 1`` internal merges, plus the branchless
``ct_copy`` selections. The hash count is therefore almost unchanged,
so hashing alone does not explain the measured overhead. KAT
wall-time (host x86_64) goes from ~80 s to ~135 s (~1.7×) under
``--features slh-dsa,sca-fors-dummy-siblings``.

**Memory.** Stack budget unchanged at ``O(A * N)`` for the BDS stack
(same as the existing iterative treehash in ``fors_node``,
``quantica/src/slh_dsa/fors.rs:62-118``). No new heap hot-spot.

**Historical correction.** An earlier draft of this section described
T1-D as "compute both possible siblings (``s = 0`` and ``s = 1``) at
fixed positions, select the right one branchlessly". That framing is
wrong. FIPS-205 Algorithm 16 has ``s = floor(idx / 2^j) XOR 1``, which
is multi-bit and takes values in ``[0, 2^(A-j))`` at level ``j``. At
``j = 0`` (the leaf level) the sibling sits at one of ``2^A``
``idx``-dependent positions, not at one of a fixed pair. A first
implementation along the "two-candidate" line silently produced
non-FIPS-compliant signatures (5/16 KAT vectors diverged).
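
The pitfall and the streaming fix can both be checked on a toy tree.
The following Rust sketch is a self-contained model under stated
assumptions, not ``fors_sign_into``. It uses one tree with leaves
starting at 0 and ``u64`` nodes from a non-cryptographic mixer. The
helpers ``mix``, ``leaf``, ``parent``, ``select`` and ``eq_u32`` are
hypothetical stand-ins for the tweakable hashes and ``silentops``.
``sibling_positions`` shows that the FIPS-205 sibling index
``(idx >> j) ^ 1`` is multi-bit. ``auth_path_streaming`` rebuilds the
same authentication path while visiting every leaf in a fixed order.

.. code-block:: rust

   /// Toy 64-bit mixer (splitmix64 finaliser) standing in for F and H.
   /// NOT cryptographic.
   fn mix(x: u64) -> u64 {
       let mut z = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
       z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
       z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
       z ^ (z >> 31)
   }

   fn leaf(pos: u32) -> u64 {
       mix(u64::from(pos))
   }

   fn parent(left: u64, right: u64) -> u64 {
       mix(left ^ mix(right).rotate_left(17))
   }

   /// Branchless select: `a` if `flag == 1`, `b` if `flag == 0`.
   fn select(flag: u64, a: u64, b: u64) -> u64 {
       let mask = flag.wrapping_neg(); // 0 or all ones
       (a & mask) | (b & !mask)
   }

   /// 1 if `x == y`, else 0, without a data-dependent branch.
   fn eq_u32(x: u32, y: u32) -> u64 {
       let d = u64::from(x ^ y); // d < 2^32, so d - 1 wraps only when d == 0
       d.wrapping_sub(1) >> 63
   }

   /// FIPS 205 sibling index at each level j: s = (idx >> j) ^ 1.
   fn sibling_positions(idx: u32, a: u32) -> Vec<u32> {
       (0..a).map(|j| (idx >> j) ^ 1).collect()
   }

   /// Node `i` at height `z` of a single tree whose leaves start at 0.
   fn node(i: u32, z: u32) -> u64 {
       if z == 0 {
           return leaf(i);
       }
       parent(node(2 * i, z - 1), node(2 * i + 1, z - 1))
   }

   /// Default path: one subtree per level, at an idx-dependent position.
   fn auth_path_fips(idx: u32, a: u32) -> Vec<u64> {
       sibling_positions(idx, a)
           .into_iter()
           .zip(0..a)
           .map(|(s, j)| node(s, j))
           .collect()
   }

   /// Full-tree streaming: every leaf in fixed order, auth nodes picked
   /// with branchless selects, so the visited positions ignore idx.
   fn auth_path_streaming(idx: u32, a: u32) -> Vec<u64> {
       assert!((1..=20).contains(&a), "sketch assumes 1 <= a <= 20");
       let mut auth = vec![0u64; a as usize];
       let mut stack: Vec<(u64, u32)> = Vec::with_capacity(a as usize + 1);
       for k in 0..(1u32 << a) {
           let mut cur = leaf(k);
           let mut h = 0u32;
           auth[0] = select(eq_u32(k, idx ^ 1), cur, auth[0]);
           while let Some(&(top, top_h)) = stack.last() {
               if top_h != h {
                   break;
               }
               stack.pop();
               cur = parent(top, cur);
               h += 1;
               // `cur` is node k >> h at height h; the root (h == a) is skipped.
               if h < a {
                   let flag = eq_u32(k >> h, (idx >> h) ^ 1);
                   auth[h as usize] = select(flag, cur, auth[h as usize]);
               }
           }
           stack.push((cur, h));
       }
       auth
   }

For example, ``sibling_positions(5, 4)`` returns ``[4, 3, 0, 1]``, so at
``j = 0`` the sibling is leaf 4 and not one of a fixed pair. Comparing
the two auth-path functions exhaustively for small ``A`` is a cheap way
to catch the kind of divergence the two-candidate draft produced,
before running the KAT suite.
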
The full-tree streaming traversal documented above is the mechanism
adopted here to obtain an ``idx``-independent address sequence at the
same asymptotic cost.

**Validation.** The end-to-end KAT run
(``cargo test --release -p quantica --test slh_dsa_kat --features slh-dsa,sca-fors-dummy-siblings``)
shows 16/16 vectors byte-identical to the default path. The lib tests
(``cargo test --release -p quantica --lib --features sca-fors-dummy-siblings``)
are 5/5 green. Composition with ``sca-fors-redundancy`` is also green
(``--features sca-fors-dummy-siblings,sca-fors-redundancy``). The design
aligns with the SLotH threshold-implementation posture
(:cite:`saarinen2024_sloth_slhdsa`).

**Out of scope.** Extending full-tree streaming to the WOTS+ chains
inside the hypertree. The same template-oracle reasoning applies, but
the leak surface is smaller. This is tracked as a Tier-4 candidate.

T1-E — digest → indices integrity check — **shipped**
------------------------------------------------------

**Addresses:** single-fault attack forcing one of the FORS indices to a
controlled value (zero-index variant of
:cite:`castelnovi2018_grafting_trees`). The corruption reveals
``PRF(SK.seed, addr_0)`` cleanly. Even with ``T1-D`` (full-tree
streaming) shipped, a fault could still redirect the leaf-secret commit
to a faulted position before the streaming traversal starts. Such a
fault could land in the upstream ``message_to_indices`` derivation or
in the digest extraction itself.

**Implementation:** at the tail of ``fors::fors_sign_into``, the FORS
index vector is **re-derived** from the same ``md`` slice and
**CT-compared** to the vector consumed during signing. The check is
gated by the ``sca-fors-indices-check`` cargo feature. On a mismatch,
``fors_sign_into`` returns ``Err(SlhDsaError::FaultDetected)`` and the
``slh_sign_internal`` caller propagates it via ``?``. The
hypertree-signing step never runs, so the faulted FORS sub-signature is
never wrapped into a full signature emitted to the host.

.. code-block:: rust

   // `P: Params` is the parameter-set type parameter (bound name assumed).
   pub(crate) fn fors_indices_consistency_check<P: Params>(
       md: &[u8],
       used: &[u32],
   ) -> Result<(), SlhDsaError> {
       // Fresh derivation from the same digest slice.
       let recomputed = message_to_indices::<P>(md);
       // Lengths are public (always K), so this early exit leaks nothing secret.
       if recomputed.len() != used.len() {
           return Err(SlhDsaError::FaultDetected);
       }
       // Serialise both index vectors so one byte-wise CT comparison covers them.
       let used_b: Vec<u8> = used.iter().flat_map(|x| x.to_le_bytes()).collect();
       let rec_b: Vec<u8> = recomputed.iter().flat_map(|x| x.to_le_bytes()).collect();
       if silentops::ct_eq(&used_b, &rec_b) != 1 {
           return Err(SlhDsaError::FaultDetected);
       }
       Ok(())
   }

The fresh derivation runs on the **same** ``md`` slice. A fault that
lands persistently on ``md`` itself (for example, Rowhammer on the
stack region holding the digest) therefore passes the check. That
threat belongs to the redundant-signing class that ``T1-C`` already
covers, since two independent FORS signings see different intermediate
state. ``T1-E`` specifically catches transient faults in the
``base_2b`` bit extraction, or in the index-vector storage between
production and consumption.

**Cost.** One extra ``message_to_indices`` (that is, one ``base_2b``)
per FORS signature. This is negligible byte-shuffling with no hashing,
processing ``ceil(K * A / 8)`` bytes. The two ``Vec<u8>``
serialisations for ``silentops::ct_eq`` allocate ``4 * K`` bytes each
and are freed at function return, well under any M0-baseline budget.

**Composition.** Orthogonal to ``T1-C``, which compares two independent
FORS signings to catch in-FORS faults. Also orthogonal to ``T1-D``,
which closes the template oracle on Keccak addresses. Under
``--features sca-fors-redundancy``, ``T1-C``'s
``fors_sign_into_redundant`` calls ``fors_sign_into`` twice. Each call
independently runs the ``T1-E`` check when that feature is also
enabled. KAT determinism is preserved in every combination:
``sca-fors-indices-check`` on its own, combined with ``T1-D``, and
combined with ``T1-D + T1-C``.

**Validation.** Lib tests ``fors_indices_check_accepts_correct_shake128s``
/ ``…_shake128f`` exercise the positive path on multiple seed
permutations. ``fors_indices_check_rejects_flipped_index`` drives the
helper with synthetically corrupted index vectors (one bit flipped, and
a length mismatch) and asserts ``FaultDetected`` in each case.
End-to-end determinism was checked with the KAT run
``cargo test --release -p quantica --test slh_dsa_kat --features slh-dsa,sca-fors-indices-check``.
All 16/16 vectors are byte-identical to the default path. Wall-time is
~85 s against ~80 s for the default build. The extra time is spent in
the integrity check, and the signing itself is unchanged.

T4-B — PRF masking
------------------

**Addresses:** DPA on ``SK.seed`` through the FORS / WOTS+ leaf PRF
(:cite:`kannwischer2018_dpa_xmss_sphincs`). The baseline construction
is :cite:`fluhrer2024_sca_resistant_sphincs` (3-share SHAKE).
:cite:`saarinen2024_sloth_slhdsa` documents a hardware-side
alternative.

**Planned API** (transparent wrapper over the existing ``hash::prf``):

.. code-block:: rust

   /// 3-share masked PRF. Emits the same byte string as
   /// `hash::prf` but keeps `sk_seed` split into shares through
   /// every SHAKE-absorb step, per Fluhrer's construction.
   #[cfg(feature = "sca-masked-prf")]
   pub fn prf_masked(
       pk_seed: &[u8],
       sk_seed_s: &MaskedSeed, // SK.seed in shared (masked) form
       adrs: &Adrs,
   ) -> Vec<u8>;

Estimated cost is roughly 1.7× per signature. The feature stays gated
behind an opt-in flag until SHAKE masking lands in ``silentops``.

T1-F — constant-time ``fors_pk_from_sig`` — **shipped**
-------------------------------------------------------

**Addresses:** the secret-dependent branch
``if ((idx >> j) & 1) == 0 { ... } else { ... }`` inside the original
FIPS-205 Algorithm 17. On the verifier side, the branch is on public
data. When the same routine is reused under ``T1-C`` as part of the
signing-side redundancy check, however, its input becomes secret, and
a Rust ``if`` would re-introduce a timing leak.

**Implementation:** ``fors::fors_pk_from_sig`` in
``quantica/src/slh_dsa/fors.rs`` was reworked into a single
constant-time routine. At every authentication-path level, a byte-wise
``silentops::ct_select_u8`` cswap replaces the original branch. The
cswap materialises the ``(left, right)`` ``hash_h`` inputs into two
``N``-byte stack buffers, and the routine then calls
``hash_h(left, right)`` unconditionally.
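
The cswap can be sketched in plain Rust. ``select_u8`` and
``ordered_inputs`` are hypothetical helpers, not the ``silentops`` API.
The sketch only shows how a mask derived from the index bit orders the
two ``hash_h`` inputs as in FIPS-205 Algorithm 17, where bit 0 gives
``node || auth`` and bit 1 gives ``auth || node``.

.. code-block:: rust

   /// Branchless byte select: `a` when `choice == 1`, `b` when `choice == 0`.
   fn select_u8(choice: u8, a: u8, b: u8) -> u8 {
       let mask = choice.wrapping_neg(); // 0x00 or 0xFF
       (a & mask) | (b & !mask)
   }

   /// Fill the two `hash_h` inputs for one level of FIPS 205 Algorithm 17
   /// without branching on `bit` (bit = (idx >> j) & 1):
   /// bit == 0 -> (node, auth), bit == 1 -> (auth, node).
   fn ordered_inputs(bit: u8, node: &[u8], auth: &[u8], left: &mut [u8], right: &mut [u8]) {
       let n = node.len();
       // Lengths are public (always N), so this check leaks nothing secret.
       assert!(auth.len() == n && left.len() == n && right.len() == n);
       for i in 0..n {
           left[i] = select_u8(bit, auth[i], node[i]);
           right[i] = select_u8(bit, node[i], auth[i]);
       }
   }

The branchless form only holds at the source level. An optimising
compiler may still turn such selects into branches, which is why the
production routine goes through the dedicated
``silentops::ct_select_u8`` helper rather than open-coded masks.
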
The ``tree_index`` written into ``adrs`` is the same in both original
branches, so it needs no extra masking. For an odd tree index ``t``,
``(t - 1) / 2`` equals ``t / 2`` under integer division. Scratch
buffers are wiped with ``silentops::ct_zeroize`` at the end of the
routine.

.. code-block:: rust

   /// Constant-time FORS pk-from-sig (FIPS-205 Alg. 17). The
   /// secret-dependent `hash_h` argument ordering is resolved by
   /// a branchless `silentops::ct_select_u8` cswap. Single routine,
   /// used by both the standalone verifier and the T1-C signing-side
   /// redundancy check (where `idx` is secret).
   pub fn fors_pk_from_sig(
       sig_fors: &[u8],
       md: &[u8],
       pk_seed: &[u8],
       adrs: &mut Adrs,
   ) -> Vec<u8>;

A previous variable-time sibling has been removed. Keeping a single CT
implementation eliminates the foot-gun of a future call site picking a
leaky variant by autocomplete.

**Validation:** two round-trip tests in ``quantica/src/slh_dsa/fors.rs``
(``fors_pk_from_sig_round_trip_shake128s`` and ``…_shake128f``)
exercise the sign → pk-from-sig pipeline across multiple seed / message
permutations. They assert that two back-to-back derivations agree
(determinism) and produce ``N``-byte outputs. The KAT suite
``quantica/tests/slh_dsa_kat.rs`` covers end-to-end correctness against
the FIPS-205 reference output.

**Cost.** ``2 * N`` bytes of scratch (32 B for SHAKE-128, 64 B for
SHAKE-256) plus ``2 * N * A`` ``ct_select_u8`` calls per FORS tree per
signature. This is negligible compared to the underlying SHAKE work.

T2-D — explicit unpoison of ``R``, ``digest``, indices
------------------------------------------------------

Explicitly unpoison ``R``, the digest and the FORS indices for ctgrind
at the point where they reach the "publish-ready" state. The tool then
accepts the branches inside ``fors::fors_pk_from_sig``,
``wots::chain_iter`` and ``xmss::xmss_sign_into`` / ``xmss_pk_from_sig``
programmatically instead of through suppressions. This closes the four
suppressions listed in ``tools/ctgrind.supp`` and is zero-cost on
production builds.