###################################################################
SLH-DSA — countermeasures
###################################################################

:FIPS spec:   :cite:`fips205`
:Crate path:  ``quantica::slh_dsa``
:Cargo feature:  ``slh-dsa`` (on by default).

SLH-DSA (SPHINCS+) is hash-based: it has no rejection sampling, no
secret polynomial arithmetic, no NTT — only a large tree of SHAKE /
SHA2 calls. That removes most of the classical PQC side-channel
concerns and concentrates the remaining risks on three well-studied
attack surfaces:

* **Fault injection anywhere in the hash-tree construction**.
  Corrupting any intermediate hash — whether FORS, WOTS+, or an
  XMSS authentication node — produces a signature that verifies
  under a different subtree root; collecting a handful of faulted
  signatures yields a universal forgery
  (:cite:`castelnovi2018_grafting_trees`). Practical voltage-glitch
  realisation is documented in
  :cite:`genet2018_practical_fault_sphincs`, and recent work
  (:cite:`adiletta2025_slashdsa_rowhammer`) extends the threat to a
  purely software attacker via Rowhammer — making **fault
  redundancy** the central hardening axis for hash-based
  signatures.
* **DPA on the PRF that expands ``SK.seed``** into WOTS+ and FORS
  leaf secrets. The same seed is reused across every leaf of every
  tree, so a DPA attacker accumulates arbitrarily many traces on a
  controllable input
  (:cite:`kannwischer2018_dpa_xmss_sphincs`). Countermeasure
  directions: masked PRF (:cite:`fluhrer2024_sca_resistant_sphincs`)
  or a threshold-implementation Keccak core
  (:cite:`saarinen2024_sloth_slhdsa`).
* **Template / SPA on FORS index extraction and sibling ordering**.
  The FORS digit (read from the message digest, which depends on the
  randomizer ``R = PRF_msg(SK.prf, opt_rand, M)``) selects which
  sibling subtrees are computed. Template matching on the PRF
  absorption patterns can recover the digit bit by bit (discussion in
  :cite:`kannwischer2018_dpa_xmss_sphincs`, SoK update in
  :cite:`dobias2025_sok_pqc_sca`).

This chapter lists what is implemented today and what is scheduled
for the next hardening round. Threat classes: :doc:`../threat_model`.

.. contents::
   :local:
   :depth: 2

Coverage matrix
===============

.. list-table:: SLH-DSA countermeasure / threat matrix
   :header-rows: 1
   :widths: 25 18 57

   * - Threat
     - Status
     - Countermeasure(s)
   * - Fault on FORS / WOTS+ / XMSS (grafting-tree forgery)
     - implemented (``sca-fors-redundancy``)
     - Recompute-and-compare redundancy on the FORS signature path
       (``T1-C``, opt-in feature ``sca-fors-redundancy``), aligned
       with :cite:`genet2023_protecting_sphincs_faults`. Addresses
       both physical fault injection
       (:cite:`castelnovi2018_grafting_trees`,
       :cite:`genet2018_practical_fault_sphincs`) and software
       Rowhammer (:cite:`adiletta2025_slashdsa_rowhammer`). Consumes
       the constant-time ``fors_pk_from_sig`` (``T1-F``).
   * - DPA on the master PRF (``SK.seed`` → leaf secrets)
     - **planned** (tier 4, ``T4-B``)
     - First-order masking of the PRF call that derives WOTS+ and
       FORS leaf secrets, following the 3-share SHAKE posture of
       :cite:`fluhrer2024_sca_resistant_sphincs`; long-term
       alternative is a TI Keccak core
       (:cite:`saarinen2024_sloth_slhdsa`).
   * - SPA / template on FORS sibling PRF addresses (leaks idx bits)
     - implemented (``sca-fors-dummy-siblings``)
     - Full-tree streaming FORS sign (``T1-D``, opt-in feature
       ``sca-fors-dummy-siblings``): all ``2^A`` leaves of each
       FORS tree are absorbed in fixed order ``[base, base + 2^A)``,
       and the leaf secret and auth-path siblings are extracted
       branchlessly via ``silentops::ct_copy`` /
       ``silentops::ct_eq_u32``. The PRF address sequence becomes
       idx-independent, closing the per-bit template oracle of
       :cite:`kannwischer2018_dpa_xmss_sphincs`. Output bytes are
       unchanged; measured KAT wall time rises ~1.7×.
   * - Fault on digest → FORS indices
     - implemented (``sca-fors-indices-check``)
     - Recompute-and-compare check at the tail of
       ``fors_sign_into`` (``T1-E``, opt-in feature
       ``sca-fors-indices-check``). The index vector is re-derived
       from ``md`` and CT-compared via ``silentops::ct_eq`` to the
       vector consumed during signing; on mismatch it returns
       ``Err(SlhDsaError::FaultDetected)`` before the hypertree
       step runs. Output bytes unchanged; cost negligible.
   * - SPA on hypertree walk / memory-stack SPA on FORS
     - implemented (tier 2 RAM)
     - Iterative BDS treehash (``fors_node``) with data-independent
       stack depth; streaming signature emission avoids heap
       allocator side-channels.
   * - Software / remote timing
     - implemented
     - No secret-dependent early exit in the public signing path;
       all intermediate comparisons use ``silentops::ct_eq``. The
       branches ctgrind still flags during signing are on values
       that are part of, or publicly recomputable from, the emitted
       signature (``R``, ``digest``, indices). The interim suppression
       is documented under :doc:`../verification`; its removal is
       scheduled via ``T2-D``.
   * - Template attacks on WOTS+ chain values
     - implemented
     - Same ``ct_*`` routing + fixed-length chain iteration
       (``chain_iter`` executes a constant number of F-hashes per
       chain).

Memory / stack-timing — iterative treehash + streaming signature
================================================================

Principle
---------

The FORS treehash, if written recursively, allocates ~256 KiB of
stack in the worst parameter set. A recursive trace exposes a
memory-access envelope that matches the tree geometry and indirectly leaks
the FORS digit. The iterative variant keeps a BDS-style stack of
``z+1`` nodes (~448 B) and walks the tree with a loop counter that
is independent of the secret.

Streaming signature emission complements this: the final signature
is allocated once at the top level and sub-slices are passed to
``fors_sign_into`` / ``ht_sign_into`` / ``xmss_sign_into`` /
``wots_sign_into``. No intermediate heap buffer is ever resized, so
the allocator state cannot leak intermediate component sizes.
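
A minimal sketch of the iterative-treehash principle on a toy tree.
Everything here is hypothetical: nodes are ``u64`` values and
``toy_leaf`` / ``toy_h`` are non-cryptographic stand-ins for the
FIPS-205 ``F`` and ``H`` hashes, not the crate's ``fors_node``. It only
illustrates that the loop bounds depend on the tree height ``h`` alone
and that the stack stays bounded by ``h + 1`` entries.

.. code-block:: rust

   /// SplitMix64 finaliser, used here only as a cheap bit mixer.
   fn mix(mut z: u64) -> u64 {
       z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
       z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
       z ^ (z >> 31)
   }

   /// Toy, NON-cryptographic stand-ins for the FIPS-205 leaf (`F`) and
   /// two-to-one (`H`) hashes. Hypothetical; for illustration only.
   fn toy_leaf(pos: u64) -> u64 {
       mix(pos ^ 0x9E37_79B9_7F4A_7C15)
   }

   fn toy_h(left: u64, right: u64) -> u64 {
       mix(left.rotate_left(17) ^ right.wrapping_mul(3))
   }

   /// Recursive reference: root of the height-`h` subtree at node index `i`.
   fn node_recursive(i: u64, h: u32) -> u64 {
       if h == 0 {
           toy_leaf(i)
       } else {
           toy_h(node_recursive(2 * i, h - 1), node_recursive(2 * i + 1, h - 1))
       }
   }

   /// Iterative treehash: streams the 2^h leaves left to right and merges
   /// equal-height stack tops. The loop bounds depend only on `h`, and the
   /// stack never holds more than h + 1 entries. Returns (root, max depth).
   fn node_iterative(i: u64, h: u32) -> (u64, usize) {
       let mut stack: Vec<(u64, u32)> = Vec::with_capacity(h as usize + 1);
       let mut max_depth = 0usize;
       let first_leaf = i << h;
       for k in 0..(1u64 << h) {
           let mut node = toy_leaf(first_leaf + k);
           let mut height = 0u32;
           // Merge while the stack top is a left sibling of equal height.
           while let Some(&(top, top_h)) = stack.last() {
               if top_h != height {
                   break;
               }
               stack.pop();
               node = toy_h(top, node);
               height += 1;
           }
           stack.push((node, height));
           max_depth = max_depth.max(stack.len());
       }
       (stack[0].0, max_depth)
   }

After leaf ``k`` has been pushed, the stack holds one node per set bit
of ``k + 1``, which is where the logarithmic bound comes from; the
recursive and iterative variants produce the same root.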

Published basis
---------------

* :cite:`bos2022_dilithium_memory_constrained` — the
  memory-constrained methodology used as inspiration (originally for
  ML-DSA but transferable).
* :cite:`eprint2025_sphincslet` — compact SLH-DSA variant whose
  engineering choices match our streaming approach.

Code pointers
-------------

.. list-table::
   :header-rows: 1
   :widths: 50 50

   * - Item
     - Location
   * - Iterative FORS treehash
     - ``quantica/src/slh_dsa/fors.rs`` ``fors_node``
   * - Streaming sign entry point
     - ``quantica/src/slh_dsa/slh.rs`` ``slh_sign_internal`` (passes
       sub-slices of the output buffer to each layer's
       ``*_sign_into`` function).
   * - Per-layer streaming variants
     - ``quantica/src/slh_dsa/wots.rs`` ``wots_sign_into``
       ; ``quantica/src/slh_dsa/xmss.rs`` ``xmss_sign_into``
       ; ``quantica/src/slh_dsa/hypertree.rs`` ``ht_sign_into``
       ; ``quantica/src/slh_dsa/fors.rs`` ``fors_sign_into``

Timing — no secret-dependent branches in the public path
========================================================

Principle
---------

The SLH-DSA public-signing path (``slh_sign_internal``) is
deterministic modulo the randomizer ``R`` and takes no
secret-dependent early exit. The inner functions that contain
conditional branches during signing (``fors::fors_pk_from_sig``
before ``T1-F``, ``wots::chain_iter``, ``xmss::xmss_pk_from_sig``)
branch on ``(md, idx_tree, idx_leaf, digits)``, which are derived
from ``R``, the message and the public key. ``R`` is the first ``n``
bytes of the emitted signature; once it is transmitted, any observer
holding the message and the public key recomputes these values, so
leaking them via timing is information-theoretically equivalent to
reading the signature.

This is formally documented as the SLH-DSA block of the ctgrind
suppression file (:doc:`../verification` has the full threat-model
paragraph); the suppression is scheduled to be closed by item
``T2-D`` below.

Code pointers
-------------

.. list-table::
   :header-rows: 1
   :widths: 50 50

   * - Item
     - Location
   * - Signing entry + component layout
     - ``quantica/src/slh_dsa/slh.rs`` ``slh_sign_internal``
   * - Constant-time helpers used by verify
     - ``quantica/src/slh_dsa/slh.rs`` (``slh_verify_internal`` uses
       ``silentops::ct_eq`` for the final PK equality check).

DFA / fault injection — current posture
=======================================

SLH-DSA has no rejection sampling and no double representation of
intermediates, so the default build does **not** include a DFA
hardening layer; the fault countermeasures below are opt-in cargo
features. Fault injection is known to be the dominant residual risk
for hash-based signatures: a single-fault universal forgery has been
the canonical attack class since
:cite:`castelnovi2018_grafting_trees`, with a practical
voltage-glitch realisation in :cite:`genet2018_practical_fault_sphincs`
and, more recently, a purely software Rowhammer realisation in
:cite:`adiletta2025_slashdsa_rowhammer`. The latter removes the
"needs a lab" argument that previously justified deferring this
layer. ``T1-C`` (the canonical recompute-and-compare redundancy),
its CT prerequisite ``T1-F`` and ``T1-E`` (digest → FORS-indices
integrity check) have shipped; see below.

Planned hardening
==========================

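This section collects the hardening round item by item. ``T1-C``,
``T1-D`` and ``T1-E`` have shipped behind the opt-in cargo features
named in each entry, and ``T1-F`` has shipped unconditionally;
``T4-B`` and ``T2-D`` are still planned. For planned items the
signatures are given as rustdoc sketches ahead of implementation,
and code stubs are deliberately left out so that the API surface can
be reviewed before implementation starts.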

T1-C — FORS signature redundancy — **shipped**
----------------------------------------------

**Addresses:** grafting-tree universal forgery
(:cite:`castelnovi2018_grafting_trees`,
:cite:`genet2018_practical_fault_sphincs`,
:cite:`adiletta2025_slashdsa_rowhammer`). Canonical recommendation
of :cite:`genet2023_protecting_sphincs_faults`: sign the FORS
component twice, compare the results in constant time, abort on
divergence before the signature can leave the device.

**Implementation:** ``fors::fors_sign_into_redundant`` in
``quantica/src/slh_dsa/fors.rs``, gated by the ``sca-fors-redundancy``
cargo feature. The routine signs FORS twice into independent
heap-backed ``SecretBytes`` scratch buffers, derives the FORS public key
from each signature via the constant-time ``fors_pk_from_sig``
(``T1-F``), then compares **both signatures and both derived public
keys** under ``silentops::ct_eq``. On any mismatch it returns
``Err(SlhDsaError::FaultDetected)`` *without* writing anything into
the caller's signature buffer — the faulted signature never propagates.
On a clean run it copies the validated signature into ``out`` and
returns the FORS pk, which the caller (``slh::slh_sign_internal_redundant``)
feeds straight into the hypertree signer.

.. code-block:: rust

   /// Recompute-and-compare FORS signing (T1-C). Returns the validated
   /// FORS public key, or `Err(FaultDetected)` on a single-fault attack
   /// against the FORS hash chain.
   pub fn fors_sign_into_redundant<P: Params>(
       md:            &[u8],
       sk_seed:       &[u8],
       pk_seed:       &[u8],
       adrs_template: &Adrs,
       out:           &mut [u8],
   ) -> Result<Vec<u8>, SlhDsaError>;

Comparing both surfaces (signature bytes *and* derived pk) is
defence-in-depth: a fault that corrupts auth-path bytes might
round-trip to the same FORS root under the verifier path; the
byte-level ``ct_eq`` catches that case. Symmetrically, a fault inside the
second ``fors_pk_from_sig`` derivation is caught by the pk
``ct_eq``. Both checks together cost a single extra ``ct_eq`` and
are paid only on the slow path that already runs the FORS signer
twice.
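
The recompute-and-compare pattern itself is small. A minimal generic
sketch, assuming closures in place of the crate's FORS signer and
``fors_pk_from_sig``; ``FaultDetected``, ``ct_eq`` and
``sign_redundant`` are hypothetical names, and the plain ``fill(0)``
only stands in for the drop-zeroization that ``SecretBytes`` provides
(an optimiser may elide a plain fill).

.. code-block:: rust

   /// Hypothetical stand-in for `SlhDsaError::FaultDetected`.
   #[derive(Debug, PartialEq, Eq)]
   pub struct FaultDetected;

   /// Constant-time equality: ORs the XOR of every byte pair, so the
   /// running time depends only on the (public) lengths.
   fn ct_eq(a: &[u8], b: &[u8]) -> bool {
       if a.len() != b.len() {
           return false;
       }
       let mut acc = 0u8;
       for (x, y) in a.iter().zip(b) {
           acc |= x ^ y;
       }
       acc == 0
   }

   /// Sign twice into private scratch buffers, derive a public value from
   /// each, and copy into `out` only if both signatures AND both derived
   /// values agree. On mismatch `out` is left untouched.
   fn sign_redundant<S, D>(sign: S, derive_pk: D, out: &mut [u8]) -> Result<Vec<u8>, FaultDetected>
   where
       S: Fn(&mut [u8]),
       D: Fn(&[u8]) -> Vec<u8>,
   {
       let mut sig_a = vec![0u8; out.len()];
       let mut sig_b = vec![0u8; out.len()];
       sign(&mut sig_a);
       sign(&mut sig_b);
       let pk_a = derive_pk(&sig_a);
       let pk_b = derive_pk(&sig_b);
       // `&` (not `&&`): both comparisons always run.
       let ok = ct_eq(&sig_a, &sig_b) & ct_eq(&pk_a, &pk_b);
       let result = if ok {
           out.copy_from_slice(&sig_a);
           Ok(pk_a)
       } else {
           Err(FaultDetected)
       };
       // Wipe scratch on both the success and the abort path.
       sig_a.fill(0);
       sig_b.fill(0);
       result
   }

Branching on ``ok`` is acceptable here because it is a fault flag, not
secret data: the routine aborts either way.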

**Abort posture** — unlike ML-KEM's double-decaps + branchless
fault-fallback (:doc:`ml_kem`), this routine aborts rather than
substituting a fault-derived value. The asymmetry is deliberate: a
KEM must always return a shared secret, while a signer that detects
a fault must, per :cite:`genet2023_protecting_sphincs_faults`,
refuse to emit so the faulted signature does not propagate.

**Dispatch.** The public ``SlhDsa::<P>::sign`` switches between the
redundant path (``slh::slh_sign_internal_redundant``) and the
historic non-redundant path (``slh::slh_sign_internal``) at
compile time via ``#[cfg(feature = "sca-fors-redundancy")]``. The
non-redundant path stays publicly re-exported as the CAVP / KAT
deterministic entry point.

**Validation.** Three module tests in ``fors.rs``:

* ``fors_sign_into_redundant_matches_reference_shake128s`` /
  ``…_shake128f`` — drive the redundant path on multiple
  ``seed × message`` permutations and assert that (a) the validated
  signature is byte-identical to the non-redundant ``fors_sign_into``
  output, and (b) the returned FORS pk matches the standalone
  ``fors_pk_from_sig`` derivation from the produced signature.
* ``fors_redundancy_compare_detects_divergence`` — exercises the
  internal ``fors_redundancy_compare`` helper with synthetically
  divergent buffers (signature mismatch, pk mismatch, both) and
  asserts each surfaces ``Err(FaultDetected)``; the all-equal case
  surfaces ``Ok``. Lets us validate the abort logic without
  injecting a real fault into the FORS signer.

**Cost.** One extra ``fors_sign_into`` (~1× FORS signing time
again) plus two ``fors_pk_from_sig`` derivations and two
``silentops::ct_eq`` checks. The bulk is the second signing, which
mirrors the double-decaps posture of ML-KEM in spirit.

**Memory.** Each ``SecretBytes`` scratch buffer has length
``fors_sig_len = K * (1 + A) * N`` (10560 B, ~10.3 KiB, for
SHAKE-256s; 3696 B, ~3.6 KiB, for SHAKE-128f; at most 11200 B,
~10.9 KiB, for SHAKE-256f). The scratch is heap-allocated so the M0
baseline stack budget stays honest, and drop-zeroized on both the
success and the abort path.
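
The scratch sizes follow directly from ``K * (1 + A) * N``. A small
sketch tabulating the FIPS-205 Table 2 values of ``(n, k, a)`` for the
six parameter sets (the SHAKE and SHA2 instantiations share them); the
names ``PARAM_SETS`` and ``fors_sig_len`` are illustrative, not the
crate's.

.. code-block:: rust

   /// FIPS-205 Table 2 parameters (name, n, k, a).
   const PARAM_SETS: [(&str, usize, usize, usize); 6] = [
       ("128s", 16, 14, 12),
       ("128f", 16, 33, 6),
       ("192s", 24, 17, 14),
       ("192f", 24, 33, 8),
       ("256s", 32, 22, 14),
       ("256f", 32, 35, 9),
   ];

   /// FORS component length: k trees, each contributing one revealed leaf
   /// secret plus `a` authentication-path nodes, all n bytes long.
   fn fors_sig_len(n: usize, k: usize, a: usize) -> usize {
       k * (1 + a) * n
   }

Evaluated over ``PARAM_SETS`` this gives 2912, 3696, 6120, 7128, 10560
and 11200 bytes respectively, so one scratch buffer never exceeds
~11 KiB.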

T1-D — full-tree streaming FORS sign — **shipped**
--------------------------------------------------

**Addresses:** template attack on FORS sibling PRF addresses
(:cite:`kannwischer2018_dpa_xmss_sphincs`). In the FIPS-205 default
path, the authentication-path loop calls ``fors_node`` at height
``j`` on the sibling subtree whose first leaf is ``base + s * 2^j``,
where ``s = floor(idx / 2^j) XOR 1`` is the upper ``(A - j)`` bits
of the secret FORS digit ``idx`` with the lowest bit flipped. The
set of addresses absorbed by Keccak across ``j ∈ [0, A)`` therefore
reveals ``idx`` bit by bit to a template attacker.

**Implementation:** the per-FORS-tree inner loop of
``fors::fors_sign_into`` (gated by the ``sca-fors-dummy-siblings``
cargo feature) is replaced by a single **BDS-style full-tree
streaming traversal**:

1. Iterate ``k`` from ``0`` to ``2^A - 1`` in fixed order.
2. For each leaf at position ``leaf_idx = base + k``:

   * Generate the leaf secret via ``fors_sk_gen`` (absorbs the
     idx-independent address ``set_tree_index(leaf_idx)``).
   * Branchlessly save the leaf secret into the signature's
     "leaf secret" slot if ``k == idx``, via
     ``silentops::ct_copy`` guarded by ``silentops::ct_eq_u32``.
   * Hash the leaf via ``f_hash``; push the height-0 node onto
     a BDS stack.
   * Iteratively merge same-height stack tops via ``hash_h``
     (absorbs idx-independent ``set_tree_index(absolute_pos)``
     where ``absolute_pos`` depends only on ``i`` and ``k``).
   * At each merge to height ``h``, branchlessly save the
     resulting node to ``auth_path[h]`` if
     ``(k >> h) == ((idx >> h) XOR 1)``.

After streaming all ``2^A`` leaves, the BDS stack contains exactly
one node: the root of this FORS tree, which is discarded (the caller
re-derives it via ``fors_pk_from_sig``). Both the leaf secret and the
``A`` auth-path siblings are then populated in the signature slot.
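
The traversal above can be exercised end to end on a toy tree. The
sketch is hypothetical throughout: ``toy_prf`` / ``toy_f`` / ``toy_h``
are non-cryptographic stand-ins, nodes are ``u64`` rather than
``N``-byte strings, and ``ct_eq_mask`` / ``ct_copy`` are local helpers
playing the role of ``silentops::ct_eq_u32`` / ``silentops::ct_copy``
(whose real signatures may differ). It checks the streamed leaf
secret, auth path and root against a FIPS-205-style recursive
reference, and that the leaf-PRF address trace is the same for every
``idx``.

.. code-block:: rust

   /// SplitMix64 finaliser, used as a cheap bit mixer (not a hash).
   fn mix(mut z: u64) -> u64 {
       z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
       z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
       z ^ (z >> 31)
   }

   /// Hypothetical, NON-cryptographic stand-ins for the leaf PRF, F and H.
   fn toy_prf(addr: u64) -> u64 { mix(addr ^ 0xA5A5_A5A5_A5A5_A5A5) }
   fn toy_f(sk: u64) -> u64 { mix(sk.wrapping_add(1)) }
   fn toy_h(l: u64, r: u64) -> u64 { mix(l.rotate_left(21) ^ r) }

   /// All-ones mask if a == b, zero otherwise, without a branch.
   fn ct_eq_mask(a: u32, b: u32) -> u64 {
       let x = a ^ b;
       // For x != 0, (x | -x) always has bit 31 set.
       let nonzero = ((x | x.wrapping_neg()) >> 31) as u64;
       nonzero.wrapping_sub(1)
   }

   /// Branchless conditional copy: dst = src when mask is all ones.
   fn ct_copy(dst: &mut u64, src: u64, mask: u64) {
       *dst = (*dst & !mask) | (src & mask);
   }

   /// Full-tree streaming sign of one toy FORS tree of height `a` whose
   /// leaves start at address `base`. Returns
   /// (leaf secret, auth path, tree root, leaf-PRF address trace).
   fn stream_sign(base: u64, a: u32, idx: u32) -> (u64, Vec<u64>, u64, Vec<u64>) {
       let mut leaf_secret = 0u64;
       let mut auth = vec![0u64; a as usize];
       let mut trace = Vec::with_capacity(1usize << a);
       let mut stack: Vec<(u64, u32)> = Vec::with_capacity(a as usize + 1);
       for k in 0..(1u32 << a) {
           let addr = base + u64::from(k);
           trace.push(addr); // fixed order, independent of idx
           let sk = toy_prf(addr);
           ct_copy(&mut leaf_secret, sk, ct_eq_mask(k, idx));
           let mut node = toy_f(sk);
           let mut height = 0u32;
           loop {
               // The node just completed sits at position k >> height;
               // keep it if it is the auth-path sibling at this height.
               if height < a {
                   let want = (idx >> height) ^ 1;
                   ct_copy(&mut auth[height as usize], node, ct_eq_mask(k >> height, want));
               }
               // Merge with an equal-height left sibling (depends on k only).
               match stack.last() {
                   Some(&(left, h)) if h == height => {
                       stack.pop();
                       node = toy_h(left, node);
                       height += 1;
                   }
                   _ => break,
               }
           }
           stack.push((node, height));
       }
       let root = stack.pop().map_or(0, |(n, _)| n);
       (leaf_secret, auth, root, trace)
   }

   /// FIPS-205-style recursive reference: node at index `i`, height `z`.
   fn node_ref(base: u64, i: u64, z: u32) -> u64 {
       if z == 0 {
           toy_f(toy_prf(base + i))
       } else {
           toy_h(node_ref(base, 2 * i, z - 1), node_ref(base, 2 * i + 1, z - 1))
       }
   }

The only branches left in ``stream_sign`` (``height < a`` and the
``match`` on the stack top) depend on the loop counter ``k``, never on
``idx``; a production version would additionally have to stop the
compiler from turning the mask arithmetic back into branches.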

**Signature stays unchanged** — the output bytes are byte-identical
to the FIPS-205 default path on every input (KAT-verified across
all six SHAKE parameter sets, with and without ``sca-fors-redundancy``
composed).

.. code-block:: rust

   /// `fors_sign_into` under `sca-fors-dummy-siblings` — sketch.
   for k in 0..(1u32 << P::A) {
       let leaf_idx = base + k;
       let sk = fors_sk_gen::<P>(sk_seed, pk_seed, &mut adrs, leaf_idx);
       silentops::ct_copy(leaf_slot, &sk, silentops::ct_eq_u32(k, idx));
       let mut node = hash::f_hash::<P>(pk_seed, &mut adrs, &sk);
       let mut height = 0u32;
       let mut local_pos = k;
       silentops::ct_copy(
           &mut auth_slot[0..P::N], &node,
           silentops::ct_eq_u32(local_pos, idx ^ 1),
       );
       while let Some(&(_, top_h)) = stack.last() {
           if top_h != height { break; }
           // ... pop, merge, save auth_slot[h] branchlessly ...
       }
       stack.push((node, height));
   }

**What this kills.** The Keccak absorption sequence becomes a
deterministic function of the public FORS-tree index ``i`` only;
no ``idx``-dependent address ever reaches the PRF. The template
oracle of :cite:`kannwischer2018_dpa_xmss_sphincs` is closed for
FORS signing. The same reasoning protects against DPA on the
leaf-secret PRF (``fors_sk_gen``) since its address argument is
likewise ``idx``-independent in the streamed path.

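**Cost.** By hash count, the full-tree stream is close to the
default path. Per FORS tree, the default path computes
``sum_{j=0..A-1} 2^j = 2^A - 1`` leaves and ``2^A - 1 - A``
internal nodes across the auth-path subtrees, plus one PRF call for
the revealed leaf secret; the full-tree stream computes ``2^A``
leaves and ``2^A - 1`` internal merges, a difference of ``A + 1``
hash calls per tree. The measured KAT wall time (host x86_64)
nevertheless goes from ~80 s to ~135 s (~1.7×) under
``--features slh-dsa,sca-fors-dummy-siblings``, so the observed
overhead is not explained by the hash count alone.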
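
The per-tree hash tally can be reproduced with a short counting sketch
that follows the FIPS-205 ``fors_sign`` / ``fors_node`` recursion (one
PRF and one ``F`` per computed leaf, one ``H`` per internal node). The
functions are hypothetical illustrations that only count calls; they
hash nothing.

.. code-block:: rust

   /// Hash calls per FORS tree of height `a` on the FIPS-205 default path,
   /// counted (not computed). Returns (prf, f, h).
   fn default_path_counts(a: u32) -> (u64, u64, u64) {
       // The revealed leaf secret costs one PRF call (fors_skGen).
       let (mut prf, mut f, mut h) = (1u64, 0u64, 0u64);
       // fors_node at height j touches 2^j leaves (PRF + F each) and
       // performs 2^j - 1 H merges.
       for j in 0..a {
           let leaves = 1u64 << j;
           prf += leaves;
           f += leaves;
           h += leaves - 1;
       }
       (prf, f, h)
   }

   /// Full-tree streaming: every leaf once (PRF + F), every internal node once.
   fn full_tree_counts(a: u32) -> (u64, u64, u64) {
       let leaves = 1u64 << a;
       (leaves, leaves, leaves - 1)
   }

   fn total((prf, f, h): (u64, u64, u64)) -> u64 {
       prf + f + h
   }

For ``A = 12`` this gives 12274 calls on the default path against
12287 for the full-tree stream.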

**Memory.** Stack budget unchanged at ``O(A * N)`` for the BDS
stack (same as the existing iterative treehash in ``fors_node``,
``quantica/src/slh_dsa/fors.rs:62-118``). No new heap hot-spot.

**Historical correction.** An earlier draft of this section
described T1-D as "compute both possible siblings (``s = 0`` and
``s = 1``) at fixed positions, select the right one branchlessly".
That framing is wrong: FIPS-205 Algorithm 16 has
``s = floor(idx / 2^j) XOR 1``, multi-bit, taking values in
``[0, 2^(A-j))`` at level ``j``. At ``j = 0`` (deepest level) the
sibling sits at one of up to ``2^A`` ``idx``-dependent positions,
not at one of a fixed pair. A first implementation along the
"two-candidate" line silently produced non-FIPS-compliant
signatures (5/16 KAT vectors diverged). The full-tree streaming
traversal documented above is the only mechanism that produces an
``idx``-independent address sequence at the same asymptotic cost.

**Validation.** End-to-end KAT
(``cargo test --release -p quantica --test slh_dsa_kat --features
slh-dsa,sca-fors-dummy-siblings``) — 16/16 vectors byte-identical
to the default path. Lib tests
(``cargo test --release -p quantica --lib --features
sca-fors-dummy-siblings``) — 5/5 green; composition with
``sca-fors-redundancy`` also green
(``--features sca-fors-dummy-siblings,sca-fors-redundancy``).
Aligns with the SLotH threshold-implementation posture
(:cite:`saarinen2024_sloth_slhdsa`).

**Out of scope.** Extension of full-tree streaming to WOTS+ chains
inside the hypertree — same template-oracle reasoning applies but
the leak surface is smaller; tracked as a Tier-4 candidate.

T1-E — digest → indices integrity check — **shipped**
------------------------------------------------------

**Addresses:** single-fault attack forcing one of the FORS indices
to a controlled value (zero-index variant of
:cite:`castelnovi2018_grafting_trees`). The corruption reveals
``PRF(SK.seed, addr_0)`` cleanly. Even with ``T1-D`` (full-tree
streaming) shipped, a fault during the upstream
``message_to_indices`` derivation, or during the digest extraction
itself, could redirect the leaf-secret commit to a faulted
position before the streaming traversal kicks in.

**Implementation:** at the tail of ``fors::fors_sign_into``, the
FORS index vector is **re-derived from the same ``md`` slice** and
**CT-compared** to the vector consumed during signing. The check
is gated by the ``sca-fors-indices-check`` cargo feature; on a
mismatch ``fors_sign_into`` returns
``Err(SlhDsaError::FaultDetected)``, the ``slh_sign_internal``
caller propagates via ``?``, and the hypertree-signing step never
runs — the faulted FORS sub-signature never gets wrapped into a
full signature emitted to the host.

.. code-block:: rust

   pub(crate) fn fors_indices_consistency_check<P: Params>(
       md:   &[u8],
       used: &[u32],
   ) -> Result<(), SlhDsaError> {
       let recomputed = message_to_indices::<P>(md);
       if recomputed.len() != used.len() {
           return Err(SlhDsaError::FaultDetected);
       }
       let used_b: Vec<u8> = used.iter().flat_map(|x| x.to_le_bytes()).collect();
       let rec_b: Vec<u8> = recomputed.iter().flat_map(|x| x.to_le_bytes()).collect();
       if silentops::ct_eq(&used_b, &rec_b) != 1 {
           return Err(SlhDsaError::FaultDetected);
       }
       Ok(())
   }

The fresh derivation runs on the **same** ``md`` slice, so a fault
that lands persistently on ``md`` itself (e.g. Rowhammer on the
stack region holding the digest) passes the check. ``T1-C`` catches
such a fault only if the corruption lands between its two FORS
signings; a corruption of ``md`` before the first signing is seen
identically by both runs. ``T1-E`` specifically catches transient
faults in the ``base_2b`` bit extraction or in the index-vector
storage between production and consumption.
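
For reference, a self-contained sketch of the check. ``base_2b``
follows FIPS-205 Algorithm 4, from which the FORS indices are read as
``base_2b(md, a, k)``; ``FaultDetected``, ``indices_ct_eq`` and
``indices_consistency_check`` are hypothetical stand-ins for the
crate's error variant, ``silentops::ct_eq`` and the helper shown above.
Unlike the crate sketch, it compares the ``u32`` vectors directly
rather than serialising them to bytes, which avoids the two temporary
allocations; that is a design alternative, not what the crate does.

.. code-block:: rust

   /// Hypothetical stand-in for `SlhDsaError::FaultDetected`.
   #[derive(Debug, PartialEq, Eq)]
   struct FaultDetected;

   /// FIPS-205 Algorithm 4: split `x` into `out_len` big-endian integers
   /// of `b` bits each.
   fn base_2b(x: &[u8], b: u32, out_len: usize) -> Vec<u32> {
       let mut pos = 0;
       let mut bits = 0u32;
       let mut total = 0u64;
       let mut out = Vec::with_capacity(out_len);
       for _ in 0..out_len {
           while bits < b {
               // High bits shifted out are never needed again.
               total = (total << 8) | u64::from(x[pos]);
               pos += 1;
               bits += 8;
           }
           bits -= b;
           out.push(((total >> bits) & ((1u64 << b) - 1)) as u32);
       }
       out
   }

   /// Constant-time equality of two index vectors (length is public).
   fn indices_ct_eq(a: &[u32], b: &[u32]) -> bool {
       if a.len() != b.len() {
           return false;
       }
       let mut acc = 0u32;
       for (x, y) in a.iter().zip(b) {
           acc |= x ^ y;
       }
       acc == 0
   }

   /// Re-derive the k indices of `a` bits from `md` and compare them
   /// with the vector actually consumed during signing.
   fn indices_consistency_check(md: &[u8], a: u32, k: usize, used: &[u32]) -> Result<(), FaultDetected> {
       let recomputed = base_2b(md, a, k);
       if indices_ct_eq(&recomputed, used) {
           Ok(())
       } else {
           Err(FaultDetected)
       }
   }

With ``md = [0xAB, 0xCD, 0xEF]`` and ``a = 6``, ``base_2b`` reads the
24 bits as ``101010 111100 110111 101111``.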

**Cost.** One extra ``message_to_indices`` (= one ``base_2b``) per
FORS signature — negligible byte-shuffling, no hashing,
``K * A / 8`` bytes processed. The two ``Vec<u8>`` serialisations
for ``silentops::ct_eq`` allocate ``4 * K`` bytes each, freed at
function return; well under any M0-baseline budget.

**Composition.** Orthogonal to ``T1-C`` (which compares two
independent FORS signings to catch in-FORS faults) and to ``T1-D``
(which closes the template oracle on Keccak addresses). Under
``--features sca-fors-redundancy``, ``T1-C``'s
``fors_sign_into_redundant`` calls ``fors_sign_into`` twice and
each call independently runs the ``T1-E`` check (if also enabled).
KAT determinism preserved in every combination
(``sca-fors-indices-check`` on its own; combined with ``T1-D``;
combined with ``T1-D + T1-C``).

**Validation.** Lib tests
``fors_indices_check_accepts_correct_shake128s`` /
``…_shake128f`` exercise the positive path on multiple seed
permutations. ``fors_indices_check_rejects_flipped_index``
drives the helper with synthetically corrupted index vectors
(one bit flipped, and a length mismatch) and asserts
``FaultDetected`` in each case. End-to-end determinism: KAT
``cargo test --release -p quantica --test slh_dsa_kat
--features slh-dsa,sca-fors-indices-check``: 16/16 vectors
byte-identical to the default path (~85 s wall time vs ~80 s for the
default build; the difference is attributed to the integrity check,
since signing itself is unchanged).

T4-B — PRF masking
------------------

**Addresses:** DPA on ``SK.seed`` through the FORS / WOTS+ leaf PRF
(:cite:`kannwischer2018_dpa_xmss_sphincs`). The baseline construction
is :cite:`fluhrer2024_sca_resistant_sphincs` (3-share SHAKE),
with a hardware-side alternative documented in
:cite:`saarinen2024_sloth_slhdsa`.

**Planned API** (transparent wrapper over the existing
``hash::prf``):

.. code-block:: rust

   /// 3-share masked PRF. Emits the same byte string as
   /// `hash::prf` but keeps `sk_seed` split into shares through
   /// every SHAKE-absorb step, per Fluhrer's construction.
   #[cfg(feature = "sca-masked-prf")]
   pub fn prf_masked<P: Params>(
       pk_seed:   &[u8],
       sk_seed_s: &MaskedSeed,      // SK.seed split into shares
       adrs:      &Adrs,
   ) -> Vec<u8>;

Cost: roughly 1.7× per signature. Gated behind an opt-in feature
until SHAKE masking lands in ``silentops``.
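
To make the ``MaskedSeed`` representation concrete, a minimal sketch
of Boolean (XOR) sharing of a seed. ``MaskedSeedSketch`` is
hypothetical and is not the planned type: it takes its random masks
from the caller (a real implementation must draw them from a CSPRNG),
uses the simplest possible refresh, and says nothing about the much
harder part, masking the Keccak permutation itself. The share count
``S`` is left as a parameter.

.. code-block:: rust

   /// Hypothetical Boolean-masked seed: `S` shares whose XOR equals the seed.
   struct MaskedSeedSketch<const S: usize, const N: usize> {
       shares: [[u8; N]; S],
   }

   impl<const S: usize, const N: usize> MaskedSeedSketch<S, N> {
       /// Split `seed` using S - 1 caller-supplied masks (CSPRNG output
       /// in any real use).
       fn split(seed: &[u8; N], masks: &[[u8; N]]) -> Self {
           assert!(S >= 2 && masks.len() == S - 1);
           let mut shares = [[0u8; N]; S];
           let mut last = *seed;
           for (share, mask) in shares.iter_mut().zip(masks) {
               *share = *mask;
               for (l, m) in last.iter_mut().zip(mask) {
                   *l ^= m;
               }
           }
           shares[S - 1] = last;
           Self { shares }
       }

       /// Simplest refresh: XOR one fresh value into two shares, which
       /// leaves their combined XOR (the seed) unchanged.
       fn refresh(&mut self, r: &[u8; N]) {
           for i in 0..N {
               self.shares[0][i] ^= r[i];
               self.shares[S - 1][i] ^= r[i];
           }
       }

       /// Recombination, for testing only: a masked PRF never does this.
       fn unmask(&self) -> [u8; N] {
           let mut out = [0u8; N];
           for share in &self.shares {
               for (o, s) in out.iter_mut().zip(share) {
                   *o ^= s;
               }
           }
           out
       }
   }

No single share equals the seed, yet their XOR always does, including
after a refresh.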

T1-F — constant-time ``fors_pk_from_sig`` — **shipped**
-------------------------------------------------------

**Addresses:** the secret-dependent branch
``if ((idx >> j) & 1) == 0 { ... } else { ... }`` inside the original
FIPS-205 Algorithm 17. Verifier-side, the branch is on public data;
but when the same routine is reused under ``T1-C`` as part of the
signing-side redundancy check, its input becomes secret and a Rust
``if`` would re-introduce a timing leak.

**Implementation:** ``fors::fors_pk_from_sig`` in
``quantica/src/slh_dsa/fors.rs`` was reworked to a single
constant-time routine. For every authentication-path level, the
original branch is replaced by a byte-wise
``silentops::ct_select_u8`` cswap that materialises the
``(left, right)`` ``hash_h`` inputs into two ``N``-byte stack
buffers, then calls ``hash_h(left, right)`` unconditionally. The
``tree_index`` written into ``adrs`` is identical in both original
branches so it needs no extra masking. Scratch buffers are
``silentops::ct_zeroize``-d at the end of the routine.
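
The cswap can be illustrated on plain byte arrays. In this sketch,
``ct_select_u8`` and ``order_children`` are hypothetical local helpers
(the crate's ``silentops::ct_select_u8`` may have a different
signature); ``parent_tree_index`` spells out the claim that both
branches of Algorithm 17 write the same parent tree index,
``floor(t / 2)``.

.. code-block:: rust

   /// Branchless byte select: returns `a` if `choice == 1`, `b` if `choice == 0`.
   fn ct_select_u8(choice: u8, a: u8, b: u8) -> u8 {
       let mask = 0u8.wrapping_sub(choice & 1); // 0xFF or 0x00
       (a & mask) | (b & !mask)
   }

   /// One auth-path level of FIPS-205 Alg. 17 without a branch on `bit`
   /// (bit j of the FORS digit): bit 0 -> (node, auth), bit 1 -> (auth, node).
   fn order_children<const N: usize>(bit: u8, node: &[u8; N], auth: &[u8; N]) -> ([u8; N], [u8; N]) {
       let mut left = [0u8; N];
       let mut right = [0u8; N];
       for i in 0..N {
           left[i] = ct_select_u8(bit, auth[i], node[i]);
           right[i] = ct_select_u8(bit, node[i], auth[i]);
       }
       (left, right)
   }

   /// Both branches of Alg. 17 set the parent tree index to floor(t / 2):
   /// t / 2 for even t and (t - 1) / 2 for odd t.
   fn parent_tree_index(t: u32) -> u32 {
       t >> 1
   }

The ``(left, right)`` pair is then hashed unconditionally, so the
instruction stream is the same whichever way ``bit`` falls.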

.. code-block:: rust

   /// Constant-time FORS pk-from-sig (FIPS-205 Alg. 17). The
   /// secret-dependent `hash_h` argument ordering is resolved by
   /// a branchless `silentops::ct_select_u8` cswap. Single routine —
   /// used by both the standalone verifier and the T1-C signing-side
   /// redundancy check (where `idx` is secret).
   pub fn fors_pk_from_sig<P: Params>(
       sig_fors: &[u8],
       md:       &[u8],
       pk_seed:  &[u8],
       adrs:     &mut Adrs,
   ) -> Vec<u8>;

A previous variable-time sibling has been removed: keeping a single
CT implementation eliminates the foot-gun of a future call site
picking a leaky variant by autocomplete.

**Validation:** two round-trip tests in
``quantica/src/slh_dsa/fors.rs``
(``fors_pk_from_sig_round_trip_shake128s`` and ``…_shake128f``)
exercise the sign → pk-from-sig pipeline across multiple seed /
message permutations and assert that two back-to-back derivations
agree (determinism) and produce ``N``-byte outputs. End-to-end
correctness against FIPS-205 reference output is covered by the
KAT suite ``quantica/tests/slh_dsa_kat.rs``.

Cost: ``2 * N`` byte scratch (~32 B for SHAKE-128, ~64 B for
SHAKE-256) plus ``2 * N * A`` ``ct_select_u8`` calls per FORS
tree per signature. Negligible compared to the underlying SHAKE
work.

T2-D — explicit unpoison of ``R``, ``digest``, indices
------------------------------------------------------

Explicitly unpoison ``R``, the digest and the FORS indices for
ctgrind once they have reached the "publish-ready" state, so that the
branches inside ``fors::fors_pk_from_sig``, ``wots::chain_iter``,
``xmss::xmss_sign_into`` / ``xmss_pk_from_sig`` are programmatically
marked as operating on public data instead of being suppressed.
Closes the four suppressions listed in ``tools/ctgrind.supp``.
Zero-cost on production builds.