.. krypteia — Hermelink 2025/276 audit annex on `ml_dsa::masked`
   Doc-only deliverable, T1-B of the SLH-DSA / ML-DSA Tier-1 roadmap.

Hermelink 2025/276 audit pass on ``ml_dsa::masked``
====================================================

:Status:        Shipped (audit only — code work-list cross-referenced below).
:Scope:         ``quantica/src/ml_dsa/masked.rs`` plus the rejection-loop
                callers in ``quantica/src/ml_dsa/dsa.rs``.
:Reference:     :cite:`hermelink2025_weakest_link_masked_mldsa` (CRYPTO 2025,
                IACR ePrint 2025/276).
:Audit method:  Static walk of every public masked gadget and every
                ``unmask()`` call site, classified against the Hermelink
                leak taxonomy.

.. contents::
   :local:
   :depth: 2

Why this audit
--------------

:cite:`hermelink2025_weakest_link_masked_mldsa` does **not** break ML-DSA
or its masked variant; it provides an information-theoretic enumeration
of the operations inside a masked Dilithium implementation that
*leak even when each individual gadget is masked*, because the
**aggregate statistic** combines shares unsafely. The dominant leak class
is the rejection-loop recombination: shares of ``y``, ``s1``, ``s2``,
``t0`` get unmasked into plaintext aggregates (``w``, ``z``, ``cs1``,
``cs2``, ``ct0``) that then drive data-dependent non-linear operations
(``Decompose``, ``MakeHint``, norm checks).

The paper is **the** auditor checklist for any masked-ML-DSA claim. T1-B
of the krypteia Tier-1 roadmap walks that checklist over our code and
records, gadget by gadget, whether the leak class is *closed*,
*acknowledged residual risk*, or *scheduled for reinforcement*. This
file is the resulting evidence piece — readable as a stand-alone
evaluation surface and traceable down to a file:line for every claim.

Scope and out-of-scope
----------------------

**In scope** — the masked-arithmetic surface area of ML-DSA:

* the masked-poly type and its primitives
  (``MaskedPoly`` in ``quantica/src/ml_dsa/masked.rs``);
* the masked algorithmic kernels reused inside ``sign_internal``
  (masked NTT, masked × public, masked matrix-vector multiplication);
* every ``unmask()`` call site inside the rejection loop of
  ``quantica/src/ml_dsa/dsa.rs::sign_internal``, and every data-dependent
  operation consuming the resulting plaintext aggregate.

**Out of scope** — outside Hermelink's target:

* the unmasked NTT (``ml_dsa::ntt``), the samplers
  (``ml_dsa::sample``), and the rejection-orchestration tail
  (``sign_internal`` after a successful rejection-loop exit), which
  operate on quantities already public per FIPS-204 or on the final
  signature bytes;
* ML-KEM masked code and SLH-DSA: not the paper's target. The same
  audit-annex pattern can be reused there if needed in a future tier.

Leak-class taxonomy (Hermelink 2025/276)
----------------------------------------

We use a five-class shorthand that maps the paper's section structure to
our code:

* **C1 — Recombination of secret shares.** Every ``unmask()`` call
  produces a plaintext aggregate. Even when the upstream operations are
  masked, the resulting plaintext is the natural DPA / SPA target.
* **C2 — Decompose / HighBits / LowBits on unmasked aggregates.** Once
  the plaintext is on the stack, any non-linear bit-extraction with
  data-dependent control flow becomes a CPA target (the resulting
  ``HighBits`` polynomial is hashed into the challenge and can be
  recomputed by any verifier from the signature).
* **C3 — Hint compression.** ``MakeHint`` is a per-coefficient
  comparison whose count is bounded by ``ω``; the count itself and the
  per-coefficient outcome leak the shape of ``w - c·s2`` modulo
  ``2·γ₂``.
* **C4 — Mask refresh sufficiency.** Higher-order DPA aggregates traces
  across rejection iterations; a missing refresh between iterations
  collapses the higher-order assumption.
* **C5 — Sampler-side leakage on y.** ``ExpandMask`` is the canonical
  DPA target on ``SK.prf``; the paper requires that ``y`` never
  materialises in plaintext on the stack between sampling and consumption.
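
Classes C1 and C4 both rest on how a two-share arithmetic masking
behaves. The following is a minimal sketch, not the ``MaskedPoly``
implementation: it assumes a single coefficient, additive shares modulo
the ML-DSA prime ``q = 8380417``, and a toy xorshift generator standing
in for the real masking randomness. The names ``Shared`` and ``ToyRng``
are hypothetical.

.. code-block:: rust

   /// ML-DSA modulus q = 2^23 - 2^13 + 1 (FIPS 204).
   const Q: u32 = 8_380_417;

   /// Toy xorshift64 generator (seed must be non-zero). NOT
   /// cryptographically secure and slightly biased modulo q; a
   /// hypothetical stand-in for the real masking RNG.
   struct ToyRng(u64);

   impl ToyRng {
       fn next_mod_q(&mut self) -> u32 {
           self.0 ^= self.0 << 13;
           self.0 ^= self.0 >> 7;
           self.0 ^= self.0 << 17;
           (self.0 % Q as u64) as u32
       }
   }

   /// One coefficient as two additive shares: value = s0 + s1 (mod q).
   #[derive(Clone, Copy, Debug, PartialEq)]
   struct Shared {
       s0: u32,
       s1: u32,
   }

   // Inputs are < q < 2^23, so the u32 sums below cannot overflow.
   fn add_q(a: u32, b: u32) -> u32 {
       (a + b) % Q
   }

   fn sub_q(a: u32, b: u32) -> u32 {
       (a + Q - b) % Q
   }

   /// Split x into two shares using one fresh random value.
   fn mask(x: u32, rng: &mut ToyRng) -> Shared {
       let r = rng.next_mod_q();
       Shared { s0: sub_q(x % Q, r), s1: r }
   }

   /// C1: recombination produces the plaintext aggregate.
   fn unmask(v: Shared) -> u32 {
       add_q(v.s0, v.s1)
   }

   /// C4: re-randomise both shares; the unmasked value is unchanged
   /// because the fresh value r is added to one share and subtracted
   /// from the other.
   fn refresh(v: Shared, rng: &mut ToyRng) -> Shared {
       let r = rng.next_mod_q();
       Shared { s0: add_q(v.s0, r), s1: sub_q(v.s1, r) }
   }

In this model, ``refresh`` changes what an attacker observes on each
share while ``unmask`` still returns the same value, which is why a
refresh at the head of every iteration addresses C4 but cannot lower
the C1 floor set by the unmask itself.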

Per-gadget audit matrix
-----------------------

.. list-table:: ML-DSA masked gadgets vs Hermelink 2025/276 leak classes
   :header-rows: 1
   :widths: 32 16 8 14 30

   * - Gadget / call site
     - File:line
     - Class
     - Status
     - Rationale and follow-up

   * - ``MaskedPoly::sample_expand_mask``
     - ``masked.rs:158-218``
     - C5
     - protected
     - DPA-safe sampling: ``y`` is produced as two arithmetic shares
       directly from SHAKE256, never reconstructed in plaintext. Verified
       by the ``masked_expand_mask_matches_unmasked_expand_mask`` test.

   * - ``MaskedPoly::mask`` / ``unmask`` (primitive round trip)
     - ``masked.rs:229-269``
     - C1 (primitive)
     - protected
     - Correctness verified by ``mask_unmask_roundtrip``; the primitive
       itself is sound. The *use* of ``unmask()`` inside the rejection
       loop is analysed per call site in the rows below.

   * - ``MaskedPoly::refresh``
     - ``masked.rs:284-299``
     - C4 (primitive)
     - protected
     - Re-randomises both shares without changing the unmasked value;
       verified by ``refresh_preserves_unmasked_value``. The primitive
       is sound; the C4 *sufficiency* question (refresh per iteration)
       is addressed in the row below.

   * - ``masked_ntt`` / ``masked_ntt_inv``
     - ``masked.rs:314-323``
     - — (linear)
     - protected
     - NTT is linear over the prime field; applying it to each share
       preserves the additive masking invariant. Verified by
       ``masked_ntt_matches_regular_ntt``.

   * - ``masked_pointwise_mul_public``
     - ``masked.rs:332-337``
     - — (linear × public)
     - protected
     - Multiplication by the public challenge polynomial ``c`` is
       linear in the secret share; first-order security preserved.
       Verified by ``masked_pointwise_mul_public_matches_unmasked``.

   * - ``masked_mat_vec_mul`` / ``masked_mat_vec_mul_lazy``
     - ``masked.rs:351-395``
     - — (linear × public)
     - protected
     - Same linearity argument: ``A`` is public, and each share is
       multiplied by the matrix independently. The lazy variant
       recomputes ``a_hat`` from ``rho`` on the fly for low-memory
       targets and is currently not covered by a dedicated unit test
       (it is covered transitively by the end-to-end KAT); adding one
       is a minor follow-up.

   * - Per-iteration mask refresh sufficiency
     - whole rejection loop
     - C4
     - protected
     - ``T1-A`` shipped (``dsa.rs`` head-of-loop refresh block).
       Every polynomial of ``s1_hat_m``, ``s2_hat_m``, ``t0_hat_m`` is
       re-randomised via ``MaskedPoly::refresh`` at the start of every
       rejection iteration, before any operation on the shares —
       exactly the Hermelink §4 prescription. Output bytes unchanged
       (mask cancels in unmask, KAT 9/9 byte-identical). Cost
       unchanged versus the previous end-of-cs/ct refresh placement
       (same number of refresh calls per iteration, same ScaRng-byte
       consumption).

   * - ``w_m[i].unmask()`` → ``w_tmp[i]``
     - ``dsa.rs:727``
     - C1
     - residual
     - ``w = A·y`` is intentionally produced as a plaintext aggregate;
       it is the input to ``HighBits``, whose output is hashed into
       the challenge and is recomputable from the signature. ``y_m`` is
       resampled fresh every iteration and, with ``T1-A`` shipped, the
       ``s1_hat_m`` / ``s2_hat_m`` shares are refreshed at the start
       of every rejection iteration, which removes cross-iteration
       higher-order DPA aggregation. The remaining residual is the
       plaintext-aggregate floor (the unmask itself); full closure
       would require a HighBits-on-shares gadget (Tier-3 candidate).

   * - ``y_m[r].unmask()`` → ``y_out[r]``
     - ``dsa.rs:735``
     - C1
     - residual
     - ``y`` is consumed in plaintext for the time-domain
       ``z = y + c·s1`` formation. ``y_m`` is resampled fresh every
       iteration (``sample_expand_mask``), so cross-iteration
       aggregation is already neutralised at the source — no T1-A
       contribution needed here. Remaining residual is the
       plaintext-aggregate floor (the unmask itself).

   * - ``cs1[i] = masked_pointwise_mul_public(s1_hat_m[i], c_hat).unmask()``
     - ``dsa.rs:1000``
     - C1
     - residual
     - First-order safe (masked × public is linear, the unmasking
       happens after the multiply). ``T1-A`` shipped — ``s1_hat_m``
       is refreshed at the head of every rejection iteration, so the
       multi-iteration higher-order DPA window is closed. Remaining
       residual is the plaintext-aggregate floor; closure via
       share-domain multiply-and-accumulate is a Tier-3 candidate.

   * - ``cs2[i] = masked_pointwise_mul_public(s2_hat_m[i], c_hat).unmask()``
     - ``dsa.rs:1045``
     - C1
     - residual
     - Same as ``cs1``. ``s2_hat_m`` is covered by the same T1-A
       per-iteration refresh. The ``s2`` shares feed into
       ``r0 = LowBits(w - c·s2)``, whose threshold check leaks at the
       outcome level. Tier-2 candidate (CT norm-on-shares).

   * - ``ct0[i] = masked_pointwise_mul_public(t0_hat_m[i], c_hat).unmask()``
     - ``dsa.rs:1128``
     - C1
     - residual
     - Same as ``cs1``/``cs2``. ``t0_hat_m`` is covered by the same
       T1-A per-iteration refresh. The ``t0`` shares feed into hint
       generation, whose count and threshold leak at the outcome
       level. Tier-3 candidate (share-domain MakeHint).

   * - ``decompose::high_bits_vec(&w_tmp, …)``
     - ``dsa.rs:816``
     - C2
     - residual
     - HighBits is on the unmasked ``w``; the output is part of the
       public signature footprint, so the leak is on ``y`` only via
       the upstream chain. Hardening would require a HighBits-on-shares
       gadget. Tier-3 candidate (not on the current roadmap).

   * - ``decompose::low_bits(wbuf[j], γ₂)`` (per-coef)
     - ``dsa.rs:935``
     - C2
     - residual
     - Inner-loop low-bits extraction inside Phase 2; data-dependent
       on unmasked ``w - cs2``. Same Tier-3 candidate as above.

   * - ``decompose::low_bits_vec(&w_minus_cs2, …)``
     - ``dsa.rs:1094``
     - C2
     - residual
     - Vector variant feeding ``r0`` for the norm check; same class.

   * - ``decompose::make_hint(mod_q(-tmp[j]), wbuf[j] + tmp[j], γ₂)``
     - ``dsa.rs:961``
     - C3
     - residual
     - Per-coefficient hint generation; observation of the hint bit
       pattern is the Hermelink §3 leak. Tier-3 candidate
       (share-domain MakeHint).

   * - ``decompose::make_hint_vec(&neg_ct0, &w_cs2_ct0, γ₂, k)``
     - ``dsa.rs:1179``
     - C3
     - residual
     - Vector variant that returns ``(h, num_ones)``; ``num_ones > ω``
       drives a rejection branch whose timing is closed by
       ``sca-ct-rejection`` already, but the C3 information leak on
       the count itself is not. Tier-3 candidate.

   * - ``check_norm_vec(&z, γ₁ − β, l)``
     - ``dsa.rs:895, 1100, 1111``
     - C1/C2
     - partial
     - The classic infinity-norm check on the unmasked ``z`` aggregate.
       The early-abort timing leak is closed by ``sca-ct-rejection``
       (every iteration computes every intermediate, and a single
       branch-free decision is taken). The information-theoretic leak
       (the *outcome* of the check across many signatures) is the
       residual C1/C2 contribution. Tier-2 candidate (CT norm-check on
       shares).

   * - ``check_norm_vec(&r0, γ₂ − β, k)``
     - ``dsa.rs:1104, 1112``
     - C1/C2
     - partial
     - Same as ``z`` norm check. ``r0 = LowBits(w − c·s2)`` is
       data-dependent on both unmasked aggregates; ``sca-ct-rejection``
       closes the timing leak; the outcome-leak is the residual.

   * - ``check_norm_vec(&ct0, γ₂, k)``
     - ``dsa.rs:1146, 1153``
     - C1/C2
     - partial
     - Same posture. Used to decide whether ``ct0`` is safe to commit
       to hint generation.
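
The "linear × public" rows of the matrix rest on one identity:
multiplying each share by a public value and then recombining gives the
same result as multiplying the secret itself. The sketch below checks
that identity for a coefficient-wise product modulo ``q``. It is an
illustration under simplifying assumptions (two shares held as plain
vectors, no NTT, hypothetical function names), not the code of
``masked_pointwise_mul_public``.

.. code-block:: rust

   /// ML-DSA modulus (FIPS 204).
   const Q: u64 = 8_380_417;

   /// Multiply a two-share vector by a public vector, coefficient by
   /// coefficient. Each share is processed on its own; the two shares
   /// are never combined inside this function.
   fn mul_public_per_share(shares: &[Vec<u64>; 2], public: &[u64]) -> [Vec<u64>; 2] {
       let mul = |share: &Vec<u64>| -> Vec<u64> {
           share
               .iter()
               .zip(public)
               // Operands are < q < 2^23, so the product fits in u64.
               .map(|(&s, &c)| (s * c) % Q)
               .collect()
       };
       [mul(&shares[0]), mul(&shares[1])]
   }

   /// Recombine two additive shares (the step the audit calls unmask).
   fn recombine(shares: &[Vec<u64>; 2]) -> Vec<u64> {
       shares[0]
           .iter()
           .zip(&shares[1])
           .map(|(&a, &b)| (a + b) % Q)
           .collect()
   }

Because ``(s0·c + s1·c) mod q = (s0 + s1)·c mod q``, the recombined
output equals the unmasked product. The same argument does not extend
to ``Decompose`` or the norm checks, which are not linear; that is why
those rows cannot be marked protected on linearity alone.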

Status legend
-------------

* **protected** — the leak class is *closed* for this gadget given the
  threat model in :doc:`../threat_model`. Either the operation is
  linear (so it preserves the additive sharing invariant), or it
  operates on values that are already public per the FIPS-204
  specification.

* **residual** — the leak class is *known* and *acknowledged*. The
  unmasked aggregate is short-lived (immediately re-masked, refreshed,
  or zeroized through ``MaskedPoly::zeroize`` /
  ``silentops::ct_zeroize`` after use). The practical
  DPA-cost-to-recover-key remains high but is not eliminated. A
  follow-up Roadmap item that *would* close the class is named in the
  matrix row.

* **partial** — a complementary defence is already in place (e.g.
  ``sca-ct-rejection`` removes the early-abort timing leak on norm
  checks) but the information-theoretic class is not fully closed.
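
The **partial** posture separates the *timing* of a norm check from its
*outcome*. The sketch below illustrates only the timing half, assuming
centred signed coefficients with magnitude below ``2^30``. It is not
the ``sca-ct-rejection`` code and the function name is hypothetical.
Every coefficient is visited and violations are OR-accumulated, so the
loop has no early exit; whether the compiled code is actually
constant-time depends on the compiler and target and would need to be
checked separately.

.. code-block:: rust

   /// Returns 1 if any |coeff| >= bound, else 0. All coefficients are
   /// always visited, so the loop length does not depend on the data.
   /// Assumes |coeff| < 2^30 and 1 <= bound < 2^30.
   fn norm_exceeds_no_early_abort(coeffs: &[i32], bound: i32) -> u32 {
       let mut violation: u32 = 0;
       for &c in coeffs {
           // Branch-free absolute value: sign is -1 when c < 0, else 0.
           let sign = c >> 31;
           let abs = (c ^ sign).wrapping_sub(sign);
           // (bound - 1 - abs) is negative exactly when abs >= bound;
           // its sign bit is the violation flag for this coefficient.
           let diff = (bound - 1).wrapping_sub(abs);
           violation |= (diff as u32) >> 31;
       }
       violation
   }

The returned flag is still a function of the plaintext aggregate, which
is the outcome-level leak that the partial rows leave open.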

Summary work-list
-----------------

The audit originally surfaced three categories of follow-up; ``T1-A``
has since shipped (see the updated rows above). The three categories,
with their current status, are:

1. ✅ ``T1-A`` **per-iteration mask refresh**: **shipped**.
   ``s1_hat_m`` / ``s2_hat_m`` / ``t0_hat_m`` refreshed at the head
   of every rejection iteration via the ``dsa.rs`` head-of-loop
   refresh block (Hermelink §4 prescription matched exactly). Closes
   the C4 sufficiency gap; reduces every C1 residual on the unmask
   call sites to the plaintext-aggregate floor.

2. **Tier-2 candidate — CT norm check on shares** for
   ``check_norm_vec(&z, …)``, ``check_norm_vec(&r0, …)``,
   ``check_norm_vec(&ct0, …)``. Would close the C1/C2 partial rows by
   moving the norm comparison into the share domain via the
   silentops CT primitives. Cost: a redesign of ``check_norm_vec``
   to take ``MaskedPoly`` instead of plaintext. Tracked as a future
   ticket, not yet on the Roadmap.

3. **Tier-3 candidate — share-domain Decompose / MakeHint** for the
   five Decompose / MakeHint call sites listed above. Would close
   the C2 / C3 rows. Larger redesign (full share-domain
   ``HighBits``/``LowBits``/``MakeHint`` kernels). Tracked as a
   future ticket, not yet on the Roadmap.

The above three categories — one shipped, two future candidates —
cover every ``residual`` and ``partial`` row in the matrix. No row
is unaccounted for, and no current ``protected`` row degrades under
any of the follow-ups.
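
For orientation, the plaintext operations that a Tier-3 share-domain
kernel would have to replace can be sketched as follows. This is a
scalar reference written from the FIPS 204 definitions of
``Decompose``, ``HighBits``, ``LowBits`` and ``MakeHint``, with ``γ₂``
fixed to ``(q − 1)/32``. The signatures are illustrative and differ
from the crate's ``decompose::`` functions, which take ``γ₂`` as a
parameter. The commented branches are the data-dependent steps that the
C2 and C3 rows refer to.

.. code-block:: rust

   /// ML-DSA modulus (FIPS 204).
   const Q: i32 = 8_380_417;
   /// gamma2 = (q - 1) / 32, the ML-DSA-65 / ML-DSA-87 value.
   const GAMMA2: i32 = (Q - 1) / 32;

   /// Split r into (r1, r0) with r = r1 * 2*gamma2 + r0 (mod q) and
   /// r0 centred in (-gamma2, gamma2], with the q - 1 corner case.
   fn decompose(r: i32) -> (i32, i32) {
       let r_plus = r.rem_euclid(Q);
       let alpha = 2 * GAMMA2;
       let mut r0 = r_plus % alpha;
       // Data-dependent branch: centre the remainder.
       if r0 > GAMMA2 {
           r0 -= alpha;
       }
       // Data-dependent branch: wrap-around corner case.
       if r_plus - r0 == Q - 1 {
           (0, r0 - 1)
       } else {
           ((r_plus - r0) / alpha, r0)
       }
   }

   fn high_bits(r: i32) -> i32 {
       decompose(r).0
   }

   fn low_bits(r: i32) -> i32 {
       decompose(r).1
   }

   /// Hint bit: does adding z to r change the high bits of r?
   /// Assumes r and z are small enough that r + z fits in i32.
   fn make_hint(z: i32, r: i32) -> bool {
       high_bits(r) != high_bits(r + z)
   }

For example, ``r = γ₂`` sits at the upper edge of its interval, so
``make_hint(1, GAMMA2)`` is true while ``make_hint(1, 100)`` is false;
the hint bit therefore reveals how close the aggregate lies to a
``2·γ₂`` boundary.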

References
----------

* :cite:`hermelink2025_weakest_link_masked_mldsa` — the audit
  reference itself.
* :cite:`damm2025_concealed_ilwe` — concealed-ILWE attack on
  partially-masked Dilithium; motivates the same hardening vector
  on the ``masked_mat_vec_mul`` gadgets.
* :cite:`zhao2026_rejection_matters` — rejection-loop attack on
  the unmasked / hedged path; motivates the
  ``sca-ct-rejection`` posture cross-referenced in the partial
  rows above.
* :doc:`../countermeasures/ml_dsa` — the per-threat coverage matrix
  and the ``T1-A`` roadmap entry the audit cross-references.