.. krypteia - Hermelink 2025/276 audit annex on `ml_dsa::masked`
   Doc-only deliverable, T1-B of the SLH-DSA / ML-DSA Tier-1 roadmap.

Hermelink 2025/276 audit pass on ``ml_dsa::masked``
====================================================

:Status: Shipped (audit only; code work-list cross-referenced below).
:Scope: ``quantica/src/ml_dsa/masked.rs`` plus the rejection-loop
        callers in ``quantica/src/ml_dsa/dsa.rs``.
:Reference: :cite:`hermelink2025_weakest_link_masked_mldsa`
            (CRYPTO 2025, IACR ePrint 2025/276).
:Audit method: Static walk of every public masked gadget and every
               ``unmask()`` call site, classified against the
               Hermelink leak taxonomy.

.. contents::
   :local:
   :depth: 2

Why this audit
--------------

:cite:`hermelink2025_weakest_link_masked_mldsa` does **not** break ML-DSA
or its masked variant; it provides an information-theoretic enumeration
of the operations inside a masked Dilithium implementation that *leak
even when each individual gadget is masked*, because the **aggregate
statistic** combines shares unsafely. The dominant leak class is the
rejection-loop recombination: shares of ``y``, ``s1``, ``s2``, ``t0`` get
unmasked into plaintext aggregates (``w``, ``z``, ``cs1``, ``cs2``,
``ct0``) that then drive data-dependent non-linear operations
(``Decompose``, ``MakeHint``, norm checks).

The paper is **the** auditor checklist for any masked-ML-DSA claim. T1-B
of the krypteia Tier-1 roadmap walks that checklist over our code and
records, gadget by gadget, whether the leak class is *closed*,
*acknowledged residual risk*, or *scheduled for reinforcement*. This file
is the resulting evidence piece, readable as a stand-alone evaluation
surface and traceable down to a file:line for every claim.

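
To make the recombination leak concrete, here is a minimal sketch of
two-share additive masking modulo the ML-DSA prime, on a single coefficient
rather than a full polynomial. It is not the krypteia ``MaskedPoly`` API:
``Shared``, ``ToyRng`` and the free functions ``mask``, ``unmask``,
``refresh`` and ``mul_public`` are hypothetical names chosen for
illustration, and the xorshift generator is a non-cryptographic stand-in
for a real masking RNG. The sketch only illustrates that refresh and
multiplication by a public value commute with the sharing, while
``unmask`` is the step that materialises a secret-dependent plaintext value.

```rust
/// ML-DSA prime modulus q = 2^23 - 2^13 + 1 (FIPS 204).
const Q: u64 = 8_380_417;

/// Toy xorshift64 generator. Assumption for illustration only:
/// NOT cryptographically secure, seed must be non-zero.
struct ToyRng(u64);

impl ToyRng {
    fn next_mod_q(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0 % Q
    }
}

/// Two additive shares (s0, s1) with s0 + s1 = x (mod q).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Shared(u64, u64);

/// Split x into two shares using one fresh random value.
fn mask(x: u64, rng: &mut ToyRng) -> Shared {
    let r = rng.next_mod_q();
    Shared((x % Q + Q - r) % Q, r)
}

/// Recombine the shares. The return value is a plaintext aggregate:
/// this is the point the C1 leak class is concerned with.
fn unmask(s: Shared) -> u64 {
    (s.0 + s.1) % Q
}

/// Re-randomise both shares without changing the shared value.
fn refresh(s: Shared, rng: &mut ToyRng) -> Shared {
    let r = rng.next_mod_q();
    Shared((s.0 + r) % Q, (s.1 + Q - r) % Q)
}

/// Multiply by a public constant: linear, so it acts share by share.
fn mul_public(s: Shared, c: u64) -> Shared {
    let c = c % Q;
    Shared(s.0 * c % Q, s.1 * c % Q)
}

fn main() {
    let mut rng = ToyRng(0x9E37_79B9_7F4A_7C15);
    let (secret, c) = (1234u64, 77u64);

    let shared = refresh(mask(secret, &mut rng), &mut rng);
    // Each operation above stayed in the share domain.
    assert_eq!(unmask(shared), secret);

    // Analogue of cs1 = (c * s1).unmask(): first-order safe multiply,
    // followed by a recombination into plaintext.
    let cs = unmask(mul_public(shared, c));
    assert_eq!(cs, secret * c % Q);
    println!("c*s mod q = {cs}");
}
```

Every function except ``unmask`` touches one share at a time; the audit
question is what happens to the value ``unmask`` returns.
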

Scope and out-of-scope
----------------------

**In scope** (the masked-arithmetic surface area of ML-DSA):

* the masked-poly type and its primitives (``MaskedPoly`` in
  ``quantica/src/ml_dsa/masked.rs``);
* the masked algorithmic kernels reused inside ``sign_internal``
  (masked NTT, masked × public, masked matrix-vector multiplication);
* every ``unmask()`` call site inside the rejection loop of
  ``quantica/src/ml_dsa/dsa.rs::sign_internal``, and every
  data-dependent operation consuming the resulting plaintext aggregate.

**Out of scope** (outside Hermelink's target):

* the unmasked NTT (``ml_dsa::ntt``), the samplers (``ml_dsa::sample``),
  and the rejection-orchestration tail (``sign_internal`` after a
  successful rejection-loop exit), which operate on quantities already
  public per FIPS-204 or on the final signature bytes;
* ML-KEM masked code and SLH-DSA: not the paper's target. The same
  audit-annex pattern can be reused there if needed in a future tier.

Leak-class taxonomy (Hermelink 2025/276)
----------------------------------------

We use a five-class shorthand that maps the paper's section structure to
our code:

* **C1: Recombination of secret shares.** Every ``unmask()`` call
  produces a plaintext aggregate. Even when the upstream operations are
  masked, the resulting plaintext is the natural DPA / SPA target.
* **C2: Decompose / HighBits / LowBits on unmasked aggregates.** Once
  the plaintext is on the stack, any non-linear bit-extraction with
  data-dependent control flow becomes a CPA target (the resulting
  ``HighBits`` polynomial directly enters the signature).
* **C3: Hint compression.** ``MakeHint`` is a per-coefficient comparison
  whose count is bounded by ``ω``; the count itself and the
  per-coefficient outcome leak the shape of ``w - c·s2`` modulo ``2·γ₂``.
* **C4: Mask refresh sufficiency.** Higher-order DPA aggregates traces
  across rejection iterations; a missing refresh between iterations
  collapses the higher-order assumption.
* **C5: Sampler-side leakage on** ``y``. ``ExpandMask`` is the canonical
  DPA target on ``SK.prf``; the paper requires that ``y`` never
  materialises in plaintext on the stack between sampling and
  consumption.

Per-gadget audit matrix
-----------------------

.. list-table:: ML-DSA masked gadgets vs Hermelink 2025/276 leak classes
   :header-rows: 1
   :widths: 32 16 8 14 30

   * - Gadget / call site
     - File:line
     - Class
     - Status
     - Rationale and follow-up
   * - ``MaskedPoly::sample_expand_mask``
     - ``masked.rs:158-218``
     - C5
     - protected
     - DPA-safe sampling: ``y`` is produced as two arithmetic shares
       directly from SHAKE256, never reconstructed in plaintext.
       Verified by the
       ``masked_expand_mask_matches_unmasked_expand_mask`` test.
   * - ``MaskedPoly::mask`` / ``unmask`` (primitive round trip)
     - ``masked.rs:229-269``
     - C1 (primitive)
     - protected
     - Correctness verified by ``mask_unmask_roundtrip``; the primitive
       itself is sound. The *use* of ``unmask()`` inside the rejection
       loop is analysed per call site in the rows below.
   * - ``MaskedPoly::refresh``
     - ``masked.rs:284-299``
     - C4 (primitive)
     - protected
     - Re-randomises both shares without changing the unmasked value;
       verified by ``refresh_preserves_unmasked_value``. The primitive
       is sound; the C4 *sufficiency* question (refresh per iteration)
       is addressed in the "Per-iteration mask refresh sufficiency" row.
   * - ``masked_ntt`` / ``masked_ntt_inv``
     - ``masked.rs:314-323``
     - none (linear)
     - protected
     - NTT is linear over the prime field; applying it to each share
       preserves the additive masking invariant. Verified by
       ``masked_ntt_matches_regular_ntt``.
   * - ``masked_pointwise_mul_public``
     - ``masked.rs:332-337``
     - none (linear × public)
     - protected
     - Multiplication by the public challenge polynomial ``c`` is linear
       in the secret share; first-order security preserved. Verified by
       ``masked_pointwise_mul_public_matches_unmasked``.
   * - ``masked_mat_vec_mul`` / ``masked_mat_vec_mul_lazy``
     - ``masked.rs:351-395``
     - none (linear × public)
     - protected
     - Same linearity argument: ``A`` is public, each share is
       matrix-multiplied independently. The lazy variant recomputes
       ``a_hat`` from ``rho`` on the fly for low-memory targets and is
       currently not covered by a dedicated unit test (it is covered
       transitively by the end-to-end KAT); minor follow-up.
   * - Per-iteration mask refresh sufficiency
     - whole rejection loop
     - C4
     - protected
     - ``T1-A`` shipped (``dsa.rs`` head-of-loop refresh block). Every
       polynomial of ``s1_hat_m``, ``s2_hat_m``, ``t0_hat_m`` is
       re-randomised via ``MaskedPoly::refresh`` at the start of every
       rejection iteration, before any operation on the shares, exactly
       the Hermelink §4 prescription. Output bytes unchanged (mask
       cancels in unmask, KAT 9/9 byte-identical). Cost unchanged versus
       the previous end-of-cs/ct refresh placement (same number of
       refresh calls per iteration, same ScaRng-byte consumption).
   * - ``w_m[i].unmask()`` → ``w_tmp[i]``
     - ``dsa.rs:727``
     - C1
     - residual
     - ``w = A·y`` is intentionally produced as a plaintext aggregate;
       it is the public input to ``HighBits`` whose output is emitted in
       the signature. ``T1-A`` shipped: the upstream ``y_m`` /
       ``s1_hat_m`` / ``s2_hat_m`` shares are refreshed at the start of
       every rejection iteration, killing cross-iteration higher-order
       DPA aggregation. The remaining residual is the
       plaintext-aggregate floor (the unmask itself); full closure would
       require a HighBits-on-shares gadget (Tier-3 candidate).
   * - ``y_m[r].unmask()`` → ``y_out[r]``
     - ``dsa.rs:735``
     - C1
     - residual
     - ``y`` is consumed in plaintext for the time-domain
       ``z = y + c·s1`` formation. ``y_m`` is resampled fresh every
       iteration (``sample_expand_mask``), so cross-iteration
       aggregation is already neutralised at the source; no T1-A
       contribution is needed here. The remaining residual is the
       plaintext-aggregate floor (the unmask itself).
   * - ``cs1[i] = masked_pointwise_mul_public(s1_hat_m[i], c_hat).unmask()``
     - ``dsa.rs:1000``
     - C1
     - residual
     - First-order safe (masked × public is linear, the unmasking
       happens after the multiply). ``T1-A`` shipped: ``s1_hat_m`` is
       refreshed at the head of every rejection iteration, so the
       multi-iteration higher-order DPA window is closed. The remaining
       residual is the plaintext-aggregate floor; closure via
       share-domain multiply-and-accumulate is a Tier-3 candidate.
   * - ``cs2[i] = masked_pointwise_mul_public(s2_hat_m[i], c_hat).unmask()``
     - ``dsa.rs:1045``
     - C1
     - residual
     - Same as ``cs1``. ``s2_hat_m`` is covered by the same T1-A
       per-iteration refresh; ``s2`` shares feed into
       ``r0 = LowBits(w - c·s2)``, whose threshold check leaks at
       outcome level. Tier-2 candidate (CT norm-on-shares).
   * - ``ct0[i] = masked_pointwise_mul_public(t0_hat_m[i], c_hat).unmask()``
     - ``dsa.rs:1128``
     - C1
     - residual
     - Same as ``cs1``/``cs2``. ``t0_hat_m`` is covered by the same T1-A
       per-iteration refresh; ``t0`` shares feed into hint generation,
       whose count and threshold leak at outcome level. Tier-3 candidate
       (share-domain MakeHint).
   * - ``decompose::high_bits_vec(&w_tmp, …)``
     - ``dsa.rs:816``
     - C2
     - residual
     - HighBits is on the unmasked ``w``; the output is part of the
       public signature footprint, so the leak is on ``y`` only via the
       upstream chain. Hardening would require a HighBits-on-shares
       gadget. Tier-3 candidate (not on the current roadmap).
   * - ``decompose::low_bits(wbuf[j], γ₂)`` (per-coef)
     - ``dsa.rs:935``
     - C2
     - residual
     - Inner-loop low-bits extraction inside Phase 2; data-dependent on
       unmasked ``w - cs2``. Same Tier-3 candidate as the
       ``high_bits_vec`` row.
   * - ``decompose::low_bits_vec(&w_minus_cs2, …)``
     - ``dsa.rs:1094``
     - C2
     - residual
     - Vector variant feeding ``r0`` for the norm check; same class.
   * - ``decompose::make_hint(mod_q(-tmp[j]), wbuf[j] + tmp[j], γ₂)``
     - ``dsa.rs:961``
     - C3
     - residual
     - Per-coefficient hint generation; observation of the hint bit
       pattern is the Hermelink §3 leak. Tier-3 candidate (share-domain
       MakeHint).
   * - ``decompose::make_hint_vec(&neg_ct0, &w_cs2_ct0, γ₂, k)``
     - ``dsa.rs:1179``
     - C3
     - residual
     - Vector variant that returns ``(h, num_ones)``; ``num_ones > ω``
       drives a rejection branch whose timing is already closed by
       ``sca-ct-rejection``, but the C3 information leak on the count
       itself is not. Tier-3 candidate.
   * - ``check_norm_vec(&z, γ₁ − β, l)``
     - ``dsa.rs:895, 1100, 1111``
     - C1/C2
     - partial
     - The classic infinity-norm check on the unmasked ``z`` aggregate.
       The early-abort timing leak is closed by ``sca-ct-rejection``
       (every iteration computes every intermediate, and a single
       branch-free decision is taken). The information-theoretic leak
       (the *outcome* of the check across many signatures) is the
       residual C1/C2 contribution. Tier-2 candidate (CT norm-check on
       shares).
   * - ``check_norm_vec(&r0, γ₂ − β, k)``
     - ``dsa.rs:1104, 1112``
     - C1/C2
     - partial
     - Same as the ``z`` norm check. ``r0 = LowBits(w − c·s2)`` is
       data-dependent on both unmasked aggregates; ``sca-ct-rejection``
       closes the timing leak; the outcome leak is the residual.
   * - ``check_norm_vec(&ct0, γ₂, k)``
     - ``dsa.rs:1146, 1153``
     - C1/C2
     - partial
     - Same posture. Used to decide whether ``ct0`` is safe to commit to
       hint generation.

Status legend
-------------

* **protected**: the leak class is *closed* for this gadget given the
  threat model in :doc:`../threat_model`. Either the operation is linear
  (so it preserves the additive sharing invariant), or it operates on
  values that are already public per the FIPS-204 specification.
* **residual**: the leak class is *known* and *acknowledged*. The
  unmasked aggregate is short-lived (immediately re-masked, refreshed,
  or zeroized through ``MaskedPoly::zeroize`` /
  ``silentops::ct_zeroize`` after use). The practical
  DPA-cost-to-recover-key remains high but is not eliminated. A
  follow-up roadmap item that *would* close the class is named in the
  matrix row.
* **partial**: a complementary defence is already in place (e.g.
  ``sca-ct-rejection`` removes the early-abort timing leak on norm
  checks) but the information-theoretic class is not fully closed.

Summary work-list
-----------------

The audit originally surfaced three categories of follow-up; ``T1-A``
has since shipped (see the updated matrix rows). The three categories,
with their current status, are:

1. ✅ ``T1-A``, **per-iteration mask refresh** (**shipped**).
   ``s1_hat_m`` / ``s2_hat_m`` / ``t0_hat_m`` are refreshed at the head
   of every rejection iteration via the ``dsa.rs`` head-of-loop refresh
   block (Hermelink §4 prescription matched exactly). Closes the C4
   sufficiency gap; reduces every C1 residual on the unmask call sites
   to the plaintext-aggregate floor.
2. **Tier-2 candidate: CT norm check on shares** for
   ``check_norm_vec(&z, …)``, ``check_norm_vec(&r0, …)``,
   ``check_norm_vec(&ct0, …)``. Would close the C1/C2 partial rows by
   moving the norm comparison into the share domain via the silentops
   CT primitives. Cost: a redesign of ``check_norm_vec`` to take
   ``MaskedPoly`` instead of plaintext. Tracked as a future ticket, not
   yet on the roadmap.
3. **Tier-3 candidate: share-domain Decompose / MakeHint** for the five
   Decompose / MakeHint call sites listed in the matrix. Would close the
   C2 / C3 rows. Larger redesign (full share-domain
   ``HighBits``/``LowBits``/``MakeHint`` kernels). Tracked as a future
   ticket, not yet on the roadmap.

These three categories (one shipped, two future candidates) cover every
``residual`` and ``partial`` row in the matrix. No row is unaccounted
for, and no current ``protected`` row degrades under any of the
follow-ups.

References
----------

* :cite:`hermelink2025_weakest_link_masked_mldsa`: the audit reference
  itself.
* :cite:`damm2025_concealed_ilwe`: concealed-ILWE attack on
  partially-masked Dilithium; motivates the same hardening vector on the
  ``masked_mat_vec_mul`` gadgets.
* :cite:`zhao2026_rejection_matters`: rejection-loop attack on the
  unmasked / hedged path; motivates the ``sca-ct-rejection`` posture
  cross-referenced in the partial rows of the audit matrix.
* :doc:`../countermeasures/ml_dsa`: the per-threat coverage matrix and
  the ``T1-A`` roadmap entry the audit cross-references.