Hermelink 2025/276 audit pass on `ml_dsa::masked`

Status:: Shipped (audit only — code work-list cross-referenced below).
Scope:: quantica/src/ml_dsa/masked.rs plus the rejection-loop callers in quantica/src/ml_dsa/dsa.rs.
Reference:: [HNP25] (CRYPTO 2025, IACR ePrint 2025/276).
Audit method:: Static walk of every public masked gadget and every unmask() call site, classified against the Hermelink leak taxonomy.

Why this audit 

[HNP25] does not break ML-DSA or its masked variant. It provides an information-theoretic enumeration of the operations inside a masked Dilithium implementation that leak even when each individual gadget is masked, because the aggregate computation recombines shares unsafely. The dominant leak class is the rejection-loop recombination: shares of y, s1, s2, t0 are unmasked into plaintext aggregates (w, z, cs1, cs2, ct0), which then drive data-dependent non-linear operations (Decompose, MakeHint, norm checks).

The paper serves as the auditor's checklist for any masked-ML-DSA claim. T1-B of the krypteia Tier-1 roadmap walks that checklist over our code and records, gadget by gadget, whether the leak class is closed, acknowledged as residual risk, or scheduled for reinforcement. This file is the resulting evidence piece: it is readable as a stand-alone evaluation surface and traceable down to a file:line for every claim.

Scope and out-of-scope 

In scope — the masked-arithmetic surface area of ML-DSA:

the masked-poly type and its primitives (MaskedPoly in quantica/src/ml_dsa/masked.rs);
the masked algorithmic kernels reused inside sign_internal (masked NTT, masked × public, masked matrix-vector multiplication);
every unmask() call site inside the rejection loop of quantica/src/ml_dsa/dsa.rs::sign_internal, and every data-dependent operation consuming the resulting plaintext aggregate.

Out of scope — outside Hermelink’s target:

the unmasked NTT (ml_dsa::ntt), the samplers (ml_dsa::sample), and the rejection-orchestration tail (sign_internal after a successful rejection-loop exit), which operate on quantities already public per FIPS-204 or on the final signature bytes;
ML-KEM masked code and SLH-DSA: not the paper’s target. The same audit-annex pattern can be reused there if needed in a future tier.

Leak-class taxonomy (Hermelink 2025/276)

We use a five-class shorthand that maps the paper’s section structure to our code:

C1 — Recombination of secret shares. Every unmask() call produces a plaintext aggregate. Even when the upstream operations are masked, the resulting plaintext is the natural DPA / SPA target.
C2 — Decompose / HighBits / LowBits on unmasked aggregates. Once the plaintext is on the stack, any non-linear bit-extraction with data-dependent control flow becomes a CPA target (the resulting HighBits polynomial directly enters the signature).
C3 — Hint compression. MakeHint is a per-coefficient comparison whose count is bounded by ω; the count itself and the per-coefficient outcome leak the shape of w - c·s2 modulo 2·γ₂.
C4 — Mask refresh sufficiency. Higher-order DPA aggregates traces across rejection iterations; a missing refresh between iterations collapses the higher-order assumption.
C5 — Sampler-side leakage on `y`. ExpandMask is the canonical DPA target on SK.prf; the paper requires that y never materialises in plaintext on the stack between sampling and consumption.
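To make the C1 and C4 vocabulary concrete, the following is a minimal sketch of two-share arithmetic masking on a single coefficient modulo q = 8380417. It is an illustration under stated assumptions, not the `MaskedPoly` implementation: the names `Shares`, `ToyRng`, `mask`, `unmask` and `refresh` are hypothetical, the code works on one coefficient rather than a polynomial, and the toy generator is not cryptographically secure.

```rust
/// ML-DSA modulus q = 2^23 - 2^13 + 1 (FIPS 204).
const Q: u32 = 8_380_417;

/// Toy xorshift generator, for illustration only. It is NOT
/// cryptographically secure; real masking needs a vetted CSPRNG.
struct ToyRng(u64);

impl ToyRng {
    fn next_mod_q(&mut self) -> u32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 % Q as u64) as u32
    }
}

/// Hypothetical two-share arithmetic masking of one coefficient:
/// the protected value is (s0 + s1) mod q.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Shares {
    s0: u32,
    s1: u32,
}

/// Split x into two shares: s1 is a mask value r, s0 = x - r mod q.
fn mask(x: u32, rng: &mut ToyRng) -> Shares {
    let r = rng.next_mod_q();
    Shares { s0: (x % Q + Q - r) % Q, s1: r }
}

/// Recombine the shares. This is the C1 event: the result is a
/// plaintext aggregate.
fn unmask(m: Shares) -> u32 {
    ((m.s0 as u64 + m.s1 as u64) % Q as u64) as u32
}

/// Re-randomise both shares without changing the protected value:
/// add r to one share and subtract r from the other.
fn refresh(m: Shares, rng: &mut ToyRng) -> Shares {
    let r = rng.next_mod_q();
    Shares { s0: (m.s0 + r) % Q, s1: (m.s1 + Q - r) % Q }
}
```

In this sketch, calling `refresh` at the head of every loop iteration corresponds to the C4 countermeasure, while every call to `unmask` is a C1 recombination point.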

Per-gadget audit matrix 

ML-DSA masked gadgets vs Hermelink 2025/276 leak classes
Gadget / call site	File:line	Class	Status	Rationale and follow-up
`MaskedPoly::sample_expand_mask`	`masked.rs:158-218`	C5	protected	DPA-safe sampling: `y` is produced as two arithmetic shares directly from SHAKE256, never reconstructed in plaintext. Verified by `masked_expand_mask_matches_unmasked_expand_mask` test.
`MaskedPoly::mask` / `unmask` (primitive round trip)	`masked.rs:229-269`	C1 (primitive)	protected	Correctness verified by `mask_unmask_roundtrip`; the primitive itself is sound. Per-call-site analysis below for the use of `unmask()` inside the rejection loop.
`MaskedPoly::refresh`	`masked.rs:284-299`	C4 (primitive)	protected	Re-randomises both shares without changing the unmasked value; verified by `refresh_preserves_unmasked_value`. The primitive is sound; the C4 sufficiency question (refresh per iteration) is addressed in the row below.
`masked_ntt` / `masked_ntt_inv`	`masked.rs:314-323`	— (linear)	protected	NTT is linear over the prime field; applying it to each share preserves the additive masking invariant. Verified by `masked_ntt_matches_regular_ntt`.
`masked_pointwise_mul_public`	`masked.rs:332-337`	— (linear × public)	protected	Multiplication by the public challenge polynomial `c` is linear in the secret share; first-order security preserved. Verified by `masked_pointwise_mul_public_matches_unmasked`.
`masked_mat_vec_mul` / `masked_mat_vec_mul_lazy`	`masked.rs:351-395`	— (linear × public)	protected	Same linearity argument: `A` is public, and each share is matrix-multiplied independently. The lazy variant recomputes `a_hat` from `rho` on the fly for low-memory targets. It currently has no dedicated unit test (it is covered only transitively by the end-to-end KAT); adding one is a minor follow-up.
Per-iteration mask refresh sufficiency	whole rejection loop	C4	protected	`T1-A` shipped (`dsa.rs` head-of-loop refresh block). Every polynomial of `s1_hat_m`, `s2_hat_m`, `t0_hat_m` is re-randomised via `MaskedPoly::refresh` at the start of every rejection iteration, before any operation on the shares — exactly the Hermelink §4 prescription. Output bytes unchanged (mask cancels in unmask, KAT 9/9 byte-identical). Cost unchanged versus the previous end-of-cs/ct refresh placement (same number of refresh calls per iteration, same ScaRng-byte consumption).
`w_m[i].unmask()` → `w_tmp[i]`	`dsa.rs:727`	C1	residual	`w = A·y` is intentionally produced as a plaintext aggregate; it is the public input to `HighBits` whose output is emitted in the signature. `T1-A` shipped — the upstream `y_m` / `s1_hat_m` / `s2_hat_m` shares are refreshed at the start of every rejection iteration, killing cross-iteration higher-order DPA aggregation. The remaining residual is the plaintext-aggregate floor (the unmask itself); full closure would require a HighBits-on-shares gadget — Tier-3 candidate.
`y_m[r].unmask()` → `y_out[r]`	`dsa.rs:735`	C1	residual	`y` is consumed in plaintext for the time-domain `z = y + c·s1` formation. `y_m` is resampled fresh every iteration (`sample_expand_mask`), so cross-iteration aggregation is already neutralised at the source — no T1-A contribution needed here. Remaining residual is the plaintext-aggregate floor (the unmask itself).
`cs1[i] = masked_pointwise_mul_public(s1_hat_m[i], c_hat).unmask()`	`dsa.rs:1000`	C1	residual	First-order safe (masked × public is linear, the unmasking happens after the multiply). `T1-A` shipped — `s1_hat_m` is refreshed at the head of every rejection iteration, so the multi-iteration higher-order DPA window is closed. Remaining residual is the plaintext-aggregate floor; closure via share-domain multiply-and-accumulate is a Tier-3 candidate.
`cs2[i] = masked_pointwise_mul_public(s2_hat_m[i], c_hat).unmask()`	`dsa.rs:1045`	C1	residual	Same as `cs1`. `s2_hat_m` covered by the same T1-A per-iteration refresh; `s2` shares feed into `r0 = LowBits(w - c·s2)` whose threshold check leaks outcome-level — Tier-2 candidate (CT norm-on-shares).
`ct0[i] = masked_pointwise_mul_public(t0_hat_m[i], c_hat).unmask()`	`dsa.rs:1128`	C1	residual	Same as `cs1`/`cs2`. `t0_hat_m` covered by the same T1-A per-iteration refresh; `t0` shares feed into hint generation, whose count and threshold leak outcome-level — Tier-3 candidate (share-domain MakeHint).
`decompose::high_bits_vec(&w_tmp, …)`	`dsa.rs:816`	C2	residual	HighBits is on the unmasked `w`; the output is part of the public signature footprint, so the leak is on `y` only via the upstream chain. Hardening would require a HighBits-on-shares gadget. Tier-3 candidate (not on the current roadmap).
`decompose::low_bits(wbuf[j], γ₂)` (per-coef)	`dsa.rs:935`	C2	residual	Inner-loop low-bits extraction inside Phase 2; data-dependent on unmasked `w - cs2`. Same Tier-3 candidate as above.
`decompose::low_bits_vec(&w_minus_cs2, …)`	`dsa.rs:1094`	C2	residual	Vector variant feeding `r0` for the norm check; same class.
`decompose::make_hint(mod_q(-tmp[j]), wbuf[j] + tmp[j], γ₂)`	`dsa.rs:961`	C3	residual	Per-coefficient hint generation; observation of the hint bit pattern is the Hermelink §3 leak. Tier-3 candidate (share-domain MakeHint).
`decompose::make_hint_vec(&neg_ct0, &w_cs2_ct0, γ₂, k)`	`dsa.rs:1179`	C3	residual	Vector variant that returns `(h, num_ones)`; `num_ones > ω` drives a rejection branch whose timing is closed by `sca-ct-rejection` already, but the C3 information leak on the count itself is not. Tier-3 candidate.
`check_norm_vec(&z, γ₁ − β, l)`	`dsa.rs:895, 1100, 1111`	C1/C2	partial	The classic infinity-norm check on the unmasked `z` aggregate. The early-abort timing leak is closed by `sca-ct-rejection` (every iteration computes every intermediate, and a single branch-free decision is taken). The information-theoretic leak (the outcome of the check across many signatures) is the residual C1/C2 contribution. Tier-2 candidate (CT norm-check on shares).
`check_norm_vec(&r0, γ₂ − β, k)`	`dsa.rs:1104, 1112`	C1/C2	partial	Same as `z` norm check. `r0 = LowBits(w − c·s2)` is data-dependent on both unmasked aggregates; `sca-ct-rejection` closes the timing leak; the outcome-leak is the residual.
`check_norm_vec(&ct0, γ₂, k)`	`dsa.rs:1146, 1153`	C1/C2	partial	Same posture. Used to decide whether `ct0` is safe to commit to hint generation.
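The contrast between the "linear × public" rows and the C2 rows can be sketched on a single coefficient. The block below is a hedged illustration, not code from `masked.rs` or `decompose`: the function names are hypothetical, `decompose` follows the Decompose definition of FIPS 204 with γ₂ = (q − 1)/32 (the ML-DSA-65 / ML-DSA-87 value), and a scalar `c` stands in for one NTT-domain coefficient of the public challenge.

```rust
const Q: i64 = 8_380_417;
/// gamma2 = (q - 1) / 32, the ML-DSA-65 / ML-DSA-87 parameter.
const GAMMA2: i64 = (Q - 1) / 32;

/// Decompose as defined in FIPS 204: r = r1 * (2 * gamma2) + r0 (mod q),
/// with r0 in the centred range (-gamma2, gamma2].
fn decompose(r: i64) -> (i64, i64) {
    let r_plus = r.rem_euclid(Q);
    let alpha = 2 * GAMMA2;
    // Centred remainder modulo alpha.
    let mut r0 = r_plus % alpha;
    if r0 > GAMMA2 {
        r0 -= alpha;
    }
    // Corner case: the top bucket wraps around to r1 = 0.
    if r_plus - r0 == Q - 1 {
        (0, r0 - 1)
    } else {
        ((r_plus - r0) / alpha, r0)
    }
}

fn high_bits(r: i64) -> i64 {
    decompose(r).0
}

/// Multiplication of one share by a public value, reduced mod q.
fn mul_public(share: i64, c: i64) -> i64 {
    (share * c).rem_euclid(Q)
}

/// Returns (linear_ok, high_bits_ok) for shares s0 + s1 = w (mod q):
/// whether share-wise processing recombines to the plaintext result.
fn sharewise_agrees(s0: i64, s1: i64, c: i64) -> (bool, bool) {
    let w = (s0 + s1).rem_euclid(Q);
    let linear_ok = (mul_public(s0, c) + mul_public(s1, c)) % Q == mul_public(w, c);
    let high_bits_ok = high_bits(s0) + high_bits(s1) == high_bits(w);
    (linear_ok, high_bits_ok)
}
```

With shares 235000 and 765000 of w = 1000000, the share-wise product by a public value recombines correctly, but the share-wise HighBits sum is 1 while HighBits(w) is 2. This rounding mismatch is why the Decompose call sites operate on unmasked aggregates until a dedicated share-domain gadget exists.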

Status legend 

protected — the leak class is closed for this gadget given the threat model in the Threat model section. Either the operation is linear (so it preserves the additive sharing invariant), or it operates on values that are already public per the FIPS-204 specification.
residual — the leak class is known and acknowledged. The unmasked aggregate is short-lived (immediately re-masked, refreshed, or zeroized through MaskedPoly::zeroize / silentops::ct_zeroize after use). The practical DPA-cost-to-recover-key remains high but is not eliminated. A follow-up Roadmap item that would close the class is named in the matrix row.
partial — a complementary defence is already in place (e.g. sca-ct-rejection removes the early-abort timing leak on norm checks) but the information-theoretic class is not fully closed.
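The "partial" posture on the norm checks can be illustrated with a branch-free infinity-norm test. This is a sketch under assumptions, not the `sca-ct-rejection` or `check_norm_vec` code: the function name `norm_exceeds` is hypothetical, coefficients are assumed to be already in centred form (absolute value well below 2^30), and source-level branch-freedom does not by itself guarantee constant-time machine code after compilation.

```rust
/// Returns 1 if any coefficient has absolute value >= bound, else 0.
/// Every coefficient is visited and there is no early exit, so the
/// number of loop iterations does not depend on the data.
/// Assumes centred coefficients and 0 < bound < 2^30.
fn norm_exceeds(coeffs: &[i32], bound: i32) -> u32 {
    let mut flag: u32 = 0;
    for &c in coeffs {
        // sign is -1 (all ones) for negative c and 0 otherwise.
        let sign = c >> 31;
        // Branch-free absolute value: flip the bits and add one when negative.
        let abs = (c ^ sign) - sign;
        // (bound - 1 - abs) is negative exactly when abs >= bound;
        // the arithmetic shift moves that sign bit into bit 0.
        let exceeded = ((bound - 1 - abs) >> 31) as u32 & 1;
        // Accumulate without branching on the per-coefficient outcome.
        flag |= exceeded;
    }
    flag
}
```

The timing of this loop is independent of which coefficient fails, which is the early-abort leak the legend refers to. The returned flag is still a function of the unmasked aggregate, so the outcome-level leak remains: that is the part a share-domain norm check would address.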

Summary work-list 

The audit originally surfaced three categories of follow-up. T1-A has since shipped (see the updated rows above); the other two remain open candidates:

✅ `T1-A` — per-iteration mask refresh — shipped. s1_hat_m / s2_hat_m / t0_hat_m are refreshed at the head of every rejection iteration via the dsa.rs head-of-loop refresh block (Hermelink §4 prescription matched exactly). Closes the C4 sufficiency gap; reduces every C1 residual on the unmask call sites to the plaintext-aggregate floor.
Tier-2 candidate — CT norm check on shares for check_norm_vec(&z, …), check_norm_vec(&r0, …), check_norm_vec(&ct0, …). Would close the C1/C2 partial rows by moving the norm comparison into the share domain via the silentops CT primitives. Cost: a redesign of check_norm_vec to take MaskedPoly instead of plaintext. Tracked as a future ticket, not yet on the Roadmap.
Tier-3 candidate — share-domain Decompose / MakeHint for the five Decompose / MakeHint call sites listed above. Would close the C2 / C3 rows. Larger redesign (full share-domain HighBits/LowBits/MakeHint kernels). Tracked as a future ticket, not yet on the Roadmap.

These three categories (one shipped, two future candidates) cover every residual and partial row in the matrix. No row is unaccounted for, and no row currently marked protected degrades under any of the follow-ups.

References 

[HNP25] — the audit reference itself.
[DFM+25] — concealed-ILWE attack on partially-masked Dilithium; motivates the same hardening vector on the masked_mat_vec_mul gadgets.
[ZCQ+26] — rejection-loop attack on the unmasked / hedged path; motivates the sca-ct-rejection posture cross-referenced in the partial rows above.
ML-DSA — countermeasures — the per-threat coverage matrix and the T1-A roadmap entry the audit cross-references.
