ML-DSA — countermeasures
- FIPS spec:
- Crate path: quantica::ml_dsa
- Cargo features: ml-dsa (on by default); sca-protected (on by default, gates masking + shuffling); sca-masked-y (masked y pipeline, on in hardened builds); sca-ct-rejection (branchless rejection loop, on in hardened builds).
ML-DSA has the richest SCA threat surface of the three algorithms
— it mixes a non-deterministic rejection-sampling loop, several
secret polynomials used in linear combinations, and a Fiat–Shamir
challenge that exposes bit-level intermediates. This chapter lists
the countermeasures implemented in quantica::ml_dsa, indexed by
threat class, and the items still outstanding for the next
hardening round.
Threat classes reference: Threat model. Primitive reference: Shared side-channel primitives — silentops. Verification methodology: Verification methodology.
Coverage matrix
| Threat | Status | Countermeasure(s) |
|---|---|---|
| SPA / SEMA on secret NTT | implemented | Fisher–Yates shuffled NTT for |
| DPA on | implemented | Masked |
| DPA on | implemented | First-order arithmetic masking kept across the rejection loop ( |
| Timing on rejection loop | implemented | Compute all intermediates (cs1, z, cs2, r0, ct0, hint) every iteration, single branch-free accept/reject decision ( |
| Software / remote timing | partial (interim) | All conditional selections route through |
| DFA on norm checks | partial | CT rejection loop already double-checks norms before emission; explicit redundant signing planned (see Roadmap chapter of the README). |
| Template attacks | implemented | NTT shuffling destroys trace alignment; masking multiplies the profile cost. |
| Higher-order DPA via mask re-use | implemented ( | Per-iteration |
| Hermelink 2025/276 leakage map of masked-y gadgets | audit shipped ( | Information-theoretic audit pass over every masked gadget and unmask call site, classified against the Hermelink leak taxonomy (C1-C5) with a per-row follow-up tracker. See Hermelink 2025/276 audit pass on ml_dsa::masked. |
SPA / SEMA — Fisher-Yates shuffled NTT
Principle
Same idea as ML-KEM: draw a random permutation of the NTT butterfly
groups and of the butterflies within each group, execute in the
permuted order. The shuffle is applied to s1 (l polynomials),
s2 (k) and t0 (k) — the three secret vectors. The
public matrix A uses the classical NTT.
The permutations are drawn from a dedicated ScaRng seeded with
K ‖ rnd ‖ tr ‖ M' (SHAKE256), so a given signature uses a
reproducible but unpredictable-to-an-attacker order.
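The shuffling idea can be sketched as follows. This is a minimal illustration, not the crate's code: `ToyRng` is a hypothetical stand-in for ScaRng (a SplitMix64 generator, not cryptographically secure, whereas the real generator is seeded through SHAKE256), and `toy_permutation` and `for_each_shuffled` are names invented for this example. The rejection sampling in `below` is also not branch-free.

```rust
/// Hypothetical stand-in for ScaRng: SplitMix64. NOT cryptographically secure.
struct ToyRng(u64);

impl ToyRng {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform integer in 0..bound, using rejection sampling to avoid modulo bias.
    fn below(&mut self, bound: u64) -> u64 {
        // Largest multiple of `bound` that fits in a u64.
        let zone = u64::MAX - (u64::MAX % bound);
        loop {
            let r = self.next_u64();
            if r < zone {
                return r % bound;
            }
        }
    }
}

/// Fisher-Yates shuffle of the indices 0..n.
fn toy_permutation(n: usize, rng: &mut ToyRng) -> Vec<usize> {
    let mut perm: Vec<usize> = (0..n).collect();
    for i in (1..n).rev() {
        let j = rng.below(i as u64 + 1) as usize;
        perm.swap(i, j);
    }
    perm
}

/// Runs `op` once per index, visiting the indices in permuted order.
/// In a shuffled NTT, `op` would be one butterfly (or one butterfly group).
fn for_each_shuffled(perm: &[usize], mut op: impl FnMut(usize)) {
    for &i in perm {
        op(i);
    }
}
```

Because the butterflies of one NTT layer are independent of each other, the result does not depend on the order in which they run. Only the order seen on the power trace changes.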
Published basis
Code pointers
| Item | Location |
|---|---|
| Fisher-Yates permutation generator | |
| Shuffled NTT | |
| Call sites (Step 1 of | |
DPA — first-order masking of secret polynomials
Principle
Each secret polynomial (s1, s2, t0) is kept as a pair
(P_0, P_1) with P = P_0 + P_1 (mod q). Operations taking a
secret as operand (NTT, pointwise multiplication with public A,
matrix-vector multiplication) are rewritten on shares. The A·y step is
the most DPA-critical: it operates on the masked y and the public
A, yielding a share representation of w that is unmasked
only once in the accept/reject logic.
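The share representation can be illustrated on a single coefficient. The sketch below is an assumption-laden toy, not the crate's API: `Shared`, `mask`, `unmask`, `add` and `mul_public` are hypothetical names, the random value `r` is supplied by the caller, and the `%` reductions are not constant-time. It only shows why linear operations with public operands can run share by share.

```rust
/// ML-DSA modulus q = 2^23 - 2^13 + 1.
const Q: u64 = 8_380_417;

/// One secret coefficient P held as two arithmetic shares, P = p0 + p1 (mod q).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Shared {
    p0: u64,
    p1: u64,
}

/// Splits `secret` using the caller-supplied random value `r`.
fn mask(secret: u64, r: u64) -> Shared {
    let r = r % Q;
    Shared { p0: (secret % Q + Q - r) % Q, p1: r }
}

/// Recombines the shares; should be called as late as possible.
fn unmask(s: Shared) -> u64 {
    (s.p0 + s.p1) % Q
}

/// Addition of two shared values, performed share by share.
fn add(a: Shared, b: Shared) -> Shared {
    Shared { p0: (a.p0 + b.p0) % Q, p1: (a.p1 + b.p1) % Q }
}

/// Multiplication by a public constant, performed share by share.
fn mul_public(s: Shared, a: u64) -> Shared {
    let a = a % Q;
    Shared { p0: s.p0 * a % Q, p1: s.p1 * a % Q }
}
```

Neither share alone carries information about the secret as long as `r` is uniform in [0, q), which is why the secret never needs to be recombined between linear steps.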
Published basis
Code pointers
| Item | Location |
|---|---|
| | |
| Call sites for masked NTT on secrets | |
| Zeroization of masked polynomials | |
DPA on y — the sca-masked-y pipeline
Principle
The masking vector y is the main DPA target: the published
signature component z = y + c·s1 is a linear combination of y
and s1, so if y ever appears unmasked on the power trace,
averaging many signatures on equal message / equal c lets an
attacker recover s1 from the leaked y.
quantica samples y as two arithmetic shares directly from
SHAKE256, runs masked NTT on the shares, computes A·y with the
public matrix on the shares, and unmasks w = A·y only when the
rejection loop has committed to publishing it — exactly the
construction of [CGerardL+24].
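The deferred unmasking of w = A·y can be sketched on a toy instance. The assumptions are significant: matrix and vector entries are plain integers mod q here, whereas in ML-DSA they are polynomials in the NTT domain, and `mat_vec`, `masked_mat_vec` and `unmask` are hypothetical names chosen for this example. The sketch does not cover how the shares of y are sampled.

```rust
/// ML-DSA modulus q = 2^23 - 2^13 + 1.
const Q: u64 = 8_380_417;

/// w = A·y mod q for a public matrix `a` given as a list of rows.
/// All entries are assumed to be already reduced mod q.
fn mat_vec(a: &[Vec<u64>], y: &[u64]) -> Vec<u64> {
    a.iter()
        .map(|row| {
            row.iter()
                .zip(y)
                .fold(0u64, |acc, (&aij, &yj)| (acc + aij * yj) % Q)
        })
        .collect()
}

/// A·y computed independently on each share of y; the shares never meet.
fn masked_mat_vec(a: &[Vec<u64>], y0: &[u64], y1: &[u64]) -> (Vec<u64>, Vec<u64>) {
    (mat_vec(a, y0), mat_vec(a, y1))
}

/// Recombines the two shares of w; to be called only at the commit point.
fn unmask(w0: &[u64], w1: &[u64]) -> Vec<u64> {
    w0.iter().zip(w1).map(|(&x, &y)| (x + y) % Q).collect()
}
```

Since A is public and the product is linear in y, A·y0 + A·y1 = A·(y0 + y1) (mod q), so the single call to `unmask` yields the same w as the unprotected computation.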
Published basis
- [CGerardL+24]: canonical high-order masked generation of the masking vector and masked rejection sampling gadget (TCHES 2024.4). Construction followed by our implementation.
- [BelaidBD+26]: SUCRE (TCHES 2026.1), a shuffle-and-unmask alternative that delivers 4–6× speedup on the same security claim. Candidate for T4-A migration evaluation (see below).
Known attacks against the construction
- [HNP25] (CRYPTO 2025): information-theoretic leakage map of masked-y implementations at first, second, and higher orders. Not a break of the construction itself, but an auditor’s checklist for the gadgets instantiating it. T1-B tracks the pass-through of this checklist on our code.
- [DFM+25] (ASIACRYPT 2025): introduces concealed ILWE with Huber/Cauchy regression; breaks masked-Dilithium implementations that leak up to 90% of the shares. Motivates strong care on the masked-NTT and masked-A·y gadgets.
- [ZCQ+26] (DATE 2026): non-profiling attack on the unmasked / hedged rejection loop (96 traces for c, ~300 traces for the key on a Cortex-M4 target). Primary motivator for the ``sca-ct-rejection`` feature below.
Code pointers
| Item | Location |
|---|---|
| Masked | |
| Masked | |
| Call site in the sign loop | |
| Tests | |
Timing — constant-time rejection loop
Principle
The FIPS 204 rejection loop as written branches out of the iteration as soon as the norm check fails:
    repeat
        compute w, w1, c, z, r0
        if norm(z) >= gamma1 - beta then restart
        if norm(r0) >= gamma2 - beta then restart
        ...
    until accepted
A timing observer can therefore tell at which test the candidate
was rejected, which leaks information about z and r0 — and
thereby about s1, s2 [LWW+25].
The sca-ct-rejection feature rewrites the loop so that every
iteration computes all intermediates (cs1, z, cs2,
r0, ct0, hint) and accumulates a single branch-free
accept flag that is consulted only at the very end of the iteration.
The loop still repeats until a candidate is accepted, but an observer
of a single iteration cannot tell which norm check caused the rejection.
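A branch-free accept flag of this kind can be sketched as below. This is an illustration under assumptions, not the crate's code: the helper names (`ct_lt`, `ct_abs`, `ct_norm_ok`, `accept_flag`) are hypothetical, only the two norm checks on z and r0 are shown, coefficients are assumed to be centered and far below 2^31, and a compiler is still free to reintroduce branches, so real code needs the verification described under Verification methodology.

```rust
/// Returns 1 if a < b, else 0, without branching. Requires a, b < 2^31.
fn ct_lt(a: u32, b: u32) -> u32 {
    // If a < b the subtraction wraps and sets the top bit.
    a.wrapping_sub(b) >> 31
}

/// Absolute value without branching. Requires x != i32::MIN.
fn ct_abs(x: i32) -> u32 {
    let m = x >> 31; // all ones if x is negative, zero otherwise
    (x ^ m).wrapping_sub(m) as u32
}

/// Returns 1 if every |coeff| < bound, else 0.
/// Always visits every coefficient; there is no early exit.
fn ct_norm_ok(coeffs: &[i32], bound: u32) -> u32 {
    let mut ok = 1u32;
    for &c in coeffs {
        ok &= ct_lt(ct_abs(c), bound);
    }
    ok
}

/// Evaluates both norm checks and folds them into a single flag.
/// The caller consults the flag once, at the very end of the iteration.
fn accept_flag(z: &[i32], r0: &[i32], gamma1: u32, gamma2: u32, beta: u32) -> u32 {
    let z_ok = ct_norm_ok(z, gamma1 - beta);
    let r0_ok = ct_norm_ok(r0, gamma2 - beta);
    z_ok & r0_ok
}
```

Both checks always run to completion and are combined with a bitwise AND, so the amount of work per iteration does not depend on which check failed.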
Published basis
- [LWW+25]: the initial timing-leak analysis that motivates the countermeasure.
- [ZCQ+26] (DATE 2026): a non-profiling public-template attack recovering c in 96 traces and the signing key in ~300 traces on a Cortex-M4 target in hedged / unprotected mode. The sca-ct-rejection feature is the intended answer to this attack class.
Code pointers
| Item | Location |
|---|---|
| Rejection loop with branch-free accept | |
| Norm-check helpers returning bit flags | |
Template attacks
Template attacks against ML-DSA rely on profile-matching the NTT
coefficients of s1, s2, t0 or the y sampling. The
defences already described — masking + shuffling — destroy the
inter-trace alignment a template attack depends on, and multiply
the profile size the attacker has to maintain.
See Threat model for cost estimates; see [Chh26] for a practical profile against an unprotected Cortex-M0 implementation and the required trace counts once shuffling is in place.
Planned hardening
The following items are scheduled for the next hardening round; each
closes one of the tools/ctgrind.supp entries documented under
Verification methodology.
- T2-A: explicit ct_grind::unpoison after the algorithmic unmasking point of w1, h, z. Lets ctgrind re-verify with zero suppressions on the decompose::high_bits_vec, encode::w1_encode, decompose::make_hint_vec, encode::sig_encode paths.
- T2-B: branch-free generate_permutation (Feistel- or Floyd-based) to close the suppression on shuffle::generate_permutation.
- T1-A (A3): refresh the shares of s1, s2, t0 at the start of every rejection iteration to defeat higher-order DPA variants that combine two iterations’ leakage. Shipped. The dsa.rs rejection loop opens with a #[cfg(feature = "sca-protected")] block that calls MaskedPoly::refresh on every polynomial of s1_hat_m, s2_hat_m, t0_hat_m before any operation on the shares, matching the Hermelink [HNP25] §4 prescription exactly. Output bytes are byte-identical to the pre-T1-A baseline (mask cancels in unmask); cost is unchanged versus the previous end-of-cs/ct refresh placement (same number of ScaRng bytes consumed per iteration). Audit row flipped to protected in Hermelink 2025/276 audit pass on ml_dsa::masked.
- T2-C: documentation traceability. After A/B/C land, the historical suppression file becomes a “resolved-findings” annex in Verification methodology.
- T4-A: SUCRE migration evaluation ([BelaidBD+26]). Benchmark sca-masked-y against SUCRE’s shuffle-and-unmask gadget on our target platforms (Cortex-M4 class). Migrate the masked rejection path if the published 4–6× speedup holds on-device and the transient memory footprint fits the embedded budget. The existing masked-y pipeline remains the fallback if the speedup is swallowed by our other constraints.
- T1-B: Hermelink audit pass on masked.rs ([HNP25]). Shipped. The information-theoretic leakage map of CRYPTO 2025 has been applied to every gadget of quantica/src/ml_dsa/masked.rs and every unmask call site of the rejection loop in quantica/src/ml_dsa/dsa.rs::sign_internal; each row is classified as protected, partial, or acknowledged residual risk, with a per-row follow-up pointer. The full audit annex is Hermelink 2025/276 audit pass on ml_dsa::masked. The primary follow-up surfaced by the audit, T1-A (per-iteration share refresh), has since shipped, closing the C4 sufficiency row and reducing the C1 residuals to the plaintext-aggregate floor. Remaining open follow-ups (Tier-2 CT norm-on-shares and Tier-3 share-domain Decompose/MakeHint) are tracked in the audit’s work-list.
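The share refresh behind T1-A can be illustrated on a single coefficient. The sketch assumes a two-share arithmetic masking mod q and a caller-supplied random value `r`; `refresh` and `unmask` here are hypothetical scalar analogues written for this example, not the signature of MaskedPoly::refresh.

```rust
/// ML-DSA modulus q = 2^23 - 2^13 + 1.
const Q: u32 = 8_380_417;

/// Re-randomizes a two-share value: adds `r` to one share and subtracts it
/// from the other, so the sum mod q is unchanged. Shares must be < q.
fn refresh(shares: (u32, u32), r: u32) -> (u32, u32) {
    let r = r % Q;
    ((shares.0 + r) % Q, (shares.1 + Q - r) % Q)
}

/// Recombines the two shares.
fn unmask(shares: (u32, u32)) -> u32 {
    (shares.0 + shares.1) % Q
}
```

The fresh mask cancels when the shares are recombined, which is consistent with the signature output staying byte-identical, while the share values observed in two different iterations are no longer correlated through a common mask.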