ML-KEM — countermeasures

FIPS spec:: [NationalIoSaTechnology24a]
Crate path:: quantica::ml_kem
Cargo feature:: ml-kem (on by default); sca-protected (on by default) gates the masked / shuffled variants.

This chapter lists the side-channel countermeasures implemented in the ML-KEM module, indexed by the threat class they address. Each entry cites the paper(s) the construction is drawn from and points to the exact function(s) that host the countermeasure.

The threat classes themselves are defined in Threat model.

Coverage matrix 

ML-KEM countermeasure / threat matrix
Threat	Status	Countermeasure(s)
SPA / SEMA	implemented	NTT butterfly shuffling (`shuffle::ntt_shuffled`) and branchless byte-level primitives (`silentops::ct_*`).
DPA / DEMA / CPA / CEMA	implemented	First-order arithmetic masking of the secret polynomial `s` (`masked::*`), masked KPKE decrypt (`kpke::decrypt_sca`), masked keygen (`kem::keygen_internal_sca`).
Template attacks on FO comparison	partial	Constant-time `ct_eq` + constant-time double-decaps select. Hardening of the FO comparison itself is on the roadmap (`T4-E`, see below and [HNPS24]).
Software / remote timing	implemented	All conditional selection goes through `silentops::ct_*`, which has an x86_64 asm backend. Verified with ctgrind (see Verification methodology).
DFA on FO re-encryption	implemented	Double computation in `decaps_internal_sca` with branchless `ct_select` fallback to `k_fault` derived from `z`.
DFA on `dk` tampering	implemented	Integrity check of `H(ek)` stored inline in `dk`.
SIFA on implicit-rejection branch	implemented	The accept / implicit-reject selection is itself a branchless `ct_select` on a constant-time equality result, so an ineffective fault on the comparison does not change the control flow.
NTT twiddle-factor masking	planned (T4-F)	Item `T4-F`: mask the twiddle factors used by the shuffled NTT itself ([XWT25]).

SPA / SEMA — NTT butterfly shuffling 

Principle 

The shuffled NTT permutes the order in which independent butterfly groups (and the butterflies within a group) are evaluated. Because butterflies at the same NTT level are independent, permuting them does not change the functional result; however it destroys the trace-alignment information that an SPA adversary needs to read off the secret polynomial in a single trace.
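The independence argument can be illustrated on a single NTT layer. The following is a minimal functional sketch, not the crate's `ntt_shuffled`: the function name, the flat butterfly indexing, and the `zetas` layout (one twiddle per group) are assumptions made for illustration, and the `rem_euclid` reduction is not claimed to be constant-time.

```rust
const Q: i32 = 3329; // ML-KEM modulus
const N: usize = 256; // polynomial length

/// One Cooley-Tukey layer whose N/2 butterflies are visited in `order`.
/// `order` must be a permutation of 0..N/2 and `zetas` must hold one
/// twiddle factor per butterfly group, i.e. (N/2)/len entries.
/// Hypothetical helper for illustration only.
fn ntt_layer_shuffled(
    f: &mut [i32; N],
    len: usize,
    zetas: &[i32],
    order: &[usize; N / 2],
) {
    for &b in order.iter() {
        // Map the flat butterfly index to (group, offset inside group).
        let group = b / len;
        let j = group * 2 * len + (b % len);
        // Each butterfly touches only f[j] and f[j + len], so no two
        // butterflies of the same layer share a coefficient.
        let t = (zetas[group] * f[j + len]).rem_euclid(Q);
        f[j + len] = (f[j] - t).rem_euclid(Q);
        f[j] = (f[j] + t).rem_euclid(Q);
    }
}
```

Because every butterfly of a layer reads and writes a disjoint pair of coefficients, any permutation passed as `order` yields the same output array; only the temporal order of the operations changes.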

Published basis 

[XWT25] — hardware-style butterfly shuffling for ML-KEM, the baseline we re-implemented in software using a Fisher–Yates permutation drawn from an SCA-dedicated CSPRNG.
[Saa23] — independent evaluation of the shuffling benefit on ML-KEM / ML-DSA.

Code pointers 

Item	Location
Fisher–Yates permutation generator	`quantica/src/ml_kem/shuffle.rs` `generate_permutation`
Shuffled forward NTT	`quantica/src/ml_kem/shuffle.rs` `ntt_shuffled`
Call sites (keygen, encaps, decaps)	`quantica/src/ml_kem/kem.rs` — `keygen_internal_sca` (line 78+), `decaps_internal_sca` (line 189+).

Implementation notes 

The SCA-RNG used to draw the permutation is independent from the caller’s entropy pool so that the shuffle cannot be replayed deterministically.
Rejection sampling on 16-bit RNG output removes the modular bias that would otherwise slightly skew the permutation distribution.
The shuffled NTT is only used on secret polynomials; public matrix expansion uses the classical NTT for performance.
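The rejection-sampling note above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the body of `generate_permutation`: the names `uniform_below` and `generate_permutation_sketch` are hypothetical, and the SCA-dedicated CSPRNG is abstracted as a caller-supplied closure returning 16-bit words.

```rust
/// Draw a uniform integer in 0..bound from 16-bit words.
/// Words at or above the largest multiple of `bound` below 2^16 are
/// rejected, which removes the bias a plain `r % bound` would have.
fn uniform_below(bound: u16, next_u16: &mut dyn FnMut() -> u16) -> u16 {
    assert!(bound > 0);
    let bound = bound as u32;
    let limit = (65536u32 / bound) * bound;
    loop {
        let r = next_u16() as u32;
        if r < limit {
            return (r % bound) as u16;
        }
    }
}

/// Fisher-Yates shuffle of 0..N driven by `next_u16`.
/// `next_u16` stands in for the SCA-dedicated RNG (assumption).
fn generate_permutation_sketch<const N: usize>(
    next_u16: &mut dyn FnMut() -> u16,
) -> [u16; N] {
    assert!(N <= u16::MAX as usize);
    let mut p = [0u16; N];
    for (i, x) in p.iter_mut().enumerate() {
        *x = i as u16;
    }
    // Walk from the last slot down, swapping with a uniform earlier slot.
    for i in (1..N).rev() {
        let j = uniform_below(i as u16 + 1, next_u16) as usize;
        p.swap(i, j);
    }
    p
}
```

The number of rejected words varies from call to call, but it depends only on the discarded RNG output, not on the index that is finally accepted.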

DPA / DEMA / CPA — first-order masking of the KPKE secret 

Principle 

The KPKE secret s is split into two uniformly random arithmetic shares s = s_0 + s_1 (mod q). Every operation consuming s (NTT, pointwise multiplication, addition) is rewritten to operate on the shares, with a fresh-randomness refresh inserted whenever the two shares are combined with a public value. The unmasked s never exists in memory outside the generating code path.

Because s never appears as a single value, the Hamming-weight / Hamming-distance hypothesis at the core of CPA becomes non-identifiable at first order: correlating the power trace with a guessed chunk of s produces no peak, since each share taken on its own is uniformly distributed and therefore independent of s.
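The share arithmetic described above can be shown on a single coefficient. The sketch below is an assumption-laden illustration, not the `MaskedPoly` API: the `Shares` type and the function names are hypothetical, randomness is passed in explicitly as `r` (assumed uniform in 0..q), and the `%` reductions are not claimed to be constant-time.

```rust
const Q: u16 = 3329; // ML-KEM modulus

/// Two arithmetic shares of one coefficient: value = s0 + s1 (mod Q).
#[derive(Clone, Copy)]
struct Shares {
    s0: u16,
    s1: u16,
}

/// Split `s` using a fresh mask `r`, assumed uniform in 0..Q.
fn split(s: u16, r: u16) -> Shares {
    Shares { s0: r, s1: (s + Q - r) % Q }
}

/// Adding a public constant only needs to touch one share.
fn add_public(x: Shares, c: u16) -> Shares {
    Shares { s0: (x.s0 + c) % Q, s1: x.s1 }
}

/// Multiplying by a public constant is linear, so it applies share-wise.
fn mul_public(x: Shares, c: u16) -> Shares {
    Shares {
        s0: ((x.s0 as u32 * c as u32) % Q as u32) as u16,
        s1: ((x.s1 as u32 * c as u32) % Q as u32) as u16,
    }
}

/// Re-randomise both shares without changing the shared value.
fn refresh(x: Shares, r: u16) -> Shares {
    Shares { s0: (x.s0 + r) % Q, s1: (x.s1 + Q - r) % Q }
}

/// Recombine the shares. Shown for testing only; a masked
/// implementation avoids doing this on secret data.
fn unmask(x: Shares) -> u16 {
    (x.s0 + x.s1) % Q
}
```

Every function except `unmask` handles the two shares separately, so no intermediate value depends on the unmasked coefficient.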

Published basis 

[Nko25] — the pointwise multiplication is the key DPA target; masking each share defeats the attack.
[AOP+25] — independent evaluation on OpenTitan-class hardware confirming the trace-count blow-up.

Code pointers 

Item	Location
Masked polynomial type + helpers	`quantica/src/ml_kem/masked.rs` (`MaskedPoly`, `masked_ntt`, `masked_ntt_inv`, `masked_multiply_public`, `masked_add_public`, `masked_add` / `masked_sub`, `masked_multiply_accumulate`)
Masked KPKE decrypt	`quantica/src/ml_kem/kpke.rs` `decrypt_sca`
Masked keygen	`quantica/src/ml_kem/kem.rs` `keygen_internal_sca`
Masked decaps (called twice for DFA)	`quantica/src/ml_kem/kem.rs` `decaps_internal_sca`

Implementation notes 

Masking is first-order. Higher-order attacks that combine two or more time samples can still recover the secret, at the cost of substantially more traces; tier 4 item T4-D (not yet scheduled) will extend the masking to a 3-share scheme (CC EAL4+-grade).
The mask-refresh points are positioned at the boundary of each Montgomery reduction so that leakage on the mask itself cannot combine with leakage on s through a single linear predictor.

Timing / cache-timing — constant-time primitives everywhere 

Principle 

Every conditional selection in the ML-KEM control path goes through silentops::ct_*. The public-parameter length check and the H(ek) integrity check use ct_eq to ensure timing does not leak which byte differs. The implicit-rejection select in decapsulation uses ct_select on the constant-time equality result.

See Shared side-channel primitives — silentops for the primitive reference and for the dangerous-fallback discussion on x86_64.
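For orientation, the two primitives can be approximated in portable Rust as below. This is a sketch only: `ct_eq_sketch` and `ct_select_sketch` are hypothetical names and signatures, not those of `silentops::ct_*`, and a compiler is free to reintroduce branches in code like this, so it carries no constant-time guarantee by itself.

```rust
/// Compare two equal-length byte strings without an early exit.
/// Returns 0xFF if they are equal and 0x00 otherwise.
/// The length is treated as public information.
fn ct_eq_sketch(a: &[u8], b: &[u8]) -> u8 {
    if a.len() != b.len() {
        return 0x00;
    }
    // OR together all byte differences; zero means "equal".
    let mut acc = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        acc |= x ^ y;
    }
    // 0 maps to 0xFFFF >> 8 = 0xFF; 1..=255 maps to 0x00.
    ((acc as u16).wrapping_sub(1) >> 8) as u8
}

/// Select `a` when `mask` is 0xFF and `b` when `mask` is 0x00,
/// using only bitwise operations on every byte.
fn ct_select_sketch(mask: u8, a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = (a[i] & mask) | (b[i] & !mask);
    }
    out
}
```

Returning a full-byte mask rather than a `bool` lets the comparison result feed the select directly, without converting it through a branch.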

Published basis 

[Koc96] — classical timing-attack theory.
[Lan10] — the verification methodology used here.

Code pointers 

Item	Location
`ct_eq` wrapper (`bool` return) + `ct_select` for 32-byte payloads	`quantica/src/ml_kem/kem.rs` lines 427 and 434.
Consumer call sites	`decaps` (`kem.rs` line 321) — dk integrity check + fault fallback select; `decaps_internal_sca` — implicit rejection select.

DFA — double computation + CT fault fallback 

Principle 

ML-KEM decapsulation is vulnerable to a classical DFA on the FO re-encryption step: if an attacker can make the re-encryption return a value close to the real ciphertext on one specific input, the implicit-rejection path is bypassed and the KEM acts as a decryption oracle [BDL97].

The countermeasure runs decapsulation twice, compares the two shared secrets with ct_eq, and if they differ (meaning a fault affected at least one of the runs) returns a third value k_fault derived from z and a domain separator via SHA3. k_fault is:

deterministic for a given (dk, c) so that a repeated faulted call returns the same value (prevents oracle-by-repetition);
distinct from both the legitimate FO output and the implicit-rejection output (so the attacker cannot distinguish “fault detected” from either correct branch);
selected branch-free via ct_select so no timing distinguisher exists between “fault” and “no fault”.

This last property was specifically added on 2026-04-20 after a ctgrind run flagged the previous if !results_match { return k_fault } branch as a secret-dependent control flow.
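The control flow after that change can be sketched as follows. This is a hedged illustration, not the body of `decaps`: the function name is hypothetical, the two decapsulation runs and the k_fault derivation are abstracted as caller-supplied closures, and the inline mask arithmetic stands in for the crate's `ct_eq` and `ct_select`.

```rust
/// Run decapsulation twice and return the shared secret only if both
/// runs agree; otherwise return `k_fault`. No branch depends on the
/// comparison result. Both closures are stand-ins (assumptions).
fn decaps_with_fault_fallback(
    decaps_once: &mut dyn FnMut() -> [u8; 32],
    derive_k_fault: &dyn Fn() -> [u8; 32],
) -> [u8; 32] {
    let k1 = decaps_once();
    let k2 = decaps_once();
    // Always derive the fallback, so the amount of work is the same
    // whether or not a fault occurred.
    let k_fault = derive_k_fault();

    // Accumulate the difference of the two results.
    let mut diff = 0u8;
    for i in 0..32 {
        diff |= k1[i] ^ k2[i];
    }
    // same = 0xFF when the runs match, 0x00 otherwise.
    let same = ((diff as u16).wrapping_sub(1) >> 8) as u8;

    // Branch-free select between the result and the fallback.
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = (k1[i] & same) | (k_fault[i] & !same);
    }
    out
}
```

Computing `k_fault` unconditionally costs one extra hash per call, but it keeps the instruction sequence identical on the faulted and unfaulted paths.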

Published basis 

[BDL97] — the classical DFA framework.
[Saa23] — ML-KEM-specific fault analysis catalogue, used to size the countermeasure.

Code pointers 

Item	Location
Double decapsulation + branch-free fault fallback	`quantica/src/ml_kem/kem.rs` `decaps` (FIPS 203 level API), especially the block after the two `decaps_internal_sca` calls.
`k_fault` derivation	Same file, inside `decaps`. `fault_input = z ‖ 0xFF` absorbed by `sha3::h`.
Fault-detect branch-free select	`quantica/src/ml_kem/kem.rs` `ct_select` (line 434) invoked from `decaps`.

DFA on `dk` — `H(ek)` integrity check 

Principle 

The decapsulation key dk contains an inline copy of ek and of H(ek) (FIPS 203 §7.3, layout dk_pke ‖ ek ‖ H(ek) ‖ z). A fault that alters dk in memory — for example a hot-carrier induced bit flip in flash — would undermine the FO security argument because the attacker could coerce decapsulation into using a crafted dk_pke.

quantica recomputes H(ek_in_dk) on every decaps call and compares it with the stored H(ek) using ct_eq. A mismatch aborts the decapsulation with the InvalidDecapsulationKey error without running decaps_internal.
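A minimal sketch of this check, assuming the FIPS 203 component sizes (dk_pke is 384k bytes, ek is 384k + 32 bytes, H(ek) and z are 32 bytes each, where k is the module rank). The function name, the `DkError` enum, and the injected hash function `h` are hypothetical; the crate's own error type and `sha3::h` would take their place.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum DkError {
    InvalidLength,
    InvalidDecapsulationKey,
}

/// Recompute H(ek) over the copy of ek stored inside dk and compare it
/// with the stored digest. `k` is the module rank (2, 3 or 4) and `h`
/// stands in for the hash H (assumption).
fn check_dk_integrity(
    dk: &[u8],
    k: usize,
    h: fn(&[u8]) -> [u8; 32],
) -> Result<(), DkError> {
    let dk_pke_len = 384 * k;
    let ek_len = 384 * k + 32;
    // Layout: dk_pke || ek || H(ek) || z. The length is public.
    if dk.len() != dk_pke_len + ek_len + 64 {
        return Err(DkError::InvalidLength);
    }
    let ek = &dk[dk_pke_len..dk_pke_len + ek_len];
    let stored = &dk[dk_pke_len + ek_len..dk_pke_len + ek_len + 32];
    let computed = h(ek);

    // Compare all 32 bytes without an early exit.
    let mut diff = 0u8;
    for i in 0..32 {
        diff |= computed[i] ^ stored[i];
    }
    // The accept / abort outcome itself is observable by the caller.
    if diff == 0 {
        Ok(())
    } else {
        Err(DkError::InvalidDecapsulationKey)
    }
}
```

Only the position of a mismatching byte is hidden by the accumulate-then-test loop; the final accept or abort decision is visible to the caller by design.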

Code pointers 

Item	Location
`H(ek_in_dk)` recomputation + `ct_eq` check	`quantica/src/ml_kem/kem.rs` `decaps` (lines 336 – 344) and `decaps_single` (lines 406 – 411).

Planned hardening 

Two ML-KEM items are scheduled for closure in a later release:

T4-E — harden the FO comparison against the template attack described in [HNPS24]. Approach: replace the direct byte-compare with an arithmetic folding scheme that reduces the number of addressable bytes per comparison step.
T4-F — mask the twiddle factors used by the shuffled NTT ([XWT25]), adding an additional DPA defence layer on top of the current coefficient masking.