ML-KEM — countermeasures

FIPS spec: FIPS 203 [NationalIoSaTechnology24a]

Crate path: quantica::ml_kem

Cargo features: ml-kem (on by default); sca-protected (on by default) gates the masked / shuffled variants.

This chapter lists the side-channel countermeasures implemented in the ML-KEM module, indexed by the threat class they address. Each entry cites the paper(s) the construction is drawn from and points to the exact function(s) that host the countermeasure.

The threat classes themselves are defined in Threat model.

Coverage matrix

ML-KEM countermeasure / threat matrix

Threat | Status | Countermeasure(s)
SPA / SEMA | implemented | NTT butterfly shuffling (shuffle::ntt_shuffled) and branchless byte-level primitives (silentops::ct_*).
DPA / DEMA / CPA / CEMA | implemented | First-order arithmetic masking of the secret polynomial s (masked::*), masked KPKE decrypt (kpke::decrypt_sca), masked keygen (kem::keygen_internal_sca).
Template attacks on FO comparison | partial | Constant-time ct_eq + constant-time double-decaps select. Hardening of the FO comparison itself is on the roadmap (T4-E, see below and [HNPS24]).
Software / remote timing | implemented | All conditional selection goes through silentops::ct_* with an x86_64 asm backend. Verified with ctgrind (see Verification methodology).
DFA on FO re-encryption | implemented | Double computation of decaps_internal_sca, with a branchless ct_select fallback to k_fault derived from z.
DFA on dk tampering | implemented | Integrity check of the H(ek) stored inline in dk.
SIFA on implicit-rejection branch | implemented | The accept / implicit-reject selection is itself a branchless ct_select on a constant-time equality result, so an ineffective fault on the comparison does not change the control flow.
NTT twiddle-factor masking | planned (T4-F) | Item T4-F: mask the twiddle factors used by the shuffled NTT itself ([XWT25]).

SPA / SEMA — NTT butterfly shuffling

Principle

The shuffled NTT permutes the order in which independent butterfly groups (and the butterflies within a group) are evaluated. Because butterflies at the same NTT level are independent, permuting them does not change the functional result; however, it breaks the fixed mapping between a time sample in the trace and the coefficient being processed, which is exactly what an SPA adversary needs to read off the secret polynomial from a single trace.
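To illustrate why reordering is functionally free, here is a minimal sketch of the FIPS 203 forward NTT (Algorithm 9, q = 3329, ζ = 17) in which the 128 butterflies of each layer are visited in a caller-supplied order. It is a simplification under stated assumptions, not quantica's ntt_shuffled: the names ntt_with_order, butterfly, bitrev7 and pow_mod are hypothetical, coefficients are plain u32 values reduced with `%` (no Montgomery form), twiddle factors are recomputed instead of tabulated, and nothing here is constant-time.

```rust
/// ML-KEM modulus q and the primitive 256th root of unity zeta (FIPS 203).
const Q: u32 = 3329;
const ZETA: u32 = 17;

/// Reverse the low 7 bits of `i` (FIPS 203 BitRev7).
fn bitrev7(i: u32) -> u32 {
    let mut r = 0;
    for b in 0..7 {
        r |= ((i >> b) & 1) << (6 - b);
    }
    r
}

/// Square-and-multiply modular exponentiation (exponent is public here).
fn pow_mod(mut base: u32, mut exp: u32) -> u32 {
    let mut acc = 1u32;
    base %= Q;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % Q;
        }
        base = base * base % Q;
        exp >>= 1;
    }
    acc
}

/// One Cooley-Tukey butterfly on (f[j], f[j + len]): (a, b) -> (a + zeta*b, a - zeta*b).
fn butterfly(f: &mut [u32; 256], j: usize, len: usize, zeta: u32) {
    let t = zeta * f[j + len] % Q;
    f[j + len] = (f[j] + Q - t) % Q;
    f[j] = (f[j] + t) % Q;
}

/// Forward NTT where, at each of the 7 layers, the 128 butterflies are visited
/// in the order returned by `order(layer)`, which must be a permutation of 0..128.
/// The identity permutation reproduces the loop order of FIPS 203 Algorithm 9.
fn ntt_with_order(f: &mut [u32; 256], mut order: impl FnMut(usize) -> [u8; 128]) {
    let mut len = 128;
    let mut layer = 0;
    while len >= 2 {
        for &b in order(layer).iter() {
            let b = b as usize;
            let group = b / len; // which block of 2*len coefficients
            let j = group * 2 * len + b % len; // lower index of the butterfly pair
            let i = (128 / len + group) as u32; // Algorithm 9's zeta counter for this block
            butterfly(f, j, len, pow_mod(ZETA, bitrev7(i)));
        }
        len /= 2;
        layer += 1;
    }
}
```

Any per-layer permutation yields the same output because the butterflies of one layer touch disjoint coefficient pairs; the order only needs to be fresh and unpredictable, which is the job of the permutation generator.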

Published basis

  • [XWT25] — hardware-style butterfly shuffling for ML-KEM, the baseline we re-implemented in software using a Fisher–Yates permutation drawn from an SCA-dedicated CSPRNG.

  • [Saa23] — independent evaluation of the shuffling benefit on ML-KEM / ML-DSA.

Code pointers

Item | Location
Fisher–Yates permutation generator | quantica/src/ml_kem/shuffle.rs, generate_permutation
Shuffled forward NTT | quantica/src/ml_kem/shuffle.rs, ntt_shuffled
Call sites (keygen, encaps, decaps) | quantica/src/ml_kem/kem.rs, keygen_internal_sca (line 78+), decaps_internal_sca (line 189+)

Implementation notes

  • The SCA-RNG used to draw the permutation is independent of the caller’s entropy pool, so the shuffle cannot be replayed deterministically.

  • Rejection sampling on 16-bit RNG output removes the modular bias that would otherwise slightly skew the permutation distribution.

  • The shuffled NTT is only used on secret polynomials; public matrix expansion uses the classical NTT for performance.
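The permutation generator described above (a Fisher–Yates shuffle fed by 16-bit RNG output, with rejection sampling to remove modulo bias) can be sketched as follows. The RNG is modelled as a hypothetical `FnMut() -> u16` closure standing in for the SCA-dedicated CSPRNG, and shuffled_indices / uniform_below are illustrative names, not quantica's generate_permutation.

```rust
/// Uniform integer in 0..bound from 16-bit RNG output, by rejection sampling.
/// Outputs at or above `limit` (the largest multiple of `bound` that fits in
/// 16 bits) are discarded, so `x % bound` carries no modulo bias.
fn uniform_below(bound: u32, next_u16: &mut impl FnMut() -> u16) -> u32 {
    assert!(bound >= 1 && bound <= 1 << 16);
    let limit = (1u32 << 16) - (1u32 << 16) % bound;
    loop {
        let x = u32::from(next_u16());
        if x < limit {
            return x % bound;
        }
    }
}

/// Fisher–Yates shuffle of 0..n (n <= 256): for i = n-1 down to 1, swap
/// position i with a uniformly chosen position j in 0..=i.
fn shuffled_indices(n: usize, next_u16: &mut impl FnMut() -> u16) -> Vec<u8> {
    assert!(n <= 256);
    let mut perm: Vec<u8> = (0..n).map(|k| k as u8).collect();
    for i in (1..n).rev() {
        let j = uniform_below(i as u32 + 1, &mut *next_u16) as usize;
        perm.swap(i, j);
    }
    perm
}
```

Without the rejection step, `x % (i + 1)` over 65536 equally likely values would favour small residues whenever i + 1 does not divide 65536, which is the slight skew the note above refers to.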

DPA / DEMA / CPA — first-order masking of the KPKE secret

Principle

The KPKE secret s is split into two uniformly random arithmetic shares s = s_0 + s_1 (mod q). Every operation consuming s (NTT, pointwise multiplication, addition) is rewritten to operate on the shares, with a fresh-randomness refresh inserted whenever the two shares are combined with a public value. The unmasked s never exists in memory outside the generating code path.

Because s never appears as a single value, the Hamming-weight / Hamming-distance model at the core of CPA no longer fits any single intermediate: correlating the power trace at one time sample with a guessed chunk of s produces no peak, since each share is uniformly distributed on its own.
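A minimal scalar sketch of the arithmetic masking principle, assuming one coefficient at a time rather than quantica's MaskedPoly. The type Masked and its methods are hypothetical; the fresh randomness `r` is passed in explicitly as a stand-in for the SCA RNG, `%` is not constant-time, and the non-linear steps of decryption (compression, message decoding) are not shown because they need dedicated masked gadgets.

```rust
/// ML-KEM modulus.
const Q: u32 = 3329;

/// One coefficient held as two arithmetic shares: value = (s0 + s1) mod q.
#[derive(Clone, Copy, Debug)]
struct Masked {
    s0: u32,
    s1: u32,
}

impl Masked {
    /// Split `x` (in 0..q) using `r`, which must be fresh and uniform in 0..q.
    fn mask(x: u32, r: u32) -> Self {
        Masked { s0: (x + Q - r) % Q, s1: r }
    }

    /// Add a public constant c (in 0..q): changing one share is enough.
    fn add_public(self, c: u32) -> Self {
        Masked { s0: (self.s0 + c) % Q, s1: self.s1 }
    }

    /// Multiply by a public constant c (in 0..q): the map is linear,
    /// so each share is scaled independently.
    fn mul_public(self, c: u32) -> Self {
        Masked { s0: self.s0 * c % Q, s1: self.s1 * c % Q }
    }

    /// Share-wise sum of two masked values.
    fn add(self, o: Masked) -> Self {
        Masked { s0: (self.s0 + o.s0) % Q, s1: (self.s1 + o.s1) % Q }
    }

    /// Refresh with fresh uniform r in 0..q: both shares change, the value does not.
    fn refresh(self, r: u32) -> Self {
        Masked { s0: (self.s0 + Q - r) % Q, s1: (self.s1 + r) % Q }
    }

    /// Recombine the shares; only done on values that are allowed to be revealed.
    fn unmask(self) -> u32 {
        (self.s0 + self.s1) % Q
    }
}
```

Every linear operation of the NTT and of the pointwise multiplication by public data maps onto add / mul_public applied share by share, which is why those steps can run on shares without ever recombining s.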

Published basis

  • [Nko25] — identifies the pointwise multiplication as the key DPA target; splitting s into shares before that multiplication defeats the first-order attack.

  • [AOP+25] — independent evaluation on OpenTitan-class hardware confirming the trace-count blow-up.

Code pointers

Item | Location
Masked polynomial type + helpers | quantica/src/ml_kem/masked.rs (MaskedPoly, masked_ntt, masked_ntt_inv, masked_multiply_public, masked_add_public, masked_add / masked_sub, masked_multiply_accumulate)
Masked KPKE decrypt | quantica/src/ml_kem/kpke.rs, decrypt_sca
Masked keygen | quantica/src/ml_kem/kem.rs, keygen_internal_sca
Masked decaps (called twice for DFA) | quantica/src/ml_kem/kem.rs, decaps_internal_sca

Implementation notes

  • Masking is first-order. Higher-order attacks that combine two or more time samples can still recover the secret, at the cost of substantially more traces; tier 4 item T4-D (not yet scheduled) will extend the masking to a 3-share scheme (CC EAL4+-grade).

  • The mask-refresh points are positioned at the boundary of each Montgomery reduction so that leakage on the mask itself cannot combine with leakage on s through a single linear predictor.

Timing / cache-timing — constant-time primitives everywhere

Principle

Every conditional selection in the ML-KEM control path goes through silentops::ct_*. The public-parameter length check and the H(ek) integrity check use ct_eq to ensure timing does not leak which byte differs. The implicit-rejection select in decapsulation uses ct_select on the constant-time equality result.

See Shared side-channel primitives — silentops for the primitive reference and for the dangerous-fallback discussion on x86_64.
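A minimal sketch of the two primitive shapes referred to above: an equality test that returns an all-ones / all-zeros mask instead of a bool, and a 32-byte mask select. The names ct_eq_mask and ct_select32 are hypothetical and do not reproduce the silentops API. Plain Rust gives no constant-time guarantee, since an optimizing compiler may reintroduce branches, which is why a dedicated assembly backend such as the x86_64 one mentioned above is worth having.

```rust
/// 0xFF if a == b, 0x00 otherwise. Every byte is visited (no early exit) and
/// the result is derived arithmetically rather than with a comparison branch.
fn ct_eq_mask(a: &[u8], b: &[u8]) -> u8 {
    assert_eq!(a.len(), b.len());
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    // diff == 0      -> 0u16 - 1 = 0xFFFF        -> high byte 0xFF
    // diff in 1..=255 -> diff - 1 in 0..=254     -> high byte 0x00
    ((u16::from(diff).wrapping_sub(1)) >> 8) as u8
}

/// Byte-wise select: returns `a` where mask == 0xFF and `b` where mask == 0x00.
fn ct_select32(mask: u8, a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = (a[i] & mask) | (b[i] & !mask);
    }
    out
}
```

Returning a mask rather than a bool lets the equality result feed straight into the select, so the accept / reject decision never becomes a branch.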

Published basis

  • [Koc96] — classical timing-attack theory.

  • [Lan10] — the verification methodology used here.

Code pointers

Item | Location
ct_eq wrapper (bool return) + ct_select for 32-byte payloads | quantica/src/ml_kem/kem.rs, lines 427 and 434
Consumer call sites | decaps (kem.rs line 321): dk integrity check + fault fallback select; decaps_internal_sca: implicit-rejection select

DFA — double computation + CT fault fallback

Principle

ML-KEM decapsulation is vulnerable to a classical DFA on the FO re-encryption step: if an attacker can fault the re-encryption so that it reproduces the submitted ciphertext for one specific malformed input, the implicit-rejection path is bypassed and the KEM acts as a decryption oracle [BDL97].

The countermeasure runs decapsulation twice, compares the two shared secrets with ct_eq, and, if they differ (a fault affected one run, or affected the two runs differently), returns a third value k_fault derived from z and a domain separator via SHA3. k_fault is:

  • deterministic for a given (dk, c) so that a repeated faulted call returns the same value (prevents oracle-by-repetition);

  • distinct from both the legitimate FO output and the implicit-rejection output (so the attacker cannot distinguish “fault detected” from either correct branch);

  • selected branch-free via ct_select so no timing distinguisher exists between “fault” and “no fault”.

This last property was added on 2026-04-20, after a ctgrind run flagged the previous if !results_match { return k_fault } branch as secret-dependent control flow.
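The double computation with a branch-free fallback can be sketched as below. The closure decaps_once is a hypothetical stand-in for one call to decaps_internal_sca on (dk, c), and k_fault is passed in precomputed (in the text it is sha3::h over z ‖ 0xFF) because the standard library has no SHA3. ct_eq_mask, ct_select32 and decaps_double are illustrative names, not quantica's API.

```rust
/// 0xFF if a == b, else 0x00, without data-dependent branches.
fn ct_eq_mask(a: &[u8], b: &[u8]) -> u8 {
    assert_eq!(a.len(), b.len());
    let diff = a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y));
    ((u16::from(diff).wrapping_sub(1)) >> 8) as u8
}

/// Byte-wise select: `a` where mask == 0xFF, `b` where mask == 0x00.
fn ct_select32(mask: u8, a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    std::array::from_fn(|i| (a[i] & mask) | (b[i] & !mask))
}

/// Run the (hypothetical) decapsulation twice and return its output only if
/// both runs agree; otherwise return k_fault. Both outcomes execute the same
/// instructions: there is no `if` on the comparison result.
fn decaps_double<F: FnMut() -> [u8; 32]>(mut decaps_once: F, k_fault: &[u8; 32]) -> [u8; 32] {
    let k1 = decaps_once();
    let k2 = decaps_once();
    let same = ct_eq_mask(&k1, &k2); // 0xFF when no fault was observed
    ct_select32(same, &k1, k_fault)
}
```

Because k_fault is computed before the comparison and the select always runs, the fault and no-fault paths differ only in data, which is the property the ctgrind finding above was about.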

Published basis

  • [BDL97] — the classical DFA framework.

  • [Saa23] — ML-KEM-specific fault analysis catalogue, used to size the countermeasure.

Code pointers

Item | Location
Double decapsulation + branch-free fault fallback | quantica/src/ml_kem/kem.rs, decaps (FIPS 203-level API), especially the block after the two decaps_internal_sca calls
k_fault derivation | Same file, inside decaps: fault_input = z ‖ 0xFF, absorbed by sha3::h
Fault-detect branch-free select | quantica/src/ml_kem/kem.rs, ct_select (line 434), invoked from decaps

DFA on dk: H(ek) integrity check

Principle

The decapsulation key dk contains an inline copy of ek and of H(ek) (FIPS 203 §7.3, layout dk_pke ‖ ek ‖ H(ek) ‖ z). A fault that alters dk in memory, for example a hot-carrier-induced bit flip in flash, would undermine the FO security argument because the attacker could coerce decapsulation into using a crafted dk_pke.

quantica recomputes H(ek_in_dk) on every decaps call and compares it with the stored H(ek) using ct_eq. A mismatch aborts the decapsulation with the InvalidDecapsulationKey error without running decaps_internal.
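A minimal sketch of the check, assuming the FIPS 203 layout dk = dk_pke ‖ ek ‖ H(ek) ‖ z for module rank k (dk_pke is 384k bytes, ek is 384k + 32 bytes, H(ek) and z are 32 bytes each, so dk is 768k + 96 bytes). The hash is injected as a hypothetical closure standing in for SHA3-256, and check_dk, dk_ranges and DecapsError are illustrative names modelled on the InvalidDecapsulationKey error above, not quantica's actual code.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
enum DecapsError {
    InvalidDecapsulationKey,
}

/// Byte ranges of ek and of the stored H(ek) inside dk, for module rank k.
fn dk_ranges(k: usize) -> (Range<usize>, Range<usize>) {
    (384 * k..768 * k + 32, 768 * k + 32..768 * k + 64)
}

/// 0xFF if a == b, else 0x00, without data-dependent branches.
fn ct_eq_mask(a: &[u8], b: &[u8]) -> u8 {
    assert_eq!(a.len(), b.len());
    let diff = a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y));
    ((u16::from(diff).wrapping_sub(1)) >> 8) as u8
}

/// Recompute H(ek_in_dk) and compare it with the stored copy before any
/// decapsulation work. `hash` is a hypothetical stand-in for SHA3-256.
fn check_dk(dk: &[u8], k: usize, hash: impl Fn(&[u8]) -> [u8; 32]) -> Result<(), DecapsError> {
    if dk.len() != 768 * k + 96 {
        return Err(DecapsError::InvalidDecapsulationKey);
    }
    let (ek, h) = dk_ranges(k);
    let ok = ct_eq_mask(&hash(&dk[ek]), &dk[h]);
    // Branching on the final verdict is fine: a mismatch aborts with an error
    // the caller observes anyway; only the byte comparison must be constant-time.
    if ok == 0xFF {
        Ok(())
    } else {
        Err(DecapsError::InvalidDecapsulationKey)
    }
}
```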

Code pointers

Item | Location
H(ek_in_dk) recomputation + ct_eq check | quantica/src/ml_kem/kem.rs, decaps (lines 336–344) and decaps_single (lines 406–411)

Planned hardening

Two ML-KEM items scheduled for closure in a later release:

  • T4-E — harden the FO comparison against the template attack described in [HNPS24]. Approach: replace the direct byte-compare with an arithmetic folding scheme that reduces the number of addressable bytes per comparison step.

  • T4-F — mask the twiddle factors used by the shuffled NTT ([XWT25]), adding an additional DPA defence layer on top of the current coefficient masking.