ML-KEM — countermeasures
- FIPS spec:
- Crate path:
quantica::ml_kem- Cargo feature:
ml-kem(on by default);sca-protected(on by default) gates the masked / shuffled variants.
This chapter lists the side-channel countermeasures implemented in the ML-KEM module, indexed by the threat class they address. Each entry cites the paper(s) the construction is drawn from and points to the exact function(s) that host the countermeasure.
The threat classes themselves are defined in Threat model.
Coverage matrix
Threat |
Status |
Countermeasure(s) |
|---|---|---|
SPA / SEMA |
implemented |
NTT butterfly shuffling ( |
DPA / DEMA / CPA / CEMA |
implemented |
First-order Boolean-style masking of the secret polynomial
|
Template attacks on FO comparison |
partial |
Constant-time |
Software / remote timing |
implemented |
All conditional selection goes through |
DFA on FO re-encryption |
implemented |
Double computation in |
DFA on |
implemented |
Integrity check of |
SIFA on implicit-rejection branch |
implemented |
The accept / implicit-reject selection is itself a
branchless |
NTT twiddle-factor masking |
planned (T4-E) |
Item |
SPA / SEMA — NTT butterfly shuffling
Principle
The shuffled NTT permutes the order in which independent butterfly groups (and the butterflies within a group) are evaluated. Because butterflies at the same NTT level are independent, permuting them does not change the functional result; however it destroys the trace-alignment information that an SPA adversary needs to read off the secret polynomial in a single trace.
Published basis
Code pointers
Item |
Location |
|---|---|
Fisher–Yates permutation generator |
|
Shuffled forward NTT |
|
Call sites (keygen, encaps, decaps) |
|
Implementation notes
The SCA-RNG used to draw the permutation is independent from the caller’s entropy pool so that the shuffle cannot be replayed deterministically.
Rejection sampling on 16-bit RNG output removes the modular bias that would otherwise slightly skew the permutation distribution.
The shuffled NTT is only used on secret polynomials; public matrix expansion uses the classical NTT for performance.
DPA / DEMA / CPA — first-order masking of the KPKE secret
Principle
The KPKE secret s is split into two uniformly random arithmetic
shares s = s_0 + s_1 (mod q). Every operation consuming s
(NTT, pointwise multiplication, addition) is rewritten to operate on
the shares, with a fresh-randomness refresh inserted whenever the
two shares are combined with a public value. The unmasked s
never exists in memory outside the generating code path.
Because s never appears as a single value, the Hamming-weight /
Hamming-distance hypothesis at the core of CPA becomes
non-identifiable: correlating the power trace with a guessed chunk
of s produces no peak, since each share is uniform on its own.
Published basis
Code pointers
Item |
Location |
|---|---|
Masked polynomial type + helpers |
|
Masked KPKE decrypt |
|
Masked keygen |
|
Masked decaps (called twice for DFA) |
|
Implementation notes
Masking is first-order. Higher-order attacks that combine two or more time samples can still recover the secret with substantially more traces; tier 4 item
T4-D(not yet scheduled) will extend to a 3-share scheme (CC EAL4+-grade).The mask-refresh points are positioned at the boundary of each Montgomery reduction so that leakage on the mask itself cannot combine with leakage on
sthrough a single linear predictor.
Timing / cache-timing — constant-time primitives everywhere
Principle
Every conditional selection in the ML-KEM control path goes through
silentops::ct_*. The public-parameter length check and the
H(ek) integrity check use ct_eq to ensure timing does not
leak which byte differs. The implicit-rejection select in
decapsulation uses ct_select on the constant-time equality
result.
See Shared side-channel primitives — silentops for the primitive reference and for the dangerous-fallback discussion on x86_64.
Published basis
Code pointers
Item |
Location |
|---|---|
|
|
Consumer call sites |
|
DFA — double computation + CT fault fallback
Principle
ML-KEM decapsulation is vulnerable to a classical DFA on the FO re-encryption step: if an attacker can make the re-encryption return a value close to the real ciphertext on one specific input, the implicit-rejection path is bypassed and the KEM acts as a decryption oracle [BDL97].
The countermeasure runs decapsulation twice, compares the two
shared secrets with ct_eq, and if they differ (a fault occurred
in exactly one run) returns a third value k_fault derived from
z and a domain separator via SHA3. k_fault is:
deterministic for a given
(dk, c)so that a repeated faulted call returns the same value (prevents oracle-by-repetition);distinct from both the legitimate FO output and the implicit-rejection output (so the attacker cannot distinguish “fault detected” from either correct branch);
selected branch-free via
ct_selectso no timing distinguisher exists between “fault” and “no fault”.
This last property was specifically added on 2026-04-20 after a
ctgrind run flagged the previous if !results_match { return
k_fault } branch as a secret-dependent control flow.
Published basis
Code pointers
Item |
Location |
|---|---|
Double decapsulation + branch-free fault fallback |
|
|
Same file, inside |
Fault-detect branch-free select |
|
DFA on dk — H(ek) integrity check
Principle
The decapsulation key dk contains an inline copy of ek and
of H(ek) (FIPS 203 §7.3, layout
dk_pke ‖ ek ‖ H(ek) ‖ z). A fault that alters dk in memory
— for example a hot-carrier induced bit flip in flash — would
undermine the FO security argument because the attacker could coerce
decapsulation into using a crafted dk_pke.
quantica recomputes H(ek_in_dk) on every decaps call and
compares it with the stored H(ek) using ct_eq. A mismatch
aborts the decapsulation with the InvalidDecapsulationKey error
without running decaps_internal.
Code pointers
Item |
Location |
|---|---|
|
|
Planned hardening
Two ML-KEM items scheduled for closure in a later release:
T4-E — harden the FO comparison against the template attack described in [HNPS24]. Approach: replace the direct byte-compare with an arithmetic folding scheme that reduces the number of addressable bytes per comparison step.
T4-F — mask the twiddle factors used by the shuffled NTT ([XWT25]), adding an additional DPA defence layer on top of the current coefficient masking.