################################################################### ML-KEM — countermeasures ################################################################### :FIPS spec: :cite:`fips203` :Crate path: ``quantica::ml_kem`` :Cargo feature: ``ml-kem`` (on by default); ``sca-protected`` (on by default) gates the masked / shuffled variants. This chapter lists the side-channel countermeasures implemented in the ML-KEM module, indexed by the threat class they address. Each entry cites the paper(s) the construction is drawn from and points to the exact function(s) that host the countermeasure. The threat classes themselves are defined in :doc:`../threat_model`. .. contents:: :local: :depth: 2 Coverage matrix =============== .. list-table:: ML-KEM countermeasure / threat matrix :header-rows: 1 :widths: 25 18 57 * - Threat - Status - Countermeasure(s) * - SPA / SEMA - implemented - NTT butterfly shuffling (``shuffle::ntt_shuffled``) and branchless byte-level primitives (``silentops::ct_*``). * - DPA / DEMA / CPA / CEMA - implemented - First-order Boolean-style masking of the secret polynomial ``s`` (``masked::*``), masked KPKE decrypt (``kpke::decrypt_sca``), masked keygen (``kem::keygen_internal_sca``). * - Template attacks on FO comparison - partial - Constant-time ``ct_eq`` + constant-time double-decaps select. Hardening of the FO comparison itself is on the roadmap (``T4-E``, see below and :cite:`eprint2024_template_fo_comparison`). * - Software / remote timing - implemented - All conditional selection goes through ``silentops::ct_*`` with x86_64 asm backend. Verified with ctgrind (see :doc:`../verification`). * - DFA on FO re-encryption - implemented - Double computation in ``decaps_internal_sca`` with branchless ``ct_select`` fallback to ``k_fault`` derived from ``z``. * - DFA on ``dk`` tampering - implemented - Integrity check of ``H(ek)`` stored inline in ``dk``. * - SIFA on implicit-rejection branch - implemented - The accept / implicit-reject selection is itself a branchless ``ct_select`` on a constant-time equality result, so an ineffective fault on the comparison does not change the control flow. * - NTT twiddle-factor masking - planned (T4-E) - Item ``T4-F``: mask the twiddle factors used by the shuffled NTT itself (:cite:`arxiv2024_mlkem_shuffling_hw`). SPA / SEMA — NTT butterfly shuffling ==================================== Principle --------- The shuffled NTT permutes the order in which independent butterfly groups (and the butterflies within a group) are evaluated. Because butterflies at the same NTT level are independent, permuting them does not change the functional result; however it destroys the trace-alignment information that an SPA adversary needs to read off the secret polynomial in a single trace. Published basis --------------- * :cite:`arxiv2024_mlkem_shuffling_hw` — hardware-style butterfly shuffling for ML-KEM, the baseline we re-implemented in software using a Fisher–Yates permutation drawn from an SCA-dedicated CSPRNG. * :cite:`nist2023_sca_saarinen` — independent evaluation of the shuffling benefit on ML-KEM / ML-DSA. Code pointers ------------- .. list-table:: :header-rows: 1 :widths: 50 50 * - Item - Location * - Fisher–Yates permutation generator - ``quantica/src/ml_kem/shuffle.rs`` ``generate_permutation`` * - Shuffled forward NTT - ``quantica/src/ml_kem/shuffle.rs`` ``ntt_shuffled`` * - Call sites (keygen, encaps, decaps) - ``quantica/src/ml_kem/kem.rs`` — ``keygen_internal_sca`` (line 78+), ``decaps_internal_sca`` (line 189+). Implementation notes -------------------- * The SCA-RNG used to draw the permutation is independent from the caller's entropy pool so that the shuffle cannot be replayed deterministically. * Rejection sampling on 16-bit RNG output removes the modular bias that would otherwise slightly skew the permutation distribution. * The shuffled NTT is only used on **secret** polynomials; public matrix expansion uses the classical NTT for performance. DPA / DEMA / CPA — first-order masking of the KPKE secret ========================================================= Principle --------- The KPKE secret ``s`` is split into two uniformly random arithmetic shares ``s = s_0 + s_1 (mod q)``. Every operation consuming ``s`` (NTT, pointwise multiplication, addition) is rewritten to operate on the shares, with a fresh-randomness refresh inserted whenever the two shares are combined with a public value. The unmasked ``s`` never exists in memory outside the generating code path. Because ``s`` never appears as a single value, the Hamming-weight / Hamming-distance hypothesis at the core of CPA becomes non-identifiable: correlating the power trace with a guessed chunk of ``s`` produces no peak, since each share is uniform on its own. Published basis --------------- * :cite:`eprint2025_sca_mlkem_pointwise` — the pointwise multiplication is the key DPA target; masking each share defeats the attack. * :cite:`cryptojedi2024_mlkem_mldsa_opentitan` — independent evaluation on OpenTitan-class hardware confirming the trace-count blow-up. Code pointers ------------- .. list-table:: :header-rows: 1 :widths: 50 50 * - Item - Location * - Masked polynomial type + helpers - ``quantica/src/ml_kem/masked.rs`` (``MaskedPoly``, ``masked_ntt``, ``masked_ntt_inv``, ``masked_multiply_public``, ``masked_add_public``, ``masked_add`` / ``masked_sub``, ``masked_multiply_accumulate``) * - Masked KPKE decrypt - ``quantica/src/ml_kem/kpke.rs`` ``decrypt_sca`` * - Masked keygen - ``quantica/src/ml_kem/kem.rs`` ``keygen_internal_sca`` * - Masked decaps (called twice for DFA) - ``quantica/src/ml_kem/kem.rs`` ``decaps_internal_sca`` Implementation notes -------------------- * Masking is **first-order**. Higher-order attacks that combine two or more time samples can still recover the secret with substantially more traces; tier 4 item ``T4-D`` (not yet scheduled) will extend to a 3-share scheme (CC EAL4+-grade). * The mask-refresh points are positioned at the boundary of each Montgomery reduction so that leakage on the mask itself cannot combine with leakage on ``s`` through a single linear predictor. Timing / cache-timing — constant-time primitives everywhere =========================================================== Principle --------- Every conditional selection in the ML-KEM control path goes through ``silentops::ct_*``. The public-parameter length check and the ``H(ek)`` integrity check use ``ct_eq`` to ensure timing does not leak *which* byte differs. The implicit-rejection select in decapsulation uses ``ct_select`` on the constant-time equality result. See :doc:`../primitives` for the primitive reference and for the dangerous-fallback discussion on x86_64. Published basis --------------- * :cite:`kocher1996timing` — classical timing-attack theory. * :cite:`langley2010ctgrind` — the verification methodology used here. Code pointers ------------- .. list-table:: :header-rows: 1 :widths: 50 50 * - Item - Location * - ``ct_eq`` wrapper (``bool`` return) + ``ct_select`` for 32-byte payloads - ``quantica/src/ml_kem/kem.rs`` lines 427 and 434. * - Consumer call sites - ``decaps`` (``kem.rs`` line 321) — dk integrity check + fault fallback select; ``decaps_internal_sca`` — implicit rejection select. DFA — double computation + CT fault fallback ============================================= Principle --------- ML-KEM decapsulation is vulnerable to a classical DFA on the FO re-encryption step: if an attacker can make the re-encryption return a value close to the real ciphertext on one specific input, the implicit-rejection path is bypassed and the KEM acts as a decryption oracle :cite:`boneh1997dfa`. The countermeasure runs decapsulation twice, compares the two shared secrets with ``ct_eq``, and if they differ (a fault occurred in exactly one run) returns a third value ``k_fault`` derived from ``z`` and a domain separator via SHA3. ``k_fault`` is: * deterministic for a given ``(dk, c)`` so that a repeated faulted call returns the same value (prevents oracle-by-repetition); * distinct from both the legitimate FO output and the implicit-rejection output (so the attacker cannot distinguish "fault detected" from either correct branch); * selected **branch-free** via ``ct_select`` so no timing distinguisher exists between "fault" and "no fault". This last property was specifically added on 2026-04-20 after a ctgrind run flagged the previous ``if !results_match { return k_fault }`` branch as a secret-dependent control flow. Published basis --------------- * :cite:`boneh1997dfa` — the classical DFA framework. * :cite:`nist2023_sca_saarinen` — ML-KEM-specific fault analysis catalogue, used to size the countermeasure. Code pointers ------------- .. list-table:: :header-rows: 1 :widths: 50 50 * - Item - Location * - Double decapsulation + branch-free fault fallback - ``quantica/src/ml_kem/kem.rs`` ``decaps`` (FIPS 203 level API), especially the block after the two ``decaps_internal_sca`` calls. * - ``k_fault`` derivation - Same file, inside ``decaps``. ``fault_input = z ‖ 0xFF`` absorbed by ``sha3::h``. * - Fault-detect branch-free select - ``quantica/src/ml_kem/kem.rs`` ``ct_select`` (line 434) invoked from ``decaps``. DFA on ``dk`` — ``H(ek)`` integrity check ========================================= Principle --------- The decapsulation key ``dk`` contains an inline copy of ``ek`` and of ``H(ek)`` (FIPS 203 §7.3, layout ``dk_pke ‖ ek ‖ H(ek) ‖ z``). A fault that alters ``dk`` in memory — for example a hot-carrier induced bit flip in flash — would undermine the FO security argument because the attacker could coerce decapsulation into using a crafted ``dk_pke``. ``quantica`` recomputes ``H(ek_in_dk)`` on every ``decaps`` call and compares it with the stored ``H(ek)`` using ``ct_eq``. A mismatch aborts the decapsulation with the ``InvalidDecapsulationKey`` error without running ``decaps_internal``. Code pointers ------------- .. list-table:: :header-rows: 1 :widths: 50 50 * - Item - Location * - ``H(ek_in_dk)`` recomputation + ``ct_eq`` check - ``quantica/src/ml_kem/kem.rs`` ``decaps`` (lines 336 – 344) and ``decaps_single`` (lines 406 – 411). Planned hardening ========================== Two ML-KEM items scheduled for closure in a later release: * **T4-E** — harden the FO comparison against the template attack described in :cite:`eprint2024_template_fo_comparison`. Approach: replace the direct byte-compare with an arithmetic folding scheme that reduces the number of addressable bytes per comparison step. * **T4-F** — mask the twiddle factors used by the shuffled NTT (:cite:`arxiv2024_mlkem_shuffling_hw`), adding an additional DPA defence layer on top of the current coefficient masking.