###################################################################
ML-KEM — countermeasures
###################################################################

:FIPS spec:   :cite:`fips203`
:Crate path:  ``quantica::ml_kem``
:Cargo feature:  ``ml-kem`` (on by default); ``sca-protected`` (on by
                 default) gates the masked / shuffled variants.

This chapter lists the side-channel countermeasures implemented in
the ML-KEM module, indexed by the threat class they address. Each
entry cites the paper(s) the construction is drawn from and points
to the exact function(s) that host the countermeasure.

The threat classes themselves are defined in :doc:`../threat_model`.

.. contents::
   :local:
   :depth: 2

Coverage matrix
===============

.. list-table:: ML-KEM countermeasure / threat matrix
   :header-rows: 1
   :widths: 25 18 57

   * - Threat
     - Status
     - Countermeasure(s)
   * - SPA / SEMA
     - implemented
     - NTT butterfly shuffling (``shuffle::ntt_shuffled``) and
       branchless byte-level primitives (``silentops::ct_*``).
   * - DPA / DEMA / CPA / CEMA
     - implemented
     - First-order Boolean-style masking of the secret polynomial
       ``s`` (``masked::*``), masked KPKE decrypt
       (``kpke::decrypt_sca``), masked keygen
       (``kem::keygen_internal_sca``).
   * - Template attacks on FO comparison
     - partial
     - Constant-time ``ct_eq`` + constant-time double-decaps select.
       Hardening of the FO comparison itself is on the roadmap
       (``T4-E``, see below and
       :cite:`eprint2024_template_fo_comparison`).
   * - Software / remote timing
     - implemented
     - All conditional selection goes through ``silentops::ct_*`` with
       x86_64 asm backend. Verified with ctgrind (see
       :doc:`../verification`).
   * - DFA on FO re-encryption
     - implemented
     - Double computation in ``decaps_internal_sca`` with branchless
       ``ct_select`` fallback to ``k_fault`` derived from ``z``.
   * - DFA on ``dk`` tampering
     - implemented
     - Integrity check of ``H(ek)`` stored inline in ``dk``.
   * - SIFA on implicit-rejection branch
     - implemented
     - The accept / implicit-reject selection is itself a
       branchless ``ct_select`` on a constant-time equality result,
       so an ineffective fault on the comparison does not change the
       control flow.
   * - NTT twiddle-factor masking
     - planned (T4-E)
     - Item ``T4-F``: mask the twiddle factors used by the
       shuffled NTT itself (:cite:`arxiv2024_mlkem_shuffling_hw`).

SPA / SEMA — NTT butterfly shuffling
====================================

Principle
---------

The shuffled NTT permutes the order in which independent butterfly
groups (and the butterflies within a group) are evaluated. Because
butterflies at the same NTT level are independent, permuting them
does not change the functional result; however it destroys the
trace-alignment information that an SPA adversary needs to read off
the secret polynomial in a single trace.

Published basis
---------------

* :cite:`arxiv2024_mlkem_shuffling_hw` — hardware-style butterfly
  shuffling for ML-KEM, the baseline we re-implemented in software
  using a Fisher–Yates permutation drawn from an SCA-dedicated
  CSPRNG.
* :cite:`nist2023_sca_saarinen` — independent evaluation of the
  shuffling benefit on ML-KEM / ML-DSA.

Code pointers
-------------

.. list-table::
   :header-rows: 1
   :widths: 50 50

   * - Item
     - Location
   * - Fisher–Yates permutation generator
     - ``quantica/src/ml_kem/shuffle.rs`` ``generate_permutation``
   * - Shuffled forward NTT
     - ``quantica/src/ml_kem/shuffle.rs`` ``ntt_shuffled``
   * - Call sites (keygen, encaps, decaps)
     - ``quantica/src/ml_kem/kem.rs`` — ``keygen_internal_sca``
       (line 78+), ``decaps_internal_sca`` (line 189+).

Implementation notes
--------------------

* The SCA-RNG used to draw the permutation is independent from the
  caller's entropy pool so that the shuffle cannot be replayed
  deterministically.
* Rejection sampling on 16-bit RNG output removes the modular bias
  that would otherwise slightly skew the permutation distribution.
* The shuffled NTT is only used on **secret** polynomials; public
  matrix expansion uses the classical NTT for performance.

DPA / DEMA / CPA — first-order masking of the KPKE secret
=========================================================

Principle
---------

The KPKE secret ``s`` is split into two uniformly random arithmetic
shares ``s = s_0 + s_1 (mod q)``. Every operation consuming ``s``
(NTT, pointwise multiplication, addition) is rewritten to operate on
the shares, with a fresh-randomness refresh inserted whenever the
two shares are combined with a public value. The unmasked ``s``
never exists in memory outside the generating code path.

Because ``s`` never appears as a single value, the Hamming-weight /
Hamming-distance hypothesis at the core of CPA becomes
non-identifiable: correlating the power trace with a guessed chunk
of ``s`` produces no peak, since each share is uniform on its own.

Published basis
---------------

* :cite:`eprint2025_sca_mlkem_pointwise` — the pointwise
  multiplication is the key DPA target; masking each share defeats
  the attack.
* :cite:`cryptojedi2024_mlkem_mldsa_opentitan` — independent
  evaluation on OpenTitan-class hardware confirming the trace-count
  blow-up.

Code pointers
-------------

.. list-table::
   :header-rows: 1
   :widths: 50 50

   * - Item
     - Location
   * - Masked polynomial type + helpers
     - ``quantica/src/ml_kem/masked.rs``
       (``MaskedPoly``, ``masked_ntt``, ``masked_ntt_inv``,
       ``masked_multiply_public``, ``masked_add_public``,
       ``masked_add`` / ``masked_sub``,
       ``masked_multiply_accumulate``)
   * - Masked KPKE decrypt
     - ``quantica/src/ml_kem/kpke.rs`` ``decrypt_sca``
   * - Masked keygen
     - ``quantica/src/ml_kem/kem.rs`` ``keygen_internal_sca``
   * - Masked decaps (called twice for DFA)
     - ``quantica/src/ml_kem/kem.rs`` ``decaps_internal_sca``

Implementation notes
--------------------

* Masking is **first-order**. Higher-order attacks that combine two
  or more time samples can still recover the secret with
  substantially more traces; tier 4 item ``T4-D`` (not yet
  scheduled) will extend to a 3-share scheme (CC EAL4+-grade).
* The mask-refresh points are positioned at the boundary of each
  Montgomery reduction so that leakage on the mask itself cannot
  combine with leakage on ``s`` through a single linear predictor.

Timing / cache-timing — constant-time primitives everywhere
===========================================================

Principle
---------

Every conditional selection in the ML-KEM control path goes through
``silentops::ct_*``. The public-parameter length check and the
``H(ek)`` integrity check use ``ct_eq`` to ensure timing does not
leak *which* byte differs. The implicit-rejection select in
decapsulation uses ``ct_select`` on the constant-time equality
result.

See :doc:`../primitives` for the primitive reference and for the
dangerous-fallback discussion on x86_64.

Published basis
---------------

* :cite:`kocher1996timing` — classical timing-attack theory.
* :cite:`langley2010ctgrind` — the verification methodology used
  here.

Code pointers
-------------

.. list-table::
   :header-rows: 1
   :widths: 50 50

   * - Item
     - Location
   * - ``ct_eq`` wrapper (``bool`` return) + ``ct_select`` for
       32-byte payloads
     - ``quantica/src/ml_kem/kem.rs`` lines 427 and 434.
   * - Consumer call sites
     - ``decaps`` (``kem.rs`` line 321) — dk integrity check + fault
       fallback select; ``decaps_internal_sca`` — implicit rejection
       select.

DFA — double computation + CT fault fallback
=============================================

Principle
---------

ML-KEM decapsulation is vulnerable to a classical DFA on the FO
re-encryption step: if an attacker can make the re-encryption return
a value close to the real ciphertext on one specific input, the
implicit-rejection path is bypassed and the KEM acts as a
decryption oracle :cite:`boneh1997dfa`.

The countermeasure runs decapsulation twice, compares the two
shared secrets with ``ct_eq``, and if they differ (a fault occurred
in exactly one run) returns a third value ``k_fault`` derived from
``z`` and a domain separator via SHA3. ``k_fault`` is:

* deterministic for a given ``(dk, c)`` so that a repeated faulted
  call returns the same value (prevents oracle-by-repetition);
* distinct from both the legitimate FO output and the
  implicit-rejection output (so the attacker cannot distinguish
  "fault detected" from either correct branch);
* selected **branch-free** via ``ct_select`` so no timing
  distinguisher exists between "fault" and "no fault".

This last property was specifically added on 2026-04-20 after a
ctgrind run flagged the previous ``if !results_match { return
k_fault }`` branch as a secret-dependent control flow.

Published basis
---------------

* :cite:`boneh1997dfa` — the classical DFA framework.
* :cite:`nist2023_sca_saarinen` — ML-KEM-specific fault analysis
  catalogue, used to size the countermeasure.

Code pointers
-------------

.. list-table::
   :header-rows: 1
   :widths: 50 50

   * - Item
     - Location
   * - Double decapsulation + branch-free fault fallback
     - ``quantica/src/ml_kem/kem.rs`` ``decaps`` (FIPS 203 level API),
       especially the block after the two ``decaps_internal_sca``
       calls.
   * - ``k_fault`` derivation
     - Same file, inside ``decaps``. ``fault_input = z ‖ 0xFF``
       absorbed by ``sha3::h``.
   * - Fault-detect branch-free select
     - ``quantica/src/ml_kem/kem.rs`` ``ct_select`` (line 434)
       invoked from ``decaps``.

DFA on ``dk`` — ``H(ek)`` integrity check
=========================================

Principle
---------

The decapsulation key ``dk`` contains an inline copy of ``ek`` and
of ``H(ek)`` (FIPS 203 §7.3, layout
``dk_pke ‖ ek ‖ H(ek) ‖ z``). A fault that alters ``dk`` in memory
— for example a hot-carrier induced bit flip in flash — would
undermine the FO security argument because the attacker could coerce
decapsulation into using a crafted ``dk_pke``.

``quantica`` recomputes ``H(ek_in_dk)`` on every ``decaps`` call and
compares it with the stored ``H(ek)`` using ``ct_eq``. A mismatch
aborts the decapsulation with the ``InvalidDecapsulationKey`` error
without running ``decaps_internal``.

Code pointers
-------------

.. list-table::
   :header-rows: 1
   :widths: 50 50

   * - Item
     - Location
   * - ``H(ek_in_dk)`` recomputation + ``ct_eq`` check
     - ``quantica/src/ml_kem/kem.rs`` ``decaps`` (lines 336 – 344)
       and ``decaps_single`` (lines 406 – 411).

Planned hardening
==========================

Two ML-KEM items scheduled for closure in a later release:

* **T4-E** — harden the FO comparison against the template
  attack described in :cite:`eprint2024_template_fo_comparison`.
  Approach: replace the direct byte-compare with an arithmetic
  folding scheme that reduces the number of addressable bytes per
  comparison step.
* **T4-F** — mask the twiddle factors used by the shuffled NTT
  (:cite:`arxiv2024_mlkem_shuffling_hw`), adding an additional
  DPA defence layer on top of the current coefficient masking.