###################################################################
Threat model
###################################################################

This chapter enumerates the side-channel threats that ``quantica`` is
designed to resist, describes how each attack works in principle, and
gives an order-of-magnitude estimate of the attacker effort — in
elapsed time and equipment cost — needed to mount it against an
**unprotected** implementation running on a typical embedded target
(ARM Cortex-M, RISC-V MCU, or entry-level secure element).

The goal of the cost estimates is **decision support**, not a precise
evaluation: they let a reader judge which countermeasures are
load-bearing for a given deployment. A lab that is willing to invest
more in equipment or time can always overcome defences tuned to a
weaker profile; the numbers below correspond to the published
state-of-the-art as of early 2026.

Attacker model
==============

The overall attacker model has three axes:

**Access to the device.**
    * level 1 — *Black-box*: the attacker queries the API only.
    * level 2 — *Observational*: adds passive physical measurement
      (power, electromagnetic emanations, timing via a direct probe).
    * level 3 — *Intrusive*: adds active fault injection (clock /
      voltage glitches, laser, electromagnetic pulses) or chip-level
      inspection (decapping, FIB probing).

**Number of traces / queries.**
    From a handful (SPA, template matching) to millions (DPA on noisy
    traces). Countermeasures aim to raise the number of traces
    required, ideally beyond what an attacker can practically collect.

**Profiling capability.**
    Some attacks assume the attacker can build a profile on an
    identical open device (template attacks); others rely only on
    statistical assumptions about the secret (DPA, CPA).

``quantica`` targets access levels **1 and 2 at minimum**, with
deliberate countermeasures for level 3 on paths where a fault is
known to enable key recovery (ML-KEM FO comparison, ML-DSA rejection
loop).

Threat: Simple Power Analysis (SPA)
===================================

Principle
---------

The attacker observes a single (or very few) power or EM traces of a
cryptographic operation and reads the secret off the trace directly,
either because the code path depends on the secret (e.g.
``if (key_bit) ...`` compiles to a jump visible in the power profile)
or because each secret-dependent basic block has a distinctive power
signature (e.g. a conditional load of a polynomial coefficient).

Classical publications: :cite:`kocher1996timing`,
:cite:`kocher1999dpa`.

Where it bites post-quantum implementations:

* NTT butterfly groups whose execution *order* depends on the secret
  polynomial — identifiable on a single trace by the power envelope
  of each group :cite:`arxiv2024_mlkem_shuffling_hw`.
* Rejection-sampling loops where a "retry" branch has a clearly
  different power signature than the "accept" branch
  :cite:`eprint2025_rejected_signatures_sca`.
* Using a secret polynomial coefficient as an index into a table
  (observable via the address-dependent memory access pattern, or the
  cache-line access pattern on cores that have a cache).

Cost against an unprotected implementation
------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Axis
     - Against embedded MCU
     - Against secure element
   * - Equipment
     - Entry-level USB oscilloscope (≥ 250 MSa/s) + EM H-field probe
       + preamp: **~1 500 €** total.
     - ChipWhisperer-Husky or LeCroy scope + near-field probe + power
       interposer / decap: **10 000 – 30 000 €**.
   * - Traces
     - 1 – 100 (SPA by definition).
     - 1 – 1 000 (the SE's own jitter + noise may need trace averaging).
   * - Elapsed time (skilled operator)
     - **1 – 5 days** from setup to key recovery.
     - **2 – 6 weeks** including chip reverse-engineering.

An unprotected reference implementation of ML-KEM / ML-DSA on an
STM32F4 is the canonical textbook target: academic surveys since 2022
routinely report a total break after a few hours of trace acquisition.

Countermeasures in ``quantica``
-------------------------------

* **Branchless primitives** (``silentops::ct_*``) with an inline-asm
  x86_64 backend that LLVM cannot rewrite into a CMOV-on-pointer
  cache-timing leak (see :doc:`primitives`).
* **NTT butterfly shuffling** for secret polynomials
  (``ml_kem::ntt::ntt_shuffled``, ``ml_dsa::shuffle::ntt_shuffled``).
* **Constant-time rejection loop** in ML-DSA signing
  (``sca-ct-rejection``) that removes the accept/reject branch.
* **Masked sampling** of the ML-DSA masking vector ``y``
  (``sca-masked-y``) so every intermediate share is independent of
  the secret key.
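
The branchless style behind the first bullet can be illustrated with a
minimal sketch. The function names below are hypothetical and are not
the ``silentops`` API; they only contrast a secret-dependent branch
with a mask-based selection. Plain Rust like this gives no guarantee
about the generated machine code, which is why a production primitive
would need an assembly backend or an equivalent barrier.

.. code-block:: rust

   /// Leaky variant: the path taken depends on `secret_bit`, so the
   /// two outcomes can show up as different shapes in a power trace.
   pub fn leaky_select(a: u32, b: u32, secret_bit: u32) -> u32 {
       if secret_bit == 1 { a } else { b }
   }

   /// Branchless variant (illustrative name): returns `a` when
   /// `secret_bit == 1` and `b` when `secret_bit == 0`. Both inputs
   /// are always read and the same operations run for either bit.
   pub fn ct_select_sketch(a: u32, b: u32, secret_bit: u32) -> u32 {
       // 0 -> 0x0000_0000, 1 -> 0xFFFF_FFFF
       let mask = secret_bit.wrapping_neg();
       b ^ (mask & (a ^ b))
   }

Both functions compute the same result; only the second avoids a
source-level dependency of control flow on the secret bit.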

Threat: Differential / Correlation Power Analysis (DPA / CPA)
=============================================================

Principle
---------

The attacker records many (10³ – 10⁶) traces of the operation run
with varied public inputs, forms a hypothesis about an intermediate
value that depends on a small chunk of the secret (typically 8 or 16
bits), predicts its Hamming weight or Hamming distance across
traces, and statistically correlates the prediction with the
measured power at each time sample. The correct secret chunk peaks
significantly above noise; wrong guesses average down.
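
The procedure can be sketched on a toy target. The code below is an
illustration under simplifying assumptions, not an attack on any real
implementation: the intermediate is a plain product modulo
``q = 3329``, each trace is reduced to a single sample, and all
function names are made up for this example.

.. code-block:: rust

   /// ML-KEM modulus, used only to make the toy intermediate concrete.
   const Q: u32 = 3329;

   /// Toy intermediate: public coefficient times secret coefficient mod Q.
   fn intermediate(public: u16, secret: u16) -> u16 {
       ((public as u32 * secret as u32) % Q) as u16
   }

   /// Hamming-weight leakage model. Also used here to simulate
   /// noise-free "measurements" for a chosen secret.
   pub fn simulate_leakage(secret: u16, inputs: &[u16]) -> Vec<f64> {
       inputs
           .iter()
           .map(|&c| intermediate(c, secret).count_ones() as f64)
           .collect()
   }

   /// Pearson correlation; returns 0.0 when either side is constant.
   fn pearson(x: &[f64], y: &[f64]) -> f64 {
       let n = x.len() as f64;
       let mx = x.iter().sum::<f64>() / n;
       let my = y.iter().sum::<f64>() / n;
       let (mut cov, mut vx, mut vy) = (0.0, 0.0, 0.0);
       for (a, b) in x.iter().zip(y) {
           cov += (a - mx) * (b - my);
           vx += (a - mx).powi(2);
           vy += (b - my).powi(2);
       }
       if vx == 0.0 || vy == 0.0 { 0.0 } else { cov / (vx * vy).sqrt() }
   }

   /// Try every candidate secret and keep the one whose predicted
   /// leakage correlates best with the measured samples.
   pub fn recover_coefficient(inputs: &[u16], measured: &[f64]) -> u16 {
       let mut best_guess = 0u16;
       let mut best_corr = f64::NEG_INFINITY;
       for guess in 0..Q as u16 {
           let predicted = simulate_leakage(guess, inputs);
           let corr = pearson(&predicted, measured);
           if corr > best_corr {
               best_corr = corr;
               best_guess = guess;
           }
       }
       best_guess
   }

With real traces the same correlation is computed at every time
sample, and noise lowers the peak, which is what drives the trace
counts quoted in the cost table.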

Classical publications: :cite:`kocher1999dpa`,
:cite:`brier2004cpa`.

In post-quantum settings:

* **ML-KEM**: the pointwise multiplication ``s · c`` in decapsulation
  operates on public ``c`` and secret ``s``; classic DPA target per
  :cite:`eprint2025_sca_mlkem_pointwise`.
* **ML-DSA**: leakage on the masking vector ``y`` is critical because
  ``z = y + c·s1`` is published; partial knowledge of ``y`` accumulated
  over many signatures reveals ``s1``
  :cite:`hermelink2025_weakest_link_masked_mldsa`.
* **SLH-DSA**: the PRF outputs seed WOTS+ chains; leaking a single
  byte of a chain element reveals the secret leaf
  :cite:`kannwischer2018_dpa_xmss_sphincs`.
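
For the ML-DSA case, the relation an attacker exploits can be written
out explicitly. This is a sketch of the algebra only; the coefficient
index :math:`k` is notation introduced here for illustration.

.. math::

   z = y + c \cdot s_1
   \quad\Longrightarrow\quad
   (c \cdot s_1)_k = z_k - y_k

Since ``z`` and ``c`` are public for every signature, each coefficient
of ``y`` learned through a side channel yields one linear equation in
the coefficients of ``s1``. Collecting enough such equations across
signatures lets the attacker solve for ``s1``.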

Cost against an unprotected implementation
------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Axis
     - Against embedded MCU
     - Against secure element
   * - Equipment
     - ChipWhisperer-Lite or Husky (~**1 000 €**) + laptop + target
       board (~**100 €**).
     - High-end oscilloscope (~**40 000 €**) + EM probe setup
       + chip carrier / depackaging.
   * - Traces
     - **10 000 – 1 000 000** depending on noise and which
       intermediate is targeted.
     - **≥ 10 000 000** once jitter and shielding are factored in.
   * - Elapsed time (skilled operator)
     - **1 – 4 weeks** of trace acquisition + analysis.
     - **6 – 18 months** end-to-end, often a multi-engineer effort.

Countermeasures in ``quantica``
-------------------------------

* First-order arithmetic masking of ``s1``, ``s2``, ``t0`` in ML-DSA
  (``ml_dsa::masked::*``) and of the K-PKE secret ``s`` in ML-KEM
  (``ml_kem::masked``). Each secret is split into two uniformly
  random shares and every operation is performed on the shares
  without ever materialising the unmasked value.
* Masked ``y``-sampling in ML-DSA so that no share of ``y`` leaks
  before the final unmask at signature-emission time
  (:cite:`coron2024_masked_rejection_dilithium`).
* Fisher–Yates shuffling of NTT butterfly ordering drawn from an
  independent SCA-RNG seeded with the per-signature entropy; this
  desynchronises traces so the CPA "pattern" no longer aligns across
  captures (:cite:`arxiv2024_mlkem_shuffling_hw`).
* **Planned** (tier 4, item ``T4-B``): PRF masking on the SLH-DSA
  signing path — see :doc:`countermeasures/slh_dsa`.
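
First-order arithmetic masking, as described in the first bullet, can
be sketched as follows. This is a minimal illustration assuming the
ML-KEM modulus ``q = 3329``; the type and function names are
hypothetical and do not reflect ``ml_kem::masked`` or
``ml_dsa::masked``. The random share is passed in by the caller to
keep the example self-contained, and the ``%`` operator is used for
readability even though it is not guaranteed to run in constant time.

.. code-block:: rust

   const Q: u32 = 3329;

   /// A value split into two additive shares modulo Q.
   #[derive(Clone, Copy)]
   pub struct Shared {
       s0: u32,
       s1: u32,
   }

   /// Split `secret` using a random value `r`, which must be uniform
   /// in [0, Q). The reduction of `r` is only a defensive guard.
   pub fn mask(secret: u32, r: u32) -> Shared {
       let r = r % Q;
       Shared { s0: r, s1: (secret % Q + Q - r) % Q }
   }

   /// Add two masked values share by share; no share is ever combined
   /// with its sibling, so the secret is not materialised.
   pub fn add(a: Shared, b: Shared) -> Shared {
       Shared { s0: (a.s0 + b.s0) % Q, s1: (a.s1 + b.s1) % Q }
   }

   /// Multiply a masked value by a public constant, share by share.
   pub fn mul_public(a: Shared, c: u32) -> Shared {
       let c = c % Q;
       Shared { s0: a.s0 * c % Q, s1: a.s1 * c % Q }
   }

   /// Recombine the shares. In a protected implementation this happens
   /// only for values that are safe to reveal.
   pub fn unmask(a: Shared) -> u32 {
       (a.s0 + a.s1) % Q
   }

Linear operations such as these act on each share independently;
non-linear steps need dedicated masked gadgets, which are outside the
scope of this sketch.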

Threat: Template attacks
========================

Principle
---------

The attacker has a clone of the target device and builds a
*template* — a multivariate Gaussian model — of the power/EM
signature of each secret value, then matches a single trace from the
real target against the profile. Template attacks are the strongest
class of passive side-channel attack; a single trace can be enough
once a good profile exists. Classical reference: :cite:`chari2002template`.
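
The two phases can be sketched in a reduced form. The example below
is a simplification: it uses a univariate Gaussian at a single point
of interest rather than the multivariate model described above, and
all names are hypothetical.

.. code-block:: rust

   /// Gaussian template for one candidate secret value at a single
   /// point of interest: mean and variance of the observed leakage.
   pub struct Template {
       pub mean: f64,
       pub var: f64,
   }

   /// Profiling phase: estimate a template from samples captured on
   /// the clone while it processes a known value.
   pub fn build_template(samples: &[f64]) -> Template {
       let n = samples.len() as f64;
       let mean = samples.iter().sum::<f64>() / n;
       let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
       // Floor the variance so the likelihood below stays finite.
       Template { mean, var: var.max(1e-12) }
   }

   /// Gaussian log-likelihood of one observation, up to a constant.
   pub fn log_likelihood(t: &Template, x: f64) -> f64 {
       -0.5 * ((x - t.mean).powi(2) / t.var + t.var.ln())
   }

   /// Attack phase: index of the template that best explains `x`.
   pub fn classify(templates: &[Template], x: f64) -> usize {
       let mut best = 0;
       let mut best_ll = f64::NEG_INFINITY;
       for (i, t) in templates.iter().enumerate() {
           let ll = log_likelihood(t, x);
           if ll > best_ll {
               best_ll = ll;
               best = i;
           }
       }
       best
   }

One template is built per candidate value during profiling; the attack
phase then needs only the observation from the real target.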

In post-quantum implementations:

* ML-KEM FO-comparison step leaks enough to recover the message on a
  profiled device :cite:`eprint2024_template_fo_comparison`.
* ML-DSA NTT coefficients leak through template models targeting the
  specific Montgomery-reduction sequence on Cortex-M
  :cite:`arxiv2025_mlkem_mldsa_cortexm0_rp2040`.

Cost against an unprotected implementation
------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Axis
     - Against embedded MCU
     - Against secure element
   * - Equipment
     - Two identical MCUs (clone + target), same tooling as DPA.
     - Two decapped SEs + high-end scope + precision positioning.
   * - Traces
     - Profiling: 10 000 – 100 000; attack: **1 – 10 traces** suffice.
     - Profiling: 1 000 000; attack: 10 – 1 000 traces.
   * - Elapsed time
     - **2 – 6 weeks** once the clone profile exists.
     - **3 – 9 months**, dominated by the profiling phase.
   * - Prerequisite
     - Access to an open clone, which is usually realistic for
       commercial-off-the-shelf MCUs and harder for certified SE.
     - Usually requires an internal agreement or reverse-engineered
       debug access.

Countermeasures in ``quantica``
-------------------------------

* Shuffling (NTT butterfly order, ``y``-sampling order) destroys the
  inter-trace alignment a template attack depends on.
* Masking forces the profile to model *shared* values that the
  attacker does not know, multiplying the required trace count.
* Constant-time FO comparison in ML-KEM decapsulation
  (``silentops::ct_eq`` + branchless hash-equality check, see
  :doc:`countermeasures/ml_kem`).
* **Planned** (tier 4, item ``T4-E``): hardened FO comparison
  against the :cite:`eprint2024_template_fo_comparison` attack.
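
The shuffling idea can be sketched for a single butterfly layer. This
is an illustration only, not the ``ntt_shuffled`` code: the butterfly
is a toy one with a single twiddle factor, the modulus ``q = 3329`` is
assumed, and ``next_below`` stands in for whatever RNG supplies a
uniform value below the given bound.

.. code-block:: rust

   const Q: u32 = 3329;

   /// Fisher–Yates shuffle of the indices 0..n. `next_below(bound)`
   /// is a hypothetical RNG hook returning a uniform value in 0..bound.
   pub fn shuffled_order(n: usize, mut next_below: impl FnMut(usize) -> usize) -> Vec<usize> {
       let mut order: Vec<usize> = (0..n).collect();
       for i in (1..n).rev() {
           let j = next_below(i + 1); // uniform in 0..=i
           order.swap(i, j);
       }
       order
   }

   /// Apply one layer of toy butterflies pairing index i with
   /// i + len/2, visiting the pairs in the sequence given by `order`
   /// (a permutation of 0..len/2). Inputs must already be below Q.
   pub fn butterfly_layer(data: &mut [u32], zeta: u32, order: &[usize]) {
       let half = data.len() / 2;
       for &i in order {
           let t = zeta * data[i + half] % Q;
           let a = data[i];
           data[i] = (a + t) % Q;
           data[i + half] = (a + Q - t) % Q;
       }
   }

Because the butterflies within one layer touch disjoint pairs, any
visiting order produces the same output; only the time at which each
coefficient is processed changes from one run to the next.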

Threat: Software timing / cache-timing
======================================

Principle
---------

A software attacker co-resident with the cryptographic process
measures its execution time, the eviction pattern of its cache
lines, or the contention of shared microarchitectural resources,
and correlates these observations with secret inputs. Classical
publication :cite:`kocher1996timing`. Modern variants exploit branch
predictor state, speculative execution, or port contention.

In ``quantica`` the threat splits cleanly by algorithm:

* **ML-KEM**: the FO comparison, the implicit-rejection path, and
  any loop over dk bytes must be strictly constant time.
* **ML-DSA**: the rejection loop's number of iterations is *public*
  per FIPS 204, but the *branches inside* a single iteration must not
  depend on secret material (covered by ``sca-ct-rejection``).
* **SLH-DSA**: deterministic per signature; the main risk is a
  table-indexed load on a secret-derived FORS digit.

Cost against an unprotected implementation
------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Axis
     - Local process attacker
     - Remote network attacker
   * - Equipment
     - None beyond a standard user account on the target host.
     - Depends on the target protocol; often a network capture point
       and timing resolution on the order of microseconds.
   * - Queries / observations
     - 10⁴ – 10⁸ depending on the secret entropy and the timing gap
       the attack relies on.
     - 10⁶ – 10⁹ (network jitter dominates).
   * - Elapsed time
     - **Hours to days** for a local attacker.
     - **Weeks to months** for a remote attacker on a bare public
       API.
   * - Extra notes
     - An attacker co-resident with an SGX / TrustZone enclave, or
       running in a neighbouring VM, can reach µs-level timing
       precision on cache misses.
     - Remote timing attacks on PQC have been demonstrated in
       laboratory conditions; real deployments must treat the
       library as if a remote timing attacker existed.

Countermeasures in ``quantica``
-------------------------------

* All conditional selections go through ``silentops::ct_*``, compiled
  to inline asm on x86_64 so that LLVM at ``opt-level=2`` cannot
  regenerate a cache-timing leak (see :doc:`primitives`).
* No secret-indexed array access; any "select one of k values" is
  implemented by computing all k values and using ``ct_select``.
* ctgrind (Valgrind memcheck client requests) in continuous
  verification — see :doc:`verification`.
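
The "touch all k values" rule from the second bullet can be sketched
as a full-table scan. The names below are hypothetical and are not the
``silentops`` API; as with any plain Rust code, the compiler is free
to transform it, so this shows the source-level pattern only.

.. code-block:: rust

   /// All-ones mask if `a == b`, zero otherwise, without a branch.
   pub fn ct_eq_mask(a: u32, b: u32) -> u32 {
       let x = a ^ b;
       // (x | -x) has its top bit set exactly when x != 0.
       let nonzero = (x | x.wrapping_neg()) >> 31;
       nonzero.wrapping_sub(1)
   }

   /// Read `table[secret_index]` by visiting every entry, so the
   /// memory access pattern does not depend on the index. Returns 0
   /// if the index is out of range.
   pub fn ct_lookup(table: &[u32], secret_index: u32) -> u32 {
       let mut out = 0u32;
       for (i, &v) in table.iter().enumerate() {
           out |= v & ct_eq_mask(i as u32, secret_index);
       }
       out
   }

The cost is linear in the table size, which is the usual price for
removing a secret-dependent address.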

Threat: Differential Fault Analysis (DFA) and SIFA
==================================================

Principle
---------

The attacker perturbs the target while it runs the cryptographic
operation — clock glitch, voltage glitch, laser pulse, electromagnetic
fault injection — so that an instruction is skipped, a value is
corrupted, or a loop count is changed. They compare the faulty output
with an unfaulted one; depending on the algorithm a single useful
fault can leak the whole secret
:cite:`boneh1997dfa`.

A more recent variant, Statistical Ineffective Fault Analysis (SIFA),
exploits the statistical dependence between a secret intermediate
value and whether an injected fault is *ineffective*, i.e. leaves the
output correct. It does not need the faulted output, only the
knowledge of which runs completed normally and which were faulted or
aborted :cite:`dobraunig2018sifa`.

Relevant targets:

* **ML-KEM decapsulation** — the FO re-encryption provides a
  recompute-and-compare mechanism; skipping it collapses the FO
  security argument.
* **ML-DSA signing** — skipping a norm-check produces a non-rejected
  candidate signature that leaks ``s1``, ``s2`` or ``t0``.
* **SLH-DSA** — the hypertree authentication path: a corrupted
  intermediate node under a known message enables a forgery.

Cost against an unprotected implementation
------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Axis
     - Embedded MCU
     - Secure element
   * - Equipment
     - ChipWhisperer with voltage/clock-glitch board (**~1 000 €**).
       EMFI setup with pulse generator + XY stage: **10 000 – 30 000 €**.
     - Laser bench with precision positioning: **100 000 – 300 000 €**.
       Decapping the SE is often required.
   * - Faults needed
     - A single well-placed fault frequently suffices (DFA on ML-KEM
       FO comparison). SIFA on ML-DSA needs 1 000 – 100 000 trials.
     - Same ballpark but with far more parameter sweeping.
   * - Elapsed time
     - **Days to weeks** for MCU-level targets.
     - **6 – 18 months** for a certified SE.

Countermeasures in ``quantica``
-------------------------------

* **ML-KEM decapsulation** uses **double computation** +
  constant-time comparison of the two shared secrets; a fault affecting
  only one computation causes divergence, which is detected in
  constant time and the caller is served a ``k_fault`` value derived
  from ``z`` that cannot be exploited as a decryption oracle. See
  ``quantica/src/ml_kem/kem.rs`` and :doc:`countermeasures/ml_kem`.
* **dk integrity check** in decapsulation: ``H(ek)`` stored inside
  ``dk`` is recomputed and compared with ``ct_eq``; a mismatched
  value aborts decapsulation deterministically.
* **ML-DSA**: the rejection loop already double-checks norms before
  emission; with ``sca-ct-rejection`` the check is branchless.
* **Planned** (tier 4, item ``S-SCA2``): redundant computation for
  SLH-DSA signing + constant-time comparison to detect faults in
  hypertree authentication paths.
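
The double-computation pattern from the first bullet can be sketched
generically. This is not the code in ``quantica/src/ml_kem/kem.rs``:
the closure ``decaps_once`` stands in for one full decapsulation, the
fallback value is supplied by the caller rather than derived from
``z``, and all names are hypothetical. A real implementation also has
to stop the compiler from merging the two identical computations.

.. code-block:: rust

   /// Returns 1 if the two 32-byte values are equal, 0 otherwise,
   /// without branching on the comparison result.
   pub fn ct_eq_bytes(a: &[u8; 32], b: &[u8; 32]) -> u8 {
       let mut diff = 0u8;
       for (x, y) in a.iter().zip(b.iter()) {
           diff |= x ^ y;
       }
       // diff == 0 -> 0xFFFF >> 8 = 0xFF -> 1; diff != 0 -> 0.
       ((diff as u16).wrapping_sub(1) >> 8) as u8 & 1
   }

   /// Returns `a` if `choose_a == 1`, `b` if `choose_a == 0`.
   pub fn ct_select_bytes(a: &[u8; 32], b: &[u8; 32], choose_a: u8) -> [u8; 32] {
       let mask = choose_a.wrapping_neg(); // 0x00 or 0xFF
       let mut out = [0u8; 32];
       for i in 0..32 {
           out[i] = b[i] ^ (mask & (a[i] ^ b[i]));
       }
       out
   }

   /// Run the computation twice and release the result only if both
   /// runs agree; otherwise hand back the fallback value.
   pub fn decaps_with_redundancy<F>(decaps_once: F, k_fault: &[u8; 32]) -> [u8; 32]
   where
       F: Fn() -> [u8; 32],
   {
       let k1 = decaps_once();
       let k2 = decaps_once();
       let same = ct_eq_bytes(&k1, &k2);
       ct_select_bytes(&k1, k_fault, same)
   }

A fault that corrupts only one of the two runs makes the results
differ, so the caller receives the fallback value through the same
code path as a normal result.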

Threat: Electromagnetic side-channels (SEMA / DEMA / CEMA)
==========================================================

Principle
---------

Near-field electromagnetic emanations from a working chip carry the
same information as its power trace, with better spatial resolution.
A positioning stage can aim a probe at a specific cryptographic
peripheral or a specific memory bus to bypass the averaging effect of
a shared power supply. EM attacks follow the same analytical
framework as power attacks (SEMA / DEMA / CEMA mirror SPA / DPA /
CPA).

Cost against an unprotected implementation
------------------------------------------

EM attacks trade a higher equipment bill (near-field probe, XY
stage, extra amplification) for fewer traces against the same target.
An EM attacker typically needs 10× – 100× fewer traces than a power
attacker targeting the same intermediate value, because the signal is
less contaminated by the board's global power return.

Countermeasures in ``quantica``
-------------------------------

Same as for power attacks — at the software level there is no
physical distinction between power and EM side-channels. Hardware
shielding of a deployed product is an integrator responsibility.

Summary table
=============

.. list-table:: Threats vs. ``quantica`` coverage
   :header-rows: 1
   :widths: 25 40 35

   * - Threat
     - Typical entry-level cost (MCU / SE)
     - Coverage in ``quantica``
   * - SPA / SEMA
     - 1 500 € / 10 – 30 k€
     - Branchless primitives, NTT shuffling, CT rejection loop.
   * - DPA / DEMA / CPA / CEMA
     - 1 100 € / ~40 k€
     - First-order masking on secrets (ML-KEM s, ML-DSA s1/s2/t0/y),
       Fisher–Yates NTT shuffling, CT rejection.
   * - Template attacks
     - ~2 500 € / reverse-engineered SE
     - Shuffling (alignment break) + masking (multiplies profile
       cost); FO-comparison hardening planned (``T4-E``).
   * - Software / remote timing
     - none / network access
     - silentops ``ct_*`` with asm backends + ctgrind continuous
       verification.
   * - DFA / SIFA
     - 1 k€ / up to 300 k€
     - ML-KEM: dk integrity + double-decaps + CT fault fallback.
       ML-DSA: CT rejection. SLH-DSA redundant sign planned
       (``S-SCA2``).

Cost numbers are indicative and date quickly; see the cited
literature for the full table of published attacks and their
parameters.