###################################################################
Threat model
###################################################################

This chapter enumerates the side-channel threats that ``quantica`` is
designed to resist, describes how each attack works in principle, and gives
an order-of-magnitude estimate of the attacker effort (in elapsed time and
equipment cost) needed to mount it against an **unprotected** implementation
running on a typical embedded target (ARM Cortex-M, RISC-V MCU, or
entry-level secure element).

The goal of the cost estimates is **decision support**, not a precise
evaluation: they let a reader judge which countermeasures are load-bearing
for a given deployment. A lab that is willing to invest more in equipment or
time can always overcome defences tuned to a weaker profile; the numbers
below correspond to the published state-of-the-art as of early 2026.

Attacker model
==============

The overall attacker model has three axes:

**Access to the device.**

* level 1 (*Black-box*): the attacker queries the API only.
* level 2 (*Observational*): adds passive physical measurement (power,
  electromagnetic emanations, timing via a direct probe).
* level 3 (*Intrusive*): adds active fault injection (clock / voltage
  glitches, laser, electromagnetic pulses) or chip-level inspection
  (decapping, FIB probing).

**Number of traces / queries.** From a handful (SPA, template matching) to
millions (DPA on noisy traces). Each countermeasure drastically raises the
number of traces required, often to infeasible values.

**Knowledge of the key schedule.** Some attacks assume the attacker can
build a profile on an identical open device (template attacks); others rely
only on statistical assumptions about the secret (DPA, CPA).

``quantica`` targets threat levels **1 and 2 at minimum**, with deliberate
countermeasures for level 3 on paths where a fault is known to enable key
recovery (ML-KEM FO comparison, ML-DSA rejection loop).

Threat: Simple Power Analysis (SPA)
===================================

Principle
---------

The attacker observes one or very few power or EM traces of a cryptographic
operation and reads the secret off the trace directly, either because the
code path depends on the secret (e.g. ``if (key_bit) ...`` compiles to a
jump visible in the power profile) or because each secret-dependent basic
block has a distinctive power signature (e.g. a conditional load of a
polynomial coefficient).

Classical publications: :cite:`kocher1996timing`, :cite:`kocher1999dpa`.

Where it bites post-quantum implementations:

* NTT butterfly groups whose execution *order* depends on the secret
  polynomial, identifiable on a single trace by the power envelope of each
  group :cite:`arxiv2024_mlkem_shuffling_hw`.
* Rejection-sampling loops where a "retry" branch has a clearly different
  power signature than the "accept" branch
  :cite:`eprint2025_rejected_signatures_sca`.
* Loading a secret polynomial coefficient as an index into a table
  (observable via cache-line access pattern).

Cost against an unprotected implementation
------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Axis
     - Against embedded MCU
     - Against secure element
   * - Equipment
     - Entry-level USB oscilloscope (≥ 250 MSa/s) + EM H-field probe +
       preamp: **~1 500 €** total.
     - ChipWhisperer Husky or LeCroy scope + near-field probe + power
       interposer / decap: **10 000 – 30 000 €**.
   * - Traces
     - 1 – 100 (SPA by definition).
     - 1 – 1 000 (the SE's own jitter + noise may need trace averaging).
   * - Elapsed time (skilled operator)
     - **1 – 5 days** from setup to key recovery.
     - **2 – 6 weeks** including chip reverse-engineering.

An unprotected reference implementation of ML-KEM / ML-DSA on an STM32F4 is
the canonical textbook target: academic surveys since 2022 routinely report
a total break after a few hours of trace acquisition.

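
To make the secret-dependent code path described under *Principle* concrete,
the sketch below contrasts a branching select with a mask-based one. The
function names are hypothetical and are **not** the ``silentops`` API. A
pure-Rust mask select like this can still be rewritten by the optimiser, so
it illustrates the idea rather than a hardened primitive.

```rust
/// Leaky variant: which arm executes depends on `key_bit`, so the compiler
/// may emit a conditional jump whose outcome shows up in a power trace.
pub fn select_branching(key_bit: u8, a: u32, b: u32) -> u32 {
    if key_bit & 1 == 1 {
        a
    } else {
        b
    }
}

/// Branchless variant: the low bit is expanded into an all-zeros or all-ones
/// mask, and the same instruction sequence runs for both values of the bit.
pub fn select_masked(key_bit: u8, a: u32, b: u32) -> u32 {
    // 0 -> 0x0000_0000, 1 -> 0xFFFF_FFFF
    let mask = 0u32.wrapping_sub((key_bit & 1) as u32);
    // mask == 0 keeps `b`; mask == all ones flips `b` into `a`.
    b ^ (mask & (a ^ b))
}
```

Both functions return ``a`` when the low bit of ``key_bit`` is 1 and ``b``
otherwise; only the second avoids a data-dependent control-flow decision at
the source level.
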
Countermeasures in ``quantica``
-------------------------------

* **Branchless primitives** (``silentops::ct_*``) with an inline-asm x86_64
  backend that LLVM cannot rewrite into a CMOV-on-pointer cache-timing leak
  (see :doc:`primitives`).
* **NTT butterfly shuffling** for secret polynomials
  (``ml_kem::ntt::ntt_shuffled``, ``ml_dsa::shuffle::ntt_shuffled``).
* **Constant-time rejection loop** in ML-DSA signing (``sca-ct-rejection``)
  that removes the accept/reject branch.
* **Masked sampling** of the ML-DSA masking vector ``y`` (``sca-masked-y``)
  so every intermediate share is independent of the secret key.

Threat: Differential / Correlation Power Analysis (DPA / CPA)
=============================================================

Principle
---------

The attacker records many (10³ – 10⁶) traces of the operation run with
varied public inputs, forms a hypothesis about an intermediate value that
depends on a small chunk of the secret (typically 8 or 16 bits), predicts
its Hamming weight or Hamming distance across traces, and statistically
correlates the prediction with the measured power at each time sample. The
correct secret chunk peaks significantly above noise; wrong guesses average
down.

Classical publications: :cite:`kocher1999dpa`, :cite:`brier2004cpa`.

In post-quantum settings:

* **ML-KEM**: the pointwise multiplication ``s · c`` in decapsulation
  operates on public ``c`` and secret ``s``; classic DPA target per
  :cite:`eprint2025_sca_mlkem_pointwise`.
* **ML-DSA**: the masking vector ``y`` leaks because ``z = y + c·s1`` is
  published, and averaging over many signatures reveals ``s1``
  :cite:`hermelink2025_weakest_link_masked_mldsa`.
* **SLH-DSA**: the PRF outputs seed WOTS+ chains; leaking a single byte of a
  chain element reveals the secret leaf
  :cite:`kannwischer2018_dpa_xmss_sphincs`.

Cost against an unprotected implementation
------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Axis
     - Against embedded MCU
     - Against secure element
   * - Equipment
     - ChipWhisperer-Lite or Husky (~**1 000 €**) + laptop + target board
       (~**100 €**).
     - High-end oscilloscope (~**40 000 €**) + EM probe setup + chip
       carrier / depackaging.
   * - Traces
     - **10 000 – 1 000 000** depending on noise and which intermediate is
       targeted.
     - **≥ 10 000 000** once jitter and shielding are factored in.
   * - Elapsed time (skilled operator)
     - **1 – 4 weeks** of trace acquisition + analysis.
     - **6 – 18 months** end-to-end, often a multi-engineer effort.

Countermeasures in ``quantica``
-------------------------------

* First-order arithmetic masking of ``s1``, ``s2``, ``t0`` in ML-DSA
  (``ml_dsa::masked::*``) and of the K-PKE secret ``s`` in ML-KEM
  (``ml_kem::masked``). Each secret is split into two uniformly random
  shares and every operation is performed on the shares without ever
  materialising the unmasked value.
* Masked ``y``-sampling in ML-DSA so that no share of ``y`` leaks before the
  final unmask at signature-emission time
  (:cite:`coron2024_masked_rejection_dilithium`).
* Fisher–Yates shuffling of NTT butterfly ordering drawn from an independent
  SCA-RNG seeded with the per-signature entropy; this desynchronises traces
  so the CPA "pattern" no longer aligns across captures
  (:cite:`arxiv2024_mlkem_shuffling_hw`).
* **Planned** (tier 4, item ``T4-B``): PRF masking on the SLH-DSA signing
  path; see :doc:`countermeasures/slh_dsa`.

Threat: Template attacks
========================

Principle
---------

The attacker has a clone of the target device and builds a *template* (a
multivariate Gaussian model) of the power/EM signature of each secret value,
then matches a single trace from the real target against the profile.
Template attacks are the strongest passive side-channel attack; a single
trace can be enough once a good profile exists.

Classical reference: :cite:`chari2002template`.

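
The matching step described above can be sketched as a maximum-likelihood
classification. This is a minimal illustration, not an attack tool: the
``Template`` type and function names are hypothetical, and the covariance is
assumed diagonal (independent time samples with strictly positive variance),
which is a simplification of the full multivariate model.

```rust
/// Hypothetical profile for one candidate secret value: per-sample mean and
/// variance estimated on the clone device. Variances must be > 0.
pub struct Template {
    pub mean: Vec<f64>,
    pub var: Vec<f64>,
}

/// Gaussian log-likelihood of `trace` under `t`, with a diagonal covariance.
/// The constant term -(n/2) ln(2*pi) is dropped because it is identical for
/// every template and does not change the ranking.
pub fn log_likelihood(t: &Template, trace: &[f64]) -> f64 {
    t.mean
        .iter()
        .zip(&t.var)
        .zip(trace)
        .map(|((m, v), x)| -0.5 * ((x - m).powi(2) / v + v.ln()))
        .sum()
}

/// Index of the template that best explains the trace, or `None` when no
/// templates are supplied.
pub fn best_match(templates: &[Template], trace: &[f64]) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (i, t) in templates.iter().enumerate() {
        let ll = log_likelihood(t, trace);
        if best.map_or(true, |(_, b)| ll > b) {
            best = Some((i, ll));
        }
    }
    best.map(|(i, _)| i)
}
```

With one template per candidate value of a secret byte, ``best_match``
returns the most likely candidate for a single captured trace, which is why
misaligned or masked traces make profiling so much more expensive.
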
In post-quantum implementations:

* ML-KEM FO-comparison step leaks enough to recover the message on a
  profiled device :cite:`eprint2024_template_fo_comparison`.
* ML-DSA NTT coefficients leak through template models targeting the
  specific Montgomery-reduction sequence on Cortex-M
  :cite:`arxiv2025_mlkem_mldsa_cortexm0_rp2040`.

Cost against an unprotected implementation
------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Axis
     - Against embedded MCU
     - Against secure element
   * - Equipment
     - Two identical MCUs (clone + target), same tooling as DPA.
     - Two decapped SEs + high-end scope + precision positioning.
   * - Traces
     - Profiling: 10 000 – 100 000; attack: **1 – 10 traces** suffice.
     - Profiling: 1 000 000; attack: 10 – 1 000 traces.
   * - Elapsed time
     - **2 – 6 weeks** once the clone profile exists.
     - **3 – 9 months**, dominated by the profiling phase.
   * - Prerequisite
     - Access to an open clone, which is usually realistic for
       commercial-off-the-shelf MCUs and harder for certified SEs.
     - Usually requires an internal agreement or reverse-engineered debug
       access.

Countermeasures in ``quantica``
-------------------------------

* Shuffling (NTT butterfly order, ``y``-sampling order) destroys the
  inter-trace alignment a template attack depends on.
* Masking forces the profile to model *shared* values that the attacker does
  not know, multiplying the required trace count.
* Constant-time FO comparison in ML-KEM decapsulation (``silentops::ct_eq``
  + branchless hash-equality check, see :doc:`countermeasures/ml_kem`).
* **Planned** (tier 4, item ``T4-E``): hardened FO comparison against the
  :cite:`eprint2024_template_fo_comparison` attack.

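
As an illustration of why masking forces an attacker to profile values they
do not know, the sketch below shows first-order arithmetic masking modulo the
ML-KEM modulus. All names are hypothetical and do not reflect the
``ml_kem::masked`` or ``ml_dsa::masked`` modules; the random mask is supplied
by the caller and is assumed to be uniform in ``[0, Q)``, and the ``%``
operator is used for readability only, with no constant-time claim.

```rust
/// ML-KEM modulus (FIPS 203).
pub const Q: u32 = 3329;

/// Two additive shares such that (s0 + s1) mod Q equals the secret value.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Shared {
    pub s0: u32,
    pub s1: u32,
}

/// Split `secret` into two shares using a caller-supplied mask `r`.
/// Assumption: `r` is drawn uniformly from [0, Q) by a suitable RNG.
pub fn mask(secret: u32, r: u32) -> Shared {
    let r = r % Q;
    Shared { s0: (secret % Q + Q - r) % Q, s1: r }
}

/// Addition is linear, so it is applied to each share independently and the
/// unmasked sum is never formed.
pub fn add(a: Shared, b: Shared) -> Shared {
    Shared { s0: (a.s0 + b.s0) % Q, s1: (a.s1 + b.s1) % Q }
}

/// Multiplication by a *public* constant is also share-wise.
pub fn mul_public(a: Shared, c: u32) -> Shared {
    let c = c % Q;
    Shared { s0: a.s0 * c % Q, s1: a.s1 * c % Q }
}

/// Recombine the shares; in a masked design this happens only at the very
/// end, on a value that is safe to reveal.
pub fn unmask(a: Shared) -> u32 {
    (a.s0 + a.s1) % Q
}
```

Each share on its own is uniformly distributed, so a template built on a
single share carries no information about the secret; the attacker has to
combine leakage from both shares.
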
Threat: Software timing / cache-timing
======================================

Principle
---------

A software attacker co-resident with the cryptographic process measures its
execution time, the eviction pattern of its cache lines, or the contention
of shared microarchitectural resources, and correlates these observations
with secret inputs. Classical publication: :cite:`kocher1996timing`. Modern
variants exploit branch predictor state, speculative execution, or port
contention.

In ``quantica`` the threat splits cleanly by algorithm:

* **ML-KEM**: the FO comparison, the implicit-rejection path, and any loop
  over dk bytes must be strictly constant time.
* **ML-DSA**: the rejection loop's number of iterations is *public* per
  FIPS 204, but the *branches inside* a single iteration must not depend on
  secret material (covered by ``sca-ct-rejection``).
* **SLH-DSA**: deterministic per signature; the main risk is a
  table-indexed load on a secret-derived FORS digit.

Cost against an unprotected implementation
------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Axis
     - Local process attacker
     - Remote network attacker
   * - Equipment
     - None beyond a standard user account on the target host.
     - Depends on the target protocol; often a network capture + a latency
       budget of microseconds.
   * - Queries / observations
     - 10⁴ – 10⁸ depending on the secret entropy and the timing gap the
       attack relies on.
     - 10⁶ – 10⁹ (network jitter dominates).
   * - Elapsed time
     - **Hours to days** for a local attacker.
     - **Weeks to months** for a remote attacker on a bare public API.
   * - Extra notes
     - Attackers co-resident with an SGX / TrustZone enclave or a VM
       neighbour can reach µs-level precision on cache-miss counts.
     - Remote timing attacks on PQC have been demonstrated in laboratory
       conditions; real deployments must treat the library as if a remote
       timing attacker existed.

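
The table-indexed load singled out above leaks through the cache because
only one line is touched. A common remedy is to read every entry and keep
the wanted one with a mask. The sketch below is a hypothetical helper, not
the ``silentops`` API, and as plain Rust it carries no guarantee about what
the optimiser emits.

```rust
/// Return `table[secret_idx]` while reading every entry, so the memory
/// access pattern does not depend on the index. An out-of-range index
/// yields 0 instead of panicking.
pub fn ct_lookup(table: &[u32], secret_idx: usize) -> u32 {
    let mut out = 0u32;
    for (i, &entry) in table.iter().enumerate() {
        // `diff` is zero only for the wanted position.
        let diff = (i ^ secret_idx) as u64;
        // The top bit of (diff | -diff) is set exactly when diff != 0.
        let nonzero = ((diff | diff.wrapping_neg()) >> 63) as u32;
        // nonzero == 0 -> mask of all ones; nonzero == 1 -> mask of zero.
        let mask = nonzero.wrapping_sub(1);
        out |= entry & mask;
    }
    out
}
```

The cost is linear in the table size, which is the price paid for an access
pattern that is independent of the secret index.
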
Countermeasures in ``quantica``
-------------------------------

* All conditional selections go through ``silentops::ct_*``, compiled to
  inline asm on x86_64 so that LLVM at ``opt-level=2`` cannot regenerate a
  cache-timing leak (see :doc:`primitives`).
* No secret-indexed array access; any "select one of k values" is
  implemented by computing all k values and using ``ct_select``.
* ctgrind (Valgrind memcheck client requests) in continuous verification;
  see :doc:`verification`.

Threat: Differential Fault Analysis (DFA) and SIFA
==================================================

Principle
---------

The attacker perturbs the target while it runs the cryptographic operation
(clock glitch, voltage glitch, laser pulse, electromagnetic fault injection)
so that an instruction is skipped, a value is corrupted, or a loop count is
changed. They compare the faulty output with an unfaulted one; depending on
the algorithm a single useful fault can leak the whole secret
:cite:`boneh1997dfa`.

A more recent variant, Statistical Ineffective Fault Analysis (SIFA),
statistically correlates the fault injection with the operation's *success*
(not with the faulted value): it does not need the faulted output, only
whether the operation aborted :cite:`dobraunig2018sifa`.

Relevant targets:

* **ML-KEM decapsulation**: the FO re-encryption provides a
  recompute-and-compare mechanism; skipping it collapses the FO security
  argument.
* **ML-DSA signing**: skipping a norm-check produces a non-rejected
  candidate signature that leaks ``s1``, ``s2`` or ``t0``.
* **SLH-DSA**: the hypertree authentication path; a corrupted intermediate
  node under a known message enables a forgery.

Cost against an unprotected implementation
------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Axis
     - Embedded MCU
     - Secure element
   * - Equipment
     - ChipWhisperer with voltage/clock-glitch board (**~1 000 €**). EMFI
       setup with pulse generator + XY stage: **10 000 – 30 000 €**.
     - Laser bench with precision positioning: **100 000 – 300 000 €**. SE
       decapping often required.
   * - Faults needed
     - A single well-placed fault frequently suffices (DFA on ML-KEM FO
       comparison). SIFA on ML-DSA needs 1 000 – 100 000 trials.
     - Same ballpark but with far more parameter sweeping.
   * - Elapsed time
     - **Days to weeks** for MCU-level targets.
     - **6 – 18 months** for a certified SE.

Countermeasures in ``quantica``
-------------------------------

* **ML-KEM decapsulation** uses **double computation** + constant-time
  comparison of the two shared secrets; a fault affecting only one
  computation causes divergence, which is detected in constant time and the
  caller is served a ``k_fault`` value derived from ``z`` that cannot be
  exploited as a decryption oracle. See ``quantica/src/ml_kem/kem.rs`` and
  :doc:`countermeasures/ml_kem`.
* **dk integrity check** in decapsulation: ``H(ek)`` stored inside ``dk`` is
  recomputed and compared with ``ct_eq``; a mismatched value aborts
  decapsulation deterministically.
* **ML-DSA**: the rejection loop already double-checks norms before
  emission; with ``sca-ct-rejection`` the check is branchless.
* **Planned** (tier 4, item ``S-SCA2``): redundant computation for SLH-DSA
  signing + constant-time comparison to detect faults in hypertree
  authentication paths.

Threat: Electromagnetic side-channels (SEMA / DEMA / CEMA)
==========================================================

Principle
---------

Near-field electromagnetic emanations from a working chip carry the same
information as its power trace, with better spatial resolution. A
positioning stage can aim a probe at a specific cryptographic peripheral or
a specific memory bus to bypass the averaging effect of a shared power
supply. EM attacks follow the same analytical framework as power attacks
(SEMA / DEMA / CEMA mirror SPA / DPA / CPA).

Cost against an unprotected implementation
------------------------------------------

EM attacks trade a higher equipment bill (near-field probe, XY stage, extra
amplification) for fewer traces at equal security level. An EM attacker
typically needs 10× – 100× fewer traces than a power attacker at equal
distance to the secret, because the signal is less contaminated by the
board's global power return.

Countermeasures in ``quantica``
-------------------------------

Same as for power attacks: at the software level there is no physical
distinction between power and EM side-channels. Hardware shielding of a
deployed product is an integrator responsibility.

Summary table
=============

.. list-table:: Threats vs. ``quantica`` coverage
   :header-rows: 1
   :widths: 25 40 35

   * - Threat
     - Typical entry-level cost (MCU / SE)
     - Coverage in ``quantica``
   * - SPA / SEMA
     - 1 500 € / 10 – 30 k€
     - Branchless primitives, NTT shuffling, CT rejection loop.
   * - DPA / DEMA / CPA / CEMA
     - 1 100 € / ~40 k€
     - First-order masking on secrets (ML-KEM s, ML-DSA s1/s2/t0/y),
       Fisher–Yates NTT shuffling, CT rejection.
   * - Template attacks
     - ~2 500 € / reverse-engineered SE
     - Shuffling (alignment break) + masking (multiplies profile cost);
       FO-comparison hardening planned (``T4-E``).
   * - Software / remote timing
     - none / network access
     - silentops ``ct_*`` with asm backends + ctgrind continuous
       verification.
   * - DFA / SIFA
     - 1 k€ / up to 300 k€
     - ML-KEM: dk integrity + double-decaps + CT fault fallback. ML-DSA: CT
       rejection. SLH-DSA redundant sign planned (``S-SCA2``).

Cost numbers are indicative and quickly dated; see the cited literature for
the full table of published attacks and their parameters.