###################################################################
X25519 / X448 — countermeasures
###################################################################

:Spec:        RFC 7748 :cite:`rfc7748`
:Crate path:  ``arcana::ecc::x25519`` (Curve25519 Diffie-Hellman),
              ``arcana::ecc::x448`` (Curve448 Diffie-Hellman)
:Cargo feature: none — both compiled unconditionally.

X25519 and X448 are the two ECDH primitives on Montgomery curves
in arcana. They are constant-time (CT) by **construction**: the X
coordinate is the only state, with no Y and no special cases for
the neutral element. This is one reason they are popular in modern
protocol designs (TLS 1.3, Noise, Signal, WireGuard).

That said, *any* concrete Montgomery-ladder implementation can
still leak side-channel (SCA) information through the field
operations, the RNG-derived blinding, and side effects of the
scalar clamping :cite:`weissbart2021_curve25519_ml_sca`.

.. contents::
   :local:
   :depth: 2

Coverage matrix
===============

.. list-table:: X25519/X448 countermeasure / threat matrix
   :header-rows: 1
   :widths: 25 18 57

   * - Threat
     - Status
     - Countermeasure(s)
   * - Software / cache-timing on Montgomery ladder
     - **partial — audit pending**
     - Item ``T1-G``: audit ``x25519_ladder`` and ``x448_ladder``
       under the same lens as the Weierstrass-side fix.
   * - SPA on Cortex-M0 / RISC-V
     - **vulnerable**
     - Same audit ``T1-G``; deep-learning SCA on Curve25519
       Cortex-M0 implementations was demonstrated in
       :cite:`weissbart2021_curve25519_ml_sca` even against random-
       delay defences.
   * - DPA on field operations
     - **vulnerable**
     - Plan ``T2-A`` (Z-rerandomization) — adapted to Montgomery
       ladder's ``(X : Z)`` projective coordinates.
   * - Template attacks on the per-iteration ``ct_swap``
     - **vulnerable**
     - Plan ``T2-A`` + ``T2-B`` (scalar blinding).
   * - Invalid-curve attack (peer pubkey on twist)
     - **implemented**
     - X25519 and X448 are by design twist-secure (RFC 7748 §6.1),
       so invalid-curve attacks reduce to a small subgroup attack
       which is mitigated by the "all-zero shared secret"
       contributory check (when applicable).
   * - Small-subgroup contributory check
     - **partial**
     - The X25519 / X448 functions return a shared secret of all-
       zero when the peer pubkey is in the small-order subgroup;
       callers should reject. Audit ``T2-K`` to confirm the check
       is in place and CT.
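
RFC 7748 notes that the all-zero check can be performed by OR-ing all
bytes of the shared secret together. A minimal sketch of such a check
is given below. The function name ``is_all_zero_ct`` is hypothetical
and not part of the arcana API, and the compiler is still free to
transform the code, so the release asm would need the same inspection
as ``cswap``.

.. code-block:: rust

   /// Returns true when every byte of `shared` is zero.
   /// Works for both 32-byte (X25519) and 56-byte (X448) outputs.
   pub fn is_all_zero_ct(shared: &[u8]) -> bool {
       // Accumulate all bytes without any early exit.
       let mut acc: u8 = 0;
       for &b in shared {
           acc |= b;
       }
       // For acc != 0, (0 - acc) as u16 has all high-byte bits set,
       // so bit 8 tells us whether any input byte was nonzero.
       let nonzero = ((acc as u16).wrapping_neg() >> 8) as u8 & 1;
       nonzero == 0
   }

A caller would reject the key exchange when this returns ``true``;
that final decision is public, so branching on it is acceptable.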

Background — Montgomery ladder for X25519
=========================================

X25519 (RFC 7748) computes ``X(k · P)`` from ``k`` and ``X(P)``
using a constant-time Montgomery ladder over ``(X : Z)``
projective coordinates:

.. code-block:: text

   X1 := X(P)
   X2, Z2 := 1, 0           ; representing the neutral element
   X3, Z3 := X1, 1
   for t in [254..0]:
       k_t := bit t of k
       cswap(k_t, X2, X3)
       cswap(k_t, Z2, Z3)
       (X2, Z2, X3, Z3) := double_and_add(X1, X2, Z2, X3, Z3)
       cswap(k_t, X2, X3)
       cswap(k_t, Z2, Z3)
   return X2 / Z2

The structure is essentially identical to the Weierstrass-side
``scalar_mul_point`` and benefits from the same hardening
techniques.
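
To make the control flow concrete, the ladder above can be sketched
over a toy Montgomery curve. Everything in the sketch is an assumption
for illustration: the prime ``2^61 - 1``, the coefficient ``A = 10``
(hence ``a24 = 2``), the 64-bit scalar, and the helper names. The
``%``-based field arithmetic is neither constant-time nor arcana's
``ecc::field``; only the swap/step/swap structure carries over. The
step formulas are the ones from RFC 7748.

.. code-block:: rust

   /// Toy prime 2^61 - 1 (illustration only; X25519 uses 2^255 - 19).
   const P: u64 = (1u64 << 61) - 1;
   /// (A - 2) / 4 for the toy coefficient A = 10.
   const A24: u64 = 2;

   // Toy field helpers; inputs are assumed to be already reduced mod P.
   fn add(a: u64, b: u64) -> u64 { (a + b) % P }
   fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
   fn mul(a: u64, b: u64) -> u64 { (((a as u128) * (b as u128)) % (P as u128)) as u64 }

   /// Fermat inversion z^(P-2); maps 0 to 0. The exponent is public.
   fn inv(z: u64) -> u64 {
       let (mut acc, mut base, mut e) = (1u64, z, P - 2);
       while e != 0 {
           if (e & 1) == 1 { acc = mul(acc, base); }
           base = mul(base, base);
           e >>= 1;
       }
       acc
   }

   /// Mask-based conditional swap; `bit` must be 0 or 1.
   fn cswap(bit: u64, a: &mut u64, b: &mut u64) {
       let mask = bit.wrapping_neg(); // all-ones when bit == 1
       let t = mask & (*a ^ *b);
       *a ^= t;
       *b ^= t;
   }

   /// x-only ladder: X(k * P) for the point with x-coordinate `x1`.
   pub fn ladder(k: u64, x1: u64) -> u64 {
       let x1 = x1 % P;
       let (mut x2, mut z2, mut x3, mut z3) = (1u64, 0u64, x1, 1u64);
       for t in (0..64).rev() {
           let k_t = (k >> t) & 1;
           cswap(k_t, &mut x2, &mut x3);
           cswap(k_t, &mut z2, &mut z3);
           // One combined doubling and differential addition.
           let (a, b) = (add(x2, z2), sub(x2, z2));
           let (aa, bb) = (mul(a, a), mul(b, b));
           let e = sub(aa, bb);
           let (c, d) = (add(x3, z3), sub(x3, z3));
           let (da, cb) = (mul(d, a), mul(c, b));
           let (s, m) = (add(da, cb), sub(da, cb));
           x3 = mul(s, s);
           z3 = mul(x1, mul(m, m));
           x2 = mul(aa, bb);
           z2 = mul(e, add(aa, mul(A24, e)));
           cswap(k_t, &mut x2, &mut x3);
           cswap(k_t, &mut z2, &mut z3);
       }
       mul(x2, inv(z2))
   }

As a sanity check, ``ladder(a, ladder(b, x))`` and
``ladder(b, ladder(a, x))`` agree, which is the Diffie-Hellman
property the real functions rely on.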

Audit gaps (``T1-G``)
=====================

The arcana X25519 / X448 implementations were ported from
RFC 7748 reference code with the standard idiom "constant-time
swap implemented as ``mask = -bit; t = mask & (a ^ b); a ^= t;
b ^= t``". The same LLVM regression observed on Weierstrass
``ecc::field`` (mask-pattern → branch recovery) applies here, so
the audit checklist mirrors the Weierstrass one:

1. ``cswap`` must compile branchless under ``opt-level=2``.
   Apply ``core::hint::black_box`` on the mask if the release
   asm shows a recovered branch.
2. ``double_and_add`` body must not branch on field-element
   limbs. The inner field operations
   (``ecc::field`` for Curve25519 / Curve448 primes) are shared
   with the Weierstrass code and already received the
   ``black_box`` treatment in commit 76191c1; confirm the X25519
   path uses the *same* ``field_add`` / ``field_sub`` /
   ``reduce_wide`` and not a separate copy.
3. **Final inversion** ``Z2^{-1} mod p`` uses Fermat
   (``Z2^(p-2) mod p``), which goes through the CT
   ``field_pow``; re-confirm.
4. **Clamping** of the scalar ``k`` (clear bits 0, 1, 2, 255; set
   bit 254 for X25519; analogous for X448) is bitwise; no branch
   risk by construction.

Estimated effort: 1 day audit + 0.5 day fix.
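
Items 1 and 4 of the checklist can be sketched as follows. This is an
illustration under assumptions, not the arcana source: the
``[u64; N]`` limb layout and the function names are hypothetical.
``core::hint::black_box`` is documented as a best-effort hint rather
than a guarantee, so the generated asm still has to be inspected.

.. code-block:: rust

   use core::hint::black_box;

   /// Swaps `a` and `b` when `bit == 1`, leaves them alone when
   /// `bit == 0`. `bit` must be 0 or 1.
   pub fn cswap<const N: usize>(bit: u64, a: &mut [u64; N], b: &mut [u64; N]) {
       // All-ones when bit == 1, zero otherwise. black_box discourages
       // the optimizer from turning the mask back into a branch.
       let mask = black_box(bit.wrapping_neg());
       for i in 0..N {
           let t = mask & (a[i] ^ b[i]);
           a[i] ^= t;
           b[i] ^= t;
       }
   }

   /// RFC 7748 scalar clamping for X25519 (little-endian bytes):
   /// clear bits 0, 1, 2 and 255, set bit 254.
   pub fn clamp_x25519(mut k: [u8; 32]) -> [u8; 32] {
       k[0] &= 248;
       k[31] &= 127;
       k[31] |= 64;
       k
   }

   /// RFC 7748 scalar clamping for X448: clear bits 0 and 1, set bit 447.
   pub fn clamp_x448(mut k: [u8; 56]) -> [u8; 56] {
       k[0] &= 252;
       k[55] |= 128;
       k
   }

Both clamping functions are pure bit masking, which is why item 4
carries no branch risk.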

Z-randomization on ``(X : Z)`` (``T2-A``)
=========================================

The Montgomery projective ``(X : Z)`` representation admits the
same ``λ``-rescaling as Jacobian Weierstrass:

.. math::

    (X, Z) \;\sim\; (\lambda X, \lambda Z),
    \qquad \lambda \stackrel{\$}{\leftarrow} \mathbb{F}_p^*

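So at the ladder start, draw ``λ`` from the SCA-RNG and replace
the initial ``(X3, Z3) = (X1, 1)`` by ``(λX1, λ)``. ``X1`` itself
stays affine, because the ladder step uses it as the fixed
difference point. The clamped scalar has its top bit set, so the
first iteration swaps the randomized pair into ``(X2, Z2)`` and
every later intermediate carries the random factor.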

This is **the exact countermeasure that broke
:cite:`weissbart2021_curve25519_ml_sca`'s template attack** on
unprotected Curve25519 implementations; once Z-rerandomization is
in, the per-iteration intermediates are randomized on every
invocation and the profiled attack no longer aligns.

Cost: 2 field multiplications. Negligible.
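
The invariance can be checked on a single ladder step. Assuming
``double_and_add`` follows the RFC 7748 step formulas (the
intermediate names :math:`A, B, C, D, E` below are the RFC's, not
arcana identifiers, with :math:`E = A^2 - B^2`), scaling
:math:`(X_2, Z_2)` by :math:`\mu` and :math:`(X_3, Z_3)` by
:math:`\lambda` only rescales the outputs:

.. math::

    A = X_2 + Z_2,\; B = X_2 - Z_2 &\;\mapsto\; \mu A,\; \mu B \\
    C = X_3 + Z_3,\; D = X_3 - Z_3 &\;\mapsto\; \lambda C,\; \lambda D \\
    X_3' = (DA + CB)^2,\; Z_3' = X_1 (DA - CB)^2 &\;\mapsto\; \lambda^2 \mu^2 X_3',\; \lambda^2 \mu^2 Z_3' \\
    X_2' = A^2 B^2,\; Z_2' = E \, (A^2 + a_{24} E) &\;\mapsto\; \mu^4 X_2',\; \mu^4 Z_2'

Each ratio :math:`X/Z` is unchanged, so the final :math:`X_2 / Z_2` is
unchanged as well. :math:`X_1` enters :math:`Z_3'` linearly and
therefore must not be scaled.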

Implementation hook: today the X25519 / X448 entry points
(``x25519_derive_public``, ``x25519_ecdh``) are pure functions
without an RNG argument. Adding Z-rerand requires either:

* changing the API to take a ``CryptoRng`` callback (breaking),
  or
* deriving an internal SCA-RNG seed from
  ``H(sk_bytes ‖ peer_pk_bytes ‖ "x25519-z-rerand")`` and using a
  SHAKE-derived stream, à la ECDSA-deterministic ``T2-A``.

The latter preserves the API, the determinism for KAT, and the
zero-RNG-failure-mode property that makes X25519 attractive in the
first place. Recommendation: go with the SHAKE-derived approach.

Scalar blinding (``T2-B``)
==========================

Scalar blinding ``k' = k + r · ℓ`` works the same as for Edwards;
``ℓ = 2^252 + 27742...`` for X25519 (the order of the prime-order
subgroup). 64 random bits is the standard, costing ~25 % per-call
overhead on a 254-bit scalar.

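For X25519 this is **layered on top of** the existing clamping:
clamp first, then blind, then ladder. Adding ``r · ℓ`` alone does
*not* preserve the clamping property ``k' mod 8 = k mod 8``,
because ``ℓ`` is odd (``ℓ ≡ 5 mod 8``) and so
``k' ≡ k + 5r (mod 8)``. To keep the cofactor-clearing low bits,
blind with a multiple of ``8ℓ`` instead (``k' = k + r · 8ℓ``, or
equivalently restrict ``r`` to multiples of 8). The fixed top bit
of the clamped scalar is not preserved either, so the ladder has
to run over the full, fixed bit length of ``k'``.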
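
The arithmetic of the blinding step can be sketched with plain limb
operations. The sketch adds a multiple of ``8ℓ`` (the full order of
Curve25519) so that the low three bits of the clamped scalar survive,
which means ``k' · P = k · P`` for any ``P`` whose order divides
``8ℓ``. The function name, the little-endian ``u64`` limb layout, and
the 6-limb output are assumptions for illustration; the code has not
been audited for constant-time behaviour.

.. code-block:: rust

   /// Order of the prime-order subgroup of Curve25519,
   /// as little-endian u64 limbs.
   const L: [u64; 4] = [
       0x5812631a5cf5d3ed,
       0x14def9dea2f79cd6,
       0x0000000000000000,
       0x1000000000000000,
   ];

   /// Returns k + r * 8 * L as six little-endian limbs.
   /// For a clamped k (< 2^255) the top limb is always zero.
   pub fn blind_scalar(k: &[u64; 4], r: u64) -> [u64; 6] {
       // m = 8 * r fits in 67 bits; split it into two limbs.
       let m = (r as u128) << 3;
       let m_limbs = [m as u64, (m >> 64) as u64];

       let mut out = [0u64; 6];
       out[..4].copy_from_slice(k);

       // Schoolbook multiply-accumulate: out += m * L.
       for (i, &mi) in m_limbs.iter().enumerate() {
           let mut carry: u128 = 0;
           for j in 0..4 {
               // Cannot overflow: (2^64-1) + (2^64-1)^2 + (2^64-1) = 2^128 - 1.
               let t = out[i + j] as u128 + (mi as u128) * (L[j] as u128) + carry;
               out[i + j] = t as u64;
               carry = t >> 64;
           }
           // out[i + 4] has not been written yet, so a plain store suffices.
           out[i + 4] = carry as u64;
       }
       out
   }

With this variant the ladder would iterate over all 320 bits of the
first five limbs regardless of the value of ``r``.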

Reading list
============

* :cite:`weissbart2021_curve25519_ml_sca` — ML-based template SCA
  on Curve25519 Cortex-M0; the canonical "even with random
  delays, you leak" baseline.
* :cite:`bernstein2006_curve25519` — the original Curve25519
  paper, which already gives the CT-by-construction argument.
* :cite:`hutter2015_curve25519_arm` — high-speed Curve25519 on
  ARM Cortex-M0; the reference for embedded performance numbers.
* :cite:`adomnicai2024_curve25519_curve448` — recent unified
  hardware design including Z-randomization and CT timing.

Code path summary
=================

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Path
     - Today (2026-04-21)
     - Target (post T1-G + T2-A + T2-B)
   * - ``ecc::x25519::x25519_ladder``
     - CT structure (RFC 7748 idiom), audit pending
     - Audited CT, Z-rerand, scalar blinding
   * - ``ecc::x25519::x25519_derive_public``
     - Pure function, no RNG
     - Same API; internal SCA-RNG seeded from sk
   * - ``ecc::x25519::x25519_ecdh``
     - Pure function, no RNG
     - Same API; internal SCA-RNG seeded from sk + peer_pk
   * - ``ecc::x448::*``
     - Same as X25519 mutatis mutandis
     - Same plan as X25519
   * - Field arithmetic (CURVE25519_P, CURVE448_P)
     - Reuses ``ecc::field::*`` (already ``black_box``-shielded)
     - Unchanged