###################################################################
AES — countermeasures
###################################################################

:Spec:        FIPS 197 :cite:`fips197`
:Crate path:  ``arcana::cipher::aes`` + ``arcana::cipher::modes`` +
              ``arcana::cipher::ccm`` + ``arcana::cipher::xts``
:Cargo feature: none — AES is unconditionally compiled.

AES is the **single largest open SCA gap** on the arcana side. The
crate currently ships a textbook table-based AES (T-tables / S-box
LUTs), which is a known cache-timing and SPA target. Closing this
gap is item ``T1-A`` and is on the evaluation critical path.

This chapter lists each threat applicable to AES, the
state-of-the-art mitigations from the literature, and the planned
arcana implementation route.

.. contents::
   :local:
   :depth: 2

Coverage matrix
===============

.. list-table:: AES countermeasure / threat matrix
   :header-rows: 1
   :widths: 25 18 57

   * - Threat
     - Status
     - Countermeasure(s)
   * - SPA / SEMA on key schedule + round function
     - **vulnerable**
     - Plan ``T1-A``: replace table-based with fixsliced bitslice
       (:cite:`adomnicai2021_fixslicing_aes`).
   * - Cache-timing on shared L1 / L2
     - **vulnerable**
     - Same plan ``T1-A``. AES-NI / VAES backend is item ``T5``
       (host-only, not on the evaluation critical path).
   * - DPA / CPA on round-1 SubBytes
     - **vulnerable**
     - Plan ``T2-G`` (post-T1-A): first-order Boolean masking on
       top of fixsliced AES, leveraging the same masking schemes
       used in quantica's ML-KEM/ML-DSA layer.
   * - Template attacks (esp. ML-DPA)
     - **vulnerable**
     - Same plan ``T2-G``. The ANSSI protected AES on ARM was
       broken end-to-end by deep-learning multi-task DPA in
       :cite:`anssi2023_aes_ml_dpa`; arcana will need first-order
       masking + shuffling at minimum to resist an evaluation-class lab.
   * - DFA on last AES round
     - **vulnerable**
     - Plan ``T4-AES-A`` (deferred): redundancy + infective
       countermeasure (:cite:`battistello2015fault_aes`).
   * - GMAC GF(2^128) multiplier SCA
     - **vulnerable**
     - Plan ``T2-H``: replace the table-driven GHASH multiplier
       with a CT carry-less multiply (or PCLMULQDQ / PMULL on
       hosts; software fallback bitsliced).

SPA / cache-timing — Fixsliced AES (``T1-A``)
=============================================

Principle of the attack
-----------------------

AES table-based implementations leak through cache-line access
patterns:

* The S-box is a 256-byte LUT that spans 4 cache lines (64-byte
  lines, assuming an aligned table). The first round of AES indexes
  it with each of the 16 input bytes XOR-ed with the matching
  round-key byte; observing which cache line is accessed reveals
  the top two bits of each ``byte ^ K[i]``.
* Combined T-table implementations (which fold SubBytes, ShiftRows
  and MixColumns into 4 KiB of pre-computed tables) have 4-byte
  entries, so each 64-byte line holds only 16 of them and a
  cache-line observation reveals the top four bits of each index.
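
The cache-line arithmetic in the list above can be sketched in a few
lines. This is an illustrative model only: the function names are
hypothetical (they are not part of arcana), and it assumes a
64-byte-aligned 256-byte S-box and an attacker who learns exactly
which line the first-round lookup touched.

.. code-block:: rust

   /// Cache line (0..=3) touched by a first-round S-box lookup,
   /// assuming 64-byte lines and a 64-byte-aligned 256-byte table.
   pub fn sbox_cache_line(plaintext_byte: u8, key_byte: u8) -> u8 {
       (plaintext_byte ^ key_byte) >> 6
   }

   /// What one observation gives the attacker: the top two bits of
   /// the key byte, given the known plaintext byte.
   pub fn leaked_key_bits(plaintext_byte: u8, observed_line: u8) -> u8 {
       observed_line ^ (plaintext_byte >> 6)
   }

Under these assumptions a single clean observation per byte position
already yields 2 of the 8 bits of each first-round key byte.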

Original references: :cite:`bernstein2005_aes_cache_timing`,
:cite:`osvik2006cache_aes`. Modern variants use Flush+Reload,
Prime+Probe, and shared-LLC contention, mounted by attackers that
are co-resident with the victim.

Countermeasure
--------------

**Fixsliced bitslice AES** (Adomnicai-Peyrin TCHES 2021/1,
:cite:`adomnicai2021_fixslicing_aes`) is the current SOTA for
constant-time AES on Cortex-M and RISC-V:

* Bit-slices 2 blocks of AES at once into 8 32-bit registers
  (8 × 32 = 256 bits, one bit position per register). The S-box
  becomes a sequence of bitwise operations on registers, with no
  memory loads and no branches.
* Unlike classical bitslicing, fixslicing keeps each bit at a
  fixed register position across rounds, eliminating the heavy
  inter-round shuffling that earlier bitsliced AES paid for
  ShiftRows.
* Reported performance: **80 cycles/byte on Cortex-M**, **91
  cycles/byte on RISC-V** (E31), 21 % / 26 % faster than the
  prior bitsliced records on those platforms.
* RAM footprint: 4 × less than classical bitslice (round keys are
  smaller since the bit positions are fixed).
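
To make "one bit position per register" concrete, the sketch below
packs 32 bytes (two AES blocks) into 8 ``u32`` bit-planes and
computes the GF(2^8) doubling used by MixColumns with XORs only.
It uses a deliberately simplified layout, not the fixsliced layout
or packing routine of the reference implementation, and all
function names are hypothetical.

.. code-block:: rust

   /// Transpose 32 bytes into 8 bit-planes: bit `b` of byte `i`
   /// becomes bit `i` of `planes[b]`.
   pub fn pack(bytes: &[u8; 32]) -> [u32; 8] {
       let mut planes = [0u32; 8];
       for (i, &byte) in bytes.iter().enumerate() {
           for b in 0..8 {
               planes[b] |= (((byte >> b) & 1) as u32) << i;
           }
       }
       planes
   }

   /// Inverse of `pack`.
   pub fn unpack(planes: &[u32; 8]) -> [u8; 32] {
       let mut bytes = [0u8; 32];
       for (i, byte) in bytes.iter_mut().enumerate() {
           for b in 0..8 {
               *byte |= (((planes[b] >> i) & 1) as u8) << b;
           }
       }
       bytes
   }

   /// GF(2^8) doubling of all 32 bytes at once, modulo
   /// x^8 + x^4 + x^3 + x + 1. No table lookups, no branches.
   pub fn xtime_sliced(p: &[u32; 8]) -> [u32; 8] {
       [p[7], p[0] ^ p[7], p[1], p[2] ^ p[7], p[3] ^ p[7], p[4], p[5], p[6]]
   }

   /// Byte-wise reference for the same operation.
   pub fn xtime_ref(x: u8) -> u8 {
       (x << 1) ^ (0x1b & 0u8.wrapping_sub(x >> 7))
   }

The bitsliced S-box follows the same pattern with a longer Boolean
circuit (AND, OR, XOR, NOT) in place of the handful of XORs shown
here.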

Reference implementation: `aadomn/aes <https://github.com/aadomn/aes>`_
on GitHub, MIT-licensed.

Implementation route in arcana
------------------------------

1. Port the MIT-licensed fixsliced AES from ``aadomn/aes`` to
   ``arcana::cipher::aes_bitsliced`` as a separate module.
   Pure Rust, no external crates (compatible with the workspace's
   zero-deps rule).
2. Gate the module behind a feature flag ``aes-fixsliced`` (off by
   default to keep the diff reviewable; promote it to default after
   KAT validation).
3. Validate against the full FIPS 197 + NIST CAVP AES KAT corpus
   already in arcana — bit-identical output to the table-based
   variant.
4. Run dudect (``T3-B``) on a Cortex-M target and confirm
   ``|t| < 4.5`` for fixed-vs-random key inputs.
5. Once stable, switch the default ``Aes128`` / ``Aes192`` /
   ``Aes256`` types to dispatch to the fixsliced backend on
   targets with a full 32-bit instruction set (Cortex-M3 and up,
   RISC-V RV32 and up); keep the table-based variant only as a
   fallback for Cortex-M0 (which also has 32-bit registers but
   offers fewer bit-manipulation instructions and is
   bandwidth-bound).
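
The pass criterion in step 4 is a bound on Welch's t statistic
between two classes of timing measurements. The sketch below shows
only that statistic and the ``|t| < 4.5`` check; it is not dudect
itself (which also handles measurement, cropping, and online
updates), and the function names are hypothetical. It assumes at
least two samples per class and a non-zero variance in at least one
class.

.. code-block:: rust

   /// Threshold taken from step 4 above.
   pub const T_THRESHOLD: f64 = 4.5;

   /// Sample mean and unbiased sample variance.
   fn mean_and_variance(xs: &[f64]) -> (f64, f64) {
       let n = xs.len() as f64;
       let mean = xs.iter().sum::<f64>() / n;
       let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
       (mean, var)
   }

   /// Welch's t statistic for two timing samples, e.g. cycle counts
   /// for the fixed-input class and the random-input class.
   pub fn welch_t(a: &[f64], b: &[f64]) -> f64 {
       let (mean_a, var_a) = mean_and_variance(a);
       let (mean_b, var_b) = mean_and_variance(b);
       let std_err = (var_a / a.len() as f64 + var_b / b.len() as f64).sqrt();
       (mean_a - mean_b) / std_err
   }

   /// True when no timing difference is detected at the threshold.
   pub fn passes(a: &[f64], b: &[f64]) -> bool {
       welch_t(a, b).abs() < T_THRESHOLD
   }

A passing result only means no leak was detected with the samples
collected so far; more measurements can still push ``|t|`` over the
threshold.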

GMAC GF(2^128) multiplier (``T2-H``)
====================================

The current arcana GHASH (``cipher::modes::gcm::gf128_mul``)
**must** be audited and probably rewritten. A naive shift-and-XOR
multiply over GF(2^128) leaks via the conditional XOR, and a
table-driven multiply leaks via secret-dependent table indices; the
standard fix is a constant-time carry-less software multiply
(``clmul`` emulation). On hosts with PCLMULQDQ (x86_64) or PMULL
(aarch64) a hardware backend is the right answer; on embedded
targets the bitsliced approach of
:cite:`kasper2009aes_gcm_bitsliced` is the reference.
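
As a minimal sketch of the branch-free idea, the bit-serial multiply
below replaces each conditional XOR with a mask derived from the
secret bit. It follows the GCM bit ordering (the most significant
bit of the big-endian block is the coefficient of x^0). The name
``gf128_mul_ct`` is hypothetical and this is not the arcana
implementation; it is far slower than a real carry-less multiply,
and a compiler is still free to turn the masks back into branches,
so the generated code would need checking on each target.

.. code-block:: rust

   /// Multiply two GHASH field elements, given as big-endian `u128`
   /// blocks, without secret-dependent branches or table lookups.
   pub fn gf128_mul_ct(x: u128, y: u128) -> u128 {
       // Reduction constant R = 11100001 || 0^120.
       const R: u128 = 0xe1 << 120;
       let mut x = x;
       let mut z = 0u128;
       let mut v = y;
       for _ in 0..128 {
           // Leftmost remaining bit of x, expanded to an all-0 or all-1 mask.
           let x_mask = (x >> 127).wrapping_neg();
           z ^= v & x_mask;
           x <<= 1;
           // Multiply v by x in the field; reduce when a bit falls off.
           let lsb_mask = (v & 1).wrapping_neg();
           v = (v >> 1) ^ (R & lsb_mask);
       }
       z
   }

Every iteration performs the same operations regardless of the
operand values, which is the property the audit of
``cipher::modes::gcm::gf128_mul`` needs to establish.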

DPA and template attacks — masked AES (``T2-G``)
================================================

Once ``T1-A`` lands and the round function operates on bitsliced
state, the DPA target shifts: there is no per-byte SubBytes
intermediate to model. However the bitsliced state is still
secret-dependent, so first-order DPA on the loaded round-key state
remains feasible.

The intended countermeasure is **first-order Boolean masking**:
each bit of the bitsliced state is split into two shares
``s = s0 ⊕ s1`` with ``s0 ← rng()`` and ``s1 = s ⊕ s0``. The linear
layer (ShiftRows, MixColumns) commutes with XOR, so it is applied
to each share independently. The S-box is the only non-linear
layer, and its AND gates have to combine both shares without ever
recombining the secret; the standard answer is the *masked AND
gate* of :cite:`trichina2003masked` (or higher-order TI masking
:cite:`bilgin2014threshold_aes` for a stronger threat model).
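
A minimal sketch of the share-wise XOR and a Trichina-style masked
AND on 32-bit bitsliced words follows. The ``Shared`` type and the
function names are hypothetical, and the fresh randomness is passed
in by the caller rather than drawn from a real RNG. The sketch only
demonstrates functional correctness: a compiler may reorder the
XORs, so a protected implementation would have to pin the
evaluation order (for example in assembly).

.. code-block:: rust

   /// A first-order Boolean-masked word: `value = s0 ^ s1`.
   #[derive(Clone, Copy)]
   pub struct Shared {
       pub s0: u32,
       pub s1: u32,
   }

   impl Shared {
       /// Split `value` using the fresh random word `fresh`.
       pub fn mask(value: u32, fresh: u32) -> Self {
           Shared { s0: fresh, s1: value ^ fresh }
       }

       /// Recombine the shares (only for testing or final output).
       pub fn unmask(self) -> u32 {
           self.s0 ^ self.s1
       }
   }

   /// Linear gate: XOR is applied to each share independently.
   pub fn masked_xor(a: Shared, b: Shared) -> Shared {
       Shared { s0: a.s0 ^ b.s0, s1: a.s1 ^ b.s1 }
   }

   /// Non-linear gate: AND with one fresh random word `r`. The
   /// partial products are accumulated onto `r` one at a time.
   pub fn masked_and(a: Shared, b: Shared, r: u32) -> Shared {
       let mut t = r ^ (a.s0 & b.s0);
       t ^= a.s0 & b.s1;
       t ^= a.s1 & b.s0;
       t ^= a.s1 & b.s1;
       Shared { s0: r, s1: t }
   }

Each masked AND consumes one fresh random word per 32 parallel
gates, so the randomness budget scales with the number of AND gates
in the bitsliced S-box circuit.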

Implementation hooks:

* The masked AES will live behind the same ``sca-protected``
  feature flag as quantica's masking layer (already present in the
  workspace), keeping the Cargo feature story consistent.
* Cost expectation: ~3 – 5 × the unmasked fixsliced AES per the
  literature.
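* Validation: dudect on a Cortex-M target + KAT regression. dudect
  only detects timing leakage, so it confirms that masking did not
  introduce secret-dependent timing; it does not measure the power
  leakage that masking is meant to remove.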

Outside the evaluation scope: AES-NI / VAES backend (``T5``)
============================================================

For host (x86_64 / aarch64) deployments arcana should eventually
expose an AES-NI / VAES backend. This is **not on the evaluation
critical path**: the target evaluation runs on embedded silicon
where AES-NI does not exist. It is purely a server-deployment
performance item and is tracked separately so it does not delay
the evaluation deliverable.

Code path summary
=================

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Path
     - Today (2026-04-21)
     - Target (post ``T1-A`` + ``T2-G``)
   * - ``cipher::aes::Aes128::encrypt_block``
     - Table-based S-box
     - Fixsliced bitslice (2 blocks parallel)
   * - ``cipher::aes::Aes128`` (masked variant)
     - n/a
     - First-order masked fixslice, behind ``sca-protected``
   * - ``cipher::modes::gcm::gf128_mul``
     - Audit pending
     - CT carry-less multiply or HW backend
   * - ``cipher::ccm::Ccm`` (CCM uses CBC-MAC)
     - Inherits AES table leak
     - Inherits fixsliced AES
   * - ``cipher::xts::AesXts`` (XTS for storage)
     - Inherits AES table leak
     - Inherits fixsliced AES