###################################################################
AES — countermeasures
###################################################################

:Spec: FIPS 197 :cite:`fips197`
:Crate path: ``arcana::cipher::aes`` + ``arcana::cipher::modes`` +
   ``arcana::cipher::ccm`` + ``arcana::cipher::xts``
:Cargo feature: none; AES is unconditionally compiled.

AES is the **single largest open SCA gap** on the arcana side. The crate
currently ships a textbook table-based AES (T-tables / S-box LUTs), which is
a known cache-timing and SPA target. Closing this gap is item ``T1-A`` and is
on the evaluation critical path.

This chapter lists each threat applicable to AES, the state-of-the-art
mitigations from the literature, and the planned arcana implementation route.

.. contents::
   :local:
   :depth: 2

Coverage matrix
===============

.. list-table:: AES countermeasure / threat matrix
   :header-rows: 1
   :widths: 25 18 57

   * - Threat
     - Status
     - Countermeasure(s)
   * - SPA / SEMA on key schedule + round function
     - **vulnerable**
     - Plan ``T1-A``: replace the table-based implementation with fixsliced
       bitsliced AES (:cite:`adomnicai2021_fixslicing_aes`).
   * - Cache-timing on shared L1 / L2
     - **vulnerable**
     - Same plan ``T1-A``. The AES-NI / VAES backend is item ``T5``
       (host-only, not on the evaluation critical path).
   * - DPA / CPA on round-1 SubBytes
     - **vulnerable**
     - Plan ``T2-G`` (post-``T1-A``): first-order Boolean masking on top of
       fixsliced AES, leveraging the same masking schemes used in quantica's
       ML-KEM/ML-DSA layer.
   * - Template attacks (esp. ML-DPA)
     - **vulnerable**
     - Same plan ``T2-G``. The ANSSI protected AES on ARM was broken
       end-to-end by deep-learning multi-task DPA in
       :cite:`anssi2023_aes_ml_dpa`; arcana will need first-order masking +
       shuffling at minimum to resist an evaluation-class lab.
   * - DFA on last AES round
     - **vulnerable**
     - Plan ``T4-AES-A`` (deferred): redundancy + infective countermeasure
       (:cite:`battistello2015fault_aes`).
   * - GMAC GF(2^128) multiplier SCA
     - **vulnerable**
     - Plan ``T2-H``: replace the table-driven GHASH multiplier with a CT
       carry-less multiply (PCLMULQDQ / PMULL on hosts; bitsliced software
       fallback elsewhere).

SPA / cache-timing — Fixsliced AES (``T1-A``)
=============================================

Principle of the attack
-----------------------

AES table-based implementations leak through cache-line access patterns:

* The S-box is a 256-byte LUT that fits in 4 cache lines (64-byte lines).
  The first round of AES indexes it with the 16 input bytes XOR-ed with the
  round key; observing which cache lines are accessed reveals the high bits
  of each ``byte ^ K[i]`` (the top two bits for a line-aligned table).
* Combined T-table implementations (which fold SubBytes and MixColumns into
  4 KiB of pre-computed tables, with ShiftRows absorbed into the indexing)
  leak an even larger fraction of the round-1 state: with 4-byte entries,
  only 16 entries fit in a 64-byte line, so each access reveals the top four
  bits of its index.

Original references: :cite:`bernstein2005_aes_cache_timing`,
:cite:`osvik2006cache_aes`. Modern variants exploit Flush+Reload,
Prime+Probe, and shared-LLC contention against co-resident attackers.

Countermeasure
--------------

**Fixsliced bitslice AES** (Adomnicai-Peyrin TCHES 2021/1,
:cite:`adomnicai2021_fixslicing_aes`) is the current SOTA for constant-time
AES on Cortex-M and RISC-V:

* Bit-slices 2 blocks of AES at once into eight 32-bit registers (one bit
  position per register, 32 state bytes per register). The S-box becomes a
  sequence of bitwise operations on registers: no memory loads, no branches.
* Unlike classical bitslicing, fixslicing keeps each bit at a fixed register
  position across rounds, eliminating the heavy inter-round shuffling that
  earlier bitsliced AES paid for ShiftRows.
* Reported performance: **80 cycles/byte on Cortex-M**, **91 cycles/byte on
  RISC-V** (E31), 21 % / 26 % faster than the prior bitsliced records on
  those platforms.
* RAM footprint: 4× smaller than classical bitslice (round keys are smaller
  since the bit positions are fixed).

Reference implementation: ``aadomn/aes`` on GitHub, MIT-licensed.

Implementation route in arcana
------------------------------

1. Port the MIT-licensed fixsliced AES from ``aadomn/aes`` to
   ``arcana::cipher::aes_bitsliced`` as a separate module. Pure Rust, no
   external crates (compatible with the workspace's zero-deps rule).
2. Gate it behind a feature flag ``aes-fixsliced``, off by default to keep
   the diff reviewable; promote it to default after KAT validation.
3. Validate against the full FIPS 197 + NIST CAVP AES KAT corpus already in
   arcana, requiring bit-identical output to the table-based variant.
4. Run dudect (``T3-B``) on a Cortex-M target and confirm ``|t| < 4.5`` for
   fixed-vs-random key inputs.
5. Once stable, switch the default ``Aes128`` / ``Aes192`` / ``Aes256``
   types to dispatch to the fixsliced backend on 32-bit targets with a rich
   enough instruction set (Cortex-M3 and up, RISC-V RV32 and up); keep the
   table-based variant only as a fallback for Cortex-M0 (which has fewer
   bit-manipulation instructions and is bandwidth-bound).

GMAC GF(2^128) multiplier (``T2-H``)
====================================

The current arcana GHASH (``cipher::modes::gcm::gf128_mul``) **must** be
audited and probably rewritten. A table-driven multiplier leaks through
secret-indexed loads, and a naive shift-and-XOR multiply over GF(2^128) leaks
via the conditional XOR; the standard fix is a constant-time carry-less
software multiply (``clmul`` emulation). On hosts with PCLMULQDQ (x86_64) or
PMULL (aarch64) a hardware backend is the right answer; on embedded targets
the bitsliced approach of :cite:`kasper2009aes_gcm_bitsliced` is the
reference.

DPA and template attacks — masked AES (``T2-G``)
================================================

Once ``T1-A`` lands and the round function operates on bitsliced state, the
DPA target shifts: there is no per-byte SubBytes intermediate to model.
However, the bitsliced state is still secret-dependent, so first-order DPA on
the loaded round-key state remains feasible.
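
To make the term *bitsliced state* concrete, the sketch below packs two
blocks (32 bytes) into eight 32-bit bit-plane words and applies AddRoundKey
as eight word XORs. It is a minimal illustration, not the arcana
implementation: the function names are hypothetical, and the naive
byte-to-bit ordering used here is an assumption that differs from the
register layout of the fixsliced reference code.

.. code-block:: rust

   /// Pack 32 state bytes (two AES blocks) into eight 32-bit bit-planes.
   /// Bit `j` of `planes[b]` is bit `b` of `bytes[j]`.
   /// ASSUMPTION: illustrative ordering only, not the fixsliced layout.
   fn pack_bitplanes(bytes: &[u8; 32]) -> [u32; 8] {
       let mut planes = [0u32; 8];
       for (j, &byte) in bytes.iter().enumerate() {
           for (b, plane) in planes.iter_mut().enumerate() {
               // Fixed loop bounds and no branches on the data.
               *plane |= (((byte >> b) & 1) as u32) << j;
           }
       }
       planes
   }

   /// Inverse of `pack_bitplanes`.
   fn unpack_bitplanes(planes: &[u32; 8]) -> [u8; 32] {
       let mut bytes = [0u8; 32];
       for (j, byte) in bytes.iter_mut().enumerate() {
           for (b, &plane) in planes.iter().enumerate() {
               *byte |= (((plane >> j) & 1) as u8) << b;
           }
       }
       bytes
   }

   /// AddRoundKey in the bit-plane domain: one XOR per plane, no table loads.
   fn add_round_key(state: &mut [u32; 8], round_key: &[u32; 8]) {
       for (s, k) in state.iter_mut().zip(round_key.iter()) {
           *s ^= *k;
       }
   }

Every word produced by ``add_round_key`` mixes one bit position from 32
different state bytes, which is why a per-byte SubBytes model no longer
applies, yet each word is still a function of secret key bits.
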

The intended countermeasure is **first-order Boolean masking**: each bit of
the bitsliced state is split into two shares ``s = s0 ⊕ s1`` with
``s0 ← rng()`` and ``s1 = s ⊕ s0``. The linear layer (ShiftRows, MixColumns)
commutes with XOR, so it is applied to each share independently. The S-box
is the only non-linear layer and needs gates that combine shares without
ever exposing the unmasked value; the standard answer is the *masked AND
gate* of :cite:`trichina2003masked` (or higher-order TI masking
:cite:`bilgin2014threshold_aes` for a stronger threat model).

Implementation hooks:

* The masked AES will live behind the same ``sca-protected`` feature flag as
  quantica's masking layer (already present in the workspace), keeping the
  Cargo features story consistent.
* Cost expectation: roughly 3-5× the cost of the unmasked fixsliced AES, per
  the literature.
* Validation: dudect on a Cortex-M target + KAT regression.

Outside the evaluation scope: AES-NI / VAES backend (``T5``)
============================================================

For host (x86_64 / aarch64) deployments arcana should eventually expose an
AES-NI / VAES backend. This is **not on the evaluation critical path**: the
target evaluation runs on embedded silicon where AES-NI does not exist. It is
purely a server-deployment performance item and is tracked separately so it
does not delay the evaluation deliverable.

Code path summary
=================

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Path
     - Today (2026-04-21)
     - Target (post ``T1-A`` + ``T2-G``)
   * - ``cipher::aes::Aes128::encrypt_block``
     - Table-based S-box
     - Fixsliced bitslice (2 blocks in parallel on 32-bit targets)
   * - ``cipher::aes::Aes128`` (masked variant)
     - n/a
     - First-order masked fixslice, behind ``sca-protected``
   * - ``cipher::modes::gcm::gf128_mul``
     - Audit pending
     - CT carry-less multiply or HW backend
   * - ``cipher::ccm::Ccm`` (CCM uses CBC-MAC)
     - Inherits AES table leak
     - Inherits fixsliced AES
   * - ``cipher::xts::AesXts`` (XTS for storage)
     - Inherits AES table leak
     - Inherits fixsliced AES
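
For the masked variant planned under ``T2-G``, the share arithmetic on
32-bit bitsliced words can be sketched as follows. This is a minimal sketch
under stated assumptions: the ``Shared`` type and the function names are
hypothetical, the randomness is passed in by the caller instead of being
drawn from a real RNG, and the AND gate follows a Trichina-style
accumulation order. Plain Rust gives no guarantee that the compiler
preserves that order, so a hardened version would likely need assembly or
compiler barriers.

.. code-block:: rust

   /// A Boolean-shared 32-bit word: the protected value is `s0 ^ s1`.
   /// HYPOTHETICAL type, used only for this illustration.
   #[derive(Clone, Copy)]
   struct Shared {
       s0: u32,
       s1: u32,
   }

   /// Split `value` into two shares using the caller-supplied random word `r`.
   fn mask(value: u32, r: u32) -> Shared {
       Shared { s0: r, s1: value ^ r }
   }

   /// Recombine the shares (only done at the very end of the computation).
   fn unmask(x: Shared) -> u32 {
       x.s0 ^ x.s1
   }

   /// Linear operations act on each share independently.
   fn masked_xor(x: Shared, y: Shared) -> Shared {
       Shared { s0: x.s0 ^ y.s0, s1: x.s1 ^ y.s1 }
   }

   /// Masked AND with one fresh random word `r`.
   /// The partial products are folded into `r` one at a time, so the
   /// intended evaluation order never forms the unmasked product `x & y`.
   fn masked_and(x: Shared, y: Shared, r: u32) -> Shared {
       let mut t = r ^ (x.s0 & y.s0);
       t ^= x.s0 & y.s1;
       t ^= x.s1 & y.s0;
       t ^= x.s1 & y.s1;
       Shared { s0: r, s1: t }
   }

Recombining the output shares gives ``r ^ r ^ (x.s0 ^ x.s1) & (y.s0 ^ y.s1)``,
that is, the AND of the two protected values, while each gate consumes one
fresh random word. That per-gate randomness and the four partial products are
the main source of the cost overhead quoted for the masked variant.
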