AES — countermeasures

Spec: FIPS 197 [NationalIoSaTechnology01]

Crate path: arcana::cipher::aes + arcana::cipher::modes + arcana::cipher::ccm + arcana::cipher::xts

Cargo feature: none. AES is unconditionally compiled.

AES is the single largest open SCA gap on the arcana side. The crate currently ships a textbook table-based AES (T-tables / S-box LUTs), which is a known cache-timing and SPA target. Closing this gap is item T1-A and is on the evaluation critical path.

This chapter lists each threat applicable to AES, the state-of-the-art mitigations from the literature, and the planned arcana implementation route.

Coverage matrix

AES countermeasure / threat matrix

| Threat | Status | Countermeasure(s) |
|---|---|---|
| SPA / SEMA on key schedule + round function | vulnerable | Plan T1-A: replace table-based with fixsliced bitslice ([AP21]). |
| Cache-timing on shared L1 / L2 | vulnerable | Same plan T1-A. AES-NI / VAES backend is item T5 (host-only, not on the evaluation critical path). |
| DPA / CPA on round-1 SubBytes | vulnerable | Plan T2-G (post-T1-A): first-order Boolean masking on top of fixsliced AES, leveraging the same masking schemes used in quantica’s ML-KEM/ML-DSA layer. |
| Template attacks (esp. ML-DPA) | vulnerable | Same plan T2-G. The ANSSI protected AES on ARM was broken end-to-end by deep-learning multi-task DPA in [MS23]; arcana will need first-order masking + shuffling at minimum to resist an evaluation-class lab. |
| DFA on last AES round | vulnerable | Plan T4-AES-A (deferred): redundancy + infective countermeasure ([BG15]). |
| GMAC GF(2^128) multiplier SCA | vulnerable | Plan T2-H: replace the table-driven GHASH multiplier with a CT carry-less multiply (or PCLMULQDQ / PMULL on hosts; software fallback bitsliced). |

SPA / cache-timing — Fixsliced AES (T1-A)

Principle of the attack

AES table-based implementations leak through cache-line access patterns:

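  • The S-box is a 256-byte LUT that spans 4 cache lines (64-byte lines). The first round of AES indexes the table with each of the 16 input bytes XOR-ed with the corresponding round-key byte; observing which cache line is accessed reveals the top two bits of each byte ^ K[i] (assuming a line-aligned table).

  • Combined T-table implementations (which fold SubBytes, ShiftRows and MixColumns into 4 KiB of pre-computed tables) leak an even larger fraction of the round-1 state: with 4-byte entries, each 64-byte line holds only 16 entries, so the accessed line reveals the top four bits of each index.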
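The leakage described in the two bullets above can be made concrete with a small model. This is a minimal sketch, assuming 64-byte cache lines and line-aligned tables; the function names are hypothetical and not part of arcana. It shows how an attacker who knows a plaintext byte and observes the touched cache line recovers the high bits of the matching key byte.

```rust
/// Cache-line size in bytes (assumption: 64-byte lines, line-aligned tables).
const LINE_BYTES: usize = 64;

/// Index of the cache line touched when reading entry `index` of a table
/// whose entries are `entry_bytes` wide (1 for an S-box, 4 for a T-table).
fn cache_line(index: u8, entry_bytes: usize) -> usize {
    (index as usize * entry_bytes) / LINE_BYTES
}

/// Round-1 lookup index for one state byte: plaintext byte XOR key byte.
fn round1_index(p: u8, k: u8) -> u8 {
    p ^ k
}

/// What an attacker who knows `p` and observes the touched line learns
/// about the key byte: (high bits of k, number of bits recovered).
fn leaked_key_bits(p: u8, observed_line: usize, entry_bytes: usize) -> (u8, u32) {
    let entries_per_line = LINE_BYTES / entry_bytes; // 64 (S-box) or 16 (T-table)
    let low_bits = entries_per_line.trailing_zeros(); // bits hidden inside one line
    let high_bits = 8 - low_bits;
    // The line number equals the high bits of the lookup index p ^ k,
    // so XOR-ing out the high bits of p leaves the high bits of k.
    let key_high = (observed_line as u8) ^ (p >> low_bits);
    (key_high, high_bits)
}
```

With p = 0x3c and k = 0xa7, the S-box model recovers k >> 6 (2 bits) and the T-table model recovers k >> 4 (4 bits). Real attacks need many observations and noise filtering; the sketch only captures the information-theoretic part.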

Original references: [Ber05], [OST06]. Modern variants exploit Flush+Reload, Prime+Probe, and shared-LLC contention against co-resident attackers.

Countermeasure

Fixsliced bitslice AES (Adomnicai-Peyrin TCHES 2021/1, [AP21]) is the current SOTA for constant-time AES on Cortex-M and RISC-V:

  • Bit-slices 2 blocks of AES at once into 8 32-bit registers (2 × 128 = 256 bits, one bit position of every state byte per register). The S-box becomes a sequence of bitwise operations on registers, with no memory loads and no branches.

  • Unlike classical bitslicing, fixslicing keeps each bit at a fixed register position across rounds, eliminating the heavy inter-round shuffling that earlier bitsliced AES paid for ShiftRows.

  • Reported performance: 80 cycles/byte on Cortex-M and 91 cycles/byte on RISC-V (E31), respectively 21% and 26% faster than the prior bitsliced records on those platforms.

  • RAM footprint: 4× smaller than classical bitslicing (round keys are smaller since the bit positions are fixed).
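To illustrate what "a sequence of bitwise operations on registers" means in practice, here is a minimal sketch of a bitsliced GF(2^8) doubling (the xtime step used by MixColumns) applied to 32 bytes at once. It assumes a plain "bit i of every byte lives in word i" layout, which is simpler than the actual fixsliced layout of [AP21]; all function names are hypothetical.

```rust
/// Pack 32 bytes (two AES blocks) into 8 words: bit j of word i is bit i of
/// byte j. Illustrative layout only; [AP21] orders the bits differently.
fn pack(bytes: &[u8; 32]) -> [u32; 8] {
    let mut r = [0u32; 8];
    for (j, &b) in bytes.iter().enumerate() {
        for i in 0..8 {
            r[i] |= (((b >> i) & 1) as u32) << j;
        }
    }
    r
}

/// Inverse of `pack`.
fn unpack(r: &[u32; 8]) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for j in 0..32 {
        for i in 0..8 {
            bytes[j] |= (((r[i] >> j) & 1) as u8) << i;
        }
    }
    bytes
}

/// GF(2^8) doubling on all 32 bytes at once: shift every bit plane up by one
/// and fold the carried-out plane back in at bits 0, 1, 3, 4 (0x1b).
/// No table lookups, no secret-dependent branches.
fn xtime_bitsliced(r: &[u32; 8]) -> [u32; 8] {
    let c = r[7]; // carry plane: top bit of each byte
    [c, r[0] ^ c, r[1], r[2] ^ c, r[3] ^ c, r[4], r[5], r[6]]
}

/// Scalar, branchless reference used only to cross-check the bitsliced form.
fn xtime_scalar(b: u8) -> u8 {
    (b << 1) ^ (0x1b & 0u8.wrapping_sub(b >> 7))
}
```

The S-box is handled the same way in a real implementation, as a fixed Boolean circuit over the eight bit planes, but that circuit is too long to reproduce here.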

Reference implementation: aadomn/aes on GitHub, MIT-licensed.

Implementation route in arcana

  1. Port the MIT-licensed fixsliced AES from aadomn/aes to arcana::cipher::aes_bitsliced as a separate module. Pure Rust, no external crates (compatible with the workspace’s zero-deps rule).

  2. Gate it behind a feature flag aes-fixsliced, off by default to keep the diff reviewable; promote it to default after KAT validation.

  3. Validate against the full FIPS 197 + NIST CAVP AES KAT corpus already in arcana; the output must be bit-identical to the table-based variant.

  4. Run dudect (T3-B) on a Cortex-M target and confirm |t| < 4.5 for fixed-vs-random key inputs.

  5. Once stable, switch the default Aes128 / Aes192 / Aes256 types to dispatch to the fixsliced backend on targets where 32-bit registers are present (Cortex-M3 and up, RISC-V RV32 and up); keep the table-based variant only as a fallback for Cortex-M0 (which has fewer bit-manipulation instructions and is bandwidth-bound).
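The |t| < 4.5 criterion in step 4 refers to Welch's t-statistic between two classes of timing measurements. The following is a minimal sketch of that statistic, assuming cycle counts have already been collected into two slices; the function names are hypothetical, and the real dudect tool additionally uses online updates and percentile cropping that are not shown here.

```rust
/// Welch's t-statistic between two timing samples (e.g. cycle counts).
/// Returns None when the test is inconclusive: fewer than two measurements
/// in a class, or zero variance in both classes.
fn welch_t(class_a: &[f64], class_b: &[f64]) -> Option<f64> {
    // Sample mean and unbiased sample variance.
    fn mean_var(x: &[f64]) -> (f64, f64) {
        let n = x.len() as f64;
        let mean = x.iter().sum::<f64>() / n;
        let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
        (mean, var)
    }
    if class_a.len() < 2 || class_b.len() < 2 {
        return None;
    }
    let (ma, va) = mean_var(class_a);
    let (mb, vb) = mean_var(class_b);
    let denom = (va / class_a.len() as f64 + vb / class_b.len() as f64).sqrt();
    if denom == 0.0 {
        return None;
    }
    Some((ma - mb) / denom)
}

/// Threshold quoted in the implementation route above.
const T_THRESHOLD: f64 = 4.5;

/// True when no timing difference is detected at the given threshold.
/// An inconclusive test is reported as a failure, to stay conservative.
fn looks_constant_time(fixed: &[f64], random: &[f64]) -> bool {
    matches!(welch_t(fixed, random), Some(t) if t.abs() < T_THRESHOLD)
}
```

A passing result only means no leak was detected with the measurements taken; it is not a proof of constant-time behaviour, so the number of measurements should be reported alongside the t-value.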

GMAC GF(2^128) multiplier (T2-H)

The current arcana GHASH (cipher::modes::gcm::gf128_mul) must be audited and probably rewritten. A naive shift-and-XOR multiply over GF(2^128) leaks via the conditional XOR; the standard fix is a constant-time carry-less software multiply (clmul emulation). On hosts with PCLMULQDQ (x86_64) or PMULL (aarch64) a hardware backend is the right answer; on embedded targets the bitsliced approach of [KS09] is the reference.
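As an illustration of removing the conditional XOR, here is a minimal sketch of a branch-free shift-and-XOR multiply following Algorithm 1 of NIST SP 800-38D, with both data-dependent choices replaced by masks. It assumes each 16-byte block is loaded with u128::from_be_bytes, so the most significant bit is the coefficient of x^0; the name gf128_mul_ct is hypothetical and is not the existing arcana function.

```rust
/// Reduction constant R = 11100001 || 0^120 (NIST SP 800-38D).
const R: u128 = 0xe1 << 120;

/// Branch-free GF(2^128) multiply in the GCM bit convention.
/// Blocks are assumed to be loaded with `u128::from_be_bytes`.
fn gf128_mul_ct(x: u128, y: u128) -> u128 {
    let mut z = 0u128;
    let mut v = y;
    for i in 0..128 {
        // Bit i of x, counted from the most significant end.
        let x_bit = (x >> (127 - i)) & 1;
        // Mask is all-ones when the bit is set, all-zeros otherwise,
        // so the XOR is always executed and never branched on.
        z ^= v & x_bit.wrapping_neg();
        // Multiply v by x: shift right, reduce if a bit fell off the end.
        let lsb = v & 1;
        v = (v >> 1) ^ (R & lsb.wrapping_neg());
    }
    z
}
```

This form runs a fixed 128 iterations with no secret-dependent branches or memory indices at the source level. Whether the compiler preserves that property on a given target is not guaranteed, so the generated code would still need inspection or a dudect run; it is also much slower than a carry-less multiply based on PCLMULQDQ or PMULL.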

DPA and template attacks — masked AES (T2-G)

Once T1-A lands and the round function operates on bitsliced state, the DPA target shifts: there is no per-byte SubBytes intermediate to model. However the bitsliced state is still secret-dependent, so first-order DPA on the loaded round-key state remains feasible.

The intended countermeasure is first-order Boolean masking: each bit of the bitsliced state is split into two shares, s = s0 ⊕ s1, with s0 ← rng() and s1 = s ⊕ s0. The linear layer (ShiftRows, MixColumns) commutes with XOR, so it is applied to each share independently. The S-box is the only non-linear layer; the standard answer is the masked AND gate of [Tri03] (or higher-order TI masking [BGN+14] for a stronger threat model).
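The share arithmetic can be sketched as follows. This is a minimal illustration, assuming 32-bit bitsliced words and caller-supplied randomness; the Masked type and function names are hypothetical, and the AND gate is written in the style of [Tri03] rather than copied from any arcana or quantica code.

```rust
/// Two-share Boolean masking of one 32-bit bitsliced word: secret = s0 ^ s1.
#[derive(Clone, Copy)]
struct Masked {
    s0: u32,
    s1: u32,
}

impl Masked {
    /// Split `secret` using a fresh random word supplied by the caller.
    fn mask(secret: u32, fresh: u32) -> Self {
        Masked { s0: fresh, s1: secret ^ fresh }
    }

    /// Recombine the shares (only at the very end of the computation).
    fn unmask(self) -> u32 {
        self.s0 ^ self.s1
    }
}

/// Linear operation: XOR acts on each share independently.
fn masked_xor(a: Masked, b: Masked) -> Masked {
    Masked { s0: a.s0 ^ b.s0, s1: a.s1 ^ b.s1 }
}

/// Non-linear operation: masked AND with one fresh random word `r`.
/// The partial products are accumulated onto `r` one at a time so that no
/// intermediate value depends on both shares of the same input unmasked.
fn masked_and(a: Masked, b: Masked, r: u32) -> Masked {
    let mut t = r ^ (a.s0 & b.s0);
    t ^= a.s0 & b.s1;
    t ^= a.s1 & b.s0;
    t ^= a.s1 & b.s1;
    Masked { s0: r, s1: t }
}
```

The evaluation order inside masked_and matters for security, and an optimizing compiler is free to reassociate the XORs. A production version would therefore likely need assembly or explicit optimization barriers, plus leakage testing on the target, which this sketch does not provide.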

Implementation hooks:

  • The masked AES will live behind the same sca-protected feature flag as quantica’s masking layer (already present in the workspace), keeping the Cargo features story consistent.

  • Cost expectation: roughly 3 to 5× the unmasked fixsliced AES per the literature.

  • Validation: dudect on Cortex-M target + KAT regression.

Outside the evaluation scope: AES-NI / VAES backend (T5)

For host (x86_64 / aarch64) deployments arcana should eventually expose an AES-NI / VAES backend. This is not on the evaluation critical path: the target evaluation runs on embedded silicon where AES-NI does not exist. It is purely a server-deployment performance item and is tracked separately so it does not delay the evaluation deliverable.

Code path summary

| Path | Today (2026-04-21) | Target (post T1-A + T2-G) |
|---|---|---|
| cipher::aes::Aes128::encrypt_block | Table-based S-box | Fixsliced bitslice (2 blocks in parallel on 32-bit targets) |
| cipher::aes::Aes128 (masked variant) | n/a | First-order masked fixslice, behind sca-protected |
| cipher::modes::gcm::gf128_mul | Audit pending | CT carry-less multiply or HW backend |
| cipher::ccm::Ccm (CCM uses CBC-MAC) | Inherits AES table leak | Inherits fixsliced AES |
| cipher::xts::AesXts (XTS for storage) | Inherits AES table leak | Inherits fixsliced AES |