AES — countermeasures

Spec:: FIPS 197 [NationalIoSaTechnology01]
Crate path:: arcana::cipher::aes + arcana::cipher::modes + arcana::cipher::ccm + arcana::cipher::xts
Cargo feature:: none — AES is unconditionally compiled.

AES is the single largest open SCA gap on the arcana side. The crate currently ships a textbook table-based AES (T-tables / S-box LUTs), which is a known cache-timing and SPA target. Closing this gap is item T1-A and is on the evaluation critical path.

This chapter lists each threat applicable to AES, the state-of-the-art mitigations from the literature, and the planned arcana implementation route.

Coverage matrix 

AES countermeasure / threat matrix
Threat	Status	Countermeasure(s)
SPA / SEMA on key schedule + round function	vulnerable	Plan `T1-A`: replace the table-based implementation with fixsliced bitsliced AES ([AP21]).
Cache-timing on shared L1 / L2	vulnerable	Same plan `T1-A`. AES-NI / VAES backend is item `T5` (host-only, not on the evaluation critical path).
DPA / CPA on round-1 SubBytes	vulnerable	Plan `T2-G` (post-T1-A): first-order Boolean masking on top of fixsliced AES, leveraging the same masking schemes used in quantica’s ML-KEM/ML-DSA layer.
Template attacks (esp. ML-DPA)	vulnerable	Same plan `T2-G`. The ANSSI protected AES on ARM was broken end-to-end by deep-learning multi-task DPA in [MS23]; arcana will need first-order masking + shuffling at minimum to resist an evaluation-class lab.
DFA on last AES round	vulnerable	Plan `T4-AES-A` (deferred): redundancy + infective countermeasure ([BG15]).
GMAC GF(2^128) multiplier SCA	vulnerable	Plan `T2-H`: replace the table-driven GHASH multiplier with a constant-time carry-less multiply (PCLMULQDQ / PMULL on hosts, a bitsliced software fallback elsewhere).

SPA / cache-timing — Fixsliced AES (`T1-A`)

Principle of the attack 

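Table-based AES implementations leak through cache-line access patterns:

The S-box is a 256-byte LUT that spans 4 cache lines (assuming 64-byte lines). In the first round, SubBytes indexes the S-box with each of the 16 bytes p[i] ⊕ K[i] (plaintext XOR-ed with the first round key). Observing which cache line is accessed reveals the top 2 bits of p[i] ⊕ K[i], and hence, since p[i] is known, the top 2 bits of K[i].
Combined T-table implementations (which fold SubBytes, ShiftRows and MixColumns into 4 KiB of pre-computed tables) leak an even larger fraction of the round-1 state: each 1 KiB table of 4-byte entries spans 16 cache lines, so the accessed line reveals the top 4 bits of each index.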
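A minimal sketch of the leakage arithmetic in Rust, assuming 64-byte cache lines, tables aligned to a line boundary, and an attacker who only observes which line a first-round lookup touched (as with Flush+Reload). The helper names `cache_line`, `leaked_bits` and `key_high_bits` are illustrative, not arcana APIs.

```rust
/// Assumed cache-line size for this sketch.
const LINE_BYTES: usize = 64;

/// Cache line touched by `table[index]` for a 256-entry table whose entries
/// are `entry_bytes` wide and which starts on a line boundary.
fn cache_line(index: u8, entry_bytes: usize) -> usize {
    (index as usize * entry_bytes) / LINE_BYTES
}

/// Index bits an observer learns from the line number alone:
/// log2(number of lines the table spans).
fn leaked_bits(entry_bytes: usize) -> u32 {
    let lines = (256 * entry_bytes) / LINE_BYTES; // 4 for the S-box, 16 per T-table
    lines.trailing_zeros()
}

/// Round-1 recovery: the index is p ^ k and p is known, so the observed
/// line yields the top `leaked_bits` bits of the key byte k.
fn key_high_bits(p: u8, observed_line: usize, entry_bytes: usize) -> u8 {
    let shift = 8 - leaked_bits(entry_bytes);
    // observed_line == (p ^ k) >> shift == (p >> shift) ^ (k >> shift)
    (observed_line as u8) ^ (p >> shift)
}
```

Repeating this over many plaintexts does not reveal more than the top bits per byte in round 1; the classic attacks then use later-round lookups to recover the rest.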

Original references: [Ber05], [OST06]. Modern variants exploit Flush+Reload, Prime+Probe, and shared-LLC contention against co-resident attackers.

Countermeasure 

Fixsliced AES (Adomnicai-Peyrin, TCHES 2021/1, [AP21]) is the current state of the art for constant-time software AES on Cortex-M and RISC-V:

Bitslices 2 AES blocks at once into 8 32-bit registers (one bit position per register; 8 × 32 = 256 bits = 2 × 128). The S-box becomes a sequence of bitwise operations on registers, with no memory loads and no branches.
Unlike classical bitslicing, fixslicing omits ShiftRows in most rounds and compensates with round-specific variants of MixColumns, so each bit stays at a fixed register position; this eliminates the heavy inter-round shuffling that earlier bitsliced AES paid for ShiftRows.
Reported performance: 80 cycles/byte on Cortex-M and 91 cycles/byte on RISC-V (E31), 21 % / 26 % faster than the prior bitsliced records on those platforms.
RAM footprint: 4× smaller than classical bitsliced AES (round keys are smaller since the bit positions are fixed).
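To make the register layout concrete, here is a minimal Rust sketch that packs 2 blocks into 8 `u32` words, one bit position per word. It uses a naive transpose for readability; the fixsliced code of [AP21] uses a different bit ordering built from SWAPMOVE steps, so `pack`, `unpack` and the ordering here are illustrative assumptions. It also shows why AddRoundKey is cheap in this representation: it is just 8 register XORs.

```rust
/// Two 16-byte AES blocks = 32 bytes = 256 bits.
type TwoBlocks = [[u8; 16]; 2];

/// Naive bitslice: word j holds bit j of each of the 32 input bytes,
/// with byte b (block 0 bytes 0..16, block 1 bytes 16..32) at bit position b.
fn pack(blocks: &TwoBlocks) -> [u32; 8] {
    let mut words = [0u32; 8];
    for (b, &byte) in blocks.iter().flatten().enumerate() {
        for j in 0..8 {
            words[j] |= (((byte >> j) & 1) as u32) << b;
        }
    }
    words
}

/// Inverse of `pack`.
fn unpack(words: &[u32; 8]) -> TwoBlocks {
    let mut out = [[0u8; 16]; 2];
    for b in 0..32 {
        let mut byte = 0u8;
        for j in 0..8 {
            byte |= (((words[j] >> b) & 1) as u8) << j;
        }
        out[b / 16][b % 16] = byte;
    }
    out
}

/// AddRoundKey on the sliced state: 8 register XORs, no table lookups.
fn add_round_key(state: &mut [u32; 8], rk: &[u32; 8]) {
    for (s, k) in state.iter_mut().zip(rk) {
        *s ^= k;
    }
}
```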

Reference implementation: aadomn/aes on GitHub, MIT-licensed.

Implementation route in arcana 

Port the fixsliced AES from aadomn/aes to arcana::cipher::aes_bitsliced as a separate module, rewritten in pure Rust with no external crates (compatible with the workspace's zero-deps rule).
Behind a feature flag aes-fixsliced (off by default to keep the diff reviewable; promotion to default after KAT validation).
Validate against the full FIPS 197 + NIST CAVP AES KAT corpus already in arcana — bit-identical output to the table-based variant.
Run dudect (T3-B) on a Cortex-M target and confirm |t| < 4.5 for fixed-vs-random key inputs.
Once stable, switch the default Aes128 / Aes192 / Aes256 types to dispatch to the fixsliced backend on 32-bit targets with a rich enough instruction set (Cortex-M3 and up, i.e. ARMv7-M Thumb-2, and RISC-V RV32 and up); keep the table-based variant only as a fallback for Cortex-M0 (ARMv6-M), which has fewer bit-manipulation instructions and is bandwidth-bound.
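The dudect step reduces to Welch's t-test between timing samples of the fixed and random input classes. A minimal Rust sketch of the statistic and the |t| < 4.5 check, assuming the timings (e.g. cycle counts) have already been collected; it omits dudect's percentile cropping and online (Welford) updates, and `welch_t` / `looks_constant_time` are hypothetical helper names.

```rust
/// Welch's t statistic between two sets of timing samples.
fn welch_t(a: &[f64], b: &[f64]) -> f64 {
    fn mean_var(x: &[f64]) -> (f64, f64) {
        let n = x.len() as f64;
        let mean = x.iter().sum::<f64>() / n;
        // Unbiased sample variance (n - 1 denominator).
        let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
        (mean, var)
    }
    let (ma, va) = mean_var(a);
    let (mb, vb) = mean_var(b);
    (ma - mb) / (va / a.len() as f64 + vb / b.len() as f64).sqrt()
}

/// Threshold used in the plan above.
const T_THRESHOLD: f64 = 4.5;

/// True if no leak is detected at this sample size. If both classes have
/// zero variance, t is NaN and this returns false.
fn looks_constant_time(fixed: &[f64], random: &[f64]) -> bool {
    welch_t(fixed, random).abs() < T_THRESHOLD
}
```

A pass only means no leak was detected with the samples collected; the test is typically rerun with growing sample counts.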

GMAC GF(2^128) multiplier (`T2-H`)

The current arcana GHASH (cipher::modes::gcm::gf128_mul) must be audited and probably rewritten. Both common software multipliers leak: a table-driven multiply through secret-indexed table loads (the same cache channel as AES T-tables), and a naive shift-and-XOR multiply over GF(2^128) through the conditional XOR. The standard fix is a constant-time carry-less software multiply (clmul emulation). On hosts with PCLMULQDQ (x86_64) or PMULL (aarch64), a hardware backend is the right answer; on embedded targets, the bitsliced approach of [KS09] is the reference.
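A minimal Rust sketch of a branch-free GF(2^128) multiply, assuming GCM's bit order (bit 0 of the spec is the most significant bit of a big-endian `u128`). It is the bitwise algorithm of NIST SP 800-38D with each conditional XOR replaced by an all-ones/all-zero mask; it is neither a clmul emulation nor the bitsliced method of [KS09], and `gf128_mul_ct` / `ghash` are illustrative names, not the current arcana signature. Rust does not guarantee that the compiler keeps mask-based code branch-free, so a real implementation still needs dudect-style validation.

```rust
/// GCM reduction constant R = 11100001 || 0^120.
const R: u128 = 0xE1_u128 << 120;

/// Branch-free multiply in GF(2^128) with GCM's bit ordering.
fn gf128_mul_ct(x: u128, y: u128) -> u128 {
    let mut z = 0u128;
    let mut v = y;
    for i in 0..128 {
        // All-ones if bit x_i is set, else zero: no secret-dependent branch.
        let x_mask = ((x >> (127 - i)) & 1).wrapping_neg();
        z ^= v & x_mask;
        // Shift right; XOR in R only when the bit shifted out was 1.
        let r_mask = (v & 1).wrapping_neg();
        v = (v >> 1) ^ (R & r_mask);
    }
    z
}

/// GHASH over whole 16-byte blocks: X_i = (X_{i-1} ^ S_i) * H, X_0 = 0.
fn ghash(h: u128, blocks: &[u128]) -> u128 {
    blocks.iter().fold(0u128, |acc, &s| gf128_mul_ct(acc ^ s, h))
}
```

At 128 iterations per multiply this is slow; it illustrates the masking idea rather than a performance target.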

DPA and template attacks — masked AES (`T2-G`)

Once T1-A lands and the round function operates on bitsliced state, the DPA target shifts: there is no per-byte SubBytes table lookup to model. However, the bitsliced state is still key-dependent, so first-order DPA on the state right after the round-key addition remains feasible.

The intended countermeasure is first-order Boolean masking: each 32-bit word s of the bitsliced state is split into two shares s = s0 ⊕ s1, with s0 ← rng() and s1 = s ⊕ s0. The linear layers (ShiftRows, MixColumns, AddRoundKey) commute with XOR, so they can be applied to each share independently. The S-box is the only non-linear layer; the standard answer is the masked AND gate of [Tri03] (or higher-order threshold implementations, TI [BGN+14], for a stronger threat model).
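A minimal Rust sketch of the share arithmetic on 32-bit bitsliced words: XOR acts share by share, NOT flips a single share, and AND uses a Trichina-style gadget that consumes one fresh random word. The `Masked` type is hypothetical, and the `rnd` / `r` arguments stand in for the rng() calls above. Software masking is also exposed to transition leakage (register reuse) and to compiler reordering of the XOR chain, neither of which this sketch addresses.

```rust
/// Two Boolean shares of a bitsliced word: value = s0 ^ s1.
#[derive(Clone, Copy)]
struct Masked {
    s0: u32,
    s1: u32,
}

impl Masked {
    /// Split `value` with a fresh random mask `rnd`.
    fn new(value: u32, rnd: u32) -> Self {
        Masked { s0: rnd, s1: value ^ rnd }
    }

    fn unmask(self) -> u32 {
        self.s0 ^ self.s1
    }

    /// Linear: applied share-wise, no fresh randomness needed.
    fn xor(self, o: Masked) -> Masked {
        Masked { s0: self.s0 ^ o.s0, s1: self.s1 ^ o.s1 }
    }

    /// NOT flips exactly one share.
    fn not(self) -> Masked {
        Masked { s0: !self.s0, s1: self.s1 }
    }

    /// Trichina-style masked AND. Evaluated left to right starting from r,
    /// so every partial sum is masked by r; the result is (r, r ^ a&b).
    fn and(self, o: Masked, r: u32) -> Masked {
        let z = r ^ (self.s0 & o.s0);
        let z = z ^ (self.s0 & o.s1);
        let z = z ^ (self.s1 & o.s0);
        let z = z ^ (self.s1 & o.s1);
        Masked { s0: r, s1: z }
    }
}
```

Since a bitsliced S-box is a fixed circuit of XOR, NOT and AND gates, each AND gate gets its own fresh random word, which is where the masked variant's extra cost comes from.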

Implementation hooks:

The masked AES will live behind the same sca-protected feature flag as quantica’s masking layer (already present in the workspace), keeping the Cargo features story consistent.
Cost expectation: roughly 3–5× the cost of the unmasked fixsliced AES, per the literature.
Validation: dudect on Cortex-M target + KAT regression.

Outside the evaluation scope: AES-NI / VAES backend (`T5`)

For host (x86_64 / aarch64) deployments arcana should eventually expose an AES-NI / VAES backend. This is not on the evaluation critical path: the target evaluation runs on embedded silicon where AES-NI does not exist. It is purely a server-deployment performance item and is tracked separately so it does not delay the evaluation deliverable.
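When the host backends land, the choice could be made at runtime with std's `is_x86_feature_detected!` macro. A minimal sketch, assuming hypothetical `BlockBackend` / `GhashBackend` enums and `select` / `detect` helpers: the selection policy is a pure function so it can be tested, and non-x86 targets (including the embedded evaluation targets) always fall back to the constant-time software paths. aarch64 PMULL detection is omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockBackend {
    AesNi,
    Fixsliced,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GhashBackend {
    Pclmulqdq,
    SoftwareCt,
}

/// Pure policy: branching here is fine because CPU features are public.
fn select(has_aes: bool, has_pclmulqdq: bool) -> (BlockBackend, GhashBackend) {
    let block = if has_aes { BlockBackend::AesNi } else { BlockBackend::Fixsliced };
    let ghash = if has_pclmulqdq { GhashBackend::Pclmulqdq } else { GhashBackend::SoftwareCt };
    (block, ghash)
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detect() -> (BlockBackend, GhashBackend) {
    select(is_x86_feature_detected!("aes"), is_x86_feature_detected!("pclmulqdq"))
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detect() -> (BlockBackend, GhashBackend) {
    // No AES-NI here: always use the constant-time software paths.
    select(false, false)
}
```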

Code path summary 

Path	Today (2026-04-21)	Target (post `T1-A` + `T2-G`)
`cipher::aes::Aes128::encrypt_block`	Table-based S-box	Fixsliced bitslice (2 blocks in parallel on 32-bit targets)
`cipher::aes::Aes128` (masked variant)	n/a	First-order masked fixslice, behind `sca-protected`
`cipher::modes::gcm::gf128_mul`	Audit pending	CT carry-less multiply or HW backend
`cipher::ccm::Ccm` (CCM uses CBC-MAC)	Inherits AES table leak	Inherits fixsliced AES
`cipher::xts::AesXts` (XTS for storage)	Inherits AES table leak	Inherits fixsliced AES