AES — countermeasures
- Spec: FIPS 197 [NationalIoSaTechnology01]
- Crate path: arcana::cipher::aes + arcana::cipher::modes + arcana::cipher::ccm + arcana::cipher::xts
- Cargo feature: none (AES is unconditionally compiled).
AES is the single largest open SCA gap on the arcana side. The
crate currently ships a textbook table-based AES (T-tables / S-box
LUTs), which is a known cache-timing and SPA target. Closing this
gap is item T1-A and is on the evaluation critical path.
This chapter lists each threat applicable to AES, the state-of-the-art mitigations from the literature, and the planned arcana implementation route.
Coverage matrix
| Threat | Status | Countermeasure(s) |
|---|---|---|
| SPA / SEMA on key schedule + round function | vulnerable | Plan |
| Cache-timing on shared L1 / L2 | vulnerable | Same plan |
| DPA / CPA on round-1 SubBytes | vulnerable | Plan |
| Template attacks (esp. ML-DPA) | vulnerable | Same plan |
| DFA on last AES round | vulnerable | Plan |
| GMAC GF(2^128) multiplier SCA | vulnerable | Plan |
SPA / cache-timing — Fixsliced AES (T1-A)
Principle of the attack
AES table-based implementations leak through cache-line access patterns:
- The S-box is a 256-byte LUT that fits in 4 cache lines (64-byte lines). The first round of AES indexes the table with the 16 input bytes XOR-ed with the round key; observing which cache lines are accessed reveals the high bits of each byte ^ K[i].
- Combined T-table implementations (which fold SubBytes, ShiftRows and MixColumns into 4 KiB of pre-computed tables) leak an even larger fraction of the round-1 state.
Original references: [Ber05], [OST06]. Modern variants use Flush+Reload, Prime+Probe, and shared-LLC contention, mounted by co-resident attackers.
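The leak described above can be made concrete with a small sketch. It assumes a 256-byte S-box aligned to a 64-byte cache-line boundary and an attacker who knows the plaintext byte and observes which line was touched; the function names are hypothetical and are not part of arcana.

```rust
/// Cache-line size in bytes assumed by the text (64-byte lines).
const CACHE_LINE: usize = 64;

/// Index of the cache line touched by a round-1 lookup `SBOX[p ^ k]`,
/// assuming a 256-byte table aligned to a cache-line boundary.
fn sbox_cache_line(plaintext_byte: u8, key_byte: u8) -> usize {
    (plaintext_byte ^ key_byte) as usize / CACHE_LINE
}

/// Key-byte candidates consistent with one observation:
/// a known plaintext byte and the cache line seen by the attacker.
fn key_candidates(plaintext_byte: u8, observed_line: usize) -> Vec<u8> {
    (0..=255u8)
        .filter(|&k| sbox_cache_line(plaintext_byte, k) == observed_line)
        .collect()
}
```

Under these assumptions a single observation narrows each key byte from 256 to 64 candidates, which is exactly the top 2 bits of the byte. With 4-byte T-table entries only 16 entries share a line, so the same observation would reveal the top 4 bits.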
Countermeasure
Fixsliced bitslice AES (Adomnicai and Peyrin, TCHES 2021/1, [AP21]) is the current state of the art for constant-time AES on Cortex-M and RISC-V:
- On 32-bit targets it bit-slices 2 blocks of AES at once into 8 32-bit registers (one bit position per register; 2 blocks × 128 bits = 8 × 32 bits). The S-box becomes a sequence of bitwise operations on registers, with no table loads and no branches.
- Unlike classical bitslicing, fixslicing keeps each bit at a fixed register position across rounds, eliminating the heavy inter-round shuffling that earlier bitsliced AES paid for ShiftRows.
- Reported performance: 80 cycles/byte on Cortex-M and 91 cycles/byte on RISC-V (E31), respectively 21 % and 26 % faster than the prior bitsliced records on those platforms.
- RAM footprint: 4 × smaller than classical bitslice (round keys are smaller since the bit positions are fixed).
- Reference implementation: aadomn/aes on GitHub, MIT-licensed.
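To illustrate what "one bit position per register" means, here is a minimal sketch of a plain bitsliced layout for two blocks on a 32-bit target. The bit ordering is a naive one chosen for readability and is an assumption; it is not the fixsliced layout of [AP21], and the function names are hypothetical.

```rust
/// Pack two 16-byte blocks into eight 32-bit words: word `b` holds bit `b`
/// of each of the 32 input bytes (byte `j` lands in bit `j` of the word).
fn bitslice(blocks: &[[u8; 16]; 2]) -> [u32; 8] {
    let mut out = [0u32; 8];
    for (j, &byte) in blocks.iter().flatten().enumerate() {
        for b in 0..8 {
            out[b] |= (((byte >> b) & 1) as u32) << j;
        }
    }
    out
}

/// Inverse of `bitslice`: rebuild the two blocks from the eight words.
fn unbitslice(words: &[u32; 8]) -> [[u8; 16]; 2] {
    let mut out = [[0u8; 16]; 2];
    for j in 0..32 {
        let mut byte = 0u8;
        for b in 0..8 {
            byte |= (((words[b] >> j) & 1) as u8) << b;
        }
        out[j / 16][j % 16] = byte;
    }
    out
}

/// AddRoundKey on bitsliced state: eight word-wide XORs cover all 32 bytes,
/// with no table lookup and no data-dependent branch.
fn add_round_key(state: &mut [u32; 8], round_key: &[u32; 8]) {
    for (s, k) in state.iter_mut().zip(round_key) {
        *s ^= *k;
    }
}
```

Every round operation is then expressed as word-wide bitwise logic on these eight words; the packing and unpacking loops above only touch public loop indices.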
Implementation route in arcana
- Port the fixsliced AES reference implementation from aadomn/aes to arcana::cipher::aes_bitsliced as a separate module. Pure Rust, no external crates (compatible with the workspace’s zero-deps rule).
- Behind a feature flag aes-fixsliced (off by default to keep the diff reviewable; promotion to default after KAT validation).
- Validate against the full FIPS 197 + NIST CAVP AES KAT corpus already in arcana: output must be bit-identical to the table-based variant.
- Run dudect (T3-B) on a Cortex-M target and confirm |t| < 4.5 for fixed-vs-random key inputs.
- Once stable, switch the default Aes128 / Aes192 / Aes256 types to dispatch to the fixsliced backend on targets where 32-bit registers are present (Cortex-M3 and up, RISC-V RV32 and up); keep the table-based variant only as a fallback for Cortex-M0 (which has fewer bit-manipulation instructions and is bandwidth-bound).
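The |t| < 4.5 acceptance check in the route above can be sketched as a Welch t-test over two sets of timing samples. This is a simplified stand-in, not dudect itself (which computes the statistic online and crops outliers); the function names and the use of f64 cycle counts are assumptions.

```rust
/// Welch's t-statistic between two sets of timing samples
/// (e.g. cycle counts for the fixed-key and random-key classes).
fn welch_t(a: &[f64], b: &[f64]) -> f64 {
    // Sample mean and unbiased sample variance of one class.
    fn mean_var(x: &[f64]) -> (f64, f64) {
        let n = x.len() as f64;
        let mean = x.iter().sum::<f64>() / n;
        let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
        (mean, var)
    }
    let (ma, va) = mean_var(a);
    let (mb, vb) = mean_var(b);
    (ma - mb) / (va / a.len() as f64 + vb / b.len() as f64).sqrt()
}

/// Acceptance rule from the text: no detectable leak if |t| < 4.5.
fn passes_leak_test(fixed: &[f64], random: &[f64]) -> bool {
    welch_t(fixed, random).abs() < 4.5
}
```

A pass only means that no timing difference was detected at the sample count used; it should be read together with the number of measurements collected.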
GMAC GF(2^128) multiplier (T2-H)
The current arcana GHASH (cipher::modes::gcm::gf128_mul)
must be audited and probably rewritten. A naive shift-and-XOR
multiply over GF(2^128) leaks via the conditional XOR; the standard
fix is a constant-time carry-less software multiply (clmul
emulation). On hosts with PCLMULQDQ (x86_64) or PMULL (aarch64) a
hardware backend is the right answer; on embedded targets the
bitsliced approach of [KS09] is the
reference.
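As an illustration of removing the conditional XOR, the sketch below is the bit-serial multiply of NIST SP 800-38D with each branch replaced by an all-ones / all-zeros mask. It is not the clmul-emulation or bitsliced approach named above, only the smallest branch-free variant. Blocks are assumed to be loaded big-endian into a u128 so that the first bit of the block is the most significant bit; the name gf128_mul_ct is hypothetical and is not the existing arcana function.

```rust
/// Reduction constant of GHASH: 11100001 followed by 120 zero bits.
const R: u128 = 0xE1 << 120;

/// Branch-free GF(2^128) multiply in the GCM bit ordering
/// (the most significant bit of the u128 is the coefficient of x^0).
fn gf128_mul_ct(x: u128, y: u128) -> u128 {
    let mut z = 0u128;
    let mut v = y;
    for i in 0..128 {
        // Bit i of x, scanned from the most significant bit downwards.
        let xi = (x >> (127 - i)) & 1;
        // Mask is all ones when xi == 1, all zeros otherwise.
        z ^= v & xi.wrapping_neg();
        // Multiply v by x: shift, then reduce if a bit fell off the end.
        let lsb = v & 1;
        v = (v >> 1) ^ (R & lsb.wrapping_neg());
    }
    z
}
```

The loop count and shift amounts depend only on public values. A compiler is still free to reintroduce branches, so a routine like this would need to be checked in the generated assembly and with dudect on the target.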
DPA and template attacks — masked AES (T2-G)
Once T1-A lands and the round function operates on bitsliced
state, the DPA target shifts: there is no per-byte SubBytes
intermediate to model. However the bitsliced state is still
secret-dependent, so first-order DPA on the loaded round-key state
remains feasible.
The intended countermeasure is first-order Boolean masking:
each bit of the bitsliced state is split into two shares
s = s0 ⊕ s1 with s0 ← rng() and s1 = s ⊕ s0. The linear layers
(ShiftRows, MixColumns, AddRoundKey) commute with XOR, so they
are applied to each share independently. The S-box is the only
non-linear layer and needs fresh randomness; the standard answer
is the masked AND gate of [Tri03] (or higher-order TI masking
[BGN+14] for a stronger threat model).
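A minimal sketch of the two-share scheme on one 32-bit bitsliced word follows. The masked AND accumulates the four cross products onto a fresh random word in the style of the [Tri03] gate. The type and function names are hypothetical, and the random words are passed in as parameters because the source of randomness is outside the scope of this sketch.

```rust
/// Two Boolean shares of a 32-bit bitsliced word; the value is s0 ^ s1.
#[derive(Clone, Copy)]
struct Shared {
    s0: u32,
    s1: u32,
}

/// Split `value` into two shares using the random word `r`.
fn mask(value: u32, r: u32) -> Shared {
    Shared { s0: r, s1: value ^ r }
}

/// Recombine the shares (only for testing; never done mid-computation).
fn unmask(x: Shared) -> u32 {
    x.s0 ^ x.s1
}

/// Linear gate: XOR acts on each share independently.
fn masked_xor(a: Shared, b: Shared) -> Shared {
    Shared { s0: a.s0 ^ b.s0, s1: a.s1 ^ b.s1 }
}

/// Non-linear gate: masked AND with a fresh random word `r`.
/// The cross products are accumulated onto `r` one at a time so that
/// no intermediate equals an unmasked product of the inputs.
fn masked_and(a: Shared, b: Shared, r: u32) -> Shared {
    let mut t = r ^ (a.s0 & b.s0);
    t ^= a.s0 & b.s1;
    t ^= a.s1 & b.s0;
    t ^= a.s1 & b.s1;
    Shared { s0: r, s1: t }
}
```

The evaluation order matters for first-order security, and an optimizing compiler may reorder these XORs; a real implementation would have to pin the order (for example in assembly) and confirm it on measured traces.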
Implementation hooks:
- The masked AES will live behind the same sca-protected feature flag as quantica’s masking layer (already present in the workspace), keeping the Cargo features story consistent.
- Cost expectation: ~3–5 × the unmasked fixsliced AES per the literature.
- Validation: dudect on Cortex-M target + KAT regression.
Outside the evaluation scope: AES-NI / VAES backend (T5)
For host deployments arcana should eventually expose a hardware AES backend: AES-NI / VAES on x86_64, with the ARMv8 Cryptography Extensions as the aarch64 counterpart. This is not on the evaluation critical path: the target evaluation runs on embedded silicon where these instructions do not exist. It is purely a server-deployment performance item and is tracked separately so that it does not delay the evaluation deliverable.
Code path summary
| Path | Today (2026-04-21) | Target (post |
|---|---|---|
| | Table-based S-box | Fixsliced bitslice (2 blocks in parallel on 32-bit targets) |
| | n/a | First-order masked fixslice, behind |
| | Audit pending | CT carry-less multiply or HW backend |
| | Inherits AES table leak | Inherits fixsliced AES |
| | Inherits AES table leak | Inherits fixsliced AES |