GMAC — countermeasures

Spec:: RFC 2104 [KBC97], FIPS 198-1 (HMAC), NIST SP 800-38B [NationalIoSaTechnology05] (CMAC), NIST SP 800-185 (KMAC), NIST SP 800-38D (GMAC)
Crate path:: arcana::mac::ctx (streaming Mac wrapper), arcana::cipher::poly1305 (Poly1305; function-oriented, not in Mac ctx by design)
Cargo feature:: none — all four families compiled unconditionally.

Two SCA literatures are relevant in principle to the MAC family: the symmetric SCA literature (CDPA on HMAC-SHA-2, [BDT+23]; classical CPA on AES-CMAC) and the bigint SCA literature. Only the first applies in practice, because HMAC, CMAC, KMAC and GMAC do not require big-integer modular arithmetic.

The most important recent finding is [BDT+23], TCHES 2023 Issue 3: any implementation of HMAC-SHA-2, even pure parallel hardware, leaks the secret key in 30 K – 275 K traces under Carry-based Differential Power Analysis (CDPA). This overturns the expectation in the prior literature (e.g. [BBD+13]) that DPA on HMAC-SHA-2 was hard.

Coverage matrix 

MAC countermeasure / threat matrix
Threat	Status	Countermeasure(s)
Software / cache-timing on tag verify	implemented	`mac::ctx::Mac::verify` uses `silentops::ct_eq`; returns a single bool independent of which byte differed.
DPA / CDPA on HMAC-SHA-2 inner / outer state [BDT+23]	vulnerable	Plan `T2-D`: first-order Boolean masking of the SHA-2 compression function for the inner / outer keyed states. Also covers the Ed25519 SHA-512 path (EdDSA / Ed25519 — countermeasures).
DPA on AES-CMAC subkey derivation	vulnerable	To be mitigated by `T2-G` (masked AES): once AES is masked, CMAC inherits the defence.
DPA on GMAC GF(2^128) multiplier	vulnerable	Plan `T2-H`: CT carry-less multiplier for GHASH; on hosts use PCLMULQDQ / PMULL.
Length-extension / hash-misuse	n/a — by design	HMAC, CMAC, KMAC, GMAC are all designed to resist length-extension on their underlying primitive.
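
The first row of the matrix relies on a comparison whose running time does not depend on where two tags differ. A minimal sketch of that idea follows; `ct_eq_sketch` is a hypothetical stand-in, since the actual signature of `silentops::ct_eq` is not shown on this page.

```rust
/// Hypothetical stand-in for a constant-time comparison such as
/// `silentops::ct_eq`; the real function's signature is an assumption.
pub fn ct_eq_sketch(a: &[u8], b: &[u8]) -> bool {
    // Tag lengths are public, so rejecting a length mismatch early
    // reveals nothing about the secret tag bytes.
    if a.len() != b.len() {
        return false;
    }
    // OR together all byte differences; the loop never exits early,
    // so the work done is the same whichever byte differs.
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```

An optimising compiler is in principle free to reintroduce an early exit, so production code usually adds an optimisation barrier or relies on an audited crate rather than a loop like this one.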

Carry-based DPA on HMAC-SHA-2 (`T2-D`)

Why CDPA is special 

Classical DPA targets a single non-linear gate (e.g. an AES S-box) where the leakage model is “Hamming weight of the gate output” or “Hamming distance between two register states”. [BDT+23] introduced the carry of an addition as a leakage model: the bit that propagates between adjacent bit positions of an arithmetic addition has a noticeable power signature on most hardware (especially CPUs without explicit carry-handling tricks).

SHA-2’s compression function is dominated by 32-bit / 64-bit additions:

\[\begin{split}T_1 = h + \Sigma_1(e) + \mathrm{Ch}(e, f, g) + K_t + W_t \\ T_2 = \Sigma_0(a) + \mathrm{Maj}(a, b, c)\end{split}\]

The carry chain inside each + is a non-linear function of the input bits: the carry into bit i depends on all lower-order bits of both operands. CDPA models exactly this carry. With a carry-leakage model, the attacker recovers the inner-state words bit-by-bit.
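
To make the leakage target concrete, the sketch below extracts the carry bits of a single 32-bit addition. It is a simplified illustration, not the exact model of [BDT+23]; the function names and the Hamming-weight choice are assumptions made for the example.

```rust
/// Carry bits of a 32-bit modular addition: bit i of the result is the
/// carry *into* bit position i (the carry out of bit 31 is discarded).
/// Follows from sum_i = a_i ^ b_i ^ carry_i.
pub fn carry_vector(a: u32, b: u32) -> u32 {
    a.wrapping_add(b) ^ a ^ b
}

/// Hypothetical leakage model: number of carries generated by the addition.
pub fn carry_leak(a: u32, b: u32) -> u32 {
    carry_vector(a, b).count_ones()
}
```

Carries do not distribute over XOR: with b = 3, `carry_vector(1 ^ 2, b)` differs from `carry_vector(1, b) ^ carry_vector(2, b)`. This is why the additions cannot simply be applied share by share in a Boolean masking scheme.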

Result: HMAC-SHA-2 (which feeds the secret key into the inner H((K ⊕ ipad) ‖ M) state) leaks the key in 30 K – 275 K traces, even in pure parallel hardware where the bytes are processed simultaneously. Software implementations leak even more easily because the additions are explicit instructions on a sequential pipeline.

Implication for arcana 

The SHA-256 / SHA-512 compression functions in arcana::hash::sha256 / sha512 are textbook reference implementations. They are CT (no secret-dependent branches), but they are vulnerable to CDPA and to the more general [BBD+13]-style HW-leakage attacks.

For deployments where the threat model includes a level-2 attacker with EM / power probes, the HMAC-SHA-2 keys in arcana must be assumed extractable. Any lab-class evaluation falls within this threat model.

Countermeasure 

The standard answer is first-order Boolean masking of the SHA-2 compression function:

Each 32-bit (resp. 64-bit) state word w is split into two shares w0 ⊕ w1 = w with w0 ← rng().
The linear operations of SHA-2 (XOR, rotations, shifts) commute with XOR, so they are applied to each share independently.
The non-linear operations are:
- Ch(e, f, g) = (e ∧ f) ⊕ (¬e ∧ g): rewritten as g ⊕ (e ∧ (f ⊕ g)), it needs a single masked AND, a standard technique (Trichina mask, [Tri03]).
- Maj(a, b, c) = (a ∧ b) ⊕ (a ∧ c) ⊕ (b ∧ c) — three masked ANDs.
- The 32-bit additions T_1, T_2 — the harder part. A Boolean-shared addition uses the Goubin transform [Gou01] to switch from Boolean to arithmetic shares, perform the addition, and switch back.
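
The share-wise linear operations and the masked Ch and Maj described above can be sketched as follows. All names are hypothetical, randomness is passed in by the caller to keep the example deterministic, and the AND gate follows the Trichina style in which the fresh mask becomes one of the output shares. This is an illustration of the algebra only; it makes no claim about leakage after compilation.

```rust
/// Two Boolean shares of a 32-bit word: value = s0 ^ s1.
#[derive(Clone, Copy, Debug)]
pub struct Shared(pub u32, pub u32);

/// Split `w` with caller-supplied randomness `r` (a real implementation
/// would draw `r` from a TRNG or CSPRNG).
pub fn share(w: u32, r: u32) -> Shared { Shared(r, w ^ r) }
pub fn unshare(s: Shared) -> u32 { s.0 ^ s.1 }

/// Linear operations act on each share independently.
pub fn masked_xor(a: Shared, b: Shared) -> Shared { Shared(a.0 ^ b.0, a.1 ^ b.1) }
pub fn masked_rotr(a: Shared, n: u32) -> Shared {
    Shared(a.0.rotate_right(n), a.1.rotate_right(n))
}

/// Masked AND with fresh randomness `r`. The partial products are folded
/// into `r` one at a time so no intermediate equals an unmasked product.
pub fn masked_and(a: Shared, b: Shared, r: u32) -> Shared {
    let mut t = r ^ (a.0 & b.0);
    t ^= a.0 & b.1;
    t ^= a.1 & b.0;
    t ^= a.1 & b.1;
    Shared(t, r)
}

/// Ch(e, f, g) = g ^ (e & (f ^ g)): one masked AND.
pub fn masked_ch(e: Shared, f: Shared, g: Shared, r: u32) -> Shared {
    masked_xor(g, masked_and(e, masked_xor(f, g), r))
}

/// Maj(a, b, c) = (a & b) ^ (a & c) ^ (b & c): three masked ANDs.
pub fn masked_maj(a: Shared, b: Shared, c: Shared, r: [u32; 3]) -> Shared {
    let ab = masked_and(a, b, r[0]);
    let ac = masked_and(a, c, r[1]);
    let bc = masked_and(b, c, r[2]);
    masked_xor(masked_xor(ab, ac), bc)
}
```

Recombining the shares of each result gives the same word as the unmasked Ch, Maj, or rotation, whatever random values are supplied.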

Implementation route in arcana 

The masked SHA-2 lives behind the same sca-protected feature flag used by quantica’s masking layer (already wired in the workspace Cargo.toml).

New module arcana::hash::sha2_masked exposing MaskedSha256 and MaskedSha512 types.
Internally each state word is a 2-share MaskedU32 / MaskedU64; operations are constant-time on the shares.
mac::ctx::Mac::sign / Mac::verify route through the masked variants when the feature is on.
Performance expectation: ~3-5× the unmasked SHA-2 per literature.
KAT regression: outputs are bit-identical to the unmasked variant (the masking is mathematically transparent).
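
As an illustration of what a two-share word might look like, and of why masking leaves KAT outputs unchanged, the sketch below implements Goubin's Boolean-to-arithmetic conversion and a masked addition. The layout of `MaskedU32` and every method name are assumptions; the conversion back to Boolean shares is omitted, and the fresh randomness `gamma` is supplied by the caller.

```rust
/// Hypothetical two-share word: value = masked ^ mask.
#[derive(Clone, Copy)]
pub struct MaskedU32 {
    pub masked: u32,
    pub mask: u32,
}

impl MaskedU32 {
    pub fn new(value: u32, mask: u32) -> Self {
        Self { masked: value ^ mask, mask }
    }

    pub fn unmask(self) -> u32 {
        self.masked ^ self.mask
    }

    /// Goubin Boolean-to-arithmetic conversion. Returns `a` such that
    /// a + mask == value (mod 2^32), without ever forming `value`.
    /// `gamma` is fresh randomness for this conversion.
    pub fn to_arith(self, gamma: u32) -> u32 {
        let mut t = self.masked ^ gamma;
        t = t.wrapping_sub(gamma);
        t ^= self.masked;
        let g = gamma ^ self.mask;
        let mut a = self.masked ^ g;
        a = a.wrapping_sub(g);
        a ^ t
    }
}

/// Masked addition: convert both operands to arithmetic shares, then add
/// share-wise. The result (a, r) satisfies a + r == x + y (mod 2^32).
/// A full implementation would convert (a, r) back to Boolean shares.
pub fn masked_add_arith(x: MaskedU32, y: MaskedU32, gx: u32, gy: u32) -> (u32, u32) {
    let ax = x.to_arith(gx);
    let ay = y.to_arith(gy);
    (ax.wrapping_add(ay), x.mask.wrapping_add(y.mask))
}
```

Recombining the two output shares always yields the ordinary wrapping sum, independent of the masks and of `gamma`, which is the property a KAT regression would exercise.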

Cost vs. evaluation benefit 

For the target evaluation the attacker is permitted observational SCA; without masking, HMAC-SHA-2 fails at this level. T2-D is therefore on the evaluation critical path even though it is labelled “Tier 2”. It sits below T1 because T1 has the arguably larger Bellcore RSA gap, and below T1-A in AES — countermeasures because every other primitive depends on AES being CT-safe first.

Dependence on Ed25519 

The same SHA-512 primitive is used in Ed25519 to derive the nonce r = H(prefix ‖ M) mod ℓ and the challenge k = H(R ‖ A ‖ M) mod ℓ. T2-D (masking SHA-512 for HMAC) transparently extends to Ed25519 once the masked SHA-512 is plumbed through ed25519_sign. No separate item.

CMAC / KMAC / GMAC 

CMAC (AES-based)

CMAC inherits its SCA properties from the underlying AES. Once T1-A (fixsliced AES) lands, CMAC’s first-round leak is gone; once T2-G (masked AES) lands, CMAC inherits the DPA defence. No CMAC-specific countermeasure is needed beyond the AES-side hardening.

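The CMAC subkey derivation computes L = AES_K(0^128), K1 = 2 · L and K2 = 2 · K1, where 2 · X denotes doubling in GF(2^128): a left shift by one bit, followed by an XOR with R_128 = 0^120 ‖ 10000111 when the bit shifted out is 1. Once L is computed, the derivation involves only this public constant, although L, K1 and K2 themselves remain secret. The doubling in GF(2^128) is covered by the same CT carry-less multiplier as GHASH (T2-H).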
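
The doubling step can be written without a secret-dependent branch, as in the sketch below. Blocks are held as big-endian `u128` values, and the function names are hypothetical rather than taken from arcana.

```rust
/// R_128 from NIST SP 800-38B: 0^120 || 10000111.
const R128: u128 = 0x87;

/// Branch-free doubling in GF(2^128), CMAC convention.
/// `x` is the 128-bit block interpreted as a big-endian integer.
pub fn dbl(x: u128) -> u128 {
    // 0 or 1, taken from the most significant bit.
    let msb = x >> 127;
    // All-ones when the MSB is set, zero otherwise; replaces the `if`.
    let mask = 0u128.wrapping_sub(msb);
    (x << 1) ^ (R128 & mask)
}

/// Derive (K1, K2) from L = AES_K(0^128), which the caller computes.
pub fn cmac_subkeys(l: u128) -> (u128, u128) {
    let k1 = dbl(l);
    let k2 = dbl(k1);
    (k1, k2)
}
```

With the AES-128 example key of RFC 4493, L = 7df76b0c 1ab899b3 3e42f047 b91b546f yields K1 = fbeed618 35713366 7c85e08f 7236a8de and K2 = f7ddac30 6ae266cc f90bc11e e46d513b.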

KMAC (Keccak-based)

KMAC128 / KMAC256 build on cSHAKE, which builds on Keccak-f[1600]. The Keccak permutation is structurally CT (no S-box LUT, no secret-dependent branches in the round function). DPA on Keccak is harder than on SHA-2: there are no modular additions and hence no carry chains, and the ι step is only an XOR with a public round constant, which carries no key information. The non-linear χ step is a 5-bit AND-XOR pattern that masking papers ([BDH+17]) cover but which has not been an evaluation-flagged target.
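
For reference, the χ step applied to one row of five 64-bit lanes looks like the sketch below. It shows why the step is table-free and branch-free: each output lane is an AND, a NOT and an XOR of its neighbours. The function name is hypothetical.

```rust
/// Keccak χ on one row of five lanes:
/// out[x] = a[x] ^ (!a[x + 1] & a[x + 2]), indices taken mod 5.
pub fn chi_row(a: [u64; 5]) -> [u64; 5] {
    let mut out = [0u64; 5];
    for x in 0..5 {
        out[x] = a[x] ^ (!a[(x + 1) % 5] & a[(x + 2) % 5]);
    }
    out
}
```

A first-order masked variant would replace the single AND per lane with a masked AND gate; the rest of the step is linear in the shares apart from the NOT, which is applied to one share only.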

For arcana: KMAC ships as-is for now; revisit only if an evaluation-level Keccak attack appears.

GMAC (GHASH-based)

The GHASH hash subkey H = AES_K(0^128) and the per-block multiplication X_i = (X_{i-1} ⊕ block_i) · H over GF(2^128) are the SCA target. The mitigation is item T2-H (CT carry-less multiplier), also flagged in AES — countermeasures.
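
A portable way to avoid table lookups is a bit-serial multiplier with masked selects, sketched below after the block-multiplication algorithm of NIST SP 800-38D. It is far slower than PCLMULQDQ / PMULL and is not the T2-H implementation; the names are hypothetical and blocks are big-endian `u128` values, so the leftmost bit of the block is bit 127 of the integer.

```rust
/// R from NIST SP 800-38D: 11100001 || 0^120.
const R: u128 = 0xE1u128 << 120;

/// Multiply two blocks in GF(2^128) using the GCM bit ordering.
/// Every iteration performs the same operations; secret bits are only
/// used to build all-ones / all-zero masks, never as branch conditions.
pub fn gf128_mul(x: u128, y: u128) -> u128 {
    let mut z = 0u128;
    let mut v = y;
    for i in 0..128 {
        // Bit i of x, counting from the leftmost bit of the block.
        let xi = (x >> (127 - i)) & 1;
        z ^= v & 0u128.wrapping_sub(xi);
        // Multiply v by the field element "x": shift right, reduce by R.
        let lsb = v & 1;
        v = (v >> 1) ^ (R & 0u128.wrapping_sub(lsb));
    }
    z
}

/// One GHASH step: X_i = (X_{i-1} ^ block_i) * H.
pub fn ghash_step(x_prev: u128, block: u128, h: u128) -> u128 {
    gf128_mul(x_prev ^ block, h)
}
```

In this representation the multiplicative identity is the block with only its leftmost bit set (`1u128 << 127`), which gives a quick sanity check for any replacement multiplier.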

Code path summary 

Path	Today (2026-04-21)	Target (post T2-D + T2-G + T2-H)
`mac::ctx::Mac::verify`	`ct_eq` tag compare	Unchanged
HMAC-SHA-256/384/512 inner state	Unmasked	Masked (`sca-protected` feature)
HMAC-SHA-3 inner state	Unmasked, Keccak-CT-by-structure	Unchanged for now
CMAC subkey derivation	Inherits AES leak	Inherits fixsliced + masked AES
KMAC128 / KMAC256	Keccak-CT-by-structure	Unchanged for now
GMAC GHASH multiplier	Audit pending; likely table-driven	CT carry-less multiplier (T2-H)