HMAC / CMAC / KMAC / GMAC — countermeasures

Spec:

RFC 2104 [KBC97], FIPS 198-1 (HMAC), NIST SP 800-38B [NationalIoSaTechnology05] (CMAC), NIST SP 800-185 (KMAC), NIST SP 800-38D (GMAC)

Crate path:

arcana::mac::ctx (streaming Mac wrapper), arcana::cipher::poly1305 (Poly1305; function-oriented, not in the Mac ctx by design)

Cargo feature:

none — all four families compiled unconditionally.

The MAC family lives at the intersection of two SCA literatures: the symmetric SCA literature (CDPA on HMAC-SHA-2, [BDT+23]; classical CPA on AES-CMAC) and the bigint SCA literature (none directly applicable — HMAC/CMAC/KMAC/GMAC do not require modular arithmetic).

The most important recent finding is [BDT+23] (TCHES 2023, Issue 3): any implementation of HMAC-SHA-2, even pure parallel hardware, leaks the secret key in 30 K – 275 K traces under Carry-based Differential Power Analysis (CDPA). This strictly tightens the earlier picture (e.g. [BBD+13]), under which DPA on HMAC-SHA-2 had been hoped to be hard.

Coverage matrix

MAC countermeasure / threat matrix

Threat | Status | Countermeasure(s)
Software / cache-timing on tag verify | implemented | mac::ctx::Mac::verify uses silentops::ct_eq and returns a single bool that does not depend on which byte differed.
DPA / CDPA on HMAC-SHA-2 inner / outer state [BDT+23] | vulnerable | Plan T2-D: first-order Boolean masking of the SHA-2 compression function for the inner / outer keyed states. Also covers the Ed25519 SHA-512 path (EdDSA / Ed25519 — countermeasures).
DPA on AES-CMAC subkey derivation | vulnerable | Plan T2-G (masked AES): once AES is masked, CMAC inherits the protection.
DPA on GMAC GF(2^128) multiplier | vulnerable | Plan T2-H: CT carry-less multiplier for GHASH; on hosts with hardware support, use PCLMULQDQ (x86-64) / PMULL (AArch64).
Length-extension / hash-misuse | n/a (by design) | HMAC, CMAC, KMAC and GMAC are all designed so that length-extension on the underlying primitive does not yield forgeries.
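The verify row relies on a constant-time tag comparison. Below is a minimal sketch of the usual accumulate-then-test pattern. It is not the code of silentops::ct_eq, whose internals are not described here. The function name ct_eq_sketch is hypothetical, and the sketch assumes tag lengths are public, so a length mismatch may return early.

```rust
/// Constant-time tag comparison sketch (hypothetical; not silentops::ct_eq itself).
/// Tag lengths are treated as public, so a length mismatch returns early.
fn ct_eq_sketch(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; no branch depends on tag contents.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    // black_box is a best-effort hint that discourages the optimizer from
    // rewriting the loop into an early-exit comparison.
    std::hint::black_box(diff) == 0
}
```

The single final test on diff is what makes the result independent of the position of the first differing byte.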

Carry-based DPA on HMAC-SHA-2 (T2-D)

Why CDPA is special

Classical DPA targets a single non-linear gate (e.g. an AES S-box) where the leakage model is “Hamming weight of the gate output” or “Hamming distance between two register states”. [BDT+23] introduced the carry of an addition as a leakage model: the bit that propagates between adjacent bit positions of an arithmetic addition has a noticeable power signature on most hardware (especially CPUs without explicit carry-handling tricks).

SHA-2’s compression function is dominated by 32-bit / 64-bit additions:

\[\begin{split}T_1 = h + \Sigma_1(e) + \mathrm{Ch}(e, f, g) + K_t + W_t \\ T_2 = \Sigma_0(a) + \mathrm{Maj}(a, b, c)\end{split}\]

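Within each two-operand +, the carry into bit i follows the ripple recurrence c_{i+1} = Maj(a_i, b_i, c_i) with c_0 = 0. It is therefore a non-linear function of the lower i bits of both operands, and this carry signal is what CDPA models. With a carry-leakage model, the attacker recovers the inner-state words bit by bit.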
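The carry signal targeted by this leakage model can be made concrete. For a modular addition s = a + b, the carry into every bit position equals (a + b) ⊕ a ⊕ b. The sketch below computes this carry vector that way and also through the ripple recurrence, and the two must agree. It only illustrates the intermediate value a CDPA-style model correlates against. It is not attack code from [BDT+23], and the function names are hypothetical.

```rust
/// Carry vector of a 32-bit modular addition: bit i is the carry INTO bit i.
/// Since a + b = a ^ b ^ carries, the carries are (a + b) ^ a ^ b.
fn carry_vector(a: u32, b: u32) -> u32 {
    a.wrapping_add(b) ^ a ^ b
}

/// The same carries via the ripple recurrence c_{i+1} = Maj(a_i, b_i, c_i), c_0 = 0.
fn carry_vector_ripple(a: u32, b: u32) -> u32 {
    let mut carries = 0u32;
    let mut c = 0u32;
    for i in 0..32 {
        carries |= c << i; // record the carry entering bit i
        let ai = (a >> i) & 1;
        let bi = (b >> i) & 1;
        c = (ai & bi) | (ai & c) | (bi & c); // majority of the three inputs
    }
    carries // the carry out of bit 31 is discarded (mod 2^32)
}
```

Every + in T_1 and T_2 produces one such 32-bit (or 64-bit) carry vector. A key-dependent operand paired with a known, varying one (e.g. W_t) gives the attacker a prediction to test against each trace.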

Result: HMAC-SHA-2, which feeds the secret key into the inner state via H((K ⊕ ipad) ∥ M) and into the outer state via K ⊕ opad, leaks the key in 30 K – 275 K traces, even in pure parallel hardware where the bytes are processed simultaneously. Software implementations leak even more easily because the additions are explicit instructions on a sequential pipeline.
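The key enters HMAC only through two padded blocks, which is why the keyed chaining states are such a stable target. Under RFC 2104 these blocks are the same for every message, so the state after compressing them is a fixed secret that meets attacker-known message words in every trace. The sketch below derives the two blocks for SHA-256's 64-byte block size. It assumes the key is at most one block long; longer keys must first be hashed, which is omitted here. The function name is hypothetical.

```rust
/// SHA-256 block size in bytes (128 for SHA-384 / SHA-512).
const BLOCK: usize = 64;

/// Derive K0 ^ ipad and K0 ^ opad (RFC 2104), where K0 is the key zero-padded to BLOCK.
/// Assumes key.len() <= BLOCK; hashing of over-long keys is not shown.
fn hmac_pad_blocks(key: &[u8]) -> ([u8; BLOCK], [u8; BLOCK]) {
    assert!(key.len() <= BLOCK, "keys longer than one block must be hashed first");
    let mut inner = [0x36u8; BLOCK]; // ipad byte
    let mut outer = [0x5cu8; BLOCK]; // opad byte
    for (i, &k) in key.iter().enumerate() {
        inner[i] ^= k;
        outer[i] ^= k;
    }
    (inner, outer)
}
```

The first compression of inner (resp. outer) yields the keyed inner (resp. outer) state that T2-D masks.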

Implication for arcana

The SHA-256 / SHA-512 compression functions in arcana::hash::sha256 / sha512 are textbook reference implementations. They are CT (no secret-dependent branches), but they are vulnerable to CDPA and to the more general [BBD+13]-style Hamming-weight leakage attacks.

For deployments where the threat model includes a level-2 attacker with EM / power probes, the HMAC-SHA-2 keys in arcana must be assumed extractable. Any lab-class evaluation falls within this threat model.

Countermeasure

The standard answer is first-order Boolean masking of the SHA-2 compression function:

  • Each 32-bit (resp. 64-bit) state word w is split into two shares with w0 ⊕ w1 = w, where w0 ← rng() and w1 = w ⊕ w0.

  • The linear operations of SHA-2 (XOR, rotations, shifts) are GF(2)-linear, i.e. f(w0 ⊕ w1) = f(w0) ⊕ f(w1), so they are applied to each share independently.

  • The non-linear operations are:

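    • Ch(e, f, g) = (e ∧ f) ⊕ (¬e ∧ g): two masked ANDs using the standard technique (Trichina masking, [Tri03]); ¬e is obtained by complementing one share.

    • Maj(a, b, c) = (a ∧ b) ⊕ (a ∧ c) ⊕ (b ∧ c): three masked ANDs.

    • The modular additions (mod 2^32 for SHA-256, mod 2^64 for SHA-512) that form T_1, T_2 and the state update: the harder part. A Boolean-shared addition uses Goubin's conversions [Gou01] to switch from Boolean to arithmetic shares, perform the addition on the arithmetic shares, and switch back.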
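The non-linear pieces listed above can be sketched on 32-bit words. The sketch shows a Trichina-style masked AND, masked Ch and Maj built from it, and Goubin's Boolean-to-arithmetic conversion, which rests on the fact that (x' ⊕ r) − r is affine in r over GF(2). The types and function names are hypothetical and do not reflect the planned arcana::hash::sha2_masked API. Randomness is passed in explicitly instead of drawn from an RNG. Plain Rust gives no guarantee that the compiler preserves the share-separating order of operations, so this shows the algebra, not a hardened implementation.

```rust
/// First-order Boolean sharing of a 32-bit word: the secret is s0 ^ s1.
#[derive(Clone, Copy, Debug)]
struct Shared(u32, u32);

impl Shared {
    /// Split w using the random word w0 (an RNG output in practice).
    fn split(w: u32, w0: u32) -> Self { Shared(w0, w ^ w0) }
    fn unmask(self) -> u32 { self.0 ^ self.1 }
    /// Share-wise XOR (linear, so no fresh randomness needed).
    fn xor(self, o: Shared) -> Shared { Shared(self.0 ^ o.0, self.1 ^ o.1) }
    /// Bitwise NOT of the secret: complementing one share is enough.
    fn complement(self) -> Shared { Shared(!self.0, self.1) }
}

/// Trichina-style masked AND with one fresh random word r.
/// The XOR chain starts from r, so no intermediate is an unmasked product.
fn masked_and(a: Shared, b: Shared, r: u32) -> Shared {
    let mut t = r ^ (a.0 & b.0);
    t ^= a.0 & b.1;
    t ^= a.1 & b.0;
    t ^= a.1 & b.1;
    Shared(r, t) // r ^ t == (a0 ^ a1) & (b0 ^ b1)
}

/// Ch(e, f, g) = (e & f) ^ (!e & g): two masked ANDs.
fn masked_ch(e: Shared, f: Shared, g: Shared, r: [u32; 2]) -> Shared {
    masked_and(e, f, r[0]).xor(masked_and(e.complement(), g, r[1]))
}

/// Maj(a, b, c) = (a & b) ^ (a & c) ^ (b & c): three masked ANDs.
fn masked_maj(a: Shared, b: Shared, c: Shared, r: [u32; 3]) -> Shared {
    masked_and(a, b, r[0]).xor(masked_and(a, c, r[1])).xor(masked_and(b, c, r[2]))
}

/// Goubin's Boolean-to-arithmetic conversion [Gou01]: given x_masked = x ^ r,
/// return A with A + r == x (mod 2^32). gamma is fresh randomness.
fn goubin_b2a(x_masked: u32, r: u32, gamma: u32) -> u32 {
    let t = (x_masked ^ gamma).wrapping_sub(gamma) ^ x_masked;
    let g = gamma ^ r;
    let a = (x_masked ^ g).wrapping_sub(g);
    a ^ t
}
```

After conversion, the arithmetic shares (A, r) can be added to another arithmetic sharing with plain wrapping_add, and a matching arithmetic-to-Boolean conversion (also from [Gou01], not shown) returns to the Boolean domain.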

Implementation route in arcana

The masked SHA-2 will live behind the same sca-protected feature flag used by quantica’s masking layer (already wired in the workspace Cargo.toml).

  • New module arcana::hash::sha2_masked exposing MaskedSha256 and MaskedSha512 types.

  • Internally each state word is a 2-share MaskedU32 / MaskedU64; operations are constant-time on the shares.

  • mac::ctx::Mac::sign / Mac::verify route through the masked variants when the feature is on.

  • Performance expectation: roughly 3 to 5× the cost of unmasked SHA-2, per the literature.

  • KAT regression: outputs are bit-identical to the unmasked variant (the masking is mathematically transparent).
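The claims that operations act on the shares and that outputs stay bit-identical can be checked on the linear layer alone. The sketch below uses a hypothetical two-share MaskedU32 whose name follows the plan above but whose methods are illustrative only. It computes SHA-256's Σ0 and σ0 (FIPS 180-4) share by share and compares the recombined result with the unmasked function, which is the same property the KAT regression relies on.

```rust
/// Hypothetical two-share word; the name follows the planned MaskedU32, the API is illustrative.
#[derive(Clone, Copy, Debug)]
struct MaskedU32 {
    s0: u32,
    s1: u32,
}

impl MaskedU32 {
    /// Share value with a random mask (supplied by the caller in this sketch).
    fn new(value: u32, mask: u32) -> Self {
        MaskedU32 { s0: mask, s1: value ^ mask }
    }
    fn unmask(self) -> u32 {
        self.s0 ^ self.s1
    }
    // GF(2)-linear operations act on each share independently.
    fn xor(self, o: Self) -> Self {
        MaskedU32 { s0: self.s0 ^ o.s0, s1: self.s1 ^ o.s1 }
    }
    fn rotr(self, n: u32) -> Self {
        MaskedU32 { s0: self.s0.rotate_right(n), s1: self.s1.rotate_right(n) }
    }
    fn shr(self, n: u32) -> Self {
        MaskedU32 { s0: self.s0 >> n, s1: self.s1 >> n }
    }
}

/// SHA-256 Σ0(x) = ROTR^2(x) ^ ROTR^13(x) ^ ROTR^22(x).
fn big_sigma0(x: u32) -> u32 {
    x.rotate_right(2) ^ x.rotate_right(13) ^ x.rotate_right(22)
}

fn masked_big_sigma0(x: MaskedU32) -> MaskedU32 {
    x.rotr(2).xor(x.rotr(13)).xor(x.rotr(22))
}

/// SHA-256 σ0(x) = ROTR^7(x) ^ ROTR^18(x) ^ SHR^3(x).
fn small_sigma0(x: u32) -> u32 {
    x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3)
}

fn masked_small_sigma0(x: MaskedU32) -> MaskedU32 {
    x.rotr(7).xor(x.rotr(18)).xor(x.shr(3))
}
```

Neither share alone equals the unmasked word (unless the mask is zero), yet the recombined output matches exactly. A full masked compression function would combine this with the masked Ch / Maj and the Goubin-based additions.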

Cost vs. evaluation benefit

For the target evaluation, the attacker is permitted observational SCA, and without masking HMAC-SHA-2 fails at this level. T2-D is therefore on the evaluation critical path even though it is labelled “Tier 2”. It ranks below the T1 items because T1 covers the arguably larger Bellcore RSA gap, and below T1-A in AES — countermeasures because every other primitive depends on AES being CT-safe first.

Dependence on Ed25519

The same SHA-512 primitive is used in Ed25519 to derive the nonce r = H(prefix ∥ M) mod ℓ and the challenge k = H(R ∥ A ∥ M) mod ℓ, where ℓ is the prime group order. T2-D (masking SHA-512 for HMAC) transparently extends to Ed25519 once the masked SHA-512 is plumbed through ed25519_sign. No separate item.

CMAC / KMAC / GMAC

CMAC (AES-based)

CMAC inherits its SCA properties from the underlying AES. Once T1-A (fixsliced AES) lands, CMAC’s first-round leak is gone; once T2-G (masked AES) lands, CMAC inherits the DPA defence. No CMAC-specific countermeasure is needed beyond the AES-side hardening.

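The CMAC subkey derivation computes L = AES_K(0^128), K1 = L · x and K2 = K1 · x in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1. It touches the key only through the AES call, but L, K1 and K2 are themselves secret. Each doubling is a one-bit left shift followed by a conditional XOR with R_128 = 0x87, so it does not need the full GHASH multiplier (T2-H); the only CT requirement beyond AES is that this conditional XOR be branch-free.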
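The branch-free doubling described above can be sketched as follows. Blocks are read big-endian into a u128 (as with u128::from_be_bytes), so the bit shifted out is the most significant one. The conditional XOR uses an all-ones / all-zeros mask derived from that bit. The test values are those of the RFC 4493 subkey-generation example. L = AES_K(0^128) is taken as an input, since AES itself is out of scope here, and the function names are hypothetical.

```rust
/// Doubling in GF(2^128) for CMAC subkeys (SP 800-38B / RFC 4493):
/// shift left by one bit and, if the bit shifted out was 1, XOR in R_128 = 0x87.
/// The conditional XOR uses a mask, so there is no secret-dependent branch.
fn gf128_double(x: u128) -> u128 {
    let msb_mask = 0u128.wrapping_sub(x >> 127); // all ones if top bit set, else zero
    (x << 1) ^ (0x87 & msb_mask)
}

/// K1 = dbl(L), K2 = dbl(K1), where L = AES_K(0^128) is computed elsewhere.
fn cmac_subkeys(l: u128) -> (u128, u128) {
    let k1 = gf128_double(l);
    (k1, gf128_double(k1))
}
```

The mask trick keeps control flow identical for both values of the top bit. Any remaining leakage then comes from the data values themselves, which is what the AES-side masking (T2-G) has to address.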

KMAC (Keccak-based)

KMAC128 / KMAC256 build on cSHAKE, which builds on Keccak-f[1600]. The Keccak permutation is structurally CT (no S-box LUT, no secret-dependent branches in the round function). DPA on Keccak is harder than on SHA-2 in one specific respect: Keccak contains no modular additions, so the CDPA carry model has no target, and the ι step only XORs in a public round constant, which carries no key information. The non-linear χ step is a degree-2 AND-XOR map on 5-bit rows. Masking papers ([BDH+17]) cover it, but it has not been an evaluation-flagged target.
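For reference, the χ step operates on one row of five 64-bit lanes, bit-sliced across the 64 lane positions. The sketch below implements it, and the accompanying check confirms that χ is a permutation of the 32 possible 5-bit rows. Each output bit needs exactly one AND, so a first-order masked χ would need five masked ANDs per row, analogous to the SHA-2 Ch / Maj case above. The function name is hypothetical.

```rust
/// Keccak χ on one row of five 64-bit lanes:
/// b[x] = a[x] ^ (!a[x+1] & a[x+2]), with lane indices taken mod 5.
/// The single AND per output bit is χ's only non-linear operation.
fn chi_row(a: [u64; 5]) -> [u64; 5] {
    let mut b = [0u64; 5];
    for x in 0..5 {
        b[x] = a[x] ^ (!a[(x + 1) % 5] & a[(x + 2) % 5]);
    }
    b
}
```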

For arcana: KMAC ships as-is for now; revisit only if an evaluation-level Keccak attack appears.

GMAC (GHASH-based)

The GHASH hash subkey H = AES_K(0^128) and the per-block update X_i = (X_{i-1} ⊕ block_i) · H over GF(2^128) are the SCA target. The mitigation is item T2-H (CT carry-less multiplier), also flagged in AES — countermeasures.
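For illustration, the GHASH update can be written without tables or secret-dependent branches. The sketch below follows the bit-serial multiplication of NIST SP 800-38D (Algorithm 1) in GCM's reflected bit order, with both conditional steps replaced by masks. It is a slow reference sketch under the assumption that blocks are loaded big-endian into a u128 (so spec bit 0 is the most significant bit). It is not the planned T2-H implementation, which on capable hosts would use PCLMULQDQ / PMULL. The function names are hypothetical.

```rust
/// Multiply two GF(2^128) elements in GCM's bit order (SP 800-38D, Algorithm 1).
/// Spec bit 0 is the u128's most significant bit (blocks loaded big-endian).
/// Both conditional updates use masks instead of branches.
fn gf128_mul(x: u128, y: u128) -> u128 {
    const R: u128 = 0xE1u128 << 120; // 11100001 || 0^120
    let mut z = 0u128;
    let mut v = y;
    for i in 0..128 {
        let x_i = (x >> (127 - i)) & 1;
        z ^= v & 0u128.wrapping_sub(x_i); // Z ^= V when x_i = 1
        let lsb = v & 1;
        v = (v >> 1) ^ (R & 0u128.wrapping_sub(lsb)); // V = V * x, reduced
    }
    z
}

/// One GHASH step: X_i = (X_{i-1} ^ block_i) * H.
fn ghash_step(x_prev: u128, block: u128, h: u128) -> u128 {
    gf128_mul(x_prev ^ block, h)
}
```

In this bit order the field's multiplicative identity is the block 0x80 ∥ 0^120 (1u128 << 127), which the checks below use together with commutativity and distributivity.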

Code path summary

Path | Today (2026-04-21) | Target (post T2-D + T2-G + T2-H)
mac::ctx::Mac::verify | ct_eq tag compare | Unchanged
HMAC-SHA-256/384/512 inner state | Unmasked | Masked (sca-protected feature)
HMAC-SHA-3 inner state | Unmasked, Keccak-CT-by-structure | Unchanged for now
CMAC subkey derivation | Inherits AES leak | Inherits fixsliced + masked AES
KMAC128 / KMAC256 | Keccak-CT-by-structure | Unchanged for now
GMAC GHASH multiplier | Audit pending; likely table-driven | CT carry-less multiplier (T2-H)