krypteia — Cryptography Workspace


Hermelink 2025/276 audit pass on ml_dsa::masked

Status: Shipped (audit only; code work-list cross-referenced below).

Scope: quantica/src/ml_dsa/masked.rs plus the rejection-loop callers in quantica/src/ml_dsa/dsa.rs.

Reference: [HNP25] (CRYPTO 2025, IACR ePrint 2025/276).

Audit method: Static walk of every public masked gadget and every unmask() call site, classified against the Hermelink leak taxonomy.

  • Why this audit

  • Scope and out-of-scope

  • Leak-class taxonomy (Hermelink 2025/276)

  • Per-gadget audit matrix

  • Status legend

  • Summary work-list

  • References

Why this audit

[HNP25] does not break ML-DSA or its masked variant; it provides an information-theoretic enumeration of the operations inside a masked Dilithium implementation that leak even when each individual gadget is masked, because the aggregate statistic combines shares unsafely. The dominant leak class is the rejection-loop recombination: shares of y, s1, s2, t0 get unmasked into plaintext aggregates (w, z, cs1, cs2, ct0) that then drive data-dependent non-linear operations (Decompose, MakeHint, norm checks).

The paper is the auditor's checklist for any masked-ML-DSA claim. T1-B of the krypteia Tier-1 roadmap walks that checklist over our code and records, gadget by gadget, whether the leak class is closed, acknowledged as residual risk, or scheduled for reinforcement. This file is the resulting evidence piece: it is readable as a stand-alone evaluation surface and traceable down to a file:line for every claim.

Scope and out-of-scope

In scope — the masked-arithmetic surface area of ML-DSA:

  • the masked-poly type and its primitives (MaskedPoly in quantica/src/ml_dsa/masked.rs);

  • the masked algorithmic kernels reused inside sign_internal (masked NTT, masked × public, masked matrix-vector multiplication);

  • every unmask() call site inside the rejection loop of quantica/src/ml_dsa/dsa.rs::sign_internal, and every data-dependent operation consuming the resulting plaintext aggregate.

Out of scope — outside Hermelink’s target:

  • the unmasked NTT (ml_dsa::ntt), the samplers (ml_dsa::sample), and the rejection-orchestration tail (sign_internal after a successful rejection-loop exit), which operate on quantities already public per FIPS-204 or on the final signature bytes;

  • ML-KEM masked code and SLH-DSA: not the paper’s target. The same audit-annex pattern can be reused there if needed in a future tier.

Leak-class taxonomy (Hermelink 2025/276)

We use a five-class shorthand that maps the paper’s section structure to our code:

  • C1 — Recombination of secret shares. Every unmask() call produces a plaintext aggregate. Even when the upstream operations are masked, the resulting plaintext is the natural DPA / SPA target.

  • C2 — Decompose / HighBits / LowBits on unmasked aggregates. Once the plaintext is on the stack, any non-linear bit-extraction with data-dependent control flow becomes a CPA target (the resulting HighBits polynomial directly enters the signature).

  • C3 — Hint compression. MakeHint is a per-coefficient comparison whose count is bounded by ω; the count itself and the per-coefficient outcome leak the shape of w - c·s2 modulo 2·γ₂.

  • C4 — Mask refresh sufficiency. Higher-order DPA aggregates traces across rejection iterations; a missing refresh between iterations collapses the higher-order assumption.

  • C5 — Sampler-side leakage on y. ExpandMask is the canonical DPA target on SK.prf; the paper requires that y never materialises in plaintext on the stack between sampling and consumption.
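To make classes C1 and C4 concrete, the following is a minimal Rust sketch of two-share arithmetic masking on a single coefficient. It is not the MaskedPoly implementation: the Shared type, the ToyRng generator, and the free functions mask, unmask, and refresh are hypothetical names chosen for illustration, and the generator is not cryptographically secure. Only the modulus q = 8380417 is taken from FIPS 204.

```rust
/// ML-DSA modulus q = 2^23 - 2^13 + 1 (FIPS 204).
const Q: u32 = 8_380_417;

/// Toy xorshift32 generator, used only to keep the sketch self-contained.
/// NOT cryptographically secure; real masks must come from a CSPRNG.
struct ToyRng(u32);

impl ToyRng {
    fn next_mod_q(&mut self) -> u32 {
        // One xorshift32 step (the state must be non-zero).
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 17;
        self.0 ^= self.0 << 5;
        // Slightly biased reduction; acceptable for an illustration only.
        self.0 % Q
    }
}

/// Hypothetical two-share sharing of one coefficient:
/// value = (s0 + s1) mod q.
#[derive(Clone, Copy)]
struct Shared {
    s0: u32,
    s1: u32,
}

/// Split x into two shares using a fresh random mask r.
fn mask(x: u32, rng: &mut ToyRng) -> Shared {
    let r = rng.next_mod_q();
    Shared { s0: (x % Q + Q - r) % Q, s1: r }
}

/// Recombine the shares. This is the point where a plaintext
/// aggregate (class C1) comes into existence.
fn unmask(v: Shared) -> u32 {
    (v.s0 + v.s1) % Q
}

/// Re-randomise both shares without changing the shared value (class C4).
fn refresh(v: Shared, rng: &mut ToyRng) -> Shared {
    let r = rng.next_mod_q();
    Shared { s0: (v.s0 + r) % Q, s1: (v.s1 + Q - r) % Q }
}
```

In this sketch, unmask(refresh(v, rng)) always equals unmask(v): the added and subtracted mask cancel on recombination, which is why a refresh can be inserted between rejection iterations without changing any output byte.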

Per-gadget audit matrix

ML-DSA masked gadgets vs Hermelink 2025/276 leak classes

| Gadget / call site | File:line | Class | Status | Rationale and follow-up |
| --- | --- | --- | --- | --- |
| MaskedPoly::sample_expand_mask | masked.rs:158-218 | C5 | protected | DPA-safe sampling: y is produced as two arithmetic shares directly from SHAKE256, never reconstructed in plaintext. Verified by masked_expand_mask_matches_unmasked_expand_mask test. |
| MaskedPoly::mask / unmask (primitive round trip) | masked.rs:229-269 | C1 (primitive) | protected | Correctness verified by mask_unmask_roundtrip; the primitive itself is sound. Per-call-site analysis below for the use of unmask() inside the rejection loop. |
| MaskedPoly::refresh | masked.rs:284-299 | C4 (primitive) | protected | Re-randomises both shares without changing the unmasked value; verified by refresh_preserves_unmasked_value. The primitive is sound; the C4 sufficiency question (refresh per iteration) is addressed in the row below. |
| masked_ntt / masked_ntt_inv | masked.rs:314-323 | none (linear) | protected | NTT is linear over the prime field; applying it to each share preserves the additive masking invariant. Verified by masked_ntt_matches_regular_ntt. |
| masked_pointwise_mul_public | masked.rs:332-337 | none (linear × public) | protected | Multiplication by the public challenge polynomial c is linear in the secret share; first-order security preserved. Verified by masked_pointwise_mul_public_matches_unmasked. |
| masked_mat_vec_mul / masked_mat_vec_mul_lazy | masked.rs:351-395 | none (linear × public) | protected | Same linearity argument: A is public, each share is matrix-multiplied independently. The lazy variant recomputes a_hat from rho on the fly for low-memory targets and is currently not covered by a dedicated unit test (covered transitively by end-to-end KAT); minor follow-up. |
| Per-iteration mask refresh sufficiency | whole rejection loop | C4 | protected | T1-A shipped (dsa.rs head-of-loop refresh block). Every polynomial of s1_hat_m, s2_hat_m, t0_hat_m is re-randomised via MaskedPoly::refresh at the start of every rejection iteration, before any operation on the shares, exactly the Hermelink §4 prescription. Output bytes unchanged (mask cancels in unmask, KAT 9/9 byte-identical). Cost unchanged versus the previous end-of-cs/ct refresh placement (same number of refresh calls per iteration, same ScaRng-byte consumption). |
| w_m[i].unmask() → w_tmp[i] | dsa.rs:727 | C1 | residual | w = A·y is intentionally produced as a plaintext aggregate; it is the public input to HighBits whose output is emitted in the signature. T1-A shipped: the upstream y_m / s1_hat_m / s2_hat_m shares are refreshed at the start of every rejection iteration, killing cross-iteration higher-order DPA aggregation. The remaining residual is the plaintext-aggregate floor (the unmask itself); full closure would require a HighBits-on-shares gadget (Tier-3 candidate). |
| y_m[r].unmask() → y_out[r] | dsa.rs:735 | C1 | residual | y is consumed in plaintext for the time-domain z = y + c·s1 formation. y_m is resampled fresh every iteration (sample_expand_mask), so cross-iteration aggregation is already neutralised at the source; no T1-A contribution needed here. Remaining residual is the plaintext-aggregate floor (the unmask itself). |
| cs1[i] = masked_pointwise_mul_public(s1_hat_m[i], c_hat).unmask() | dsa.rs:1000 | C1 | residual | First-order safe (masked × public is linear, the unmasking happens after the multiply). T1-A shipped: s1_hat_m is refreshed at the head of every rejection iteration, so the multi-iteration higher-order DPA window is closed. Remaining residual is the plaintext-aggregate floor; closure via share-domain multiply-and-accumulate is a Tier-3 candidate. |
| cs2[i] = masked_pointwise_mul_public(s2_hat_m[i], c_hat).unmask() | dsa.rs:1045 | C1 | residual | Same as cs1. s2_hat_m covered by the same T1-A per-iteration refresh; s2 shares feed into r0 = LowBits(w - c·s2) whose threshold check leaks outcome-level; Tier-2 candidate (CT norm-on-shares). |
| ct0[i] = masked_pointwise_mul_public(t0_hat_m[i], c_hat).unmask() | dsa.rs:1128 | C1 | residual | Same as cs1/cs2. t0_hat_m covered by the same T1-A per-iteration refresh; t0 shares feed into hint generation, whose count and threshold leak outcome-level; Tier-3 candidate (share-domain MakeHint). |
| decompose::high_bits_vec(&w_tmp, …) | dsa.rs:816 | C2 | residual | HighBits is on the unmasked w; the output is part of the public signature footprint, so the leak is on y only via the upstream chain. Hardening would require a HighBits-on-shares gadget. Tier-3 candidate (not on the current roadmap). |
| decompose::low_bits(wbuf[j], γ₂) (per-coef) | dsa.rs:935 | C2 | residual | Inner-loop low-bits extraction inside Phase 2; data-dependent on unmasked w - cs2. Same Tier-3 candidate as above. |
| decompose::low_bits_vec(&w_minus_cs2, …) | dsa.rs:1094 | C2 | residual | Vector variant feeding r0 for the norm check; same class. |
| decompose::make_hint(mod_q(-tmp[j]), wbuf[j] + tmp[j], γ₂) | dsa.rs:961 | C3 | residual | Per-coefficient hint generation; observation of the hint bit pattern is the Hermelink §3 leak. Tier-3 candidate (share-domain MakeHint). |
| decompose::make_hint_vec(&neg_ct0, &w_cs2_ct0, γ₂, k) | dsa.rs:1179 | C3 | residual | Vector variant that returns (h, num_ones); num_ones > ω drives a rejection branch whose timing is closed by sca-ct-rejection already, but the C3 information leak on the count itself is not. Tier-3 candidate. |
| check_norm_vec(&z, γ₁ − β, l) | dsa.rs:895, 1100, 1111 | C1/C2 | partial | The classic infinity-norm check on the unmasked z aggregate. The early-abort timing leak is closed by sca-ct-rejection (every iteration computes every intermediate, a single branch-free decision is taken). The information-theoretic leak (the outcome of the check across many signatures) is the residual C1/C2 contribution. Tier-2 candidate (CT norm-check on shares). |
| check_norm_vec(&r0, γ₂ − β, k) | dsa.rs:1104, 1112 | C1/C2 | partial | Same as z norm check. r0 = LowBits(w − c·s2) is data-dependent on both unmasked aggregates; sca-ct-rejection closes the timing leak; the outcome-leak is the residual. |
| check_norm_vec(&ct0, γ₂, k) | dsa.rs:1146, 1153 | C1/C2 | partial | Same posture. Used to decide whether ct0 is safe to commit to hint generation. |
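The linearity argument behind the "linear × public" rows can be checked on a small example. The sketch below is an illustration under stated assumptions, not the code in masked.rs: MaskedVec, mul_share, mul_public, and unmask are hypothetical names, and the vectors stand in for NTT-domain coefficients. It multiplies each share by a public vector independently and recombines only afterwards.

```rust
/// ML-DSA modulus q (FIPS 204).
const Q: u64 = 8_380_417;

/// Hypothetical two-share masked vector of NTT-domain coefficients:
/// coefficient i equals (s0[i] + s1[i]) mod q.
struct MaskedVec {
    s0: Vec<u64>,
    s1: Vec<u64>,
}

/// Pointwise product of ONE share with the public vector c, mod q.
fn mul_share(share: &[u64], c: &[u64]) -> Vec<u64> {
    share
        .iter()
        .zip(c.iter())
        .map(|(&a, &b)| (a % Q) * (b % Q) % Q)
        .collect()
}

/// Masked × public: each share is processed on its own, so the two
/// shares never meet inside this function.
fn mul_public(m: &MaskedVec, c: &[u64]) -> MaskedVec {
    MaskedVec {
        s0: mul_share(&m.s0, c),
        s1: mul_share(&m.s1, c),
    }
}

/// Recombine the shares coefficient by coefficient.
fn unmask(m: &MaskedVec) -> Vec<u64> {
    m.s0
        .iter()
        .zip(m.s1.iter())
        .map(|(&a, &b)| (a + b) % Q)
        .collect()
}
```

Because (s0 + s1)·c = s0·c + s1·c (mod q), unmasking after the multiply gives the same result as multiplying the plaintext value. The same identity is what lets the NTT and the matrix-vector product operate share by share.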

Status legend

  • protected — the leak class is closed for this gadget under the threat model described in the Threat model section. Either the operation is linear (so it preserves the additive sharing invariant), or it operates on values that are already public per the FIPS-204 specification.

  • residual — the leak class is known and acknowledged. The unmasked aggregate is short-lived (immediately re-masked, refreshed, or zeroized through MaskedPoly::zeroize / silentops::ct_zeroize after use). The practical DPA-cost-to-recover-key remains high but is not eliminated. A follow-up Roadmap item that would close the class is named in the matrix row.

  • partial — a complementary defence is already in place (e.g. sca-ct-rejection removes the early-abort timing leak on norm checks) but the information-theoretic class is not fully closed.
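As an illustration of what the partial status covers, here is a minimal sketch of an infinity-norm check written without an early abort. It is an assumption about the general technique, not the sca-ct-rejection code or the silentops primitives: norm_exceeds is a hypothetical name, and coefficients are assumed to be reduced into [0, q). Branch-free source does not by itself guarantee branch-free machine code, so a real implementation still needs the kind of timing verification the project performs separately.

```rust
/// ML-DSA modulus q (FIPS 204).
const Q: i32 = 8_380_417;

/// Returns 1 if any coefficient, once centered around zero, has
/// absolute value >= bound, and 0 otherwise.
/// Every coefficient is visited; there is no early return.
fn norm_exceeds(coeffs: &[i32], bound: i32) -> u32 {
    let mut violated: u32 = 0;
    for &c in coeffs {
        // All-ones mask (-1) when c > (q - 1) / 2, else 0.
        let over_half = ((Q - 1) / 2 - c) >> 31;
        // Centered representative in [-(q - 1) / 2, (q - 1) / 2].
        let centered = c - (Q & over_half);
        // Branch-free absolute value via the sign mask.
        let sign = centered >> 31;
        let abs = (centered ^ sign) - sign;
        // (bound - 1 - abs) is negative exactly when abs >= bound.
        violated |= ((bound - 1 - abs) >> 31) as u32 & 1;
    }
    violated
}
```

This removes the timing signal of stopping at the first offending coefficient, but the function still takes plaintext coefficients and still returns the accept/reject outcome. That remaining outcome-level information is what the partial rows record.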

Summary work-list

The audit originally surfaced three categories of follow-up. T1-A has since shipped (see the updated rows above); the other two remain open candidates:

  1. ✅ T1-A — per-iteration mask refresh — shipped. s1_hat_m / s2_hat_m / t0_hat_m are refreshed at the head of every rejection iteration via the dsa.rs head-of-loop refresh block (Hermelink §4 prescription matched exactly). Closes the C4 sufficiency gap; reduces every C1 residual on the unmask call sites to the plaintext-aggregate floor.

  2. Tier-2 candidate — CT norm check on shares for check_norm_vec(&z, …), check_norm_vec(&r0, …), check_norm_vec(&ct0, …). Would close the C1/C2 partial rows by moving the norm comparison into the share domain via the silentops CT primitives. Cost: a redesign of check_norm_vec to take MaskedPoly instead of plaintext. Tracked as a future ticket, not yet on the Roadmap.

  3. Tier-3 candidate — share-domain Decompose / MakeHint for the five Decompose / MakeHint call sites listed above. Would close the C2 / C3 rows. Larger redesign (full share-domain HighBits/LowBits/MakeHint kernels). Tracked as a future ticket, not yet on the Roadmap.

The above three categories — one shipped, two future candidates — cover every residual and partial row in the matrix. No row is unaccounted for, and no current protected row degrades under any of the follow-ups.
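For reference, the plaintext operations that a Tier-3 share-domain kernel would have to replace can be sketched as follows. This follows the Decompose, HighBits, LowBits, and MakeHint definitions of FIPS 204 on single coefficients; it is not the decompose module of quantica, the function signatures are assumptions made for illustration, and inputs are assumed to lie in (-q, q). The data-dependent branches are deliberate: they are the behaviour the C2 and C3 rows describe.

```rust
/// ML-DSA modulus q (FIPS 204).
const Q: i32 = 8_380_417;

/// Decompose r into (r1, r0) with r = r1 * 2*gamma2 + r0 (mod q)
/// and r0 in [-gamma2, gamma2], including the q - 1 corner case.
fn decompose(r: i32, gamma2: i32) -> (i32, i32) {
    let alpha = 2 * gamma2;
    let rp = r.rem_euclid(Q); // representative in [0, q)
    let mut r0 = rp % alpha; // in [0, alpha)
    if r0 > gamma2 {
        r0 -= alpha; // centered remainder in (-gamma2, gamma2]
    }
    if rp - r0 == Q - 1 {
        // Corner case: fold the top bucket back to r1 = 0.
        (0, r0 - 1)
    } else {
        ((rp - r0) / alpha, r0)
    }
}

fn high_bits(r: i32, gamma2: i32) -> i32 {
    decompose(r, gamma2).0
}

fn low_bits(r: i32, gamma2: i32) -> i32 {
    decompose(r, gamma2).1
}

/// Hint bit: true when adding z to r changes the high bits of r.
fn make_hint(z: i32, r: i32, gamma2: i32) -> bool {
    high_bits(r, gamma2) != high_bits(r + z, gamma2)
}
```

With gamma2 = (q - 1) / 32 = 261888 (the ML-DSA-65 and ML-DSA-87 value), decompose(q - 1, gamma2) returns (0, -1), which exercises the corner case. A share-domain version would need to produce the same outputs without ever holding r in plaintext.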

References

  • [HNP25] — the audit reference itself.

  • [DFM+25] — concealed-ILWE attack on partially-masked Dilithium; motivates the same hardening vector on the masked_mat_vec_mul gadgets.

  • [ZCQ+26] — rejection-loop attack on the unmasked / hedged path; motivates the sca-ct-rejection posture cross-referenced in the partial rows above.

  • ML-DSA — countermeasures — the per-threat coverage matrix and the T1-A roadmap entry the audit cross-references.


© Copyright 2026, cslashm.