krypteia-quantica — Post-Quantum Cryptography for the krypteia workspace

Pure-Rust implementations of the three NIST post-quantum standards, sharing a side-channel countermeasure toolkit (silentops, a companion crate of the same workspace) used by the classical side as well. Specifications (FIPS 203 / 204 / 205 PDFs) are vendored alongside the crate in the repository.

Design rules

The crate inherits the krypteia workspace design rules:

  1. Pure Rust, zero external crates — only core (and alloc); std is optional behind a feature flag.

  2. Embedded-friendly — small RAM footprint, fits secure elements, STM32 (Cortex-M0/M4/M33), RISC-V parts (ESP32-C3, …).

  3. Side-channel hardened against SPA, DPA, DFA, template attacks, timing attacks. CT primitives come from silentops, with architecture-specific assembly backends.

  4. Validated against the official NIST ACVP test vectors.

  5. C FFI-exposable through the quantica_ffi companion crate.

Algorithms

| Standard | Algorithm | Type | Status |
|---|---|---|---|
| FIPS 203 | ML-KEM (ex-CRYSTALS-Kyber) | Key Encapsulation Mechanism | Implemented, ACVP + Wycheproof validated |
| FIPS 204 | ML-DSA (ex-CRYSTALS-Dilithium) | Digital Signature | Implemented, ACVP + Wycheproof validated |
| FIPS 205 | SLH-DSA (ex-SPHINCS+) | Stateless Hash-Based Signature | Implemented, ACVP validated (Wycheproof has no SLH-DSA corpus yet) |

ML-KEM (FIPS 203)

Module-lattice-based Key Encapsulation Mechanism. Derived from CRYSTALS-Kyber. A keygen / encaps / decaps KEM producing a 32-byte shared secret; decapsulation uses the Fujisaki–Okamoto transform with implicit rejection so a malformed ciphertext yields a deterministic secret indistinguishable from a legitimate one.

ML-DSA (FIPS 204)

Module-lattice-based Digital Signature Algorithm. Derived from CRYSTALS-Dilithium. A Fiat–Shamir-with-aborts signature scheme; signing is a hedged rejection loop that mixes fresh randomness with the secret key. Verification is deterministic and does not touch any secret.

SLH-DSA (FIPS 205)

Stateless Hash-Based Digital Signature Algorithm. Derived from SPHINCS+. Security rests only on properties of the underlying hash function (SHAKE / SHA-2), chiefly second-preimage resistance, with no algebraic assumption. Signatures are large (7 856 to 49 856 bytes depending on parameter set), but the underlying primitive is conservative and quantum-safe.

Cargo features

[dependencies]
quantica = { path = "../quantica" }   # default = std + 3 algos + sca-protected

| Feature | Default | Effect |
|---|---|---|
| std | yes | Pulls in the Rust standard library. Enables OsRng and std::error::Error impls. |
| ml-kem | yes | Compiles the FIPS 203 module (quantica::ml_kem). |
| ml-dsa | yes | Compiles the FIPS 204 module (quantica::ml_dsa). |
| slh-dsa | yes | Compiles the FIPS 205 module (quantica::slh_dsa). |
| sca-protected | yes | Activates the masking + shuffled-NTT defences in ML-KEM and ML-DSA. |

Disabling std makes the crate no_std (still requires alloc). In that mode the OS-backed OsRng disappears — the caller must provide their own CryptoRng impl wrapping a hardware RNG.

Quick start

ML-KEM (FIPS 203) — Key Encapsulation

use quantica::ml_kem::*;

let mut rng = OsRng;

// Key generation. ek is a public EncapsulationKey<MlKem768>;
// dk is a DecapsulationKey<MlKem768> that auto-zeroizes on Drop.
let (ek, dk) = MlKem::<MlKem768>::keygen(&mut rng).unwrap();

// Encapsulation (Bob): produces a 32-byte SharedSecret + a Ciphertext.
let (shared_secret_bob, ciphertext) =
    MlKem::<MlKem768>::encaps(&ek, &mut rng).unwrap();

// Decapsulation (Alice): recovers the same SharedSecret.
let shared_secret_alice =
    MlKem::<MlKem768>::decaps(&dk, &ciphertext, &mut rng).unwrap();

assert_eq!(shared_secret_alice, shared_secret_bob);
// Both shared secrets wipe themselves at end of scope.

ML-DSA (FIPS 204) — Digital Signature

use quantica::ml_dsa::*;

let mut rng = OsRng;

// VerifyingKey<MlDsa65> + zeroizing SigningKey<MlDsa65>.
let (pk, sk) = MlDsa::<MlDsa65>::keygen(&mut rng).unwrap();

// Sign — uses hedged signing (mixes fresh RNG bytes with the secret key).
let sig: Signature<MlDsa65> =
    MlDsa::<MlDsa65>::sign(&sk, b"message", b"", &mut rng).unwrap();

let valid = MlDsa::<MlDsa65>::verify(&pk, b"message", b"", &sig).unwrap();
assert!(valid);

SLH-DSA (FIPS 205) — Stateless Hash-Based Signature

use quantica::slh_dsa::*;

let mut rng = OsRng;

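// Key generation. Note the tuple order: the signing key comes first here.
let (sk, pk) = SlhDsa::<Shake128f>::keygen(&mut rng).unwrap();

// Sign: the message is the first argument, followed by the signing key.
let sig = SlhDsa::<Shake128f>::sign(b"message", &sk, &mut rng).unwrap();

// Verify: needs only the message, the signature and the public key.
let valid = SlhDsa::<Shake128f>::verify(b"message", &sig, &pk).unwrap();
assert!(valid);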

Typed key wrappers (Zeroize-on-Drop)

The public API never returns raw Vec<u8> for secret material. Each algorithm exposes parameter-set-tagged wrapper types backed by the shared quantica::secret module:

| Module | Public (not zeroized) | Secret (Drop-zeroizes via silentops::ct_zeroize) |
|---|---|---|
| ml_kem | EncapsulationKey<P>, Ciphertext<P> | DecapsulationKey<P>, SharedSecret |
| ml_dsa | VerifyingKey<P>, Signature<P> | SigningKey<P> |
| slh_dsa | VerifyingKey<P>, Signature<P> | SigningKey<P> |

All wrappers implement from_bytes(&[u8]) (length-validated against the parameter set), as_bytes() -> &[u8], Deref<Target=[u8]>, AsRef<[u8]>, and a manual Clone. The secret variants additionally have a redacted Debug impl that prints <redacted; len=N> so a stray eprintln! cannot leak key material into a log file.
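The wrapper behaviour described above can be illustrated with a minimal, standalone sketch. `SecretSketch` is a hypothetical type written for this example; it is not the crate's `SecretBytes` / `SecretArray`, and it uses a plain volatile-write loop where the crate is said to call `silentops::ct_zeroize`.

```rust
use core::fmt;
use core::ptr;
use core::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical 32-byte secret wrapper (illustration only).
pub struct SecretSketch {
    bytes: [u8; 32],
}

impl SecretSketch {
    /// Length-validated constructor: rejects anything that is not 32 bytes.
    pub fn from_bytes(input: &[u8]) -> Option<Self> {
        if input.len() != 32 {
            return None;
        }
        let mut bytes = [0u8; 32];
        bytes.copy_from_slice(input);
        Some(Self { bytes })
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}

/// Redacted Debug: prints the length, never the contents.
impl fmt::Debug for SecretSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "<redacted; len={}>", self.bytes.len())
    }
}

impl Drop for SecretSketch {
    fn drop(&mut self) {
        for b in self.bytes.iter_mut() {
            // Volatile store, so the compiler may not drop it as a dead write.
            unsafe { ptr::write_volatile(b, 0) };
        }
        // Keep the compiler from reordering later code before the wipe.
        compiler_fence(Ordering::SeqCst);
    }
}
```

With this sketch, `format!("{:?}", secret)` yields only the redacted placeholder, and a wrong-length input is rejected before any copy takes place.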

The internal byte-slice API (quantica::ml_kem::kem::*, quantica::ml_dsa::dsa::*, quantica::slh_dsa::slh::*) is still exposed for ACVP/CAVP testing and for the C FFI, which prefers raw Vec<u8> to keep the FFI boundary thin.

Parameter sets

ML-KEM (FIPS 203)

| Parameter set | Security | ek (B) | dk (B) | ct (B) | ss (B) |
|---|---|---|---|---|---|
| ML-KEM-512 | Cat. 1 | 800 | 1632 | 768 | 32 |
| ML-KEM-768 | Cat. 3 | 1184 | 2400 | 1088 | 32 |
| ML-KEM-1024 | Cat. 5 | 1568 | 3168 | 1568 | 32 |
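The ML-KEM sizes follow from the FIPS 203 parameters k, d_u and d_v. The sketch below recomputes them; the struct and function names are hypothetical and chosen for this example, not taken from the crate's params.rs.

```rust
/// FIPS 203 parameters that determine the encoded sizes.
struct KemParams {
    k: usize,  // module rank
    du: usize, // compression bits for the vector u
    dv: usize, // compression bits for the polynomial v
}

const ML_KEM_512: KemParams = KemParams { k: 2, du: 10, dv: 4 };
const ML_KEM_768: KemParams = KemParams { k: 3, du: 10, dv: 4 };
const ML_KEM_1024: KemParams = KemParams { k: 4, du: 11, dv: 5 };

/// Encapsulation key: k polynomials at 12 bits per coefficient, plus the 32-byte seed rho.
fn ek_len(p: &KemParams) -> usize {
    384 * p.k + 32
}

/// Decapsulation key: secret vector (384k) + ek (384k + 32) + H(ek) (32) + z (32).
fn dk_len(p: &KemParams) -> usize {
    768 * p.k + 96
}

/// Ciphertext: k polynomials at du bits plus one polynomial at dv bits.
fn ct_len(p: &KemParams) -> usize {
    32 * (p.du * p.k + p.dv)
}
```

For ML-KEM-768, for instance, this gives 384·3 + 32 = 1184 bytes for ek and 32·(10·3 + 4) = 1088 bytes for the ciphertext.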

ML-DSA (FIPS 204)

| Parameter set | Security | pk (B) | sk (B) | sig (B) |
|---|---|---|---|---|
| ML-DSA-44 | Cat. 2 | 1312 | 2560 | 2420 |
| ML-DSA-65 | Cat. 3 | 1952 | 4032 | 3309 |
| ML-DSA-87 | Cat. 5 | 2592 | 4896 | 4627 |

SLH-DSA (FIPS 205) — SHAKE variants only

| Parameter set | Security | n | pk (B) | sk (B) | sig (B) |
|---|---|---|---|---|---|
| SLH-DSA-SHAKE-128s | Cat. 1 | 16 | 32 | 64 | 7 856 |
| SLH-DSA-SHAKE-128f | Cat. 1 | 16 | 32 | 64 | 17 088 |
| SLH-DSA-SHAKE-192s | Cat. 3 | 24 | 48 | 96 | 16 224 |
| SLH-DSA-SHAKE-192f | Cat. 3 | 24 | 48 | 96 | 35 664 |
| SLH-DSA-SHAKE-256s | Cat. 5 | 32 | 64 | 128 | 29 792 |
| SLH-DSA-SHAKE-256f | Cat. 5 | 32 | 64 | 128 | 49 856 |

The s variants optimize for small signatures, the f variants for fast signing (at the cost of larger signatures and slower verification). SHA2-based parameter sets are not yet implemented (see “Known limitations” below).
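The SLH-DSA sizes can be recomputed from the FIPS 205 parameters (n, h, d, a, k). The sketch below is a worked check of the table; the struct and function names are hypothetical and not the crate's API.

```rust
/// FIPS 205 parameters that determine key and signature sizes.
struct SlhParams {
    n: usize, // hash output length in bytes
    h: usize, // total hypertree height
    d: usize, // number of hypertree layers
    a: usize, // height of each FORS tree
    k: usize, // number of FORS trees
}

const SHAKE_128S: SlhParams = SlhParams { n: 16, h: 63, d: 7, a: 12, k: 14 };
const SHAKE_128F: SlhParams = SlhParams { n: 16, h: 66, d: 22, a: 6, k: 33 };
const SHAKE_192S: SlhParams = SlhParams { n: 24, h: 63, d: 7, a: 14, k: 17 };
const SHAKE_192F: SlhParams = SlhParams { n: 24, h: 66, d: 22, a: 8, k: 33 };
const SHAKE_256S: SlhParams = SlhParams { n: 32, h: 64, d: 8, a: 14, k: 22 };
const SHAKE_256F: SlhParams = SlhParams { n: 32, h: 68, d: 17, a: 9, k: 35 };

/// WOTS+ chain count for lg_w = 4: 2n message chains plus 3 checksum chains.
fn wots_len(p: &SlhParams) -> usize {
    2 * p.n + 3
}

/// Public key: PK.seed and PK.root.
fn pk_len(p: &SlhParams) -> usize {
    2 * p.n
}

/// Secret key: SK.seed, SK.prf, PK.seed and PK.root.
fn sk_len(p: &SlhParams) -> usize {
    4 * p.n
}

/// Signature: randomizer R, the FORS signature, and d XMSS signatures
/// (d WOTS+ signatures plus h authentication-path nodes in total).
fn sig_len(p: &SlhParams) -> usize {
    (1 + p.k * (1 + p.a) + p.h + p.d * wots_len(p)) * p.n
}
```

The f sets use many more hypertree layers (d = 22 against d = 7 at the 128-bit level), which shrinks each tree and speeds up signing but adds one WOTS+ signature per layer to the output.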

Design decisions

  • Zero dependencies — only core + alloc (and optionally std). SHA-3 / SHAKE are implemented from scratch on top of a single shared Keccak-f[1600] core in src/sha3.rs; each algorithm exposes its own thin wrapper.

  • Generic over parameter sets: MlKem<P>, MlDsa<P>, SlhDsa<P> are monomorphized at compile time via const generics, so a single code path serves all security levels.

  • Internal byte-slice API stays raw: keygen_internal, encaps_internal, sign_internal, verify_internal accept and return raw &[u8] / Vec<u8>. The KAT tests and the C FFI use this layer; only the high-level MlKem<P>::keygen etc. wrap into the typed key types.

  • Arithmetic widths — i16 for ML-KEM (q = 3329 fits in 12 bits), i32 for ML-DSA (q = 8 380 417 needs 23 bits), no NTT at all for SLH-DSA.

  • NTT differences — ML-KEM uses BitRev_7 with a partial NTT (down to length-2, base-case multiply); ML-DSA uses BitRev_8 with a full NTT (down to length-1, simple pointwise multiply).

  • SLH-DSA architecture — WOTS+ → XMSS → Hypertree → FORS → SLH-DSA. Purely hash-based, no algebraic structures.
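The arithmetic-width and NTT-depth choices above follow from the two moduli. The sketch below checks them numerically; all names are hypothetical and written for this example. The roots of unity 17 and 1753 are the ones fixed by FIPS 203 and FIPS 204.

```rust
const Q_KEM: u64 = 3329;
const Q_DSA: u64 = 8_380_417;

/// Number of bits needed to represent `x`.
fn bit_len(x: u64) -> u32 {
    64 - x.leading_zeros()
}

/// Reverse the low `bits` bits of `x` (BitRev_7 for ML-KEM, BitRev_8 for ML-DSA).
/// `bits` must be in 1..=32.
fn bit_rev(x: u32, bits: u32) -> u32 {
    x.reverse_bits() >> (32 - bits)
}

/// Square-and-multiply modular exponentiation.
fn pow_mod(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut acc = 1u64;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    acc
}

/// A primitive `order`-th root of unity exists mod prime q iff order divides q - 1.
fn has_root_of_unity(q: u64, order: u64) -> bool {
    (q - 1) % order == 0
}
```

Because 512 does not divide 3329 − 1 = 2⁸ · 13, ML-KEM only has 256th roots of unity and its NTT stops at degree-2 factors; 8 380 417 − 1 = 2¹³ · 1023 admits a 512th root, so the ML-DSA NTT runs all the way down to length 1.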

Side-channel countermeasures (summary)

Always-on

These defences are active in every build, regardless of feature flags:

| Countermeasure | Algorithm | Threat addressed | How |
|---|---|---|---|
| Constant-time arithmetic | ML-KEM, ML-DSA, SLH-DSA | Timing / cache-timing / basic SPA | Branchless mod_q, ct_eq, ct_select from silentops |
| Zeroize-on-Drop wrappers | ML-KEM, ML-DSA, SLH-DSA | Cold boot, memory dumps, UAF | SecretBytes / SecretArray call silentops::ct_zeroize on Drop |
| Volatile zeroization | ML-KEM, ML-DSA, SLH-DSA | Cold boot, memory dumps | core::ptr::write_volatile + compiler_fence on intermediates |
| Double Decaps | ML-KEM | DFA on FO comparison | Decaps runs twice; results compared; mismatch ⇒ random output |
| dk integrity check | ML-KEM | DFA on stored key material | H(ek) is embedded in dk and re-checked at every Decaps |
| Hedged signing | ML-DSA, SLH-DSA | Fault-induced nonce reuse | 32 bytes of fresh entropy mixed into the per-signature derivation |
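The branchless compare-and-select pattern named in the table can be sketched at source level as follows. These functions are hypothetical illustrations, not the silentops API, and like any bit-hack written in Rust they are only best-effort constant-time without an assembly backend.

```rust
/// Returns 0xFF when the slices are equal and 0x00 otherwise, without
/// branching on the data. Slice lengths are treated as public.
fn ct_eq_mask(a: &[u8], b: &[u8]) -> u8 {
    assert_eq!(a.len(), b.len());
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate every differing bit
    }
    // Negating in 16 bits sets the high byte exactly when diff != 0.
    let nonzero = ((diff as u16).wrapping_neg() >> 8) as u8 & 1;
    nonzero.wrapping_sub(1) // 0 -> 0xFF, 1 -> 0x00
}

/// Writes `a` into `out` when mask == 0xFF and `b` when mask == 0x00.
fn ct_select_bytes(mask: u8, a: &[u8], b: &[u8], out: &mut [u8]) {
    for i in 0..out.len() {
        out[i] = (a[i] & mask) | (b[i] & !mask);
    }
}
```

An implicit-rejection or double-decaps check would feed the mask from the comparison into the select, so that the accepted and fallback outputs are produced by the same instruction sequence.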

Feature-gated (sca-protected, on by default)

| Countermeasure | Algorithm | Threat addressed | Module |
|---|---|---|---|
| First-order additive masking | ML-KEM | First-order DPA, template attacks | ml_kem::masked |
| NTT butterfly shuffling | ML-KEM | SPA, trace alignment for DPA | ml_kem::shuffle |
| First-order additive masking | ML-DSA | First-order DPA, template attacks | ml_dsa::masked |
| Shuffled NTT (secret poly) | ML-DSA | SPA, trace alignment for DPA | ml_dsa::shuffle |
| Mask refresh between rounds | ML-DSA | Higher-order share correlation | MaskedPoly::refresh() between rejection iterations |

The masking layer is mathematically transparent — the masked path produces bit-identical keys, ciphertexts, and signatures to the unmasked path, which is why the NIST ACVP vectors keep matching with sca-protected enabled. Internally:

  • ML-KEM: secret polynomials s, e, etc. are split into two additive shares mod q = 3329 immediately after CBD sampling. NTTs run on each share independently (linearity of the NTT), pointwise multiplications by public matrices distribute over the shares.

  • ML-DSA: in dsa::sign_internal, the secret-key vectors s1, s2, t0 are NTT-transformed via shuffle::ntt_shuffled then split into MaskedPoly arrays. Each per-rejection-iteration multiplication ĉ · ŝx runs through masked_pointwise_mul_public, followed by MaskedPoly::refresh() to prevent inter-iteration share correlation. Mask randomness is drawn from a SHAKE256-seeded deterministic ScaRng (seed = K rnd tr M'), so sign_internal keeps a deterministic signature and the ACVP fixed-rnd vectors still match.

  • SLH-DSA: hash-based, no algebraic structure to mask — first-order masking does not buy anything here. The always-on defences (CT arithmetic, zeroization, hedged signing) are the relevant layer.
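The claim that masking is mathematically transparent can be checked on a single coefficient. The sketch below uses two additive shares mod q = 3329; the type and function names are hypothetical, the randomness is passed in by the caller, and `rem_euclid` is used for readability even though a hardened implementation would use a branchless reduction.

```rust
const Q: i32 = 3329;

/// Reduction into [0, q). Not constant-time; for illustration only.
fn mod_q(x: i32) -> i32 {
    x.rem_euclid(Q)
}

/// Two additive shares of one coefficient: secret = a + b (mod q).
struct Shares {
    a: i32,
    b: i32,
}

/// Split `secret` using the caller-supplied random value `r`.
fn mask(secret: i32, r: i32) -> Shares {
    Shares { a: mod_q(secret - r), b: mod_q(r) }
}

/// Recombine the shares.
fn unmask(s: &Shares) -> i32 {
    mod_q(s.a + s.b)
}

/// Multiply by a public constant: a linear map, so it applies share by share.
fn mul_public(s: &Shares, c: i32) -> Shares {
    Shares { a: mod_q(s.a * c), b: mod_q(s.b * c) }
}

/// Re-randomize the sharing without changing the shared value.
fn refresh(s: &Shares, r: i32) -> Shares {
    Shares { a: mod_q(s.a + r), b: mod_q(s.b - r) }
}
```

The same argument extends to the NTT, which is also linear: transforming each share separately and recombining gives the transform of the secret, which is why the masked path can reproduce the unmasked outputs bit for bit.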

Approximate cost (single-threaded, release mode)

| Operation | Plain | sca-protected | Slowdown |
|---|---|---|---|
| ML-KEM-768 Decaps | ~0.03 ms | ~0.07 ms (double) | ~2.3× |
| ML-DSA-65 Sign | ~2.2 ms | ~7.1 ms | ~3.2× |

Numbers vary widely with hardware. Run the quantica_bench companion crate for measurements on your machine.

Timing leakage verification (dudect)

The shared silentops::verify module implements the dudect methodology of Reparaz, Balasch and Verbauwhede (2017). A pre-built harness exercises the most sensitive paths:

cargo run --release -p silentops --features std --example ct_verify_pqc

Currently checks:

  • ML-KEM-768 Decaps — valid vs random ciphertext (implicit-rejection timing)

  • ML-KEM Barrett reduce — small vs large input

  • ML-DSA-44 Sign — message A vs message B (message-independent timing)

  • ML-DSA-44 Verify — valid vs invalid signature

A t-statistic with |t| < 4.5 after ~10⁶ samples is considered passing (p < 10⁻⁵). Note that ML-DSA Sign uses rejection sampling, so its timing inherently varies — a FAIL there is not necessarily a vulnerability if the variation is independent of the secret key.
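The pass criterion rests on Welch's t-test between two classes of timing samples. The sketch below computes it in batch form; it is not the silentops::verify code (dudect implementations typically use an online, cropped variant), and the function names are hypothetical.

```rust
/// Sample mean and unbiased sample variance.
fn mean_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t-statistic for two independent sample sets
/// (each needs at least two samples and non-zero combined variance).
fn welch_t(a: &[f64], b: &[f64]) -> f64 {
    let (mean_a, var_a) = mean_var(a);
    let (mean_b, var_b) = mean_var(b);
    (mean_a - mean_b) / (var_a / a.len() as f64 + var_b / b.len() as f64).sqrt()
}

/// Threshold quoted in the text above.
const T_THRESHOLD: f64 = 4.5;

/// True when no timing difference between the two classes is detected.
fn passes(a: &[f64], b: &[f64]) -> bool {
    welch_t(a, b).abs() < T_THRESHOLD
}
```

Two classes with the same timing distribution give a statistic near zero; a systematic offset between the classes pushes |t| far above the threshold as the sample count grows.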

Known residual surface

The following attack surfaces are not currently defended against and are documented here so the reader knows what they are deploying. They are tracked in the side-channel annex and in the tier-4 hardening roadmap.

  • Masked Keccak / SHAKE — the hash primitive feeding the PRF in ML-KEM / ML-DSA / SLH-DSA is not masked; a DPA attacker with trace access can mount Kannwischer-style attacks on SK.seed. A 3-share SHAKE variant is planned (see tier-4 item T4-K).

  • Grafting-tree fault attacks on SLH-DSA: the FORS recompute-and-compare redundancy check (T1-C) ships behind the sca-fors-redundancy feature; a build without it performs no post-sign redundancy check, so a single-fault attacker (physical or Rowhammer-class) can coerce a forgery.

  • Heap allocations on the secret path — secret-key buffers come from alloc rather than caller-provided fixed buffers. A future refactor will thread &mut [u8] end-to-end for bare-metal stack-only operation.

  • Higher-order DPA on ML-DSA: the shares of s1, s2, t0 are refreshed at the start of every rejection iteration (T1-A), but the masking itself is only first-order; an adversary who combines the leakage of both shares remains in scope. Higher-order masking is deferred as tier-4 T4-C.

  • Pointer-level CMOV by the compiler — the Rust bit-hack CT primitives are defended by the silentops asm backend on x86_64 and ARM; on targets without an asm backend (e.g. WebAssembly), the CT guarantee is best-effort source-level only.

Per-algorithm deep dives

The summary above lists which countermeasures are active; the full per-algorithm SCA analyses (threat matrices, attack references, code pointers, residual risks) live under quantica/doc/sca/countermeasures/ in the repository. The Sphinx documentation pack (./gendoc.sh quantica) inlines them as a navigable, cross-linked tree.

Performance

Run the workspace bench tool:

cargo run --release -p quantica_bench

Representative single-threaded numbers (no SIMD, no NEON, sca-protected on):

| Algorithm | KeyGen | Sign / Encaps | Verify / Decaps |
|---|---|---|---|
| ML-KEM-768 | ~0.03 ms | ~0.04 ms | ~0.07 ms |
| ML-DSA-65 | ~0.10 ms | ~7.1 ms | ~0.12 ms |
| SLH-DSA-SHAKE-128f | ~2 ms | ~40 ms | ~2 ms |

Notes:

  • ML-KEM uses full Montgomery NTT arithmetic (shifts instead of divisions).

  • ML-DSA Sign times vary because of rejection sampling.

  • SLH-DSA is dominated by SHAKE evaluations; release mode is essential (debug mode is ~100× slower).
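Montgomery reduction replaces the division by q with a multiplication and a 16-bit shift. The sketch below follows the form used by the widely known Kyber reference code; it is an assumption that the crate's ntt.rs looks similar, and the function names are chosen for this example.

```rust
const Q: i32 = 3329;
/// q^-1 mod 2^16 (62209), written as a signed 16-bit value.
const QINV: i16 = -3327;

/// For |a| < q * 2^15, returns r with r ≡ a * 2^-16 (mod q) and |r| < q.
fn montgomery_reduce(a: i32) -> i16 {
    // t ≡ a * q^-1 (mod 2^16), so a - t*q is an exact multiple of 2^16.
    let t = (a as i16).wrapping_mul(QINV);
    ((a - (t as i32) * Q) >> 16) as i16
}

/// Multiply two coefficients; the result carries an extra factor 2^-16,
/// which an NTT absorbs by storing twiddle factors pre-multiplied by 2^16.
fn fqmul(x: i16, y: i16) -> i16 {
    montgomery_reduce(x as i32 * y as i32)
}
```

No `%` or `/` appears on the data path, which is what keeps the operation both fast and free of variable-time division instructions.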

Building

Desktop / server (default)

# Build everything (opt-level=2, CT-safe, all algos + sca-protected on)
cargo build --release -p quantica

# Build with no SCA countermeasures (faster, dudect baseline)
cargo build --release -p quantica \
    --no-default-features --features std,ml-kem,ml-dsa,slh-dsa

# Run all tests (ACVP vectors, secret-module, masked/shuffle round-trips)
cargo test --release -p quantica

# Generate the rustdoc API reference
cargo doc -p quantica --no-deps --open

no_std / bare-metal cross-compile

# Install the targets we care about
rustup target add thumbv7em-none-eabihf       # Cortex-M4/M7
rustup target add thumbv6m-none-eabi          # Cortex-M0/M0+
rustup target add thumbv8m.main-none-eabihf   # Cortex-M33 (TrustZone)
rustup target add riscv32imc-unknown-none-elf # ESP32-C3, SiFive

# Cross-compile no_std + all 3 algos + sca-protected
cargo build -p quantica \
    --no-default-features \
    --features ml-kem,ml-dsa,slh-dsa,sca-protected \
    --target thumbv7em-none-eabihf

In no_std mode the crate still depends on alloc (keys, ciphertexts and signatures are Vec<u8>-backed). The OS-backed OsRng is unavailable — provide your own CryptoRng implementation that delegates to a hardware TRNG.
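A hardware-backed RNG adapter for no_std targets might look like the sketch below. The trait shown is a hypothetical stand-in: the real CryptoRng trait lives in the crate's rng.rs and its exact method signatures are not reproduced here. The word source is modelled as a closure so that a register read from a TRNG peripheral can be plugged in.

```rust
/// Hypothetical stand-in for the crate's CryptoRng trait.
pub trait CryptoRngSketch {
    fn fill_bytes(&mut self, dest: &mut [u8]);
}

/// Wraps any source of 32-bit random words, for example a closure that
/// polls a TRNG data register (peripheral access is outside this sketch).
pub struct TrngAdapter<F: FnMut() -> u32> {
    read_word: F,
}

impl<F: FnMut() -> u32> TrngAdapter<F> {
    pub fn new(read_word: F) -> Self {
        Self { read_word }
    }
}

impl<F: FnMut() -> u32> CryptoRngSketch for TrngAdapter<F> {
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        // Pull one fresh word per 4-byte chunk; a short tail takes a prefix.
        for chunk in dest.chunks_mut(4) {
            let word = (self.read_word)().to_le_bytes();
            let len = chunk.len();
            chunk.copy_from_slice(&word[..len]);
        }
    }
}
```

A production adapter would also check the peripheral's error and health-test flags before accepting each word; that logic is device-specific and left out.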

Cargo profiles

The workspace Cargo.toml declares three profiles:

| Profile | opt-level | CT guarantee | Use case |
|---|---|---|---|
| release | 2 | Yes (Rust source-level) | Desktop / server production |
| release-embedded | z + abort | Yes (asm CT backends) | Embedded, minimum size |
| release-bench | 3 | No (LLVM may break CT patterns) | Benchmarks only |

⚠️ opt-level=3 can defeat constant-time guarantees: LLVM may convert bitwise mask patterns into conditional memory accesses. Always use opt-level=2 or lower for security-critical builds, or rely on the assembly CT backends from silentops (asm-aarch64, asm-thumbv7, asm-thumbv6m, asm-riscv32) which bypass the compiler entirely.

Test validation

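All implementations are validated against the NIST ACVP vectors and a custom negative suite; ML-KEM and ML-DSA are additionally validated against Wycheproof. The vector files are checked into tests/vectors/: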

NIST ACVP — happy-path conformance

Official vectors from usnistgov/ACVP-Server. These are the NIST-authored known-answer tests that every FIPS 203 / 204 / 205 claimant must pass.

| Algorithm | KeyGen | SigGen / Encaps | SigVer / Decaps |
|---|---|---|---|
| ML-KEM | 75 / 75 | 75 / 75 | 30 / 30 |
| ML-DSA | 15 / 15 | 15 / 15 | 30 / 30 |
| SLH-DSA | 18 / 18 | 1 / 1 (128f) | 3 / 3 (128f) |

(SLH-DSA SigGen / SigVer covered only on SHAKE-128f for test wall-clock reasons; all 6 parameter sets share the same code path and KeyGen is validated on every one.)

Wycheproof — edge cases and negative tests

Vectors from the C2SP/wycheproof project, covering malformed inputs, corrupted keys, truncated ciphertexts / signatures, out-of-range coefficients, and other edge cases the NIST happy-path vectors do not exercise. Each vector carries a result field — valid, invalid, or acceptable — against which our implementation’s accept / reject decision is compared.

| Algorithm | Files | Vectors | Coverage |
|---|---|---|---|
| ML-KEM | 12 | ~1650 | 512 / 768 / 1024: Encaps + Decaps |
| ML-DSA | 9 | ~1020 | 44 / 65 / 87: Sign (seed + noseed) + Verify |
| Total | 21 | ~2 672 | |
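Comparing an accept / reject decision against a vector's result field can be sketched as below. The enum and functions are hypothetical, and treating acceptable as "either outcome passes" is an assumption based on the usual Wycheproof convention rather than a statement about this crate's test harness.

```rust
/// Expected outcome carried by a Wycheproof test vector.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Expected {
    Valid,
    Invalid,
    Acceptable,
}

/// Parse the JSON `result` string; unknown values are rejected.
fn parse_expected(s: &str) -> Option<Expected> {
    match s {
        "valid" => Some(Expected::Valid),
        "invalid" => Some(Expected::Invalid),
        "acceptable" => Some(Expected::Acceptable),
        _ => None,
    }
}

/// `accepted` is the implementation's decision for this vector.
fn vector_passes(expected: Expected, accepted: bool) -> bool {
    match expected {
        Expected::Valid => accepted,
        Expected::Invalid => !accepted,
        // Assumption: legacy or borderline inputs may go either way.
        Expected::Acceptable => true,
    }
}
```

A stricter harness could log every acceptable vector that was accepted, so that a policy change in the implementation shows up in review.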

Custom negative / robustness tests

A hand-curated suite in tests/negative.rs targeting the specific error paths of each typed key wrapper — wrong-length inputs, silent wrong-result scenarios, FIPS 203 §7.2 encapsulation-key modulus check, FO-transform integrity under malformed ciphertexts, etc. Around 25 tests across the three algorithms.

Running everything

cargo test --release -p quantica

Policy on test suites

A necessary condition for adding a new cryptographic primitive to quantica is the availability of a public reference test suite for it. When a new peer-reviewed test corpus appears (a refreshed Wycheproof release, a new CAVP tranche, a community project like the IETF CFRG vectors), we re-import it and extend the test matrix accordingly; this is tracked as part of our ongoing crypto-research monitoring and is called out in the changelog.

Examples

Rust

cargo run --release -p quantica --example ml_kem_roundtrip
cargo run --release -p quantica --example ml_dsa_sign_verify
cargo run --release -p quantica --example slh_dsa_sign_verify

C FFI

For C consumers, the quantica_ffi companion crate exports a C ABI around the three algorithms and ships a standalone test_quantica.c example program. The shared library is built by:

cargo build --release -p quantica_ffi

and the generated C header (quantica.h) is kept under the FFI crate’s include/ directory.

Module map

quantica/
├── Cargo.toml
├── README.md                 (this file)
├── src/
│   ├── lib.rs                Re-exports the algo modules behind features
│   ├── secret.rs             SecretBytes / SecretArray (Zeroize-on-Drop)
│   ├── sha3.rs               Shared Keccak-f[1600] core (KeccakState)
│   ├── ml_kem/               FIPS 203 ML-KEM            (feature `ml-kem`)
│   │   ├── mod.rs            Public API: MlKem<P>, typed wrappers
│   │   ├── params.rs         MlKem512, MlKem768, MlKem1024
│   │   ├── sha3.rs           Thin wrappers: H, G, J, PRF, Xof
│   │   ├── ntt.rs            NTT mod 3329 (full Montgomery, i16)
│   │   ├── encode.rs         ByteEncode/Decode, Compress/Decompress
│   │   ├── sample.rs         SampleNTT, SamplePolyCBD
│   │   ├── kpke.rs           K-PKE (KeyGen, Encrypt, Decrypt)
│   │   ├── kem.rs            ML-KEM + double-decaps + dk integrity (DFA)
│   │   ├── rng.rs            CryptoRng trait + OsRng (std-only)
│   │   ├── masked.rs         First-order additive masking (DPA)
│   │   └── shuffle.rs        Fisher-Yates shuffled NTT (SPA)
│   ├── ml_dsa/               FIPS 204 ML-DSA            (feature `ml-dsa`)
│   │   ├── mod.rs            Public API: MlDsa<P>, typed wrappers
│   │   ├── params.rs         MlDsa44, MlDsa65, MlDsa87
│   │   ├── sha3.rs           Thin wrappers: SHAKE128/256, sha3_256/512
│   │   ├── ntt.rs            NTT mod 8 380 417 (Montgomery, i32)
│   │   ├── encode.rs         BitPack, pk/sk/sig encode/decode
│   │   ├── sample.rs         SampleInBall, RejNTTPoly, ExpandA/S/Mask
│   │   ├── decompose.rs      Power2Round, Decompose, HighBits, Hints
│   │   ├── dsa.rs            KeyGen, Sign (rejection loop, masked), Verify
│   │   ├── rng.rs            CryptoRng trait + OsRng (std-only)
│   │   ├── masked.rs         First-order additive masking (DPA)
│   │   └── shuffle.rs        Fisher-Yates shuffled NTT (SPA)
│   └── slh_dsa/              FIPS 205 SLH-DSA           (feature `slh-dsa`)
│       ├── mod.rs            Public API: SlhDsa<P>, typed wrappers
│       ├── params.rs         6 SHAKE parameter sets
│       ├── sha3.rs           Shake256 streaming wrapper
│       ├── address.rs        32-byte ADRS structure
│       ├── hash.rs           H_msg, PRF, PRF_msg, T_l, H, F
│       ├── wots.rs           WOTS+ one-time signatures
│       ├── xmss.rs           XMSS Merkle trees
│       ├── hypertree.rs      Hypertree of XMSS trees
│       ├── fors.rs           FORS forest
│       ├── slh.rs            SLH-DSA top-level
│       └── rng.rs            CryptoRng trait + OsRng (std-only)
├── examples/
│   ├── ml_kem_roundtrip.rs
│   ├── ml_dsa_sign_verify.rs
│   └── slh_dsa_sign_verify.rs
└── tests/
    ├── ml_kem_kat.rs
    ├── ml_dsa_kat.rs
    ├── slh_dsa_kat.rs
    └── vectors/              NIST ACVP-Server JSON / .rsp vectors

Known limitations

Side-channel protection

  • Vec<u8> heap allocations: secret-key buffers come from alloc, not from caller-provided fixed buffers. A future refactor will thread &mut [u8] everywhere for full bare-metal stack-only support.

  • write_volatile zeroization is the strongest erasure available in safe-ish Rust without external crates, but is not formally guaranteed against every compiler optimization on every target.

  • No formal CT verification yet (no ct-verif or comparable proof tooling). The ctgrind (Valgrind) and dudect harnesses give dynamic and statistical evidence, not proof.

Standards conformance

  • HashML-DSA (FIPS 204 Algorithms 4 / 5) and HashSLH-DSA (FIPS 205 Algorithms 23 / 25) pre-hash variants are structurally supported by the API but not tested. ACVP vectors with hashAlg != "none" are skipped.

  • SLH-DSA SHA2 parameter sets are not implemented; only the 6 SHAKE-based sets are.

  • Hedged signing is implemented, but only the deterministic variant (rnd = 0x00^32 for ML-DSA, opt_rand = pk.seed for SLH-DSA) is tested against ACVP vectors.

  • No CAVP certification — vectors come from the public NIST ACVP-Server GitHub mirror.

Portability

  • OsRng is Linux-only — reads /dev/urandom. Windows / macOS builds need custom adapters (BCryptGenRandom, SecRandomCopyBytes). Embedded targets must supply a hardware-RNG CryptoRng impl regardless.

Testing

  • Partial ACVP coverage: 1–25 vectors per operation and parameter set, not the whole vector set, to keep test wall-clock low. Wycheproof is imported in full.

  • No SLH-DSA Wycheproof corpus exists yet — SLH-DSA validation currently rests on NIST ACVP vectors plus the custom negative suite; a Wycheproof import will be added when the upstream project ships vectors for FIPS 205.

  • No fuzzing. CI is limited to the Forgejo Actions QEMU cross-test workflow (T3-B).

Roadmap

The full hardening roadmap lives under quantica/doc/sca/ (HTML rendered by ./gendoc.sh quantica). The summary below is the project’s living plan towards a third-party evaluation, indexed by Tier item identifier so each row maps to a stable cross-reference in the source code, the SCA annex and the workspace SECURITY.md lifecycle.

Status legend: ✅ done · 🔧 in progress · 📋 planned · 💤 deferred.

Tier 1 — Active vulnerabilities (critical path)

Items addressing documented attack vectors that affect the security of the implemented algorithms. The bulk of these are findings from the 2026-04-21 literature watch on the SLH-DSA fault surface, plus the ML-DSA mask-hygiene gaps surfaced by Hermelink (CRYPTO 2025).

| Id | Item | Status |
|---|---|---|
| T1-A | A3: refresh ML-DSA shares (s1, s2, t0) at the start of every rejection iteration | ✅ |
| T1-B | Hermelink 2025/276 audit pass on ml_dsa::masked (information-theoretic leakage map) | ✅ |
| T1-C | FORS signature redundancy (anti-grafting-tree forgery, Castelnovi 2018, SLasH-DSA 2025) | ✅ |
| T1-D | Full-tree streaming FORS sign (defeats template idx-recovery, Kannwischer 2018) | ✅ |
| T1-E | Digest → FORS-indices integrity check | ✅ |
| T1-F | Constant-time fors_pk_from_sig (prerequisite for T1-C) | ✅ |

Tier 2 — Hardening for evaluation

| Id | Item | Status |
|---|---|---|
| T2-A | Explicit ct_grind::unpoison after the algorithmic unmask of w1, h, z in ML-DSA | 📋 |
| T2-B | Branch-free generate_permutation in ML-DSA shuffle (Feistel- or Floyd-based) | 📋 |
| T2-C | Documentation traceability: convert tools/ctgrind.supp into a “resolved-findings” annex once T2-A and T2-B land | 📋 |
| T2-D | Explicit ct_grind::unpoison of R, digest, FORS / WOTS / XMSS indices in SLH-DSA | 📋 |

Tier 3 — Verification tooling

| Id | Item | Status |
|---|---|---|
| T3-A | Cross-arch test infrastructure: qemu-user matrix (aarch64 / armv7 / riscv64 Linux) via cross + qemu-system matrix (riscv32imc / riscv32imac / thumbv6m / thumbv7em bare-metal) + custom semihosting host↔guest vector-streaming protocol so KAT corpora are not compiled into the bare-metal image. thumbv8m.main (M33 / STM32U5) is wired in tree but currently sidelined by an upstream rustc + cortex-m-rt link issue; asm-thumbv7 coverage is preserved via thumbv7em. | ✅ |
| T3-B | Codeberg Forgejo Actions workflow (qemu-user + qemu-system + qemu-vector jobs); replaces the originally scoped Gitea / turtle.local plan after the project moved its public CI to codeberg.org. | ✅ |

Tier 4 — Deferred / beyond the current evaluation scope

| Id | Item | Status |
|---|---|---|
| T4-A | SUCRE (TCHES 2026.1) shuffle-and-unmask migration evaluation; 4–6× speedup vs. the current Coron 2024/1149 masked-y pipeline | 💤 |
| T4-B | First-order Boolean masking of the SHAKE PRF in SLH-DSA (Fluhrer 2024/500, 1.7× overhead) | 💤 |
| T4-C | Higher-order arithmetic masking on ML-DSA s1/s2/t0 (2-share, CC EAL4+ grade) | 💤 |
| T4-D | Higher-order masking on ML-KEM s (3-share, CC EAL4+ grade) | 💤 |
| T4-E | Hardened ML-KEM FO comparison against the eprint 2025/1577 template attack | 💤 |
| T4-F | Twiddle-factor masking inside the ML-KEM shuffled NTT (additional DPA defence layer) | 💤 |
| T4-G | SHA2-based SLH-DSA parameter sets (FIPS 205 Section 11.2); currently SHAKE only | 💤 |
| T4-H | HashML-DSA / HashSLH-DSA pre-hash variants (FIPS 204 §5.4, FIPS 205 Algorithms 23 / 25) | 💤 |

Tier 5 — Documentation pass

Cross-cutting documentation work, orthogonal to the cryptographic tiers above. Planned (not deferred); timing to be sequenced against the external evaluation calendar.

| Id | Item | Status |
|---|---|---|
| T5-A | Workspace-wide doc pass (quantica + arcana): neutralise evaluation-target references; replace any CSPN-/ANSSI-specific language with generic evaluation / certification / audit terminology so the doc set reads cleanly against any third-party reviewer | ✅ |
| T5-B | TOC review across the workspace doc set (doc/TOC.md contract + per-crate doc/ trees): reorder chapters into 4 thematic clusters; rename ch.8 “Side-channel countermeasures” → “(summary)” + add Per-algorithm deep dives H3 bridging to the Sphinx pack | ✅ |

Already shipped (trace-back)

Items below were entries on a prior version of this roadmap and have since been delivered. They are kept here so a third-party reviewer can match each closed concern to its commit without re-opening it.

| Item | Status |
|---|---|
| ML-DSA sca-masked-y pipeline (Coron 2024/1149) | ✅ commit 3149b68 |
| ML-DSA sca-ct-rejection (constant-time rejection loop) | |
| ML-DSA first-order arithmetic masking on s1/s2/t0 + Fisher-Yates shuffled NTT | |
| ML-DSA seven RAM-reduction features (179 KB → ~17 KB peak Sign stack) | |
| ML-KEM first-order arithmetic masking on s/e + shuffled NTT | |
| ML-KEM double-decaps + H(ek) integrity DFA | |
| ML-KEM branchless fault-fallback (closes the timing oracle on the fault path) | ✅ commit 5f0bdad |
| SLH-DSA iterative BDS FORS treehash (256 KiB → 448 B per call) | ✅ commit fff156f |
| SLH-DSA streaming signature output (one allocation, *_into variants throughout) | ✅ commit 1eb224f |
| silentops x86_64 / aarch64 inline-asm CT backends | ✅ commit 90a1168 |
| silentops::ct_grind::poison/unpoison Valgrind instrumentation | ✅ commit 90a1168 |
| Per-algorithm ctgrind harness (quantica_bench/src/bin/ctgrind.rs) + suppression file | ✅ commit 241aeb1 |
| Stack-painting memcheck tool (quantica_bench/src/bin/memcheck.rs) | ✅ commit e21d6d0 |
| Static stack-size analysis via nightly -Z emit-stack-sizes (tools/stack-sizes.sh) | ✅ commit 5f30e69 |
| Sphinx side-channel doc pack with bibliography + per-algorithm countermeasure chapters | ✅ commit 32a76bd |
| Self-contained crate-owned quantica/doc/ tree (Option B layout) | ✅ commit 5fc8c9b |
| T1-F: Constant-time fors_pk_from_sig (prereq for T1-C FORS redundancy) | ✅ commit 1fe4b18 |
| T1-C: FORS recompute-and-compare redundancy (sca-fors-redundancy feature, SLH-DSA grafting-tree defence) | ✅ commit c6a916e |
| API cleanup post-T1C: single CT fors_pk_from_sig, unified slh_sign_internal, &Adrs template | ✅ commit a8d9a4a |
| T1-D: Full-tree streaming FORS sign (sca-fors-dummy-siblings feature, anti-template Kannwischer 2018) | ✅ commit 5d779c6 |
| T1-E: Digest → FORS-indices integrity check (sca-fors-indices-check feature, anti-fault Castelnovi 2018) | ✅ commit 8ff4e01 |
| T1-B: Hermelink 2025/276 audit annex on ml_dsa::masked (doc-only, classifies leak surface) | ✅ commit d73dc70 |
| T1-A: Per-iteration mask refresh in ML-DSA rejection loop (head-of-loop, Hermelink §4 prescription) | ✅ commit 738ec73 |
| T5-A: Workspace-wide doc pass: neutralise evaluation-target language (CSPN/ANSSI → generic evaluation) | ✅ commit eac79f5 |
| T5-B: TOC reorder (4 thematic clusters) + SCA chapter summary-bridge to per-algo deep dives | ✅ this branch |
| T3-A: Cross-arch test infrastructure (qemu-user matrix + qemu-system bare-metal matrix + semihosting vector-streaming protocol) | ✅ commits ce06085, fe9b3d4, 617120f, dd7f867, 1d7b6fa |
| T3-B: Codeberg Forgejo Actions workflow (.forgejo/workflows/qemu-cross-tests.yml) covering all three qemu layers | ✅ this branch |

Suggested execution order (critical path)

  1. Sprint 1: T1-F + T1-C — closes the dominant published attack on SLH-DSA (Castelnovi grafting / SLasH-DSA Rowhammer). T1-F is the prerequisite (CT fors_pk_from_sig), T1-C the redundancy itself.

  2. Sprint 2: T1-D + T1-E + T1-B — completes the FORS hardening (template + fault on idx) and pushes the Hermelink leakage checklist through ml_dsa::masked.

  3. Sprint 3: T1-A + T2-A + T2-B — closes the ML-DSA higher-order recombination + the last two ctgrind suppressions for ML-DSA.

  4. Sprint 4: T2-D + T3-A + T3-B + T2-C — ctgrind unpoisons for SLH-DSA, CT3 QEMU portability, CI wiring, and the documentation conversion of tools/ctgrind.supp to a “resolved-findings” annex. The evaluation doc pack ships at the end of this sprint.

Effort estimate: ~3 weeks of dev for Tier 1 + Tier 2 (T1-C dominates, the rest are mostly mechanical), plus ~1 week for the Tier 3 verification wiring. Updates to this table are tracked in the change log of quantica/doc/sca/index.rst.

License

Apache-2.0.