# krypteia-quantica: Post-Quantum Cryptography for the krypteia workspace

Pure-Rust implementations of the three NIST post-quantum standards, sharing a side-channel countermeasure toolkit (`silentops`, a companion crate of the same workspace) used by the classical side as well.

Specifications (FIPS 203 / 204 / 205 PDFs) are vendored alongside the crate in the repository.

## Design rules

The crate inherits the `krypteia` workspace design rules:

1. **Pure Rust, zero external crates**: only `core` (and `alloc`); `std` is optional behind a feature flag.
2. **Embedded-friendly**: small RAM footprint, fits secure elements, STM32 (Cortex-M0/M4/M33), RISC-V parts (ESP32-C3, …).
3. **Side-channel hardened** against SPA, DPA, DFA, template attacks, timing attacks. CT primitives come from `silentops`, with architecture-specific assembly backends.
4. **Validated** against the official NIST ACVP test vectors.
5. **C FFI-exposable** through the `quantica_ffi` companion crate.

## Algorithms

| Standard | Algorithm | Type | Status |
|----------|-----------|------|--------|
| FIPS 203 | **ML-KEM** (ex-CRYSTALS-Kyber) | Key Encapsulation Mechanism | Implemented, ACVP + Wycheproof validated |
| FIPS 204 | **ML-DSA** (ex-CRYSTALS-Dilithium) | Digital Signature | Implemented, ACVP + Wycheproof validated |
| FIPS 205 | **SLH-DSA** (ex-SPHINCS+) | Stateless Hash-Based Signature | Implemented, ACVP validated (Wycheproof has no SLH-DSA corpus yet) |

### ML-KEM (FIPS 203)

Module-lattice-based Key Encapsulation Mechanism. Derived from CRYSTALS-Kyber.
A `keygen / encaps / decaps` KEM producing a 32-byte shared secret; decapsulation uses the Fujisaki–Okamoto transform with implicit rejection, so a malformed ciphertext yields a deterministic secret indistinguishable from a legitimate one.

### ML-DSA (FIPS 204)

Module-lattice-based Digital Signature Algorithm. Derived from CRYSTALS-Dilithium.
A Fiat–Shamir-with-aborts signature scheme; signing is a hedged rejection loop that mixes fresh randomness with the secret key. Verification is deterministic and does not touch any secret.

### SLH-DSA (FIPS 205)

Stateless Hash-Based Digital Signature Algorithm. Derived from SPHINCS+.
Security relies only on the security of the underlying hash function (SHAKE or SHA-2; this crate implements the SHAKE sets), with no algebraic assumption. Signatures are large (7–50 KiB depending on parameter set) but the underlying primitive is conservative and quantum-safe.

## Cargo features

```toml
[dependencies]
quantica = { path = "../quantica" }  # default = std + 3 algos + sca-protected
```

| Feature | Default | Effect |
|-----------------|:-------:|--------|
| `std` | ✅ | Pulls in the Rust standard library. Enables `OsRng` and `std::error::Error` impls. |
| `ml-kem` | ✅ | Compiles the FIPS 203 module (`quantica::ml_kem`). |
| `ml-dsa` | ✅ | Compiles the FIPS 204 module (`quantica::ml_dsa`). |
| `slh-dsa` | ✅ | Compiles the FIPS 205 module (`quantica::slh_dsa`). |
| `sca-protected` | ✅ | Activates the masking + shuffled-NTT defences in ML-KEM and ML-DSA. |

Disabling `std` makes the crate `no_std` (still requires `alloc`). In that mode the OS-backed `OsRng` disappears: the caller must provide their own `CryptoRng` impl wrapping a hardware RNG.

## Quick start

### ML-KEM (FIPS 203): Key Encapsulation

```rust
use quantica::ml_kem::*;

let mut rng = OsRng;

// MlKem768 is used for illustration; MlKem512 and MlKem1024 are the
// other parameter-set types.

// Key generation. ek is a public EncapsulationKey;
// dk is a DecapsulationKey that auto-zeroizes on Drop.
let (ek, dk) = MlKem::<MlKem768>::keygen(&mut rng).unwrap();

// Encapsulation (Bob): produces a 32-byte SharedSecret + a Ciphertext.
let (shared_secret_bob, ciphertext) = MlKem::<MlKem768>::encaps(&ek, &mut rng).unwrap();

// Decapsulation (Alice): recovers the same SharedSecret.
let shared_secret_alice = MlKem::<MlKem768>::decaps(&dk, &ciphertext, &mut rng).unwrap();

assert_eq!(shared_secret_alice, shared_secret_bob);
// Both shared secrets wipe themselves at end of scope.
```

### ML-DSA (FIPS 204): Digital Signature

```rust
use quantica::ml_dsa::*;

let mut rng = OsRng;

// VerifyingKey + zeroizing SigningKey.
// MlDsa65 is used for illustration; MlDsa44 and MlDsa87 work the same way.
let (pk, sk) = MlDsa::<MlDsa65>::keygen(&mut rng).unwrap();

// Sign: uses hedged signing (mixes fresh RNG bytes with the secret key).
let sig: Signature<MlDsa65> = MlDsa::<MlDsa65>::sign(&sk, b"message", b"", &mut rng).unwrap();

let valid = MlDsa::<MlDsa65>::verify(&pk, b"message", b"", &sig).unwrap();
assert!(valid);
```

### SLH-DSA (FIPS 205): Stateless Hash-Based Signature

```rust
use quantica::slh_dsa::*;

let mut rng = OsRng;

// `P` stands for one of the six SHAKE parameter-set types defined in
// `quantica::slh_dsa::params`.
let (sk, pk) = SlhDsa::<P>::keygen(&mut rng).unwrap();
let sig = SlhDsa::<P>::sign(b"message", &sk, &mut rng).unwrap();
let valid = SlhDsa::<P>::verify(b"message", &sig, &pk).unwrap();
assert!(valid);
```

## Typed key wrappers (Zeroize-on-Drop)

The public API never returns raw `Vec<u8>` for secret material. Each algorithm exposes parameter-set-tagged wrapper types backed by the shared [`quantica::secret`] module:

| Module | Public (not zeroized) | Secret (Drop-zeroizes via `silentops::ct_zeroize`) |
|-----------|----------------------------------------|---------------------------------------|
| `ml_kem` | `EncapsulationKey<P>`, `Ciphertext<P>` | `DecapsulationKey<P>`, `SharedSecret` |
| `ml_dsa` | `VerifyingKey<P>`, `Signature<P>` | `SigningKey<P>` |
| `slh_dsa` | `VerifyingKey<P>`, `Signature<P>` | `SigningKey<P>` |

All wrappers implement `from_bytes(&[u8])` (length-validated against the parameter set), `as_bytes() -> &[u8]`, `Deref`, `AsRef<[u8]>`, and a manual `Clone`. The secret variants additionally have a redacted `Debug` impl that prints a placeholder instead of the key bytes, so a stray `eprintln!` cannot leak key material into a log file.

The internal byte-slice API (`quantica::ml_kem::kem::*`, `quantica::ml_dsa::dsa::*`, `quantica::slh_dsa::slh::*`) is still exposed for ACVP/CAVP testing and for the C FFI, which prefers raw `Vec<u8>` to keep the FFI boundary thin.

## Parameter sets

### ML-KEM (FIPS 203)

| Parameter set | Security | ek (B) | dk (B) | ct (B) | ss (B) |
|---------------|----------|--------|--------|--------|--------|
| ML-KEM-512 | Cat. 1 | 800 | 1632 | 768 | 32 |
| ML-KEM-768 | Cat. 3 | 1184 | 2400 | 1088 | 32 |
| ML-KEM-1024 | Cat. 5 | 1568 | 3168 | 1568 | 32 |

### ML-DSA (FIPS 204)

| Parameter set | Security | pk (B) | sk (B) | sig (B) |
|---------------|----------|--------|--------|---------|
| ML-DSA-44 | Cat. 2 | 1312 | 2560 | 2420 |
| ML-DSA-65 | Cat. 3 | 1952 | 4032 | 3309 |
| ML-DSA-87 | Cat. 5 | 2592 | 4896 | 4627 |

### SLH-DSA (FIPS 205), SHAKE variants only

| Parameter set | Security | n | pk (B) | sk (B) | sig (B) |
|--------------------|----------|----|--------|--------|---------|
| SLH-DSA-SHAKE-128s | Cat. 1 | 16 | 32 | 64 | 7 856 |
| SLH-DSA-SHAKE-128f | Cat. 1 | 16 | 32 | 64 | 17 088 |
| SLH-DSA-SHAKE-192s | Cat. 3 | 24 | 48 | 96 | 16 224 |
| SLH-DSA-SHAKE-192f | Cat. 3 | 24 | 48 | 96 | 35 664 |
| SLH-DSA-SHAKE-256s | Cat. 5 | 32 | 64 | 128 | 29 792 |
| SLH-DSA-SHAKE-256f | Cat. 5 | 32 | 64 | 128 | 49 856 |

`s` variants optimize for small signatures, `f` variants for fast key generation and signing at the cost of larger signatures. SHA2-based parameter sets are not yet implemented (see "Known limitations" below).

## Design decisions

* **Zero dependencies**: only `core` + `alloc` (and optionally `std`).
  SHA-3 / SHAKE are implemented from scratch on top of a single shared Keccak-f[1600] core in `src/sha3.rs`; each algorithm exposes its own thin wrapper.
* **Generic over parameter sets**: `MlKem<P>`, `MlDsa<P>`, `SlhDsa<P>` are monomorphized at compile time via const generics, so a single code path serves all security levels.
* **Internal byte-slice API stays raw**: `keygen_internal`, `encaps_internal`, `sign_internal`, `verify_internal` accept and return raw `&[u8]` / `Vec<u8>`. The KAT tests and the C FFI use this layer; only the high-level `MlKem<P>::keygen` etc. wrap into the typed key types.
* **Arithmetic widths**: i16 for ML-KEM (q = 3329 fits in 12 bits), i32 for ML-DSA (q = 8 380 417 needs 23 bits), no NTT at all for SLH-DSA.
* **NTT differences**: ML-KEM uses BitRev_7 with a partial NTT (down to length-2, base-case multiply); ML-DSA uses BitRev_8 with a full NTT (down to length-1, simple pointwise multiply).
* **SLH-DSA architecture**: WOTS+ → XMSS → Hypertree → FORS → SLH-DSA. Purely hash-based, no algebraic structures.

## Side-channel countermeasures (summary)

### Always-on

These defences are active in every build, regardless of feature flags:

| Countermeasure | Algorithm | Threat addressed | How |
|----------------|-----------|------------------|-----|
| Constant-time arithmetic | ML-KEM, ML-DSA, SLH-DSA | Timing / cache-timing / basic SPA | Branchless `mod_q`, `ct_eq`, `ct_select` from `silentops` |
| Zeroize-on-Drop wrappers | ML-KEM, ML-DSA, SLH-DSA | Cold boot, memory dumps, UAF | `SecretBytes` / `SecretArray` → `silentops::ct_zeroize` on Drop |
| Volatile zeroization | ML-KEM, ML-DSA, SLH-DSA | Cold boot, memory dumps | `core::ptr::write_volatile` + `compiler_fence` on intermediates |
| **Double Decaps** | ML-KEM | DFA on FO comparison | Decaps runs twice; results compared; mismatch ⇒ random output |
| **dk integrity check** | ML-KEM | DFA on stored key material | `H(ek)` is embedded in `dk` and re-checked at every Decaps |
| **Hedged signing** | ML-DSA, SLH-DSA | Fault-induced nonce reuse | 32 bytes of fresh entropy mixed into the per-signature derivation |

### Feature-gated (`sca-protected`, on by default)

| Countermeasure | Algorithm | Threat addressed | Module |
|----------------|-----------|------------------|--------|
| First-order additive masking | ML-KEM | First-order DPA, template attacks | `ml_kem::masked` |
| NTT butterfly shuffling | ML-KEM | SPA, trace alignment for DPA | `ml_kem::shuffle` |
| First-order additive masking | **ML-DSA** | First-order DPA, template attacks | `ml_dsa::masked` |
| Shuffled NTT (secret poly) | **ML-DSA** | SPA, trace alignment for DPA | `ml_dsa::shuffle` |
| Mask refresh between rounds | **ML-DSA** | Higher-order share correlation | `MaskedPoly::refresh()` between rejection iterations |

The masking layer is mathematically transparent: the masked path produces **bit-identical** keys, ciphertexts, and signatures to the unmasked path, which is why the NIST ACVP vectors keep matching with `sca-protected` enabled. Internally:

* **ML-KEM**: secret polynomials `s`, `e`, etc. are split into two additive shares mod `q = 3329` immediately after CBD sampling. NTTs run on each share independently (linearity of the NTT), pointwise multiplications by public matrices distribute over the shares.
* **ML-DSA**: in `dsa::sign_internal`, the secret-key vectors `s1`, `s2`, `t0` are NTT-transformed via `shuffle::ntt_shuffled` then split into `MaskedPoly` arrays. Each per-rejection-iteration multiplication `ĉ · ŝx` runs through `masked_pointwise_mul_public`, followed by `MaskedPoly::refresh()` to prevent inter-iteration share correlation. Mask randomness is drawn from a SHAKE256-seeded deterministic `ScaRng` (seed = `K ‖ rnd ‖ tr ‖ M'`), so `sign_internal` keeps a deterministic signature and the ACVP fixed-`rnd` vectors still match.
* **SLH-DSA**: hash-based, no algebraic structure to mask, so first-order masking does not buy anything here. The always-on defences (CT arithmetic, zeroization, hedged signing) are the relevant layer.

#### Approximate cost (single-threaded, release mode)

| Operation | Plain | `sca-protected` | Slowdown |
|-------------------|----------|-------------------|----------|
| ML-KEM-768 Decaps | ~0.03 ms | ~0.07 ms (double) | ~2.3× |
| ML-DSA-65 Sign | ~2.2 ms | ~7.1 ms | ~3.2× |

Numbers vary widely with hardware.
Run the `quantica_bench` companion crate for measurements on your machine.

### Timing leakage verification (dudect)

The shared `silentops::verify` module implements the dudect methodology of Reparaz, Balasch and Verbauwhede (2017). A pre-built harness exercises the most sensitive paths:

```bash
cargo run --release -p silentops --features std --example ct_verify_pqc
```

Currently checks:

* **ML-KEM-768 Decaps**: valid vs random ciphertext (implicit-rejection timing)
* **ML-KEM Barrett reduce**: small vs large input
* **ML-DSA-44 Sign**: message A vs message B (message-independent timing)
* **ML-DSA-44 Verify**: valid vs invalid signature

A t-statistic with `|t| < 4.5` after ~10⁶ samples is considered passing (`p < 10⁻⁵`). Note that ML-DSA Sign uses rejection sampling, so its timing inherently varies; a `FAIL` there is not necessarily a vulnerability if the variation is independent of the secret key.

### Known residual surface

The following attack surfaces are *not* currently defended against (or only defended in some build configurations) and are documented here so the reader knows what they are deploying. They are tracked in the side-channel annex and in the hardening roadmap.

* **Masked Keccak / SHAKE**: the hash primitive feeding the PRF in ML-KEM / ML-DSA / SLH-DSA is not masked; a DPA attacker with trace access can mount Kannwischer-style attacks on `SK.seed`. For SLH-DSA, masking of the SHAKE PRF is deferred as tier-4 item `T4-B`.
* **Grafting-tree fault attacks on SLH-DSA**: the post-sign FORS redundancy check (roadmap item `T1-C`) is gated behind the `sca-fors-redundancy` feature; in a build without it, a single-fault attacker (physical or Rowhammer-class) can coerce a forgery.
* **Heap allocations on the secret path**: secret-key buffers come from `alloc` rather than caller-provided fixed buffers. A future refactor will thread `&mut [u8]` end-to-end for bare-metal stack-only operation.
* **Higher-order DPA across rejection iterations**: ML-DSA shares `s1`, `s2`, `t0` are first-order-masked and refreshed at the start of every rejection iteration (roadmap item `T1-A`), but the masking order itself is still one; an adversary who can combine the leakage of both shares remains in scope. Higher-order masking is deferred as tier-4 `T4-C`.
* **Pointer-level CMOV by the compiler**: the Rust bit-hack CT primitives are defended by the `silentops` asm backend on x86_64 and ARM; on targets without an asm backend (e.g. WebAssembly), the CT guarantee is best-effort source-level only.

### Per-algorithm deep dives

The summary above lists which countermeasures are active; the full per-algorithm SCA analyses (threat matrices, attack references, code pointers, residual risks) live under `quantica/doc/sca/countermeasures/` in the repository. The Sphinx documentation pack (`./gendoc.sh quantica`) inlines them as a navigable cross-linked tree below.

```{toctree}
:maxdepth: 2

../quantica/sca/index
```

## Performance

Run the workspace bench tool:

```bash
cargo run --release -p quantica_bench
```

Representative single-threaded numbers (no SIMD, no NEON, sca-protected on):

| Algorithm | KeyGen | Sign / Encaps | Verify / Decaps |
|--------------------|----------|---------------|-----------------|
| ML-KEM-768 | ~0.03 ms | ~0.04 ms | ~0.07 ms |
| ML-DSA-65 | ~0.10 ms | ~7.1 ms | ~0.12 ms |
| SLH-DSA-SHAKE-128f | ~2 ms | ~40 ms | ~2 ms |

Notes:

* ML-KEM uses full Montgomery NTT arithmetic (shifts instead of divisions).
* ML-DSA Sign times vary because of rejection sampling.
* SLH-DSA is dominated by SHAKE evaluations; release mode is essential (debug mode is ~100× slower).
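The Montgomery note in the Performance section can be illustrated with a short sketch. This is not code taken from `quantica`'s `ntt.rs`; it is the textbook signed Montgomery reduction for `q = 3329` with `R = 2^16`, and the names `Q`, `QINV`, `R2`, `montgomery_reduce`, `fqmul` and `to_montgomery` are illustrative assumptions. It shows how a product is reduced mod `q` with multiplications and one shift, without any division.

```rust
/// ML-KEM modulus.
const Q: i32 = 3329;
/// q^-1 mod 2^16 as a signed 16-bit value (3329 * -3327 ≡ 1 mod 65536).
const QINV: i16 = -3327;
/// 2^32 mod q, used to move a value into the Montgomery domain.
const R2: i16 = 1353;

/// Returns r such that r * 2^16 ≡ a (mod q) and |r| < q.
/// Assumed input range: |a| < q * 2^15.
fn montgomery_reduce(a: i32) -> i16 {
    // t ≡ a * q^-1 (mod 2^16); only the low 16 bits of `a` matter here.
    let t = (a as i16).wrapping_mul(QINV);
    // a - t*q is an exact multiple of 2^16, so the shift discards nothing.
    ((a - (t as i32) * Q) >> 16) as i16
}

/// Multiplies two residues; the result carries an extra factor 2^-16.
fn fqmul(a: i16, b: i16) -> i16 {
    montgomery_reduce(a as i32 * b as i32)
}

/// Maps x to x * 2^16 mod q (the Montgomery representation of x).
fn to_montgomery(x: i16) -> i16 {
    fqmul(x, R2)
}
```

Under these assumptions, `montgomery_reduce(to_montgomery(x) as i32)` is congruent to `x` mod `q`, which is the round trip an NTT implementation relies on when it keeps twiddle factors in Montgomery form.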
## Building

### Desktop / server (default)

```bash
# Build everything (opt-level=2, CT-safe, all algos + sca-protected on)
cargo build --release -p quantica

# Build with no SCA countermeasures (faster, dudect baseline)
cargo build --release -p quantica \
    --no-default-features --features std,ml-kem,ml-dsa,slh-dsa

# Run all tests (ACVP vectors, secret-module, masked/shuffle round-trips)
cargo test --release -p quantica

# Generate the rustdoc API reference
cargo doc -p quantica --no-deps --open
```

### `no_std` / bare-metal cross-compile

```bash
# Install the targets we care about
rustup target add thumbv7em-none-eabihf        # Cortex-M4/M7
rustup target add thumbv6m-none-eabi           # Cortex-M0/M0+
rustup target add thumbv8m.main-none-eabihf    # Cortex-M33 (TrustZone)
rustup target add riscv32imc-unknown-none-elf  # ESP32-C3, SiFive

# Cross-compile no_std + all 3 algos + sca-protected
cargo build -p quantica \
    --no-default-features \
    --features ml-kem,ml-dsa,slh-dsa,sca-protected \
    --target thumbv7em-none-eabihf
```

In `no_std` mode the crate still depends on `alloc` (keys, ciphertexts and signatures are `Vec`-backed). The OS-backed `OsRng` is unavailable; provide your own `CryptoRng` implementation that delegates to a hardware TRNG.

### Cargo profiles

The workspace `Cargo.toml` declares three profiles:

| Profile | opt-level | CT guarantee | Use case |
|--------------------|-----------|-------------------------------------|-----------------------------|
| `release` | 2 | Yes (Rust source-level) | Desktop / server production |
| `release-embedded` | z + abort | Yes (asm CT backends) | Embedded, minimum size |
| `release-bench` | 3 | **No** (LLVM may break CT patterns) | Benchmarks only |

> ⚠️ `opt-level=3` can defeat constant-time guarantees: LLVM may convert
> bitwise mask patterns into conditional memory accesses. Always use
> `opt-level=2` or lower for security-critical builds, or rely on the
> assembly CT backends from `silentops` (`asm-aarch64`, `asm-thumbv7`,
> `asm-thumbv6m`, `asm-riscv32`) which bypass the compiler entirely.

## Test validation

All implementations are validated against three independent vector suites, all checked into `tests/vectors/`:

### NIST ACVP: happy-path conformance

Official vectors from [`usnistgov/ACVP-Server`](https://github.com/usnistgov/ACVP-Server). These are the NIST-authored known-answer tests that every FIPS 203 / 204 / 205 claimant must pass.

| Algorithm | KeyGen | SigGen / Encaps | SigVer / Decaps |
|-----------|---------|-----------------|-----------------|
| ML-KEM | 75 / 75 | 75 / 75 | 30 / 30 |
| ML-DSA | 15 / 15 | 15 / 15 | 30 / 30 |
| SLH-DSA | 18 / 18 | 1 / 1 (128f) | 3 / 3 (128f) |

(SLH-DSA SigGen / SigVer covered only on SHAKE-128f for test wall-clock reasons; all 6 parameter sets share the same code path and KeyGen is validated on every one.)

### Wycheproof: edge cases and negative tests

Vectors from the [C2SP/wycheproof](https://github.com/C2SP/wycheproof) project, covering malformed inputs, corrupted keys, truncated ciphertexts / signatures, out-of-range coefficients, and other edge cases the NIST happy-path vectors do not exercise. Each vector carries a `result` field (`valid`, `invalid`, or `acceptable`) against which our implementation's accept / reject decision is compared.

| Algorithm | Files | Vectors | Coverage |
|-----------|------:|--------:|----------------------------------------------|
| ML-KEM | 12 | ~1650 | 512 / 768 / 1024: Encaps + Decaps |
| ML-DSA | 9 | ~1020 | 44 / 65 / 87: Sign (seed + noseed) + Verify |
| **Total** | **21** | **~2 672** | |

### Custom negative / robustness tests

A hand-curated suite in `tests/negative.rs` targeting the specific error paths of each typed key wrapper: wrong-length inputs, silent wrong-result scenarios, FIPS 203 §7.2 encapsulation-key modulus check, FO-transform integrity under malformed ciphertexts, etc. Around 25 tests across the three algorithms.

### Running everything

```bash
cargo test --release -p quantica
```

### Policy on test suites

A necessary condition for adding a new cryptographic primitive to `quantica` is the availability of a public reference test suite for it. When a new peer-reviewed test corpus appears (a refreshed Wycheproof release, a new CAVP tranche, a community project like the IETF CFRG vectors), we re-import it and extend the test matrix accordingly; this is tracked as part of our ongoing crypto-research monitoring and is called out in the changelog.

## Examples

### Rust

```bash
cargo run --release -p quantica --example ml_kem_roundtrip
cargo run --release -p quantica --example ml_dsa_sign_verify
cargo run --release -p quantica --example slh_dsa_sign_verify
```

### C FFI

For C consumers, the `quantica_ffi` companion crate exports a C ABI around the three algorithms and ships a standalone `test_quantica.c` example program. The shared library is built by:

```bash
cargo build --release -p quantica_ffi
```

and the generated C header (`quantica.h`) is kept under the FFI crate's `include/` directory.
## Module map

```
quantica/
├── Cargo.toml
├── README.md                (this file)
├── src/
│   ├── lib.rs               Re-exports the algo modules behind features
│   ├── secret.rs            SecretBytes / SecretArray (Zeroize-on-Drop)
│   ├── sha3.rs              Shared Keccak-f[1600] core (KeccakState)
│   ├── ml_kem/              FIPS 203 ML-KEM (feature `ml-kem`)
│   │   ├── mod.rs           Public API: MlKem<P>, typed wrappers
│   │   ├── params.rs        MlKem512, MlKem768, MlKem1024
│   │   ├── sha3.rs          Thin wrappers: H, G, J, PRF, Xof
│   │   ├── ntt.rs           NTT mod 3329 (full Montgomery, i16)
│   │   ├── encode.rs        ByteEncode/Decode, Compress/Decompress
│   │   ├── sample.rs        SampleNTT, SamplePolyCBD
│   │   ├── kpke.rs          K-PKE (KeyGen, Encrypt, Decrypt)
│   │   ├── kem.rs           ML-KEM + double-decaps + dk integrity (DFA)
│   │   ├── rng.rs           CryptoRng trait + OsRng (std-only)
│   │   ├── masked.rs        First-order additive masking (DPA)
│   │   └── shuffle.rs       Fisher-Yates shuffled NTT (SPA)
│   ├── ml_dsa/              FIPS 204 ML-DSA (feature `ml-dsa`)
│   │   ├── mod.rs           Public API: MlDsa<P>, typed wrappers
│   │   ├── params.rs        MlDsa44, MlDsa65, MlDsa87
│   │   ├── sha3.rs          Thin wrappers: SHAKE128/256, sha3_256/512
│   │   ├── ntt.rs           NTT mod 8 380 417 (Montgomery, i32)
│   │   ├── encode.rs        BitPack, pk/sk/sig encode/decode
│   │   ├── sample.rs        SampleInBall, RejNTTPoly, ExpandA/S/Mask
│   │   ├── decompose.rs     Power2Round, Decompose, HighBits, Hints
│   │   ├── dsa.rs           KeyGen, Sign (rejection loop, masked), Verify
│   │   ├── rng.rs           CryptoRng trait + OsRng (std-only)
│   │   ├── masked.rs        First-order additive masking (DPA)
│   │   └── shuffle.rs       Fisher-Yates shuffled NTT (SPA)
│   └── slh_dsa/             FIPS 205 SLH-DSA (feature `slh-dsa`)
│       ├── mod.rs           Public API: SlhDsa<P>, typed wrappers
│       ├── params.rs        6 SHAKE parameter sets
│       ├── sha3.rs          Shake256 streaming wrapper
│       ├── address.rs       32-byte ADRS structure
│       ├── hash.rs          H_msg, PRF, PRF_msg, T_l, H, F
│       ├── wots.rs          WOTS+ one-time signatures
│       ├── xmss.rs          XMSS Merkle trees
│       ├── hypertree.rs     Hypertree of XMSS trees
│       ├── fors.rs          FORS forest
│       ├── slh.rs           SLH-DSA top-level
│       └── rng.rs           CryptoRng trait + OsRng (std-only)
├── examples/
│   ├── ml_kem_roundtrip.rs
│   ├── ml_dsa_sign_verify.rs
│   └── slh_dsa_sign_verify.rs
└── tests/
    ├── ml_kem_kat.rs
    ├── ml_dsa_kat.rs
    ├── slh_dsa_kat.rs
    └── vectors/             NIST ACVP-Server JSON / .rsp vectors
```

## Known limitations

### Side-channel protection

* **`Vec` heap allocations**: secret-key buffers come from `alloc`, not from caller-provided fixed buffers. A future refactor will thread `&mut [u8]` everywhere for full bare-metal stack-only support.
* **`write_volatile` zeroization** is the strongest erasure available in safe-ish Rust without external crates, but is not formally guaranteed against every compiler optimization on every target.
* **No formal CT verification** yet (no ct-verif-style proof). The dudect harness and the Valgrind-based ctgrind harness (`quantica_bench/src/bin/ctgrind.rs`) give statistical and dynamic evidence, not proof.

### Standards conformance

* **HashML-DSA** (Algorithms 4 / 5) and **HashSLH-DSA** (Algorithm 23) pre-hash variants are structurally supported by the API but not tested. ACVP vectors with `hashAlg != "none"` are skipped.
* **SLH-DSA SHA2 parameter sets** are not implemented; only the 6 SHAKE-based sets are.
* **Hedged signing** is implemented, but only the deterministic variant (`rnd = 0x00^32` for ML-DSA, `opt_rand = pk.seed` for SLH-DSA) is tested against ACVP vectors.
* **No CAVP certification**: vectors come from the public NIST ACVP-Server GitHub mirror.

### Portability

* **`OsRng` is Linux-only**: it reads `/dev/urandom`. Windows / macOS builds need custom adapters (`BCryptGenRandom`, `SecRandomCopyBytes`). Embedded targets must supply a hardware-RNG `CryptoRng` impl regardless.
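As an illustration of the adapter the Portability note asks for, here is a minimal sketch. The real `CryptoRng` trait of `quantica` is not reproduced in this README, so everything below is a hypothetical stand-in: the trait shape (a single `fill_bytes` method), the `RngError` type and the `TrngAdapter` wrapper are assumptions made for the example, to be adapted to the actual trait. The word source is passed as a closure so the same adapter can wrap a memory-mapped TRNG register on a microcontroller or an OS call on a desktop target.

```rust
/// Hypothetical stand-in for the crate's RNG trait (assumed shape).
pub trait CryptoRng {
    fn fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), RngError>;
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum RngError {
    /// The entropy source reported a fault or was not ready.
    HardwareFault,
}

/// Wraps any source of 32-bit random words, e.g. a closure that polls the
/// peripheral's ready flag and then does a volatile read of its data register.
pub struct TrngAdapter<F: FnMut() -> Option<u32>> {
    read_word: F,
}

impl<F: FnMut() -> Option<u32>> TrngAdapter<F> {
    pub fn new(read_word: F) -> Self {
        Self { read_word }
    }
}

impl<F: FnMut() -> Option<u32>> CryptoRng for TrngAdapter<F> {
    fn fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), RngError> {
        for chunk in dest.chunks_mut(4) {
            // One fresh word per 4-byte chunk; fail closed if the source fails.
            let word = (self.read_word)().ok_or(RngError::HardwareFault)?;
            // The last chunk may be shorter than 4 bytes.
            chunk.copy_from_slice(&word.to_le_bytes()[..chunk.len()]);
        }
        Ok(())
    }
}
```

On error the destination buffer may be partially filled, so a caller following this sketch should treat the whole buffer as unusable and propagate the error rather than continue with a short read.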
### Testing

* **Partial ACVP coverage**: 1–25 vectors per operation, not the whole vector set, to keep test wall-clock low. Wycheproof is imported in full.
* **No SLH-DSA Wycheproof corpus exists yet**: SLH-DSA validation currently rests on NIST ACVP vectors plus the custom negative suite; a Wycheproof import will be added when the upstream project ships vectors for FIPS 205.
* **No fuzzing** yet. CI is limited to the Forgejo Actions qemu cross-test workflow (roadmap item `T3-B`).

## Roadmap

The full hardening roadmap lives under `quantica/doc/sca/` (HTML rendered by `./gendoc.sh quantica`). The summary below is the project's **living plan towards a third-party evaluation**, indexed by Tier item identifier so each row maps to a stable cross-reference in the source code, the SCA annex and the workspace `SECURITY.md` lifecycle.

Status legend: ✅ done · 🔧 in progress · 📋 planned · 💤 deferred.

### Tier 1: Active vulnerabilities (critical path)

Items addressing documented attack vectors that affect the security of the implemented algorithms. The bulk of these are findings from the 2026-04-21 crypto-research monitoring pass on the SLH-DSA fault surface, plus the ML-DSA mask-hygiene gaps surfaced by Hermelink CRYPTO 2025.

| Id | Item | Status |
|------|------|--------|
| T1-A | A3: refresh ML-DSA shares (`s1`, `s2`, `t0`) at the start of every rejection iteration | ✅ |
| T1-B | Hermelink 2025/276 audit pass on `ml_dsa::masked` (information-theoretic leakage map) | ✅ |
| T1-C | FORS signature redundancy (anti-grafting-tree forgery, Castelnovi 2018, SLasH-DSA 2025) | ✅ |
| T1-D | Full-tree streaming FORS sign (defeats template idx-recovery, Kannwischer 2018) | ✅ |
| T1-E | Digest → FORS-indices integrity check | ✅ |
| T1-F | Constant-time `fors_pk_from_sig` (prerequisite for T1-C) | ✅ |

### Tier 2: Hardening for evaluation

| Id | Item | Status |
|------|------|--------|
| T2-A | Explicit `ct_grind::unpoison` after the algorithmic unmask of `w1`, `h`, `z` in ML-DSA | 📋 |
| T2-B | Branch-free `generate_permutation` in ML-DSA shuffle (Feistel- or Floyd-based) | 📋 |
| T2-C | Documentation traceability: convert `tools/ctgrind.supp` into a "resolved-findings" annex once T2-A and T2-B land | 📋 |
| T2-D | Explicit `ct_grind::unpoison` of `R`, `digest`, FORS / WOTS / XMSS indices in SLH-DSA | 📋 |

### Tier 3: Verification tooling

| Id | Item | Status |
|------|------|--------|
| T3-A | Cross-arch test infrastructure: qemu-user matrix (aarch64 / armv7 / riscv64 Linux) via `cross` + qemu-system matrix (riscv32imc / riscv32imac / thumbv6m / thumbv7em bare-metal) + custom semihosting host↔guest vector-streaming protocol so KAT corpora are not compiled into the bare-metal image. `thumbv8m.main` (M33 / STM32U5) is wired in tree but currently sidelined by an upstream rustc + cortex-m-rt link issue; `asm-thumbv7` coverage is preserved via `thumbv7em`. | ✅ |
| T3-B | Codeberg Forgejo Actions workflow (qemu-user + qemu-system + qemu-vector jobs); replaces the originally scoped Gitea / `turtle.local` plan after the project moved its public CI to codeberg.org. | ✅ |

### Tier 4: Deferred / beyond the current evaluation scope

| Id | Item | Status |
|------|------|--------|
| T4-A | SUCRE (TCHES 2026.1) shuffle-and-unmask migration evaluation: 4–6× speedup vs. the current Coron 2024/1149 masked-`y` pipeline | 💤 |
| T4-B | First-order Boolean masking of the SHAKE PRF in SLH-DSA (Fluhrer 2024/500, 1.7× overhead) | 💤 |
| T4-C | Higher-order arithmetic masking on ML-DSA `s1`/`s2`/`t0` (2-share, CC EAL4+ grade) | 💤 |
| T4-D | Higher-order masking on ML-KEM `s` (3-share, CC EAL4+ grade) | 💤 |
| T4-E | Hardened ML-KEM FO comparison against the eprint 2025/1577 template attack | 💤 |
| T4-F | Twiddle-factor masking inside the ML-KEM shuffled NTT (additional DPA defence layer) | 💤 |
| T4-G | SHA2-based SLH-DSA parameter sets (FIPS 205 Section 11.2); currently SHAKE only | 💤 |
| T4-H | HashML-DSA / HashSLH-DSA pre-hash variants (FIPS 204 §5.4, FIPS 205 Algorithm 23) | 💤 |

### Tier 5: Documentation pass

Cross-cutting documentation work, orthogonal to the cryptographic tiers above. Planned (not deferred); timing to be sequenced against the external evaluation calendar.

| Id | Item | Status |
|------|------|--------|
| T5-A | Workspace-wide doc pass (`quantica` + `arcana`): neutralise evaluation-target references, replacing any CSPN-/ANSSI-specific language with generic *evaluation / certification / audit* terminology so the doc set reads cleanly against any third-party reviewer | ✅ |
| T5-B | TOC review across the workspace doc set (`doc/TOC.md` contract + per-crate `doc/` trees): reorder chapters into 4 thematic clusters; rename ch.8 "Side-channel countermeasures" → "(summary)" + add `Per-algorithm deep dives` H3 bridging to the Sphinx pack | ✅ |

### Already shipped (trace-back)

Items below were entries on a prior version of this roadmap and have since been delivered. They are kept here so a third-party reviewer can match each closed concern to its commit without re-opening it.

| Item | Status |
|------|--------|
| ML-DSA `sca-masked-y` pipeline (Coron 2024/1149) | ✅ commit `3149b68` |
| ML-DSA `sca-ct-rejection` (constant-time rejection loop) | ✅ |
| ML-DSA first-order arithmetic masking on `s1`/`s2`/`t0` + Fisher-Yates shuffled NTT | ✅ |
| ML-DSA seven RAM-reduction features (179 KB → ~17 KB peak Sign stack) | ✅ |
| ML-KEM first-order arithmetic masking on `s`/`e` + shuffled NTT | ✅ |
| ML-KEM double-decaps + `H(ek)` integrity DFA | ✅ |
| ML-KEM branchless fault-fallback (closes the timing oracle on the fault path) | ✅ commit `5f0bdad` |
| SLH-DSA iterative BDS FORS treehash (256 KiB → 448 B per call) | ✅ commit `fff156f` |
| SLH-DSA streaming signature output (one allocation, `*_into` variants throughout) | ✅ commit `1eb224f` |
| `silentops` x86_64 / aarch64 inline-asm CT backends | ✅ commit `90a1168` |
| `silentops::ct_grind::poison`/`unpoison` Valgrind instrumentation | ✅ commit `90a1168` |
| Per-algorithm ctgrind harness (`quantica_bench/src/bin/ctgrind.rs`) + suppression file | ✅ commit `241aeb1` |
| Stack-painting memcheck tool (`quantica_bench/src/bin/memcheck.rs`) | ✅ commit `e21d6d0` |
| Static stack-size analysis via nightly `-Z emit-stack-sizes` (`tools/stack-sizes.sh`) | ✅ commit `5f30e69` |
| Sphinx side-channel doc pack with bibliography + per-algorithm countermeasure chapters | ✅ commit `32a76bd` |
| Self-contained crate-owned `quantica/doc/` tree (Option B layout) | ✅ commit `5fc8c9b` |
| T1-F: Constant-time `fors_pk_from_sig` (prereq for T1-C FORS redundancy) | ✅ commit `1fe4b18` |
| T1-C: FORS recompute-and-compare redundancy (`sca-fors-redundancy` feature, SLH-DSA grafting-tree defence) | ✅ commit `c6a916e` |
| API cleanup post-T1C: single CT `fors_pk_from_sig`, unified `slh_sign_internal`, `&Adrs` template | ✅ commit `a8d9a4a` |
| T1-D: Full-tree streaming FORS sign (`sca-fors-dummy-siblings` feature, anti-template Kannwischer 2018) | ✅ commit `5d779c6` |
| T1-E: Digest → FORS-indices integrity check (`sca-fors-indices-check` feature, anti-fault Castelnovi 2018) | ✅ commit `8ff4e01` |
| T1-B: Hermelink 2025/276 audit annex on `ml_dsa::masked` (doc-only, classifies leak surface) | ✅ commit `d73dc70` |
| T1-A: Per-iteration mask refresh in ML-DSA rejection loop (head-of-loop, Hermelink §4 prescription) | ✅ commit `738ec73` |
| T5-A: Workspace-wide doc pass: neutralise evaluation-target language (CSPN/ANSSI → generic evaluation) | ✅ commit `eac79f5` |
| T5-B: TOC reorder (4 thematic clusters) + SCA chapter summary-bridge to per-algo deep dives | ✅ this branch |
| T3-A: Cross-arch test infrastructure (qemu-user matrix + qemu-system bare-metal matrix + semihosting vector-streaming protocol) | ✅ commits `ce06085`, `fe9b3d4`, `617120f`, `dd7f867`, `1d7b6fa` |
| T3-B: Codeberg Forgejo Actions workflow (`.forgejo/workflows/qemu-cross-tests.yml`) covering all three qemu layers | ✅ this branch |

### Suggested execution order (critical path)

1. **Sprint 1**: T1-F + T1-C. Closes the dominant published attack on SLH-DSA (Castelnovi grafting / SLasH-DSA Rowhammer). T1-F is the prerequisite (CT `fors_pk_from_sig`), T1-C the redundancy itself.
2. **Sprint 2**: T1-D + T1-E + T1-B. Completes the FORS hardening (template + fault on idx) and pushes the Hermelink leakage checklist through `ml_dsa::masked`.
3. **Sprint 3**: T1-A + T2-A + T2-B. Closes the ML-DSA higher-order recombination + the last two ctgrind suppressions for ML-DSA.
4. **Sprint 4**: T2-D + T3-A + T3-B + T2-C. Covers ctgrind unpoisons for SLH-DSA, CT3 QEMU portability, CI wiring, and the documentation conversion of `tools/ctgrind.supp` to a "resolved-findings" annex. The evaluation doc pack ships at the end of this sprint.

Effort estimate: ~3 weeks of dev for Tier 1 + Tier 2 (T1-C dominates, the rest are mostly mechanical), plus ~1 week for the Tier 3 verification wiring.

Updates to this table are tracked in the change log of `quantica/doc/sca/index.rst`.

## References

* [NIST FIPS 203](https://doi.org/10.6028/NIST.FIPS.203): ML-KEM
* [NIST FIPS 204](https://doi.org/10.6028/NIST.FIPS.204): ML-DSA
* [NIST FIPS 205](https://doi.org/10.6028/NIST.FIPS.205): SLH-DSA
* [NIST ACVP-Server](https://github.com/usnistgov/ACVP-Server): official conformance test vectors
* [C2SP / Wycheproof](https://github.com/C2SP/wycheproof): edge-case and negative test vectors
* Reparaz, Balasch, Verbauwhede (2017), *"dude, is my code constant time?"* (the dudect methodology used in `silentops::verify`)

## License

Apache-2.0.