.. # ============================================================================= # C O P Y R I G H T # ----------------------------------------------------------------------------- # Copyright (c) 2026 by ETAS GmbH. All rights reserved. # # The reproduction, distribution and utilization of this file as # well as the communication of its contents to others without express # authorization is prohibited. Offenders will be held liable for the # payment of damages. All rights reserved in the event of the grant # of a patent, utility model or design. # ============================================================================= .. _crypto_design_decisions: Design Decisions ================ ABI Compatibility for IPC Layer (Deferred Post-Stabilisation) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. dec_rec:: ABI Compatibility for IPC Layer (Deferred Post-Stabilisation) :id: dec_rec__crypto__no_abi_compatibility_ipc :status: proposed :context: doc__crypto_architecture :decision: ABI compatibility for the IPC layer between the crypto library and daemon is deferred until the API and wire format are stable. Once stable, a versioned wire format (FlatBuffers is the primary candidate) will be introduced to allow independent deployment of library and daemon versions. .. :affects: comp__crypto ABI compatibility for the IPC layer between the crypto library and daemon is deferred until the API and wire format are stable. Once stable, a versioned wire format (FlatBuffers is the primary candidate) will be introduced to allow independent deployment of library and daemon versions. Context ------- The crypto module uses an IPC layer to communicate between the client library and the daemon process. During initial development, maintaining strict ABI compatibility would impose versioning overhead (protocol negotiation, backward/forward compatibility handling) before the interfaces are settled. 
The decision is therefore to defer ABI compatibility until the stack is stable, at which point the investment is justified. Once the API and wire format are frozen, the following will be required: * A versioned wire format with forward/backward compatibility guarantees * Protocol negotiation mechanisms during connection establishment * Support for a defined compatibility window (e.g., N−1 daemon with N library) * Regression testing across supported version combinations Decision -------- ABI compatibility for the IPC layer is intentionally deferred for the initial pre-stable phase. Per machine, only one valid library-daemon combination is supported until the API stabilises. After stabilisation, a versioned wire format will be introduced — this decision will be revisited and finalised at that point. Consequences ------------ **Positive:** * Simplified implementation during pre-stable phase — no protocol versioning overhead * Reduced testing surface — only matching version pairs need validation * Faster initial development cycle — breaking changes can be made freely * Clearer deployment model for early adopters — single version per machine **Negative:** * Library and daemon must currently be updated together as a single unit * Cannot have multiple applications using different library versions on the same machine * No graceful degradation when versions mismatch during the deferred phase Alternatives Considered ----------------------- FlatBuffers ^^^^^^^^^^^ FlatBuffers is the primary candidate for the versioned wire format once stabilisation is reached. Advantages """""""""" * **Schema evolution** — fields can be added with default values without breaking existing serialised data; removed fields leave a gap that is safely skipped. * **Zero-copy access** — FlatBuffers tables are accessed in-place from the shared-memory buffer, aligning with the ``IMemoryAllocator`` zero-copy design. 
* **Deterministic layout** — table format is fully specified; no hidden heap allocation during access. * **Compact code generation** — generated C++ headers are ``noexcept``-friendly and free of ``std::function`` or ``std::string`` members. Disadvantages """"""""""""" * Final choice is deferred; other candidates (Protocol Buffers, Cap'n Proto, hand-rolled length-prefixed structs) are not excluded. Justification for the Decision ------------------------------ The decision to defer ABI compatibility is justified by the current pre-stable phase of development. Introducing versioning infrastructure before the interfaces are settled would add maintenance burden without benefit. The FlatBuffers candidate and this record preserve the intent and analysis so that the versioning work can proceed efficiently once the stack stabilises. --- CryptoResourceGuard Lifetime via Daemon-Side Reference Counting ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. dec_rec:: CryptoResourceGuard Lifetime via Daemon-Side Reference Counting :id: dec_rec__crypto__cry_res_grd_lifetime :status: accepted :context: doc__crypto_architecture :decision: Transient key lifetime is managed exclusively in the daemon via reference counting. The guard holds a type-erased IPC release handle; Create*Context() increments the daemon ref-count atomically. The guard may be destroyed after Create*Context() returns. On client disconnect, the daemon bulk-frees all resources for that client. .. :affects: comp__crypto Transient key lifetime is managed exclusively in the daemon. The daemon ref-counts every ephemeral key; the client communicates changes via two IPC calls: ``Release(id)`` (guard destructor) and ``Create*Context(config with key_id)`` (context creation). Context ------- Transient crypto resources (keys, certificates) produced within an ``IKeyManagementContext`` or ``ICertificateManagementContext`` session are represented by a ``CryptoResourceGuard``. 
Key lifetime must survive all contexts actively using the key and be freed deterministically when neither a guard nor a context holds a reference. Decision -------- Key lifetime is managed in the daemon via a per-key reference count: .. list-table:: :header-rows: 1 :widths: 55 45 * - Event - Daemon action * - ``GenerateKey`` / ``DeriveKey`` / ``LoadKey`` / etc. - Creates key; ref = 1 * - ``Create*Context(config with key_id)`` - Validates key alive; ref++ * - Guard destroyed (``Release(id)`` IPC) - ref--; free key if ref == 0 * - Context destroyed - ref-- for each bound key * - Client disconnect (crash or normal exit) - Daemon bulk-frees all resources for that client This means: - The guard carries only the ``CryptoResourceId`` and a type-erased IPC release handle (``shared_ptr`` internally). - ``Create*Context()`` sends a single IPC call that validates the key and atomically records context ownership. If the guard was released before this call, the daemon returns ``kResourceNotFound`` — fail-fast, diagnosable behaviour. - **Crash safety**: on hard process termination (SIGKILL, power loss), no destructors run. The daemon detects the client disconnect and bulk-frees all resources. Daemon-side ref-counting is crash-safe by design. - ``BaseContextConfig`` carries only a ``CryptoResourceId`` for the key. **Application contract**: The guard must remain alive (``IsActive()`` returns true) at the moment ``Create*Context()`` is called. After that call returns successfully, the guard may be destroyed in any order relative to the context. .. code-block:: cpp auto key = key_mgmt->GenerateKey("AES-256").value(); CipherContextConfig config; config.SetAlgorithm("AES-256-GCM").SetKey(key).SetDirection(CipherDirection::kEncrypt); // guard must be alive here: auto cipher = ctx->CreateCipherContext(config).value(); // Daemon has incremented the key ref-count. Guard may now be destroyed. // ... use cipher ... 
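The reference-count table above can be illustrated with a small in-memory model. This is only a sketch: ``ToyDaemon``, ``Status``, ``KeyId`` and ``ClientId`` are hypothetical names invented for illustration and are not part of the crypto library or the daemon, and the sketch ignores IPC, threading and fixed-capacity storage.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for the daemon's per-key bookkeeping (illustration only).
using KeyId = std::uint32_t;
using ClientId = std::uint32_t;

enum class Status { kOk, kResourceNotFound };

class ToyDaemon {
public:
    // GenerateKey / DeriveKey / LoadKey: create the key with ref = 1.
    KeyId CreateKey(ClientId client) {
        const KeyId id = next_id_++;
        keys_[id] = Entry{client, 1U};
        return id;
    }

    // Create*Context: validate that the key is alive, then ref++.
    Status BindContext(KeyId id) {
        const auto it = keys_.find(id);
        if (it == keys_.end()) { return Status::kResourceNotFound; }
        ++it->second.refs;
        return Status::kOk;
    }

    // Guard Release(id) or context destruction: ref--, free the key at zero.
    Status Unref(KeyId id) {
        const auto it = keys_.find(id);
        if (it == keys_.end()) { return Status::kResourceNotFound; }
        if (--it->second.refs == 0U) { keys_.erase(it); }
        return Status::kOk;
    }

    // Client disconnect: bulk-free every key owned by that client.
    void Disconnect(ClientId client) {
        for (auto it = keys_.begin(); it != keys_.end();) {
            if (it->second.client == client) {
                it = keys_.erase(it);
            } else {
                ++it;
            }
        }
    }

    bool IsAlive(KeyId id) const { return keys_.count(id) != 0U; }

private:
    struct Entry {
        ClientId client;
        std::uint32_t refs;
    };
    std::map<KeyId, Entry> keys_;
    KeyId next_id_{1U};
};
```

In this model, destroying the guard after a successful bind leaves the key alive until the context also drops its reference, and binding after the last reference is gone yields ``kResourceNotFound``, which mirrors the application contract stated above.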
Explicit Release and Guard Synchronisation ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The daemon is the sole source of truth for resource validity. The guard's ``active_`` flag is a client-side hint, not a daemon query. **Normal path — destructor** (no application code needed): The guard destructor sends ``Release(id)`` IPC when active, silently swallowing the result (destructors cannot propagate errors). **Explicit path — synchronous error handling**: When the application needs to explicitly confirm release before destruction, call ``guard.Release()``: .. code-block:: cpp auto result = guard.Release(); // explicit, returns Result if (result.has_value()) { // no-op destructor } **Persist path — copy semantics**: ``IKeyManagementContext::PersistKey(const CryptoResourceId&, slot)`` takes the ephemeral key by ID (copy semantics). The guard that produced the ID remains active after ``PersistKey`` returns. The ephemeral copy continues to exist until the guard is released or goes out of scope; the persisted slot copy is independent: .. code-block:: cpp auto key = key_mgmt->GenerateKey((GenerateKeyParams{}.SetAlgorithm("AES-256"))).value(); key_mgmt->PersistKey(key, slot).value(); // key (guard) remains active // 'key' still holds the ephemeral copy — explicit Release() or destructor frees it // 'slot' is the persistent copy — use slot handle for all future durable operations ``SetKey`` and Implicit Guard Conversion ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``CryptoResourceGuard`` provides an implicit conversion operator to ``const CryptoResourceId&``. All key-accepting config structs expose a single ``SetKey(const CryptoResourceId& k)`` overload. Passing a guard works directly via this conversion — no extra overload is needed: .. 
code-block:: cpp auto key = key_mgmt->GenerateKey((GenerateKeyParams{}.SetAlgorithm("AES-256"))).value(); // guard must be alive when CreateCipherContext is called: auto cipher = ctx->CreateCipherContext( CipherContextConfig{}.SetAlgorithm("AES-256-GCM").SetKey(key) .SetDirection(CipherDirection::kEncrypt)).value(); // Daemon incremented key ref-count. Guard may now be independently destroyed. cipher->Init(iv); cipher->Finalize(out_span); Consequences ------------ **Positive:** * Single source of truth for key lifetime: the daemon. No client-side ``shared_ptr`` propagation across API boundaries. * Crash-safe: daemon bulk-frees on disconnect regardless of client destructor state. * Fail-fast: releasing a guard before ``Create*Context`` is called returns ``kResourceNotFound`` — diagnosable, deterministic behaviour. * ``BaseContextConfig`` carries only ``CryptoResourceId`` — minimal and flat. **Negative:** * Guards must remain alive until ``Create*Context()`` returns — an intuitive but explicit application contract. * Daemon must maintain per-key ref-counts for all active ephemeral keys. Memory cost is bounded by max concurrent keys (deployment-time constant). --- Shared-Connection Anchor for Persistent Resource ID Stability ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. dec_rec:: Shared-Connection Anchor for Persistent Resource ID Stability :id: dec_rec__crypto__conn_anchor_persistent_ids :status: accepted :context: doc__crypto_architecture :decision: The IPC connection to the crypto daemon is the anchor for kPersistent resource IDs. All ICryptoStack instances within the same application share a single connection, ensuring that the same resource name always resolves to the same numeric CryptoResourceId regardless of which stack or context resolves it. ID assignment is on-demand per connection, and each application gets its own isolated connection with an independent ID namespace. .. 
:affects: comp__crypto The application-level daemon connection is a shared anchor across all ``ICryptoStack`` instances within the same application. Persistent resource IDs (``kPersistent``) are assigned on-demand per connection and remain stable for the connection's lifetime, independent of any individual ``ICryptoContext``. Context ------- Persistent resources (key slots, stored certificates, CRLs) outlive any individual context or session. Tying the validity of their ``CryptoResourceId`` handle to the lifetime of a specific ``ICryptoContext`` would couple context lifetime management to resource identifier validity unnecessarily. The desired property is that a resolved persistent ID remains usable as long as the application is connected to the daemon — independent of which context or stack resolved it. Decision -------- The **connection** (the underlying transport connection to the crypto daemon) is the anchor for ``kPersistent`` resource IDs: * All ``ICryptoStack`` instances within the same application share a single connection. Multiple stacks always resolve the same resource name to the same numeric ``CryptoResourceId``. * ID assignment is **on-demand**: a numeric ``id`` is assigned when a resource is first resolved by any stack on the connection. Subsequent resolutions of the same name by any stack return the identical ``id``. * Each application gets its own isolated connection with an independent ID namespace — IDs are not globally stable across application restarts or across different applications. This preserves the security property: IDs are per-application-unique and not externally predictable. * Connection ID tracking is managed by the underlying IPC transport layer as an implementation detail, not exposed to library users. Lifetime Strategy (Deployment Choice) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The connection lifetime is a deployment-level choice between two strategies: 1. 
**Reference-counted (early cleanup):** Connection is destroyed when the last ``ICryptoStack`` referencing it is destroyed. Daemon-side resources are freed immediately. Suitable when stacks are short-lived and memory footprint must be minimised. 2. **Fixed (application lifetime):** Connection is created once at application startup and destroyed when the application terminates. Daemon resources are held in daemon memory for the application lifetime. Simpler to reason about; preferred when stacks are frequently created and destroyed. Both strategies are implementation details of the transport layer — the public ``ICryptoStack`` and ``ICryptoContext`` API is identical in either case. Cross-application connection sharing is **not allowed**; ``IConnectionAnchor`` remains a private implementation type. Scope ^^^^^ Applies to ``kPersistent`` resources only: ``kKeySlot``, ``kCertificate``, ``kCertSlot``, ``kVerificationTrustStore``. Ephemeral (``kKey``) IDs remain session-scoped (valid only within the ``IKeyManagementContext`` session that produced them). IPC Schema ^^^^^^^^^^ Whether per-connection ID assignment and the registry require new proto messages or can reuse the existing ``CryptoResourceId`` wire format is an open implementation item for the daemon development to resolve before the daemon is built. Consequences ------------ **Positive:** * Independent context lifetime for persistent resources — ``ICryptoContext`` can be destroyed after resolution without invalidating the ``CryptoResourceId``. * Stable, deterministic IDs within an application — same name always resolves to the same ``id`` regardless of which stack or context resolves it. * No security leakage — IDs are per-application-unique, on-demand assigned, not globally stable or predictable from outside the application. * Applications can share resolved ``CryptoResourceId`` values between stacks without re-resolving. 
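The on-demand, per-connection ID assignment described in the Decision can be sketched as a name-to-ID map owned by each connection. ``ToyConnectionRegistry`` is a hypothetical name used only for illustration. The sketch uses ``std::string`` and ``std::map`` for brevity, and a sequential counter that makes IDs predictable, so it shows the stability property only and is not a model of the real daemon's ID generation.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-connection registry (illustration only).
class ToyConnectionRegistry {
public:
    // Returns the id already assigned to `name` on this connection,
    // or assigns the next free id on first resolution.
    std::uint32_t Resolve(const std::string& name) {
        const auto it = ids_.find(name);
        if (it != ids_.end()) { return it->second; }
        const std::uint32_t id = next_id_++;  // sequential for the sketch only
        ids_.emplace(name, id);
        return id;
    }

    // Number of distinct persistent resources resolved so far.
    std::size_t Size() const { return ids_.size(); }

private:
    std::map<std::string, std::uint32_t> ids_;
    std::uint32_t next_id_{1U};
};
```

Two registry instances stand in for two applications: each keeps its own namespace, so an ID obtained on one connection carries no meaning on the other.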
**Negative:** * Daemon must maintain a per-connection ID registry (map from resource name to numeric ``id``). Memory cost is bounded by the number of distinct persistent resources accessed per application. * With fixed-lifetime strategy: daemon memory for resolved IDs is held for the entire application lifetime even if the IDs are no longer used. --- ``AlgorithmId`` as ``FixedCapacityString<64>`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. dec_rec:: AlgorithmId Represented as FixedCapacityString<64> :id: dec_rec__crypto__alg_id_fixed_cap_str :status: accepted :context: doc__crypto_architecture :decision: ``AlgorithmId`` is defined as ``FixedCapacityString<64>`` rather than a compile-time enum or std::string, providing open-set extensibility with deterministic stack allocation. 64 bytes covers the longest currently-known PQC identifier (e.g., "SLH-DSA-SHA2-128s") with comfortable headroom. All constructors and assignments are noexcept. .. :affects: comp__crypto ``AlgorithmId`` is defined as ``FixedCapacityString<64>`` rather than a compile-time enum or ``std::string``, providing open-set extensibility with deterministic stack allocation. Context ------- Algorithm identifiers must accommodate current algorithms (e.g., ``"AES-256-GCM"``, ``"SHA-256"``, ``"SLH-DSA-SHA2-128s"``), future PQC schemes, and provider-specific extensions — an open set that cannot be enumerated at compile time. Decision -------- Three candidate representations were evaluated: 1. **``enum class AlgorithmId``** — type-safe, zero overhead, but a *closed* set. Adding a new PQC algorithm requires recompiling the library and all callers. Incompatible with the extensibility goal and with runtime-configured providers. 2. **``std::string``** — open set, but heap-allocating. Every ``AlgorithmId`` value creates at least one heap allocation. Violates MISRA A18-5-1, incompatible with ASIL ``noexcept`` destructors, and introduces non-deterministic WCET. 3. 
**``FixedCapacityString<64>``** — open set, stack-allocated, exception-free (oversized input silently truncated, ``truncated()`` flag set). 64 bytes covers the longest currently-known PQC identifier (``"SLH-DSA-SHA2-128s"`` = 17 characters) with comfortable headroom. All constructors and assignments are ``noexcept``. ``FixedCapacityString<64>`` (option 3) was selected. The same rationale applies to ``ResourceId`` (``FixedCapacityString<64>``) and ``ProviderInfo::name`` (``FixedCapacityString<32>``). Consequences ------------ **Positive:** * Zero heap allocation for any algorithm or resource identifier. * Open set — new PQC algorithms deploy at daemon level; no library recompile needed. * ``noexcept`` constructors and assignments — compatible with ASIL containers. * ``GetAlgorithm()``, ``GetAllowedAlgorithm()``, ``GetPublicKeyAlgorithm()`` are now ``const noexcept``, enabling use in safety-annotated code. **Negative:** * 64-byte fixed storage regardless of actual string length (minor waste for short names). * Silent truncation on overflow — callers must check ``truncated()`` when constructing from untrusted input. * No compile-time algorithm validation — typos become runtime errors. --- Synchronous-Only IPC Model ~~~~~~~~~~~~~~~~~~~~~~~~~~ .. dec_rec:: Synchronous-Only IPC Model :id: dec_rec__crypto__synchronous_ipc :status: accepted :context: doc__crypto_architecture :decision: All IPC calls between the client library and the crypto daemon are synchronous (blocking). No callback, future, or async-notify pattern is exposed in the V1 public API. .. :affects: comp__crypto All IPC calls between the client library and the crypto daemon are synchronous (blocking). No callback, future, or async-notify pattern is exposed in the V1 public API. Context ------- A synchronous model blocks the calling thread for the full IPC round-trip plus provider execution. 
The alternatives introduce heap usage, threading complexity, or event-loop coupling that are incompatible with MISRA and automotive middleware requirements. Decision -------- All IPC calls are synchronous (blocking). Asynchronous alternatives were evaluated and rejected: * **Future/Promise API** — future-wrapped ``Result`` return types; the daemon uses a worker-thread pool. This would increase code complexity. * **Callback model** — caller provides a ``std::function`` callback that is invoked on completion. ``std::function`` violates MISRA A18-5-1 (heap). Custom fixed-size delegate types are possible but add significant complexity. * **Polling / event-loop** — caller polls a completion token. Couples the crypto API to the application event loop; not idiomatic for automotive middleware. Consequences ------------ **Positive:** * Simple, predictable call semantics — no threading model imposed on callers. * Full ``Result`` error propagation on every call. * MISRA-compliant — no ``std::function``, no heap, no thread creation in library. * Operation timeouts (``dec_rec__crypto__two_layer_timeout``) provide bounded blocking behaviour — the calling thread never blocks indefinitely. **Negative:** * Thread blocks for full operation duration including IPC + provider execution. * Potential priority inversion in RTOS environments with high-priority callers. * Cannot pipeline or batch multiple crypto operations. * Future async API (``GenerateKeyAsync``) must be added as an extension. The daemon-side ref-count model (see ``dec_rec__crypto__cry_res_grd_lifetime``) naturally accommodates async key generation: the daemon can atomically bind the key to the context when the async result is consumed. --- Context ``Reset()`` for Streaming Context Reuse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. 
dec_rec:: Context Reset() for Streaming Context Reuse :id: dec_rec__crypto__context_reset_reuse :status: accepted :context: doc__crypto_architecture :decision: ``IStreamingContext`` exposes a ``Reset()`` method returning the context to post-construction state after ``Finalize()``, preserving key, algorithm, and config bindings. .. :affects: comp__crypto ``IStreamingContext`` exposes a ``Reset()`` method returning the context to post-construction state after ``Finalize()``, preserving key, algorithm, and config bindings. This avoids repeated factory + IPC overhead for same-configuration repeated operations. Context ------- Streaming contexts (hash, sign, verify, encrypt, decrypt, MAC, AEAD) follow the ``Init()`` → ``Update()`` * → ``Finalize()`` state machine. Without reuse, each new operation requires: (1) ``ICryptoContext::Create*Context()`` IPC call, (2) ``Init()`` IPC call. For high-frequency operations (e.g., per-frame hashing, per-message MAC), this doubles the IPC overhead. Decision -------- ``Reset()`` is added to ``IStreamingContext``. Alternatives evaluated: * **Destroy and re-create** — simplest, but doubles IPC cost per operation. * **``Reset()`` on ``IStreamingContext``** — single IPC call to the daemon to clear provider-side state; key, algorithm, config, and session handle remain valid. State machine transitions: ``kFinalized`` → ``kCreated`` on success. ``kContextResetFailed`` (``0x01070003``) reported on daemon-side failure. * **Context pooling in the library** — a pool of pre-created contexts returned to callers. Adds ~200 LOC of pool management, thread-safety concerns, and a fixed pool bound that may be too large or too small for different deployments. Consequences ------------ **Positive:** * Halves IPC round-trips for high-frequency same-configuration operations. * No API surface change — ``Reset()`` is additive; callers that destroy and re-create continue to work unchanged. * Key, algorithm, and config bindings preserved — no re-configuration needed. * Single additional error code (``kContextResetFailed``) for daemon failure path. 
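The ``Reset()`` transition in the Decision can be sketched as a tiny client-side state machine. ``ToyStreamingContext``, ``State`` and ``Error`` are hypothetical names. The state names ``kCreated``, ``kUpdating`` and ``kFinalized`` and the ``kInvalidOperation`` error are taken from this record, but the assumption that ``Init()`` moves the context directly to ``kUpdating`` is made only for this sketch. IPC and daemon-side failures (``kContextResetFailed``) are not modelled.

```cpp
// Hypothetical model of the streaming state machine (illustration only).
enum class State { kCreated, kUpdating, kFinalized };
enum class Error { kNone, kInvalidOperation };

class ToyStreamingContext {
public:
    Error Init()     { return Move(State::kCreated, State::kUpdating); }
    Error Update()   { return Move(State::kUpdating, State::kUpdating); }
    Error Finalize() { return Move(State::kUpdating, State::kFinalized); }
    // Reset is only legal after Finalize; bindings (key, algorithm, config)
    // would be kept by the real context and are not represented here.
    Error Reset()    { return Move(State::kFinalized, State::kCreated); }

    State state() const { return state_; }

private:
    // Performs the transition only when the context is in the expected state;
    // otherwise the state is left untouched and an error is returned.
    Error Move(State from, State to) {
        if (state_ != from) { return Error::kInvalidOperation; }
        state_ = to;
        return Error::kNone;
    }

    State state_{State::kCreated};
};
```

A caller that hashes many frames with the same configuration would loop over ``Init()``, ``Update()``, ``Finalize()``, ``Reset()`` on one context object instead of creating a new context per frame.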
**Negative:** * Daemon must track context state and clear provider-side buffers on ``Reset()``. * Callers must ensure ``Finalize()`` is called before ``Reset()``; calling ``Reset()`` in ``kUpdating`` state returns ``kInvalidOperation``. --- Two-Layer Per-Call Timeout with Daemon-Side Enforcement ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. dec_rec:: Two-Layer Per-Call Timeout with Daemon-Side Enforcement :id: dec_rec__crypto__two_layer_timeout :status: accepted :context: doc__crypto_architecture :decision: Two-layer timeout model selected for bounded behaviour at both layers. .. :affects: comp__crypto Operation timeouts are enforced at two layers — a stack-wide default in ``CryptoStackConfig`` and a per-context override in ``BaseContextConfig`` with an explicit ``DisableTimeout()`` escape — with the daemon as the enforcement point. Client threads unblock on timeout; timed-out contexts transition to ``kError``. Context ------- Without timeouts, a hung HSM or stalled IPC channel blocks the calling thread indefinitely — a safety violation for ISO 26262 ASIL functions. A single global timeout is insufficient because some legitimate operations (e.g., RSA-4096 key generation on software providers) require more time than typical operations. Client-side-only timeouts fail to release daemon-held resources when the deadline expires. Decision -------- Three timeout models were evaluated: 1. **Client-side only (``std::future::wait_for``)** — client unblocks after deadline, but the daemon continues executing the stalled operation, holding the resource. Does not provide bounded daemon resource usage. 2. **Daemon-side only (watchdog thread per context)** — daemon kills stalled operations after timeout and notifies client via error response. Provides bounded resource usage but requires the daemon to maintain one watchdog timer per in-flight context. 3. 
**Two-layer: client deadline + daemon enforcement** — client passes timeout value on each IPC call; daemon enforces the deadline server-side. If deadline expires, daemon transitions context to error, releases provider resources, and returns ``kOperationTimedOut`` to client. ``DisableTimeout()`` is available for legitimately long operations (e.g., RSA-4096 key generation on software providers). Model 3 was selected for bounded behaviour at both layers. Consequences ------------ **Positive:** * Guaranteed bounded execution — no call blocks indefinitely. * Daemon-side enforcement means provider resources are released on timeout, not just the client thread. * Per-context override allows fine-grained tuning without changing global config. * ``DisableTimeout()`` supports legitimate long-running operations without disabling safety globally. * Satisfies ISO 26262 Part 6 Table 1 (bounded execution time for safety functions). **Negative:** * Daemon must maintain a per-context deadline and cancel infrastructure. * ``DisableTimeout()`` is a safety escape hatch; misuse (e.g., in Safety paths) must be justified in the application's safety case. * Two-layer design adds complexity compared to a single global timeout. --- ``KeyOperationPermission`` as a Capability Bitmask ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. dec_rec:: KeyOperationPermission as a Capability Bitmask :id: dec_rec__crypto__key_op_permission_bitmask :status: accepted :context: doc__crypto_architecture :decision: Per-key usage restrictions are encoded as a ``KeyOperationPermission`` bitmask (named bit-flag capabilities: ``kEncrypt``, ``kDecrypt``, ``kSign``, ``kVerify``, ``kDerive``, ``kWrap``, ``kExport``, ``kGenerate``). The daemon enforces the bitmask at context creation time; operations not permitted by the key's bitmask return ``kKeyOperationNotPermitted``. .. 
:affects: comp__crypto Per-key usage restrictions are encoded as a ``KeyOperationPermission`` bitmask (named bit-flag capabilities: ``kEncrypt``, ``kDecrypt``, ``kSign``, ``kVerify``, ``kDerive``, ``kWrap``, ``kExport``, ``kGenerate``). The daemon enforces the bitmask at context creation time; operations not permitted by the key's bitmask return ``kKeyOperationNotPermitted``. Context ------- A key provisioned for signing must not be usable for encryption or export. Enforcing usage restrictions at the API level prevents misuse even if the caller has a valid resource handle. The representation must be heap-free and extensible to support future operations without breaking existing code. Decision -------- Three designs were evaluated: 1. **``std::unordered_set``** — flexible but heap-allocating. Violates MISRA A18-5-1, incompatible with ASIL ``noexcept`` requirements. 2. **``KeyPermissions`` struct with boolean flags** — type-safe, stack-allocated, but verbose (one field per operation). Extending to a new operation requires a breaking ABI change. 3. **Bitmask (``uint32_t`` or ``enum class`` with ``operator|``)** — compact, heap-free, trivially copyable, extensible (new bits added without breaking existing code). Standard pattern in OS capability models (POSIX, seL4). The bitmask (option 3) was selected for compactness and extensibility. Consequences ------------ **Positive:** * Compact representation — one ``uint32_t`` field in key metadata. * Extensible — new operations add a new named bit; existing bitmasks remain valid. * Daemon enforces at context creation — enforcement point is in the trusted boundary. * Two-layer access control: *who* (``ResolveResource()`` ACL, uid) and *what* (``KeyOperationPermission`` bitmask per key). **Negative:** * Bitmask operations (``&``, ``|``, ``~``) are less type-safe than a method API; callers can accidentally construct invalid combinations. 
* No runtime check that a bitmask value is a valid combination of named bits (mitigated by providing named constants and ``operator|`` overloads). --- Unified Cipher Context (No Encrypt/Decrypt or Symmetric/Asymmetric Split) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. dec_rec:: Unified Cipher Context Rather Than Separate Encrypt/Decrypt Plus Symmetric/Asymmetric Split :id: dec_rec__crypto__unified_enc_dec_contexts :status: accepted :context: doc__crypto_architecture :decision: A single ``ICipherContext`` type is used for both encryption and decryption, with direction configured via ``CipherContextConfig``. Sign, verify, MAC, and AEAD are each separate context types under the ``IStreamingContext`` hierarchy. The algorithm identifier and key type determine whether the operation is symmetric or asymmetric at runtime. Separate ``SymmetricEncryptContext`` / ``AsymmetricEncryptContext`` types were rejected for increased complexity without benefit. .. :affects: comp__crypto Encrypt and decrypt share a single ``ICipherContext`` (direction set via ``CipherContextConfig``). Sign, verify, MAC, and AEAD are each a separate context type under the ``IStreamingContext`` hierarchy. The algorithm identifier and key type determine whether the operation is symmetric or asymmetric at runtime. Separate ``SymmetricEncryptContext`` / ``AsymmetricEncryptContext`` types were rejected. Context ------- A split design creates a separate class per algorithm family. Adding PQC KEM support requires a new class even if the streaming interface is identical. This increases include weight and virtual dispatch hierarchy depth without benefit, and makes the API surface grow with the algorithm set. Decision -------- Three designs were evaluated: 1. **Symmetric/Asymmetric split** — mirrors algorithm families at the type level. Callers must select the correct context type; runtime algorithm selection within a family is still possible. 
Adds N context types per new algorithm family. Increases include weight and virtual dispatch hierarchy depth. 2. **Unified context per operation** — ``ICipherContext`` (encrypt + decrypt unified), ``ISignContext``, ``IVerifySignatureContext``, ``IMacContext``, ``IAeadContext``. Algorithm and key type determine behaviour. Adding ML-KEM or ML-DSA requires no new context type — only a new algorithm name string and provider support. 3. **Single universal context** — one ``ICryptoOperationContext`` with a mode parameter. Violates the Single Responsibility Principle (SRP); makes misuse easier (e.g., calling ``Sign()`` on a context configured for encryption). Rejected for API clarity reasons. Design 2 was selected for the balance of clarity and extensibility. Consequences ------------ **Positive:** * Flat, stable hierarchy — five concrete context types regardless of algorithm count. * PQC extensibility — ML-KEM, ML-DSA, SLH-DSA require no new context types. * Each context enforces its own state machine; misuse (e.g., ``Update()`` before ``Init()``) returns a typed error regardless of algorithm. * Consistent ``Init()`` / ``Update()`` / ``Finalize()`` / ``Reset()`` pattern across all operations simplifies caller code. **Negative:** * Algorithm-specific overloads (e.g., ``ICipherContext::Init(iv)`` vs. ``IStreamingContext::Init()``) require ``using`` declarations to suppress name-hiding warnings (see §3.5 in the evaluation report). * AEAD tag handling (``SetTag()`` / ``GetTag()``) cannot be expressed identically to non-AEAD operations — ``IAeadContext`` requires additional methods. --- ``IMemoryAllocator`` Separated from ``ICryptoStack`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. dec_rec:: IMemoryAllocator Separated from ICryptoStack :id: dec_rec__crypto__memory_allocator_separation :status: accepted :context: doc__crypto_architecture :decision: ``IMemoryAllocator`` is a standalone interface independent of ``ICryptoStack``. 
It represents the data plane, can be injected into components that need only memory management, and enables isolated unit testing — none of which are possible if it is coupled to the control-plane ``ICryptoStack``. .. :affects: comp__crypto ``IMemoryAllocator`` is a standalone interface independent of ``ICryptoStack``. It represents the data plane, can be injected into components that need only memory management, and enables isolated unit testing — none of which are possible if it is coupled to the control-plane ``ICryptoStack``. Context ------- The crypto module uses shared memory as the zero-copy data plane between the application and the crypto daemon. A naive design would expose memory allocation directly through ``ICryptoStack``, coupling the data plane to the control-plane IPC object. Three reasons motivated a separate interface: 1. **Architectural independence of data plane and control plane** — the memory subsystem operates independently of IPC: buffers can be allocated, written, and passed to providers without any IPC call. Coupling allocation to ``ICryptoStack`` would obscure this separation. 2. **Independent injection** — components that need only memory management (e.g., a buffer pool, a serialiser) can receive an ``IMemoryAllocator`` without depending on the full ``ICryptoStack`` interface. This is consistent with the Interface Segregation Principle and reduces unnecessary coupling in the component graph. 3. **Isolated unit testing** — by taking ``IMemoryAllocator`` as a dependency, individual components can be tested with a mock or stub allocator without standing up an ``ICryptoStack`` or a daemon connection. Decision -------- ``IMemoryAllocator`` is defined as a standalone interface independent of ``ICryptoStack``. Applications obtain both objects separately; the memory allocator is the data plane and the crypto stack is the control plane. Cross-application connection sharing is not supported; each application has its own allocator instance. 
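
The injection and testing benefits described above can be sketched as follows. This is a minimal illustration, not the module's actual interface: the ``Allocate()`` / ``Deallocate()`` method names, ``StubAllocator``, and ``ScratchBuffer`` are all assumptions introduced for the example.

.. code-block:: cpp

   #include <cstddef>
   #include <cstdint>

   namespace sketch {

   // Hypothetical, simplified stand-in for the real IMemoryAllocator.
   // The method names and signatures are assumptions for illustration.
   class IMemoryAllocator {
    public:
     virtual ~IMemoryAllocator() = default;
     virtual std::uint8_t* Allocate(std::size_t size) = 0;
     virtual void Deallocate(std::uint8_t* ptr) = 0;
   };

   // Test stub: heap-backed and call-counting, no daemon or IPC involved.
   class StubAllocator final : public IMemoryAllocator {
    public:
     std::uint8_t* Allocate(std::size_t size) override {
       ++allocations_;
       return new std::uint8_t[size]();  // zero-initialised
     }
     void Deallocate(std::uint8_t* ptr) override {
       ++deallocations_;
       delete[] ptr;
     }
     int allocations() const { return allocations_; }
     int deallocations() const { return deallocations_; }

    private:
     int allocations_ = 0;
     int deallocations_ = 0;
   };

   // Hypothetical component that needs only the data plane. It depends on
   // IMemoryAllocator and knows nothing about ICryptoStack.
   class ScratchBuffer {
    public:
     ScratchBuffer(IMemoryAllocator& allocator, std::size_t size)
         : allocator_(allocator), data_(allocator.Allocate(size)), size_(size) {}
     ~ScratchBuffer() { allocator_.Deallocate(data_); }
     ScratchBuffer(const ScratchBuffer&) = delete;
     ScratchBuffer& operator=(const ScratchBuffer&) = delete;

     std::uint8_t* data() { return data_; }
     std::size_t size() const { return size_; }

    private:
     IMemoryAllocator& allocator_;
     std::uint8_t* data_;
     std::size_t size_;
   };

   }  // namespace sketch

A unit test can construct a ``StubAllocator``, pass it to ``ScratchBuffer``, and assert on the allocation counters without any control-plane object in scope.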
Consequences ------------ **Positive:** * Data-plane and control-plane concerns are visibly separated in the API. * Components that allocate buffers do not depend on ``ICryptoStack``. * Unit tests for memory-dependent components are cheaper and hermetic. * The zero-copy path (``kProviderCompatible`` allocation) is a data-plane concern and sits cleanly on ``IMemoryAllocator`` without polluting ``ICryptoStack``. **Negative:** * Applications must obtain and manage two objects (``ICryptoStack`` + ``IMemoryAllocator``) where a monolithic interface would require only one. --- Control Plane IPC Boundary Copy with Daemon-Internal Zero-Copy References ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. dec_rec:: Control Plane IPC Boundary Copy with Daemon-Internal Zero-Copy References :id: dec_rec__crypto__ipc_boundary_copy :status: accepted :context: doc__crypto_architecture :decision: The control plane IPC layer on the daemon side shall create exactly one owning copy of all incoming request data at the deserialization boundary. All subsequent daemon-internal processing shall operate on non-owning references into that single owned copy. This prevents time-of-check-time-of-use (TOCTOU) attacks on control information while minimising memory copies within the daemon. The data plane is explicitly excluded — it carries only opaque data payloads and may use zero-copy transfer. The control plane IPC layer on the daemon side shall create exactly one owning copy of all incoming request data at the deserialization boundary. All subsequent daemon-internal processing shall operate on non-owning references into that single owned copy. This prevents time-of-check-time-of-use (TOCTOU) attacks on control information while minimising memory copies within the daemon. The data plane is explicitly excluded — it carries only opaque data payloads and may use zero-copy transfer. 
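
The ownership rule can be sketched as follows, assuming C++17. The ``"<key_id>:<algorithm>"`` wire layout and the names ``OwnedRequest``, ``RequestView``, ``Deserialize()``, ``MakeView()``, and ``IsPermitted()`` are hypothetical and chosen only for illustration; the real protocol types are not shown in this document.

.. code-block:: cpp

   #include <cstddef>
   #include <string>
   #include <string_view>

   namespace sketch {

   // Owning representation, created exactly once at the IPC boundary.
   struct OwnedRequest {
     std::string key_id;
     std::string algorithm;
   };

   // Non-owning representation used by daemon-internal handlers.
   struct RequestView {
     std::string_view key_id;
     std::string_view algorithm;
   };

   // The only place that copies: bytes are pulled out of the
   // client-writable transport buffer into daemon-owned strings.
   // Hypothetical wire layout: "<key_id>:<algorithm>".
   inline OwnedRequest Deserialize(const char* transport, std::size_t length) {
     const std::string_view raw(transport, length);
     const std::size_t sep = raw.find(':');
     OwnedRequest request;
     if (sep == std::string_view::npos) {
       request.key_id = std::string(raw);
       return request;
     }
     request.key_id = std::string(raw.substr(0, sep));
     request.algorithm = std::string(raw.substr(sep + 1));
     return request;
   }

   // Handlers borrow from the owned request; no further copies are made.
   inline RequestView MakeView(const OwnedRequest& request) {
     return RequestView{request.key_id, request.algorithm};
   }

   // Example validation step operating purely on the borrowed view.
   inline bool IsPermitted(const RequestView& view) {
     return !view.key_id.empty() && view.algorithm == "AES-256";
   }

   }  // namespace sketch

Because the view points into ``OwnedRequest`` rather than the transport buffer, a client that rewrites the buffer after ``Deserialize()`` returns cannot change what the handlers validate or use. The owned request must outlive every view derived from it.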
Context ------- This decision applies to the **control plane** only — the channel carrying operation requests, responses, and their parameters (key identifiers, algorithm names, operation codes, session IDs, in-band data). The **data plane** is **out of scope** and may use zero-copy (e.g., shared memory). The data plane carries only opaque payloads (plaintext, ciphertext) — not control information that the daemon validates or routes upon — so TOCTOU modification cannot cause the daemon to mis-route an operation, use the wrong key, or bypass access control. At worst the provider processes corrupted input, equivalent to the client submitting bad data in the first place. Control plane requests carry parameters from an untrusted client. The daemon must decide whether to work directly from the transport buffer or copy first. Consequences ------------ **Positive:** * Eliminates TOCTOU on control information. * One copy at ingress, zero through the handler chain, one at egress. * Clear ownership — the request structure owns; all handlers borrow. * Data plane stays zero-copy for bulk payloads. **Negative:** * Mandatory copy per control plane request, even when the transport buffer is safe (deliberate performance-for-security trade-off; overhead is small since the control plane carries only metadata and small in-band buffers). * Two representations needed in the protocol types: owning types at the IPC boundary, non-owning views internally. Alternatives Considered ----------------------- 1. **Zero-copy end-to-end** — read directly from the transport buffer. Vulnerable to TOCTOU: a client could swap a key ID or algorithm name between validation and use. 2. **Copy at every layer boundary** — excessive allocation; e.g., three copies of the same hash input across IPC adapter, mediator, and provider. 3. **Single copy at the control plane IPC boundary, references thereafter** — the deserialization layer copies all mutable parameters into daemon-owned memory. 
Downstream layers receive const references and use non-owning views. Strategy 3 was selected. Justification for the Decision ------------------------------ Control information determines *which* operation runs, *with which key*, *using which algorithm*. If the daemon reads this from a buffer still writable by the client, the client can mutate it between validation and use (TOCTOU). A single owning copy at the IPC boundary makes the request immutable from the client's perspective. Non-owning references within the daemon are safe because: * The owned request data outlives the entire synchronous processing chain. * All downstream handlers receive it as const. * Processing is single-threaded per request. The data plane does not carry control information, so TOCTOU cannot redirect operations — zero-copy is safe there by design. --- Per-Operation Parameter Structs with Dual Overloads ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. dec_rec:: Per-Operation Parameter Structs with Dual Overloads :id: dec_rec__crypto__per_op_params :status: proposed :context: doc__crypto_architecture :decision: Key management operations (``GenerateKey``, ``DeriveKey``, ``AgreeKey``, ``UnwrapKey``, ``ImportKey``, ``WrapKey``) are encapsulated in dedicated fluent-builder parameter structs. Dual overloads support both ephemeral keys (returning ``CryptoResourceGuard``) and direct-to-slot writes (returning ``bool``). KDF configuration is represented as a structured ``KdfParameters`` type rather than opaque byte spans, providing type safety and extensibility. .. :affects: comp__crypto Each key management operation accepts parameters via a dedicated fluent-builder struct (``GenerateKeyParams``, ``DeriveKeyParams``, ``AgreeKeyParams``, ``WrapKeyParams``, ``UnwrapKeyParams``, ``ImportKeyParams``). Dual overloads support both ephemeral and persistent slot targets. KDF configuration is expressed as a structured ``KdfParameters`` type containing typed fields for all supported KDFs. 
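
A fluent-builder params struct with dual overloads might look like the sketch below. It is illustrative only: ``EphemeralKey`` stands in for ``CryptoResourceGuard``, ``SlotId`` stands in for ``CryptoResourceId``, and ``FakeKeyManager``, the permission constants, and the in-memory slot map are assumptions that replace the daemon-backed implementation.

.. code-block:: cpp

   #include <cstdint>
   #include <map>
   #include <string>
   #include <utility>

   namespace sketch {

   using Permissions = std::uint32_t;            // hypothetical bitmask type
   constexpr Permissions kEncrypt = 1u << 0;     // hypothetical named bits
   constexpr Permissions kDecrypt = 1u << 1;

   // Fluent builder: each setter returns *this so calls can be chained.
   class GenerateKeyParams {
    public:
     GenerateKeyParams& SetAlgorithm(std::string algorithm) {
       algorithm_ = std::move(algorithm);
       return *this;
     }
     GenerateKeyParams& SetPermissions(Permissions permissions) {
       permissions_ = permissions;
       return *this;
     }
     const std::string& algorithm() const { return algorithm_; }
     Permissions permissions() const { return permissions_; }

    private:
     std::string algorithm_;
     Permissions permissions_ = 0;
   };

   // Stand-ins for CryptoResourceGuard and CryptoResourceId.
   struct EphemeralKey {
     std::string algorithm;
     Permissions permissions;
   };
   using SlotId = std::string;

   class FakeKeyManager {
    public:
     // Ephemeral overload: the key is returned to the caller.
     EphemeralKey GenerateKey(const GenerateKeyParams& params) {
       return {params.algorithm(), params.permissions()};
     }

     // Persistent overload: target_slot comes first, result is a status.
     bool GenerateKey(const SlotId& target_slot, const GenerateKeyParams& params) {
       if (params.algorithm().empty()) {
         return false;  // final validation rejects a missing algorithm
       }
       slots_[target_slot] = {params.algorithm(), params.permissions()};
       return true;
     }

     bool HasKey(const SlotId& slot) const { return slots_.count(slot) != 0; }

    private:
     std::map<SlotId, EphemeralKey> slots_;
   };

   }  // namespace sketch

A call site then reads as ``mgr.GenerateKey("slot-1", GenerateKeyParams{}.SetAlgorithm("AES-256").SetPermissions(kEncrypt | kDecrypt))``, with the slot argument alone selecting the persistent overload.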
Context ------- Key operations have complex parameter needs: algorithm selection, permission bitmasks, exportability flags, KDF configuration, IV, AAD, wrapping algorithm, peer public key data, and format specifiers. These parameters must be passed safely and extensibly to support both ephemeral keys (returned to the caller) and direct-to-slot writes (persisted in the daemon). Decision -------- Six dedicated parameter structs are provided: * ``GenerateKeyParams``: algorithm, permissions, slot size * ``DeriveKeyParams``: algorithm, permissions, KDF config, salt, label * ``AgreeKeyParams``: peer public key, algorithm, permissions * ``WrapKeyParams``: wrapping algorithm, IV, AAD * ``UnwrapKeyParams``: format specifier, permissions * ``ImportKeyParams``: algorithm, permissions, format specifier Each struct is a fluent builder with named setters (``SetAlgorithm()``, ``SetPermissions()``, etc.), enabling readable call sites. Dual overloads are provided for all key-producing operations: * **Ephemeral overload**: ``Result XxxKey(const XxxKeyParams&)`` * **Persistent overload**: ``Result XxxKey(const CryptoResourceId& target_slot, const XxxKeyParams&)`` The ``target_slot`` parameter is always first in slot-targeting overloads, consistent with ``PersistKey(target_slot, ephemeral_key)``. KDF configuration is expressed as a structured ``KdfParameters`` struct containing typed fields (salt, label, iteration count, output length) for all supported KDFs: HKDF, TLS 1.2 PRF, TLS 1.3 HKDF, PBKDF2, SP 800-108. Opaque byte spans are no longer used for KDF parameters. Alternatives Considered ----------------------- Single Fat Config Struct ^^^^^^^^^^^^^^^^^^^^^^^^ One ``KeyOperationConfig`` for all operations, with optional fields for each mode. This is rejected: most fields are unused for any given operation, which creates confusion and allows invalid parameter combinations to compile. A per-operation struct ensures that only parameters relevant to that operation can be set.
Separate Named Methods ^^^^^^^^^^^^^^^^^^^^^^ Methods like ``GenerateKeyToSlot()`` instead of overloads. This is rejected: it doubles the API surface without adding clarity. Overloads are distinguished by the presence of ``target_slot`` as the first parameter, and the return type (``CryptoResourceGuard`` vs ``bool``) follows from that choice, which keeps intent clear. Builder Pattern on Context ^^^^^^^^^^^^^^^^^^^^^^^^^^^ A fluent chain on the context object (e.g., ``key_mgmt->Generate().Algorithm("AES-256").Execute()``). This is rejected: it requires runtime validation of missing required fields. A params-struct approach catches missing required fields at compile time via member initialization (the daemon performs final validation). Consequences ------------ **Positive:** * Named fields and fluent builders eliminate parameter-order confusion and enable readable call sites with self-documenting intent. * Adding new optional fields to a params struct is non-breaking; existing callers continue to compile unchanged. * Structured ``KdfParameters`` provides compile-time type safety for KDF configuration (salt, label, iteration count) where opaque byte spans did not. * Dual overloads cleanly separate ephemeral key creation (RAII guard return) from persistent slot writes (boolean return), with consistent calling convention. * Span fields (peer public key, wrapped data, import key data, IV, AAD) reference caller-owned memory, giving zero-copy for large buffers (PQC public keys can reach 1–2 KB). **Negative:** * Six additional parameter struct types increase compilation includes if not forward-declared. * Callers must construct a params struct even for simple operations: ``GenerateKeyParams{}.SetAlgorithm("AES-256")`` is more verbose than a single factory call with a string argument. * Span fields in params structs have lifetime constraints: referenced data must outlive the struct. This is documented but not enforced at compile time.
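
The span lifetime constraint can be illustrated with a small sketch. ``ByteView`` is a hypothetical stand-in for whatever span type the real API uses, and ``SetPeerPublicKey()``, ``ViewOf()``, and ``SumBytes()`` are names assumed for this example only.

.. code-block:: cpp

   #include <cstddef>
   #include <cstdint>
   #include <vector>

   namespace sketch {

   // Minimal non-owning view; it records a pointer and a length, nothing more.
   struct ByteView {
     const std::uint8_t* data = nullptr;
     std::size_t size = 0;
   };

   // Simplified params struct holding a span field.
   class AgreeKeyParams {
    public:
     AgreeKeyParams& SetPeerPublicKey(ByteView key) {
       peer_public_key_ = key;  // stores the view; the bytes are not copied
       return *this;
     }
     ByteView peer_public_key() const { return peer_public_key_; }

    private:
     ByteView peer_public_key_;
   };

   inline ByteView ViewOf(const std::vector<std::uint8_t>& bytes) {
     return ByteView{bytes.data(), bytes.size()};
   }

   // Stand-in for an operation that reads the referenced bytes.
   inline std::size_t SumBytes(const AgreeKeyParams& params) {
     const ByteView key = params.peer_public_key();
     std::size_t sum = 0;
     for (std::size_t i = 0; i < key.size; ++i) {
       sum += key.data[i];
     }
     return sum;
   }

   // Unsafe pattern, shown only as a comment because it dangles:
   //   AgreeKeyParams params;
   //   params.SetPeerPublicKey(ViewOf(std::vector<std::uint8_t>(1184, 1)));
   //   SumBytes(params);  // the temporary vector is already destroyed here

   }  // namespace sketch

The safe pattern is to keep the backing buffer in a variable whose scope encloses both the params struct and the call that consumes it.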