Design Decisions#

ABI Compatibility for IPC Layer (Deferred Post-Stabilisation)#

ABI Compatibility for IPC Layer (Deferred Post-Stabilisation)
status: proposed
context: doc__crypto_architecture
decision: ABI compatibility for the IPC layer between the crypto library and daemon is deferred until the API and wire format are stable. Once stable, a versioned wire format (FlatBuffers is the primary candidate) will be introduced to allow independent deployment of library and daemon versions.

ABI compatibility for the IPC layer between the crypto library and daemon is deferred until the API and wire format are stable. Once stable, a versioned wire format (FlatBuffers is the primary candidate) will be introduced to allow independent deployment of library and daemon versions.

Context#

The crypto module uses an IPC layer to communicate between the client library and the daemon process. During initial development, maintaining strict ABI compatibility would impose versioning overhead (protocol negotiation, backward/forward compatibility handling) before the interfaces are settled. The decision is therefore to defer ABI compatibility until the stack is stable, at which point the investment is justified.

Once the API and wire format are frozen, the following will be required:

  • A versioned wire format with forward/backward compatibility guarantees

  • Protocol negotiation mechanisms during connection establishment

  • Support for a defined compatibility window (e.g., N−1 daemon with N library)

  • Regression testing across supported version combinations

Decision#

ABI compatibility for the IPC layer is intentionally deferred for the initial pre-stable phase. Per machine, only one valid library-daemon combination is supported until the API stabilises. After stabilisation, a versioned wire format will be introduced — this decision will be revisited and finalised at that point.

Consequences#

Positive:

  • Simplified implementation during pre-stable phase — no protocol versioning overhead

  • Reduced testing surface — only matching version pairs need validation

  • Faster initial development cycle — breaking changes can be made freely

  • Clearer deployment model for early adopters — single version per machine

Negative:

  • Library and daemon must currently be updated together as a single unit

  • Cannot have multiple applications using different library versions on the same machine

  • No graceful degradation when versions mismatch during the deferred phase

Alternatives Considered#

FlatBuffers#

FlatBuffers is the primary candidate for the versioned wire format once stabilisation is reached.

Advantages#
  • Schema evolution — fields can be added with default values without breaking existing serialised data; removed fields leave a gap that is safely skipped.

  • Zero-copy access — FlatBuffers tables are accessed in-place from the shared-memory buffer, aligning with the IMemoryAllocator zero-copy design.

  • Deterministic layout — table format is fully specified; no hidden heap allocation during access.

  • Compact code generation — generated C++ headers are noexcept-friendly and free of std::function or std::string members.

Disadvantages#
  • Final choice is deferred; other candidates (Protocol Buffers, Cap’n Proto, hand-rolled length-prefixed structs) are not excluded.

Justification for the Decision#

The decision to defer ABI compatibility is justified by the current pre-stable phase of development. Introducing versioning infrastructure before the interfaces are settled would add maintenance burden without benefit. The FlatBuffers candidate and this record preserve the intent and analysis so that the versioning work can proceed efficiently once the stack stabilises.

CryptoResourceGuard Lifetime via Daemon-Side Reference Counting#

CryptoResourceGuard Lifetime via Daemon-Side Reference Counting
status: accepted
context: doc__crypto_architecture
decision: Transient key lifetime is managed exclusively in the daemon via reference counting. The guard holds a type-erased IPC release handle; Create*Context() increments the daemon ref-count atomically. The guard may be destroyed after Create*Context() returns. On client disconnect, the daemon bulk-frees all resources for that client.

Transient key lifetime is managed exclusively in the daemon. The daemon ref-counts every ephemeral key; the client communicates changes via two IPC calls: Release(id) (guard destructor) and Create*Context(config with key_id) (context creation).

Context#

Transient crypto resources (keys, certificates) produced within an IKeyManagementContext or ICertificateManagementContext session are represented by a CryptoResourceGuard. Key lifetime must survive all contexts actively using the key and be freed deterministically when neither a guard nor a context holds a reference.

Decision#

Key lifetime is managed in the daemon via a per-key reference count:

Event

Daemon action

GenerateKey / DeriveKey / LoadKey / etc.

Creates key; ref = 1

Create*Context(config with key_id)

Validates key alive; ref++

Guard destroyed (Release(id) IPC)

ref–; free key if ref == 0

Context destroyed

ref– for each bound key

Client disconnect (crash or normal exit)

Daemon bulk-frees all resources for that client

This means:

  • The guard carries only the CryptoResourceId and a type-erased IPC release handle (shared_ptr<void> internally).

  • Create*Context() sends a single IPC call that validates the key and atomically records context ownership. If the guard was released before this call, the daemon returns kResourceNotFound — fail-fast, diagnosable behaviour.

  • Crash safety: on hard process termination (SIGKILL, power loss), no destructors run. The daemon detects the client disconnect and bulk-frees all resources. Daemon-side ref-counting is crash-safe by design.

  • BaseContextConfig carries only a CryptoResourceId for the key.

Application contract: The guard must remain alive (IsActive() returns true) at the moment Create*Context() is called. After that call returns successfully, the guard may be destroyed in any order relative to the context.

auto key = key_mgmt->GenerateKey("AES-256").value();
CipherContextConfig config;
config.SetAlgorithm("AES-256-GCM").SetKey(key).SetDirection(CipherDirection::kEncrypt);
// guard must be alive here:
auto cipher = ctx->CreateCipherContext(config).value();
// Daemon has incremented the key ref-count. Guard may now be destroyed.
// ... use cipher ...

Explicit Release and Guard Synchronisation#

The daemon is the sole source of truth for resource validity. The guard’s active_ flag is a client-side hint, not a daemon query.

Normal path — destructor (no application code needed):

The guard destructor sends Release(id) IPC when active, silently swallowing the result (destructors cannot propagate errors).

Explicit path — synchronous error handling:

When the application needs to explicitly confirm release before destruction, call guard.Release():

auto result = guard.Release();  // explicit, returns Result<std::monostate>
if (result.has_value()) {
    // no-op destructor
}

Persist path — copy semantics:

IKeyManagementContext::PersistKey(const CryptoResourceId&, slot) takes the ephemeral key by ID (copy semantics). The guard that produced the ID remains active after PersistKey returns. The ephemeral copy continues to exist until the guard is released or goes out of scope; the persisted slot copy is independent:

auto key = key_mgmt->GenerateKey((GenerateKeyParams{}.SetAlgorithm("AES-256"))).value();
key_mgmt->PersistKey(slot, key).value();  // key (guard) remains active
// 'key' still holds the ephemeral copy — explicit Release() or destructor frees it
// 'slot' is the persistent copy — use slot handle for all future durable operations

SetKey and Implicit Guard Conversion#

CryptoResourceGuard provides an implicit conversion operator to const CryptoResourceId&. All key-accepting config structs expose a single SetKey(const CryptoResourceId& k) overload. Passing a guard works directly via this conversion — no extra overload is needed:

auto key = key_mgmt->GenerateKey((GenerateKeyParams{}.SetAlgorithm("AES-256"))).value();
// guard must be alive when CreateCipherContext is called:
auto cipher = ctx->CreateCipherContext(
    CipherContextConfig{}.SetAlgorithm("AES-256-GCM").SetKey(key)
                          .SetDirection(CipherDirection::kEncrypt)).value();
// Daemon incremented key ref-count. Guard may now be independently destroyed.

cipher->Init(iv);
cipher->Finalize(out_span);

Consequences#

Positive:

  • Single source of truth for key lifetime: the daemon. No client-side shared_ptr propagation across API boundaries.

  • Crash-safe: daemon bulk-frees on disconnect regardless of client destructor state.

  • Fail-fast: releasing a guard before Create*Context is called returns kResourceNotFound — diagnosable, deterministic behaviour.

  • BaseContextConfig carries only CryptoResourceId — minimal and flat.

Negative:

  • Guards must remain alive until Create*Context() returns — an intuitive but explicit application contract.

  • Daemon must maintain per-key ref-counts for all active ephemeral keys. Memory cost is bounded by max concurrent keys (deployment-time constant).

Shared-Connection Anchor for Persistent Resource ID Stability#

Shared-Connection Anchor for Persistent Resource ID Stability
status: accepted
context: doc__crypto_architecture
decision: The IPC connection to the crypto daemon is the anchor for kPersistent resource IDs. All ICryptoStack instances within the same application share a single connection, ensuring that the same resource name always resolves to the same numeric CryptoResourceId regardless of which stack or context resolves it. ID assignment is on-demand per connection, and each application gets its own isolated connection with an independent ID namespace.

The application-level daemon connection is a shared anchor across all ICryptoStack instances within the same application. Persistent resource IDs (kPersistent) are assigned on-demand per connection and remain stable for the connection’s lifetime, independent of any individual ICryptoContext.

Context#

Persistent resources (key slots, stored certificates, CRLs) outlive any individual context or session. Tying the validity of their CryptoResourceId handle to the lifetime of a specific ICryptoContext would couple context lifetime management to resource identifier validity unnecessarily. The desired property is that a resolved persistent ID remains usable as long as the application is connected to the daemon — independent of which context or stack resolved it.

Decision#

The connection (the underlying transport connection to the crypto daemon) is the anchor for kPersistent resource IDs:

  • All ICryptoStack instances within the same application share a single connection. Multiple stacks always resolve the same resource name to the same numeric CryptoResourceId.

  • ID assignment is on-demand: a numeric id is assigned when a resource is first resolved by any stack on the connection. Subsequent resolutions of the same name by any stack return the identical id.

  • Each application gets its own isolated connection with an independent ID namespace — IDs are not globally stable across application restarts or across different applications. This preserves the security property: IDs are per-application-unique and not externally predictable.

  • Connection ID tracking is managed by the underlying IPC transport layer as an implementation detail, not exposed to library users.

Lifetime Strategy (Deployment Choice)#

The connection lifetime is a deployment-level choice between two strategies:

  1. Reference-counted (early cleanup): Connection is destroyed when the last ICryptoStack referencing it is destroyed. Daemon-side resources are freed immediately. Suitable when stacks are short-lived and memory footprint must be minimised.

  2. Fixed (application lifetime): Connection is created once at application startup and destroyed when the application terminates. Daemon resources are held in daemon memory for the application lifetime. Simpler to reason about; preferred when stacks are frequently created and destroyed.

Both strategies are implementation details of the transport layer — the public ICryptoStack and ICryptoContext API is identical in either case. Cross-application connection sharing is not allowed; IConnectionAnchor remains a private implementation type.

Scope#

Applies to kPersistent resources only: kKeySlot, kCertificate, kCertSlot, kVerificationTrustStore. Ephemeral (kKey) IDs remain session-scoped (valid only within the IKeyManagementContext session that produced them).

IPC Schema#

Whether per-connection ID assignment and the registry require new proto messages or can reuse the existing CryptoResourceId wire format is an open implementation item for the daemon development to resolve before the daemon is built.

Consequences#

Positive:

  • Independent context lifetime for persistent resources — ICryptoContext can be destroyed after resolution without invalidating the CryptoResourceId.

  • Stable, deterministic IDs within an application — same name always resolves to the same id regardless of which stack or context resolves it.

  • No security leakage — IDs are per-application-unique, on-demand assigned, not globally stable or predictable from outside the application.

  • Applications can share resolved CryptoResourceId values between stacks without re-resolving.

Negative:

  • Daemon must maintain a per-connection ID registry (map from resource name to numeric id). Memory cost is bounded by the number of distinct persistent resources accessed per application.

  • With fixed-lifetime strategy: daemon memory for resolved IDs is held for the entire application lifetime even if the IDs are no longer used.

AlgorithmId as FixedCapacityString<64>#

AlgorithmId Represented as FixedCapacityString<64>
status: accepted
context: doc__crypto_architecture
decision: ``AlgorithmId`` is defined as ``FixedCapacityString<64>`` rather than a compile-time enum or std::string, providing open-set extensibility with deterministic stack allocation. 64 bytes covers the longest currently-known PQC identifier (e.g., "SLH-DSA-SHA2-128s") with comfortable headroom. All constructors and assignments are noexcept.

AlgorithmId is defined as FixedCapacityString<64> rather than a compile-time enum or std::string, providing open-set extensibility with deterministic stack allocation.

Context#

Algorithm identifiers must accommodate current algorithms (e.g., "AES-256-GCM", "SHA-256", "SLH-DSA-SHA2-128s"), future PQC schemes, and provider-specific extensions — an open set that cannot be enumerated at compile time.

Decision#

Three candidate representations were evaluated:

  1. ``enum class AlgorithmId`` — type-safe, zero overhead, but a closed set. Adding a new PQC algorithm requires recompiling the library and all callers. Incompatible with the extensibility goal and with runtime-configured providers.

  2. ``std::string`` — open set, but heap-allocating. Every AlgorithmId value creates at least one heap allocation. Violates MISRA A18-5-1, incompatible with ASIL noexcept destructors, and introduces non-deterministic WCET.

  3. ``FixedCapacityString<64>`` — open set, stack-allocated, exception-free (oversized input silently truncated, truncated() flag set). 64 bytes covers the longest currently-known PQC identifier ("SLH-DSA-SHA2-128s" = 18 chars) with comfortable headroom. All constructors and assignments are noexcept.

FixedCapacityString<64> (option 3) was selected. The same rationale applies to ResourceId (FixedCapacityString<64>) and ProviderInfo::name (FixedCapacityString<32>).

Consequences#

Positive:

  • Zero heap allocation for any algorithm or resource identifier.

  • Open set — new PQC algorithms deploy at daemon level; no library recompile needed.

  • noexcept constructors and assignments — compatible with ASIL containers.

  • GetAlgorithm(), GetAllowedAlgorithm(), GetPublicKeyAlgorithm() are now const noexcept, enabling use in safety-annotated code.

Negative:

  • 64-byte fixed storage regardless of actual string length (minor waste for short names).

  • Silent truncation on overflow — callers must check truncated() when constructing from untrusted input.

  • No compile-time algorithm validation — typos become runtime errors.

Synchronous-Only IPC Model#

Synchronous-Only IPC Model
status: accepted
context: doc__crypto_architecture
decision: All IPC calls between the client library and the crypto daemon are synchronous (blocking). No callback, future, or async-notify pattern is exposed in the V1 public API.

All IPC calls between the client library and the crypto daemon are synchronous (blocking). No callback, future, or async-notify pattern is exposed in the V1 public API.

Context#

A synchronous model blocks the calling thread for the full IPC round-trip plus provider execution. The alternatives introduce heap usage, threading complexity, or event-loop coupling that are incompatible with MISRA and automotive middleware requirements.

Decision#

All IPC calls are synchronous (blocking). Asynchronous alternatives were evaluated and rejected:

  • Future/Promise APIResult<std::future<T>> return types, daemon uses a worker-thread pool. This would result in higher code complexity.

  • Callback model — caller provides a std::function<void(Result<T>)> invoked on completion. std::function violates MISRA A18-5-1 (heap). Custom fixed-size delegate types possible but add significant complexity.

  • Polling / event-loop — caller polls a completion token. Couples the crypto API to the application event loop; not idiomatic for automotive middleware.

Consequences#

Positive:

  • Simple, predictable call semantics — no threading model imposed on callers.

  • Full Result<T> error propagation on every call.

  • MISRA-compliant — no std::function, no heap, no thread creation in library.

  • Operation timeouts (dec_rec__crypto__two_layer_timeout) provide bounded blocking behaviour — the calling thread never blocks indefinitely.

Negative:

  • Thread blocks for full operation duration including IPC + provider execution.

  • Potential priority inversion in RTOS environments with high-priority callers.

  • Cannot pipeline or batch multiple crypto operations.

  • Future async API (GenerateKeyAsync) must be added as an extension; The daemon-side ref-count model (see dec_rec__crypto__cry_res_grd_lifetime) naturally accommodates async key generation: the daemon can atomically bind the key to the context when the async result is consumed.

Context Reset() for Streaming Context Reuse#

Context Reset() for Streaming Context Reuse
status: accepted
context: doc__crypto_architecture
decision: ``IStreamingContext`` exposes a ``Reset()`` method returning the context to

IStreamingContext exposes a Reset() method returning the context to post-construction state after Finalize(), preserving key, algorithm, and config bindings. This avoids repeated factory + IPC overhead for same-configuration repeated operations.

Context#

Streaming contexts (hash, sign, verify, encrypt, decrypt, MAC, AEAD) follow the Init()Update() * → Finalize() state machine. Without reuse, each new operation requires: (1) ICryptoContext::Create*Context() IPC call, (2) Init() IPC call. For high-frequency operations (e.g., per-frame hashing, per-message MAC), this doubles the IPC overhead.

Decision#

Reset() is added to IStreamingContext. Alternatives evaluated:

  • Destroy and re-create — simplest, but doubles IPC cost per operation.

  • ``Reset()`` on ``IStreamingContext`` — single IPC call to the daemon to clear provider-side state; key, algorithm, config, and session handle remain valid. State machine transitions: kFinalizedkCreated on success. kContextResetFailed (0x01070003) reported on daemon-side failure.

  • Context pooling in the library — a pool of pre-created contexts returned to callers. Adds ~200 LOC of pool management, thread-safety concerns, and a fixed pool bound that may be too large or too small for different deployments.

Consequences#

Positive:

  • Halves IPC round-trips for high-frequency same-configuration operations.

  • No API surface change — Reset() is additive; callers that destroy and re-create continue to work unchanged.

  • Key, algorithm, and config bindings preserved — no re-configuration needed.

  • Single additional error code (kContextResetFailed) for daemon failure path.

Negative:

  • Daemon must track context state and clear provider-side buffers on Reset().

  • Callers must ensure Finalize() is called before Reset(); calling Reset() in kUpdating state returns kInvalidOperation.

Two-Layer Per-Call Timeout with Daemon-Side Enforcement#

Two-Layer Per-Call Timeout with Daemon-Side Enforcement
status: accepted
context: doc__crypto_architecture
decision: Two-layer timeout model selected for bounded behaviour at both layers.

Operation timeouts are enforced at two layers — a stack-wide default in CryptoStackConfig and a per-context override in BaseContextConfig with an explicit DisableTimeout() escape — with the daemon as the enforcement point. Client threads unblock on timeout; timed-out contexts transition to kError.

Context#

Without timeouts, a hung HSM or stalled IPC channel blocks the calling thread indefinitely — a safety violation for ISO 26262 ASIL functions. A single global timeout is insufficient because some legitimate operations (e.g., RSA-4096 key generation on software providers) require more time than typical operations. Client-side-only timeouts fail to release daemon-held resources when the deadline expires.

Decision#

Three timeout models were evaluated:

  1. Client-side only (``std::future::wait_for``) — client unblocks after deadline, but the daemon continues executing the stalled operation, holding the resource. Does not provide bounded daemon resource usage.

  2. Daemon-side only (watchdog thread per context) — daemon kills stalled operations after timeout and notifies client via error response. Provides bounded resource usage but requires the daemon to maintain one watchdog timer per in-flight context.

  3. Two-layer: client deadline + daemon enforcement — client passes timeout value on each IPC call; daemon enforces the deadline server-side. If deadline expires, daemon transitions context to error, releases provider resources, and returns kOperationTimedOut to client. DisableTimeout() is available for legitimately long operations (e.g., RSA-4096 key generation on software providers).

Model 3 was selected for bounded behaviour at both layers.

Consequences#

Positive:

  • Guaranteed bounded execution — no call blocks indefinitely.

  • Daemon-side enforcement means provider resources are released on timeout, not just the client thread.

  • Per-context override allows fine-grained tuning without changing global config.

  • DisableTimeout() supports legitimate long-running operations without disabling safety globally.

  • Satisfies ISO 26262 Part 6 Table 1 (bounded execution time for safety functions).

Negative:

  • Daemon must maintain a per-context deadline and cancel infrastructure.

  • DisableTimeout() is a safety escape hatch; misuse (e.g., in Safety paths) must be justified in the application’s safety case.

  • Two-layer design adds complexity compared to a single global timeout.

KeyOperationPermission as a Capability Bitmask#

KeyOperationPermission as a Capability Bitmask
status: accepted
context: doc__crypto_architecture
decision: Per-key usage restrictions are encoded as a ``KeyOperationPermission`` bitmask (named bit-flag capabilities: ``kEncrypt``, ``kDecrypt``, ``kSign``, ``kVerify``, ``kDerive``, ``kWrap``, ``kExport``, ``kGenerate``). The daemon enforces the bitmask at context creation time; operations not permitted by the key's bitmask return ``kKeyOperationNotPermitted``.

Per-key usage restrictions are encoded as a KeyOperationPermission bitmask (named bit-flag capabilities: kEncrypt, kDecrypt, kSign, kVerify, kDerive, kWrap, kExport, kGenerate). The daemon enforces the bitmask at context creation time; operations not permitted by the key’s bitmask return kKeyOperationNotPermitted.

Context#

A key provisioned for signing must not be usable for encryption or export. Enforcing usage restrictions at the API level prevents misuse even if the caller has a valid resource handle. The representation must be heap-free and extensible to support future operations without breaking existing code.

Decision#

Three designs were evaluated:

  1. ``std::unordered_set<KeyOperation>`` — flexible but heap-allocating. Violates MISRA A18-5-1, incompatible with ASIL noexcept requirements.

  2. ``KeyPermissions`` struct with boolean flags — type-safe, stack-allocated, but verbose (one field per operation). Extending to a new operation requires a breaking ABI change.

  3. Bitmask (``uint32_t`` or ``enum class`` with ``operator|``) — compact, heap-free, trivially copyable, extensible (new bits added without breaking existing code). Standard pattern in OS capability models (POSIX, seL4).

The bitmask (option 3) was selected for compactness and extensibility.

Consequences#

Positive:

  • Compact representation — one uint32_t field in key metadata.

  • Extensible — new operations add a new named bit; existing bitmasks remain valid.

  • Daemon enforces at context creation — enforcement point is in the trusted boundary.

  • Two-layer access control: who (ResolveResource() ACL, uid) and what (KeyOperationPermission bitmask per key).

Negative:

  • Bitmask operations (&, |, ~) are less type-safe than a method API; callers can accidentally construct invalid combinations.

  • No runtime check that a bitmask value is a valid combination of named bits (mitigated by providing named constants and operator| overloads).

Unified Cipher Context (No Encrypt/Decrypt or Symmetric/Asymmetric Split)#

Unified Cipher Context Rather Than Separate Encrypt/Decrypt Plus Symmetric/Asymmetric Split
status: accepted
context: doc__crypto_architecture
decision: A single ``ICipherContext`` type is used for both encryption and decryption, with direction configured via ``CipherContextConfig``. Sign, verify, MAC, and AEAD are each separate context types under the ``IStreamingContext`` hierarchy. The algorithm identifier and key type determine whether the operation is symmetric or asymmetric at runtime. Separate ``SymmetricEncryptContext`` / ``AsymmetricEncryptContext`` types were rejected for increased complexity without benefit.

Encrypt and decrypt share a single ICipherContext (direction set via CipherContextConfig). Sign, verify, MAC, and AEAD are each a separate context type under the IStreamingContext hierarchy. The algorithm identifier and key type determine whether the operation is symmetric or asymmetric at runtime. Separate SymmetricEncryptContext / AsymmetricEncryptContext types were rejected.

Context#

A split design creates a separate class per algorithm family. Adding PQC KEM support requires a new class even if the streaming interface is identical. This increases include weight and virtual dispatch hierarchy depth without benefit, and makes the API surface grow with the algorithm set.

Decision#

Three designs were evaluated:

  1. Symmetric/Asymmetric split — mirrors algorithm families at the type level. Callers must select the correct context type; runtime algorithm selection within a family is still possible. Adds N context types per new algorithm family. Increases include weight and virtual dispatch hierarchy depth.

  2. Unified context per operationICipherContext (encrypt + decrypt unified), ISignContext, IVerifySignatureContext, IMacContext, IAeadContext. Algorithm and key type determine behaviour. Adding ML-KEM or ML-DSA requires no new context type — only a new algorithm name string and provider support.

  3. Single universal context — one ICryptoOperationContext with a mode parameter. Rejects the SRP; makes misuse easier (e.g., calling Sign() on a context configured for encryption). Rejected for API clarity reasons.

Design 2 was selected for the balance of clarity and extensibility.

Consequences#

Positive:

  • Flat, stable hierarchy — five concrete context types regardless of algorithm count.

  • PQC extensibility — ML-KEM, ML-DSA, SLH-DSA require no new context types.

  • Each context enforces its own state machine; misuse (e.g., Update() before Init()) returns a typed error regardless of algorithm.

  • Consistent Init() / Update() / Finalize() / Reset() pattern across all operations simplifies caller code.

Negative:

  • Algorithm-specific overloads (e.g., ICipherContext::Init(iv) vs. IStreamingContext::Init()) require using declarations to suppress name-hiding warnings (see §3.5 in the evaluation report).

  • AEAD tag handling (SetTag() / GetTag()) cannot be expressed identically to non-AEAD operations — IAeadContext requires additional methods.

IMemoryAllocator Separated from ICryptoStack#

IMemoryAllocator Separated from ICryptoStack
status: accepted
context: doc__crypto_architecture
decision: ``IMemoryAllocator`` is a standalone interface independent of ``ICryptoStack``. It represents the data plane, can be injected into components that need only memory management, and enables isolated unit testing — none of which are possible if it is coupled to the control-plane ``ICryptoStack``.

IMemoryAllocator is a standalone interface independent of ICryptoStack. It represents the data plane, can be injected into components that need only memory management, and enables isolated unit testing — none of which are possible if it is coupled to the control-plane ICryptoStack.

Context#

The crypto module uses shared memory as the zero-copy data plane between the application and the crypto daemon. A naive design would expose memory allocation directly through ICryptoStack, coupling the data plane to the control-plane IPC object. Three reasons motivated a separate interface:

  1. Architectural independence of data plane and control plane — the memory subsystem operates independently of IPC: buffers can be allocated, written, and passed to providers without any IPC call. Coupling allocation to ICryptoStack would obscure this separation.

  2. Independent injection — components that need only memory management (e.g., a buffer pool, a serialiser) can receive an IMemoryAllocator without depending on the full ICryptoStack interface. This is consistent with the Interface Segregation Principle and reduces unnecessary coupling in the component graph.

  3. Isolated unit testing — by taking IMemoryAllocator as a dependency, individual components can be tested with a mock or stub allocator without standing up an ICryptoStack or a daemon connection.

Decision#

IMemoryAllocator is defined as a standalone interface independent of ICryptoStack. Applications obtain both objects separately; the memory allocator is the data plane and the crypto stack is the control plane. Cross-application connection sharing is not supported; each application has its own allocator instance.

Consequences#

Positive:

  • Data-plane and control-plane concerns are visibly separated in the API.

  • Components that allocate buffers do not depend on ICryptoStack.

  • Unit tests for memory-dependent components are cheaper and hermetic.

  • The zero-copy path (kProviderCompatible allocation) is a data-plane concern and sits cleanly on IMemoryAllocator without polluting ICryptoStack.

Negative:

  • Applications must obtain and manage two objects (ICryptoStack + IMemoryAllocator) where a monolithic interface would require only one.

Control Plane IPC Boundary Copy with Daemon-Internal Zero-Copy References#

Control Plane IPC Boundary Copy with Daemon-Internal Zero-Copy References
status: accepted
context: doc__crypto_architecture
decision: The control plane IPC layer on the daemon side shall create exactly one owning copy of all incoming request data at the deserialization boundary. All subsequent daemon-internal processing shall operate on non-owning references into that single owned copy. This prevents time-of-check-time-of-use (TOCTOU) attacks on control information while minimising memory copies within the daemon. The data plane is explicitly excluded — it carries only opaque data payloads and may use zero-copy transfer.

The control plane IPC layer on the daemon side shall create exactly one owning copy of all incoming request data at the deserialization boundary. All subsequent daemon-internal processing shall operate on non-owning references into that single owned copy. This prevents time-of-check-time-of-use (TOCTOU) attacks on control information while minimising memory copies within the daemon. The data plane is explicitly excluded — it carries only opaque data payloads and may use zero-copy transfer.

Context#

This decision applies to the control plane only — the channel carrying operation requests, responses, and their parameters (key identifiers, algorithm names, operation codes, session IDs, in-band data).

The data plane is out of scope and may use zero-copy (e.g., shared memory). The data plane carries only opaque payloads (plaintext, ciphertext) — not control information that the daemon validates or routes upon — so TOCTOU modification cannot cause the daemon to mis-route an operation, use the wrong key, or bypass access control. At worst the provider processes corrupted input, equivalent to the client submitting bad data in the first place.

Control plane requests carry parameters from an untrusted client. The daemon must decide whether to work directly from the transport buffer or copy first.

Consequences#

Positive:

  • Eliminates TOCTOU on control information.

  • One copy at ingress, zero through the handler chain, one at egress.

  • Clear ownership — the request structure owns; all handlers borrow.

  • Data plane stays zero-copy for bulk payloads.

Negative:

  • Mandatory copy per control plane request, even when the transport buffer is safe (deliberate performance-for-security trade-off; overhead is small since the control plane carries only metadata and small in-band buffers).

  • Two representations needed in the protocol types: owning types at the IPC boundary, non-owning views internally.

Alternatives Considered#

  1. Zero-copy end-to-end — read directly from the transport buffer. Vulnerable to TOCTOU: a client could swap a key ID or algorithm name between validation and use.

  2. Copy at every layer boundary — excessive allocation; e.g., three copies of the same hash input across IPC adapter, mediator, and provider.

  3. Single copy at the control plane IPC boundary, references thereafter — the deserialization layer copies all mutable parameters into daemon-owned memory. Downstream layers receive const references and use non-owning views.

Strategy 3 was selected.

Justification for the Decision#

Control information determines which operation runs, with which key, using which algorithm. If the daemon reads this from a buffer still writable by the client, the client can mutate it between validation and use (TOCTOU). A single owning copy at the IPC boundary makes the request immutable from the client’s perspective.

Non-owning references within the daemon are safe because:

  • The owned request data outlives the entire synchronous processing chain.

  • All downstream handlers receive it as const.

  • Processing is single-threaded per request.

The data plane does not carry control information, so TOCTOU cannot redirect operations — zero-copy is safe there by design.

Per-Operation Parameter Structs with Dual Overloads#

Per-Operation Parameter Structs with Dual Overloads
status: proposed
context: doc__crypto_architecture
decision: Key management operations (``GenerateKey``, ``DeriveKey``, ``AgreeKey``, ``UnwrapKey``, ``ImportKey``, ``WrapKey``) are encapsulated in dedicated fluent-builder parameter structs. Dual overloads support both ephemeral keys (returning ``CryptoResourceGuard``) and direct-to-slot writes (returning ``bool``). KDF configuration is represented as a structured ``KdfParameters`` type rather than opaque byte spans, providing type safety and extensibility.

Each key management operation accepts parameters via a dedicated fluent-builder struct (GenerateKeyParams, DeriveKeyParams, AgreeKeyParams, WrapKeyParams, UnwrapKeyParams, ImportKeyParams). Dual overloads support both ephemeral and persistent slot targets. KDF configuration is expressed as a structured KdfParameters type containing typed fields for all supported KDFs.

Context#

Key operations have complex parameter needs: algorithm selection, permission bitmasks, exportability flags, KDF configuration, IV, AAD, wrapping algorithm, peer public key data, and format specifiers. These parameters must be passed safely and extensibly to support both ephemeral keys (returned to the caller) and direct-to-slot writes (persisted in the daemon).

Decision#

Six dedicated parameter structs are provided:

  • GenerateKeyParams — algorithm, permissions, slot size

  • DeriveKeyParams — algorithm, permissions, KDF config, salt, label

  • AgreeKeyParams — peer public key, algorithm, permissions

  • WrapKeyParams — wrapping algorithm, IV, AAD

  • UnwrapKeyParams — format specifier, permissions

  • ImportKeyParams — algorithm, permissions, format specifier

Each struct is a fluent builder with named setters (SetAlgorithm(), SetPermissions(), etc.), enabling readable call sites.

Dual overloads are provided for all key-producing operations:

  • Ephemeral overload: Result<CryptoResourceGuard> XxxKey(const XxxxKeyParams&)

  • Persistent overload: Result<std::monostate> XxxKey(const CryptoResourceId& target_slot, const XxxxKeyParams&)

The target_slot parameter is always first in slot-targeting overloads, consistent with PersistKey(target_slot, ephemeral_key).

KDF configuration is replaced with a structured KdfParameters struct containing typed fields (salt, label, iteration count, output length) for all supported KDFs: HKDF, TLS 1.2 PRF, TLS 1.3 HKDF, PBKDF2, SP800-108. Opaque byte spans are no longer used for KDF parameters.

Alternatives Considered#

Single Fat Config Struct#

One KeyOperationConfig for all operations, with optional fields for each mode. This is rejected: most fields are unused for any given operation, creating confusion and enabling invalid parameter combinations at compile time. A per-operation struct enforces that only valid parameters are set.

Separate Named Methods#

Methods like GenerateKeyToSlot() instead of overloads. This is rejected: it doubles the API surface without adding clarity. Overloads are distinguished by return type (CryptoResourceGuard vs bool) and by the presence of target_slot as the first parameter, providing clear intent.

Builder Pattern on Context#

A fluent chain on the context object (e.g., key_mgmt->Generate().Algorithm("AES-256").Execute()). This is rejected: it requires runtime validation of missing required fields. A params-struct approach catches missing required fields at compile time via member initialization (the daemon performs final validation).

Consequences#

Positive:

  • Named fields and fluent builders eliminate parameter-order confusion and enable readable call sites with self-documenting intent.

  • Adding new optional fields to a params struct is non-breaking; existing callers continue to compile unchanged.

  • Structured KdfParameters provides compile-time type safety for KDF configuration (salt, label, iteration count) where opaque byte spans did not.

  • Dual overloads cleanly separate ephemeral key creation (RAII guard return) from persistent slot writes (boolean return), with consistent calling convention.

  • Span fields (peer public key, wrapped data, import key data, IV, AAD) reference caller-owned memory — zero-copy for large buffers (PQC public keys can reach 1–2 KB).

Negative:

  • Six additional parameter struct types increase compilation includes if not forward-declared.

  • Callers must construct a params struct even for simple operations: GenerateKeyParams{}.SetAlgorithm("AES-256") is more verbose than a single factory call with a string argument.

  • Span fields in params structs have lifetime constraints — referenced data must outlive the struct. This is documented but not enforced at compile time.