Assertions
TA-SUPPLY_CHAIN
status: valid
All sources for XYZ and tools are mirrored in our controlled environment

Guidance

This assertion is satisfied to the extent that we have traced and captured source code for XYZ and all of its dependencies (including transitive dependencies, all the way down), and for all of the tools used to construct XYZ from source, and have mirrored versions of these inputs under our control.

‘Mirrored’ in this context means that we maintain a version of the upstream project that we keep up to date with additions and changes to the upstream project, but which is protected from changes that would delete the project or remove parts of its history.

Clearly this is not possible for components or tools that are provided only in binary form, or accessed via online services. In these circumstances we can only assess confidence based on attestations made by the suppliers, and on our experience with the suppliers’ people and processes.

Keep in mind that even if repositories with source code for a particular component or tool are available, not all of it may be stored in Git as plaintext. A deeper analysis is required in TA-INPUTS to assess the impact of any binaries present within the repositories of the components and tools used.
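Mirror coverage can be checked mechanically. The sketch below assumes a hypothetical pair of manifests, one listing all traced dependencies (including transitive ones) and one mapping each project to its controlled mirror; neither format is prescribed by this assertion.

```python
# Sketch: report dependencies that are not yet mirrored in our controlled
# environment. The input files and their formats are hypothetical.
import json

def unmirrored_dependencies(deps_path: str, mirrors_path: str) -> list[str]:
    """Return dependencies (including transitive ones) with no mirror entry."""
    with open(deps_path) as f:
        deps = json.load(f)        # e.g. ["openssl", "zlib", ...]
    with open(mirrors_path) as f:
        mirrors = json.load(f)     # e.g. {"openssl": "https://mirror/...", ...}
    return sorted(d for d in deps if d not in mirrors)

if __name__ == "__main__":
    for name in unmirrored_dependencies("dependencies.json", "mirrors.json"):
        print(f"NOT MIRRORED: {name}")
```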
Evidence

Confidence scoring

Confidence scoring for TA-SUPPLY_CHAIN is based on confidence that all inputs and dependencies are identified and mirrored, and that mirrored projects cannot be compromised.

Checklist
TA-INPUTS
status: valid
Components and tools used to construct and verify XYZ are assessed, to identify potential risks and issues

Guidance

To satisfy this assertion, the components and tools used to construct and verify XYZ releases need to be identified and assessed, to establish what sources of evidence are available for these dependencies.

For components, we need to consider how their potential misbehaviours might impact our expectations for XYZ, identify sources of information (e.g. bug databases, published CVEs) that can be used to identify known risks or issues, and identify tests that can be used to detect these. These provide the inputs to TA-FIXES.

For the tools we use to construct and verify XYZ, we need to consider how their misbehaviour might lead to an unintended change in XYZ, fail to detect misbehaviours of XYZ during testing, or produce incorrect or incomplete data that we use when verifying an XYZ release.

Where impacts are identified, we need to consider both how serious they might be (severity) and whether they would be detected by another tool, test or manual check (detectability). For impacts with a high severity and/or low detectability, additional analysis should be done to check whether existing tests are effective at detecting the misbehaviours or resulting impacts, and new tests or Expectations should be added to prevent or detect misbehaviours or impacts that are not currently addressed.
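Where impacts have been scored, the prioritisation step can be made repeatable. The following sketch ranks impacts by severity and detectability; the 1-5 scales, the priority formula and the example impacts are all illustrative assumptions, not part of the assertion.

```python
# Sketch: rank identified impacts so that high-severity / low-detectability
# items get analysed first. Scales and example data are hypothetical.
from dataclasses import dataclass

@dataclass
class Impact:
    name: str
    severity: int       # 1 (negligible) .. 5 (critical)
    detectability: int  # 1 (almost never caught) .. 5 (always caught)

def priority(impact: Impact) -> int:
    # Higher severity and lower detectability both raise the priority.
    return impact.severity * (6 - impact.detectability)

impacts = [
    Impact("compiler miscompiles atomics", severity=5, detectability=2),
    Impact("linter misses style issue", severity=1, detectability=4),
]
for i in sorted(impacts, key=priority, reverse=True):
    print(f"{priority(i):2d}  {i.name}")
```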
Evidence

Confidence scoring

Confidence scoring for TA-INPUTS is based on the set of components and tools identified, how many of these have been assessed (and how often) for their risk and impact for XYZ, and the sources of risk and issue data identified.

Checklist
TA-TESTS
status: valid
All tests for XYZ, and its build and test environments, are constructed from controlled/mirrored sources and are reproducible, with any exceptions documented

Guidance

This assertion is satisfied to the extent that we:
All of the above must ensure that test results are retroactively reproducible, which is most readily achieved through automated end-to-end test execution alongside the necessary environment setup. Note that with non-deterministic software, exact results may not be reproducible, but reproducing the high-level takeaways and the exact setup should still be possible.
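One way to make runs retroactively reproducible is to record an environment fingerprint alongside every set of results. This sketch assumes a hypothetical requirements.lock file of pinned dependencies and a Git checkout; the captured fields should be adapted to whatever actually determines the test environment.

```python
# Sketch: capture a fingerprint of the test environment alongside results so
# a run can be reproduced later. The recorded fields are illustrative.
import hashlib, json, platform, subprocess, sys

def environment_fingerprint() -> dict:
    lock = open("requirements.lock", "rb").read()  # hypothetical pinned deps
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "git_commit": subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip(),
        "deps_sha256": hashlib.sha256(lock).hexdigest(),
    }

def record_run(results: dict, path: str = "test-run.json") -> None:
    with open(path, "w") as f:
        json.dump({"environment": environment_fingerprint(),
                   "results": results}, f, indent=2)
```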
Evidence

Confidence scoring

Confidence scoring for TA-TESTS is based on the presence of tests and our confidence in their implementation and construction.

Checklist
TA-RELEASES
status: valid
Construction of XYZ releases is fully repeatable and the results are fully reproducible, with any exceptions documented and justified.

Guidance

This assertion is satisfied if the construction of a given iteration of XYZ is both repeatable, demonstrating that all of the required inputs are controlled, and reproducible, demonstrating that the construction toolchain and build environment(s) are controlled (as described by TA-TESTS).

This assertion can be most effectively satisfied in a Continuous Integration environment with mirrored projects (see TA-SUPPLY_CHAIN), using build servers that have no internet access. The aim is to show that all build tools, XYZ components and dependencies are built from inputs that we control, that rebuilding leads to precisely the same binary fileset, and that builds can be repeated on any suitably configured server.

Again, this will not be achievable for components/tools provided in binary form, or accessed via an external service; here we must consider our confidence in attestations made by or for the supply chain.

All non-reproducible elements, such as timestamps or embedded random values from build metadata, must be clearly identified and considered when evaluating reproducibility.
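A reproducibility check can be as simple as rebuilding and comparing digests of the two output trees. In this sketch the build output paths (build-1/out, build-2/out) are hypothetical.

```python
# Sketch: check bit-for-bit reproducibility by comparing per-file SHA-256
# digests of two independent builds. Paths are hypothetical.
import hashlib
from pathlib import Path

def digest_tree(root: str) -> dict[str, str]:
    """Map each file in a build output tree to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(root).rglob("*")) if p.is_file()
    }

first, second = digest_tree("build-1/out"), digest_tree("build-2/out")
diffs = {f for f in first.keys() | second.keys() if first.get(f) != second.get(f)}
print("reproducible" if not diffs else f"non-reproducible files: {sorted(diffs)}")
```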
Evidence

Confidence scoring

Calculate:

R = number of reproducible components (including sources which have no build stage)
N = number of non-reproducible components
B = number of binaries
M = number of mirrored inputs
X = number of inputs not mirrored

Confidence scoring for TA-RELEASES could possibly be calculated as:

R / (R + N + B) × M / (M + X)
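A minimal sketch of this calculation, with invented counts; it simply scales the fraction of reproducible components by the fraction of mirrored inputs.

```python
# Sketch: compute the suggested TA-RELEASES confidence score.
def releases_confidence(r: int, n: int, b: int, m: int, x: int) -> float:
    """r: reproducible, n: non-reproducible, b: binary-only,
    m: mirrored inputs, x: inputs not mirrored."""
    if r + n + b == 0 or m + x == 0:
        raise ValueError("no components or inputs counted")
    return (r / (r + n + b)) * (m / (m + x))

print(f"{releases_confidence(r=120, n=5, b=3, m=125, x=3):.2f}")
```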
Checklist
TA-ITERATIONS
status: valid
All constructed iterations of XYZ include source code, build instructions, tests, results and attestations.

Guidance

This assertion is best satisfied by checking generated documentation to confirm that each of the artefacts listed above (source code, build instructions, tests, results and attestations) is present and consistent for every constructed iteration.
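Such a completeness check can be automated. The sketch below assumes a hypothetical release-bundle layout and file names; the real artefact names will be project-specific.

```python
# Sketch: confirm a constructed iteration ships the required artefacts.
# The directory layout and file names are hypothetical conventions.
from pathlib import Path

REQUIRED = ["sources/", "build-instructions.md", "tests/",
            "test-results.json", "attestations/"]

def missing_artefacts(release_dir: str) -> list[str]:
    root = Path(release_dir)
    return [name for name in REQUIRED if not (root / name).exists()]

print(missing_artefacts("releases/xyz-1.4.2") or "complete")
```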
Evidence
Confidence scoring

Confidence scoring for TA-ITERATIONS is based on:
Checklist
TA-FIXES
status: valid
Known bugs or misbehaviours are analysed and triaged, and critical fixes or mitigations are implemented or applied.

Guidance

This assertion is satisfied to the extent that we have identified, triaged and applied fixes and/or mitigations for the faults identified in XYZ and the bugs and CVEs identified by upstream component projects.

We can increase confidence by assessing known faults, bugs and vulnerabilities, to establish their relevance and impact for XYZ. In principle this should involve not just the code in XYZ, but also its dependencies (all the way down), and the tools used to construct the release. However, we need to weigh the cost/benefit of this work, taking into account:
Evidence
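Evidence of triage can be generated mechanically. The sketch below is deliberately simplistic: it matches CVEs against a component inventory and a severity threshold, whereas real triage would also consider affected versions, reachability and deployment context. The CVE IDs, inventory and threshold are placeholders.

```python
# Sketch: first-pass triage of upstream CVEs for relevance to XYZ.
from dataclasses import dataclass

@dataclass
class Cve:
    id: str
    component: str
    cvss: float  # severity, 0.0 .. 10.0

def needs_fix(cve: Cve, our_components: set[str],
              severity_threshold: float = 7.0) -> bool:
    return cve.component in our_components and cve.cvss >= severity_threshold

inventory = {"openssl", "zlib"}
cves = [Cve("CVE-0000-0001", "openssl", 9.8),   # hypothetical IDs
        Cve("CVE-0000-0002", "libfoo", 9.9)]
print([c.id for c in cves if needs_fix(c, inventory)])
```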
Confidence scoring

Confidence scoring for TA-FIXES can be based on:

With each iteration, we should improve the scoring algorithm based on measurements.

Checklist
TA-UPDATES
status: valid
XYZ components, configurations and tools are updated under specified change and configuration management controls.

Guidance

This assertion requires that we have control over all changes to XYZ, including changes to the configurations, components and tools we use to build it, and the versions of dependencies that we use. This means that the trustable, controlled process is the only path to production for the constructed target software.
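One hedged sketch of enforcing a single path to production: accept an artifact only if its digest appears in a manifest produced by the controlled pipeline. The manifest name, format and signing arrangements here are hypothetical assumptions, not part of the assertion.

```python
# Sketch: accept only artifacts whose digests appear in a manifest emitted
# by the controlled CI pipeline. Manifest location and format are invented.
import hashlib, json

def is_from_controlled_pipeline(artifact_path: str,
                                manifest_path: str = "ci-manifest.json") -> bool:
    with open(artifact_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    with open(manifest_path) as f:
        allowed = set(json.load(f)["sha256"])
    return digest in allowed
```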
Evidence

Confidence scoring

Confidence scoring for TA-UPDATES is based on confidence that we have control over the changes that we make to XYZ, including its configuration and dependencies.

Checklist
TA-BEHAVIOURS
status: valid
Expected or required behaviours for XYZ are identified, specified, verified and validated based on analysis.

Although it is practically impossible to specify all of the necessary behaviours and required properties for complex software, we must clearly specify the most important of these (e.g. where harm could result if given criteria are not met), and verify that these are correctly provided by XYZ.

Guidance

This assertion is satisfied to the extent that we have:
Expectations could be verified by:
The number and combination of the above verification strategies will depend on the scale of the project. For example, unit testing is more suitable for the development of a small library than of an OS.
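It can help to keep Expectations as structured records that link each one to the tests that verify it and the strategy that validates those tests. The record fields and example content below are illustrative, not prescribed by the framework.

```python
# Sketch: an Expectation record linking statement, verifying tests, and the
# validation strategy for those tests. Field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Expectation:
    id: str
    statement: str
    verified_by: list[str] = field(default_factory=list)   # test identifiers
    validated_by: list[str] = field(default_factory=list)  # e.g. fault induction

    @property
    def is_verified(self) -> bool:
        return bool(self.verified_by)

exp = Expectation(
    id="EXP-007",  # invented example
    statement="Braking command is issued within 10 ms of obstacle detection",
    verified_by=["tests/test_braking_latency.py::test_latency_budget"],
    validated_by=["fault induction: delay injected in detection pipeline"],
)
print(exp.is_verified)
```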
Evidence

Confidence scoring

Confidence scoring for TA-BEHAVIOURS is based on our confidence that the list of Expectations is accurate and complete, that Expectations are verified by tests, and that the effectiveness of these tests is validated by appropriate strategies.

Checklist
TA-MISBEHAVIOURS
status: valid
Prohibited misbehaviours for XYZ are identified, and mitigations are specified, verified and validated based on analysis.

The goal of TA-MISBEHAVIOURS is to force engineers to think critically about their work. This means understanding and mitigating as many as possible of the situations that cause the software to deviate from Expected Behaviours. This is not limited to the contents of the final binary.

Guidance

This assertion is satisfied to the extent that we can:
Once Expected Behaviours have been identified in TA-BEHAVIOURS, there are at least four classes of Misbehaviour that can be identified:
Identified Misbehaviours must be mitigated. Mitigations include patching, re-designing components, re-designing architectures, removing components, testing, static analysis, etc. They explicitly do not include the use of AWIs (Advance Warning Indicators) to return to a known-good state; these are treated specifically and in detail in TA-INDICATORS. Mitigations could be verified by:
Remember that a Misbehaviour is anything that could lead to a deviation from Expected Behaviour. The specific technologies in, and applications of, XYZ should always be considered in addition to the guidance above.
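Mitigations are often best verified by fault induction: deliberately provoke the Misbehaviour and check that the mitigation detects or contains it. A minimal sketch, assuming a hypothetical heartbeat watchdog as the mitigation:

```python
# Sketch: fault-induction test for a mitigation. We withhold heartbeats
# (the induced fault) and assert the watchdog flags the stall.
import time

class Watchdog:
    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_beat = time.monotonic()

    def beat(self) -> None:
        self.last_beat = time.monotonic()

    def stalled(self) -> bool:
        return time.monotonic() - self.last_beat > self.timeout_s

def test_watchdog_detects_stall():
    wd = Watchdog(timeout_s=0.05)
    wd.beat()
    time.sleep(0.1)        # induced fault: heartbeat withheld
    assert wd.stalled()    # the mitigation must detect the deviation
```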
Suggested evidence

Confidence scoring

Confidence scoring for TA-MISBEHAVIOURS is based on confidence that the identification of misbehaviours, and their coverage by tests, is complete when considered against the list of Expectations.

Checklist
TA-INDICATORS
status: valid
Advance warning indicators for misbehaviours are identified, and monitoring mechanisms are specified, verified and validated based on analysis.

Not all deviations from Expected Behaviour can be associated with a specific condition. Therefore, we must have a strategy for managing deviations that arise from unknown system states, process vulnerabilities or configurations. This is the role of Advance Warning Indicators (AWIs). These are specific metrics which correlate with deviations from Expected Behaviour and can be monitored in real time. The system should return to a defined known-good state when AWIs exceed defined tolerances.

Guidance

This assertion is met to the extent that:
Note that the set of possible deviations from Expected Behaviour is not the same as the set of Misbehaviours identified in TA-MISBEHAVIOURS, as it includes deviations due to unknown causes. Deviations can be determined by negating recorded Expectations.

Potential AWIs could be identified using source code analysis, risk analysis or incident reports. A set of AWIs to be used in production should be identified by monitoring candidate signals in all tests (functional, soak, stress) and measuring their correlation with deviations.

The known-good state should be chosen with regard to the system’s intended consumers and/or context. Canonical examples are mechanisms like reboots, resets, relaunches and restarts. The mechanism for returning to a known-good state can be verified using fault induction tests. Incidences of AWIs triggering a return to the known-good state, in either testing or production, should be considered as Misbehaviours in TA-MISBEHAVIOURS.

Relying on AWIs alone is not an acceptable mitigation strategy; TA-MISBEHAVIOURS and TA-INDICATORS are treated separately for this reason.

The selection of AWIs can be validated by analysing failure data. For instance, a high number of deviations occurring while all AWIs remain in tolerance implies that the set of AWIs is incorrect, or that the tolerances are too lax.
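A minimal sketch of AWI monitoring, assuming hypothetical indicator names, tolerances and recovery mechanism:

```python
# Sketch: check AWIs against tolerances and trigger a return to a
# known-good state when any indicator goes out of bounds.
TOLERANCES = {
    "heap_usage_fraction": (0.0, 0.9),
    "queue_latency_ms":    (0.0, 250.0),
    "crc_error_rate":      (0.0, 1e-6),
}

def out_of_tolerance(readings: dict[str, float]) -> list[str]:
    return [name for name, value in readings.items()
            if not TOLERANCES[name][0] <= value <= TOLERANCES[name][1]]

def check_and_recover(readings: dict[str, float]) -> None:
    tripped = out_of_tolerance(readings)
    if tripped:
        # Log the incident so it is analysed as a Misbehaviour
        # (TA-MISBEHAVIOURS), then return to the known-good state.
        print(f"AWIs out of tolerance: {tripped}; restarting service")
        restart_service()

def restart_service() -> None:
    ...  # hypothetical recovery mechanism, e.g. a supervised restart
```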
Evidence

Confidence scoring

Confidence scoring for TA-INDICATORS is based on confidence that the list of indicators is comprehensive/complete, that the indicators are useful, and that monitoring mechanisms have been implemented to collect the required data.

Checklist
TA-CONSTRAINTS
status: valid
Constraints on adaptation and deployment of XYZ are specified.

Guidance

Constraints on reuse, reconfiguration, modification, and deployment are specified to enhance the trustability of outputs. To ensure clarity, scoping boundaries regarding what the output cannot do, especially where common assumptions from applied domains may not hold, must be explicitly documented. These constraints are distinct from measures that mitigate misbehaviours; rather, they define the boundaries within which the system is designed to operate. This upfront documentation clarifies the intended use of specified Statements, highlights known limitations, and prevents misinterpretation.

These constraints, categorised into explicit limitations and assumptions of use, serve as a guide for both stakeholders and users. They define the intended scope and provide a clear interface for how upstream and downstream systems can integrate, modify, install, reuse, or reconfigure to achieve the desired output. Additionally, the documentation explicitly defines the contexts in which the integrity of existing Statements is preserved and whether any reimplementation is necessary.

Crucially, these limitations are not unresolved defects resulting from triage decisions but deliberate exclusions based on design choices. Each omission should be accompanied by a clear rationale, ensuring transparency for future scope expansion and guiding both upstream and downstream modifications.

Suggested evidence
Confidence scoring

The reliability of these constraints should be assessed based on the absence of contradictions and obvious pitfalls within the defined Statements.

Checklist
TA-VALIDATION
status: valid
All specified tests are executed repeatedly, under defined conditions in controlled environments, according to specified objectives.

Guidance

This assertion is satisfied to the extent that all of the tests specified in TA-BEHAVIOURS and constructed in TA-TESTS are correctly executed in a controlled environment on a defined cadence (e.g. daily) or for each proposed change, and on all candidate release builds for XYZ. Note that correct behaviour of tests may be confirmed using fault induction (e.g. by introducing an error or misconfiguration into XYZ).

Evidence
Confidence scoring

Confidence scoring for TA-VALIDATION is based on verification that we have results for all tests (both pass/fail and performance).

Checklist
TA-DATA
status: valid
Data is collected from tests, and from monitoring of deployed software, according to specified objectives.

Guidance

This assertion is satisfied if results from all tests and monitored deployments are captured accurately, ensuring:
To prevent misinterpretation, all data storage mechanisms and locations must be documented, along with long-term storage strategies, ensuring historical analyses can be reliably replicated.

Evidence
Confidence scoring

Confidence scoring for TA-DATA is based on comparison of actual failure rates with targets, and analysis of spikes and trends.

Checklist
TA-ANALYSIS
status: valid
Collected data from tests and monitoring of deployed software is analysed according to specified objectives.

Guidance

This assertion is satisfied to the extent that test data, and data collected from monitoring of deployed versions of XYZ, has been analysed, and the results used to inform the refinement of expectations and risk analysis. The analysis must be precise enough to confirm that:
Where test results expose misbehaviours not identified in our analysis (TA-ANALYSIS), we add the new misbehaviours to our Expectations (TA-BEHAVIOURS and TA-MISBEHAVIOURS). Where necessary, as informed by our ongoing confidence evaluation (TA-CONFIDENCE), we improve and repeat the analysis (TA-ANALYSIS).
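Spike detection over collected failure data can be automated; the sketch below flags values that exceed a rolling mean by a multiple of the rolling standard deviation. The window size, threshold and example series are illustrative choices, not prescribed by the framework.

```python
# Sketch: flag anomalous spikes in a monitored failure-rate series.
from statistics import mean, stdev

def spikes(series: list[float], window: int = 7, k: float = 3.0) -> list[int]:
    """Return indices where a value exceeds mean + k*stdev of the
    preceding `window` observations."""
    flagged = []
    for i in range(window, len(series)):
        prior = series[i - window:i]
        if stdev(prior) > 0 and series[i] > mean(prior) + k * stdev(prior):
            flagged.append(i)
    return flagged

daily_failure_rate = [0.01, 0.012, 0.011, 0.009, 0.010, 0.011, 0.012, 0.05]
print(spikes(daily_failure_rate))  # -> [7]
```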
Evidence

Confidence scoring

Confidence scoring for TA-ANALYSIS is based on Key Performance Indicators (KPIs) that may indicate problems in development, test, or production.

Checklist
TA-METHODOLOGIES
status: valid
Manual methodologies applied for XYZ by contributors, and their results, are managed according to specified objectives.

Guidance

To satisfy this assertion, the manual processes applied in the verification of XYZ must be documented, together with the methodologies used, the results of applying these processes to specific aspects and iterations of XYZ or its components, and evidence that they have been reviewed using documented criteria.

Data analysis for TA-ANALYSIS should ideally be largely automated, to establish continuous feedback mechanisms. Manual processes, however, such as those used for quality control assurances and the appropriate assignment of responsibilities, must be documented as evidence for TA-METHODOLOGIES.

Evidence
Confidence scoring

Confidence scoring for TA-METHODOLOGIES is based on identifying areas where manual processes are needed, assessing the clarity of the proposed processes, analysing the results of their implementation, and evaluating the evidence of effectiveness against the analysed results.

Checklist
TA-CONFIDENCE
status: valid
Confidence in XYZ is measured based on results of analysis.

Guidance

To quantify confidence, either a subjective assessment or a statistical argument must be presented for each statement, and then systematically and repeatably aggregated to assess whether the final deliverable is fit for purpose. To improve the accuracy of confidence evaluations in reflecting reality, the following steps are necessary:
As subjective assessments are progressively replaced with statistical arguments, and past confidence evaluations are refined against new evidence, the accuracy of evaluations improves over time. When project circumstances inevitably change, existing statements are repurposed, with their associated confidence scores eventually offering insights into the systematic capability of the project to deliver according to set objectives.

This process should itself be analysed to determine the maturity of any given confidence score. A suitable meta-analysis can assess long-term trends in score sourcing, score accumulation, and weighting mechanisms.
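Aggregation must be systematic and repeatable, but the framework does not prescribe a formula. One possible choice, sketched below with invented scores and weights, is a weighted geometric mean, which has the useful property that a single weak Statement drags the overall score down.

```python
# Sketch: aggregate per-statement confidence scores into an overall score
# using a weighted geometric mean. Scores and weights are illustrative.
import math

def aggregate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """scores: statement id -> confidence in (0, 1];
    weights: relative importance of each statement."""
    total = sum(weights[s] for s in scores)
    return math.exp(sum(weights[s] * math.log(scores[s]) for s in scores) / total)

scores = {"TA-SUPPLY_CHAIN": 0.9, "TA-RELEASES": 0.7, "TA-FIXES": 0.8}
weights = {"TA-SUPPLY_CHAIN": 1.0, "TA-RELEASES": 2.0, "TA-FIXES": 1.0}
print(f"{aggregate(scores, weights):.2f}")
```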
Evidence

Confidence scoring

Confidence scoring for TA-CONFIDENCE is based on the quality of the confidence scores given to Statements.

Checklist