MLE.4 Machine Learning Model Testing#

The purpose is to ensure compliance of the trained ML model and the deployed ML model with the ML requirements.

Process outcomes#

  1. A ML test approach is defined.

  2. A ML test data set is created.

  3. The trained ML model is tested.

  4. The deployed ML model is derived from the trained ML model and tested.

  5. Consistency and bidirectional traceability are established between the ML test approach and the ML requirements, and the ML test data set and the ML data requirements; and bidirectional traceability is established between the ML test approach and ML test results.

  6. Results of the ML model testing are summarized and communicated with the deployed ML model to all affected parties.

Base practices#

MLE.4.BP1: Specify an ML test approach
status: valid
tags: aspice40_mle4

Specify an ML test approach suitable to provide evidence for compliance of the trained ML model and the deployed ML model with the ML requirements. The ML test approach includes

  • ML test scenarios with distribution of data characteristics (e.g., gender, weather conditions, street conditions within the ODD) defined by ML requirements,

  • distribution and frequency of each ML test scenario inside the ML test data set,

  • expected test result per test datum,

  • entry and exit criteria of the testing,

  • approach for data set creation and modification, and

  • the required testing infrastructure and environment setup.

Note

Expected test result per test datum might require labeling of test data to support comparison of output of the ML model with the expected output.

Note

Test datum is the smallest amount of data which is processed by the ML model into only one output. E.g., one image in photo processing or an audio sequence in voice recognition.

Note

Data characteristic is one property of the data that may have different expressions in the ODD. E.g., weather condition may contain expressions like sunny, foggy or rainy.

Note

An ML test scenario is a combination of expressions of all defined data characteristics e.g., weather conditions = sunny, street conditions = gravel road.

MLE.4.BP2: Create ML test data set
status: valid
tags: aspice40_mle4

Create the ML test data set needed for testing of the trained ML model and testing of the deployed ML model from the ML data collection provided by SUP.11 considering the ML test approach. The ML test data set shall not be used for training.

Note

The ML test data set for the trained ML model might differ from the test data set of the deployed ML model.

Note

Additional data sets might be used for special purposes like assurance of safety, fairness, robustness.

MLE.4.BP3: Test trained ML model
status: valid
tags: aspice40_mle4

Test the trained ML model according to the ML test approach using the created ML test data set. Record and evaluate the ML test results.

Note

Evaluation of test logs might include pattern analysis of failed test data to support e.g., trustworthiness.

MLE.4.BP4: Derive deployed ML model
status: valid
tags: aspice40_mle4

Derive the deployed ML model from the trained ML model according to the ML architecture. The deployed ML model shall be used for testing and delivery to software integration.

Note

The deployed ML model will be integrated into the target system and may differ from the trained ML model which often requires powerful hardware and uses interpretative languages.

MLE.4.BP5: Test deployed ML model
status: valid
tags: aspice40_mle4

Test the deployed ML model according to the ML test approach using the created ML test data set. Record and evaluate the ML test results.

MLE.4.BP6: Ensure consistency and establish bidirectional traceability
status: valid
tags: aspice40_mle4

Ensure consistency and establish bidirectional traceability between the ML test approach and the ML requirements, and the ML test data set and the ML data requirements; and bidirectional traceability is established between the ML test approach and ML test results.

Note

Bidirectional traceability supports consistency, and facilitates impact analyses of change requests, and verification coverage demonstration. Traceability alone, e.g., the existence of links, does not necessarily mean that the information is consistent with each other.

MLE.4.BP7: Summarize and communicate results
status: valid
tags: aspice40_mle4

Summarize the ML test results of the ML model. Inform all affected parties about the agreed results and the deployed ML model.