SUP.11 Machine Learning Data Management

The purpose is to define and align ML data with ML data requirements, maintain the integrity and quality of the ML data, and make them available to affected parties.

Process outcomes

  1. An ML data management system including an ML data lifecycle is established.

  2. An ML data quality approach is developed including ML data quality criteria.

  3. Collected ML data are processed for consistency with ML data requirements.

  4. ML data are verified against defined ML data quality criteria and updated as needed.

  5. ML data are agreed and communicated to all affected parties.

Base practices

SUP.11.BP1: Establish an ML data management system
status: valid
tags: aspice40_sup11

Establish an ML data management system which supports

  • ML data management activities,

  • relevant sources of ML data,

  • the ML data lifecycle including a status model, and

  • interfaces to affected parties.

Note

Supported ML data management activities may include data collection, labeling/annotation, and structuring.
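
A status model for the ML data lifecycle could be sketched as a small state machine. The status names and allowed transitions below are assumptions chosen for illustration; the process does not prescribe them.

```python
from enum import Enum


class DataStatus(Enum):
    # Hypothetical status names, not defined by the process.
    COLLECTED = "collected"
    PROCESSED = "processed"
    VERIFIED = "verified"
    RELEASED = "released"
    RETIRED = "retired"


# Assumed transitions: a linear flow, with retirement possible at any stage.
ALLOWED_TRANSITIONS = {
    DataStatus.COLLECTED: {DataStatus.PROCESSED, DataStatus.RETIRED},
    DataStatus.PROCESSED: {DataStatus.VERIFIED, DataStatus.RETIRED},
    DataStatus.VERIFIED: {DataStatus.RELEASED, DataStatus.RETIRED},
    DataStatus.RELEASED: {DataStatus.RETIRED},
    DataStatus.RETIRED: set(),
}


def advance(current: DataStatus, target: DataStatus) -> DataStatus:
    """Return the target status if the transition is allowed, else raise."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Illegal transition: {current.value} -> {target.value}")
    return target
```

Rejecting illegal transitions in one place is one way to keep a data set from reaching a released status without having passed verification.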

SUP.11.BP2: Develop an ML data quality approach
status: valid
tags: aspice40_sup11

Develop an approach that ensures the quality of ML data is analyzed against defined ML data quality criteria and that activities are performed to help avoid bias in the data.

Note

Examples of ML data quality criteria are relevance of data sources, reliability and consistency of labeling, and completeness against ML data requirements.
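
Two of these criteria can be made measurable with very little code. The sketch below is only one possible reading: it treats labeling consistency as the share of samples on which two annotators agree, and completeness as a minimum sample count per class. The function names and the per-class minimums are assumptions, not part of the process.

```python
from collections import Counter


def label_agreement(labels_a, labels_b):
    """Fraction of samples on which two annotators assigned the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def missing_coverage(labels, min_count_per_class):
    """Return, per class, how many samples are missing to reach the required minimum."""
    counts = Counter(labels)  # classes that never occur count as 0
    return {
        cls: required - counts[cls]
        for cls, required in min_count_per_class.items()
        if counts[cls] < required
    }
```

Simple percent agreement does not correct for agreement by chance; a chance-corrected measure may be more appropriate where class frequencies are very uneven.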

Note

The ML data management system should support the quality criteria and activities of the ML data quality approach.

Note

Biases to avoid may include sampling bias (e.g., gender, age) and feedback loop bias.
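
One way to look for sampling bias is to compare the share of each group in the collected data with a reference share. The following is a minimal sketch under the assumption that such reference shares are available, for example from the ML data requirements; the function name, the tolerance, and the group names in the usage note are hypothetical.

```python
from collections import Counter


def proportion_gaps(values, reference, tolerance=0.05):
    """Return groups whose observed share deviates from the reference share
    by more than the tolerance, with the signed deviation."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    total = len(values)
    gaps = {}
    for group, expected in reference.items():
        observed = counts[group] / total
        if abs(observed - expected) > tolerance:
            gaps[group] = round(observed - expected, 3)
    return gaps
```

For example, calling it with 80 "adult" and 20 "child" entries against reference shares of 0.6 and 0.4 reports "adult" as over-represented and "child" as under-represented. A check like this does not detect feedback loop bias, which requires looking at how the data were generated.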

Note

For creation of ML data sets see MLE.3.BP2 and MLE.4.BP2.

SUP.11.BP3: Collect ML data
status: valid
tags: aspice40_sup11

Relevant sources for raw data are identified and continuously monitored for changes. The raw data are collected according to the ML data requirements.

Note

The identification and collection of ML data might be an organizational responsibility.

Note

Continuous monitoring should include the operational design domain (ODD) and may lead to changes to the ML requirements.
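
Monitoring data sources for changes could, in its simplest form, compare stored content fingerprints with freshly computed ones. The sketch below assumes that each source can be read as bytes and identified by name; both function names are hypothetical. It detects content changes only, so changes to the ODD still need separate tracking.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of a source's content."""
    return hashlib.sha256(content).hexdigest()


def changed_sources(known: dict, current: dict) -> dict:
    """Compare stored fingerprints with freshly read source content.

    known:   source name -> previously stored fingerprint
    current: source name -> content bytes read now
    """
    report = {"new": [], "changed": [], "missing": []}
    for name in sorted(current):
        if name not in known:
            report["new"].append(name)
        elif known[name] != fingerprint(current[name]):
            report["changed"].append(name)
    report["missing"] = sorted(name for name in known if name not in current)
    return report
```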

SUP.11.BP4: Process ML data
status: valid
tags: aspice40_sup11

The raw data are processed (annotated, analyzed, and structured) according to the ML data requirements.

SUP.11.BP5: Assure quality of ML data
status: valid
tags: aspice40_sup11

Perform the activities according to the ML data quality approach to ensure that the ML data meet the defined ML data quality criteria.

Note

These activities may include sample-based reviews or statistical methods.
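
A sample-based review can be combined with a statistical estimate: draw a reproducible random sample, count the defects found in review, and compute a confidence interval for the defect rate in the full data set. The sketch below uses the Wilson score interval as one common choice; the function names, the default seed, and the 95% level (z = 1.96) are assumptions.

```python
import math
import random


def draw_review_sample(item_ids, sample_size, seed=0):
    """Draw a reproducible random sample of item IDs for manual review."""
    items = list(item_ids)
    rng = random.Random(seed)
    return rng.sample(items, min(sample_size, len(items)))


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for the defect rate, given `errors` defects
    found among `n` reviewed samples."""
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    p = errors / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

The upper bound of the interval can then be compared with a maximum defect rate taken from the ML data quality criteria, if such a threshold has been defined.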

SUP.11.BP6: Communicate agreed processed ML data
status: valid
tags: aspice40_sup11

Inform all affected parties about the agreed processed ML data and provide the data to them.