SUP.11 Machine Learning Data Management
The purpose is to define and align ML data with ML data requirements, maintain the integrity and quality of the ML data, and make them available to affected parties.
Process outcomes
An ML data management system including an ML data lifecycle is established.
An ML data quality approach is developed including ML data quality criteria.
Collected ML data are processed for consistency with ML data requirements.
ML data are verified against defined ML data quality criteria and updated as needed.
ML data are agreed and communicated to all affected parties.
Base practices
SUP.11.BP1: Establish an ML data management system
status: valid
Establish an ML data management system which supports the ML data management activities and the ML data lifecycle.
Note: Supported ML data management activities may include data collection, labeling/annotation, and structuring.
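The ML data lifecycle mentioned in the process outcomes can be pictured as a small status model. The following is a minimal sketch only: the state names, the allowed transitions, and the `DataItem` class are hypothetical illustrations and are not prescribed by this process.

```python
from enum import Enum


class DataStatus(Enum):
    # Hypothetical lifecycle states; the process does not prescribe these names.
    COLLECTED = "collected"
    PROCESSED = "processed"
    VERIFIED = "verified"
    RELEASED = "released"
    RETIRED = "retired"


# Assumed transitions: verified data may fall back to "processed"
# when it has to be updated.
ALLOWED = {
    DataStatus.COLLECTED: {DataStatus.PROCESSED, DataStatus.RETIRED},
    DataStatus.PROCESSED: {DataStatus.VERIFIED, DataStatus.RETIRED},
    DataStatus.VERIFIED: {DataStatus.RELEASED, DataStatus.PROCESSED, DataStatus.RETIRED},
    DataStatus.RELEASED: {DataStatus.RETIRED},
    DataStatus.RETIRED: set(),
}


class DataItem:
    """One managed ML data item with its current status and status history."""

    def __init__(self, item_id):
        self.item_id = item_id
        self.status = DataStatus.COLLECTED
        self.history = [DataStatus.COLLECTED]

    def move_to(self, new_status):
        # Reject any transition the assumed status model does not list.
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.status.value} -> {new_status.value} is not allowed"
            )
        self.status = new_status
        self.history.append(new_status)
```

Keeping the transition table separate from the item class makes the status model easy to review and to adapt to a project-specific lifecycle.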
SUP.11.BP2: Develop an ML data quality approach
status: valid
Develop an approach to ensure that the quality of ML data is analyzed based on defined ML data quality criteria and that activities are performed to support the avoidance of biases in the data.
Note: Examples of ML data quality criteria are relevant data sources, reliability and consistency of labeling, and completeness against ML data requirements.
Note: The ML data management system should support the quality criteria and activities of the ML data quality approach.
Note: Biases to avoid may include sampling bias (e.g., gender, age) and feedback loop bias.
Note: For creation of ML data sets see MLE.3.BP2 and MLE.4.BP2.
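One way to make a sampling-bias criterion checkable is to compare the share of each group in the data against a minimum share. The sketch below is illustrative only: the record layout, the attribute name, the expected groups, and the threshold are all assumptions, and a real quality approach would derive them from the ML data requirements.

```python
from collections import Counter


def group_shares(records, attribute):
    """Return the fraction of records per value of the given attribute."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(records, attribute, expected_groups, min_share):
    """List expected groups whose share is below min_share (missing groups count as 0)."""
    shares = group_shares(records, attribute)
    return sorted(g for g in expected_groups if shares.get(g, 0.0) < min_share)


# Hypothetical usage: "age_group" and the 15 % threshold are assumptions.
records = [{"age_group": "adult"}] * 8 + [{"age_group": "child"}, {"age_group": "senior"}]
flagged = underrepresented(records, "age_group", {"adult", "child", "senior", "teen"}, 0.15)
```

A group that is expected but absent from the data is reported as well, which ties this check to completeness against the ML data requirements.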
SUP.11.BP3: Collect ML data
status: valid
Relevant sources for raw data are identified and continuously monitored for changes. The raw data are collected according to the ML data requirements.
Note: The identification and collection of ML data might be an organizational responsibility.
Note: Continuous monitoring should include the operational design domain (ODD) and may lead to changes of the ML requirements.
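Monitoring raw data sources for changes can be approximated by recording a fingerprint per source and comparing it on each check. This is a minimal sketch under stated assumptions: the source names, the in-memory dictionaries, and the choice of SHA-256 are illustrative and not part of the process definition.

```python
import hashlib


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest of the raw content of a source."""
    return hashlib.sha256(payload).hexdigest()


def changed_sources(known, current):
    """Compare recorded fingerprints with the current content of each source.

    known:   source name -> fingerprint recorded at the last check
    current: source name -> raw bytes read now
    Returns the sorted names of sources that are new or whose content changed.
    """
    changed = []
    for name, payload in current.items():
        if known.get(name) != fingerprint(payload):
            changed.append(name)
    return sorted(changed)


# Hypothetical usage with made-up source names.
known = {"camera_front": fingerprint(b"v1"), "map_tiles": fingerprint(b"v1")}
current = {"camera_front": b"v1", "map_tiles": b"v2", "radar": b"new"}
to_review = changed_sources(known, current)
```

A content fingerprint only detects that something changed. Whether the change affects the ODD or the ML requirements still needs a review by the responsible party.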
SUP.11.BP4: Process ML data
status: valid
The raw data are processed (annotated, analyzed, and structured) according to the ML data requirements.
SUP.11.BP5: Assure quality of ML data
status: valid
Perform the activities according to the ML data quality approach to ensure that the ML data meet the defined ML data quality criteria.
Note: These activities may include sample-based reviews or statistical methods.
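A sample-based review can be sketched as drawing a reproducible random sample and estimating the label error rate from the review verdicts. The function names, the review callback, and the acceptance threshold below are assumptions for illustration. In practice the verdicts would come from human reviewers and the threshold from the ML data quality criteria.

```python
import random


def sample_review(items, is_label_correct, sample_size, seed=0):
    """Review a random sample and return (sample length, observed error rate)."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(items, min(sample_size, len(items)))
    if not sample:
        return 0, 0.0
    errors = sum(1 for item in sample if not is_label_correct(item))
    return len(sample), errors / len(sample)


def meets_criterion(error_rate, max_error_rate):
    """Check the observed error rate against an assumed acceptance threshold."""
    return error_rate <= max_error_rate


# Hypothetical usage: every tenth item is treated as mislabeled.
items = list(range(100))
reviewed, rate = sample_review(items, lambda i: i % 10 != 0, sample_size=100)
accepted = meets_criterion(rate, max_error_rate=0.05)
```

With a sample smaller than the full data set, the observed rate is only an estimate, so a statistical method such as a confidence interval would be needed before drawing conclusions about the whole set.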
SUP.11.BP6: Communicate agreed processed ML data
status: valid
Inform all affected parties about the agreed processed ML data and provide the data to them.