Releases: datakaveri/data-quality-assessment
DQ Assessment Tool v2.1.0
This minor update incorporates the updated methodology, which uses 6 metrics instead of 7. The IAT (inter-arrival time) Outliers metric has been dropped for two reasons: it is redundant with Sensor Uptime, and a large number of edge cases led to unusable visualizations in the output PDF report. All references to IAT outliers have been removed to keep the report user-friendly and readable.
DQ Assessment Tool v2.0.0
This tool assesses the quality of a dataset by assigning each metric a score between 0 and 1, where 1 is the highest possible score (100%). The tool currently scores seven parameters, up from the five described in v1.0.0. These parameters attempt to quantify the quality of sensor data specifically, and they are:
- Regularity of Inter-Arrival Time
- Outlier Presence in Inter-Arrival Time
- Sensor Uptime
- Absence of Duplicate Values
- Adherence to Attribute Format
- Absence of Unknown Attributes
- Adherence to Mandatory Attributes
The tool takes static historical data as input so that the user can check and assess its quality.
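To make the 0-to-1 scoring concrete, here is a minimal sketch of how a score such as "Absence of Duplicate Values" could be computed. It is an illustration only and not the tool's actual implementation: the function name `duplicate_absence_score`, the choice of key fields, the sample field names, and the convention that an empty dataset scores 1.0 are all assumptions.

```python
def duplicate_absence_score(records, key_fields):
    """Return the fraction of records that do not repeat an earlier record.

    Hypothetical helper: two records count as duplicates when they agree on
    every field listed in key_fields. An empty dataset scores 1.0 (assumption).
    """
    if not records:
        return 1.0
    seen = set()
    duplicates = 0
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return 1 - duplicates / len(records)


# Made-up sample readings; the field names are placeholders.
readings = [
    {"id": "sensor-1", "observationDateTime": "2023-01-01T00:00:00", "value": 10},
    {"id": "sensor-1", "observationDateTime": "2023-01-01T00:05:00", "value": 11},
    {"id": "sensor-1", "observationDateTime": "2023-01-01T00:05:00", "value": 11},
    {"id": "sensor-1", "observationDateTime": "2023-01-01T00:10:00", "value": 12},
]

score = duplicate_absence_score(readings, ["id", "observationDateTime"])
print(score)  # 0.75: one of the four readings repeats an earlier one
```

A dataset with no repeated readings would score 1.0 under this sketch, matching the convention that 1 is the highest possible score.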
The primary output of the tool is a detailed PDF report with visualisations and explanations, a change from the JSON output of v1.0.0. This change was made with readability and accessibility for the end user in mind. The JSON output has been retained and improved as well, for users who want to view the scores in isolation.
Full Changelog: v1.0.0...v2.0.0
DQ Assessment Tool v1.0.0
This tool assesses the quality of a dataset by assigning each metric a score between 0 and 1, where 1 is the highest possible score (100%). The tool currently scores 5 parameters. These parameters attempt to quantify the quality of sensor data specifically, and they are:
- Regularity
- Duplicates
- Completeness
- Format Adherence
- Mandatory Attributes
The tool takes static historical data as input so that the user can check and assess its quality.
The output of the tool is a JSON report with scores for each of the 5 dimensions.
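As a rough illustration of what a per-dimension JSON report might look like, the sketch below serialises five scores with Python's standard `json` module. The release notes do not specify the report schema, so the key names (taken from the dimension list above), the helper `build_report`, and the score values are placeholders rather than the tool's real output.

```python
import json

# Dimension names from the v1.0.0 list; using them as JSON keys is an assumption.
DIMENSIONS = [
    "Regularity",
    "Duplicates",
    "Completeness",
    "Format Adherence",
    "Mandatory Attributes",
]


def build_report(scores):
    """Validate one score per dimension and return a JSON string (hypothetical helper)."""
    report = {}
    for name in DIMENSIONS:
        if name not in scores:
            raise ValueError(f"missing score for dimension: {name}")
        value = float(scores[name])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name} must be between 0 and 1, got {value}")
        report[name] = value
    return json.dumps(report, indent=2)


# Placeholder values for illustration, not results from a real dataset.
example_scores = {
    "Regularity": 0.9,
    "Duplicates": 1.0,
    "Completeness": 0.8,
    "Format Adherence": 0.95,
    "Mandatory Attributes": 1.0,
}

report_json = build_report(example_scores)
print(report_json)
```

Validating the 0-to-1 range before serialising keeps a malformed score from silently reaching the report.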
Full Changelog: https://github.com/datakaveri/data-quality-assessment/commits/v1.0.0