What is a Data Quality API?

A data quality API is a software intermediary that handles requests and responses for various data quality functions. Data quality APIs go by different names: data quality firewall, real-time data quality validation, central data quality engine, and so on. The API is triggered when an event takes place: incoming data is routed through it for quality verification before being directed to the data-storing source. This ensures that no data quality errors migrate from the data-capturing application to the data-storing source. The API is built on event-driven architecture and can be used in proactive data quality approaches, where data is tested and treated before it is stored in the database.
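To make that gatekeeper pattern concrete, here is a minimal sketch in Python. The checks, field names, and in-memory "store" are illustrative assumptions for the sake of the example, not DataGovs' actual implementation:

```python
# Minimal sketch of the gatekeeper flow: data reaches the store only if it
# passes quality checks. All rules and names here are illustrative.
from dataclasses import dataclass, field

@dataclass
class ValidationResult:
    valid: bool
    errors: list[str] = field(default_factory=list)

def validate_record(record: dict) -> ValidationResult:
    """Run basic quality checks on an incoming record before storage."""
    errors = []
    if not record.get("email") or "@" not in record["email"]:
        errors.append("email: missing or malformed")
    if record.get("age") is not None and not (0 <= record["age"] <= 130):
        errors.append("age: out of plausible range")
    return ValidationResult(valid=not errors, errors=errors)

def handle_write_event(record: dict, store: list) -> None:
    """Event handler: forward clean data, reject data with quality errors."""
    result = validate_record(record)
    if result.valid:
        store.append(record)                 # clean data goes to the source
    else:
        print(f"rejected: {result.errors}")  # errors surface to the producer

db: list[dict] = []  # stand-in for the data-storing source
handle_write_event({"email": "ana@example.com", "age": 34}, db)   # stored
handle_write_event({"email": "not-an-email", "age": 200}, db)     # rejected
```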


Reference: David Loshin, The Practitioner's Guide to Data Quality Improvement, 2011.

Why is data quality important?

Gartner research estimates that organizations of all sizes lose about $13M annually to poor data quality, and roughly 60% of businesses surveyed do not measure the cost of bad data. A data quality API can help here: it acts as an intermediary between two applications, handling the requests and responses transmitted between them. In the context of data quality, it serves as a gatekeeper between a data-capturing application and a data-storage source, ensuring that data quality errors are not migrated from one end to the other.
DataGovs API is built on event-driven architecture, which means it is triggered when an event occurs. For example, when new data is created or existing data is updated in a connected application, the update is first sent to the data quality API, where it is verified for quality. If errors are found, a set of transformation rules is executed to clean the data. In some cases, a data quality steward may need to intervene to resolve issues where data values are ambiguous and cannot be reliably processed by the configured algorithms. Once the data has been cleaned and verified, it is sent on to the destination source.
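Here is a hedged sketch of that clean-or-escalate step. The transformation rule (a phone-number normalizer) and the steward queue are hypothetical stand-ins for whatever rules and review workflow a team configures:

```python
# Sketch of the clean-or-escalate step: transformation rules run first, and
# records the rules cannot resolve are queued for a data quality steward.
# The rule, queue, and field names are illustrative assumptions.
import re

def normalize_phone(value: str) -> str | None:
    """Strip formatting; return a 10-digit number, or None if ambiguous."""
    digits = re.sub(r"\D", "", value)
    return digits if len(digits) == 10 else None

steward_queue: list[dict] = []   # ambiguous records awaiting human review
destination: list[dict] = []     # stand-in for the destination source

def process_update(record: dict) -> None:
    cleaned = normalize_phone(record.get("phone", ""))
    if cleaned is None:
        steward_queue.append(record)   # rules can't resolve it: escalate
    else:
        record["phone"] = cleaned
        destination.append(record)     # cleaned and verified: send onward

process_update({"id": 1, "phone": "(305) 555-0142"})   # rule cleans this
process_update({"id": 2, "phone": "555-0142 ext. 9"})  # goes to the steward
```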


Example of a Data Quality Scorecard used at the API and JSON Level
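As an illustration of what a scorecard measures at that level, per-field quality rates can be computed directly over JSON payloads. The field names, sample payloads, and checks below are assumptions for the sketch, not DataGovs' actual scoring rules:

```python
# Sketch of a JSON-level scorecard: per-field completeness across a batch
# of payloads. Fields and sample data are illustrative.
import json

payloads = [
    '{"id": 1, "email": "a@example.com", "country": "US"}',
    '{"id": 2, "email": null, "country": "US"}',
    '{"id": 3, "email": "b@example", "country": ""}',
]

records = [json.loads(p) for p in payloads]
fields = ["id", "email", "country"]

for f in fields:
    present = [r for r in records if r.get(f) not in (None, "")]
    completeness = len(present) / len(records)
    print(f"{f}: completeness {completeness:.0%}")
```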

The Impact of Data Quality Monitoring

With DataGovs, our data quality API can be used in a proactive approach to data quality, where the data is tested and treated before it is stored in the database. This is preferable to a reactive approach, where data is cleaned only after it has been stored, because it keeps the stored data reliable and available from the moment it lands. By deploying a data quality API, companies can ensure that the data they produce and use is of high quality, which is essential for making informed business decisions.
In summary, a data quality API is a software tool that acts as a gatekeeper between a data-capturing application and a data-storage source. It is triggered when an event occurs and routes incoming data through quality verification before directing it to the destination. By taking a proactive approach and deploying a data quality API, companies can ensure that their data is of high quality and can be trusted for making important business decisions.