What we do (and what we should do) to ensure data quality

Here is a very practical list. You don’t need to have a technical profile to understand it, but you do need discipline to apply it.

Define ‘what each piece of data means’

A field is not just a place to enter information; it is an agreement on what that data means. For example, when we talk about the ‘date of receipt’, it is important to clarify which moment it refers to: physical entry, registration in the system, or validation. To avoid confusion, it is essential to have a simple, living, shared data dictionary.
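A data dictionary does not need to be sophisticated to be useful. Here is a minimal sketch in Python of what one entry could look like; every name in it (`FieldDefinition`, `receipt_date`, the example values) is illustrative, not taken from any particular product:

```python
from dataclasses import dataclass

@dataclass
class FieldDefinition:
    """One entry in a shared data dictionary."""
    name: str        # technical field name
    meaning: str     # the agreed business meaning
    owner: str       # who answers questions about this field
    allowed_values: list[str] | None = None  # closed list of values, if any

# Illustrative entry: it pins down exactly which moment
# 'date of receipt' refers to, so nobody has to guess.
RECEIPT_DATE = FieldDefinition(
    name="receipt_date",
    meaning="Timestamp of registration in the system (not physical entry, not validation)",
    owner="Operations",
)
```

The point is not the code but the agreement it captures: a definition everyone can read, question, and update.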

Validate at the point of capture

The sooner an error is detected and stopped, the easier and cheaper it is to correct. That is why it is important to validate data as it is entered, using closed lists where necessary instead of free text, applying simple rules such as ranges, formats or mandatory fields, and displaying clear warnings when information is missing or something does not fit.
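As a sketch of what this looks like in practice, here is a minimal validation function in Python. The field names (`sample_id`, `species`, `weight_kg`), the catalogue, and the format pattern are all assumptions for illustration:

```python
import re

SPECIES = {"bovine", "ovine", "caprine"}  # closed list instead of free text (illustrative)

def validate_at_capture(record: dict) -> list[str]:
    """Return clear warnings for the user; an empty list means the record can be saved."""
    warnings = []
    # Mandatory fields
    for required in ("sample_id", "species", "weight_kg"):
        if not record.get(required):
            warnings.append(f"Missing mandatory field: {required}")
    # Closed list instead of free text
    if record.get("species") and record["species"] not in SPECIES:
        warnings.append(f"Unknown species '{record['species']}'; choose a value from the catalogue")
    # Simple range rule
    weight = record.get("weight_kg")
    if weight is not None and not (0 < weight < 2000):
        warnings.append(f"Weight {weight} kg is outside the plausible range")
    # Simple format rule
    if record.get("sample_id") and not re.fullmatch(r"[A-Z]{2}-\d{6}", record["sample_id"]):
        warnings.append("sample_id must match the pattern XX-000000")
    return warnings
```

Every rule here is cheap to run at entry time, and far cheaper than chasing the same error months later in a report.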

True traceability (history + who + why)

It is not just a matter of storing data, but of being able to understand what has happened to it. A system must allow you to know who made a change, when it was made, why it was made, and what value the data held before. In products such as AniBio or NorayBanks, this complete traceability is key to communicating changes, maintaining a history, and sustaining critical processes without losing control of the information.
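A minimal sketch of what such an audit trail records, using an in-memory list where a real system would use a database table (all names here are hypothetical, not the actual schema of AniBio or NorayBanks):

```python
from datetime import datetime, timezone

audit_log: list[dict] = []  # in a real system, a durable database table

def record_change(record_id: str, field: str, old_value, new_value,
                  user: str, reason: str) -> None:
    """Append an audit entry capturing who, when, why, and the previous value."""
    audit_log.append({
        "record_id": record_id,
        "field": field,
        "old_value": old_value,  # what value the data held before
        "new_value": new_value,
        "user": user,            # who made the change
        "at": datetime.now(timezone.utc).isoformat(),  # when it was made
        "reason": reason,        # why it was made
    })

record_change("AB-000123", "weight_kg", 412, 421,
              user="mgarcia", reason="Typo corrected after re-weighing")
```

The essential design choice is that entries are only ever appended, never edited: the history itself must be trustworthy.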

Eliminate duplicates judiciously

A duplicate does not always mean that two records share exactly the same name. To detect duplicates reliably, it is necessary to rely on unique identifiers, apply matching rules across different fields, and fall back on manual review when there is any doubt.
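A small sketch of such matching rules in Python, using the standard library's `difflib` for name similarity; the fields (`tax_id`, `name`, `city`) and the thresholds are illustrative assumptions:

```python
from difflib import SequenceMatcher

def classify_pair(a: dict, b: dict) -> str:
    """Classify a pair of records as 'duplicate', 'review', or 'distinct'."""
    # A shared unique identifier is conclusive on its own.
    if a.get("tax_id") and a.get("tax_id") == b.get("tax_id"):
        return "duplicate"
    # Otherwise, combine weaker signals across several fields.
    name_similarity = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    same_city = a.get("city") == b.get("city")
    if name_similarity > 0.92 and same_city:
        return "duplicate"
    if name_similarity > 0.80:  # close, but not certain: send to manual review
        return "review"
    return "distinct"
```

Note the three-way answer: automating the clear cases and routing the doubtful ones to a person is exactly what ‘judiciously’ means here.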

Measure quality as you would measure any KPI

If data quality is not measured, it is difficult to improve. To do this, you can use simple metrics such as the percentage of incomplete records, the percentage of duplicates, the average delay in registration, or the classification of errors by type, whether format, range, or catalogue.
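As a sketch, the metrics in that list can be computed with a few lines of Python. The field names (`sample_id`, `received_at`, `registered_at`) are assumptions, and the records are assumed to carry `datetime` values for the two timestamps:

```python
from collections import Counter

def quality_metrics(records: list[dict], errors: list[dict]) -> dict:
    """Compute simple data-quality KPIs over a batch of records."""
    total = len(records)
    incomplete = sum(1 for r in records
                     if not all(r.get(f) for f in ("sample_id", "received_at")))
    ids = [r["sample_id"] for r in records if r.get("sample_id")]
    duplicates = len(ids) - len(set(ids))
    delays = [(r["registered_at"] - r["received_at"]).days
              for r in records if r.get("registered_at") and r.get("received_at")]
    return {
        "pct_incomplete": 100 * incomplete / total if total else 0.0,
        "pct_duplicates": 100 * duplicates / total if total else 0.0,
        "avg_registration_delay_days": sum(delays) / len(delays) if delays else 0.0,
        "errors_by_type": Counter(e["type"] for e in errors),  # format / range / catalogue
    }
```

Tracked over time, these few numbers tell you whether the discipline is holding or slipping.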

Governance: someone must ‘own’ the data

This does not refer to a ‘legal’ owner of the data, but to an operational one: someone who defines what is considered correct, approves changes, and prioritises the necessary improvements.

The uncomfortable part: data quality is culture, not just software

Software helps a lot, but on its own it does not guarantee data quality. Quality comes when data is treated as an asset, processes are designed with it in mind, and problems are corrected at their source. If you want truly useful AI, start with the data: only with reliable information can AI become a competitive advantage. Before AI comes data quality, and before data quality come traceability, clear rules, and discipline.

If you missed the first part, you can read it here.