Lately, it seems that everything can be solved with artificial intelligence (AI). And in part, this is true; AI can help classify, predict, detect anomalies, or summarise information with impressive efficiency. However, there is one issue that hardly ever makes the headlines: AI does not correct poor data; on the contrary, it generally amplifies it.
If data is fuel, data quality is the oil that lubricates the engine and prevents it from failing. In the field of biomedical research, biobanks, laboratories and animal facilities, this has critical implications. Incorrect or incomplete data can lead to wrong decisions, loss of traceability, duplication of work or unreliable scientific conclusions.
If AI feeds on this ‘noise’, it can generate results that appear convincing but are profoundly erroneous. NorayBio designs its solutions precisely to guarantee this traceability and control: from comprehensive animal facility management with AniBio, to samples with NorayBanks, and ethical committee processes thanks to NorayDocs.
What is data quality in practice?
It is not about accumulating lots of data or organising it in spreadsheets. Quality means that each piece of data meets minimum standards when used for decision-making or analysis.
The fundamental pillars of quality data are:
- Accuracy. Does it reflect reality?
- Completeness. Does it include all essential fields?
- Consistency. Is it consistent with the rest of the information?
- Uniqueness. Does it avoid duplicates?
- Timeliness. Is it available when I need it?
- Lineage. Are its origin, modifications and those responsible for it known?
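Several of these pillars can be turned into automated checks at the point of entry. The sketch below is purely illustrative, assuming plain Python and hypothetical field names (`sample_id`, `collected_at`, `freezer_pos`), not the data model of any NorayBio product:

```python
# Hypothetical data-quality checks for a sample record.
# Field names are illustrative assumptions, not a real schema.

def check_record(record, existing_ids, required_fields):
    issues = []
    # Completeness: every essential field must be present and non-empty.
    for field in required_fields:
        if not record.get(field):
            issues.append(f"missing field: {field}")
    # Uniqueness: the identifier must not already exist in the system.
    if record.get("sample_id") in existing_ids:
        issues.append("duplicate sample_id")
    # Consistency: a frozen sample should have a storage position.
    if record.get("state") == "frozen" and not record.get("freezer_pos"):
        issues.append("frozen sample without freezer position")
    return issues

record = {"sample_id": "S-001", "state": "frozen", "freezer_pos": ""}
print(check_record(record,
                   existing_ids={"S-001"},
                   required_fields=["sample_id", "collected_at"]))
# → ['missing field: collected_at', 'duplicate sample_id',
#    'frozen sample without freezer position']
```

The point is not the specific rules but where they run: rejecting or flagging a record at capture time is far cheaper than cleaning it after a model has already learned from it.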
In our sector, lineage should not be a nice-to-have or non-essential: it distinguishes absolute confidence from permanent doubt.
The risk: when AI starts to ‘guess’
AI excels at finding patterns, but if these are contaminated, it can make various mistakes:
- Duplicates. A sample recorded twice appears as ‘two events’, and the AI learns that it occurs more often than it actually does.
- Empty fields. AI fills in gaps ‘by eye’ (by probability). Sometimes it gets it right, sometimes it makes it up.
- Inconsistencies. If the same concept appears with different names, AI believes they are different things.
- Systematic errors. If a team or process generates a bias, AI normalises and reproduces it.
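The first two failure modes are easy to demonstrate. In this small sketch (with invented records), a duplicated entry and an inconsistently named category distort the frequencies any downstream model would learn from:

```python
from collections import Counter

# Invented records: one sample logged twice, and the same tissue
# type entered under two spellings ("Liver" vs "liver").
records = [
    {"sample_id": "S-001", "tissue": "Liver"},
    {"sample_id": "S-001", "tissue": "Liver"},   # duplicate entry
    {"sample_id": "S-002", "tissue": "liver"},   # inconsistent naming
    {"sample_id": "S-003", "tissue": "Kidney"},
]

# Raw counts: the duplicate inflates "Liver", and "liver" is
# counted as a different category entirely.
raw_counts = Counter(r["tissue"] for r in records)
print(raw_counts)    # Counter({'Liver': 2, 'liver': 1, 'Kidney': 1})

# Deduplicate by identifier and normalise the category name.
deduped = {r["sample_id"]: r for r in records}.values()
clean_counts = Counter(r["tissue"].lower() for r in deduped)
print(clean_counts)  # Counter({'liver': 2, 'kidney': 1})
```

A model trained on the raw counts would treat "Liver" and "liver" as distinct things, each with a distorted frequency; the cleaned counts recover the real distribution.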
The result is polished reports and ‘reliable’ predictions built on unstable foundations. As the saying goes, garbage in, garbage out; AI only speeds up the process.
Why is it more important than ever?
In the past, bad data affected a query, a report or a specific decision.
Today, with AI:
- Everything is reused. The same data feeds models, dashboards, automations and audits.
- The impact is scaled. A small error is replicated in a thousand outputs.
- We lose our ‘instinct’. When a ‘machine’ says something with certainty, we tend to believe it more.
Therefore, if your organisation wants to use artificial intelligence, these are the three steps to follow, in order:
- Data quality.
- Governance.
- AI (with meaning).
Don’t miss Part 2 in our next newsletter.
➡️Specific practices to ensure data quality, key metrics, and how organisational culture makes a difference.