How much data do you really need to get the most out of AI?

How much data do I need for Data Analytics and AI?

How much data is enough?

This is an interesting question and one we get asked a lot.

Typically, people talk about how many rows they have.

To be honest, it doesn’t matter whether you have a hundred, a thousand, a million or 7 billion rows of data.

Editor: We recently worked on a project with Channel 9 that ingested over 7 billion rows of data and made it available in a Tableau dashboard for users to explore.  I have to say, that was really cool!

The easy answer is “as much as you can get your hands on”.

But that’s not the smartest approach.  What really matters is understanding what you are trying to learn from your data.

Understand your customers much better

Say you want to understand more about your customers: you’ll need your customer records.  If you want to dig into your customers’ buying patterns over time, then you’ll also need their sales history.

Sales analysis is closely tied to seasonality, so you’ll need sales data for each customer over a reasonable period.

That also raises another question: what is a season?  In many retail companies, seasons go beyond the weather seasons ‘normal’ people talk about.  Whatever they are to you, clearly define them at the start.

That aside, the data should go back as far as practical and should ideally include a few cycles through whatever your typical seasons are.

Learn more about staff productivity and efficiency

If you are looking at HR and staff performance, then you will need all staff records and perhaps all their timesheet data or job data.

Like customer data, you’ll probably need to look at this over a reasonable period of time to get a good understanding of the patterns. Depending on your industry, this could be a few years or a few months.

Anything else on your mind?

The same rules apply to any other kind of data:

  • Financial
  • Operational
  • Building metrics

If you can access it, you can use it.

Think before you act 

First things first: Outline what it is you are trying to achieve 
THEN
…look at whether the data exists to support that objective
THEN
…consider if you have enough data available to use.

Context is the most important aspect

Rather than treating the data you already have as the benchmark for what is necessary, start from the problem or outcome and work out how much data that requires.

Think inventory: if you were trying to understand how efficiently you are stocking inventory, you’d need all your inventory records, stock levels over time and the locations where the stock is held.  That should give you all the data points you need.

From a seasonal point of view, the length of time you look over is also important.  If you are analysing trends from a seasonal point of view, like Xmas or Easter, then 4-5 years would be ideal.  More generally, you’ll need 7 years to cover a standard economic cycle.

And realistically you’ll need to go back to 2007 and the pre-GFC era to really get a sense of changes over time.
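If you want a quick sanity check on those time spans, here is a minimal Python sketch. It assumes you can pull the dates of your records into a list; the function names, the thresholds as constants and the sample dates are all made up for illustration, and the 4-year and 7-year figures are simply the rules of thumb mentioned above.

```python
from datetime import date

# Rules of thumb from the text above (assumptions, not hard limits).
SEASONAL_YEARS = 4          # lower end of the 4-5 year seasonal guideline
ECONOMIC_CYCLE_YEARS = 7    # the 7-year economic cycle guideline


def years_covered(record_dates):
    """Return the span in years between the earliest and latest record."""
    if not record_dates:
        return 0.0
    span_days = (max(record_dates) - min(record_dates)).days
    return span_days / 365.25


def coverage_report(record_dates):
    """Compare the length of the history against both guidelines."""
    years = years_covered(record_dates)
    return {
        "years": round(years, 1),
        "enough_for_seasonal": years >= SEASONAL_YEARS,
        "enough_for_economic_cycle": years >= ECONOMIC_CYCLE_YEARS,
    }


# Hypothetical sales dates, invented for this example.
sample_dates = [date(2015, 1, 5), date(2017, 12, 24), date(2020, 3, 31)]
report = coverage_report(sample_dates)
print(report)
```

Note that this only measures the distance between your first and last record. It won't tell you about gaps in between, so it is a starting point rather than a full data quality check.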

But, whatever you decide, start with context.
