
4 best practices for better data management

David Grover
Data Architect
13 January 2023 (11 min read)

Every customer touchpoint in your enterprise has a team of software people trying to collect better data.

We collect that data so we can delight customers. We run digital transformation projects on these touchpoints to use the data we collect to measure and improve our customers’ experiences. We hope our customers and prospects see this data collection as attentiveness, and that they’ll be delighted with our efforts.
 
But it’s not super clear what we’re collecting. What is data? What makes data “good” or “bad”? How does data go bad? When data goes bad, is it more like how apples go bad, or like oranges?

In this short piece, we’ll explain what data is, in clear business terms. We’ll explain how it goes bad. And then we’ll talk about ways you can make sure the data you collect helps delight customers.

What is data?

Every time someone clicks “submit,” they create data. A mouse click is all it takes, and sometimes even less: an eyeblink or a few mouse squiggles.

The submit might be on a web page or on a form in an old-school desktop application. Let’s use a concrete example: the order form from Amazon, possibly the world’s most successful ordering application.

To make an order we need five things: a buyer, a seller, a product, a price, and some dates to help set expectations. Those are the basics. You can spot where Amazon collects and uses bits of data about each of those things on its form. Sometimes data about one of those things is created in the form, like when you click one of the buttons. Sometimes the data is typed in by a robot, like when the order form fills in your default address.

When the Amazon form is submitted, the software creates a record of the order. The order record has to have all five basic things to be an order record. Bad orders happen when you click the wrong thing, a robot guesses wrong, or a default is wrong, and the data about one of those five basic things ends up wrong. With Amazon’s help you find the order and fix the bad data. But you do this a lot, and so do I, so bad data is common.

The words you typed on the form (or had a robot type for you) are bucketed into fields. A field is sort of a quantum thing because it’s what was meant by all the words you or your robots typed. Ideally, you and your robots always type the right words. But since that’s impossible, we also get bad data because the actual words typed in the fields don’t mean what we hoped.

Records like the Amazon order record are kept in storage for later use. Storage might be clay tablets, text files, or database tables. You could store records as strands of DNA, as holograms, or as paper copies of pulsar signals or Morse code. The best storage balances two goals: keeping records whole and together for long periods of time, and letting records be updated easily and reliably. Text files easily store far more records than you could fit onto clay tablets, and they’re easier to update too. Clay tablets last longer than text files or holograms, or even most DNA.

In my office, we used to use a spreadsheet to track lunch orders, only rarely encoded as pulsar signals. Below is a clay tablet with similar lunch orders for a particular day roughly 5200 years ago:

Clay tablet

Your office might now use robots from Doordash or UberEats to manage lunch orders. Those robots might use anything from clay tablets to holograms for storage.

For each kind of storage we need a way to find just your lunch record, and not mine by accident. We also need the name of the storage location your record is in, because there’s likely more than one. Is your order in last week’s batch of holograms or today’s clay tablet bin? Is it in the DNA that got sent to the archive yesterday, or in the text files backed up to the cloud file server last night? To find your particular order, we first have to find its storage.

So our order record needs two more fields to go with the other five:

  • The location of the storage, maybe the name of the text file or the name of the bin labeled “yesterday’s clay tablets”
  • The name of the record itself, like your Amazon or Doordash order number

Let’s zoom out to 50 thousand feet: “Your data” is just all the records your enterprise has in all your storage, all the clay tablets, text files, databases or holograms. Your order data is all the order records you’ve got in all of their storage locations. Can you picture all of those locations? Your customer or product data is all your customer or product records, in all their storage locations.

Do you use one of those online CRMs to manage ecommerce, and an online ERP to manage brick-and-mortar? Then you keep your order records, and your customer and product records too, in two storage locations. Those two locations might use the same kind of storage, maybe database tables. But that’s unlikely, so you probably also have two kinds of storage. They’re probably not as different as clay tablets and holograms, but you can’t be certain.

So yes: your data is all the permanent records of all those form submissions.

Remember, we need five fields (quantum fields) to make an order record:

  • A buyer
  • A seller
  • A product or service
  • A price
  • Expected dates

Plus two more fields to find the order itself:

  • The order name
  • The storage location name

To act on your order we also need to know how to read the record. We need to know which quantum field is the buyer and which one is the seller. That gives us one more field to add to the record:

  • The record map

If we randomly swap buyer for seller, we create a lot of confusion when it’s time to pay the bill. If we swap order name for product name, the people who put products into boxes will send us the wrong product. The map is critical to keeping our records good.

Think back to the clay tablet. Without a map you have no way of knowing which fields belong to which orders. Which text on this tablet is the order number and which is the buyer?
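To make the idea of a record map concrete, here is a minimal Python sketch. It assumes a record is stored as an unlabelled row of values, like the marks on the tablet, and that the map is a separate dictionary from field name to position. The field names, the sample row, and `read_record` are hypothetical, chosen only for illustration. Reading the same row with buyer and seller swapped in the map shows how the wrong party ends up with the bill.

```python
# A hypothetical raw record: values in storage order, with no labels attached.
raw_row = ["ORD-1001", "tablets-bin-A", "Ana", "Bel's Bakery", "bread", "3", "2023-01-13"]

# A hypothetical record map: which position holds which field.
ORDER_MAP = {
    "order_name": 0,
    "storage_location": 1,
    "buyer": 2,
    "seller": 3,
    "product": 4,
    "price": 5,
    "expected_date": 6,
}

def read_record(row, record_map):
    """Label each raw value using the record map."""
    return {field: row[position] for field, position in record_map.items()}

good = read_record(raw_row, ORDER_MAP)

# The same row read with a map that swaps buyer and seller.
swapped_map = dict(ORDER_MAP, buyer=3, seller=2)
bad = read_record(raw_row, swapped_map)

print(good["buyer"])  # Ana
print(bad["buyer"])   # Bel's Bakery: now the seller gets the bill
```

Nothing in the raw row changed between the two reads; only the map did, which is why a wrong or missing map produces bad data even when every value is intact.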

When all eight fields are tied together and stored together, we have a good order record. If we’re missing any one field, we will have a problem.
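A minimal sketch of the eight-field order record, assuming Python dataclasses and plain string values for simplicity (a real system would likely use proper types for prices and dates). `OrderRecord` and `missing_fields` are hypothetical names; the check just lists any of the eight fields that were never filled in, which is the first thing to look for when a record seems bad.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class OrderRecord:
    """One order record with eight fields (names are illustrative assumptions)."""
    buyer: Optional[str] = None
    seller: Optional[str] = None
    product: Optional[str] = None
    price: Optional[str] = None
    expected_date: Optional[str] = None
    order_name: Optional[str] = None
    storage_location: Optional[str] = None
    record_map: Optional[dict] = None

def missing_fields(record: OrderRecord) -> list[str]:
    """Return the names of any fields that are empty or were never filled in."""
    return [f.name for f in fields(record) if getattr(record, f.name) in (None, "")]

order = OrderRecord(
    buyer="Ana", seller="Bel's Bakery", product="bread", price="3",
    expected_date="2023-01-13", order_name="ORD-1001",
    storage_location="tablets-bin-A",
)
print(missing_fields(order))  # ['record_map']
```

Running a check like this over every record in every storage location is one simple way to find the records that need fixing.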

(Exciting historical aside: Poetry didn’t get written down until a couple of hundred years after the lunch orders started.) 

How does data go bad?

Records are valuable when they’re kept whole. With only part of an order record – maybe you’re missing the product piece of the tablet, or the buyer – you have a problem. How do you fulfill the order when you don’t know the buyer? 

You can guess. You could hire robots or people to guess at the missing data and learn what it should be. One common approach is to ask unhappy customers what they think. But if you’re missing whole big chunks of your records in storage, you’re out of luck.

You may have clay tablets with no record maps, or text files missing order names. Imagine 100 clay tablets with no seller on any of the orders. How do you pay the sellers? More clay tablets won’t tell you, and your best bet is often still to guess at what’s in the missing fields.

So records with missing fields are bad, and that’s our first example of bad data:

1. Records are missing fields.

Let’s talk about some other examples.

Suppose your tablets are intact, and all the fields are at your command. But your records sit in separate storehouses: one for your downtown lunch orders, and another for mom-and-pop lunches made outside downtown. A customer with orders in both who wants to change the outside-downtown order, which isn’t stored where you are, is pretty sure you have bad data. As far as they can tell, you’re missing at least one of their orders. Who’s at fault here? This outrageous example is our second kind of bad data:

2. Records are not co-located (or -locatable).
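Here is a rough Python sketch of the downtown and outside-downtown situation, assuming each storehouse can be read as a dictionary keyed by order name. The storehouse names, sample orders, and `orders_for` function are all hypothetical. A lookup that reads only the local storehouse misses one of the customer’s orders; reading every location finds both.

```python
# Two hypothetical storehouses, each keyed by order name.
downtown = {"ORD-1001": {"buyer": "Ana", "product": "bread"}}
outside_downtown = {"ORD-2001": {"buyer": "Ana", "product": "soup"}}

def orders_for(buyer, storehouses):
    """Return (location, order name) for every order placed by `buyer`."""
    found = []
    for location, records in storehouses.items():
        for order_name, record in records.items():
            if record.get("buyer") == buyer:
                found.append((location, order_name))
    return found

# Reading only the local storehouse misses one of Ana's orders.
local_only = orders_for("Ana", {"downtown": downtown})
# Reading every storehouse finds both.
everywhere = orders_for("Ana", {"downtown": downtown, "outside_downtown": outside_downtown})

print(local_only)  # [('downtown', 'ORD-1001')]
print(everywhere)  # [('downtown', 'ORD-1001'), ('outside_downtown', 'ORD-2001')]
```

The second lookup only works because every storehouse is reachable from one place, which is the point of making your records co-located, or at least co-locatable.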

Another example: What happens if a customer comes in for their order but you have no master clay tablet with all the orders named? Your list of active orders might not include your very next customer’s latest order. And you have to read every tablet in all the storage locations every time a customer comes in, just to find their order. This is a third kind of bad data:

3. Missing the master list of records.
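A master list can be as simple as an index from order name to storage location, built by reading every storehouse once. The sketch below assumes each storehouse can be read as a dictionary keyed by order name; `build_master_list`, `find_order`, and the sample data are hypothetical. Once the index exists, finding an order is one lookup instead of a read of every tablet, and an order name that turns up in two places is flagged rather than silently overwritten.

```python
# Hypothetical storehouses, each keyed by order name.
storehouses = {
    "downtown": {"ORD-1001": {"buyer": "Ana", "product": "bread"}},
    "outside_downtown": {"ORD-2001": {"buyer": "Ana", "product": "soup"}},
}

def build_master_list(storehouses):
    """Read every storehouse once and record where each order lives."""
    master = {}
    for location, records in storehouses.items():
        for order_name in records:
            if order_name in master:
                # The same order name in two places is itself bad data: flag it.
                raise ValueError(
                    f"{order_name} is in both {master[order_name]} and {location}"
                )
            master[order_name] = location
    return master

def find_order(order_name, master, storehouses):
    """Look up one order via the master list instead of scanning everything."""
    location = master.get(order_name)
    if location is None:
        return None  # Not on the master list: new, lost, or never recorded.
    return storehouses[location][order_name]

master = build_master_list(storehouses)
print(master)  # {'ORD-1001': 'downtown', 'ORD-2001': 'outside_downtown'}
print(find_order("ORD-2001", master, storehouses))  # {'buyer': 'Ana', 'product': 'soup'}
```

The index has to be rebuilt or updated whenever new orders arrive; an order missing from it is exactly the “very next customer” problem described above.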

Finally: You’ve got a master list of all the records, all in one place, and nothing is broken. But your orders are unpredictable. You’re missing the map. You can’t figure out what the records mean. Your buyers get the wrong product, sellers get the bills, and nobody is delighted. This is a fourth kind of bad data:

4. Missing maps.

All of these kinds of bad data are caused by some variation of the “records broke and we can’t put them together” problem. You see the same problem in clay tablets, text files, databases, old vinyl, DNA, even holograms if you can imagine it: in all of the storage invented so far. Let’s review our four kinds of bad data:

  • Records are missing fields
  • Records are not co-located (or -locatable)
  • No master list of records
  • No maps

4 best practices for good data management

Those four examples might hold for clay tablets of ancient lunch orders. What about the modern digital enterprise? Our list sounds too simple to be true in 2023.

Surprisingly, the complaints we hear most often from companies hamstrung by bad data come down to one of the four above.

For example, one common modern problem is senior executives who believe their reports are about the same records when they aren’t. One report points to records in one storehouse and another report points to a different storehouse, so the totals can’t match. Or one report mismaps order number to product number and miscounts both. Or order forms just can’t see all the orders because the master list is missing some, and customers aren’t delighted.

Each of these problems has a logically simple solution. Actually building a solution turns out to be a complex organizational and technical operation, and that complexity is why our short list of four reasons data goes bad sounds too simple to be true. Surely in 2023 we can’t be making the same mistakes we made with clay tablets? Both our organizations and our technologies must have improved over the last 5000 years.

Our examples suggest we use four best practices to keep data from going bad:

  • Records are missing fields? Find and fix the missing fields.
  • Records are not co-located (or -locatable)? Work to make good copies of all your data easier to get to.
  • No master list of records? Start making a master list of records.
  • No maps? Get good at guessing old maps and start documenting all your new maps.
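For the fourth practice, one way to start guessing an old map is to test each unlabelled column against what a field should look like. The Python sketch below is a heuristic under stated assumptions: it only tries to spot a price column (every value parses as a number) and a date column (every value parses as an ISO date), and the sample rows, `guess_map`, and the `"old-lunch-orders"` label are hypothetical. A guessed map still needs a person to confirm it; once confirmed, writing it down (here as JSON) is the start of documenting your maps.

```python
import json
from datetime import date

def _all_parse(values, parse):
    """True if every value in the column can be parsed by `parse`."""
    try:
        for v in values:
            parse(v)
    except ValueError:
        return False
    return True

def guess_map(rows):
    """Heuristically guess which columns hold prices and dates in unlabelled rows."""
    columns = list(zip(*rows))
    guessed = {}
    for position, values in enumerate(columns):
        if "expected_date" not in guessed and _all_parse(values, date.fromisoformat):
            guessed["expected_date"] = position
        elif "price" not in guessed and _all_parse(values, float):
            guessed["price"] = position
    return guessed

# Hypothetical old records with no map.
old_rows = [
    ["ORD-1001", "Ana", "bread", "3", "2023-01-13"],
    ["ORD-1002", "Raj", "soup", "4.50", "2023-01-14"],
]
guess = guess_map(old_rows)
print(guess)  # {'price': 3, 'expected_date': 4}

# Document the confirmed map so the next reader doesn't have to guess.
documented = json.dumps({"source": "old-lunch-orders", "map": guess}, indent=2)
```

A heuristic like this can be fooled (an order number made only of digits would look like a price), which is why documenting new maps as you create them beats guessing old ones later.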

It’s unlikely the excellence of your existing practice is spread evenly across all four best practices. That’s okay: You can’t be excellent at everything.

What makes the difference when you’re being attentive to customers is actively trying to get better. Your customer doesn’t care that you’ve nailed best practices 1-3 but 4 is just too hard. Delight shows when they can see you’re at least trying. That means you should try, and measure yourself, across all four best practices.
