Five Common Misconceptions about Structured and Unstructured Data


Post by Rich Hale, CTO, ActiveNav


  • Structured data is quantitative (anything you can easily store in rows and columns) and relatively easy to keep compliant.
  • Unstructured data is qualitative (think your emails and Microsoft Teams chats) and much harder to manage.
  • Nearly all organizations are operating under one or more misconceptions about their data (and about their compliance, or lack thereof, with new privacy laws!).

The Two Types of Data Your Organization is Accumulating (and Why You Should Care)

We’ll start with why you should care.

If you’re familiar with the data compliance space, you already know that new laws require your organization to take specific steps to protect the rights of anyone whose data it holds. (If you’re not familiar with data compliance: surprise!)

The first step to maintaining compliance with these laws is understanding what data your organization actually has. Not having this understanding is dangerous for three reasons:

  • The less you know about your data holdings, the more likely they are to contain noncompliant data, which can mean legal action and large fines if they stay that way.
  • In today’s world, it’s not if your data gets breached, it’s when. You want to ensure your data is in top shape to preserve your organization’s reputation.
  • Cost! And not just in fines and breach remediation expenses. Chances are your organization doesn’t need most of the data it’s holding, and therefore could be saving a bundle on data storage.

The first step to understanding your data holdings is to understand the difference between the two main types of data: structured and unstructured.

Structured data is what probably comes to mind when you hear the word “data”: spreadsheets on spreadsheets filled with quantitative information. Essentially, structured data is anything you can store in rows and columns, such as information stored in databases (think SQL), CSV files, and so on. It’s easily understood and analyzed by applications other than the ones that generated it. It also doesn’t tend to grow out of control on its own, at least not for a long time, which is good for privacy purposes.

Unstructured data is the qualitative data naturally generated from interactions with people. Think the text stored in your emails, Teams chats, social media, and websites. It can also comprise images, PDFs, Word docs – anything you can’t store in rows and columns. It’s not usually in a format that other applications can easily understand and analyze. And it multiplies like you wouldn’t believe: how many emails have you sent and received just this week?
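To make the difference concrete, here is a minimal Python sketch. The sample records, names, and email addresses are invented for illustration, and the email pattern is deliberately simple. With the structured source, the code asks for the "email" column by name. With the unstructured source, it has to scan the whole text for anything that looks like an email address.

```python
import csv
import io
import re

# Hypothetical sample data, invented for illustration only.
STRUCTURED = (
    "name,email,country\n"
    "Ada Example,ada@example.com,UK\n"
    "Bo Sample,bo@example.com,US\n"
)
UNSTRUCTURED = (
    "Hi team, please loop in ada@example.com on the renewal. "
    "Bo (bo@example.com) already has the contract PDF."
)

# Deliberately simple pattern; real-world email matching is messier.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_from_structured(csv_text):
    """Read the 'email' column directly: the header says where the data lives."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["email"] for row in reader]


def emails_from_unstructured(text):
    """No columns to lean on, so scan the text for things that look like emails."""
    return EMAIL_PATTERN.findall(text)


print(emails_from_structured(STRUCTURED))
print(emails_from_unstructured(UNSTRUCTURED))
```

Both calls find the same two addresses here, but only because the sample text is tidy. In real emails and chats, personal data shows up in formats no single pattern anticipates, which is part of what makes unstructured data harder to manage.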

Both types carry their own risk, but unstructured data is by far the riskier of the two. In today’s world, we generate it so quickly, in such high volume, and with so little organization that it’s functionally impossible to keep track of without using data discovery software.

On the flip side, once you have the right tool, getting to compliance becomes far easier. When you can visualize all your unstructured data, you can see what’s out of compliance, fix that right quick, and understand where your policies and workflows need to change to keep everything above board.

Some Common Misconceptions Your Organization Might Have

We all know an ounce of prevention is worth a pound of cure. And yet: most of us don’t go to the doctor until we get sick.

The compliance world is no different. With regulations still relatively new, most organizations don’t fully appreciate the urgency surrounding the issue – and won’t until they themselves get breached.

If your organization is anything like most, you’re probably operating under at least one of the following misconceptions.

We Already Know What Data We Have

Name the last time you checked your Teams log. Or your Downloads folder. Your email archives? You get the idea.

People – and companies – don’t typically monitor or clean these types of things without a push. Without the proper privacy functions in place, we’re liable to think the trash in the ocean isn’t a problem. Until, of course, there’s an island of it.

We Won’t Get Breached

There is a roughly 30% chance your organization will get breached this year. This stat increases every year.

It’s also possible you’ve already been breached. According to IBM’s annual Cost of a Data Breach report, the average time to identify and contain a breach in 2021 was 287 days.

When you get breached, you can significantly cut the time and expense involved by already being in compliance. Compliant data means a quicker, cheap(er) remediation and no reputational damage beyond that of the breach itself.

It’s Too Expensive to Figure Out What We Have

According to that same IBM report, the average cost of a breach in a hybrid cloud environment was $3.61 million. On top of that, compliance failure was the top factor found to amplify data breach costs. And remember, it’s not just the cost of remediating compliance flaws you have to worry about. Regulatory fines are getting steeper every year.

It’s Too Labor Intensive – We’d Need a Team of Experts

Since data privacy regulations are so new and the solution market is still growing, it’s easy to believe you’d need in-house specialists to operate whichever data discovery solution you ended up going with.

Not if you choose the right one! Specifically, you want to make sure you select a solution that’s purpose-built for ease of use. From deployment to monitoring and at every stage in between, no expert knowledge should be required. Don’t go with a solution that’s been repurposed from another area of the market, such as data loss prevention or data access management.

Traditional Data Inventory Methods Still Cut It

Back in the day, and still sometimes today, organizations would build data inventories through manual assessments and questionnaires: they’d basically ask their staff what data they thought the organization had.

In today’s world, with data accumulating and multiplying by the second, a manual static inventory won’t do the trick. It’s obsolete as soon as you create it.
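A rough Python sketch shows how quickly a static inventory drifts. It assumes an inventory is simply a mapping from file path to last-modified date. The function name `inventory_drift`, the paths, and the dates are all hypothetical, and this is not how any particular data discovery product works.

```python
def inventory_drift(snapshot, current):
    """Compare a saved inventory with the current state of the same location.

    Both arguments are dicts mapping a file path to its last-modified date.
    """
    saved_paths = set(snapshot)
    current_paths = set(current)
    return {
        # Files that appeared after the inventory was taken.
        "added": sorted(current_paths - saved_paths),
        # Files the inventory still lists but that no longer exist.
        "removed": sorted(saved_paths - current_paths),
        # Files present in both whose modified date has moved on.
        "changed": sorted(
            path
            for path in saved_paths & current_paths
            if snapshot[path] != current[path]
        ),
    }


# Hypothetical inventory taken on Monday, and the same share on Friday.
monday = {
    "hr/contracts.docx": "2024-01-01",
    "sales/leads.csv": "2024-01-01",
}
friday = {
    "hr/contracts.docx": "2024-01-04",    # edited during the week
    "sales/leads.csv": "2024-01-01",      # untouched
    "sales/leads_copy.csv": "2024-01-03", # a new copy nobody recorded
}

print(inventory_drift(monday, friday))
```

In this made-up example, a single week is enough to leave the Monday inventory with one file it has never seen and one it describes incorrectly.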

To ensure continuous compliance, you need real-time visibility into your data.