Big Data: Bad Data?!?


Post By: Dean Gonsowski, Esq., Chief Revenue Officer for ActiveNav

Two decades ago, Silicon Valley executive and CEO of Sun Microsystems Scott McNealy quipped: “You have zero privacy…Get over it.”

That’s when:

Fast forward 20 years and it’s amazing to see how the privacy landscape has changed. The biggest change is that people aren’t “getting over” the whole privacy thing, even in America:

80% of U.S. adults were either “somewhat” or “very” concerned about how companies were using their data, and 81% of U.S. adults feel that they have little or no control over how their data is being used.

This growing angst over data privacy stands in stark contrast to the past fervor over big data. Initially, the promise of big data seemed boundless; it was perhaps the most hyped trend we’ve seen in recent history:

Big-data computing is perhaps the biggest innovation in computing in the last decade. We have only begun to see its potential to collect, organize, and process data in all walks of life.

The interesting nature of this promise was that it required (in many cases) practitioners to suspend disbelief and simply gather data (along the Velocity, Volume and Variety continuum) and wait until analytics technologies (largely not invented yet) could catch up.

In many ways, this data hoarding was done under the “data is the new oil” paradigm, which simply fueled ongoing beliefs about the capricious amassing of data. Even then the “oily” elements of big data were noted, but often not well understood, in the overall gold rush.

Data resembled oil because “it’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so, must data be broken down, analyzed for it to have value.” The emphasis on the work that is required to make information useful has been lost over the years, … In the process of simplification, the analogy’s historical ramifications — as well as its present dangers and its long-term repercussions — have been forgotten.

As the simplification above indicates, it became easy for the technical universe to adopt a very Texas-sized mantra where “bigger just is better” – even if no one could precisely explain why. And, if hoarding data was viewed as innocuous, then the can could simply get kicked down the road for the next generation to deal with.

This “sort it all out later” belief system seemed to work in the U.S. for the better part of two decades, while Americans watched as our European counterparts began a crusade against the proliferation of data, seeking to establish data privacy as a fundamental human right. GDPR spawned stateside legislation, with California passing the CCPA and revamping it almost immediately with the CPRA.

The proliferation of data privacy regulations in America has now become a tidal wave, with everyone seeking to set standards. Once legislation is passed, the key will be enforcement. Enforcement was one of the key changes between the CCPA and the CPRA, as evidenced by the new California Privacy Protection Agency (CPPA) that will soon be established.

Escalating enforcement of privacy laws is sure to garner corporate attention worldwide and clearly will force action. But it does raise the question of where to start…

The key point is that there needs to be a philosophical about-face, and ultimately a rejection of the traditional big data mantra. Minimization will quickly become the new black and organizations will need to adopt true data governance frameworks. Those that urge consumers to simply “get over” their privacy concerns will likely be visited by a state/local/international regulator about their mishandling of consumer data.

Good Data, Bad Data

Collecting lots of data is easy; that’s why organizations have ended up with so much of it. But collecting data without governing it leads to bad data, not just big data.

To quote Daniel Keys Moran:

You can have data without information, but you cannot have information without data.

Most information tends to be unstructured by nature, which makes it hard to categorize and map when it’s collected and stored. The challenge is compounded because data normally comes from external sources, often making it difficult to confirm its accuracy.

Privacy mandates have made organizations stop and think about what data they are collecting and why. The cost per record breached makes holding onto stale and unnecessary data an increasingly risky, not to mention expensive, endeavor. Besides, big data only provides value if it’s accessible and clean. An Experian report found that, on average, companies around the world feel that 26% of their data is “dirty.” Not being able to leverage data for actionable insights leads to huge losses in productivity.

Best Practices for Cleaning Data  

  1. Remove duplicates: No one needs 27 copies of the same file – yet that’s the reality for most organizations. File analysis software can identify duplicate data to ensure that sensitive data isn’t making its way onto shared folders.
  2. Classify data: Not all data needs to be given the same level of protection. A spreadsheet about an office potluck does not warrant the same attention as a file with social security numbers. Data classification helps to provide context around your data so that it can be organized into categories based on file type and metadata. When your data is properly classified, it can be secured appropriately.
  3. Back up data: Once you have removed your duplicate data, you need to ensure that your data is backed up. If your data is backed up, you are less vulnerable should you be hit with a ransomware demand.
  4. Implement enterprise-wide data governance: If you don’t stop the stream of dirty data flowing into your organization, you’ll struggle to leverage it for actionable insights. Work with multiple departments, including IT, Legal, Privacy, and Compliance, to come up with a robust data governance program, which includes harmonizing all the data across departments.
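
To make the first two practices more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any particular file analysis product: the function names, the SHA-256 hashing approach, and the SSN-style pattern are all hypothetical choices made for this example. Note also that the classification step here scans file content for a pattern, which is a simplification of the file-type and metadata-based categories described above.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path

# Assumption for illustration: U.S. Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_duplicates(root):
    """Group files under `root` whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Files with the same SHA-256 digest are treated as duplicates.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    # Keep only the digests shared by two or more files.
    return [paths for paths in by_hash.values() if len(paths) > 1]


def classify_text(text):
    """Return a hypothetical label: 'sensitive' if an SSN-like string appears."""
    return "sensitive" if SSN_PATTERN.search(text) else "general"
```

Calling `find_duplicates` on a shared folder would return one list of paths per set of identical files, and `classify_text` could then be run on the text of each surviving copy. A real deployment would likely need to read large files in chunks, handle unreadable files, and use far richer classification rules than a single pattern.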

Limiting the retention of personal data is imperative in the new age of breaches and data privacy regulations. The value of data decreases very quickly. Collecting and storing data “just in case” and relying on the “big data” mantra is no longer a viable option. It’s time to get proactive and put data governance and stewardship at the top of your business priorities. A major leak of sensitive data won’t just have regulators knocking at your door; you’ll also have lost the trust of your customers. Why leave that to chance?