It seems today that Big Data analytics is everywhere -- from Starbucks, which uses location-based services to (yes, purposely) place its coffee shops just blocks away from each other; to Free People, which uses customer analytics to design the following season’s collection; to Orange Telecom, which uses network statistics to improve overall customer experience.
These are just a few of the many thousands of major corporations and small businesses harnessing Big Data, which means that we are amassing and storing more data than ever before. But what is Dark Data?
Just like with anything else, when you accumulate and stockpile so much technical information, there’s bound to be some obsolete, valueless data in the mix as well. Think of the items that hobbyists collect; sure, some are sentimental and valuable, but most are just plain useless. (Or are they...?) In simple terms, this is Dark Data.
It was defined by Gartner not too long ago as: "the information assets that organizations collect, process, and store during regular business activities, but generally fail to use for other purposes". It was dubbed Dark Data because of its similarity to Dark Matter, the mysterious non-luminous material suggested to exist in space; like Dark Matter, Dark Data goes unnoticed, is overlooked and forgotten, and keeps multiplying.
It’s typical for businesses to hold on to the majority of the data they collect, even though a large share of it (by some estimates, as much as 90 percent!) ends up serving no purpose at all. This neglected Dark Data is often outdated or unstructured, and includes: expired customer information, log files, account information, previous employee data, financial statements, raw survey data, email correspondence, notes or presentations, old versions of relevant documents, and much more.
But what if Dark Data holds seriously valuable information that isn’t being utilized, information that until now couldn’t be retrieved or analyzed because of technological constraints or human misconceptions? What if everything this Dark Data holds could help identify the root cause of cyber attacks or, if properly analyzed, turn out to be an undiscovered gold mine of potential revenue streams?
Why Is Dark Data a Problem?
There are two major problems with Dark Data: storage costs and security risks.
First, storage costs: amassing useless data is expensive. Imagine just how much storage organizations need to purchase as they continue to collect information they ultimately don’t use. In the age of Big Data, storage doesn’t come cheap.
Second, security risks: with all that information sitting there unprotected, there is a grave risk that it could be hacked and stolen. After all, just because the data is outdated and unstructured doesn’t mean it is harmless: it could still contain confidential financial information such as credit card and account data, patient records, business trade secrets, internal practices, and so on. If cyber criminals were to get hold of this information, the damage to businesses could be severe in terms of reputation, liability, and regulatory violations, especially now that privacy issues are under intense scrutiny.
How to Manage Dark Data
Fortunately, there are several ways to manage Dark Data. This isn’t to say that organizations should trash any and all information they’re not currently using, but rather that they should implement a plan that mitigates the potential risks.
1. Encrypt Everything
If cybercriminals attack and gather your Dark Data, you had better hope that all company information, no matter how important, is strongly encrypted. This includes data on the in-house server as well as data stored offsite, in the cloud, or in transit. This way, if cybercriminals do somehow acquire the Dark Data, they’ll have a difficult time accessing its confidential contents. That said, Dark Data should never be readily accessible. If some managers require access, there should be specific procedures and protocols outlining who may access it and for what reason.
2. Implement Data Retention Policies
If organizations adopted data retention policies, then perhaps Dark Data would not be the major concern that it is. Such policies determine which types of data should be retained and which should be destroyed and, for data that is meant to be destroyed, outline the specific manner in which to do so.
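To make this concrete, a retention policy can be thought of as a lookup table from data type to a retention period and a disposal method. The Python sketch below is only an illustration under stated assumptions: the category names, the retention periods, the disposal labels, and the names `RetentionRule`, `POLICY`, and `decide` are all hypothetical, and real periods depend on an organization's legal and business requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    keep_days: int  # how long this type of data is retained
    disposal: str   # how it should be destroyed once the period ends


# Hypothetical policy table: categories and periods are illustrative only.
POLICY = {
    "log_file": RetentionRule(keep_days=90, disposal="delete"),
    "raw_survey_data": RetentionRule(keep_days=365, disposal="delete"),
    "former_employee_record": RetentionRule(keep_days=2555, disposal="secure_wipe"),
}


def decide(category: str, created: date, today: date) -> str:
    """Return 'retain', 'destroy:<method>', or 'review' for one piece of data."""
    rule = POLICY.get(category)
    if rule is None:
        # No rule covers this type of data, so flag it for a human decision.
        return "review"
    if today - created > timedelta(days=rule.keep_days):
        return f"destroy:{rule.disposal}"
    return "retain"
```

For example, `decide("log_file", date(2024, 1, 1), date(2024, 6, 1))` falls outside the assumed 90-day window and is marked for deletion, while a data type the table does not cover comes back as `"review"` rather than being silently kept or destroyed.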
The policies may also ultimately discourage organizations from amassing such large amounts of useless data, saving them storage costs and mitigating security risks. If they do choose to retain some Dark Data, they might consider, for example, accessing and analyzing their log files using emerging technologies that do not require expensive probes and that can compress larger amounts of data into a smaller footprint.
In addition, such policies encourage organizations to periodically rummage through their databases and double-check whether there is any important information they didn’t recognize at first -- this reduces the chance of missed opportunities and overlooked hidden gems. In this periodic audit, organizations should first analyze the Dark Data to understand how much there is, where it is, and what it is; then, organize the Dark Data into groups; and lastly, classify it to determine whether it should be archived, destroyed, or retained. This will help organizations focus on the Dark Data that might actually be valuable.
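The three audit steps described above (analyze, group, classify) could be sketched roughly as follows in Python. This is a simplified illustration, not a recommended tool: the `Item` record, its fields, and the age thresholds in `classify` are assumptions made for the example, and a real audit would also look at the content and potential value of the data rather than only at how long it has gone unused.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    location: str        # where the data lives
    kind: str            # what it is
    size_mb: float       # how much there is
    days_since_use: int  # how long since anyone last used it


def inventory(items):
    """Step 1: total volume, plus volume broken down by location and by kind."""
    by_location = defaultdict(float)
    by_kind = defaultdict(float)
    for item in items:
        by_location[item.location] += item.size_mb
        by_kind[item.kind] += item.size_mb
    total = sum(item.size_mb for item in items)
    return total, dict(by_location), dict(by_kind)


def group_by_kind(items):
    """Step 2: organize the items into groups, here simply by kind."""
    groups = defaultdict(list)
    for item in items:
        groups[item.kind].append(item)
    return dict(groups)


def classify(item, archive_after=180, destroy_after=730):
    """Step 3: label one item; both thresholds are hypothetical, in days."""
    if item.days_since_use >= destroy_after:
        return "destroy"
    if item.days_since_use >= archive_after:
        return "archive"
    return "retain"


def audit(items):
    """Run steps 2 and 3 and count the labels within each group."""
    report = {}
    for kind, members in group_by_kind(items).items():
        counts = {"retain": 0, "archive": 0, "destroy": 0}
        for member in members:
            counts[classify(member)] += 1
        report[kind] = counts
    return report
```

Calling `inventory(items)` answers the "how much, where, and what" questions, and `audit(items)` gives a per-group count of what would be retained, archived, or destroyed, which is a starting point for deciding where a closer look is worthwhile.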
Concluding Thoughts
In short, as we move forward and continue to amass immense amounts of information, it will become increasingly important for organizations to manage this data. This is because Dark Data not only brings storage costs and security risks, but is also an untapped resource.
If organizations and companies manage it more efficiently and productively, they can transform their Dark Data into a Joker card and take a cyber attacker or a major competitor by surprise. Organizations can recognize and monetize an economic opportunity by either reducing internal costs or discovering additional revenue streams. Hopefully, in the near future, more and more organizations will recognize the increasing value of Dark Data.
Evelyn Kotler is the director of Marketing at SQream Technologies.
Published under license from ITProPortal.com, a Net Communities Ltd Publication. All rights reserved.
Photo Credit: xavier gallego morell/Shutterstock