What Is Dark Data & Why Does It Matter?

The inventor of the World Wide Web, Tim Berners-Lee, said, “Data is a precious thing and will last longer than the systems themselves.” Dark data is data that businesses often aren’t even aware they have. It’s collected but isn’t analyzed. Isn’t used. Yet data is currency, and it’s a vital source of information with the potential to transform business operations.

In a data-driven culture, organizations need to widen the data they analyze and share to include dark data: a subset of big data that is, more often than not, ignored.

Major strategic insights can be found in business documents and records: purchase orders, sales orders, invoices, credit notes, receipts, goods received notes, dispatch notes, quotations, financial statements, call center logs, social media posts, customer feedback surveys, emails, web server logs. But this dark data stays hidden and ignored.

A knowledge management strategy helps a company use all its data, leading to improved business results. Starting with data collection, it fosters data-driven decision-making through corporate information systems, data analysis, business intelligence, and data visualization.

McKinsey says, “data-driven organizations are 23 times more likely to acquire customers, 6 times as likely to retain customers, and 19 times as likely to be profitable.”

Fail to exploit dark data and businesses risk falling behind in their industry.

How much data?!

We spend hours and hours on the Internet. Browsing websites, checking and answering email, downloading documents, subscribing to newsletters, posting on social media. Every single move online creates data. 

Any clue as to how much data is created on a daily basis?

3,500,000,000,000,000,000 bytes. That’s 3.5 quintillion bytes of data every day.

And dark data?

Roughly 50% of a company’s data is dark. That’s half of its potential insights… lost.

What is dark data?

According to Gartner, “dark data is the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.”

In other words, dark data is data that businesses collect and store but don’t analyze, process, or use to inform their decision-making, so the insights hidden inside go untapped.

Big mistake. Why don’t they dump it?

Many organizations keep dark data for compliance reasons. Others think it could prove useful in the future, if they ever work out how to process it.

Dark data is often captured alongside purpose-driven data, and companies may not be aware they’ve collected it. It can contain sensitive, vulnerable, personal, and regulated information. But because dark data isn’t cataloged, it’s often overlooked by the regulatory processes put in place for compliance.

Look out! Cyberattack.

Safeguarding your data is important. Because dark data holds potential value, you need to balance the cost of securing it against the insights it could deliver.

Benefits of dark data

Businesses can spend big money collecting and analyzing new data, looking for insights they already have in stored dark data. If only they had the right technology.

Is that you?

The insights and business intelligence hidden in dark data can give companies a more detailed understanding of their customers, market trends, product performance, and internal operations, such as production and supply chain delays. This helps them make more informed decisions to improve their performance and growth.

The patterns and trends identified in dark data can show a business its customers’ buying habits, preferences, and sticking points, so it can create a more personalized customer experience and increase customer satisfaction.

Dark data can also give a clearer view of the competitive landscape, helping companies find new opportunities to differentiate their products from the competition.

How dark data is created

Dark data is created when data is collected but isn’t used or analyzed. Businesses may ignore this data because they consider it out of date, incomplete, or redundant. Some don’t even know it exists.

Much of it is generated by users’ daily interactions online, but it also builds up because of how organizations store and manage their data. Common sources and causes include…

Unstructured data

Unstructured data doesn’t fit conventional data models, which makes it hard to store and manage. It’s collected from emails, documents, social media posts, video and audio files, mobile activity, satellite images, surveillance images, etc.

While structured data shows you what’s happening, unstructured data can tell you why. Unstructured data is a key input for predictive analytics, end-to-end automation, and artificial intelligence.

Server log files

Text files that record the activity on a web server: the requester’s IP address, the date and time of the request, the name and location of the requested file, and its size.
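To show how much structure hides in a raw server log, here’s a minimal Python sketch that parses one line in the Common Log Format (a standard access-log layout that web servers such as Apache HTTP Server can write) into named fields. The sample line and the `parse_log_line` helper are hypothetical, and real logs may use a different format that needs a different pattern.

```python
import re
from datetime import datetime
from typing import Optional

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a structured record (hypothetical helper)."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None  # the line doesn't follow the assumed format
    fields = match.groupdict()
    # Timestamps look like 10/Oct/2023:13:55:36 -0700
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # "-" means no response body was sent
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

sample = '203.0.113.7 - - [10/Oct/2023:13:55:36 -0700] "GET /pricing.html HTTP/1.1" 200 2326'
record = parse_log_line(sample)
print(record["ip"], record["path"], record["status"], record["size"])
```

Once every line is a record like this, the log stops being dark data: it can be loaded into a database or spreadsheet and queried like any other table.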

Machine data

Digital information created by activities and operations on networked devices – mobile phones, tablets, computers, connected wearable products, embedded systems.

Data silos

Data silos are collections of data held by one team or department and isolated from the rest of the organization. As a result, the data isn’t easily accessible or shareable across the company.

No data governance

A lack of data governance happens when a business doesn’t have policies and procedures in place to manage data. This can lead to the collection of data that sits unused.

Relying on legacy systems

Businesses that rely on old technology that’s not compatible with new technology will struggle to access or use data that’s stored on newer systems. 

Types of dark data

The data in your business can be categorized as…

Critical business data

Data that you consider critical to the operation of your business and its ongoing growth and success, or data that has to be kept for regulatory purposes.

Redundant, obsolete, trivial (ROT)

Data stored in internal networks that’s no longer relevant and can be marked for deletion.
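A common first pass at ROT clean-up is to flag files that haven’t been modified for a long time (possibly obsolete) and files with identical contents (redundant). Here’s a minimal Python sketch of that idea, assuming the files live on a local file system. The one-year threshold and the `find_rot_candidates` helper are hypothetical, "trivial" still needs human judgment, and anything flagged should be reviewed against your retention rules before it’s deleted.

```python
import hashlib
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

STALE_AFTER_DAYS = 365  # assumed cutoff for "obsolete"; set it from your retention policy

def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in chunks so large files aren't loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def find_rot_candidates(root: str, now: Optional[float] = None):
    """Return (stale, duplicates) under root: files not modified within
    STALE_AFTER_DAYS, and (copy, original) pairs with identical contents."""
    now = time.time() if now is None else now
    cutoff = now - STALE_AFTER_DAYS * 86400
    stale, duplicates = [], []
    seen = {}  # content digest -> first path seen with that content
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.stat().st_mtime < cutoff:
            stale.append(path)
        digest = file_digest(path)
        if digest in seen:
            duplicates.append((path, seen[digest]))
        else:
            seen[digest] = path
    return stale, duplicates

# Demo on a throwaway directory: one backdated file and one exact copy
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "report.csv").write_text("q1,q2\n1,2\n")
    (root / "report_copy.csv").write_text("q1,q2\n1,2\n")
    old_file = root / "old_notes.txt"
    old_file.write_text("meeting notes")
    two_years_ago = time.time() - 2 * 365 * 86400
    os.utime(old_file, (two_years_ago, two_years_ago))  # backdate access and modify times

    stale, duplicates = find_rot_candidates(tmp)
    stale_names = [p.name for p in stale]
    dup_names = [(copy.name, original.name) for copy, original in duplicates]

print("possibly obsolete:", stale_names)
print("possibly redundant:", dup_names)
```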

Dark data

Hidden data that companies don’t know they have, don’t use, or know they have but don’t know how to access. It can pose security risks.

Sources of dark data

Dark data can be found in…

Unstructured data

Much dark data is unstructured: it isn’t held in a database or predefined data structure. It’s information that businesses collect, process, and store during business activities.

Examples of unstructured data include audio and video files, email messages, text-only files, images, digital photographs, books, PDFs, product reviews, surveillance camera recordings.

Sources of unstructured data

Social media, email clients, websites and file logs, instant messaging systems, media viewing tools, location or geolocation data – GPS, weather satellites.

Structured data

Data organized in a predefined format, such as rows and columns in a database, which makes it easy to search and analyze. It can be textual or non-textual. Examples of structured data include dates, times, phone numbers, banking/transaction information, social security numbers, names, addresses, email addresses, product prices, serial numbers.

Sources of structured data

Reservation systems (hotels, airlines), point-of-sale software, online forms, medical devices, customer relationship management – CRM – systems, enterprise resource planning – ERP – systems, financial data warehouses.

Semi-structured data

A form of structured data that doesn’t follow the tabular structure of data models. It isn’t captured or formatted in a conventional way, but it does contain some structural elements, such as tags and organizational metadata, that make it easier to analyze. Examples include HTML code, emails, invoices, graphs, tables, XML documents.
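To see why tags and metadata make semi-structured data easier to analyze, here’s a minimal Python sketch that reads a small XML invoice with the standard library’s `xml.etree.ElementTree` and flattens it into table-like rows. The invoice layout and element names are hypothetical; real e-invoicing standards define their own schemas.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical XML invoice: the tags say what each value means
INVOICE_XML = """
<invoice number="INV-1042" currency="EUR">
  <supplier>Acme Logistics</supplier>
  <issued>2024-03-05</issued>
  <line sku="PAL-01"><qty>4</qty><unit_price>12.50</unit_price></line>
  <line sku="WRAP-7"><qty>10</qty><unit_price>3.20</unit_price></line>
</invoice>
"""

def invoice_rows(xml_text: str) -> list:
    """Flatten each <line> into a row that also carries the invoice-level fields."""
    root = ET.fromstring(xml_text.strip())
    header = {
        "invoice": root.get("number"),
        "currency": root.get("currency"),
        "supplier": root.findtext("supplier"),
        "issued": root.findtext("issued"),
    }
    rows = []
    for line in root.iter("line"):
        qty = int(line.findtext("qty"))
        price = Decimal(line.findtext("unit_price"))  # Decimal avoids float rounding on money
        rows.append({**header, "sku": line.get("sku"), "qty": qty, "amount": qty * price})
    return rows

rows = invoice_rows(INVOICE_XML)
for row in rows:
    print(row["invoice"], row["sku"], row["amount"])
total = sum(row["amount"] for row in rows)
print("total:", total)
```

The same invoice as a scanned PDF would carry no tags at all, which is why unstructured documents need extra processing before they can be flattened like this.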

Why you should care about dark data

There are several reasons why you should be concerned about dark data…

Data analysis

Unused dark data will seriously stunt your business’s ability to perform effective data analysis. The analytics tools you’re using are only doing half a job if they can’t access all your data, which reduces the accuracy of their results.

Data potential

Dark data analysis can identify insights about your customers, prospects, operations, and more: insights that may not be available in your structured data. These can include…

  • What affects investment trends
  • Network security and activity patterns and trends
  • Time visitors spend on your website
  • When a visitor exits a web page
  • User feedback through call-in transcripts
  • What affects consumer behavior
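Two of the insights above, time spent on your website and where visitors exit, can be derived from clickstream or server-log data that often goes unanalyzed. Here’s a minimal Python sketch, assuming each page view has already been parsed into a (visitor, timestamp, page) tuple and that all of a visitor’s views in the sample belong to one session. The sample data and the `session_summary` helper are hypothetical, and real analytics tools usually split sessions after a period of inactivity.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical page views: (visitor_id, timestamp, page)
page_views = [
    ("v1", "2024-05-01 09:00:00", "/home"),
    ("v1", "2024-05-01 09:02:30", "/pricing"),
    ("v1", "2024-05-01 09:06:00", "/signup"),
    ("v2", "2024-05-01 10:15:00", "/blog/dark-data"),
    ("v2", "2024-05-01 10:16:10", "/pricing"),
]

def session_summary(views):
    """Per visitor: seconds between first and last page view, and the exit page."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in views:
        by_visitor[visitor].append((datetime.fromisoformat(ts), page))
    summary = {}
    for visitor, hits in by_visitor.items():
        hits.sort()  # chronological order
        duration = (hits[-1][0] - hits[0][0]).total_seconds()
        summary[visitor] = {"seconds_on_site": duration, "exit_page": hits[-1][1]}
    return summary

summary = session_summary(page_views)

# Which pages do visitors most often leave from?
exit_counts = defaultdict(int)
for stats in summary.values():
    exit_counts[stats["exit_page"]] += 1

print(summary)
print(dict(exit_counts))
```

One caveat: a log only records requests, so it can’t see how long someone stayed on the last page they viewed. Time on site measured this way undercounts.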


Full analysis of your data can help your business develop products and services that outsmart your competitors, ensuring you don’t miss new opportunities or trends.

Customer experience

How do your customers and prospects see your brand? Your products and services? Understand this and you can improve the customer experience, reduce churn, and increase sales.

5 ways to use dark data for a personalized customer experience – CX

  1. Analyze customer behavior patterns and preferences using dark data to create personalized experiences.
  2. Use dark data to identify customer segments and create targeted marketing campaigns.
  3. Combine dark data with structured customer data to gain a comprehensive view of customer interactions.
  4. Use machine learning algorithms to predict customer preferences based on dark data insights.
  5. Continuously monitor and update customer profiles using dark data to deliver personalized experiences in real time.
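Step 3 in the list above, combining dark data with structured customer data, often comes down to a join on a shared customer ID. Here’s a minimal Python sketch using only the standard library. The CRM records, the feedback text, and the keyword-based sentiment tagging are all hypothetical stand-ins; a real system would typically use an NLP model rather than a keyword list.

```python
# Structured CRM records keyed by customer ID (hypothetical data)
crm = {
    "C001": {"name": "Dana Ortiz", "plan": "Pro", "region": "EU"},
    "C002": {"name": "Lee Park", "plan": "Basic", "region": "US"},
}

# Unstructured feedback from emails or reviews, already tagged with a customer ID
feedback = [
    {"customer_id": "C001", "text": "Checkout is slow and the invoice PDF is confusing."},
    {"customer_id": "C002", "text": "Love the new dashboard, very easy to use."},
    {"customer_id": "C001", "text": "Support resolved my issue quickly, great service."},
]

# Toy keyword lists: an assumption standing in for a real sentiment model
NEGATIVE = {"slow", "confusing", "broken"}
POSITIVE = {"love", "easy", "great"}

def tag_sentiment(text: str) -> str:
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def customer_360(crm, feedback):
    """Left-join feedback onto CRM records so each profile carries its comments."""
    profiles = {cid: {**record, "feedback": []} for cid, record in crm.items()}
    for item in feedback:
        profile = profiles.get(item["customer_id"])
        if profile is not None:  # skip feedback that can't be matched to a customer
            profile["feedback"].append((tag_sentiment(item["text"]), item["text"]))
    return profiles

profiles = customer_360(crm, feedback)
for cid, profile in profiles.items():
    print(cid, profile["plan"], [sentiment for sentiment, _ in profile["feedback"]])
```

Keeping every CRM record, even customers with no feedback yet, means the combined view never loses a customer; the feedback list is simply empty.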

Dark data often includes personally identifiable information – PII – payment information, health data, financial data, etc. This data is highly regulated, and if you don’t safeguard it, you risk fines and reputational damage.

Internal security

As data breaches become more frequent, the sensitive information hidden in your dark data is increasingly vulnerable. If you don’t organize and secure it, you could expose your customers’ personal data to theft, which can lead to identity theft.
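Securing dark data starts with finding where personal data lives. Here’s a minimal Python sketch that scans free text for a few PII patterns with regular expressions, uses the Luhn checksum to weed out digit runs that can’t be card numbers, and masks what it finds. The patterns and sample note are hypothetical and deliberately simple: they’ll miss many real-world formats and can over-match, which is why dedicated data-discovery tools go much further.

```python
import re

# Deliberately simplified patterns (assumptions); real formats vary by country and system
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}

def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to discard digit runs that can't be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0

def find_pii(text: str) -> list:
    """Return (label, matched_text) pairs for every suspected PII value."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            if label == "card_number" and not luhn_valid(re.sub(r"\D", "", value)):
                continue
            hits.append((label, value))
    return hits

def mask_pii(text: str) -> str:
    """Replace each suspected PII value with a label such as [EMAIL]."""
    for label, value in find_pii(text):
        text = text.replace(value, f"[{label.upper()}]")
    return text

note = "Refund for jane.doe@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
print(find_pii(note))
print(mask_pii(note))
```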

Optimize the value of dark data with AI

Finding the value in dark data is difficult for companies that still work with manual processes. Translating unstructured data into coherent insights is more achievable and cost-effective with automated business processes.

Check out my Digital Process Automation vs Manual blog to learn more about why automation is the only way to go.

Artificial intelligence – AI – and machine learning help businesses find, analyze, and secure their dark data. They can also support data management by making sure non-compliance and security risks are identified quickly.

AI solutions such as Google’s Cloud Vision, Microsoft’s Azure Cognitive Services, AutoML, Amazon’s Textract, and IBM’s Datacap can process dark data. Skills in AI, machine learning, deep learning, Python, Java, natural language processing – NLP – and MLOps will be essential for businesses across industries looking to benefit from dark data insights.

Machine learning

Machine learning is a branch of AI that lets systems learn from data and improve at tasks without being explicitly programmed for each one. Applied to dark data, ML can look for patterns and notify users when it finds an exception. The user’s response – acting on the alert or ignoring it – then teaches the system, so it can handle a similar event the same way automatically the next time it occurs.
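The alert-and-learn loop described above can be sketched in a few lines. This is a deliberately simplified stand-in, not a trained ML model: it flags values far from the historical mean with a z-score test and remembers which kinds of exception the user ignored, so similar events are handled automatically next time. The `ExceptionMonitor` class, the event kinds, the sample history, and the threshold of 3 standard deviations are all hypothetical.

```python
from statistics import mean, stdev

class ExceptionMonitor:
    """Toy alert loop: a z-score outlier test plus remembered user feedback.
    Not a trained ML model; it only illustrates the feedback idea."""

    def __init__(self, history, threshold=3.0):
        # history needs at least two values that aren't all identical
        self.mu = mean(history)
        self.sigma = stdev(history)
        self.threshold = threshold
        self.ignored_kinds = set()  # kinds of exception the user chose to ignore

    def check(self, kind, value):
        z = abs(value - self.mu) / self.sigma
        if z < self.threshold:
            return "normal"
        if kind in self.ignored_kinds:
            return "auto-ignored"  # handled the way the user handled it last time
        return "alert"

    def feedback(self, kind, acted_on):
        """Record whether the user acted on an alert of this kind or ignored it."""
        if acted_on:
            self.ignored_kinds.discard(kind)
        else:
            self.ignored_kinds.add(kind)

# Hypothetical history: invoices processed per day (mean 100, std dev 2)
monitor = ExceptionMonitor([100, 102, 98, 101, 99, 100, 103, 97])
results = [monitor.check("month_end_spike", 140)]        # far above normal
monitor.feedback("month_end_spike", acted_on=False)       # user ignores it
results.append(monitor.check("month_end_spike", 138))     # now handled automatically
results.append(monitor.check("duplicate_invoices", 150))  # new kind, still alerts
results.append(monitor.check("daily_volume", 101))        # within the normal range
print(results)
```

A real ML system would learn from many such reactions across many features rather than a single remembered label, but the loop of flag, respond, and adjust is the same.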

Cloud computing, cognitive analytics, and pattern recognition powered by machine learning have made dark data analytics practical.

Dark data analytics

Dark data analytics is the use of technology to find and analyze previously unknown or unused data so its value can inform better business decisions.

Organizations that prioritize mining dark data can reduce risk and unlock business insights that help them grow and increase revenue.

Intelligent Document Processing (IDP)

Robotic process automation – RPA – bots are software programs that perform rule-based tasks in a digital environment.

Their goal is to automate repetitive, mundane tasks that humans would previously have done.

These digital workers, along with AI and computer vision, enable intelligent document processing solutions to capture, classify, extract, validate, and export data from documents – PDFs, scanned documents, emails, etc. – and convert this unstructured data into actionable, structured data.
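To make the capture, classify, extract, validate, and export steps concrete, here’s a rule-based Python sketch that starts from text already captured (for example, by OCR). The sample text, field names, and regular expressions are hypothetical. AI-based IDP platforms use trained models rather than hand-written patterns, so treat this only as an outline of the data flow, not as how any particular product works.

```python
import csv
import io
import re
from decimal import Decimal

# Capture: text as it might come out of OCR on a scanned invoice (hypothetical sample)
captured_text = """ACME LOGISTICS LTD
INVOICE
Invoice No: INV-2024-0193
Date: 2024-04-18
Subtotal: 1,250.00
VAT: 250.00
Total Due: 1,500.00
"""

def classify(text: str) -> str:
    """Classify: a very rough document-type guess based on keywords."""
    upper = text.upper()
    if "CREDIT NOTE" in upper:
        return "credit_note"
    if "INVOICE" in upper:
        return "invoice"
    return "unknown"

FIELD_PATTERNS = {
    "invoice_number": r"Invoice No:\s*(\S+)",
    "date": r"Date:\s*(\d{4}-\d{2}-\d{2})",
    "subtotal": r"Subtotal:\s*([\d,]+\.\d{2})",
    "tax": r"VAT:\s*([\d,]+\.\d{2})",
    "total": r"Total Due:\s*([\d,]+\.\d{2})",
}

def extract(text: str) -> dict:
    """Extract: pull each field with its pattern, or None if it isn't found."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return fields

def validate(fields: dict) -> list:
    """Validate: return a list of problems; an empty list means the record passed."""
    problems = [f"missing {name}" for name, value in fields.items() if value is None]
    if not problems:
        amounts = {k: Decimal(fields[k].replace(",", "")) for k in ("subtotal", "tax", "total")}
        if amounts["subtotal"] + amounts["tax"] != amounts["total"]:
            problems.append("subtotal + tax does not equal total")
    return problems

def export_csv(records: list) -> str:
    """Export: write structured records as CSV for another system to ingest."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["doc_type", *FIELD_PATTERNS])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

doc_type = classify(captured_text)
fields = extract(captured_text)
problems = validate(fields)
if problems:
    print("send to human review:", problems)
else:
    output = export_csv([{"doc_type": doc_type, **fields}])
    print(output)
```

Routing records that fail validation to a person, instead of exporting them, mirrors the human-in-the-loop review that document processing workflows commonly rely on.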

The data can then be uploaded to other systems to automate business workflows, centralize information, perform data analysis, and ensure data is easily accessible.

Automated document processing is particularly beneficial to industries that deal with high volumes of paperwork, such as logistics, finance, and legal. IDP solutions help businesses automate manual data entry workflows, increasing productivity, reducing errors, and freeing up teams to work on more strategic tasks. And… they provide meaningful insights into business operations that were previously hidden in dark data.

Examples of dark data by industry

Dark data, or unused data, is present in most industries, but tends to be more prevalent in traditional industry verticals such as supply chain, logistics, and manufacturing. It can include log files, ex-employee information, customer information, financial statements, presentations, emails, inactive databases, call center transcripts, customer reviews, and survey data.

In pharma and healthcare, dark data can help drive R&D in the early stages of drug development. Its insights can identify optimal conditions and processes to replicate, increasing efficiency, and can surface errors in drug development quickly so rollbacks are avoided.

Industries such as travel & hospitality and retail rely on detailed knowledge of their customers and prospects. They collect surface-level data such as names, addresses, and phone numbers, but much of their dark data sits in surveys and customer reviews. This feedback is invaluable to businesses looking to improve their products and services, and it enables them to personalize their marketing campaigns and increase customer loyalty.

In an accounts payable department, dark data is hidden in old transactions, CRM details, account history, etc. This unused data is considered risky because it contains sensitive and private information.

In banking, when a customer opens a new account, the data that gets used is the customer’s details and whether they’re eligible. What’s ignored, but valuable, is the customer journey: the path the user took to reach the application page.

Dark data FAQs

What are the challenges of dark data?

Businesses collect dark data but don’t analyze or use it. Storage costs mount up, while the sensitive nature of the data makes it vulnerable to cyberattacks.

What are the advantages of dark data?

Dark data gives businesses a comprehensive view of customer habits, product usage, business performance, and customer preferences, providing valuable insights to marketing, product development, customer support, and R&D teams.

What is the future of dark data?

That depends on us humans, and whether we get around to using new technology to analyze dark data and pull out the insights.

What is a dark data example?

Dark data examples include financial archives, raw data, customer information, account information, video and audio files, log files, emails.

What type of data is dark data?

Dark data is a subset of big data. Despite often making up the biggest part of it, it goes unused.

Unleash your dark data

While businesses agree that dark data contains valuable insights, accessing, analyzing, and acting on it is a daunting challenge. It’s also expensive: storage is costly, and non-compliance can result in fines.

But this should be weighed against the vast potential of dark data to give your business the competitive edge it needs to grow.

It’s essential you identify your dark data and unleash its value. Invest in new processes, skilled employees, and new technologies. You’ll see that the benefits outweigh the costs. 

If you’re looking to invest in technology that will help you release the potential of your dark data, check out Rossum’s AI document processing solution.

Set Your Dark Data Free!

Intelligent document processing unlocks the potential of dark data by transforming it into actionable, structured data ready for analysis.