Back in 2013, we posted a Big Data blog explaining that there are structured, unstructured and hybrid forms of data. I was the first to begin incorporating this hybrid data into the concept of big data. Today, it seems that all the experts are discussing these different forms of big data. In this blog we will explore the different types of data and explain the differences at a high level.
So what's the difference between these various data types?
The first cylinder represents structured data. This includes data from ERP systems, mainframes and data warehouses. Although structured, these data types are structured differently.
ERP and other transactional systems are structured in a way that allows for easy data entry. Data warehouses and business intelligence solutions are structured in a way that allows for easy retrieval of information. That said, both transactional and analytical systems are structured.
As described in an earlier blog on "Analytical versus Transactional Business Intelligence," ERP and other transactional data sources are designed to RUN your business. Data warehouse and business intelligence solutions are designed to help MANAGE your business. These data sources are typically stored in a traditional database and therefore have structure to them.
The second cylinder contains unstructured data. This is data that is often out there on the web. It includes social media data such as “Tweets” and “Comments.” Unstructured data also includes activity, such as searches, followers, social authority and clicks.
Big Data started getting a lot of press back in 2013. It started in June with a report in “The Wall Street Journal” that the NSA, America’s National Security Agency, was obtaining a complete record of all Verizon customers and their calling history, including all local and long distance calls within the US.
This, rightly so, made a lot of people upset. The idea that the government is collecting records of our calls means a potential invasion of privacy. The government claimed it tracked and used this information to help identify terrorists. We hope that’s true. But the fact that they have the capability and are monitoring this information is unsettling.
Big data also came up that year in association with the monitoring of certain journalists’ calls and activities. Big data was also related to the IRS scandal, which required search capabilities that would target certain non-profit applications. Regardless of political affiliation, most people found this disturbing because targeting groups for political gain is wrong.
Monitoring these activities requires the government to leverage big data. Right or wrong, for good or for bad, for profit or political aspirations, the capability to capture and leverage big data does exist. The crazy part is, we’ve been talking about this for years. Before 2013, those of us who understand data mining and big data, and voiced our concerns, were thought to be paranoid. In just a few short years, most people have come to understand and accept it as reality.
As it pertains to business, most companies leverage big data to target market and to manage their brands, sales, profits and company reputation. Either way, technology exists today that allows us to track and monitor and profile just about whatever and whomever we want.
The last cylinder represents multi-structured data or hybrid data. A lot of data sources can fall into this space.
For the purposes of a consumer goods manufacturer, I used common outside data sources in the cylinder to represent hybrid data. Point of sale (POS) data, for example, comes in from multiple retailers with varying data elements at different times of the month. Relational Solutions works with POS and syndicated data all the time. Integrating outside data sources with internal master data is very complicated without an enterprise foundation to manage data issues. Even one retailer could have multiple ways of providing POS data.
Target is a good example of the ways in which POS data can arrive. If you are a vendor for Target, you might get POS data in an EDI 852 file. You might also get POS data from Info Retriever or Partners On-line. In addition, you might purchase data from A.C. Nielsen or Symphony IRI. All these sources contain different data elements. But they also all contain point of sale (POS) data, in many formats that require data manipulation in order to work with each other and with internal master data.
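To make that data manipulation a little more concrete, here is a minimal Python sketch of mapping two feeds onto one common record layout. Everything specific in it is an assumption for illustration: the source labels ("edi_852", "portal"), the field names, the date formats and the 12-digit item code are hypothetical and do not describe any retailer's actual file layout.

```python
from datetime import date, datetime

# Hypothetical field names per feed; real layouts vary by retailer and provider.
FIELD_MAPS = {
    "edi_852": {"item": "upc", "store": "location_id", "day": "activity_date", "units": "qty_sold"},
    "portal": {"item": "UPC", "store": "Store", "day": "Date", "units": "Units"},
}

# Hypothetical date formats per feed.
DATE_FORMATS = {"edi_852": "%Y%m%d", "portal": "%m/%d/%Y"}


def normalize(record: dict, source: str) -> dict:
    """Map one source-specific POS record onto a common layout."""
    fields = FIELD_MAPS[source]
    return {
        # Pad item codes to 12 digits so both feeds use the same key.
        "item": str(record[fields["item"]]).zfill(12),
        "store": str(record[fields["store"]]),
        "day": datetime.strptime(record[fields["day"]], DATE_FORMATS[source]).date(),
        "units": int(record[fields["units"]]),
    }
```

Once every feed goes through a mapping like this, the same sale reported by two different sources produces the same record, which is what lets it be matched against internal master data.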
EDI files are structured. However, although the format is supposed to be standardized, in practice it is not. Different retailers provide different data. Rules aren't followed. Files can be missing days or data elements. EDI from one retailer will be different from another retailer's. Also, the EDI coming from Target today might be different from the EDI that came from Target last year. There could also be missing data, duplicate data or recast data. We classify this as "hybrid" data because of the inconsistent, loose structure of the data and all the work required to make it work well with other data. It’s structured, but not in a usable format.
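Two of the problems just mentioned, missing days and duplicate data, are easy to check for once a feed has been loaded. The sketch below is one simple way to do it in Python, not a description of any particular product. It assumes each row is a (store, item, day, units) tuple, which is a hypothetical layout chosen for the example.

```python
from collections import Counter
from datetime import date, timedelta


def audit_feed(rows, start, end):
    """Return (missing_days, duplicates) for a POS feed.

    rows: iterable of (store, item, day, units) tuples (hypothetical layout).
    start, end: first and last calendar day the feed is expected to cover.
    """
    # Count how often each store/item/day combination appears.
    keys = Counter((store, item, day) for store, item, day, _ in rows)
    duplicates = sorted(key for key, count in keys.items() if count > 1)

    # Compare the days present in the feed against the expected range.
    seen_days = {day for _, _, day in keys}
    expected = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    missing_days = [d for d in expected if d not in seen_days]
    return missing_days, duplicates
```

A check like this only flags the issue. Deciding whether a duplicate is a resend, a correction or recast data still takes business rules specific to each retailer.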
Data also has different hierarchies, inconsistent week-ending dates, etc. Outside data needs to align with your internal hierarchies and calendars. It also needs to be aligned with outside data sources like weather trends, currency conversion, A.C. Nielsen, Symphony IRI, NPD and other data sources.
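As a small illustration of the calendar issue, the Python sketch below rolls daily sales up to a chosen week-ending day. The function names and the default of a Saturday week end are assumptions made for the example. Your internal calendar may end its weeks on a different day, which is exactly why the day is a parameter.

```python
from datetime import date, timedelta


def week_ending(day: date, week_end_weekday: int = 5) -> date:
    """Return the first date on or after `day` that falls on the week-ending weekday.

    Weekdays follow Python's convention: Monday is 0 and Sunday is 6,
    so the default of 5 means weeks end on Saturday (an assumption).
    """
    return day + timedelta(days=(week_end_weekday - day.weekday()) % 7)


def rollup_weekly(daily_units: dict, week_end_weekday: int = 5) -> dict:
    """Sum daily unit sales into buckets keyed by week-ending date."""
    totals = {}
    for day, units in daily_units.items():
        key = week_ending(day, week_end_weekday)
        totals[key] = totals.get(key, 0) + units
    return totals
```

Rolling every daily feed up with the same week-ending day puts retailers that report on different weekly calendars onto one comparable basis. Data that arrives already summarized by week cannot be re-cut this way, which is one reason daily data is easier to align.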
These are just a few examples of data issues that arise from outside data sources. In other words, there is some structure to the data, but that structure needs to be altered before the data can be managed, integrated with other sources and ultimately made to provide more value.
Watch for next week's blog, where I explain in more detail how big data is further defined and described by the industry.