Most companies define big data as Volume, Variety and Velocity. I add one more key element that makes up “Big Data”: Complexity.
Volume, Variety and Velocity, along with Complexity, make up big data. You might think that Variety covers complexity, but it doesn’t. Making social media data and other big data work with your business to provide value involves a lot of complexity.
There are some vendors out there saying that infrastructure is not important. They are wrong. Maybe someday in the future all data will be in the cloud, but that is not realistic today or in the near future. Every company has an internal infrastructure that includes SQL Server, SAP, Oracle, DB2, etc. Those companies will not be abandoning their internal IT departments any time soon.
Therefore, it's necessary for big data and internal data to co-exist and to work together.
When data needs to be put into a usable format to be integrated with internal data, there are many alignments and rules that need to be applied. Business rules, code, processes and mappings need to be written to extract, load, clean and refine the data. All the variety of data needs to be transformed into a common data type for analysis. There is also metadata, which is data about the data, and it needs to be managed too. The processes and data models that are designed to align data and put it into a usable format themselves become new data.
Volume, Variety and Velocity focus strictly on the source, but the data that makes source data usable with internal data is also new data. That new data needs to be structured, documented, maintained and managed. This is a complexity that adds to “Big Data.”
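To make that concrete, here is a minimal sketch in Python of mapping two sources into one common record. Everything in it is an assumption made for illustration: the source names, the field names, the `SOURCE_RULES` table, the `CommonRecord` type and the sample values are all hypothetical. The point is only that the mapping rules, and the description of those rules, are data in their own right.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical mapping rules, one entry per source. These rules are themselves
# data that has to be stored, documented and maintained.
SOURCE_RULES = {
    "erp": {
        "measure": "units_sold",
        "fields": {"MATNR": "product_id", "QTY": "value", "DATE": "day"},
    },
    "social": {
        "measure": "mentions",
        "fields": {"sku": "product_id", "count": "value", "posted": "day"},
    },
}


@dataclass(frozen=True)
class CommonRecord:
    """The common shape every source is transformed into (illustrative)."""
    source: str
    measure: str
    product_id: str
    value: int
    day: date


def to_common(source, raw):
    """Rename, clean and type-convert one raw row from the given source."""
    rules = SOURCE_RULES[source]
    # Keep only the mapped fields and rename them to the common names.
    renamed = {rules["fields"][k]: v for k, v in raw.items() if k in rules["fields"]}
    return CommonRecord(
        source=source,
        measure=rules["measure"],
        product_id=str(renamed["product_id"]).strip().upper(),  # cleaning rule
        value=int(renamed["value"]),
        day=date.fromisoformat(renamed["day"]),
    )


def describe_mappings():
    """Metadata about the mappings: one row per rule."""
    return [
        {"source": source, "source_field": src, "common_field": dst}
        for source, rules in SOURCE_RULES.items()
        for src, dst in rules["fields"].items()
    ]


# Made-up sample rows from each source.
records = [
    to_common("erp", {"MATNR": " a100 ", "QTY": "12", "DATE": "2013-05-01"}),
    to_common("social", {"sku": "A100", "count": 40, "posted": "2013-05-01", "user": "x"}),
]
metadata = describe_mappings()
```

Two raw rows went in, but the system now also has to manage the rule table and the six metadata rows that describe it. None of that came from the source.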
Consumer packaged goods (CPG) companies add to their big data through complexity in several ways: aligning hierarchies, integrating their master data with retailer master data, and comparing sales with sentiment as well as promotions and pricing. These are just a few examples of the complexity involved in getting more value out of big data.
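As a rough illustration of the alignment work, the sketch below matches a retailer's item list to an internal product hierarchy and then puts sales next to sentiment. It is written in Python, and every product ID, item code, category, sales figure and sentiment score in it is made up; `align` and `sales_with_sentiment` are hypothetical helper names, not part of any product.

```python
# Hypothetical internal master data: product ID -> internal category.
internal_master = {"A100": "Snacks/Chips", "B200": "Beverages/Soda"}

# Hypothetical retailer master data:
# retailer item code -> (internal product ID, retailer category).
retailer_master = {
    "R-1": ("A100", "Salty Snacks"),
    "R-2": ("B200", "Soft Drinks"),
    "R-3": ("C300", "Candy"),  # no matching internal product
}

# Made-up retailer sales (by item code) and sentiment scores (by product ID).
sales = {"R-1": 120, "R-2": 80, "R-3": 15}
sentiment = {"A100": 0.6, "B200": -0.2}


def align(retailer_master, internal_master):
    """Map each retailer item onto the internal hierarchy; collect misses."""
    aligned, unmatched = {}, []
    for code, (product_id, retailer_category) in retailer_master.items():
        if product_id in internal_master:
            aligned[code] = {
                "product_id": product_id,
                "internal_category": internal_master[product_id],
                "retailer_category": retailer_category,
            }
        else:
            unmatched.append(code)
    return aligned, unmatched


def sales_with_sentiment(aligned, sales, sentiment):
    """Put units sold next to the sentiment score for each aligned item."""
    return [
        {
            "product_id": row["product_id"],
            "internal_category": row["internal_category"],
            "units": sales.get(code, 0),
            "sentiment": sentiment.get(row["product_id"]),
        }
        for code, row in aligned.items()
    ]


aligned, unmatched = align(retailer_master, internal_master)
report = sales_with_sentiment(aligned, sales, sentiment)
```

The `aligned` table and the `unmatched` list did not exist in either source. They are new data created purely to make the two hierarchies line up, and someone has to maintain them.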
Therefore, big data comprises volume, variety, velocity and complexity!
We can’t talk about “Big Data” without talking about Hadoop and MapReduce. So first, what is Hadoop? We'll describe that in next week's blog, "Big Data, Part 7."