
Big Data Part 6


Most companies define Big Data by three characteristics: Volume, Variety and Velocity. I add one more key element: Complexity.

Volume, Variety and Velocity, along with Complexity, make up Big Data. You might think that Variety covers complexity, but it doesn't. Making social media data and other big data work with your business to provide value involves a great deal of complexity.

There are some vendors out there saying that infrastructure is not important. They are wrong. Maybe someday all data will be in the cloud, but that is not realistic today or in the near future. Every company has an internal infrastructure that includes SQL Server, SAP, Oracle, DB2, etc. Those companies will not be phasing out their internal IT departments any time soon.

Therefore, it's necessary for big data and internal data to co-exist and work together.

When data needs to be put into a usable format and integrated with internal data, many alignments and rules must be applied. Business rules, code, processes and mappings must be written to extract, load, clean and refine the data. All that variety of data must be transformed into a common structure for analysis. There is also metadata, data about the data, that needs to be managed. The processes and data models designed to align data and put it into a usable format become new data in their own right.
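As an illustration of that transformation work, here is a minimal sketch of mapping two differently structured sources onto one common schema. The field names and date formats are invented for the example; a real pipeline adds business rules, cross-referencing and error handling on top of this.

```python
from datetime import datetime

# Hypothetical raw records: an internal ERP extract and a retailer POS
# feed, each with its own field names and date format (invented here).
erp_row = {"SKU": "A-100", "QTY": "12", "SaleDate": "2013-09-04"}
pos_row = {"item": "A-100", "units": 12, "date": "09/04/2013"}

def normalize_erp(row):
    """Map an ERP-style record onto a common analysis schema."""
    return {
        "sku": row["SKU"],
        "units": int(row["QTY"]),
        "date": datetime.strptime(row["SaleDate"], "%Y-%m-%d").date(),
    }

def normalize_pos(row):
    """Map a POS-style record onto the same common schema."""
    return {
        "sku": row["item"],
        "units": int(row["units"]),
        "date": datetime.strptime(row["date"], "%m/%d/%Y").date(),
    }

# Both sources now share one structure and can be loaded together.
common = [normalize_erp(erp_row), normalize_pos(pos_row)]
```

Every mapping like this is itself new logic, and new metadata, that has to be documented and maintained.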

Volume, Variety and Velocity focus strictly on the source, but everything created to make that data usable alongside internal data is new data too. That new data needs to be structured, documented, maintained and managed. This is a complexity that adds to "Big Data."

Here are just a few examples of where CPG companies add to their big data due to complexity: aligning hierarchies, integrating internal master data with retailer master data, and comparing sales with sentiment, promotions and pricing. Each illustrates the complexity involved in getting more value out of big data.

Therefore, big data comprises volume, variety, velocity and complexity!

We can't talk about "Big Data" without talking about Hadoop and MapReduce. So first, what is Hadoop? We'll describe that in next week's blog, "Big Data, Part 7."

Big Data Part 5


As described in the last two blogs, big data comprises volume and variety. But it also includes velocity and one more key characteristic that will be described in this blog and the next. In this blog we examine the fact that big data gets even bigger when you start to consider timing.

It's not just Volume and Variety; it's Velocity: the speed with which data is coming in. ERP data is updated every second of the day by multiple users across the company. POS data can come in daily, weekly or monthly, sometimes more often if you're able to download information directly from retailers' portals. Pricing information and zip code information might come in quarterly or annually.

Online data? That's an entirely different story. Clickstream activity is happening constantly. Twitter feeds and hashtags happen sporadically throughout the day. For smaller companies, new "mentions" might only happen a few times a month. Regardless of frequency, these comments still need to be monitored.

Larger companies, with more popular brands and a wide customer base, will have regular
comments about their products throughout the day. Depending on the number of brands you have, you may have hundreds of “mentions” per minute.

A large consumer goods company must have the ability to respond to negative sentiment almost immediately. At the very least, your social listening group should be checking your Facebook page every half hour for negative sentiment (if it isn't, that needs to change).

Negative comments on your Facebook page should not be there long enough for others to
“Like” them.

I was recently at a customer that had implemented a "Social Listening" team about a month earlier. The team consisted of about a dozen people. During the meeting, I logged on to their Facebook page and pointed out that a very negative comment sat right at the top, and that it had been there for over two hours. In those two hours, the comment had received over 100 "Likes," and several other negative comments had been posted along with it. Although I advised them to address it immediately, the comment, amazingly, remained on their Facebook page throughout the entire three-hour meeting. I could not believe that no one in the room logged in to fix the problem right away, even after I made them aware of it and told them they should address it.

Had this comment been caught earlier, they could have deleted it and blocked the "Follower." I actually believe that post was made by one of their competitors, because it was a very generic negative comment, completely unrelated to any bad experience or recent event. Most genuine negative comments are tied to a specific experience or event.

Social media has made it very easy to influence a company’s reputation. Managing your
“Social Reputation” is more important today than ever before. But to manage it, you must be aware of it and have the ability to respond.

We recommend tools that automate monitoring and alert you when your company or brands are mentioned. Once you are aware of what's being said, you can begin to manage your social reputation. You can then take it to the next level and start analyzing it so those comments "work" for you.
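As a toy illustration of that kind of automated monitoring, the sketch below flags brand mentions that contain negative language. The brand name, the feed and the keyword list are all invented; commercial listening tools use far richer sentiment models than a keyword match.

```python
# Invented keyword list; real tools use trained sentiment models.
NEGATIVE_WORDS = {"terrible", "awful", "broken", "never buying"}

def flag_mentions(mentions, brand):
    """Return mentions of the brand that contain negative language."""
    alerts = []
    for text in mentions:
        lowered = text.lower()
        if brand.lower() in lowered and any(w in lowered for w in NEGATIVE_WORDS):
            alerts.append(text)
    return alerts

# Hypothetical feed for a made-up brand.
feed = [
    "Loving my new AcmeCola flavor!",
    "AcmeCola tastes terrible, never buying again",
]
alerts = flag_mentions(feed, "AcmeCola")
```

The point is not the matching logic but the loop: mentions come in continuously, and anything flagged should reach a human within minutes, not hours.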

If you can take those comments and analyze who is saying what, and where that sentiment is coming from, you can start to leverage that social media data to your benefit. Imagine if you could identify where negative sentiment is coming from, read the associated comments and call the store manager to let them know that your customers can't find your product, or that the retailer's customers aren't coming in because of untidy conditions. Or perhaps there is a picket line that is costing them more business than they realize.

Social media can and should work toward your benefit. It can help save time. It can make you aware of conditions where people are unhappy with you, your retailer, your supply chain, your new marketing, etc. 

These comments, clicks, likes and so on arrive rapidly, but that doesn't mean they should be ignored. There are ways to monitor and integrate this data for your benefit, and done correctly, the benefit can be great.

So, most companies stop there and define big data as Volume, Variety and Velocity. I add one more key component: Complexity. Watch for my next blog, "Big Data Part 6," to learn how complexity factors into big data.

Sign up for a Demo & See How BlueSky Integration Studio Integrates Big Data

Big Data Part 4


"Big Data" is about volume, but it's more than that. Big data is also about variety and a couple of other characteristics that will be described in follow-up blogs.

CPG companies are no stranger to variety. In addition to their own internal variety of data residing in databases such as Access, Excel, Oracle, mainframes, Teradata, DB2, Netezza and so forth, they run multiple applications such as trade promotion management, supply chain, manufacturing, planograms, CRM, forecasting and a slew of others.

In addition, the variety of data coming in from point of sale (POS) sources includes retailer files such as EDI 852 files, EDI 867 files, AS2 transfers, flat files, other EDI files, retailer portal downloads and syndicated data from AC Nielsen, IRI, NPD and others. Most companies also buy competitive market data, demographic information, surveys, weather trends and currency conversion information, and might even be trying to integrate emerging market data. You might also have space information, displays and diagrams that are unstructured or semi-structured.

Those are all examples of various data sources that have existed over the years. Some of these sources are newer than others. But the newest variety of data is coming in via the web. These sources are coming from various applications that track your “Social Reputation,” clicks, and media presence to name a few.

Marketing teams also have ads, including print, online ads, TV commercials and radio spots. They might also run targeted marketing on social media, including offers on web sites, mobile offers, YouTube videos, etc. All these sources come in different formats containing different information, and all of it adds up to a lot of variety.

Big data just got bigger with more variety from the internet. In these last two blogs we discussed volume and variety, but big data is also about velocity and one other key characteristic that will be discussed in the next two blogs. Watch for our next blog, "Big Data Part 5," on velocity.


Big Data Part 3


In the next 4 blogs, we'll explore the characteristics of Big Data.

Volume is one key characteristic!

Data volumes today are incredible. I'll continue to use a consumer goods manufacturer as an example for this series of blogs.

Think about a consumer goods company with 2,000 SKUs selling through 100 different retailers. That could mean 100,000 stores.

Now imagine that every day, each store sends that CPG company sales information, including what was sold, how many items were sold, the time and date of the sale, potentially the price, and potentially even loyalty and market basket information, which would tell them who the customer was and what they bought along with your product.

We’re talking massive, massive data volumes on top of the ERP data already available from inside sources.
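To get a feel for the arithmetic, here is a quick back-of-the-envelope calculation. The SKU and store counts come from the example above; the activity percentages are illustrative assumptions, not real figures.

```python
skus = 2_000
stores = 100_000

# Assume, purely for illustration, that on a given day 10% of stores
# each report activity on 5% of the SKUs.
daily_rows = int(stores * 0.10) * int(skus * 0.05)
yearly_rows = daily_rows * 365

print(f"{daily_rows:,} rows/day, {yearly_rows:,} rows/year")
```

Even with those conservative assumptions, the store-level POS feed alone runs to hundreds of millions of rows a year, before any ERP or social data is added.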

Now consider sources like the company Facebook page, your LinkedIn page, Twitter feeds about your company and brands, your YouTube commercials, and so on. You're talking huge data volumes.

I recently heard a supply chain expert define big data as a petabyte. We all chuckled at that, because it came from an analyst who knows supply chain reports but has zero experience in data warehousing, databases or anything related to IT infrastructure. Relational Solutions has unsurpassed experience working in very disparate IT environments. A petabyte is just a number, and just because it's a big number doesn't mean a terabyte isn't big data to another company.

A volume that causes an issue for one company, or even one individual, may pose no problem at all for another company dealing with the same amount of data. Every company has different environments, different users and different ways of managing data, so even smaller amounts of data can cause issues for one company and not another. Applying a specific number to big data is meaningless.

That said, volume is one characteristic of "Big Data." Look for our next blog, "Big Data Part 4," which discusses variety.

Watch our Big Data Training 101

Big Data Part 2


"Big Data Part 2" builds off my earlier blogs called “Before Big Data” and “Big Data Part 1.”

In this blog we will explore the different types of data and explain the differences at a high level. I thought of breaking this blog into three blogs due to length, but felt the subject matter was better served in one article.

So what's the difference in these various data types?

The first cylinder represents structured data. This includes data from ERP systems, mainframes and data warehouses. Although all structured, these data types are structured differently.

In my earlier blog, "Big Data Part 1," I separated these structured data types into two separate circles. That's because they are structured differently.

ERP data and other transactional systems are structured in a way that allows for easy data entry.

Data warehouse and business intelligence solutions are structured in a way that allows for easy retrieval of information. This is why I had them in separate circles on the previous blog. That said, both transactional and analytical systems are structured.

As described in my blog on "Analytical versus Transactional Business Intelligence," ERP and other transactional data sources are designed to RUN your business. Data warehouse and business intelligence solutions are designed to help MANAGE your business. These data sources are typically stored in a traditional database and therefore have structure to them.

The second cylinder contains unstructured data. This is data mainly found out on the web. It includes social media data such as "Tweets" and "Comments." But unstructured data also includes your activity, including your searches.

The internet captures a lot of different activity. Today, your social authority or clout can be tracked by determining how many followers you have, how many people you follow, and how many times the things you post are reposted. Different applications apply different algorithms, but social authority is tracked in a variety of ways.

Authority can be tracked based on the number of people you have the capacity to influence. Someone with 100 followers does not have the same clout as someone with 3,000 followers, for example. However, someone with 3,000 followers who is never online commenting could have a lower ranking authority level than someone with 500 followers who regularly posts or tweets what they hear.
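A hypothetical scoring function shows why that can happen. The weights and scaling below are made up purely for illustration; as noted above, real platforms keep their actual algorithms proprietary.

```python
def authority_score(followers, posts_per_month, weight_activity=0.6):
    """Invented authority ranking combining reach and activity.

    The 60/40 weighting and the x100 activity scaling are illustrative
    assumptions, chosen to show why an active user with fewer followers
    can outrank a silent one with more.
    """
    reach = followers
    activity = posts_per_month * 100  # scale activity into reach's range
    return (1 - weight_activity) * reach + weight_activity * activity

quiet_big = authority_score(followers=3000, posts_per_month=0)
active_small = authority_score(followers=500, posts_per_month=40)
print(active_small > quiet_big)
```

Under these assumptions the quiet account with 3,000 followers scores 1200, while the active account with 500 followers scores 2600.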

Big Data received a lot of attention in the press this summer, and many of the stories were concerning. In June, The Wall Street Journal reported that the NSA, America's National Security Agency, was obtaining a complete record of Verizon customers' calling history, including all local and long-distance calls within the US.

This made the news because it upset a lot of people. The idea that the government is listening in on our calls suggests a potential invasion of privacy. The government claims it uses this information to help identify terrorists. We hope that's true, but the fact that they have the capability and are monitoring this information can be unsettling.

Big data has also come up in recent stories about the monitoring of certain journalists' calls and activities, and in the IRS scandal, which involved search capabilities used to target certain non-profit applications. Regardless of political affiliation, most people found this disturbing because targeting groups for political gain is wrong.

Monitoring these activities requires the government to leverage big data. But right or wrong, for good, for bad or for profit, the capability to capture and leverage big data exists.

Most companies leverage big data for target marketing and to manage their brand and company reputations. Either way, technology exists today that allows us to track, monitor and profile just about whatever and whomever we want.

The last cylinder represents multi-structured data or hybrid data. A lot of data sources can fall into this space.

For the purposes of a consumer goods manufacturer, I used common outside data sources in the cylinder to represent hybrid data. Let's use point of sale data as an example. Point of sale (POS) data comes in from multiple retailers with varying data elements at different times of the month. Even one retailer can have multiple ways of providing POS data.

Target is a good example of the ways in which POS data can arrive. If you are a vendor for Target, you might get POS data in an EDI 852 file. You might also get POS data from Info Retriever or IRI. In addition, you might purchase data from A.C. Nielsen or Symphony IRI. All these sources contain different data elements, but they all contain point of sale (POS) data.

Let's start with the POS data coming in from an EDI file. That EDI file is structured; however, although it's supposed to be standardized, it is not. Different retailers provide different data. Rules aren't followed. Files can be missing days or data elements. EDI from one retailer will be different from another's, and EDI from Target today might differ from the EDI Target sent last year. There can also be missing or duplicate data, and retailers often "recast" data. We classify this as "hybrid" data because of its inconsistent, loose structure and all the work required to make it play well with other data.
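One small example of that work around the data: before a retailer feed can be merged with internal data, you might first check it for missing reporting days. The dates below are invented for the sketch; a real validation step would also check for duplicates and missing data elements.

```python
from datetime import date, timedelta

def missing_days(days_received, start, end):
    """Return expected reporting dates absent from the feed."""
    expected = set()
    d = start
    while d <= end:
        expected.add(d)
        d += timedelta(days=1)
    return sorted(expected - days_received)

# Hypothetical feed: three days arrived, one is missing.
received = {date(2013, 9, 1), date(2013, 9, 2), date(2013, 9, 4)}
gaps = missing_days(received, date(2013, 9, 1), date(2013, 9, 4))
print(gaps)  # any gap found would need to be requested or estimated
```

Checks like this are exactly the "structure around the structure" that makes hybrid data more work than it first appears.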

In addition to missing, invalid or duplicated data, data arrives with different hierarchies, end dates, etc. Outside data needs to align with your internal hierarchies and calendars. It also needs to be aligned with other outside data sources like weather trends, currency conversion, A.C. Nielsen, Symphony IRI, NPD and others.

These are just a few examples of data issues that arise from outside data sources. In other words, there is some structure to it, but the structure needs to be altered to be managed, integrated into other sources and ultimately provide more value.

Watch for next week's blog, where I explain in more detail how big data is further defined and described by the industry.


Before Big Data


Before big data we had mainframes, ERP systems and data warehouses (and, by the way, we still do). You could make the claim that big data started in the 1950s with IBM's "Big Iron" and "Big Data Processing" for handling mixed workloads.

Or you could say big data started when Oracle coined the acronym VLDB back in the '90s to describe "very large databases." I can't decide which is bigger, "very large" or "big."

Or was it Teradata, which back in 1992 built the first system over 1 terabyte, for Walmart? It was the biggest implementation of its time. Teradata called their platform "massively parallel processing," or "MPP." I don't know about you, but I definitely think "massive" sounds a lot bigger than "big."

Big, large, massive... regardless of the adjective, database companies have been in the "big data" business for years. But "big data" today is NOT the same as "big data" from ten years ago. Today, there has been a full-blown explosion of data.

One of my customers recently said, "If you ask 50 people what 'Big Data' is, you'll get 50 answers." My goal in this series of blogs is to explain the evolution of big data and help clear up some of the confusion.

Big Data encompasses many areas, starting with internal data in both transactional and analytical systems.

In transactional systems, the data is constantly changing and being updated. In analytical systems, the data warehouse is typically updated once a day, sometimes more frequently. In analytical systems such as the enterprise data warehouse, companies analyze data to learn more about their customers, their buying patterns, their behaviors and how best to market to them.

Simply put, analytical systems are designed to help “manage” your business.
Transactional systems are designed to “run” your business.

The big data explosion started with data from applications designed to run your business. Mainframes were first on the scene. ERP (Enterprise Resource Planning) applications really took off in the '80s and '90s. Companies like SAP, Oracle, Microsoft, SAS, Infor, JDA and JDE all offer ERP solutions, with applications for manufacturing, logistics, invoicing, order placement, call centers, etc. These applications also have reports associated with each module.

In the early '90s, companies started getting serious about using data to improve knowledge, business processes and profits. The buzzwords back then evolved from Decision Support Systems (DSS) to Executive Information Systems (EIS) to data warehousing and business intelligence. Unfortunately, I'm old enough to remember all of them.

Over the past ten years, we started seeing cooperation and data sharing between partners. Back in the '90s, I felt like a missionary in the consumer goods space trying to explain the value of sharing point of sale data. Retailers thought I was crazy and manufacturers said it would never happen. But today, they finally get it. They understand the value, some more than others, but it's finally caught on.

As more and more outside partners and vendors began offering new data and insights, we were able to leverage that data through an architecture that allows new data sources to be integrated within a company's existing data warehouse. That's why Relational Solutions always stresses the importance of a solid infrastructure.

Some of these outside data sources include point of sale data, EDI files, syndicated data from companies like IRI, Nielsen and NPD, panel data, demographic data, currency conversions, weather trends and other sources.

New data sources are being shared every day, including loyalty data and emerging market data. Wholesalers, distributors, brokers and other selling partners are also starting to share data (not just reports).

In the past, most companies just bought reports. They didn't understand the full value of having an infrastructure in place to leverage all data, including future data, but that is starting to change. Business users with budget would simply go out and buy reports; infrastructure didn't matter to them and they didn't understand it. But as the market evolves, companies are maturing in their understanding of how important a big data infrastructure is, and more and more we see IT involved in those decisions.

Now, the latest evolution of "Big Data." Combined, it is all big data, but in the pure sense of how software companies refer to Big Data today, they mainly mean unstructured data on the web.

See my next blog, "Big Data Part 1" as a follow up to this blog, “Before Big Data.” 


Data Marts and Data Warehouses and Big Data


Data marts and data warehouses are often confused. Simply put, an enterprise data warehouse is the union of all marts, but whether that holds depends greatly on the underlying architecture.


A data mart can be a stand-alone reporting solution, or it can be soundly integrated into an enterprise data warehouse.


Relational Solutions has been building enterprise data warehouses since the mid 90’s and pioneered the concept of an incremental, iterative approach.


This approach gives companies a fast ROI (return on investment) that addresses the immediate needs of the business users. It also provides a foundation for incremental benefits as new data is integrated. The design withstands the test of time and lets the data warehouse grow with your business and with the evolution of new data sources, including #bigdata.


Unfortunately, most data marts were designed as one-off reporting solutions. When designed as stand-alone systems, they are often referred to as "stove pipes" or "silos" of information.


Today, I hear some so-called expert CPG industry analysts use these terms as if they were new concepts. They are not new concepts or new terms; they are just new to these "experts," who are finally understanding what we've been preaching for years about the importance of architecture.


Data warehousing consultants have used these terms since the '90s to describe stand-alone reporting solutions, typically developed by individual teams or departments.

These groups develop "silo" or "stove pipe" reporting databases to achieve a specific goal they were unable to get financial approval for. If you need something you can't get approved, you resort to building it on your own. It happens in every company and every department.


That said, not all data marts are created equal. Some are in Access, some in spreadsheets, some in SQL Server or Oracle. Some are silos and some are not. Data marts do not have to be silos. Designed correctly, a data mart can be integrated and should be fed from a single staging area where business rules are applied. Thus, a sound data warehouse is the union of all marts fed by a single source.


Having an infrastructure that stages the data, cross-references it, cleanses it, harmonizes it, and feeds it into a data model that in turn feeds subject-specific marts offers the best growth potential. Dimensions shared from one data mart to another provide consistency from department to department.
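The idea of a shared (conformed) dimension can be sketched in a few lines. The keys, brand and attributes below are invented for illustration; in practice this join happens inside the database, not in application code.

```python
# Toy conformed dimension: the sales mart and the promotions mart both
# key their facts to one product dimension, so "product" means the same
# thing in every department. All names and keys are invented.
product_dim = {
    101: {"sku": "A-100", "brand": "AcmeCola", "category": "Beverages"},
}

sales_fact = [{"product_key": 101, "units": 12}]
promo_fact = [{"product_key": 101, "discount": 0.10}]

def enrich(facts, dim):
    """Join fact rows to the shared dimension, as a mart query would."""
    return [{**row, **dim[row["product_key"]]} for row in facts]

sales_view = enrich(sales_fact, product_dim)
promo_view = enrich(promo_fact, product_dim)
print(sales_view[0]["brand"], promo_view[0]["brand"])
```

Because both marts resolve the same key to identical attributes, a sales report and a promotions report can never disagree about what product 101 is.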


Relational Solutions has deep expertise in data modeling and offers customized classes and consulting services in this area. Data modeling techniques vary depending on the target database. Data modeling is a big topic, too big to cover in this blog.


In short, designing the data model correctly allows business rules to be applied and data to be accessed easily by users. This design also maintains consistency from department to department and gives IT a manageable solution designed to evolve over time to accommodate new data sources and new user requirements.

Companies who have a properly designed data warehouse can integrate internal data, outside data and even Big Data.


My next blog will start to explain big data and what makes various data sources different.


Learn More about Relational Solutions Services.



Transactional versus Analytical Business Intelligence


The easiest way to understand the difference between a transactional and an analytical system is to think of transactional systems as the applications designed to run your business, and analytical systems as those designed to manage your business.

Applications like SAP, Oracle Financials, JD Edwards and JDA, for example, are transactional applications. They provide reports, but they tend to be reports from their own systems unless you separately acquire their data warehouse modules. In most cases, even those data warehouse modules handle their own data better than other data sources. In general, ERP (Enterprise Resource Planning) systems are modeled for data entry. They are updated constantly throughout the day.

Reports derived from these systems are reports designed to understand what is going on at this moment. For example, what time did my last truck leave? Is my manufacturing formula set correctly today? What did that last customer complain about? They answer the "What?" not the "What if?" questions. 

These are transactional reports, coming from transactional systems: necessary reports required to run your day-to-day operations. A report pulled from a transactional system at noon will produce different results than one pulled at 12:01, because the operational system is constantly changing. Even reports pulled simultaneously will likely produce different results, because in a transactional system the route of each query can take different paths, and you never know who might be updating the system at any one point in time.

We call this the “twinkling database effect.” That is because the data is constantly changing.

These “twinkling databases” are fine for pulling operational reports. But trying to produce an analytical report from a transactional system is not wise.

First, the data is formatted for data entry, not data retrieval, so a query can take days to run. In addition, an ad-hoc query against a transactional system will affect the performance of that operational system and negatively affect the end users on it. The last thing you want to do is make it difficult for people to enter orders; that could have a direct and negative impact on sales, not to mention the time-wasting effect on other job functions.

Analytical queries against a transactional system will put an undue burden on your network, and they will return inconsistent, and oftentimes inaccurate, results.

That is why data warehouse solutions became a necessity. Transaction systems are designed to RUN the business, data warehouses are designed to help MANAGE the business.

The data warehouse is modeled in a way that business users can easily find and retrieve the data they need. The underlying infrastructure of an enterprise data warehouse (EDW) offers an architecture that will align data, provide business rules and accommodate growth and change in an iterative manner.

Query tools allow for easy analysis and business intelligence. Users need fast access to reliable information with the flexibility to change the view. They need to be able to drag, drop, drill, sort, compare and ultimately learn and act on the information they are receiving.

More and more, we hear business analysts referred to as "Data Scientists." That is because today they should have the capability to think outside the box with the information available to them. Rather than spending their day gathering and cobbling together reporting information, they can be freed up to analyze it. Today, data integration can be automated and put into a usable format for data exploration.

By leveraging ALL your data, companies and their Data Scientists can understand not only WHAT is happening, but WHY!

The data warehouse is fed by the operational system and typically updated on a nightly basis, sometimes more often, but most often nightly. More and more, we see the data warehouse also being fed by other outside sources. Relational Solutions advocates leveraging all the information you have access to. Unfortunately, not all data warehouses are created equal, so it's not always as easy as it sounds.

I'm pointing out these differences because this is all background needed to understand how companies can use big data. Big data creates a potentially "fuzzy area" for reporting, depending on how it's defined.

In my next blog I'll explain the difference between data marts and data warehouses, and how evolving data sources such as "big data," should be leveraged in your enterprise data warehouse and offer more business intelligence.


What Is Business Intelligence and Why Do We Need It?


Business Intelligence, what is it and why do we need it?

Business intelligence is the ability to make "fact-based decisions" based on reliable, integrated data.

Business intelligence leverages data to provide you with reports and information, and allows users to move away from "hunch-based decisions" to "fact-based decisions."

Business intelligence can come from both transactional and analytical reports. But in the true sense of analytical business intelligence, the data is derived from an enterprise data warehouse. Designed correctly, we call it "The Truth Database."

Business intelligence can arguably also be derived from “stove pipe” solutions. These are point solutions typically developed within a department to answer specific questions. It can also be argued that ERP reports provide business intelligence. Business intelligence can also be derived from reports that end users had to manually integrate in order to develop reports for management, buyers or others.

Various types of reports are delivered through different means. As purists, we believe
analytical business intelligence should be derived from the data warehouse. But again,
that doesn’t mean all reports are the same. We also recognize that even business intelligence reports, designed for managing the business are derived in a multitude of ways.

Do we really need business intelligence? Back in the '90s, business intelligence (BI) was considered a "luxury," because data warehouses were very costly to build and BI tools were very expensive to license. Today, business intelligence is NOT a luxury; it is a necessity. Your competitors are learning more about their business, so you must as well in order to maintain your competitive advantage.

Companies in the 90’s built data warehouses to GAIN a competitive advantage. Today it’s needed just to MAINTAIN your competitive advantage. Those companies who were visionaries in the 90’s and recognized business intelligence as a way to achieve competitive advantage are the same companies today who are leveraging big data from other sources to GAIN a competitive edge.

At Relational Solutions we believe it's important to note that operational reports are not the same as analytical reports. That difference will be described in my next blog, while this series of blogs discusses the evolution of big data.

A true analytical business intelligence application is designed to support management decisions. It includes reports derived from a single, queryable source of reliable, integrated data, fed by a single staging area, with applicable business rules established for your business users. In an ideal world, business intelligence is derived from a data warehouse designed as the union of all marts. The presentation layer should offer fast access to information that's easily understood by end users.

Look for my next blog that describes the differences between transactional and analytical reports and delves deeper into the evolution of big data.

