This is why big data needs context

In this day and age we are constantly recording facts (events). We do this through mobile phones, laptops, smartwatches and sensors (GPS, weather stations, etc.). But this unstructured mass of digital information is worthless without context. Context comes from analyzing the data: standardizing it and transforming it.

That way we ensure that we can interpret these facts and therefore make substantiated decisions.

Big Data

To standardize these facts we need to store them, make them queryable and visualize them. Compared to ten years ago, we are recording many more facts, and the demand for storage is increasing. Moreover, we register the facts in different ways (selfies, posts/tweets and sensors) and there is a chance that facts are temporary. We call this Big Data. It’s big data if it meets the three V’s: Volume, Variety and Velocity.

Volume

The books state that we express the ‘volume’ of data in Terabytes and Petabytes. We can collect, validate and structure all that data in a Data Warehouse, but in my experience a Warehouse is not always suitable to answer certain issues. You will notice this when answering an information question takes longer than desired. More about this in relational vs. non-relational.

Variety

The term Variety indicates the variety of forms in which we record the facts. Traditionally, facts came from source systems (CRM, ERP, etc.), but nowadays they can come from anywhere and in any format. Think of photos, videos, flat information and hierarchical information.

Velocity

Velocity refers to fleeting facts: mainly messages that are queued before they are deleted. You can think of this a bit like the queue at Efteling for the roller coaster. The queue has a certain length, and the roller coaster carts are the transport medium responsible for delivering the people safely.
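The roller coaster analogy can be sketched in a few lines of Python. This is a minimal illustration, not a real streaming system: a bounded queue plays the role of the waiting line, and messages that arrive when the line is full are lost, just like fleeting facts that are not consumed in time. All names here (`produce`, the `m1`…`m5` messages) are made up for the example.

```python
import queue

def produce(q, messages):
    """Try to enqueue each message; collect the ones that don't fit."""
    dropped = []
    for msg in messages:
        try:
            q.put_nowait(msg)    # enqueue if the "waiting line" has room
        except queue.Full:
            dropped.append(msg)  # a fleeting fact: lost if not consumed in time
    return dropped

# The queue has a certain (fixed) length, like the line for the ride.
q = queue.Queue(maxsize=3)
dropped = produce(q, ["m1", "m2", "m3", "m4", "m5"])

print(dropped)    # ['m4', 'm5'] — these arrived while the line was full
print(q.qsize())  # 3 — messages still waiting for their "transport"
```

A real platform would have consumers draining the queue concurrently; the point here is only that with Velocity, data you do not pick up quickly enough is gone.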

We have Big Data and what now?

As I described before, facts without context are of no value. Within Big Data, too, context must be created to extract value from this maze of data and enable better decision-making. Traditionally, we store these facts in a relational database (more on this later) and create context by transforming the facts into business context (ETL: Extract, Transform & Load).

But within Big Data we lose a lot of time, and also facts (Velocity), when we store this data in tables as in a relational database. This is not only a waste of time, but also a waste of effort, because the facts stored in the messages often already have a standard structure.

That is why within Big Data we store the facts as they are (“As Is”). Once the data is stored on the Big Data platform, we can start creating context (Transform). This process is called ELT (Extract Load Transform) and is therefore different from the ETL just mentioned.
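The difference between the two processes can be shown with a toy example. This is a sketch under simple assumptions, not a real pipeline: the facts are plain dicts, and the names `warehouse`, `raw_store` and `curated` are illustrative. The point is only the order of the steps: ETL transforms before storing, ELT lands the facts “As Is” and transforms afterwards.

```python
# Incoming facts, e.g. sensor messages with text values.
facts = [{"user": "a", "temp_c": "21.5"},
         {"user": "b", "temp_c": "19.0"}]

def transform(fact):
    """Create context: parse the measurement into a typed value."""
    return {"user": fact["user"], "temp_c": float(fact["temp_c"])}

# ETL: Extract -> Transform -> Load.
# Only already-structured rows ever reach the warehouse.
warehouse = [transform(f) for f in facts]

# ELT: Extract -> Load -> Transform.
# First land the facts unchanged ("As Is") on the platform...
raw_store = list(facts)
# ...then create context later, on the platform itself.
curated = [transform(f) for f in raw_store]
```

With ELT the raw facts remain available, so you can re-run or change the transformation later without having to extract the data again.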

Make choices between the different databases

So there are different ways to give context to data. How you store that data and create context depends on which system you use. For this you can choose from various databases, document stores or Hadoop (ecosystems). Unfortunately, the choice is not that simple. That is why I will go into more detail in another blog: ‘How to choose the right data supplier‘. There I explain the differences between these systems and how you can determine which system best suits your needs and situation.

Any Questions? Please feel free to contact us.
