Which database is most suitable for storing Big Data?

The facts (and therefore your data) that are collected and registered by the processes within your organization must be stored somewhere. This way they can be edited or deleted. Whether your data is on-premise or in the cloud, you need a database in which this data is stored.

There are many different databases to choose from. But here I mention the most common at the moment.

  1. Relational databases
  2. Document Store
  3. Hadoop (Eco Systems)
  4. Graph Databases

This blog explains each of these databases. I will not explain how these databases work under the hood, because that is not relevant for this blog. The goal is to get an idea of what is out there and how you can determine which database is most suitable.

1. Relational Database

A relational database is software made to register, change and delete facts and descriptions in tables as we know it from statistics. The databases are relational because the tables are related to each other (3 V’s). You can see an example of this in Figure 1.

Tabellen en relaties in een relationele database

Figure 1: Tables and relationships in a relational database

Facts can be captured by screens (IoT) or smart devices (Smart Phones, Tablets, Weather Stations, Sensors). To start from a simple example, figures 2 and 3 show an example of employee data that is stored in a database.

Medewerker invoeren

Figure 2: Registering an employee

 

Medewerkers opgeslagen in een database

Figure 3: Employees stored in a database

1.1 SQL

For relational databases, a uniform language has been developed, namely (ANSI) SQL (Structured Query Language). This provides the context for the data. This allows you to ask a question to the database and get an answer in the form of a table.

Basically you first indicate (horizontal) which columns you need (SELECT), then you indicate from which table (FROM). Here is an example: When I want to have all first and last names from the Employees table (see image), I execute the following command in SQL: ”SELECT LNAME, FNAME FROM Employees”

Employees

Table 1: EmployeesThis way I get all the first and last names of all employees. An important characteristic of a Relational Database Management System (RDBMS) is that you search for data via the tables and columns.

The Relational Database is the oldest and least flexible form to store your data. To store the data of a screen in a Relational Database, an underlying table is required. Even when an extra field is added in the screen, you must add an extra column in the database. So with a relational database you determine the data structure in advance.

2. Document Store

In contrast to Relational Databases, the relationship between data structures in a Document Store is stored in the document itself. So the schedule does not have to be known in advance. This means you are flexible in entering your data. And that is a big advantage when you are dealing with unknown data from unknown suppliers.

Relationeel Database & Document Stores

Figure 4: Relational Database & Document Stores

There are 2 types of documents we can store in a Document Store:

  1. XML (Tag based format )
  2. JSON (Curly brackets format)

Not every database can handle both file formats. In general, it does not matter which format you choose. Link blog 3. But it is important to determine what value you want to get from this and how you want this data visualized.

The disadvantage of a Document Store compared to a Relational Database is that requesting data takes more time and is less standardised. (Links more about this in the week 22 blog “SQL vs NoSQL).

3. Hadoop (Ecosystems)

Hadoop is a collection of all possible file formats. It is similar to a folder on your computer where you can put any file type you want. Hadoop works with the Hadoop Distributed File System (HDFS). This allows you to scale up linearly in terms of capacity and performance (scale out). Hadoop is also Open Source. You therefore do not need any licenses to use Hadoop, and that saves a lot of money.

Hadoop Ecosysteem

Figure 5: Hadoop Ecosystems

In order to unlock the data stored in Hadoop, you need other tools. Commonly used tools for this are Hive, Pig, Sprak and MapReduce (link more about this in the blog of week 22 “SQL vs NoSQL).

4. Graph Database

With a graphical database you can easily query and store complex hierarchical and complex structures. The data is stored with a graphical structure, see figure 6.

Graven structuur

Figure 6: Graven structure

The main question

Now that you know which database types there are and how they differ from each other, you can make a choice. The main question is what you want to do with your data and what context you want to give your data. The answer to this question will help you decide which database is best suited for your organization.

Any Questions? Please feel free to contact us.

This form is protected with Google reCAPTCHA (Privacy Policy - Terms of Service)