There is a wide range of options for where and how to store your data, and you can spend hours trying to find out which supplier is best for your organization. Is it MongoDB, MarkLogic, Hadoop, or perhaps Azure or Google Cloud? In a previous blog post, ‘Which database is most suitable for storing data’, I developed a matrix for assessing the various suppliers. In this post I share the outcome with you.
MONGODB AND KEY-VALUE STORES
MongoDB is a document store based on a JSON structure, with its own query language. The big advantage of this tool is its large community and the fact that it connects to most commonly used programming languages. But if you want to use it more broadly, you have to perform a number of translations to extract value from your data. MongoDB also offers a free community edition, and the installation itself is quite simple.
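To make the idea of a JSON document store concrete, here is a minimal sketch in plain Python (standard library only). It is not MongoDB's API: the `collection` list and the `find` helper are hypothetical stand-ins that only mimic how a filter document such as `{"city": "Utrecht"}` selects matching documents. Note that the documents do not all share the same fields; that flexibility is what a document store offers.

```python
import json

# Hypothetical in-memory "collection": each document is a self-describing
# JSON object, so documents may have different fields.
raw_documents = [
    '{"_id": 1, "name": "Alice", "city": "Utrecht", "tags": ["admin"]}',
    '{"_id": 2, "name": "Bob", "city": "Amsterdam"}',
    '{"_id": 3, "name": "Carol", "city": "Utrecht", "email": "carol@example.com"}',
]
collection = [json.loads(doc) for doc in raw_documents]


def find(collection, query):
    """Return the documents whose fields equal every key/value pair in `query`.

    This only mimics the idea of a filter document; it is not MongoDB's
    query engine and supports plain equality matches only.
    """
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in query.items())]


matches = find(collection, {"city": "Utrecht"})
print([doc["name"] for doc in matches])  # ['Alice', 'Carol']
```

With the official Python driver, PyMongo, the comparable call against a running MongoDB server would be `collection.find({"city": "Utrecht"})`.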
A disadvantage is that the installation gives you only the database, without a development environment. Furthermore, in my experience it is difficult to integrate with other applications without the help of additional tools.
MongoDB costs 10,000 euros per year for 1 TB. That puts it on the expensive side for a fairly simple database setup.
PRODUCTS SIMILAR TO MONGODB
Comparable systems include key-value stores, which, unlike document (object) stores, store data as pairs of a key and a value. The advantage of this structure is that no storage is wasted and all data is stored in one large table. A disadvantage is that, just like with a relational database, you cannot store photos, audio or video files. To do this, you must use Hadoop or another BLOB (Binary Large Object) store.
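A minimal sketch of the key-value idea, using a Python dictionary as the "one large table". The class, its methods and the key names are hypothetical and do not represent the API of any particular product. It also shows the pattern described above for media files: the binary file lives in a separate BLOB store (the `hdfs://` path is purely illustrative) and the key-value store keeps only a reference to it.

```python
class KeyValueStore:
    """Hypothetical minimal key-value store: one table mapping keys to values.

    The store does not interpret values; every lookup goes through the key.
    """

    def __init__(self):
        self._table = {}

    def put(self, key: str, value: str) -> None:
        self._table[key] = value

    def get(self, key: str, default=None):
        return self._table.get(key, default)

    def delete(self, key: str) -> None:
        # Removing a missing key is a no-op.
        self._table.pop(key, None)


store = KeyValueStore()
store.put("customer:1:name", "Alice")
store.put("customer:1:city", "Utrecht")
# The photo itself is kept in a separate BLOB store; only a reference is
# stored here. The path is a made-up example.
store.put("customer:1:photo", "hdfs:///blobs/customer-1.jpg")

print(store.get("customer:1:city"))  # Utrecht
```

Because the key encodes what the value means (`customer:1:city`), finding "all customers in Utrecht" would require scanning every entry or maintaining a separate index, which is where document stores and the translations mentioned earlier come in.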
COMPLETENESS AND INTEGRATION
After installing MarkLogic, you get a complete web-based development environment. The advantage of this environment is that you do not have to install anything on the client machines, which makes it very suitable for thin clients (Citrix environments). In addition, MarkLogic has gone to great lengths to enable integration with relational systems (ODBC drivers), reporting tools (Power BI, Tableau and Cognos) and Hadoop.
MarkLogic costs 18,000 euros per node (machine) per year. A free developer version is also available.
MongoDB and MarkLogic have a standalone architecture, which means that no other applications are required to get started. Hadoop, by contrast, needs additional applications before you can get started. Personally, I find this a disadvantage, because you need knowledge of each individual application. You also need to configure those applications so that they can communicate with Hadoop and display meaningful information.
A Hadoop ecosystem relieves developers of this work: they do not have to worry about installing and configuring these systems. However, such ecosystems are only available in the cloud, and you pay for each individual application.
The following (Cloud) products use the Hadoop architecture:
- SAP Vora;
- Azure Data Lake;
- Oracle Big Data.
COMPLETENESS AND INTEGRATION
Getting the Hadoop Ecosystem up and running is fairly simple if you have experience with cloud services. In addition, the applications can also talk to other apps and services that you purchased from the same cloud provider.
To connect data streams (sources from the internet) to these cloud providers, you must purchase additional apps. Certain applications (Elasticsearch, geo search) also cost extra. As a result, unforeseen costs accumulate over time. In this way, cloud suppliers create a dependency that pushes organizations to purchase other services and products from them (cross-selling and up-selling).
GOOGLE CLOUD VERSUS AZURE CLOUD
In my opinion, Google is the most affordable and flexible cloud provider. You can purchase almost any application you can think of from Google, and it supports hybrid architectures (partly in the cloud and partly on-premises). To use these services, you need some experience with developing and consuming (REST) APIs.
Azure, on the other hand, is more user-friendly but offers fewer applications than Google. Azure is also more expensive than Google Cloud.
COMPLETENESS AND INTEGRATION
In terms of completeness, Azure beats Google: Google requires much more configuration than Azure. In addition, knowledge of Microsoft technology is more widespread than knowledge of Google's APIs.
To connect data streams (sources from the internet) to these cloud providers, you have to purchase additional apps, and certain applications (Elasticsearch, geo search) cost extra. This leads to additional costs. In this way, Microsoft creates a dependency that pushes organizations to purchase other services and products from them (cross-selling and up-selling).
DATA LAKE AND VERTICALS
My advice, from a software architect's perspective, depends on the purpose for which you want to use the database. I see two tracks:
- Data Lake;
- Verticals (Specific applications).
The purpose of a Data Lake is to store as much data as possible without giving it context. It is therefore important to be able to store and index all forms of data. Two systems are eligible for this: Hadoop ecosystems and MarkLogic.
In terms of completeness, MarkLogic gives me more out of the box after installation than Hadoop does. My advice, therefore, is MarkLogic.
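To make "store and index all forms of data without giving it context" concrete, here is a toy sketch in Python. It is not how Hadoop or MarkLogic work internally; the `DataLake` class, its method names and the choice to index only text-like content are assumptions for illustration. Data is kept as raw bytes on ingest (no schema is imposed), and a simple inverted index makes text content searchable.

```python
import re
from collections import defaultdict

# Content types whose payload is treated as text and indexed (an assumption).
TEXT_TYPES = ("application/json", "application/xml")


class DataLake:
    """Hypothetical toy data lake: raw payloads plus a word index."""

    def __init__(self):
        self._objects = {}              # object id -> (metadata, raw bytes)
        self._index = defaultdict(set)  # lower-case word -> object ids

    def ingest(self, object_id: str, payload: bytes, content_type: str) -> None:
        # Store exactly what was received; no schema or context is added.
        self._objects[object_id] = ({"content_type": content_type}, payload)
        # Index text-like content; binary data (images, audio, video) is
        # stored but only retrievable by its id.
        if content_type.startswith("text/") or content_type in TEXT_TYPES:
            text = payload.decode("utf-8", errors="ignore").lower()
            for word in re.findall(r"\w+", text):
                self._index[word].add(object_id)

    def search(self, word: str) -> set:
        # .get() does not create entries in the defaultdict.
        return set(self._index.get(word.lower(), set()))

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id][1]


lake = DataLake()
lake.ingest("order-1.json", b'{"customer": "Alice", "city": "Utrecht"}', "application/json")
lake.ingest("note-7.txt", b"Call Alice about the Utrecht order", "text/plain")
lake.ingest("photo-3.jpg", b"\xff\xd8\xff\xe0", "image/jpeg")

print(sorted(lake.search("utrecht")))  # ['note-7.txt', 'order-1.json']
```

A real data lake would add durable storage, richer metadata and indexing for many more formats, which is exactly the completeness the comparison below is about.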
VERTICALS (SPECIFIC APPLICATIONS)
With regard to verticals, it is important to stay as close as possible to the language in which you will be programming. Every intermediate translation step (for example, from XML to JSON) slows down the process. So look for a database that supports your language as directly as possible. Based on the different development languages, I recommend the following database suppliers:
- Data Science and Machine Learning : MarkLogic
- Java : MongoDB
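To illustrate the kind of translation step meant here (XML to JSON), below is a small sketch using only Python's standard library. The record and the `xml_to_json` function are hypothetical and handle only a flat element with attributes; real-world conversions need rules for nesting, repeated elements and data types, which is exactly the extra work that slows a project down.

```python
import json
import xml.etree.ElementTree as ET

# A record as it might arrive from an XML-based source (illustrative data).
xml_record = """
<customer id="1">
  <name>Alice</name>
  <city>Utrecht</city>
</customer>
"""


def xml_to_json(xml_text: str) -> str:
    """Translate a flat XML element into a JSON string.

    Attributes become fields, and each direct child element becomes a field
    holding its text. Nested structures are not handled.
    """
    root = ET.fromstring(xml_text.strip())
    record = dict(root.attrib)  # e.g. {"id": "1"}
    for child in root:
        record[child.tag] = (child.text or "").strip()
    return json.dumps(record)


print(xml_to_json(xml_record))  # {"id": "1", "name": "Alice", "city": "Utrecht"}
```

If the database already stores the data in the format your code works with, this function (and its tests and maintenance) is simply not needed.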
Hopefully this helps you choose the supplier that suits you best.