5 Things to Consider When Choosing a Database

A database is the fundamental building block for any data-based initiative, as it is involved in all stages. Databases are used when collecting, storing, processing and analyzing data. A database is the silent, almost invisible component that drives business decisions, operational improvements or simply keeps track of your inventory. 

As much as the database should be the invisible part of this, it is crucial to make the right choice. While it might look easy to select a suitable database, there are a few things to consider when making a choice. 

So, let’s look at five things you need to consider when selecting a database for your next data project.

Flexibility

You might think of a database as something static and, for a long time, you would have been correct in making this assumption. Databases used to be static storage that offered a fixed way to select and retrieve data. But as databases are being used in more places and the amount of data that needs to be handled by them is growing every second, they need to be flexible as well. 

But what does flexibility mean in a database context?

Any database can store numbers and text, and most things you want to store, if not everything, can be represented as one or the other. While this is true in theory, it covers only raw storage, not how the data is handled. You need to consider what types of data your database can handle and how. Object or document data is increasingly common. While this type of data could be stored as plain text or deconstructed into numbers and text, doing so loses information and creates additional overhead. It also means you cannot easily handle complete objects in queries.
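To make this trade-off concrete, here is a minimal Python sketch using the standard library's sqlite3 and json modules. The order document, table names and field names are hypothetical, and SQLite merely stands in for whatever database you are evaluating. The sketch stores the same object once as opaque text and once deconstructed into rows.

```python
import json
import sqlite3

# Hypothetical order document; all field names are assumptions for illustration.
order = {
    "id": 1,
    "customer": {"name": "Ada", "city": "Vienna"},
    "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
}

conn = sqlite3.connect(":memory:")

# Option 1: the whole object as text. The database sees only a string.
conn.execute("CREATE TABLE orders_text (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute(
    "INSERT INTO orders_text VALUES (?, ?)", (order["id"], json.dumps(order))
)

# Option 2: the object deconstructed into numbers and text across two tables.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_name TEXT, customer_city TEXT)"
)
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
conn.execute(
    "INSERT INTO orders VALUES (?, ?, ?)",
    (order["id"], order["customer"]["name"], order["customer"]["city"]),
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["id"], item["sku"], item["qty"]) for item in order["items"]],
)

# Text storage: every row is fetched and parsed in the application
# before it can be filtered on a nested field.
vienna_from_text = []
for (body,) in conn.execute("SELECT body FROM orders_text"):
    doc = json.loads(body)
    if doc["customer"]["city"] == "Vienna":
        vienna_from_text.append(doc["id"])

# Deconstructed storage: filtering works in SQL, but rebuilding the complete
# object needs a second query plus reassembly code.
vienna_from_rows = [
    row[0]
    for row in conn.execute("SELECT id FROM orders WHERE customer_city = 'Vienna'")
]
items = conn.execute(
    "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY rowid", (1,)
).fetchall()
```

Both approaches find the order, but each pushes work into your application: parsing in the first case, reassembly in the second. When evaluating a database, it is worth checking whether it can filter on nested fields and return the complete object in a single query.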

So, take a good look at what type of data you can store and use with your potential database. And of course, also consider what you might need in the future.

Data types are not the only area where your new database needs to be flexible, though. Another critical element is scalability. While a database might fit your needs today, a growing amount of data and a growing demand for data-driven decisions require your new database to grow with your needs. While most database solutions out there allow you to add capacity as you grow, look at how this happens. Do you need to change your entire database architecture just because you need more storage capacity? What if you need more query performance? Does the database scale linearly, or is there some overhead? Your next database solution should be flexible and elastic. 
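One way to put a number on the question of linear scaling is to compare measured cluster throughput with the ideal of single-node throughput multiplied by the node count. The following Python sketch shows that arithmetic. The function name and the benchmark figures are hypothetical, and you would substitute results from your own tests.

```python
def scaling_efficiency(
    single_node_throughput: float, cluster_throughput: float, nodes: int
) -> float:
    """Return cluster throughput as a fraction of ideal linear scaling.

    1.0 means perfectly linear scaling; lower values indicate overhead.
    """
    if nodes < 1 or single_node_throughput <= 0:
        raise ValueError("need at least one node and a positive baseline")
    ideal = single_node_throughput * nodes
    return cluster_throughput / ideal


# Hypothetical benchmark: one node handles 10,000 queries per second,
# a four-node cluster handles 34,000.
efficiency = scaling_efficiency(10_000, 34_000, 4)
print(f"Scaling efficiency: {efficiency:.0%}")
```

Running the same comparison at several cluster sizes shows whether the overhead stays constant or grows as you add nodes.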

We now have covered data types and scalability, but we are not done with flexibility just yet. Your database should also be flexible with regard to where you can run it. Can you run it at the edge, in a public cloud, a private cloud? While the answer to this in most cases would also be yes, there is more to consider than just running the database. What do you get in either of these deployment models? Are there differences in functionality that you might need to consider? Once again, do not just look at what you want to do today; think of what might happen in the future. 

We are almost done with flexibility, but there is one more thing to consider. We all fear it, yet we have all fallen for it: the infamous vendor lock-in. When choosing your next database solution, look at the types of interfaces available and how you access the database. Does it use a proprietary or special-purpose language? Will you need to change everything around the database if you switch? Will all your tools be interoperable, both today and tomorrow, or might a nonstandard query language make this a challenge? SQL is your best choice in this case, as it is a widely known and widely supported standard. And no, NoSQL does not necessarily mean no SQL.

Functionality

What kind of functionality the database system offers is another question to consider. While most databases provide a similar set of functionalities, specific areas might require close attention. 

Depending on your use case, consider which built-in aggregation and scalar functions the database offers. Using built-in database functions can significantly improve the performance of the external systems that rely on the database, because less data has to be moved and processed outside it, and can help optimize your data storage requirements. 
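As a small illustration of why built-in functions matter, the following Python sketch computes the same per-sensor average twice: once in application code and once with the database's own AVG aggregate. It uses the standard library's sqlite3 module as a stand-in database, and the table and sensor names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],
)

# Client-side aggregation: every row is transferred to the application,
# which then does the counting and summing itself.
totals = {}
for sensor, value in conn.execute("SELECT sensor, value FROM readings"):
    count, total = totals.get(sensor, (0, 0.0))
    totals[sensor] = (count + 1, total + value)
client_avg = {sensor: total / count for sensor, (count, total) in totals.items()}

# Built-in aggregation: the database does the work and returns
# one row per sensor instead of one row per reading.
db_avg = dict(
    conn.execute("SELECT sensor, AVG(value) FROM readings GROUP BY sensor")
)

print(client_avg)
print(db_avg)
```

With three rows the difference is invisible, but with millions of readings the second query transfers only one row per sensor. Which aggregate and scalar functions are available varies between database systems, so check them against your use case.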

When it comes to the functionality of a database, you must always consider access to data or, more precisely, how you can access it. Is a specific query language or a nonstandard API used? It is important to consider what query interface you have available and what it offers, to determine if it fits your needs.

Finally, there is functionality in terms of availability. When you are looking at a cloud offering, you need to consider SLAs, but also determine whether there is a single point of failure in the architecture, which also applies to other deployment methods. What high-availability options are available? How does the database handle the failure of a node? Will it automatically rebalance, or is data lost when a node fails? Will high availability only be achieved with a copy, or will additional nodes add further performance gains? Finally, backups and recovery from errors are availability concerns that you need to evaluate to determine whether everything fits your needs. 
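To get a rough feel for what additional copies buy you, you can estimate combined availability with a simple formula: if each copy is available with probability a and failures are independent, at least one of n copies is available with probability 1 - (1 - a)^n. The Python sketch below applies that formula. The independence assumption is a strong one and rarely holds fully in practice, since copies often share a network, a data center or a software bug, and the 99% figure is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(node_availability: float, copies: int) -> float:
    """Probability that at least one of `copies` independent copies is up."""
    if not 0.0 <= node_availability <= 1.0 or copies < 1:
        raise ValueError("availability must be in [0, 1] and copies >= 1")
    return 1.0 - (1.0 - node_availability) ** copies


# Hypothetical per-node availability of 99%.
for copies in (1, 2, 3):
    availability = combined_availability(0.99, copies)
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    print(f"{copies} copies: {availability:.6f} available, "
          f"about {downtime_hours:.2f} hours of downtime per year")
```

Treat the result as an optimistic upper bound, and compare it with the SLA the vendor actually commits to.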

Usability and Performance

Usability and performance are additional considerations. We have already covered usability in terms of the access method and query language, but is the API or query language that the database system offers appropriate for your desired use case? Are the tools and applications you plan to use compatible with the chosen database system? Will you need to develop a custom solution, or is everything integrated in an effective way? 

Of course, you also need to take operational usability into account. Are the logging and auditing functions offered sufficient for your use case? Can the database system be integrated into your current monitoring and operations tools? 

Performance also plays a role in usability: if you do not get the right performance out of your new database system, it will not be usable. However, there are multiple performance angles to consider that might have an impact on your choice of database. You must consider where you will need the best performance and what the database system offers in this area. Will ingesting data be the most performance-sensitive area, or is retrieving data key for your use case? Of course, if both are important, you must pay attention to both. So, consider whether you have large or very frequent queries, many records that need to be stored, or both. Database systems can differ significantly in either area. 

The other performance question is around the timeframe between ingesting data and querying this data. Does your new database system deliver what you need in this case? Is it real-time enough for you? 
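If this delay matters for your use case, it can be measured rather than guessed. The following Python sketch writes a record and polls until a query can see it. The function measure_visibility_lag and its parameters are hypothetical names, and the in-memory SQLite database used for the demonstration makes writes visible immediately, so you would plug in calls to the database you are evaluating.

```python
import sqlite3
import time
from typing import Callable


def measure_visibility_lag(
    write: Callable[[], None],
    is_visible: Callable[[], bool],
    timeout_s: float = 5.0,
    poll_interval_s: float = 0.01,
) -> float:
    """Return the seconds between a completed write and its first appearance
    in a query. The result is only as precise as the polling interval."""
    write()
    start = time.monotonic()
    while True:
        if is_visible():
            return time.monotonic() - start
        if time.monotonic() - start > timeout_s:
            raise TimeoutError("record did not become visible in time")
        time.sleep(poll_interval_s)


# Demonstration against an in-memory SQLite database (a stand-in only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")

lag = measure_visibility_lag(
    write=lambda: conn.execute("INSERT INTO events VALUES (42)"),
    is_visible=lambda: conn.execute(
        "SELECT 1 FROM events WHERE id = 42"
    ).fetchone() is not None,
)
print(f"Record visible after {lag:.4f} seconds")
```

Repeating the measurement under realistic ingest load gives a better picture than a single run on an idle system.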

Finally, when it comes to performance, also consider your future requirements in these areas and what your new database system can offer you in terms of scalability in the future. How easily can new nodes be added? Will additional nodes impact other nodes? How will performance increase?

Security

Security must be a major consideration when choosing any new IT system, and a database is no exception. Data breaches are very costly, and being able to assess the impact of a breach and determine what was exposed is key to any mitigation. 

The most prominent security consideration is access control. What access control measures are available, how can you make sure you can restrict access to data to only what is needed, and how can you audit who accessed what data when? The same should also be possible in terms of identifying where data is coming from, as false data can also cause a problem and you want to locate the source. 
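The questions of who may read what, and who actually did, can be illustrated with a small Python sketch. It is an application-level stand-in only: the role names, table names and the read_table helper are hypothetical, and SQLite has no user management of its own. Many database systems provide roles, privileges and audit logs natively, and that native functionality is what you would evaluate.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical mapping of roles to the tables they may read.
PERMISSIONS = {"analyst": {"sales"}, "admin": {"sales", "salaries"}}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.execute("CREATE TABLE salaries (amount REAL)")
conn.execute(
    "CREATE TABLE audit_log "
    "(accessed_at TEXT, user_name TEXT, role TEXT, table_name TEXT, allowed INTEGER)"
)


def read_table(user_name: str, role: str, table: str) -> list:
    """Check the role's permissions, record the attempt, then read the table."""
    allowed = table in PERMISSIONS.get(role, set())
    # Log every attempt, including denied ones, so access can be audited later.
    conn.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), user_name, role, table, int(allowed)),
    )
    if not allowed:
        raise PermissionError(f"role {role!r} may not read {table!r}")
    # The table name was checked against the allow-list above,
    # so interpolating it into the statement is safe here.
    return conn.execute(f"SELECT * FROM {table}").fetchall()


read_table("bob", "analyst", "sales")
try:
    read_table("bob", "analyst", "salaries")
except PermissionError as error:
    print(error)

for entry in conn.execute("SELECT user_name, table_name, allowed FROM audit_log"):
    print(entry)
```

The audit table answers "who accessed what data when", including the denied attempt. A similar record of which source wrote each row helps locate the origin of false data.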

Besides these specific points, there are, of course, further security questions to consider. How are connections to the database secured? Is data encryption possible? How can the database system be integrated into the overall IT security infrastructure?

Cost

Cost is always a huge factor and something to consider for your database system. But the cost goes beyond the obvious license costs, which, of course, need to be considered and evaluated based on the value that is added. 

For instance, you need to consider how much infrastructure you need to buy to run the database system or, if you are considering a cloud service, whether it adds enough value compared to an on-premises installation. These fixed infrastructure costs, however, are not everything. You must also consider staffing. How much operational overhead does the new database system add? Do you have skilled staff to handle these systems, or do you need to hire additional staff or invest in further training? Cloud services can help mitigate some of these challenges, but still need to be evaluated closely on all of these points. 
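A simple way to keep all of these cost components in view is to add them up side by side for each option. The Python sketch below does this for two deployment options. The YearlyCost class, its fields and every figure are hypothetical placeholders and not real prices, so replace them with your own quotes and estimates.

```python
from dataclasses import dataclass


@dataclass
class YearlyCost:
    """Yearly cost components for one deployment option (hypothetical model)."""

    license: float
    infrastructure: float  # hardware and hosting, or the cloud service fee
    staffing: float        # share of operations staff time
    training: float = 0.0

    def total(self) -> float:
        return self.license + self.infrastructure + self.staffing + self.training


# All figures below are made-up placeholders for illustration.
on_premises = YearlyCost(
    license=20_000, infrastructure=15_000, staffing=60_000, training=5_000
)
cloud_service = YearlyCost(license=0, infrastructure=70_000, staffing=20_000)

for name, cost in (("on-premises", on_premises), ("cloud service", cloud_service)):
    print(f"{name}: {cost.total():,.0f} per year")
```

In this made-up example, the option with the higher service fee is cheaper overall once staffing is included, which is exactly why license and infrastructure costs should not be compared in isolation.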

We have already touched on compatibility with your planned or existing tools. It is worth mentioning again here, as changing these tools or switching to a different set of tools could incur additional costs.

Conclusion

Choosing a database system is not an easy task, and there are many things to consider: flexibility for current and future uses, scalability, performance, security and costs. There are many database options available, and it might be challenging to find the right solution for your current needs. Cloud services offer flexibility and reduce operations and staffing costs, but might not be suitable in all cases. Predicting the future is always tough, so always opt for the most flexible and scalable option.


Jan Weber

Jan Weber is a passionate Product Manager with over a decade of experience in managing software products for customers ranging from SMBs to enterprises. As Product Manager at Crate.io, he is responsible for the CrateDB product portfolio from strategy to execution.
