Scalability, Allocation, and Processing of Data for Multitenancy
When developing a SaaS application and expanding business, companies are certain to eventually encounter large amounts of raw data which can lead to various technical difficulties.
Data allocation is especially important in applications based on multi-tenant solutions.
As we’ve mentioned in the previous post Introduction to Multi-Tenant Architecture, they, with only one instance of the software, serve multiple customers.
There are numerous aspects of data processing in multi-tenant applications, such as scalability, optimization, and performance.
Today we are going to go a little more in-depth into this topic.
In this article, we are going to answer the following questions:
- How do types of data differ from one another?
- Why characteristics of analyzed data are so important in multitenancy?
- How do different types of multi-tenant databases influence system performance?
How to allocate, scale and process data in a multi-tenant architecture
In our previous article from the Multi-Tenant Architecture series, we explained what multitenancy is using the shepherd metaphor.
We followed the adventures of an ambitious shepherd who’s decided to set up their own sheep-keeping business.
Now the shepherd is facing a tough decision. What is the best approach to labeling and storing all the sheep from various owners on the pasture?
This, of course, is a broad analogy to different types of databases in a multi-tenant architecture in a SaaS application.
It shows that there are multiple approaches to data processing where multiple tenants are connected to a shared environment.
But how to decide which approach will work best in your organization? In order to make the most informed decision, it is advisable to look at the data that you are anticipating to receive.
There are a lot of specific characteristics of various datasets that need to be taken into consideration by developers working on big data multi-tenant solutions.
Explore this topic with us! Follow up reading: What is Multitenancy? Definition of Multi-Tenant Architecture.
Multi-tenant architecture — characteristics of the datasets
First, we should start with the basics.
A database is a systematic collection of data. It supports electronic storage and manipulation of stored information.
In the following parts, we are going to focus on relational databases.
Relational databases (RDB) consist of multiple arrays of data that can work together. Relational databases have internal programming languages (SQL), which can be used by software developers to create custom menus and various advanced data handling functions. In such databases, all data values are based on the simple data types and are represented in the form of two-dimensional tables.
Scalability, allocation, and processing
Ok, if we have all the basics covered, let’s move on further.
If we look at the raw data stored in the databases themselves, we can point out 3 main characteristics of these datasets: scalability, allocation, and processing.
When starting to develop a data allocation solution for a system in which a single software instance can save multiple users, software architects have to consider the above characteristics. Only then any serious resources can be dedicated to the development process.
Each data characteristic is equally important, particularly from an economical standpoint. Understanding them can vastly improve eventual vertical and horizontal scaling in the future, as the system will be built with the specific type of client data in mind.
1. Data scalability
Data scalability is one of the most essential aspects to take into consideration when making a data warehouse for a system future-proof. It enables easy and seamless horizontal scaling.
It is even more critical when developing a system in multi-tenancy architecture. The reason why is because a single instance of software serves multiple customers and users can influence each other’s performance.
The size of datasets
The first thing to consider when designing a data storage system for a client or an internal project is the expected number of tenants and overall raw data volume.
Planning and developing a highly scalable data storage solution might turn out to be a waste of time if you:
- have a relatively small number of tenants (under 10),
- store a small number of entities for each tenant.
In database administration, an entity can be a single thing, person, place, or object.
On the other hand, when you have, say, around 60 tenants, or if one of your strategic goals is to reach that level in a feasible timeframe, building a scalable solution is becoming a key point of consideration in cloud computing.
This is probably one of the main reasons why companies are pursuing multi-tenant architectures in their big data solutions. They are dividing users’ data into multiple servers just because they are unable to store them on one machine.
As your business grows, the benefits of having a clear data storage strategy, and applying automation solutions to database management, will become more prevalent.
Data growth
The next characteristic is the future growth of the sheer volume of stored data. Under some circumstances, we have data whose volume is relatively stable with time.
Take, for instance, a database of states in the USA. The last change happened in 1959 when the state of Hawaii joined the Union.
There has been no indication that this may change in the foreseeable future.
If we are managing this kind of database, we can be sure that it won’t change in size in any meaningful way.
But on the other hand, there are modern SaaS applications or services like social media sites that can grow incredibly quickly, providing that they have a smart business plan and strategy. In this case, each new client and user means more memory requirements for the company’s servers.
That is another aspect that influences companies’ decisions about moving into multi-tenant architecture.
Multitenant architecture allows developers to relatively quickly add new shards to the database in order to serve different tenants and maintain the overall system performance.
2. Data allocation
When planning a data storage solution for a project in which a single application instance serves multiple users, you need to decide on an approach for sharing or isolating your tenants’ data.
Data is often considered to be one of the most valuable parts of a SaaS application since it provides invaluable business information for management teams.
It is worth mentioning that some data is more closely connected to specific tenants than others. Because of that, we can divide all the available user data into two main categories based on this characteristic.
The degree of tenant and data association
We can point out two approaches to differentiating databases when it comes to the degree of data association between them, and their owner.
1. type: a strong data association with tenants
This type of data is highly associated with its owner, which means that it is specific to only one tenant.
In reference to our shepherd metaphor from the introductory article, let’s take the owner of the herd of sheep who is a Tenant. The Sheep from a given herd would always be associated with this specific tenant. The degree of association between the Tenant and Sheep is strong.
2. type: optional association of data with tenants
On the other hand, there is a type of data that is only optionally associated with a specific tenant.
Imagine a brave shepherd dog that guards the sheep. The dog can be owned either by a specific tenant, and it is supposed to look only after their sheep, or by the shepherd – the owner of the farm. In the second case, the dog is not linked to the specific tenant, but it guards all the sheep from various tenants.
It means that the association of the dog with a given tenant is only optional.
Why are all of these things so important?
It all comes down to the choice of the multi-tenancy type that we’ve covered in the previous blog post.
For instance, if we have an application that deals with large volumes of data, a much better option would be to choose type 3 (One database per tenant). However, if this data is mostly non-tenant-specific, and we are usually processing it together, a much better option would be type 1 (Shared database).
It is up to the developers and software architects to weigh all of these aspects and make the most informed decision. It will influence the future prospects of horizontal and vertical scaling.
Shared data allocation
Modern data allocation algorithms can utilize the knowledge of user moving patterns for the proper allocation of shared data in a computing system.
By employing these algorithms in practice, the need for costly remote accesses can be minimized. Also, the overall performance of a mobile computing system is thus improved.
The aspect of shared data allocation is one of the key factors that should influence the decision of the type of multi-tenant architecture.
3. Data processing
Last, but not least, we have to cover data processing aspects of multi-tenant architectures.
The main problem to solve in this field is trying to answer the question of how to perform operations on multi-tenant databases faster?
A tenant’s data processing
Let’s say that we want to perform a fairly straightforward operation that will sum up all the entries of a certain type in the entire database.
Which multi-tenant database type will perform this operation the fastest? That is a question worth considering.
If you need to consult what type of multi-tenant database is best for your business, do not hesitate to reach us. We are a team of high-performance software experts experienced in introducing a multitenancy solutions to SaaS applications. Feel free to reach out!
Shared and separate database processing – what to choose
The first type is shared database multitenant architecture. In this type, performing such an operation is fairly quick.
On the other hand though, if we perform the same operation but only refer to a single user instead of the entire database, our overall performance may be slightly hurt.
In type two (shared database with separate schemas), the performance in these two situations will be pretty similar.
In the third type (separate databases) though, clear differences in the performance will be visible. Counting the entries specific to a single tenant will be the fastest solely because of the fact that this data can be physically separated on different server machines.
This database will have dedicated CPU threads and memory modules, which will increase the performance and improve latency.
The third type may encounter problems when trying to sum up all of the entries across multiple databases. The speed of this operation will depend on how much of the computing resources will be dedicated to the database management process.
How to perform operations on data fast and efficiently
Problems begin to arise when we add more complexity to our instructions.
If we, for instance, want to select a specific range of entries with a maximum value, from the entire database, the type three of multi-tenancy will lead to serious performance hurdles.
Nevertheless, server machines have a limited amount of processing cores and cache. Therefore, advantages gained by selecting the first type of multitenant architecture can be diminished in the long run by problems with optimization and latency.
The third type can be much more easily scaled vertically, which is especially important for large enterprises focused on rapid development and big volumes of processed data.
This example clearly shows how planning ahead is a crucial part of the development process when working on multi-tenant solutions.
Are you considering implementing a multi-tenant architecture to your SaaS application?
Summary
Understanding the characteristics of data in multi-tenant applications is crucial at the very beginning of the whole development cycle.
This issue is especially significant to smaller companies and start-ups, which cannot be entirely sure what volume of data they may need to process as the business grows.
Being prepared for the future expansion enables software developers to better allocate hardware resources, which are typically substantially limited.
Did you find this article useful? Don’t forget to share it!