Introduction to latency
Before we move on to analyzing latency in applications, it is helpful to know that every piece of system software performs many operations, each of which can be measured at any given time.
An example of an operation in a web application might be sending a search query to a search engine from a browser and displaying the results of that query.
In an e-commerce trading system, on the other hand, it could be automatically sending a notification when a user places an order for a given product.
The less time these operations take, the more user-friendly the application is.
None of this should surprise anyone. Today we are simply used to searching for information online and to banking or entertainment applications that don’t make us wait.
The lack of delay in accessing searches was part of Google’s competitive advantage and what initially gave them the edge over other search engines.
The same is true for the e-commerce market, which has been snowballing in recent years. Fast online store engines are driving virtual commerce and are an element of competitive advantage for many companies.
This is why creating low latency applications is so important and often determines whether a given business survives.
What is low latency in applications?
In the IT world, the term latency usually refers to the responsiveness of networked computer programs.
If the topic of latency is brought up, it means in practice that an application does not respond in a sufficiently short time. So, we can say that latency determines how long it takes for a data packet to move from one designated point to another. Under optimal conditions, the latency should be as close to zero as possible.
However, this is not a simple task.
In the case of applications running in distributed systems (for example, Internet applications), the cause of latency is often communication delay: the time required for a sent data packet to reach the other end of the communication channel and for the return response to get back to the sender.
Application users suffer the effects of latency in the form of long loading times for a web page, interrupted video streams or applications taking too long to launch.
All of this negatively affects UX and causes users to choose third-party solutions.
Equally important, each operation an application performs has its own latency. Single aggregate measures (such as the number of operations per second, or the number of seconds per operation) describe only an averaged run and hide the outliers. For this reason, latency in applications is best characterized in terms of percentiles.
What does it mean?
A percentile, a.k.a. centile, is a measure indicating the value for which n percent of the population is equal to or less than that value, while the remaining 100 - n percent are greater.
So, for example, when an analysis shows that the 80th percentile of latency is 65 ms, it means that 80 out of 100 operations complete within 65 ms, while the remaining 20 operations suffer from a latency higher than 65 ms.
Let’s consider this as an example: imagine your app with 90th, 95th and 99th percentile latency of 1, 2 and 25 seconds, respectively. This means that if any subpage or functionality of your app has a million page views per day, then 10,000 of those page views take longer than 25 seconds.
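The percentile calculation described above can be sketched in a few lines of Java using the nearest-rank method (the class and method names here are our own, for illustration only):

```java
import java.util.Arrays;

public class LatencyPercentiles {
    // Returns the p-th percentile of the samples using the
    // nearest-rank method: the smallest recorded value such that
    // at least p percent of samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double p) {
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        long[] samples = {12, 5, 65, 40, 7, 120, 33, 65, 9, 18};
        System.out.println("p80 = " + percentile(samples, 80) + " ms");  // → 65 ms
        System.out.println("p99 = " + percentile(samples, 99) + " ms");  // → 120 ms
    }
}
```

In production you would feed this with thousands of recorded operation timings rather than ten, but the principle is the same: sort, then read off the value at the desired rank.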
We probably don’t need to explain how this negatively affects the user experience and what steps users will take when an app or website performs so poorly.
If you want to know more data and numbers related to Java latency, read our stand-alone article about it.
What are the sources of latency?
The causes of application delays depend on many factors. Unfortunately, there are many of them, and they are usually random in nature.
The most important ones include:
- Hardware interruptions,
- Network or I/O delays,
- Hypervisor interruptions,
- Operating system activity (including rebuilding internal structures or flushing buffers),
- Context switching,
- Memory access,
- Garbage collector,
- CPU/Cache/Memory Architecture,
- JVM functionality,
- Network protocols,
- Cache misses,
- How the application is designed – concurrency, data structures and algorithms, caching.
How to decrease your app latency?
Google Vice President Marissa Mayer brought up the topic of latency back in 2006 at the Web 2.0 conference.
In a test conducted by the search giant, it turned out that an additional 500ms of latency reduced Google’s search traffic by as much as 20%.
Amazon specialists came to similar conclusions. In their A/B tests, page load was artificially delayed by 100 milliseconds, and it turned out that even such a minor delay generated drops in revenue.
So in her speech, Marissa Mayer firmly stated that “users really respond to speed”.
These findings were groundbreaking, and since then, developers have been constantly racing to give their applications the lowest latency possible.
How to achieve this? What are the best practices in low latency system software development? Here are the most critical elements to keep in mind when designing low latency systems.
1. Choose the correct programming language
When choosing a programming language to create a service or application, it is worth knowing that there are two main groups – scripted and compiled.
Scripted programs are text files that you run with an additional program. When run, the code is interpreted into a version the computer understands and executes.
Although we can see the effects of changes relatively quickly with this group of programming languages, in most cases, applications written in scripting languages have lower performance and higher latency.
If you choose compiled languages, such as Java, this problem disappears.
In the case of Java, programs are run by the Java virtual machine – simply put, a compiler tool translates Java code into virtual machine code understandable by the JVM.
Therefore, no matter what JVM programming language we use, our code will always be translated into virtual machine code, called bytecode. Only the JVM itself further translates this code into a form our computer’s operating system and hardware understands.
As a result, low latency code in Java is compiled to bytecode once before the program is run, and the JVM’s just-in-time (JIT) compiler further optimizes frequently executed paths at runtime, so the program runs quickly and without significant delays.
2. Remember about memory and garbage collection
Input/output operations are a common cause of latency, so you must ensure all necessary data is stored in memory.
In practice, this means managing your own in-memory data structures and maintaining a persistent log; it takes effort, but it measurably reduces latency.
These issues are also crucial in the context of garbage collection, which handles memory management in Java. While this is convenient, memory management cannot be forgotten entirely: memory leaks still happen, and they negatively affect both performance and application latency.
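A minimal sketch of the “in-memory structures plus a persistent log” pattern might look like this (the `InMemoryStore` class is hypothetical, for illustration only): reads are served purely from memory, while every write is also appended to a durable log so the map can be rebuilt after a restart.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InMemoryStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final Path log;

    public InMemoryStore(Path log) {
        this.log = log;
    }

    // Write path: update memory, then append to the durable log.
    public void put(String key, String value) throws IOException {
        data.put(key, value);
        Files.writeString(log, key + "=" + value + System.lineSeparator(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Read path: never touches the disk, so latency stays low.
    public String get(String key) {
        return data.get(key);
    }
}
```

A real implementation would batch or asynchronously flush log writes so that durability does not put I/O back on the hot path; the sketch keeps the write synchronous for clarity.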
3. Provide a system buffer
To avoid latency, always keep spare resources available for processing application requests.
Operating at the limits of hardware or software capacity in a situation of increased processing demands can result in bottlenecks and thus negatively affect the user experience.
4. Context switching
Changing the process being executed involves the complicated operations of saving the previous process’s computational state and recreating the new process’s computational state.
This represents a significant overhead on the operation of the system or application since, during this time, the system performs nothing valuable, only “administrative activities.”
So, the process scheduler should not switch the running process too often; otherwise, most of the working time will be spent on constant context switching. In practice, that situation means you are trying to do more computational work than you have resources for, which generates latency.
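One practical way to limit context switching in a Java application is to bound the number of runnable threads to the number of hardware threads, for example with a fixed-size executor. A sketch (class and method names are ours, not a library API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPool {
    // Runs CPU-bound tasks on one worker per hardware thread, so the
    // OS scheduler is not forced to time-slice between more runnable
    // threads than the CPU can actually host at once.
    static int runTasks(int taskCount) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> {
                long sum = 0;
                for (int j = 0; j < 1_000_000; j++) sum += j; // CPU-bound work
                completed.incrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(100) + " tasks completed");
    }
}
```

Excess tasks wait in the executor’s queue instead of becoming extra runnable threads, which keeps the scheduler’s context-switch overhead low.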
5. Take care of sequential reads
Each form of storage performs far better when used sequentially.
This is because sequential reads from memory trigger prefetching, both at the RAM level and in the CPU caches. When the access pattern is right, the next piece of data the application needs is already at hand in the cache, reducing latency.
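The effect is easy to observe in Java: summing a large 2-D array row by row (sequential) versus column by column (strided) performs exactly the same arithmetic, but the sequential version is typically several times faster thanks to prefetching. A rough benchmark sketch:

```java
public class AccessPattern {
    // Sums the array row by row: consecutive memory addresses,
    // so the hardware prefetcher keeps upcoming cache lines warm.
    static long sumSequential(int[][] m) {
        long sum = 0;
        for (int row = 0; row < m.length; row++)
            for (int col = 0; col < m[row].length; col++)
                sum += m[row][col];
        return sum;
    }

    // Same arithmetic, but column by column: each access jumps a whole
    // row ahead, defeating the prefetcher and causing cache misses.
    static long sumStrided(int[][] m) {
        long sum = 0;
        for (int col = 0; col < m[0].length; col++)
            for (int row = 0; row < m.length; row++)
                sum += m[row][col];
        return sum;
    }

    public static void main(String[] args) {
        int n = 4096;
        int[][] m = new int[n][n];
        for (int[] row : m) java.util.Arrays.fill(row, 1);

        long t0 = System.nanoTime();
        long a = sumSequential(m);
        long t1 = System.nanoTime();
        long b = sumStrided(m);
        long t2 = System.nanoTime();
        System.out.printf("sequential: %d ms, strided: %d ms, sums equal: %b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
    }
}
```

Exact timings depend on hardware and JIT warm-up (a proper measurement would use a harness such as JMH), but the relative gap between the two loops illustrates the cost of non-sequential access.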
6. Pay attention to caching
When analyzing Java app latency problems, it is also essential to keep in mind caching.
The speed and performance of a Java application largely depend on the type of tasks it has to perform and what they involve. Although caching is primarily related to Java application performance, it also reduces the load on the system, consequently reducing latency.
Therefore, speeding up processes by caching data minimizes the burden on limited resources and positively affects application speed.
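For example, a small in-process cache with least-recently-used eviction can be built on the JDK’s own `LinkedHashMap` in a few lines (a sketch, not a production-grade cache):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache: LinkedHashMap's access-order mode keeps the
// most recently used entries at the tail, and the eldest (least
// recently used) entry is evicted once capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true);  // true = order entries by access, not insertion
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

Used with `cache.computeIfAbsent(key, k -> expensiveLookup(k))`, repeat requests are served from memory and never hit the slow backing store. Note that `LinkedHashMap` is not thread-safe; concurrent access would need external synchronization or a dedicated caching library.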
How to decrease your Java app latency?
Nowadays, a lot is required from consumer, business or entertainment applications. For this reason, developers are constantly racing to write low latency systems that achieve response times in microseconds.
1. Keep your application simple
If you aim to minimize application latency, try to keep the Java code and system as simple as possible. The fewer tasks an application has to perform, the less time it takes to complete them, reducing latency.
How to accomplish this?
First, keep in mind at all times what the code in your application actually needs to do. The KISS rule, or Keep It Simple, Stupid, is helpful here. The KISS practice originated in the 1960s among US military engineers.
Over time, it has also found its way into development approaches, as it thoroughly defines the essence of the products being developed – striving to maintain a clear structure without adding unnecessary elements.
2. Remember about garbage collection
As a rule, memory has a limited size and cannot be expanded indefinitely. Its insufficient resources affect the latency.
Java programmers are helped by the garbage collector, which independently removes no longer used objects from JVM memory. Knowing how the GC algorithms work is crucial to tuning the software system. So, if you suspect that latency problems in your application lie on the memory side, it’s a good idea to verify the garbage collection log immediately.
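Beyond reading GC logs (enabled with `-Xlog:gc*` on JDK 9+), you can also inspect collector activity from inside the application through the standard `GarbageCollectorMXBean` API, for example:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    // Prints each collector's name, collection count, and accumulated
    // collection time, and returns the total number of collections.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            total += Math.max(gc.getCollectionCount(), 0);
        }
        return total;
    }

    public static void main(String[] args) {
        totalCollections();
    }
}
```

Polling these counters periodically (or exporting them to a metrics system) makes it easy to correlate latency spikes with GC activity.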
3. Analyze all processes, not just Java
Low latency in Java is not solely due to the capabilities of this programming language.
To implement ultra low latency optimizations, you need to thoroughly understand the entire environment in which your code executes.
What should be taken into account?
First, the operating system and the choice of hardware with the proper parameters are essential. To reduce latency in Java, you also need to ensure that the system software and device drivers are correctly configured. It is also worth looking into lock-free algorithms and measuring processing time.
4. Take care of memory and I/O operations
To reduce Java latency, prepare your application so that all necessary data is stored in memory or cache.
Also, constantly monitor I/O operations so that if anything goes wrong, you can react right away.
5. Rely on specialists
Finally, one more piece of advice – for Java latency optimization work, engage someone with experience.
At Stratoflow, we develop high-performance Java apps and provide our clients with the help of engineers experienced in building low latency software systems. This enables us to reduce the development time of applications that respond to market needs and support business processes.
An example is the travel search engine we created, which handles up to 300 million daily queries, where 95% of them reach users in less than one second.
How do I make my Java application run faster?
Java applications are expected to run fast and be helpful.
This is important for both system developers and users. To make a Java app run fast, you need to pay attention to:
- Eliminating any slow database queries, which occur most often as the application load increases;
- Speeding up processes using cached data, which definitely reduces the load on limited resources;
- Garbage collection operation and memory leaks;
- The quality and accuracy of design guidelines;
- Code quality and proper object allocation.
Combining all of the above-mentioned “puzzle” elements increases the likelihood of creating an efficient Java app that runs without latency.
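On the last point about object allocation, a common micro-level technique is to avoid creating fresh objects inside hot loops. A sketch (the class is hypothetical, and suitable only for a single-threaded hot path): reusing one `StringBuilder` across calls produces less garbage and therefore fewer GC pauses.

```java
public class CsvJoiner {
    // One reusable buffer for the hot path; a fresh StringBuilder per
    // call would allocate (and later force the GC to reclaim) a new
    // object and backing array on every invocation.
    private final StringBuilder sb = new StringBuilder(256);

    String join(String[] parts) {
        sb.setLength(0);  // reset contents, keep the backing array
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(parts[i]);
        }
        return sb.toString();
    }
}
```

The trade-off is that the instance is no longer stateless, so it must not be shared between threads; per-thread instances (or `ThreadLocal`) are the usual answer when concurrency is needed.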
How do I create a low latency application in Java?
Google constantly registers queries about how to write a low latency application in Java.
It’s hardly surprising since a lot depends on the speed of the system’s performance – first of all, the application’s business usability and positive user experience.
For this reason, Java programmers are in demand constantly on the labor market, and their knowledge of low latency programming is invaluable.
Among the pro tips that experienced Java developers share on low latency programming are:
- Choose the best and well-suited programming language for the project you are working on;
- Keep all necessary data in memory to avoid I/O problems;
- Take care of adequate system buffer and cache so that all operations are performed smoothly and without delay;
- Systematically check garbage collector logs and data structures;
- Create and plan a tuning strategy;
- Always keep in mind the business purpose of application development;
- Rely on experienced Java developers who know where to look for latency causes.
How to build ultra low latency systems in Java – summary
The speed and efficiency of an application translate directly into business benefits. It is worth knowing that some applications, such as those dedicated to high-frequency trading companies or created for the financial sector, require efficient and smooth handling of even millions of transactions per second, where minimizing the latency of even a single transaction is of great importance. So before you start developing an application, define your business goals and carefully vet the software house you choose. Remember that only high-performance applications will allow you to succeed in business!