Working With Large Data on Frontend

9 October 2023

Sooner or later, every frontend developer runs into the same problem: the beautiful application they built starts struggling with unexpectedly large amounts of data.

The symptoms might be mild, amounting to nothing more than a slowdown of the app, or they could be severe, with your application freezing and having to be forcefully terminated.

The most commonly prescribed solution for this is pagination. Fetch your data from the server in chunks and display only one page at a time. This is a perfectly fine solution and has been blogged to death by this point.

The second solution, also blogged to death, is infinite scrolling. This is basically a pretty version of pagination where you load data as you reach the bottom. Today we are going to take a look at a few other ways to improve your app’s performance when dealing with large data.

Let’s define our problem

Before we get into the suggestions, we first have to define our problem. Imagine we have a server that returns a huge amount of data in JSON form: we are talking hundreds of thousands of records. The server does not support pagination of results, and you have to display the data in the browser.

What is our real problem with large data on the frontend? Is memory usage a bottleneck?

Browsers have access to huge amounts of memory these days, so do our apps really crash because of huge data? Not really.

It’s not the data itself that’s the biggest problem; the two most expensive parts of the process are getting the data to the browser and rendering elements to the DOM.

If you have to download a few hundred megabytes before your application initialises, the user is going to have to wait a while before they can do anything. And if you have to render everything up front, each element costs you a small performance hit; these hits aren’t noticeable with a handful of records, but as the records grow in number those small hits add up pretty fast.

Possible Solutions

Cache what you can cache

Caching is the process of storing copies of data or files in a temporary storage location (a cache) so they can be accessed faster.

Caching allows you to speed up subsequent calls to the same data and if done right can increase responsiveness dramatically.

Imagine a scenario where a user has to wait 10 seconds for your UI to load the first time they open it, then accidentally clicks refresh and has to wait another 10 seconds. They will not be happy.

But if you cache those calls the first time, the application loads much faster when they reload it. If your data isn’t changing every second, there is really no reason to fetch it all again on every refresh or every time the user navigates to another page.

The good news is that browsers these days already cache API responses for you, but this requires that the caching headers are set up correctly on the server. So if the server sets Cache-Control and friends correctly, you should be fine. If it doesn’t, you can still cache the data in your application yourself: initialize the UI from the cached copy and then run a call in the background to get the updated data.
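To make that concrete, here is a minimal sketch of an application-level cache with a background refresh. The endpoint, cache key and render callback are made up for illustration, and sessionStorage has size limits, so a really big payload would go into IndexedDB instead; the idea stays the same.

```javascript
// Minimal application-level cache with a background refresh (stale-while-revalidate style).
// The endpoint, cache key and render callback are placeholders for this sketch.
function loadRecords(url, cacheKey, onData) {
  const cached = sessionStorage.getItem(cacheKey);
  if (cached) {
    // Initialize the UI immediately from the cached copy.
    onData(JSON.parse(cached));
  }

  // Fetch fresh data in the background, then update the cache and the UI.
  fetch(url)
    .then((response) => response.json())
    .then((data) => {
      sessionStorage.setItem(cacheKey, JSON.stringify(data));
      onData(data);
    });
}

// Usage: render whatever we have right away, re-render when fresh data lands.
loadRecords('/api/records', 'records-cache', (records) => {
  console.log(`Rendering ${records.length} records`);
});
```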

Lazy load your data

Lazy loading, or asynchronous loading, is when you wait to load resources until they are needed. It’s the IT equivalent of restaurants sending out the starter while they are still busy making the main course. If you are working with multiple calls and some of those calls aren’t needed to get the user working initially, delay calling those endpoints. Focus on getting the user interactive first, then load the rest in the background when it’s clear they are needed.

This is a common trick with images: images further down the page, not yet visible to the user, are not loaded until the user starts scrolling towards them.

If you can delay loading the data that isn’t immediately needed, it will go a long way towards improving your performance.
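As a rough sketch of the idea (the element id and endpoint below are invented for illustration), you can use an IntersectionObserver to hold off on a fetch until the section that needs the data actually scrolls into view:

```javascript
// Defer fetching the data for a below-the-fold section until it becomes visible.
// The element id and endpoint are placeholders for this sketch.
const detailsSection = document.getElementById('details-section');

const observer = new IntersectionObserver((entries, obs) => {
  if (entries.some((entry) => entry.isIntersecting)) {
    obs.disconnect(); // load only once
    fetch('/api/details')
      .then((response) => response.json())
      .then((details) => {
        detailsSection.textContent = `Loaded ${details.length} detail records`;
      });
  }
});

observer.observe(detailsSection);
```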

Optimize your data

Just because the API sent you all the data it has doesn’t mean all of it is relevant to the user. If the API gives you data separated into 7 categories and your user only cares about 3 of them, showing the other 4 is a waste of time and resources.

Filter these results out and don’t even write them to the DOM; it will make your app more responsive. Be sure, however, to leave a way for the user to access the hidden records: nothing is more irritating for users than knowing data exists but having no way of accessing it.
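As a quick illustration (the record shape and category names are made up), filter before you ever touch the DOM and keep the selection easy to change:

```javascript
// Render only the categories the user has selected; keep the rest out of the DOM.
// The record shape and category names are made up for this sketch.
const selectedCategories = new Set(['orders', 'invoices', 'customers']);

function renderFiltered(records, listElement) {
  const visible = records.filter((record) => selectedCategories.has(record.category));

  listElement.replaceChildren(
    ...visible.map((record) => {
      const item = document.createElement('li');
      item.textContent = record.name;
      return item;
    })
  );
}
```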

Use pagination

Yes, I know I said that the API doesn’t handle pagination, but that only applies to server-side pagination. If you can get the data onto the browser in a timely manner, you can still use pagination.

Frontend-only pagination is a thing, and it’s pretty easy to set up. You only need to calculate the number of “pages” your data will be spread over and, for each “page”, display a subset of the data.

Honestly, it’s simple. Here’s a look at the code you need to get this working, and it barely takes any effort.

Code example of pagination
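A minimal sketch of it in plain JavaScript could look something like this (the page size and element ids are arbitrary, and allRecords is assumed to hold the data you already fetched):

```javascript
// Frontend-only pagination: keep all records in memory, render one page at a time.
// Page size and element ids are arbitrary; allRecords is assumed to be filled by the initial fetch.
const PAGE_SIZE = 50;
let currentPage = 0;
let allRecords = [];

function renderPage(page) {
  const list = document.getElementById('record-list');
  const start = page * PAGE_SIZE;
  const pageRecords = allRecords.slice(start, start + PAGE_SIZE);

  list.replaceChildren(
    ...pageRecords.map((record) => {
      const item = document.createElement('li');
      item.textContent = record.name;
      return item;
    })
  );

  const pageCount = Math.max(1, Math.ceil(allRecords.length / PAGE_SIZE));
  document.getElementById('page-label').textContent = `Page ${page + 1} of ${pageCount}`;
}

function goToPage(page) {
  const lastPage = Math.max(0, Math.ceil(allRecords.length / PAGE_SIZE) - 1);
  currentPage = Math.min(Math.max(page, 0), lastPage);
  renderPage(currentPage);
}

document.getElementById('prev-page').addEventListener('click', () => goToPage(currentPage - 1));
document.getElementById('next-page').addEventListener('click', () => goToPage(currentPage + 1));
```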

This code gives you something that looks like this:

Pagination

Use Infinite Scrolling


Infinite scrolling is a slightly more fluid way of doing pagination. You initially display a few results, and as the user scrolls to the end of the list or table you add more records. Unlike pure pagination, the user does not have to explicitly select a page number to go to, which feels more user friendly.

The downside is that if you have things like links or contact details at the bottom of the page, the user will not be able to see them until all the records have been displayed. Another downside is that once elements are written to the DOM, they stay there for the lifetime of the page, and if the page gets large enough, performance will degrade.
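Here is a rough sketch of the append-on-scroll behaviour, using a sentinel element at the bottom of the list; the batch size, element ids and record shape are assumptions for illustration:

```javascript
// Infinite scrolling: append the next batch of records whenever a sentinel
// element at the bottom of the list scrolls into view.
// Batch size, element ids and record shape are placeholders for this sketch.
const BATCH_SIZE = 50;
let renderedCount = 0;
let allRecords = []; // assumed to already hold the fetched data

const list = document.getElementById('record-list');
const sentinel = document.getElementById('load-more-sentinel');

function appendNextBatch() {
  const batch = allRecords.slice(renderedCount, renderedCount + BATCH_SIZE);
  for (const record of batch) {
    const item = document.createElement('li');
    item.textContent = record.name;
    list.appendChild(item);
  }
  renderedCount += batch.length;
}

const observer = new IntersectionObserver((entries) => {
  if (entries.some((entry) => entry.isIntersecting) && renderedCount < allRecords.length) {
    appendNextBatch();
  }
});

appendNextBatch(); // initial batch
observer.observe(sentinel);
```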

Use data virtualization


Microsoft Excel has over a million rows and over 16 thousand columns but it doesn’t show all of them on screen at the same time.

Data virtualization on the frontend takes a page out of Excel’s book and renders only the data that is currently visible on the screen, rather than the entire dataset.

It is a great technique for letting users access large datasets in a way that’s efficient and intuitive. It combines the best parts of pagination and infinite scrolling in that it’s very easy to use and keeps only a limited number of elements on screen. Another difference from pagination and infinite scrolling is that, when done right, it doesn’t always write a full page’s worth of records to the DOM: if you scroll by three records, it only adds three new rows to the DOM while removing three others.
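A stripped-down sketch of the idea, assuming a fixed row height and two hypothetical elements (a scrollable viewport and a tall spacer inside it), could look like this:

```javascript
// Stripped-down virtualization sketch: a fixed row height, a tall spacer that gives
// the scrollbar its full size, and only the currently visible rows in the DOM.
// Row height, element ids and record shape are assumptions for this sketch.
const ROW_HEIGHT = 30;
let allRecords = []; // assumed to already hold the fetched data

const viewport = document.getElementById('viewport'); // fixed height, overflow-y: auto
const spacer = document.getElementById('spacer');     // position: relative child of the viewport

function renderVisibleRows() {
  spacer.style.height = `${allRecords.length * ROW_HEIGHT}px`;

  const firstVisible = Math.floor(viewport.scrollTop / ROW_HEIGHT);
  const visibleCount = Math.ceil(viewport.clientHeight / ROW_HEIGHT) + 1;
  const visible = allRecords.slice(firstVisible, firstVisible + visibleCount);

  spacer.replaceChildren(
    ...visible.map((record, i) => {
      const row = document.createElement('div');
      row.textContent = record.name;
      row.style.position = 'absolute';
      row.style.top = `${(firstVisible + i) * ROW_HEIGHT}px`;
      row.style.height = `${ROW_HEIGHT}px`;
      return row;
    })
  );
}

viewport.addEventListener('scroll', renderVisibleRows);
renderVisibleRows();
```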

Implementing it well on the frontend is a bit of a nightmare, but luckily there are plugins for that. If you are in the React ecosystem you can use react-virtualized or react-window; if you use Vue, there are vue-virtual-scroller and vue-virtual-scroll-grid. The other frameworks all have plugins that do the same thing, so I won’t list them all here, but suffice it to say they are easy to look up and implement.

Conclusion

The thing you have to remember is that the network and the DOM are your two biggest bottlenecks, and solving things there goes a long way towards solving your problems. The size of the data itself can be a problem, but with the ever-increasing power of modern PCs it’s less of an issue than these other factors.
