If you haven't read our previous posts on Data Maps, embeddings, and dimensionality reduction, we recommend starting there for some useful background!
Throughout this series, we've demonstrated how data maps powered by embeddings and dimensionality reduction unlock powerful ways to explore massive datasets. But having great algorithms isn't enough - you also need an interface that makes this exploration intuitive and accessible.
In this post, we'll explain why we believe web browsers are well suited for interactive data exploration, making powerful data mapping algorithms available not just to experienced developers, but to anyone who wants to understand their data.
The browser is the right way to reach people. Everyday users already have a browser on their computer, so they don't need to install additional software to take advantage of all the built-in tech that makes data exploration a smooth and rich experience.
Modern web browsers are incredibly fast, in no small part thanks to years of work by companies like Google and Mozilla to turn JavaScript into a high-performance language.
The browser can also use your computer's GPU (graphics processing unit) without any extra setup on your part. GPUs excel at running many calculations in parallel, which is much more efficient when you want to apply the same operation to every point in a large dataset.
Another reason, less about the underlying tech and more about the user experience, is that the web browser as an environment for rendering data comes with natural ways to get important metadata like images, TikToks, and Tweets (Xeets?) to show up alongside the data you are exploring.
To fully leverage these browser capabilities and overcome traditional visualization limitations, we are building Deepscatter, our custom web graphics engine. It's designed specifically to handle the challenges of visualizing millions of data points while taking advantage of modern browser technologies.
Traditional web visualization libraries struggle when dealing with massive datasets, often becoming sluggish or unresponsive. Deepscatter takes a different approach - it operates entirely client-side and loads only the data necessary for what you're currently viewing.
One of Deepscatter's key advantages comes from its use of Apache Arrow, which enables efficient memory management through contiguous memory blocks. Modern JavaScript's support for typed arrays means we can transfer data directly from tools like DuckDB, pandas, or Polars without costly serialization or deserialization steps.
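To make the idea of contiguous memory and typed arrays concrete, here is a minimal sketch using only the standard `ArrayBuffer` and `Float32Array` APIs. It is not the Arrow API and not Arrow's actual memory layout: the two-column layout, the variable names, and the `roundTripThroughJson` helper are assumptions made purely for illustration.

```typescript
// Hypothetical layout: an x column and a y column stored back to back
// in one contiguous allocation. Real Arrow buffers are more elaborate.
const numPoints = 4;
const bytesPerFloat = Float32Array.BYTES_PER_ELEMENT; // 4 bytes

const buffer = new ArrayBuffer(2 * numPoints * bytesPerFloat);

// Typed-array views over the same memory. Creating a view copies nothing.
const xs = new Float32Array(buffer, 0, numPoints);
const ys = new Float32Array(buffer, numPoints * bytesPerFloat, numPoints);

xs.set([0, 1, 2, 3]);
ys.set([10, 11, 12, 13]);

// A third view over the whole buffer sees the values written through xs and ys.
const combined = new Float32Array(buffer);

// For contrast: a JSON round trip allocates a string and a brand-new array,
// which is the kind of serialization cost that typed views avoid.
function roundTripThroughJson(values: Float32Array): number[] {
  return JSON.parse(JSON.stringify(Array.from(values))) as number[];
}

const copied = roundTripThroughJson(xs);
console.log(combined.length, copied.length);
```

Because `xs`, `ys`, and `combined` all point at the same bytes, a write through one view is immediately visible through the others, while `copied` is a separate array that no longer tracks the buffer.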
All data in Deepscatter is transmitted using the Apache Arrow Feather format, organized in a custom quadtree structure that enables selective loading based on the current zoom level. A quadtree recursively subdivides space into smaller and smaller regions in a hierarchical structure where each level provides increasingly fine-grained detail. This allows us to efficiently load only the appropriate level of detail needed for the current view, similar to how map applications like Google Maps show more detailed information as you zoom into specific areas.
The quadtree data structure, as seen on the left in a rendering of Homer Simpson (image credit to Alex Wakeman), demonstrates how space can be recursively subdivided into smaller regions only where more detail is necessary. Larger squares are used for areas with less detail, while smaller squares provide higher resolution where needed, similar to how modern map applications adjust detail based on zoom level. You can think of it like a champagne tower, where each level represents a new level of granularity that the data structure starts to flow data into when necessary (image credit to Filaos).
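The champagne-tower idea can be sketched in a few lines. This is an illustrative toy, not Deepscatter's actual tiling code: the `QuadTile` class, its `capacity` parameter, and the `tilesToLoad` method are hypothetical names, and real tiles would be Feather files fetched over the network rather than in-memory arrays.

```typescript
interface Point { x: number; y: number; }
interface Rect { x: number; y: number; width: number; height: number; }

// Half-open containment, so a point on a shared edge belongs to exactly one child.
function contains(r: Rect, p: Point): boolean {
  return p.x >= r.x && p.x < r.x + r.width && p.y >= r.y && p.y < r.y + r.height;
}

function intersects(a: Rect, b: Rect): boolean {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

class QuadTile {
  bounds: Rect;
  depth: number;
  capacity: number;
  points: Point[] = [];
  children: QuadTile[] = [];

  constructor(bounds: Rect, depth: number, capacity: number) {
    this.bounds = bounds;
    this.depth = depth;
    this.capacity = capacity;
  }

  // Fill this tile first; once it is full, overflow into the matching child.
  insert(p: Point): boolean {
    if (!contains(this.bounds, p)) return false;
    if (this.points.length < this.capacity) {
      this.points.push(p);
      return true;
    }
    if (this.children.length === 0) this.subdivide();
    return this.children.some((child) => child.insert(p));
  }

  subdivide(): void {
    const { x, y, width, height } = this.bounds;
    const w = width / 2;
    const h = height / 2;
    for (const dy of [0, 1]) {
      for (const dx of [0, 1]) {
        const childBounds = { x: x + dx * w, y: y + dy * h, width: w, height: h };
        this.children.push(new QuadTile(childBounds, this.depth + 1, this.capacity));
      }
    }
  }

  // Non-empty tiles that overlap the viewport, no deeper than maxDepth.
  tilesToLoad(view: Rect, maxDepth: number, out: QuadTile[] = []): QuadTile[] {
    if (this.depth > maxDepth || !intersects(this.bounds, view)) return out;
    if (this.points.length > 0) out.push(this);
    for (const child of this.children) child.tilesToLoad(view, maxDepth, out);
    return out;
  }
}

const root = new QuadTile({ x: 0, y: 0, width: 1, height: 1 }, 0, 2);
const samplePoints: Point[] = [
  { x: 0.1, y: 0.1 }, { x: 0.9, y: 0.9 }, { x: 0.2, y: 0.2 },
  { x: 0.8, y: 0.8 }, { x: 0.3, y: 0.3 }, { x: 0.1, y: 0.2 },
];
for (const p of samplePoints) root.insert(p);

const countPoints = (tiles: QuadTile[]): number =>
  tiles.reduce((total, tile) => total + tile.points.length, 0);

// Zoomed out: only the root tile is needed.
const zoomedOut = countPoints(root.tilesToLoad({ x: 0, y: 0, width: 1, height: 1 }, 0));
// Zoomed into the lower-left corner: deeper tiles, but only those in view.
const zoomedIn = countPoints(root.tilesToLoad({ x: 0, y: 0, width: 0.4, height: 0.4 }, 2));
console.log(zoomedOut, zoomedIn);
```

With a capacity of two points per tile, the zoomed-out view loads 2 points from the root alone, while the zoomed-in view loads 5 points from three tiles and never touches the tile in the opposite corner. The tile is the unit of loading here, so a coarse tile that overlaps the view contributes all of its points, including any that fall outside the viewport.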
Typed Arrow arrays enable efficient communication with the GPU, since typed arrays can be uploaded to GPU buffers directly. Deepscatter's rendering system is built on WebGL, with REGL handling GPU buffer management. We've carefully optimized the rendering pipeline by minimizing the number of draw calls the CPU issues to the GPU, since each call carries overhead and too many of them become a significant performance bottleneck. By executing most visual transformations directly on the GPU, we achieve smooth, parallel processing for transitions and animations. Internally, we're looking at WebGPU as the future once Firefox and Safari offer native support for it.
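As a rough picture of the per-point work that moves to the GPU during a transition, the sketch below blends two sets of positions on the CPU. In a WebGL renderer this arithmetic would normally live in a vertex shader and run for every point in parallel; the `interpolatePositions` function and the interleaved `[x0, y0, x1, y1, ...]` layout are assumptions for illustration, not Deepscatter's implementation.

```typescript
// Linear blend between two position buffers of equal length.
// t = 0 gives the starting layout, t = 1 gives the target layout.
function interpolatePositions(
  from: Float32Array,
  to: Float32Array,
  t: number,
): Float32Array {
  if (from.length !== to.length) {
    throw new Error("position buffers must have the same length");
  }
  // Clamp so an overshooting animation clock never extrapolates.
  const clamped = Math.min(1, Math.max(0, t));
  // Same formula as GLSL's mix(a, b, t), applied to every coordinate.
  return from.map((a, i) => a + (to[i] - a) * clamped);
}

// Two hypothetical layouts for the same two points, interleaved as x, y.
const startLayout = new Float32Array([0, 0, 2, 4]);
const targetLayout = new Float32Array([10, 10, 4, 8]);

const halfway = interpolatePositions(startLayout, targetLayout, 0.5);
console.log(Array.from(halfway));
```

On the CPU this loop runs once per coordinate per frame, which gets expensive with millions of points. When both layouts already sit in GPU buffers, only the single number `t` has to change from frame to frame.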
We'll use the same data map from Part 1 and Part 2 of this blog series to show some of the capabilities that the Deepscatter engine enables when exploring data maps in Atlas.
This clip demonstrates one of the unique advantages that Deepscatter brings to data analysis in Atlas. Initially, the data is plotted geographically, and users can select specific areas with the mouse. The projection can then be switched to a semantic view built from embeddings and dimensionality reduction, as discussed in our previous posts. This transition lets users see which semantic categories appear in the geographically selected data, in the context of the semantic categories our models running in Atlas discovered across the broader dataset!
Data exploration should be as natural as web browsing itself. Modern web browsers provide near-universal accessibility and leverage advanced graphics computation for fast performance, making them ideal for interactive data visualization with tools like Deepscatter and Atlas.