From 30 Seconds to 80ms: Scaling Geospatial Data to Millions of Points

How we cut geospatial map load times from 30 seconds to 80ms while scaling to millions of points, using server-side vector tiles and PostGIS.

In modern SaaS architecture, handling data at scale is a fundamental requirement. However, visualizing that data—specifically millions of geospatial data points—presents a unique set of engineering challenges that standard CRUD patterns cannot address.

The Business Requirement

We were tasked with architecting a real-time geolocation analytics dashboard for our enterprise clients. The objective was to provide deep visibility into product usage by tracking QR code scans globally. The system required a highly interactive map interface capable of displaying usage density (clustering) at a macro level while allowing administrators to drill down into specific, street-level data points (pins) without perceptible latency.

The Initial Architectural Approach

Our initial proof-of-concept utilized a traditional client-side rendering architecture, a standard pattern for most web applications:

  • Data Retrieval: The frontend initiates a REST API call to the backend.
  • Query Execution: The backend executes a standard SELECT query against the database.
  • Data Transfer: The result set is serialized into JSON and transmitted to the client.
  • Visualization: The frontend parses the JSON and instantiates DOM elements (markers) using a standard mapping library.
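
Boiled down, the naive pipeline is: fetch, parse, and create one marker object (and ultimately one DOM element) per feature. A toy model of that loop, with an inline illustrative payload rather than our actual API response:

```typescript
// Toy model of the client-side rendering pipeline: parse a GeoJSON
// FeatureCollection and build one marker per feature. Each marker
// stands in for a DOM element in the real mapping library.
interface Marker { lng: number; lat: number }

function buildMarkers(geojsonText: string): Marker[] {
  const collection = JSON.parse(geojsonText);
  // One marker per feature: this is the step that explodes at scale.
  return collection.features.map((f: any) => ({
    lng: f.geometry.coordinates[0],
    lat: f.geometry.coordinates[1],
  }));
}

// Illustrative two-point payload; the real response held thousands of rows.
const payload = JSON.stringify({
  type: "FeatureCollection",
  features: [
    { type: "Feature", geometry: { type: "Point", coordinates: [13.4, 52.5] } },
    { type: "Feature", geometry: { type: "Point", coordinates: [-0.1, 51.5] } },
  ],
});

const markers = buildMarkers(payload);
```

Every step here scales linearly with the row count: the download, the parse, and the marker creation all grow with the dataset, which is exactly what the load tests below exposed.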

Performance Bottlenecks and Failure Modes

While this architecture performed within acceptable limits for small datasets (under 5,000 records), load testing with larger subsets revealed critical structural failures:

  • Network Latency: The JSON payload size ballooned, resulting in download times upwards of 30 seconds.
  • Main Thread Blocking: Parsing the massive JSON string blocked the JavaScript main thread and drove memory consumption up sharply, causing significant input delay.
  • DOM Rendering Overhead: Attempting to render tens of thousands of individual DOM elements overwhelmed the browser's layout engine, resulting in UI freezes and eventual application crashes.

It became evident that client-side rendering is inherently unscalable for high-volume geospatial data. The bottleneck was not in the database's ability to fetch data, but in the browser's ability to ingest and render it. To achieve our performance targets, we needed to shift the computational burden from the client to the server.

The Solution Architecture: Server-Side Vector Tiles

To address the inherent scalability constraints, we re-evaluated our rendering strategy by analyzing patterns used by high-scale mapping platforms. The solution required a fundamental paradigm shift: moving from fetching raw data to serving Mapbox Vector Tiles (MVT).

The Architectural Pivot

Instead of transferring the entire dataset to the client, the Vector Tile specification divides the global map into a hierarchical quadtree grid. When a user navigates the map, the client requests data only for the specific square tiles (defined by Zoom, X, and Y coordinates) currently within the viewport.
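
The Z/X/Y addressing follows the standard "slippy map" scheme: at zoom level z, the Web Mercator world is a 2^z by 2^z grid. A sketch of the coordinate math (illustrative, not our production code):

```typescript
// Convert a lon/lat pair to its tile address at a given zoom level,
// per the standard Web Mercator "slippy map" tiling scheme.
function lonLatToTile(lon: number, lat: number, zoom: number): { x: number; y: number } {
  const n = Math.pow(2, zoom); // number of tiles per axis at this zoom
  const x = Math.floor(((lon + 180) / 360) * n);
  const latRad = (lat * Math.PI) / 180;
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  return { x, y };
}
```

At zoom 0 every point maps to tile (0, 0); each additional zoom level quadruples the tile count, which is why the client only ever fetches the handful of tiles intersecting the viewport.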

Dynamic Aggregation Strategy

A critical component of this architecture is Dynamic Level-of-Detail (LOD). We implemented logic to alter the data resolution based on the user's zoom level:

  • Macro View (Low Zoom): The system aggregates millions of points into statistical clusters server-side.
  • Micro View (High Zoom): The system delivers precise geometry for individual data points only when the user zooms in sufficiently.

This ensures that the payload size remains constant and lightweight—typically under 50KB per tile—regardless of whether the underlying dataset contains ten thousand or millions of records.
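
One way to express such an LOD policy is a simple zoom-to-granularity mapping. The breakpoints below are illustrative and tunable, not our exact production thresholds (the SQL snippet later in this post uses a single prefix length for simplicity):

```typescript
// Illustrative LOD policy: pick a geohash prefix length (cluster
// granularity) from the zoom level, or null for exact points.
// Breakpoint values are examples, not production thresholds.
function geohashPrecisionForZoom(zoom: number): number | null {
  if (zoom >= 12) return null; // micro view: exact points, no clustering
  if (zoom >= 8) return 5;     // city level: fine clusters
  if (zoom >= 4) return 4;     // country level
  return 3;                    // continent level: coarse clusters
}
```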

The Revised Tech Stack

We restructured the application into a decoupled pipeline to optimize for throughput and rendering speed:

  • Database (PostgreSQL + PostGIS): Promoted from a passive data store to the primary computational engine. It handles all spatial math, bounding box calculations, and point clustering.
  • Backend (Spring Boot): Re-purposed as a thin streaming proxy. It accepts tile coordinates, queries the database, and streams binary Protocol Buffers (PBF) to the client without object deserialization overhead.
  • Frontend (MapLibre GL): Replaced the DOM-based rendering engine with a WebGL-based renderer. This allows the browser to offload the visualization work to the client's GPU, enabling smooth 60 FPS performance even with complex vector data.

Database Engineering: Optimizing PostGIS for High-Throughput

The central architectural decision was to minimize application-layer overhead by offloading spatial computation entirely to the persistence layer. Transferring millions of rows into the application memory for processing would introduce unacceptable serialization costs. Instead, we leveraged PostGIS to perform filtering, transformation, and serialization directly at the data source.

Schema Optimization and Pre-Computation

Standard spatial queries often suffer from the CPU cost of coordinate system transformations (ST_Transform) during runtime. To mitigate this, we adopted a strategy of spatial denormalization:

  • Pre-Projected Geometry (geom_web_mercator): We stored a secondary geometry column pre-projected to EPSG:3857 (Web Mercator). This aligns the data storage with the map rendering projection, eliminating the need for mathematical transformation during query execution.
  • Geohash Indexing (geohash): We generated a geohash string for every record upon insertion. This enables highly efficient text-based aggregation, allowing the database to group millions of points using standard B-Tree indexes rather than expensive spatial distance calculations.

Indexing Strategy

Performance at this scale relies heavily on precise index selection. We utilized Generalized Search Tree (GiST) indexes for the spatial columns to enable logarithmic time complexity for bounding box intersection queries (&&), alongside composite B-Tree indexes for clustering keys.

-- Schema Optimization
ALTER TABLE scan ADD COLUMN geom_web_mercator geometry(Point, 3857);
ALTER TABLE scan ADD COLUMN geohash VARCHAR(7);

-- Indexing Strategy
CREATE INDEX idx_scan_geom_web_mercator ON scan USING GIST (geom_web_mercator); 
CREATE INDEX idx_scan_geohash ON scan (geohash);

Logic Encapsulation via Stored Procedures

We encapsulated the tile generation logic within a stored procedure (get_scan_tile). This function acts as a dynamic tile generator that alters its execution plan based on the requested Zoom Level (Z):

  • Macro-Aggregations (Low Zoom): The function executes a GROUP BY operation using a substring of the geohash. This rapidly aggregates millions of records into a manageable set of weighted cluster points.
  • Micro-Precision (High Zoom): The function leverages the GiST index to perform a spatial intersection, retrieving precise geometry for points strictly contained within the user's viewport.

-- Dynamic Tile Generation Logic
IF z < 12 THEN
    -- Macro View: Fast Clustering via Geohash Prefix
    SELECT count(*) AS point_count, ST_AsMVTGeom(...)
    FROM scan
    GROUP BY substring(geohash from 1 for 4);
ELSE
    -- Micro View: Exact Points via Spatial Index
    SELECT ST_AsMVTGeom(...)
    FROM scan
    WHERE geom_web_mercator && tile_bounds;
END IF;
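
The macro-view branch is conceptually just a GROUP BY on a string prefix. The same aggregation, modeled in TypeScript with made-up sample hashes:

```typescript
// Model of the macro-view aggregation: group points by a geohash
// prefix and count members, mirroring GROUP BY substring(geohash from 1 for 4).
function clusterByPrefix(geohashes: string[], prefixLen: number): Map<string, number> {
  const clusters = new Map<string, number>();
  for (const gh of geohashes) {
    const key = gh.slice(0, prefixLen);
    clusters.set(key, (clusters.get(key) ?? 0) + 1);
  }
  return clusters;
}

// Two nearby scans share the "u33d" prefix; the third is elsewhere.
const clusters = clusterByPrefix(["u33dbc", "u33dbd", "9q8yyk"], 4);
```

Because nearby points share prefixes, this collapses dense areas into single weighted rows, and in the database the work is served by an ordinary B-Tree index rather than spatial distance math.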

Crucially, the function utilizes ST_AsMVT, a native PostGIS function that serializes the result set into the binary Protocol Buffer (PBF) format directly within the database engine. This ensures the data is fully prepared for wire transmission immediately upon retrieval, bypassing the need for any data transformation in the application layer.

The Backend: Constructing a High-Performance Streaming Proxy

With the computational logic offloaded to the database, the role of the Spring Boot application layer shifted from data processing to high-throughput streaming. The primary engineering objective was to minimize latency between the database cursor and the HTTP response stream.

Data Access Strategy: Bypassing the ORM

While Hibernate (JPA) is standard for transactional CRUD operations, it introduces significant overhead when handling binary data. Object-Relational Mapping requires result set parsing, object instantiation, and state management—all unnecessary costs for a read-only binary stream.

To optimize for raw throughput, we utilized Spring's JdbcTemplate. This allowed us to execute the stored procedure and retrieve the resulting bytea (binary array) directly as a byte[], avoiding per-row entity instantiation and ORM state management for the payload data.

The Tile Controller

The API endpoint acts as a thin gateway. It accepts the standard Vector Tile coordinates (Z/X/Y) along with strict filtering parameters. The implementation emphasizes two critical performance factors:

  • Zero-Copy Streaming: The binary data from the database is written directly to the HTTP response body.
  • HTTP Caching: We implement aggressive Cache-Control headers. Since map data for a specific tile and timestamp is immutable, we instruct downstream CDNs and browser clients to cache the tile, significantly reducing load on the database for recurring views.

The Frontend: High-Performance Rendering via WebGL

The final component of the architecture was the visualization layer. To achieve our target of 60 FPS performance while rendering potentially millions of data points, we migrated from DOM-based libraries to MapLibre GL. This shift allowed us to leverage WebGL, offloading the rendering pipeline directly to the client's GPU.

Integration Strategy

The integration with the backend was implemented via the MapLibre vector source specification. Instead of loading a static dataset, the map client was configured to request tiles dynamically from our Spring Boot endpoint. This establishes a highly efficient data pipeline where the frontend requests only the visual data required for the current viewport and zoom level.
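
Wiring this up amounts to declaring a vector source whose tile URL template points at the backend endpoint. The host, path, and source name below are illustrative:

```typescript
// Vector source definition following the MapLibre style specification.
// The URL template and query parameter are illustrative placeholders.
const scanSource = {
  type: "vector" as const,
  tiles: ["https://example.com/api/tiles/{z}/{x}/{y}.pbf?orgId=123"],
  minzoom: 0,
  maxzoom: 14, // beyond this, MapLibre overzooms cached tiles instead of refetching
};

// In the browser this would be registered with:
// map.addSource("scans", scanSource);
```

MapLibre substitutes {z}/{x}/{y} for every tile intersecting the viewport, so the client never requests data it cannot currently display.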

Client-Side Data-Driven Styling

A key advantage of the Vector Tile specification is the ability to separate data from styling. Rather than writing complex JavaScript logic to determine marker colors or sizes, we utilized MapLibre's expression language.

We implemented an interpolation strategy for the cluster layer. The renderer dynamically adjusts the circle colors and radius properties based on the point_count attribute embedded in the vector tile. This ensures visual density is communicated immediately without the overhead of client-side calculation loops.
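
A paint definition of the kind described looks like the following; the breakpoints and colors are illustrative, while point_count is the attribute the tile generator embeds for clusters:

```typescript
// Data-driven styling: MapLibre interpolates between the listed stops
// on the GPU, so no per-frame JavaScript runs. Stop values are
// illustrative, not our production palette.
const clusterCirclePaint = {
  "circle-radius": [
    "interpolate", ["linear"], ["get", "point_count"],
    10, 8,        // 10 points    -> 8px radius
    1000, 18,     // 1k points    -> 18px
    100000, 32,   // 100k points  -> 32px
  ],
  "circle-color": [
    "interpolate", ["linear"], ["get", "point_count"],
    10, "#4dabf7",      // sparse clusters: blue
    100000, "#e03131",  // dense clusters: red
  ],
};
```

Because the expression travels with the layer definition, restyling the map never requires touching the data pipeline.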

Production Deployment and Cloud Architecture

Moving from a local development environment to a production cloud infrastructure required addressing the final bottleneck: compute cost. Calculating vector tiles is CPU-intensive. Executing the PL/pgSQL function for every single pan and zoom operation by thousands of concurrent users would place unsustainable load on the primary database instance.

Database Layer: Managed PostgreSQL

We deployed the persistence layer on AWS RDS for PostgreSQL. This managed service provided automated backups and high availability. Crucially, we verified that the PostGIS extension was enabled and optimized the instance class to favor compute (e.g., db.m5 series) to handle the MVT generation efficiently.

Edge Caching Strategy (CDN)

To scale the tile delivery globally, we implemented an aggressive caching strategy using AWS CloudFront as a Content Delivery Network (CDN) in front of the API.

The Spring Boot application emits a specific HTTP header: Cache-Control: public, max-age=600. This instruction tells CloudFront (and downstream browsers) to cache the binary tile for 10 minutes.

Cache Key: The cache key is composed of the URL path (/api/tiles/z/x/y.pbf) plus the unique query parameters (orgId, productId).
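
A sketch of how a tile request path, and therefore the CDN cache key, is composed. The parameter names match the ones above; the helper itself is hypothetical:

```typescript
// Build the tile request path. CloudFront's cache key is this path plus
// the whitelisted query parameters, so two tenants never share a cached tile.
function tileUrl(z: number, x: number, y: number, orgId: string, productId: string): string {
  const params = new URLSearchParams({ orgId, productId });
  return `/api/tiles/${z}/${x}/${y}.pbf?${params.toString()}`;
}
```

Including the tenant and product identifiers in the key is what makes aggressive caching safe: each cached object is scoped to exactly one filter combination.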

Impact: The database only computes a specific tile once per 10 minutes. All subsequent requests for that same geographic area—regardless of user location—are served directly from the CDN edge location. This reduces database load by over 99% for high-traffic regions and ensures consistent low latency globally.

Conclusion and Final Benchmarks

The architectural migration from client-side processing to server-side vector tiles yielded measurable, order-of-magnitude improvements across all key metrics:

  • Throughput Capacity: Successfully scaled from a hard limit of ~5,000 points to supporting datasets exceeding millions of records.
  • Load Time Latency: Reduced initial map load time from 30+ seconds (JSON parsing) to < 100ms (PBF streaming).
  • Bandwidth Efficiency: Achieved a 95% reduction in network payload size by switching from verbose JSON to binary Protocol Buffers.
  • Rendering Performance: The MapLibre WebGL engine maintains a consistent 60 FPS frame rate during panning and zooming, regardless of data density.

This implementation demonstrates that for geospatial applications at scale, the browser is not the place for data processing. By leveraging the database for spatial computation and the GPU for rendering, we delivered a system that is robust, scalable, and highly performant.