Geospatial indexing, or Geocoding, is the process of indexing latitude-longitude pairs to small subdivisions of geographical space, and it is a technique that we data scientists often find ourselves using when faced with geospatial data.
Though the first popular geospatial indexing technique “Geohash” was invented as recently as 2008, indexing latitude-longitude pairs to manageable subdidivisions of space is hardly a new concept. Governments have been breaking up their land into states, provinces, counties, and postal codes for centuries for all sorts of applications, such as taking censuses and aggregating votes for elections.
Rather than using the manual techniques used by governments, we data scientists use modern computational techniques to execute such spatial subdividing, and we do so for our own purposes: analytics, feature-engineering, granular AB testing by geographic subdivision, indexing geospatial databases, and more.
Geospatial indexing is a thoroughly developed area of computer science, and geospatial indexing tools can bring a lot of power and richness to our models and analyses. What makes geospatial indexing techniques further exciting, is that a look under their proverbial hoods reveals eclectic amalgams of other mathematical tools, such as space-filling curves, map projections, tessellations, and more!
This post will explore three of today’s most popular geospatial indexing tools — where they come from, how they work, what makes them different from one another, and how you can get started using them. In chronological order, and from least to greatest complexity, we’ll look at:
- Geohash
- Google S2
- Uber H3
It will conclude by comparing these tools, and recommending when you might want to use one over another.
Before getting started, note that these tools include much functionality beyond basic geospatial indexing: polygon intersection, polygon containment checks, line containment checks, generating cell-coverings of geographical spaces, retrieval of geospatially indexed cells’ neighbors, and more. This post, however, focuses strictly on geospatial indexing functionality.
Geohash, invented in 2008 by Gustavo Niemeyer, is the earliest created geospatial indexing tool [1]. It enables its users to map…