Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Efficient Distance Querying in MySQL (aaronfrancis.com)
61 points by aarondf on Nov 10, 2021 | hide | past | favorite | 16 comments


This is an interesting article about strategies to use when traditional indexes just won't do, but for the love of the index please use MySQL's (or postgres' or sqlite's) built in spatial index for this particular class of problems. It will does this sort of thing much, much more efficiently than 99% of in house solutions.

https://dev.mysql.com/doc/refman/8.0/en/spatial-types.html

http://postgis.refractions.net/

https://sqlite.org/rtree.html


With mysql 5.7, range searches on indexes (BETWEEN() or equivalent) are notoriously slow with large datasets.

We discovered that spacial index (flatten to 1 dimension) can be used to solve a common issue: ip range queries.

The query costs dropped and all the query metrics improved.


Does it improve in newer versions of MySQL?

I'll need to so similar work soon.


Great post. Just to be pedantic for a moment:

> it gives you results that are 100% correct, which is pretty important!

Not quite 100% correct, as the `ST_Distance_Sphere` function uses a spherical earth model, & in fact the earth is an oblate spheroid :) The slower-but-more-accurate `ST_Distance_Spheroid` also exists if you care about such things.


It's hardly pedantic, the differences can be on the order of kilometers over large distances. The GPS uses WGS 84 as the reference ellipsoid model, for example.

(if my memory serves)


A few years back I was working at a real estate firm where you could search for available rentals by location. The houses were saved with their latitude and longitude values. Their original search implementation used to fetch all the records from the database and then filter them based on some parameters - IIRC translate back from the lat-long to some location and check if it fell within the area. The number of rows was not too high - 10k or so, so this used to run in single-digit seconds.

At that time MySQL had neither spatial indices nor a native haversine method. I switched out the search method to translate the area name to a lat-long on the frontend (based on google maps) and then use a haversine in the query to retrieve results. It was still a couple of orders of magnitude faster than the previous solution, and more accurate to boot.


This was super informative – I especially like that you showed your thinking through a step by step process, instead of just "here's the best thing to do". That makes it possible to apply this same lesson to other areas as well.


If for whatever reason you can't[1] use a spatial index, you can use something like geohash[2] as a generalized version of the "generated prefix columns that you then use a prefix function on" concept mentioned in the article.

[1] TBH the first thing to look at is whether this is a real constraint

[2] https://en.wikipedia.org/wiki/Geohash


This is clever and a great read. Reminds me a bit of finding hashes, for example, that are within a certain Hamming distance to a target value (hash). If the distance you're interested in is small (e.g., 1 bit), then it might be faster to check for 64 exact target values than to check all values for off-by-1 bit. Very cool. Thank you for posting.


To get around not using the more of the index, the query could be split into parts, one for each tranche combined with UNION ALL. Each part can examine a narrow subset of rows, disjoint from the other parts.


If you are choosing a backend for a project, and you think there's a chance that your use case could require spatial querying: use postgres, so that if the need arises, you can enable postgis. You will get all of the features described here natively, with support for indexing and other improvements as well.


I feel like I have had this in oracle for a dozen years without anything custom. Interesting approach when there's no built in spatial package.


The row estimates in EXPLAIN are just estimates and often very wrong. You should count actual rows read instead via something like SHOW HANDLER.


semi related...I once spent hours working on a bug only to the later realize that mysql uses longitude latitude in spatial methods instead of the other way around..fun times!


Author is quite clever in reinventing solutions for solved problems.


title should be updated to say 2d distances.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: