MySQL Spatial Functions for Geo-Enabled Applications
Geo-enabled (or location enabled) applications are very common nowadays and many of them use MySQL spatial functions. The common tasks for such applications are:
- Find all points of interests (i.e. coffee shops) around (i.e. a 10 mile radius) the given location (latitude and longitude). For example we want to show this to a user of the mobile application when we know his/her approximate location. (This usually means we need to calculate a distance between 2 points on Earth).
- Find a ZIP code (U.S. Postal address) for the given location or determine if this location is within the given area. Another example is to find a school district for the given property.
MySQL spatial functions were originally included (implementation follows a subset of OpenGIS standard). However, there are 2 major limitation of MySQL spatial functions that can make it difficult to use those functions in geo-enabled applications:
- Distance between 2 points. The “distance” function was not implemented before MySQL 5.6. In addition (even in MySQL 5.6), all calculations (e.g. distance between 2 points) are done using a planar coordinate system (Euclidean geometry). For the distance between 2 points on Earth this can produce incorrect results.
- Determine if the point is inside a polygon. Before MySQL 5.6 the functions that test the spatial relationships between 2 geometries (i.e. find if the given point is within a polygon) only used a Minimum Bounding Rectangle (MBR). This is a major limitation for example #2 above (I will explain it below).
In my old presentation for the 2006 MySQL User Conference I showed how to calculate distances on Earth in MySQL without using the MySQL spatial functions. In short, one can store the latitude and longitude coordinates directly in MySQL fields (decimal) and use a haversine formula to calculate distance.
New MySQL Spatial Functions in 5.6
The good news is:
1) MySQL 5.6 adds a set of new functions (some of them are not 100% documented though) that use the object shapes rather than the MBR to calculate spatial relationships. Those new functions begins with “ST_”, i.e.
- contains(g1, g2) uses MBR only (not exact!)
- st_contains(g1, g2) uses exact shapes
2) MySQL 5.6 implements st_distance(g1, g2) function that calculates the distance between 2 geometries, which is currently not documented (I’ve filed the feature request to document the st_distance function in MySQL)
The bad news is:
1) All functions still only use the planar system coordinates. Different SRIDs are not supported.
2) Spatial indexes (RTREE) are only supported for MyISAM tables. One can use the functions for InnoDB tables, but it will not use spatial keys.
Example of MySQL’s MBR “false positives”
To illustrate why we do not want to use MBR-based functions for geospatial search, I’ve generated 2 polygons that represent 2 zip code boundaries in San Francisco, CA and placed it on Google Maps.
The blue rectangle represents the Minimum Bounding Rectangle of Zip code “91102” (I’ve used envelope() mysql function to obtain coordinates for the MBR). As we can see it covers both zip code 94103 and 94102. In this case if we have coordinates of a building in the city’s “south of market” district (ZIP 91103) and try to find a zip code it belongs to using the “contains()” function we will have a “false positives”:
In this particular example we got 3 zip codes as the MBR of 94158 also overlaps this area. Another point in “south of market” can actually produce 4 different zip codes. However, in MySQL 5.6 we can use the new st_contains function:
As we can see st_contains() produces the correct results.
Find a ZIP code for the given location
Starting with MySQL 5.6 one can use the MySQL spatial functions st_contains or st_within to find if the given point is inside the given polygon. In our scenario we will need to find the zip code for the given latitude and longitude. To do that in MySQL we can perform the following steps:
- Load the zip code boundaries into MySQL as a multipoligon. There are a number of ways to get this done, one way is to download the shape files from the Census website and convert them to MySQL using org2org utility. (I will describe this in more detail in upcoming blog posts). The data will be stored as MySQL Geometry object, to convert it to text we can use astext(geom) function.
- Use the st_contains() or st_within() functions:
or
Spatial Index for “ST_” functions
MyISAM tables support Spatial indexes, so the above queries will use those indexes. Example:
As we can see our spatial index is used for those functions. If we ignore or remove the index, the query will run significantly slower:
The InnoDB engine does not support spatial indexes, so those queries will be slow. As zip boundaries does not change often we can potentially use MyISAM tables for them.
Find all coffee shops in a 10-mile radius
MySQL 5.6 supports st_distance functions with 2 drawbacks:
- It only supports planar coordinates
- It does not use index
Given those major limitations, it is not very easy to use st_distance function for the geo enabled applications. If we simply need to find a distance between 2 points it is easier to store lat, lon directly and use harvesine expression (as described above).
However it is still possible to use the st_distance() if we do not need exact numbers for the distance between 2 points (i.e. we only need to sort by distance). In our example, to find all coffee shops we will need to:
- Get the 10 mile radius MBR and use “within()” or “st_within()” function
- Use st_distance function in the order by clause
First, we will calculate an envelope (square) to include approximately 10 miles, using the following approximations:
- 1 degree of latitude ~= 69 miles
- 1 degree of longitude ~= cos(latitude)*69 miles
@lat and @lon in this example are the coordinates for the San Francisco International Airport (SFO).
This will give us a set of coordinates (points) for the lower left and upper right corner of our square. Then we can use a MySQL’s envelope function to generate the MBR (we use linestring to draw a line between the 2 generated points and then envelope to draw an square):
The “envelope” will look like this:
This is not exactly a 10-mile radius, however it may be close enough. Now we can find all points around SFO airport and sort by distance.
As we can see from the explain it will use the spatial key on SHAPE and will only scan 430 rows, rather than millions of POIs.
The query does not show the exact distance (this may be ok if we only need to output the points on the map). If we need to show the distance we can use the harvesine formula to calculate that. For example we can create the following stored function to implement the calculations:
And then use it for both order by and to displaying the distance. This query will also filter by “coffee”:
Conclusion
MySQL 5.6 implements an additional set of functions that can help create geo-enabled applications with MySQL. Storing polygons boundaries (ZIP code boundaries for example) is efficient and the new spatial functions (st_within, st_contains, etc) will produce correct results and will use spatial (rtree) indexes (for MyISAM tables only). The OpenGIS standard is very common and it is easy to obtain the data in this format or use the standard application which can “talk” this language.
Unfortunately, st_distance function is not very usable for calculating distance between 2 points on Earth and it does not use an index. In this case it is still more feasible to calculate distances manually using the harvesine formula. Hopefully this will be fixed in the next mysql release.
There are also some other limitations, for example st_union() function only supports 2 arguments and does not support an array, so it can’t be used in a queries like “select st_union(geom) from zipcodes group by state”.
Links
- New spatial functions in MySQL 5.6
- Harvesine formula in Wikipedia
- My 2006 MySQL User Conference talk: geo search in MySQL and how to effectively calculate distance between 2 points in MySQL using harvesine formula
- Henric Ingo compares the sptatial functions in MySQL. MariaDB, Postgress and MongoDB: talk at Percona Live conference, blog post
- Opensource POI database download
- Google maps API: I’ve used it to generate pictures for the examples in this post
- Distance between to points on Earth: online distance calculator
And finally, let me know in the comments how you use MySQL for geo enabled applications. In my next post I will talk more about basics of the MySQL geo spatial extension as well as Sphinx Search‘s implementation of the Geospatial functions.
Hi Alex,
Thank you for this post. I am having this problem above as I would like to use your query concept above but on a database of 800,000 records. Do you think it will perform well?
I posted my query here. Any help would be stellar!!!
http://stackoverflow.com/questions/20393104/mysql-spatial-indexing-with-multiple-left-join-on-large-database
Hi Matt,
Here is what you need:
1. MySQL 5.6
2. MyISAM table with the index on field you will use MBRContains() or st_within() functions.
3. Where condition which will filter out only needed rows.
For example:
select astext(shape), name from waypoints
where
order by st_distance(point(@lon, @lat), shape) limit 10;
Will scan ALL rows (800K rows in your case).
And this query:
select astext(shape), name from waypoints
where st_within(shape, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))))
order by st_distance(point(@lon, @lat), shape) limit 10;
Will use an index on the shape field and only scan a number of rows.
So, to make it work firts you will need some filtering in the where clause that will use an index, the distance/st_distance does not use a spatial index
Please let me know if this help.
Hi Alex,
Thank you for the help. Much appreciated. Please contact me as we are interested in working with you to help optimize some of our geo location tables for our directory software platform.
Hi Matt,
I will be happy to work with you using Percona Consulting offering (https://www.percona.com/products/mysql-consulting/overview). You can also contact me by email or my linkedin (http://www.linkedin.com/in/alexanderrubin).
Meanwhile please feel free to post any questions here.
Awesome write up, thanks!
Nitpick, it’s haversine, not harvesine.
Hu Alex,
I’ve followed your brilliant guide to create a db which contains poi. Unfortunately once I’m explaining the within distance query I see that if there are more then 4 items explain shows ALL as a type of a query. Could you please spell it out for my why is that and how should I overcome it?
My table is:
create table marketing_action
(id int primary key auto_increment, times_seen mediumint,likes mediumint, ranking smallint,geo_position geometry not null, title varchar(35), descrption varchar(60) )
default character set utf8 default collate utf8_general_ci
engine = myisam;
alter table marketing_action add spatial index geo_position_spatial_index (geo_position);
and query which I use is:
select harvesine(y(geo_position),x(geo_position),@lat,@lon) as dist ,X(geo_position),Y(geo_position), title from marketing_action
where st_within(geo_position, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))));
the data I populate is:
truncate table marketing_action;
insert into marketing_action set times_seen = 10,likes = 3, ranking = 30, geo_position = GeomFromText(‘POINT (4.82677 52.27516)’),title = ‘Nieuwe Schoenen bij Yanaike en Ron’;
insert into marketing_action set times_seen = 10,likes = 3, ranking = 22, geo_position = GeomFromText(‘POINT (4.82677 52.26516)’),title = ‘Menu van de dag bij Smerige Bloem restaurant’;
insert into marketing_action set times_seen = 10,likes = 3, ranking = 38, geo_position = GeomFromText(‘POINT (4.82677 52.22345)’),title = ‘Super korting bij Wijnen van Jordy’;
insert into marketing_action set times_seen = 10,likes = 3, ranking = 22, geo_position = GeomFromText(‘POINT (5.01252 52.30230)’),title = ‘OP=OP bij Yofkes Niks’;
insert into marketing_action set times_seen = 10,likes = 3, ranking = 22, geo_position = GeomFromText(‘POINT (5.00235 52.27000)’),title = ‘OP=OP bij kruiden’;
insert into marketing_action set times_seen = 10,likes = 3, ranking = 15, geo_position = GeomFromText(‘POINT (4.82677 52.20123)’),title = ‘Een concert bij het restaurant happje snappje’;
As I said once I remove 2 of random items MySQL starts using spatial index.
Thanks up front for your answer!!
Hi Alex,
very informative article, thanks
You’ve mentioned that you might be able to cover in more details in the future:
“Load the zip code boundaries into MySQL as a multipoligon. There are a number of ways to get this done, one way is to download the shape files from the Census website and convert them to MySQL using org2org utility. (I will describe this in more detail in upcoming blog posts)”
is there any way you can publish a quick walkthrough or perhaps a link reference to the process
Thanks a lot
Val, just published the followup post with the description, please let me know if you will have any questions. See here: http://www.mysqlperformanceblog.com/2014/03/24/creating-geo-enabled-applications-mysql-5-6/
Aleksander, sorry I’ve missed you comment. I believe this just too small amount of rows and MySQL will prefer to use full table scan instead of index. Have you tried with the larger data amount?
Thanks for your great post. I have following scenario:
The following query takes 22 sec in table. I wanna mention than in my business table I have 10+ million data. Also, by doing a count(name) I found it got 1992312 data (~2 million) . Is it expected or Im doing anything wrong? Im using mysql 5.6.16 and I do have a spatial index on spatial_location field.
Query:
set @lon = -74.00597309999999;
set @lat= 40.7143528;
set @dist = 200;
set @rlon1 = @lon-@dist/abs(cos(radians(@lat))*69);
set @rlon2 = @lon+@dist/abs(cos(radians(@lat))*69);
set @rlat1 = @lat-(@dist/69);
set @rlat2 = @lat+(@dist/69);
select name from business
where st_within(spatial_location, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))))
order by st_distance(point(@lon, @lat), spatial_location)
limit 10;
This is the result of explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE business range spatial_location spatial_location 34 NULL 941211 “Using where”
Hi Alex,
That’s indeed a solution with more rows the query starts to be executed with an index. Thanks!
Sakib,
Yes, you are right, in this example if you have a lots of rows in the business table it can have to scan too many rows. For 200 miles it may scan a lot.
For example, here is my test for Openstreetmaps data:
mysql> select name from points where st_within(SHAPE, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2)))) order by st_distance(point(@lon, @lat), SHAPE) limit 10;
+—————————-+
| name |
+—————————-+
| Techno Tourist |
| Sun Building |
| Broadway Chambers Building |
| NULL |
| NULL |
| Blue Spoon |
| NULL |
| City Hall (R) |
| NULL |
| New York Ladder Company 1 |
+—————————-+
10 rows in set (17.59 sec)
Explain:
id: 1
select_type: SIMPLE
table: points
type: range
possible_keys: SHAPE
key: SHAPE
key_len: 34
ref: NULL
rows: 522409
Extra: Using where; Using filesort
1 row in set (0.01 sec)
In this case it will scan (estimated) 500K rows and will only need 10. If you have large amount of rows in the table I would suggest:
1. Use shorter distance. Example:
set @lon = -74.00597309999999;
set @lat= 40.7143528;
set @dist = 10;
id: 1
select_type: SIMPLE
table: points
type: range
possible_keys: SHAPE
key: SHAPE
key_len: 34
ref: NULL
rows: 492
Extra: Using where; Using filesort
1 row in set (0.00 sec)
Result:
10 rows in set (0.39 sec)
2. Use other filters in the where clause and filter by restaurants only for example.
Hello Alex,![🙁]()
Thanks for quick reply. So, it looks like its a performance issue of st_within() method. Prob is that I cant keep a smaller value for @dist as for rural area the amount of data would be less. I tried the same query in NoSQL database (MongoDB) and for the same query the response time is 0.3 second. But, NoSQL has poor fulltext search functionality which is better in MySQL. So, looks like Im kind of stuck in middle of MySQL and NoSQL
Sakib,
One potential way of fixing this problem will do the “incremental” approach, first try with 10 miles then with 25 miles, etc and stop when you will get 10 results back.
Another way is to try SphinxSearch, it will support both fulltext search and geodistance function/
Ya thats is what I was thinking too. But the main problem is to create an algorithm to get the desired number of data with as less increment step as possible. Lets think we need to fetch 1000 data. and with @dist =10, we get 150 data. So, on the next step, we should increment by 1000/150 = 6.66 ~ 7 times. Means, next query will run with @dist = (10 * 7) =70.
I know this may not be a perfect solution but what do you think? Can you suggest me any better idea?
Sakib,
Looks ok. Another option will be to filter by type of business for example: “where st_within(…) and type=’restaurant'” (but it will not be a combined index) or “partition” the table by the type of business.
Hello Alex,![🙂]()
The app is a directory listing site and I need to show all the businesses within a radius around a given point. So, unfortunate I cant use any additional filter here. I will create a stored procedure and go for the incremental process that we discuss here. Thanks for helping me out
Hi Alexander Rubin,
I need to get distance of a ponit (x, y) from a polygon.
Do we have any function in mysql to calculate the same??
Regards,
Rahul
It seems like the Polygon-in-Polygon-test does not work most of the time.
SELECT ST_CONTAINS(GeomFromText(‘POLYGON((0 0,0 1,2 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((0 1,2 2,1 1,0 1))’)) as ‘ST_CONTAINS’;
+————-+
| ST_CON|
+————-+
| 0 |
+————-+
SELECT ST_CONTAINS(GeomFromText(‘POLYGON((0 0,0 1,0 2,1 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((0 1,0 2,1 2,1 1,0 1))’)) as ‘ST_CONTAINS’;
+————-+
| ST_CON |
+————-+
| 0 |
+————-+
But It gives true answer while using MBRCONTAINS
SELECT MBRCONTAINS(GeomFromText(‘POLYGON((0 0,0 1,2 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((0 1,2 2,1 1,0 1))’)) as ‘MBRCON’;
+————-+
| MBRCON|
+————-+
| 1 |
+————-+
SELECT MBRCONTAINS(GeomFromText(‘POLYGON((0 0,0 1,0 2,1 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((0 1,0 2,1 2,1 1,0 1))’)) as ‘MBRCON’;
+————-+
| MBRCON |
+————-+
| 1 |
+————-+
SELECT MBRContains(GeomFromText(‘POLYGON((0 0,0 1,2 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((3 3,3 4,4 4,3 3))’)) as ‘MBRCON’;
+————-+
| MBRCON |
+————-+
| 0 |
+————-+
you can also try my variant. that is how to determine point or points inside circle
>>> SELECT * FROM
locator
WHERE SQRT(POW(X(center
) – 49.843317 , 2) + POW(Y(center
) – 24.026642, 2)) * 100 <radius
http://dexxtr.com/post/83498801191/how-to-determine-point-inside-circle-using-mysql
Hope this helps.
Hello Alexander,
I would like to combine a spatial index with a Btree index on a varchar. For example I would like to know where all airports are within a certain region, and to speed up the query I don’t want to consider rows in my table that represent coffeeshops.
So in addition to the spatial index I also want to provide a type of the place, is that possible? And could you give me some hints on where I could find the information to do so?
My question is also here:
http://stackoverflow.com/questions/24088337/how-can-i-combine-a-btree-and-geograhpical-or-spatial-index-to-speed-up-my-query
Thanks a lot in advance.
Sebastian
Sebastian,
Unfortunately, it does not work – MySQL will try to choose the best index, based on index statistics, either spatial or btree. What you can do thou is completely avoid using spatial index, calculate the MBR manually.
Here are 2 way to do it.
a) Create a table with airports only
b)
1. Add lat, lon declared as double (not a point) and populate it
2. Calculate MBR in the code approximately
3. Use the query
Select …
where point_type = ‘airport’ and lat between X1 and Y1 and lon between X2 and Y2
order by haversine_distance() … desc limit 10;
Where X1, Y1, X2, Y2 are constant that you have calculated manually to “draw” a rectangle, similar to contains does.
Please let me know of this make sense
Rahul Kumar Mehta,
st_distance can calculate the distance between point and polygon. You can also use centroid() function to get the center of the polygon.
Hi Alexander,
Thanks for your help and quick reply. I tried option b) the approach using a multicolumn BTREE index, however it appeared to be twice as slow as my original approach using the spatial index. Therefore I will stick with the spatial index for now and split up my table for different types of places, which I think is the best solution in terms of performance.
hi mr.alex, my english not good but i need ur help
i will to show polyline to leaflet map from mysql database, my coordinate type is: linestring,
values ex: -7.98478 112.63213,-7.90000 112.60000 (two point).
this leaflet function for showing polyline to map
L.polyline([[-7.94432, 112.64917],[-7.94993, 112.66085],[-7.95401, 112.69947]]).addTo(map);
what query can use?
Hello Alexander,
Thank you for the great article.
I have a problem when I am inserting simple spatial data into mysql. Instead of actual data I am getting some weird symbols in the rows. These are the queries that I am running, directly taken from the mysql documentation:
CREATE TABLE geom (g GEOMETRY);
INSERT INTO geom VALUES (GeomFromText(‘POINT(1 1)’));
This is the result of the query:
‘�?�?@@’
‘�?�?’
‘�?�?’
‘�?�?’
Do you have any idea why is this happening?
Thank you a lot in advance!
Hello Alex,
Thank you for the great article.
How can I fetch polygons in parent child manner
http://stackoverflow.com/questions/25115565/how-to-design-a-databases-table-to-store-mysql-spatial-data-in-a-parent-child-r
Hello sir,,, i dont know waypoints field….plz create table with starting from end,,,i want to find points of interest using mysql spatial queries,,,plz help me
Hi Alex,
thank you for this great post! It looks like InnoDB supports spatial indexes now with MySQL 5.7.5:
http://dev.mysql.com/doc/refman/5.7/en/creating-spatial-indexes.html
“For MyISAM and (as of MySQL 5.7.5) InnoDB tables, MySQL can create spatial indexes using syntax similar to that for creating regular indexes, but using the SPATIAL keyword.”
Hi Alex
First of all I must admit that I am a ‘Silver Surfer’ and am not a coder as such, so if the answer to my question is positive I will need to find some kind so soul to help me with the coding!
My question is based on the known positions of aircraft.
I collect position data from aircraft using an ADSB receiver and I would like to know if it is possible to work out if they are within a set distance (10 miles) from a known point (an airfield)
Please be gentle with me
Thanks and Regards
Dave
David Guy,
If you load your points to MySQL table (aircraft_positions) you can then run the query:
select * from aircraft_positions where st_distance(pos, point(@lat, @lon)) < 10 Where @lat, @lon is the lat and log of the known point (an airfield) and pos should be stored as geography in MySQL. Hope that helps.
i am using following query to get distance from given lat lng and sorted.
but i am getting empty resultset even when i have a value with same lat lng (29.39,81.31) in my db.where am i wrong
Query is ..
set @lat= 29.39;
set @lon = 81.31;
set @dist = 10;
set @rlon1 = @lon-@dist/abs(cos(radians(@lat))*69);
set @rlon2 = @lon+@dist/abs(cos(radians(@lat))*69);
set @rlat1 = @lat-(@dist/69);
set @rlat2 = @lat+(@dist/69);
select harvesine(x(pt), y(pt), @lat, @lon ) as dist, id from position
where st_within(pt, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))))
order by dist limit 10;
Structure is http://postimg.org/image/o90p7ycpf/
Thanks for this post – it was exactly what I needed on a project I am working on. Good stuff and greatly appreciated.
Alexander Rubin,
I need to select records which present around 30 miles circle from current location, I used geohash and also long and lat. I have millions of record in cluster. Which will be the best option to query data with high performance.
Hi everyone,
I would like to draw a 500m buffer around a GPS point with mySQL.
I done the following query :
set @lat= 48.856614;
set @lon = 2.352222;
set @dist = 500;
select st_astext(st_buffer(point(@lat,@lon), @dist));
Unfortunately, it seems that all points returned by MySQL are not GPS point…
What am I doing wrong ?
Dear Alex good Night, see if you can help me, can you check what is wrong in my code above, because is not working.
my column geom -22.91602,-41.97678/-22.94322,-42.11171/-23.02634,-41.99945/
my query $sql = “select * from cerca_gps where ST_Contains(geon, point(-22.96850902, -42.0272541));”;
Spatial is supported with TokuDB?
Hi Alex!
Can you please share the “waypoints” table structure?