The Overture Maps Foundation released the first version of its open map dataset last week, and many blog posts and articles have been written about it. Even I wrote a post on how to access the data in this release.
There have been a few posts about Overture and what it does, like this one, but I find that they don't really go into depth about what Overture is actually doing, and what it is doing differently.
At a glance, if you go to the Overture website, you will see that it has internet behemoths like Amazon, Meta & Microsoft on one hand, and geospatial leaders like TomTom and Esri on the other. But what are they trying to do? Is beating Google the only goal? How would you do that? What do you need to do differently? What role does OSM play in this? What does Overture do differently, compared to the data released by OSM?
Let's start by looking at OSM, and where it falls short. OSM is the largest open geospatial data source, created by contributors and volunteers around the world. It forms the basis on top of which many organisations, such as Mapbox, Stadia, Felt, GraphHopper and MapTiler, build their business. They can do so because OSM freely offers its data under the highly permissive ODbL licence.
But there are several impediments to using the data. The data is distributed in OSM's PBF (Protocolbuffer Binary Format), and the data model it follows is OSM's flexible tagging structure. That flexibility comes at a price: the data model can be quite difficult to query, understand and normalise.
Then there is the question of completeness. OSM by no means contains all the roads in the world, or all the buildings in the world. It depends on contributors adding, annotating, and editing those records. The OSM community began with a focus on outdoor mapping (field mapping, or in-situ mapping), and armchair mapping is still frowned upon in many communities.
Lastly, the needs of the community and the needs of large corporates often do not overlap. Individual volunteers (including me) are often interested only in the areas around them, and tend to spend their time perfecting the map in their area of interest rather than completing it in far-off places. This creates hotspots with lots of edits, while the rest of the world (which is the majority of it) sees very few editors. Ironically, the areas where mapping does not happen are often the places that need the most urgent mapping focus.
Meta & Microsoft have spent a lot of effort releasing their open datasets and offering the Rapid editor to allow human-in-the-loop automated editing, but they have faced some pushback on including these machine-generated inputs in OSM.
This probably led to (and I am speculating here) Meta and Microsoft realising that this way of working was not sustainable, and that there should be a better way of creating data that is more usable and has better coverage. This, I think, is why these companies came together to create the Overture Maps Foundation.
If you have been following what they have been posting and saying in various conferences and documents, a picture has started to emerge. They are doing things in an intentional way, aiming to be better than what is currently available (i.e. OSM's data releases).
In my view, Overture Maps differentiates itself in the following ways:
Data standardisation
One of the main challenges of the OSM data model is the very flexibility it offers. Depending on what you need, you have to query for different tags, not all of which are documented to the same level of detail. There is no hierarchy, and things which you would expect to be quite similar might require completely different tags. For example, a carpenter's workplace is tagged with craft=carpenter, while a doctor's is tagged with amenity=doctors. Then there are the ambiguous tags: a pharmacy could be tagged with amenity=pharmacy, shop=chemist, or healthcare=pharmacy, depending on who you ask. This data needs to be standardised, so that when consuming it, users have only one place to search for what they need. To solve this problem, Overture standardised and released its data specification in June 2023. This, I think, is the most impactful way in which Overture distinguishes itself from OSM.
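To make the pain concrete, here is a minimal sketch of what finding all the pharmacies in an area involves in OSM today, using the public Overpass API (Bengaluru is just an arbitrary example area): you have to union every tag variant yourself.

```python
import requests

# The same real-world concept (a pharmacy) hides behind three different
# tags, so the query has to union all of them explicitly.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"
query = """
[out:json][timeout:60];
area["name"="Bengaluru"]->.a;
(
  node["amenity"="pharmacy"](area.a);
  node["shop"="chemist"](area.a);
  node["healthcare"="pharmacy"](area.a);
);
out;
"""
response = requests.post(OVERPASS_URL, data={"data": query})
elements = response.json()["elements"]
print(f"Found {len(elements)} pharmacy-like nodes across three tag schemes")
```

With a standardised specification, that entire union collapses into a single category lookup.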
Data QC
When it comes to using crowdsourced data, large organisations like Meta and Amazon have valid concerns about data quality. How do you know that the data is correct? You don't want to send your customers down a road that doesn't exist, or show duplicate shops with the same name. And no one wants to display a slur in place of the name of one of the largest cities in the world.
Solving this problem requires spending a lot of time and effort doing QC on the data. Much of this can be automated, and this is something that Meta has been working on for quite some time, as can be seen from their blog posts over the years: https://engineering.fb.com/2019/09/30/ml-applications/mars/ and https://engineering.fb.com/2023/02/07/web/basemap-facebook-instagram-whatsapp-improvements/
These efforts require a lot of money, and they are what should lead to this data being used by more organisations around the world.
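To give a flavour of what automated QC can look like, here is a toy sketch (my own illustration, not Overture's actual pipeline) that flags likely duplicate shops by combining name similarity with physical proximity:

```python
from difflib import SequenceMatcher
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def likely_duplicates(pois, max_dist_m=50, min_name_sim=0.9):
    """Flag pairs of POIs that sit close together with near-identical names."""
    pairs = []
    for i, a in enumerate(pois):
        for b in pois[i + 1:]:
            close = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_dist_m
            similar = SequenceMatcher(
                None, a["name"].lower(), b["name"].lower()
            ).ratio() >= min_name_sim
            if close and similar:
                pairs.append((a, b))
    return pairs

pois = [
    {"name": "City Pharmacy", "lat": 12.9716, "lon": 77.5946},
    {"name": "City Pharmacy ", "lat": 12.9717, "lon": 77.5947},
]
print(likely_duplicates(pois))  # flags the pair as a likely duplicate
```

Real pipelines are far more sophisticated than this, but the shape of the problem (and why it is automatable at scale) is the same.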
Referencing system
This is something that I have seen no one talk about. In June of this year, Overture also released the Overture Global Entity Reference System (GERS), which should aid 'interoperability by providing a system to structure, encode, and reference map data to a shared universal reference (GERS ID)'. In simple terms, this would allow you to build and consume datasets from other providers and apply them on top of Overture's map release. For example, a provider could generate data on road quality, such as surface smoothness and potholes, and you could just apply it on top of the road data that Overture releases. This seems to be related to TomTom's previous work on OpenLR.
I also suspect that, going ahead, TomTom may stop selling road geometry data, and instead spend its money and effort collecting and selling road attribute data, such as names, speed limits, lanes, etc., which can be conflated with the roads released by Overture. (Again, this is only my speculation. I have no insider knowledge of TomTom's internal plans.)
This reference system will allow the ingestion and merging of road-related datasets from different sources, a problem which is not solved today.
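Here is a hypothetical sketch of what that could look like in practice. All of the column names below are made up for illustration; only the idea of joining on a shared GERS ID comes from Overture's announcement.

```python
import pandas as pd

# Overture's road segments, each carrying a stable GERS ID.
# (Column names are invented for illustration.)
overture_roads = pd.DataFrame({
    "gers_id": ["seg-001", "seg-002"],
    "road_class": ["primary", "residential"],
})

# A third-party vendor's road-quality dataset, keyed by the same GERS IDs.
vendor_quality = pd.DataFrame({
    "gers_id": ["seg-001", "seg-002"],
    "surface_smoothness": [0.82, 0.41],  # vendor's own 0-1 metric
    "pothole_count": [0, 7],
})

# Because both sides share the universal reference, enrichment is a plain
# join rather than an error-prone geometry-matching (conflation) step.
enriched = overture_roads.merge(vendor_quality, on="gers_id", how="left")
print(enriched)
```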
Programmer-friendly data format
OSM's PBF format is not the easiest to work with: you have to use specific tools like Osmosis and Osmium to query it, and most developers ingest the data into an RDBMS using tools like osm2pgsql.
Releasing the data in Parquet format will definitely make it easier for developers to access and query, especially since the existing tooling for converting OSM data to Parquet hasn't seen any updates in quite some time.
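As an illustration, a modern columnar engine like DuckDB can query the release straight out of object storage. The S3 path below is from the July 2023 alpha release and will change with future releases, so treat it as an example and check Overture's documentation for the current one.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2';")

# Count the places in the release without downloading the files first.
# (Path is from the 2023-07-26 alpha release; newer releases differ.)
count = con.execute("""
    SELECT count(*)
    FROM read_parquet(
        's3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=places/type=place/*',
        hive_partitioning=1
    )
""").fetchone()[0]
print(f"{count} places in the release")
```

No PBF parsers, no database import: just a query against remote files.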
AI created data
This was one of the points holding back the completeness of the data. It has been several years since Microsoft released their building dataset, but it has been a slow process of manually checking each and every building and adding it to OSM. It is the same story with Meta's road dataset. Meta had been releasing the data separately as sidecar files in their Daylight distribution, but users had to actively choose to add these AI-created datasets. Overture has decided to include the AI-generated records directly, which helps by making consuming this data a simple, single-source process.
Open data from sources like Meta
This is where Overture's data differs most from OSM's. This alpha release contains never-before-seen data sourced from Meta, and Overture also plans to ingest data from local governments and other open data sources, as you can currently see in the Rapid editor.
Where does this leave the OpenStreetMap project? Will the project die? I don't think so.
I foresee that:
1. The community will continue editing and adding data in their neighbourhoods.
2. Overture will take that data, standardise it, and pass it through a quality process.
3. The data releases will be used by applications that serve everyday consumers.
The first and the last happen today; Overture fills the gap in the middle, which will lead to more and more applications and services using this data.