As a quick side project, I’ve started working on the problem that postal code area information in Openstreetmap is often insufficient. Why? Because it’s a nice showcase of how flexible the Core SDK is, it allows me to stress-test the 2D / orthogonal handling code paths, and it is a good opportunity to potentially contribute to this great, crowdsourced project.
The Problem
If you have ever tried to search for an address on openstreetmap.org, you may have noticed that the postal code information is quite often missing or incorrect. The German coverage of postal code areas is actually pretty impressive, and the rate of incorrect postal area associations seems to be in the region of tenths of a percent. But Austria and Switzerland have large gaps, and Italy, the Netherlands and Poland seem to have no coverage at all.
There are many reasons why the quality of postal code areas in Openstreetmap is often way less than the quality of river banks, streets or other features:
- Address information isn’t visible on satellite images. Hence someone who is tracing an image to capture buildings or streets has no way of knowing the name of the street, let alone the number of the house.
- There is usually no publicly available information that specifies where exactly the boundaries of postal code regions lie. For most countries, it’s even hard to find a list of streets or points of interest with postal code information that is compatible with the strict Openstreetmap license.
- For many use cases, postal code information is not necessary. As a consequence, there is less interest in getting this information into Openstreetmap than, for example, in getting correct street names.
- …
But as soon as one tries to do reverse geo-coding or use Openstreetmap for address validation, the postal code becomes a crucial piece of information.
The Situation in Switzerland
There is however some hope: we don’t need perfect information, because nothing geo-related is really ever perfect. Data gets outdated, people enter information incorrectly into Openstreetmap, etc. We just need something that is good enough… or at least better than what we have right now.
Let’s take a closer look at the situation in Switzerland as an example. The following image shows the OSM boundary relations of type postal_code.
Note that the blue areas at the top of the map are actually not part of Switzerland but part of Germany. They only show up because they were included in the OSM extract data set for Switzerland. As one can see, the actual coverage of postal code areas is pretty appalling.
Things look better when considering admin relations that also have a postal code meta tag:
The shapes of these areas are actually the administrative districts, but someone has added the extra postal code attribute to them. The way the postal system is set up in Switzerland and Austria, this somewhat works, but in the vast majority of cases the administrative region and the postal code region are not the same. For example, in larger cities there is a single level-8 admin boundary for the whole city although it contains multiple postal codes, so adding a postal code to the admin boundary cannot be correct.
Voronoi Diagrams
One fairly obvious idea to reconstruct the postal code areas is to take data points where we know both the location and the postal code and then associate each part of the map with the closest known point. This is known as a Voronoi diagram, and there are algorithms to compute such a diagram efficiently. By merging cells with the same postal code, we can get something that mimics the postal code areas.
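To make the idea concrete, here is a minimal Python sketch. It is not an efficient Voronoi algorithm and not the code used for this post: it simply labels each cell of a coarse grid with the postal code of the nearest data point by brute force. The sample points, the grid size and the function names are made up for illustration, and coordinates are treated as planar.

```python
from collections import defaultdict
from math import hypot

# Hypothetical sample data: (x, y, postal_code) in planar map coordinates.
POINTS = [
    (1.0, 1.0, "8001"),
    (2.0, 1.5, "8001"),
    (6.0, 1.0, "8002"),
    (5.0, 6.0, "8003"),
]


def nearest_postal_code(x, y, points):
    """Return the postal code of the data point closest to (x, y)."""
    _, code = min((hypot(px - x, py - y), code) for px, py, code in points)
    return code


def rasterize_areas(points, width, height):
    """Label every cell of a width x height grid with its nearest postal code.

    Cells are grouped by postal code, not by data point, so the Voronoi
    cells of all points sharing a code end up merged into one area.
    """
    areas = defaultdict(list)
    for gy in range(height):
        for gx in range(width):
            # Sample at the centre of the grid cell.
            code = nearest_postal_code(gx + 0.5, gy + 0.5, points)
            areas[code].append((gx, gy))
    return dict(areas)


areas = rasterize_areas(POINTS, 8, 8)
for code in sorted(areas):
    print(code, len(areas[code]), "cells")
```

The brute-force lookup costs one distance computation per data point per grid cell, which is fine for a toy example but is exactly the cost that a proper Voronoi construction avoids.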
The easiest source of data points is Openstreetmap itself. If we take all the buildings and points of interest (churches, memorials, …) that have a postal code attribute, we get the following map:
Note that this is already filtered against a list of known, valid postal codes. There is some garbage in Openstreetmap where someone has used the postal code attribute but entered additional whitespace, added too many digits, etc.
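A filter of this kind could look roughly like the following sketch. The reference set, the sample records and the function name are hypothetical. The sketch drops anything that does not match a known code exactly instead of trying to repair it.

```python
# Hypothetical reference list; a real one would come from an authoritative source.
VALID_POSTAL_CODES = {"8001", "8002", "3011"}


def filter_valid(tagged_points, valid_codes):
    """Keep only (osm_id, code) pairs whose code exactly matches a known code."""
    return [(osm_id, code) for osm_id, code in tagged_points if code in valid_codes]


# Made-up records illustrating the kinds of garbage mentioned above.
raw = [
    (101, "8001"),
    (102, " 8002"),  # stray whitespace
    (103, "80011"),  # too many digits
    (104, "3011"),
    (105, "9999"),   # well-formed, but not in the reference list
]

clean = filter_valid(raw, VALID_POSTAL_CODES)
print(clean)
```

Whether to discard or normalise near-misses such as the whitespace case is a design choice. Discarding is the more conservative option because it never invents a value.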
Calculating the Voronoi Diagram and merging cells produces the following result:
This is actually not bad. Not bad at all! There are a couple of dozen cases where someone has incorrectly tagged a building or POI, but that is fixable. I’ll probably just do a lookup table where I take the OSM-ID as the key and enter what the correct value would have been. One nice property is that the density of data points is higher in large cities, which is also where precise postal code boundaries matter most.
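Such a lookup table could be as simple as the sketch below. All IDs and codes are invented, and the names are hypothetical. One assumption worth stating: OSM IDs are only unique per element type (node, way, relation), so the sketch keys the table on a (type, id) pair instead of the bare number.

```python
# Hypothetical override table: (element_type, osm_id) -> corrected postal code.
OVERRIDES = {
    ("node", 102): "8002",
    ("way", 2001): "3011",
}


def apply_overrides(tagged_points, overrides):
    """Replace the postal code of every element listed in the override table."""
    return [
        (element, overrides.get(element, code))
        for element, code in tagged_points
    ]


# Made-up input: ((element_type, osm_id), postal_code as tagged in OSM).
tagged = [
    (("node", 101), "8001"),
    (("node", 102), "8003"),  # mistagged, corrected by the table
    (("way", 2001), "3012"),  # mistagged, corrected by the table
    (("way", 102), "8004"),   # same number as the node, different element
]

corrected = apply_overrides(tagged, OVERRIDES)
print(corrected)
```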
But doing an inverse check reveals a problem: For approx. 10-15% of the valid postal codes, OSM has fewer than ten data points. This might or might not be a problem because in sparsely populated areas, a single data point can cover a whole village and where exactly the boundary between two postal codes lies is not really important because there is “empty” space (e.g. mountains, forests, …) between the villages.
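The inverse check boils down to counting data points per valid postal code and listing the codes that fall below a threshold. A minimal sketch, with made-up data and a hypothetical function name:

```python
from collections import Counter


def sparse_codes(valid_codes, point_codes, min_points=10):
    """Return the valid codes that have fewer than min_points data points."""
    counts = Counter(point_codes)  # codes without any point count as 0
    return sorted(code for code in valid_codes if counts[code] < min_points)


# Made-up example: three valid codes, one of them without any data point.
valid = {"8001", "8002", "8003"}
observed = ["8001"] * 12 + ["8002"] * 3

sparse = sparse_codes(valid, observed)
share = len(sparse) / len(valid)
print(sparse, f"{share:.0%} of valid codes are sparsely covered")
```

Codes with zero data points are the most important ones to catch here, because they cannot appear in the reconstructed map at all.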
Implementation
One of the reasons why I started this quick side project is that a lot of the required functionality already exists in Shapeflow3D / the Core SDK. All the screenshots and results in this post have been generated by a new, separate tool that I created in a couple of hours. I did add an implementation of Fortune’s algorithm to the Core SDK (I will probably write a post about implementation details and tips at some other point). It took a while to get numerically stable, but it now does the Switzerland calculation in less than a minute (in Debug mode, release mode would be even faster). I also wrote a new .osm.pbf importer for the app (it will not become part of the SDK), which was easily done using libosmium, and it took a few hours of UI work to get a dedicated Openstreetmap editor of sorts!
But I was able to re-use all the zooming, picking, translating, mesh structures and rendering right out of the Core SDK. Due to the nature of Openstreetmap, I was sure that the whole solution could never work fully automatically. I therefore designed the app for a semi-automatic workflow and want to add the capability to manually edit the generated borders.
All in all, the whole project is a nice showcase of how flexible the Core SDK is. I really was surprised how quickly this new app was up and running. Perhaps I’ll post the source code for the app itself (without the SDK) as a demonstration at some point.
Future Tasks
So what remains is:
- Create the lookup table to override incorrect postal code attributes
- Intersect the reconstructed areas with the already existing postal code areas (because they will be more precise than the re-constructed ones) as well as the country border. I’ll probably implement some form of boolean operator for this and add it to the SDK.
- Smooth/simplify the boundaries where possible
- Add code to be able to contribute/import the information into Openstreetmap
- … and most importantly: Find a source of further data points that is compatible with the Openstreetmap license to improve the quality of the reconstructed regions. If any reader has some recommendations, please let me know. Note that the Openstreetmap license is very strict!
As far as I can see, the same tool/workflow should work for adding postal code relations to Italy as well. I still have to check other countries.
2 Comments
jeremy rutman
did you ever post the code for postcodes?
alex
No, unfortunately not. The Openstreetmap communities for the countries I was interested in were not keen on using generated/derived data. In their opinion, OSM should just contain real data and it should be the responsibility of the system that uses OSM to add/extrapolate missing data.
The tool heavily relied on the Shapeflow base code (for UI and scene graph structures) so I cannot publish the whole app. But if you’re interested, drop me a mail and I can send you the relevant algorithmic part.