World map stat counter
If you visit https://planet.gnome.org/ (disclaimer: I'm not affiliated with GNOME at all), you will see a small picture of a world map with dots at the bottom of the page. The dots represent where the people in the GNOME community come from. I have seen this on other websites too, but there the dots represent where the sites' visitors come from, so it is a kind of stat counter.
So! I want one too, because it looks good and it's interesting. It's actually nothing new, and many others have created them. One can even easily register for this widget on many web analytics sites. But I don't want to use third-party analytics - I mean, this blog already uses Google web fonts and a few other online JavaScript providers (which are useful); adding another one just to show a silly stat counter sounds ... well, silly.
So I rolled up my sleeves a bit and started my google-fu --- to find nothing. I really need to sharpen my google-fu.
Then I sat down and thought about it - how difficult can it be? All that is needed is to convert the IP addresses from my webserver log into some sort of location identifiers (cities, countries, etc.), find their latitudes and longitudes from there (popularly called 'geocoding'), and then transform those to x,y coordinates so we can draw them on a map. Sounds simple, right?
So I rolled up my sleeves a bit higher and went on to find the information. Firstly, I need to find out how to convert a bunch of IP addresses to a 'location' (be it cities, countries, or maybe latitude/longitude directly). As it turned out, there are quite a few providers that offer this conversion as a service. A few examples that I found are MaxMind's GeoIP, HostIP, IpInfo, geonames, and a few others. Most of them offer web services - some are RPC-based, some are RESTful web services with a JSON or XML payload (again, I'm not affiliated with them - this is just the result of my searches).
I won't comment on the quality of their services (which obviously varies from one to another) because in the end I decided not to use any of them. The reason is simple: I need to geocode a bunch of IP addresses, not only one. Geocoding one IP address through a web service is nice and good, but geocoding hundreds of them - unless they have a specific bulk-geocoding service - is going to be a hassle and slow, not to mention that I might violate the terms of service. So, no, I need a better way to do it.
Fortunately, some of these excellent sites not only offer geocoding web services; they also offer the geocoding database. That's right, a database that contains a mapping between IP addresses and locations. The 'locations' here vary: some databases provide the countries in which the IP addresses are located, some are based on cities, some provide the longitude/latitude directly, etc. In the end, I decided to choose MaxMind's GeoIP. They provide a 'lite' version of the geocoding database under a CC 3.0 license, which is nice of them. Even this 'lite' version consists of over 50MB worth of data in CSV format - more than enough for a silly stat counter.
With their database, the 'geocoding' service becomes a simple search problem - given an IP address, find the row in the database whose IP address range includes that IP. Once done, you've got the lat/long directly. Problem solved! Hey, I could use scripts to do this; awk will do nicely, with something that naively looks like this:
# assumes $IP and columns 1-2 (range start, range end) are in the same numeric form
awk -v myip="$IP" '$1 <= myip && myip <= $2 { print $6, $7 }' geoip.data
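The range test only works if the address and the range bounds are compared as numbers, not as dotted strings. Here is a minimal Python sketch of that naive lookup. The four-column row layout (start, end, latitude, longitude) and the sample values are made up for illustration and are not the actual GeoIP file format.

```python
import csv
import io
import ipaddress


def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))


def naive_lookup(rows, ip):
    """Linear scan: return (lat, lon) of the first range containing ip, else None."""
    target = ip_to_int(ip)
    for start, end, lat, lon in rows:
        if int(start) <= target <= int(end):
            return float(lat), float(lon)
    return None


# Made-up sample data (hypothetical layout: start, end, latitude, longitude).
# The ranges are the private blocks 10.0.0.0/8 and 192.168.0.0/16.
SAMPLE = """167772160,184549375,10.0,20.0
3232235520,3232301055,-30.0,40.0
"""
rows = list(csv.reader(io.StringIO(SAMPLE)))

print(ip_to_int("10.1.2.3"))
print(naive_lookup(rows, "10.1.2.3"))
print(naive_lookup(rows, "8.8.8.8"))
```

Like the awk one-liner, this scans every row for every address, so the cost grows with the size of the database.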
It does work, and it works fast - for one IP address. But on my laptop it took about 120ms to do one search for one IP address. With a hundred IP addresses, that translates to 12s ... that's slow. I can guess why it is slow - string operations, conversion of strings to integers, etc. - most of which only needs to be done once. So why don't we do that? In the end, I wrote a bunch of simple C programs to pre-process the geoip database, converting it from text into a fixed-record binary format. Once I have *that*, I can load the records into memory and access them as arrays, using binary search to look for matches. As it turned out, I don't even need to load them into memory - I can just access the data file as a memory-mapped file. And that cuts down the search time from 120ms to under 2ms. Not bad!
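The same idea can be sketched in Python rather than C: pack each range into a fixed-size binary record, memory-map the file, and binary-search it. The record layout (two unsigned 32-bit integers and two 32-bit floats), the file name, and the sample ranges are assumptions for illustration, not the format the C programs actually use. The sketch also assumes the ranges do not overlap.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: start IP, end IP (uint32), latitude, longitude (float32).
RECORD = struct.Struct("<IIff")


def write_records(path, rows):
    """Write rows, sorted by start address, as fixed-size binary records."""
    with open(path, "wb") as f:
        for start, end, lat, lon in sorted(rows):
            f.write(RECORD.pack(start, end, lat, lon))


def lookup(buf, ip_int):
    """Binary-search the records in buf for the range containing ip_int."""
    lo, hi = 0, len(buf) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        start, end, lat, lon = RECORD.unpack_from(buf, mid * RECORD.size)
        if ip_int < start:
            hi = mid          # target lies before this record
        elif ip_int > end:
            lo = mid + 1      # target lies after this record
        else:
            return lat, lon
    return None


def lookup_file(path, ip_int):
    """Memory-map the data file read-only and search it without loading it."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            return lookup(buf, ip_int)


# Made-up sample ranges: 10.0.0.0/8 and 192.168.0.0/16.
rows = [(3232235520, 3232301055, -30.0, 40.0), (167772160, 184549375, 10.0, 20.0)]
path = os.path.join(tempfile.mkdtemp(), "geoip.bin")
write_records(path, rows)
print(lookup_file(path, 3232235777))  # 192.168.1.1, inside the second range
```

Because every record has the same size, record number n always starts at byte n * RECORD.size, which is what lets the search jump straight to the middle of the file.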
Well, once I have the lat/long, I need to convert them into x,y coordinates on my world map. I got the world map from Wikimedia here. I chose the 310px version. From there, a quick read about the Mercator projection will tell you what you need to know to do the conversion.
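For reference, the standard Mercator conversion can be sketched as follows. The parameter names are hypothetical, and the sketch assumes the image width spans the full 360 degrees of longitude and that you know which pixel row the equator falls on; both depend on the particular map image. Latitudes are clamped to 85 degrees because the projection diverges at the poles.

```python
import math


def mercator_xy(lat, lon, width, equator_y):
    """Map latitude/longitude in degrees to pixel (x, y) on a Mercator map.

    width is the pixel width covering 360 degrees of longitude; equator_y is
    the pixel row of the equator (both are assumptions about the map image).
    """
    lat = max(-85.0, min(85.0, lat))  # avoid the singularity at the poles
    x = (lon + 180.0) / 360.0 * width
    phi = math.radians(lat)
    # Mercator northing, in units where the full map width equals 2*pi
    northing = math.log(math.tan(math.pi / 4 + phi / 2))
    # Image rows grow downwards, so northern latitudes get smaller y values
    y = equator_y - northing * width / (2 * math.pi)
    return round(x), round(y)


print(mercator_xy(0.0, 0.0, 310, 155))
print(mercator_xy(51.5, -0.1, 310, 155))
```

If the map is cropped unevenly at the top and bottom, the equator will not sit at half the image height, which is why equator_y is a separate parameter here.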
Okay, I have the geocoding, I have the map, and I have the x,y coordinates of the markers to be put on the map. The last component I need is some scriptable drawing software to put these marks onto the map image. There are many ways to skin this cat, but the simplest one is to use the famous "Gd" library (old home page here, new one here). So I got it, compiled it, and ... well, it is a library. There is no tool to use it from the shell; one is supposed to write a C program to use it! There are wrappers for it: for Perl (Perl::Gd), PHP (PHP-Gd), and a few others - but I'm not using any of those languages (except Perl, and despite the beauty of the language, I've decided to stay away from it). Oh no. The other option is to use ImageMagick, but I really prefer not to use a chainsaw just to cut a strawberry branch. Fortunately, a kind soul has written a scripting tool for Gd, called 'Fly', available here. It is dated 2009 but still works beautifully today.
The rest is just a few scripts to glue all these together, and this is the final output :)
This map will be a permanent addition to the 'Stat' section of the menu block at the right hand side of this page.
I will make the code available later after I tidy it up a little.