Thursday, August 09, 2012

Speed Scripting - Grabbing Olympic Flag Images

I've always liked heraldry (coats of arms) and vexillology (flags).  I was browsing the website of the London Olympics and noticed that they had flags of all participating countries.  I immediately thought of the wallpaper or screensaver I might design with those images.  However, there are 200+ countries in the Olympic movement; there's NO WAY I'm grabbing all those by hand.

Enter wget.  This wonderful little open-source utility (source code here, Windows binary here, check your Linux distribution) provides a command-line interface for downloading HTTP, HTTPS and FTP URLs, up to and including full webpages.  So, all I had to do was knock together a quick shell script and let wget do the work.

A quick bit of browser work, and I had my URL; a bit of testing showed that all the flag images were in the same directory.  So, off I went with:

$ wget http://www.london2012.com/imgml/flags/l/nep.png
...
HTTP request sent, awaiting response... 403 Forbidden
...
$

What?  But it works fine in a browser!  Ah, I see; their website must be looking for a User-Agent string to check/log what browsers are hitting the site...ok, wget can handle that, I'll just use the --user-agent option and pretend to be Firefox:

$ wget --user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008092416 Firefox/3.0.3" http://www.london2012.com/imgml/flags/l/nep.png
...
HTTP request sent, awaiting response... 200 OK
...
$
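If you want to confirm that the User-Agent really is what the server cares about, a quick check along these lines should do it.  This is only a sketch: check_flag is a made-up helper name, and it assumes the server answers wget's --spider request the same way it answers a normal download.

```shell
#!/bin/sh
# Sketch: print the HTTP status line(s) the server sends back for a URL.
# UA is the same Firefox string used above; check_flag is a hypothetical helper.
UA="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008092416 Firefox/3.0.3"

check_flag() {
    # --spider: ask for the URL without saving anything
    # -S: print the server's response headers
    # wget logs to stderr, so send that down the pipe for grep to filter.
    wget -S --spider --user-agent="$UA" "$1" 2>&1 | grep 'HTTP/'
}

# Usage (needs network access):
#   check_flag http://www.london2012.com/imgml/flags/l/nep.png
```

Keeping the long User-Agent string in a variable also saves retyping it on every command line.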

There we go!  Now, I need a list of those three-letter "Olympic country codes"...gee, thanks, about.com!  That gives me a 'countries' file that looks like this:

Afghanistan - AFG
Albania - ALB
Algeria - ALG
etc...

A little bit of cleanup to make sure that the country code is the last thing on each line, and then I can awk it down to size.  One of awk's best features is its built-in variables; in this case, NF is the number of fields in the current line, so $NF refers to the last field of each line, regardless of how many fields it may contain.  My one-liner awk script:

$ awk '{ print $NF }' countries > abbreviations

gives me an 'abbreviations' file with one country's Olympic code per line.  Now, I just need to feed that 'abbreviations' file to wget, one country at a time:

#!/bin/sh
#
# getflags - grab flag images from the Olympic website
#
ABFILE=/home/wmorgan/olyflags/abbreviations
UA="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008092416 Firefox/3.0.3"
#
# Read one country code per line and fetch the matching flag.
while read -r country
do
    echo
    echo "Getting $country..."
    echo
    wget --user-agent="$UA" "http://www.london2012.com/imgml/flags/l/$country.png"
done < "$ABFILE"
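As an aside, the $NF trick pays off on names like "Bosnia and Herzegovina", where the code is the sixth field rather than the third.  Here is a rough sketch of another way to wire things together: have awk print complete URLs and hand the whole list to wget with its -i (input file) option.  The make_urls name and the 'urls' file are made up for illustration, and the tolower() call is an assumption based on the lowercase "nep" in my test URL; drop it if the server is happy with uppercase codes.

```shell
#!/bin/sh
# Sketch: turn "Country Name - CODE" lines into one flag URL per line.
# BASE matches the directory found earlier; make_urls is a hypothetical helper.
BASE="http://www.london2012.com/imgml/flags/l"

make_urls() {
    # The bare NF pattern skips blank lines; $NF is the last field on the line.
    # tolower() is an assumption: the test URL used a lowercase code.
    awk -v base="$BASE" 'NF { print base "/" tolower($NF) ".png" }' "$1"
}

# Usage (needs network access; UA is the Firefox string from earlier):
#   make_urls countries > urls
#   wget --user-agent="$UA" --wait=1 -i urls
```

The --wait=1 pauses a second between downloads, which seems only polite when pulling 200+ files off somebody else's server.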

A few minutes later, and presto!  I have the flags of all 204 participating Olympic nations.  (Note to purists: the Republic of China competes as "Chinese Taipei" and has an Olympics-specific flag.)  There are also four different sizes available, in the "s", "m", "l", and "xl" subdirectories under /imgml/flags, so you can edit the URL in the script to grab the size you want.
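If you'd rather not edit the script each time, the size can become a parameter.  A minimal sketch, assuming the four subdirectories are laid out identically; flag_url is a made-up helper name, and the usage lines assume UA holds the Firefox User-Agent string from earlier:

```shell
#!/bin/sh
# Sketch: build a flag URL for a given size directory and country code.
# flag_url is a hypothetical helper; BASE is the parent of the size directories.
BASE="http://www.london2012.com/imgml/flags"

flag_url() {
    # $1 = size directory (s, m, l or xl), $2 = country code
    case "$1" in
        s|m|l|xl) echo "$BASE/$1/$2.png" ;;
        *) echo "unknown size: $1" >&2; return 1 ;;
    esac
}

# Usage (needs network access): every size of one flag, one directory per size.
#   for size in s m l xl; do
#       wget --user-agent="$UA" -P "$size" "$(flag_url "$size" nep)"
#   done
```

The -P option tells wget which directory to save into, so the four sizes of nep.png don't overwrite each other.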

Whatever your flag needs may be, I'm guessing that this may be your best shot at grabbing consistent, high-quality flag images.  Enjoy.

 


 
