Sunday, March 27, 2011

Flags, detail

A quick note about the flags. There are enough now that it's hard to be sure I got them all. So I went to the Flag Counter site (this page, and the next). Rather than do anything fancy, I just copied the text to a file and processed it with Python. My database of flag images is from Wikipedia, and I shortened the file names to the country codes. Since I can't remember a number of the codes, I wrote a Python script to harvest the Flag Counter entries and match them with the country codes (from here).

I checked the directory with the flag images by eye, which is almost certainly a mistake.
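The by-eye check could be replaced with a few lines of Python. This is only a minimal sketch: it assumes the images are named by country code (something like `AE.png`) and sit in a single directory. The function name `missing_flags` and the `.png` extension are my assumptions for illustration, not part of the actual setup.

```python
import os

def missing_flags(codes, flag_dir, ext='.png'):
    """Return the country codes that have no image file in flag_dir."""
    present = set()
    for fn in os.listdir(flag_dir):
        base, e = os.path.splitext(fn)
        # compare extensions case-insensitively, codes in upper case
        if e.lower() == ext:
            present.add(base.upper())
    return sorted(set(codes) - present)
```

Called with the list of codes printed by the script and the path to the image directory, it would return whichever codes still need a flag.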

Here is the list of countries from which visitors to this site have come, in alphabetical order by country code. The script is at the end. It shows a number of typical issues you run into with this kind of processing.


AE United Arab Emirates
AL Albania
AN Netherlands Antilles
AR Argentina
AT Austria
AU Australia
BB Barbados
BE Belgium
BG Bulgaria
BH Bahrain
BR Brazil
BY Belarus
CA Canada
CH Switzerland
CL Chile
CN China
CO Colombia
CR Costa Rica
CS Serbia
CV Cape Verde
CY Cyprus
CZ Czech Republic
DE Germany
DK Denmark
DZ Algeria
EC Ecuador
EE Estonia
EG Egypt
ES Spain
FI Finland
FR France
GH Ghana
GR Greece
HK Hong Kong
HR Croatia
HU Hungary
ID Indonesia
IE Ireland
IL Israel
IN India
IS Iceland
IT Italy
JM Jamaica
JP Japan
KR South Korea
LT Lithuania
LU Luxembourg
MA Morocco
MD Moldova
MT Malta
MU Mauritius
MX Mexico
MY Malaysia
NL Netherlands
NO Norway
NZ New Zealand
PA Panama
PE Peru
PH Philippines
PK Pakistan
PL Poland
PR Puerto Rico
PT Portugal
QA Qatar
RO Romania
RU Russia
SA Saudi Arabia
SE Sweden
SG Singapore
SI Slovenia
SK Slovakia
SV El Salvador
TH Thailand
TN Tunisia
TR Turkey
TT Trinidad and Tobago
TW Taiwan
UA Ukraine
UK United Kingdom
US United States
UY Uruguay
VE Venezuela
VN Vietnam
ZA South Africa



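# load_data comes from my utils module; as used here it returns
# the contents of a file as a single string
from utils import load_data

# Flag Counter name -> name used in the country code list
specials = { 'South_Korea':'Korea_(South)',
             'Russia':'Russian_Federation',
             'New_Zealand':'New_Zealand_(Aotearoa)',
             'Serbia':'Serbia_and_Montenegro',
             'Croatia':'Croatia_(Hrvatska)',
             'Vietnam':'Viet_Nam' }

# each line of the code list: the code, then the country name
data = load_data('country-codes.txt')
D = dict()
for line in data.strip().split('\n'):
    L = line.strip().split()
    D['_'.join(L[1:])] = L[0]

# each scraped line: one leading field, the country name,
# then four trailing fields
cL = list()
data = load_data('scraped.txt')
for line in data.strip().split('\n'):
    L = line.strip().split()
    i = len(L) - 4
    country = '_'.join(L[1:i])
    cL.append(country)
    if country in specials:
        k = specials[country]
        D[country] = D[k]

# sort by country code, then print code, tab, name
def f(k): return D[k]
for country in sorted(cL, key=f):
    print('%s\t%s' % (D[country], country.replace('_',' ')))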
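
One of the typical issues is that a name from Flag Counter may not appear in the code list at all, in which case the lookup fails with a KeyError. A rough sketch of a check that could be run first is below. The function name `unmatched` and its arguments are made up for illustration: `code_for` plays the role of `D` and `countries` the role of `cL` in the script.

```python
def unmatched(countries, code_for, specials):
    """Return scraped names with no country code, directly or via specials."""
    missing = []
    for name in countries:
        # translate through specials if there is an entry, else use the name
        key = specials.get(name, name)
        if key not in code_for:
            missing.append(name)
    return missing
```

Anything it returns would need either a new entry in `specials` or a fix to the code list.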