This morning brought the news that Angela Merkel has decided to add her voice to the chorus of criticism directed at Pope Benedict for his decision, on January 21, 2009, to lift the excommunication of Holocaust-denying bishop Richard Williamson. Pope Benedict was himself a member of the Hitler Youth as a young man, which obviously complicates matters for him when he starts ex-excommunicating Holocaust deniers.
I figure a very rough proxy for anger about the issue has to be the number of Google hits for “Nazi Pope” in a particular period of time. Of course, we should expect a baseline number of hits as a result of the controversy surrounding Pope Pius XII, and there’s bound to be a lot of noise (people angrily objecting to the term “Nazi Pope” for example). Anyway, this chart is rough and crappy, but it gives you an idea.

It’s even more remarkable if you assume that the vast bulk of the increase comes from the period after January 21, 2009. I wonder what February will look like.
Description of how I made the chart is below the fold, in case anyone wants to check its accuracy.
First, the chart depends on my getting the conversion of Gregorian to Julian dates right (needed to query Google properly), which I’m not at all confident I managed to do. The script below is unbelievably hackish. I just used it to generate a list of queries to run. I ran them by hand because there were so few, and because I’ve heard horror stories of Google shutting out people who use automated queries. The function Greg_2_Jul() is basically ripped off from elsewhere and adapted to Python. It sorta kinda seems to work:
import time
import datetime
import calendar
import matplotlib.pyplot as plt

def Greg_2_Jul(y, m, d):
    # Convert a Gregorian calendar date to an integer Julian date.
    # The formula treats January and February as months 13 and 14
    # of the previous year.
    if m <= 2:
        y -= 1
        m += 12
    A = y // 100
    B = A // 4
    C = 2 - A + B
    E = int(365.25 * (y + 4716))
    F = int(30.6001 * (m + 1))
    return int(C + d + E + F - 1524.5)

def queryConstructor(query, start, end):
    # Print one Google query per calendar month between start and end.
    while start < end:
        # monthrange returns a tuple: (weekday of the first day, number of days in the month)
        days_in_month = calendar.monthrange(start.year, start.month)[1]
        julian_start = Greg_2_Jul(start.year, start.month, start.day)
        moving_end = start + datetime.timedelta(days_in_month - 1)
        julian_end = Greg_2_Jul(moving_end.year, moving_end.month, moving_end.day)
        print('Query for the dates %s to %s' % (start, moving_end))
        print('"%s" daterange:%d-%d' % (query, julian_start, julian_end))
        print('\n' * 2)
        start = moving_end + datetime.timedelta(1)

start = datetime.date(2008, 1, 1)
end = datetime.date(2009, 1, 31)
queryConstructor('Nazi Pope', start, end)
Then I just used matplotlib to draw the results I had recorded by hand:
# Month index: 0 is Jan 2008, 12 is Jan 2009
months = list(range(13))
hits = [113, 186, 179, 303, 156, 153, 187, 174, 193, 199, 304, 390, 803]
plt.xlabel('Month, starting Jan, 2008 and ending Jan, 2009')
plt.ylabel('Google results for "Nazi Pope" during the month')
plt.title('Google results for "Nazi Pope" over time')
plt.plot(months, hits)
plt.show()
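To put a rough number on the jump, the hand-recorded counts can be compared against a 2008 baseline. This is only a sketch: using the mean of the twelve 2008 months as the baseline is an assumption made for illustration, and the variable names are invented here.

```python
# Hand-recorded monthly hit counts, Jan 2008 through Jan 2009.
hits = [113, 186, 179, 303, 156, 153, 187, 174, 193, 199, 304, 390, 803]

# Assumption: the mean of the twelve 2008 months stands in for the "baseline".
baseline_months = hits[:12]
baseline = sum(baseline_months) / float(len(baseline_months))

# January 2009 is the last recorded month.
january_2009 = hits[-1]
ratio = january_2009 / baseline

print('2008 monthly mean: %.1f' % baseline)
print('January 2009 is %.1fx the 2008 mean' % ratio)
```

On these numbers January 2009 comes out at a bit under four times the 2008 monthly mean, though a different choice of baseline would give a different figure.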


Chris | 04-Feb-09 at 3:07 pm | Permalink
I guess it wouldn’t be all that hard to write a little tool that took a search string and a date range, constructed a list of appropriate URLs, extracted the number of hits from the resulting pages, and then plotted hits over time. I’d want to run it from the public library though, just to be on the safe side.
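The URL-building half of such a tool might look something like the sketch below. It only constructs the URLs and fetches nothing. The names `julian_day`, `monthly_urls` and `SEARCH_URL` are hypothetical, the day-number conversion is a simple ordinal offset that should be checked against whatever Google actually expects, and `start` is assumed to fall on the first of a month.

```python
import calendar
import datetime
from urllib.parse import urlencode

# Assumption: plain search endpoint, with everything else passed in "q".
SEARCH_URL = 'http://www.google.com/search'

def julian_day(d):
    # Assumption: integer Julian date as a fixed offset from Python's date ordinal.
    return d.toordinal() + 1721424

def monthly_urls(query, start, end):
    """Return (month_start, url) pairs, one per calendar month from start up to end."""
    urls = []
    while start < end:
        days_in_month = calendar.monthrange(start.year, start.month)[1]
        month_end = start + datetime.timedelta(days_in_month - 1)
        q = '"%s" daterange:%d-%d' % (query, julian_day(start), julian_day(month_end))
        urls.append((start, SEARCH_URL + '?' + urlencode({'q': q})))
        start = month_end + datetime.timedelta(1)
    return urls

urls = monthly_urls('Nazi Pope', datetime.date(2008, 1, 1), datetime.date(2009, 1, 31))
for month_start, url in urls:
    print(month_start, url)
```

Extracting the hit counts from the returned pages is the fragile part and is left out here.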
Chris | 04-Feb-09 at 3:08 pm | Permalink
Or am I just being silly?
ben wolfson | 04-Feb-09 at 3:29 pm | Permalink
One of the pages you link says that a Julian date is “the number of days that have passed since January 1, 4713 B.C.”. Trial and error suggests that this is equal to:
1721425 + (datetime.date(year,mon,day) - datetime.date(1,1,1)).days
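Wrapped up as a function, that expression could be tried out like this. The function name `julian_from_date` is made up for illustration; the constant and the subtraction are taken straight from the expression above.

```python
import datetime

def julian_from_date(year, mon, day):
    # Days elapsed since 0001-01-01, shifted by the constant found by trial and error.
    return 1721425 + (datetime.date(year, mon, day) - datetime.date(1, 1, 1)).days

# First and last day of January 2008.
first = julian_from_date(2008, 1, 1)
last = julian_from_date(2008, 1, 31)
print('daterange:%d-%d' % (first, last))  # daterange:2454466-2454496
```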
Chris | 04-Feb-09 at 3:42 pm | Permalink
Thanks! Skimming the new results and comparing them with the old results suggests that they produce roughly the same Julian dates. But your way is much nicer.
Chris | 04-Feb-09 at 4:23 pm | Permalink
Oh, I should have known it wouldn’t be so easy to write a script to grab anything from Google.
This, as an experiment:
import urllib

# Python 2: urllib.urlopen returns a file-like object for the response
html_data = urllib.urlopen('http://www.google.com/search?hl=en&%3Aen-US%3Aofficial&hs=wOT&q=%22Nazi+Pope%22++daterange%3A2454466-2454496&btnG=Search')
for line in html_data:
    print(line)
Gets me a long message including this:
Your client does not have permission to get URL /search?hl=en&%3Aen-US%3Aofficial&hs=wOT&q=%22Nazi+Pope%22++daterange%3A2454466-2454496&btnG=Search from this server.
You can hardly blame them. I imagine people are constantly trying to harvest info from them.
ben wolfson | 04-Feb-09 at 5:00 pm | Permalink
There are officially-approved ways to do what you want, but I can’t remember them. It is possible. You need an API key or whatever.
Steve Laniel | 04-Feb-09 at 5:09 pm | Permalink
Wouldn’t Google Trends do the same thing?
http://www.google.com/trends?q=nazi+pope
Or maybe I misunderstand.
Chris | 04-Feb-09 at 5:09 pm | Permalink
Oh cool.
Chris | 04-Feb-09 at 5:11 pm | Permalink
Doesn’t Google Trends measure how many times the term has been searched for? I was looking at how many results the search returned in a given period.
Chris | 04-Feb-09 at 5:13 pm | Permalink
That’s kind of an amazing spike in 2005, though, isn’t it? And of course it coincides with the month he was named Pope.
Steve Laniel | 04-Feb-09 at 6:48 pm | Permalink
If you want to fake Google out, and have them not block urllib, I think you just need to fake the user-agent string. Try setting it to something like this:
“Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.5) Gecko/2008122010 Iceweasel/3.0.5 (Debian-3.0.5-1)”
I think they just automatically block anything that looks like a spider. But on the Internet, no one needs to know you’re a spider, you know?
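In current Python that would go roughly as sketched below. The sketch only builds the request and does not send it, so whether Google would actually accept it is untested, and the officially approved API route mentioned earlier in the thread is presumably the safer option. `build_request` and `USER_AGENT` are hypothetical names; `urllib.request.Request` and its `headers` argument are standard library.

```python
import urllib.request

# The browser-like user-agent string suggested above.
USER_AGENT = ('Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.5) '
              'Gecko/2008122010 Iceweasel/3.0.5 (Debian-3.0.5-1)')

def build_request(url, user_agent=USER_AGENT):
    # Attach a custom User-Agent header instead of urllib's default one.
    return urllib.request.Request(url, headers={'User-Agent': user_agent})

req = build_request('http://www.google.com/search?q=%22Nazi+Pope%22')
# Request stores header names capitalized, hence 'User-agent' here.
print(req.get_header('User-agent'))

# Sending it would be: urllib.request.urlopen(req)  (deliberately not done here)
```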
Steve Laniel | 04-Feb-09 at 6:48 pm | Permalink
Oh, and yes: Google Trends compares search volumes, not page counts. Quite so. So I *did* misunderstand.
Chris | 04-Feb-09 at 10:22 pm | Permalink
Ah, thanks. I figured there had to be a way to do that.