Several years ago I read xkcd.com/903 and the alt-text got me thinking.
Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at “Philosophy”.
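The rule in that alt-text is a simple loop: take a page, follow its first qualifying link, and repeat until you reach Philosophy. In practice you also have to stop on a page you have already visited (a loop) or on a page with no usable link. Here is a minimal sketch in Python. `path_to_philosophy` and the `first_link` callable are hypothetical names. A real version would fetch and parse each page inside `first_link`, and the toy page graph in the usage line uses made-up titles rather than real Wikipedia data.

```python
from typing import Callable, List, Optional, Tuple


def path_to_philosophy(
    start: str,
    first_link: Callable[[str], Optional[str]],  # hypothetical: title -> first link title
    target: str = "Philosophy",
    max_hops: int = 100,
) -> Tuple[List[str], str]:
    """Follow first links from `start`.

    Returns the titles visited and why the walk stopped:
    "reached", "loop", "dead end", or "too long".
    """
    path = [start]
    seen = {start}
    current = start
    while current != target:
        if len(path) - 1 >= max_hops:  # number of hops taken so far
            return path, "too long"
        nxt = first_link(current)
        if nxt is None:  # page has no qualifying link
            return path, "dead end"
        path.append(nxt)
        if nxt in seen:  # revisiting a page means we are stuck in a cycle
            return path, "loop"
        seen.add(nxt)
        current = nxt
    return path, "reached"


# Usage with a made-up page graph standing in for Wikipedia:
toy_graph = {"A": "B", "B": "C", "C": "Philosophy"}
print(path_to_philosophy("A", toy_graph.get))
# (['A', 'B', 'C', 'Philosophy'], 'reached')
```

Keeping the link extraction behind a callable separates the walk itself from the messy part, which is finding the first link in the page markup.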
I wrote a Python script to play around with the idea that all Wikipedia pages
lead to Philosophy. It turned into a lesson in frustration with parsing the
strange MediaWiki markup format.
I ended up with a long, brittle function that tracked the nesting levels of
brackets, curly braces, and parentheses to find the first link on a page.
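As a rough illustration of that bracket-counting approach, here is a sketch of a depth-tracking scan over raw wikitext. It is not the code from the repository. It assumes links are written as `[[Target|label]]`, skips `{{templates}}` and `<!-- comments -->`, treats `''` and `'''''` as italic toggles, and ignores links inside parentheses. `SKIP_PREFIXES` is a hypothetical, non-exhaustive list of namespaces that should not count as article links. Tables, `<ref>` tags, and many other corner cases are deliberately left out.

```python
from typing import Optional

# Hypothetical, non-exhaustive list of namespaces whose links are not article links.
SKIP_PREFIXES = ("file:", "image:", "category:", "help:", "template:", "wikipedia:")


def _link_end(text: str, start: int) -> int:
    """Return the index just past the ']]' that closes the '[[' at `start`,
    or -1 if the link is never closed.

    Links can nest (e.g. a File: caption containing [[other links]]),
    so this tracks bracket depth instead of searching for the next ']]'.
    """
    depth = 0
    i = start
    while i < len(text):
        if text.startswith("[[", i):
            depth += 1
            i += 2
        elif text.startswith("]]", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return -1


def first_link(wikitext: str) -> Optional[str]:
    """Return the target of the first [[wikilink]] that is not inside
    parentheses, italics, a {{template}}, or an HTML comment."""
    n = len(wikitext)
    i = 0
    paren = 0       # nesting depth of ( ... ) in running text
    template = 0    # nesting depth of {{ ... }}
    italic = False  # toggled by '' (italic) and ''''' (bold italic)
    while i < n:
        if wikitext.startswith("<!--", i):
            close = wikitext.find("-->", i + 4)
            i = n if close == -1 else close + 3
        elif wikitext.startswith("{{", i):
            template += 1
            i += 2
        elif template and wikitext.startswith("}}", i):
            template -= 1
            i += 2
        elif template:
            i += 1  # everything inside a template is ignored
        elif wikitext.startswith("[[", i):
            end = _link_end(wikitext, i)
            if end == -1:
                break  # unterminated link: stop scanning
            if paren == 0 and not italic:
                inner = wikitext[i + 2:end - 2]
                # "[[Target#Section|label]]" -> "Target"
                target = inner.split("|", 1)[0].split("#", 1)[0].strip()
                if target and not target.lower().startswith(SKIP_PREFIXES):
                    return target
            i = end  # skip the whole link, so "(" in link text is not counted
        elif wikitext[i] == "'":
            j = i
            while j < n and wikitext[j] == "'":
                j += 1
            if j - i in (2, 5):  # ''' (bold) alone does not change italics
                italic = not italic
            i = j
        elif wikitext[i] == "(":
            paren += 1
            i += 1
        elif wikitext[i] == ")":
            paren = max(0, paren - 1)
            i += 1
        else:
            i += 1
    return None
```

On the made-up snippet `'''Cat''' ({{lang|la|Felis catus}}, see [[Latin]]) is a ''[[species]]'' of small [[carnivorous]] [[mammal]].` this returns `"carnivorous"`. It skips `Latin` because that link sits in parentheses, and it skips `species` because that link is in italics. Jumping over each link as a unit keeps titles such as `[[Mercury (planet)]]` from being mistaken for parenthesized text.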
With most corner cases handled, I put it up on my site as a bare
~cgi-bin/xkcdwiki.py script, with only minimal work to output HTML.
After I shared it with friends on Facebook, it was posted in a few other places and started receiving traffic. Mostly embarrassed by how poorly it worked, I later took it down. However, I always wanted to revisit it, if only to stop serving 404s.
Now it runs on App Engine, and the source code is on GitHub.
See it in action here: http://philosophy.ryanelmquist.com
It’s fun to return to the same problem many years later and see how your approach differs. I was really hoping to find a way to restructure the problem and avoid the tedious string processing. However, as you can see, I haven’t done that yet. The biggest change I noticed is that I quickly built a test harness for end-to-end tests, along with a utility that makes adding new ones easy.
Play around, and let me know if you find any more corner cases.