quantifying the complexity of complexity with wikipedia

I wanted an offline copy of Wikipedia’s coverage of complexity theory, so I could read up when I’m not connected to the internet. Such is my life. I set my web downloader to start at the “self-organized criticality” page and follow all the outbound links, as long as they didn’t leave Wikipedia, downloading each page as it went. I decided it should be sufficient to grab everything within 2 links of that SOC page.
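For the curious, "everything within 2 links" amounts to a depth-limited breadth-first crawl. Here is a minimal sketch of that idea, not what my downloader actually runs: it works on a made-up in-memory link table instead of fetching anything, and the names `pages_within` and `toy_links` (and the page names in the table) are purely illustrative.

```python
from collections import deque


def pages_within(links, start, max_depth=2):
    """Return {page: depth} for every page reachable from `start`
    by following at most `max_depth` links (breadth-first)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # outermost ring: keep the page, but do not follow its links
        for target in links.get(page, ()):
            if target not in depth:  # first visit is the shortest path in BFS
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical toy link table standing in for real pages.
toy_links = {
    "SOC": ["complex_systems", "power_law"],
    "complex_systems": ["the_universe", "power_law"],
    "power_law": ["1955"],
    "the_universe": ["wars"],
}

found = pages_within(toy_links, "SOC")
print(len(found), sorted(found))
```

In this toy table "wars" sits three links out, so it is left behind. The catch with the real thing is that a hub page at depth 1 drags all of its outbound links into depth 2.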

So how many pages are within two links? At present the downloader has 512 items downloaded and another 4791 queued up. Even allowing that about half of what is being downloaded is things like “bullet.gif”, that’s still a lot of pages: if the same ratio holds for the queue, roughly 2,650 of them.

Glancing through the list, I’m guessing some of the culprits for the immensity of the task are nodes like “wars”, “the_universe” and “1955”. Still, there’s quite a diversity of long-tail or non-intuitive concepts: “lagrangian_mechanics”, “socialism”, “schrodinger’s_equation” and “quantum_entanglement” have all been sucked down so far. I guess I shouldn’t be surprised.
