Visualizing the New York Times, for Example

I wanted to post about these amazing visualizations of links between people and organizations in the New York Times. But in the course of looking up their author I discovered that almost everything he creates is just as amazing or more so, which makes it difficult to choose one example to highlight. Nonetheless, here’s the 1984 version of the NYT viz:

NYTimes: 365/360 - 1984 (in color)

Check it out large, or see the whole set.

Jer Thorp apparently works by visualizing the output of data mining executed on publicly available data streams. He mostly uses Processing to do it, which he calls “an electronic sketchbook for developing ideas”. Interesting. Using information techniques to digest and synthesize the increasing number of public (spatial) data sets is becoming a natural part of GIS, and if I were a more ambitious programmer I might be inspired by this stuff to try to use Processing in a spatial context.

(Also, he appears to have good taste in neighbourhoods.)

I Want a Personal Cloud

I seem to like computing in clouds. I don’t want to: I don’t like the idea of putting my business or academic data onto someone else’s for-profit servers, and I think it’s nutty in a special way to put your private photographs and social relationships in there too. But that’s just ideology; in practice I keep on opening up new documents sporting the Google logo, day-dreaming about the science computing I could do with a few hundred dollars’ worth of clock cycles on an Amazon-hosted Hadoop cluster, and contemplating moving my email address over to Google Apps for Your Domain. It’s all just so useful. It works across computers, it works across people, and nowadays it even sometimes works when you don’t have internet. The benefits are immediate and tangible (if cloud computing can be called tangible), and the drawbacks are longer-term and probabilistic.

Thus I was excited when the words “private cloud” started cropping up. A private cloud is a set of web-based applications that run on your own server instead of on theirs. Advantages without drawbacks. For now, private clouds are something for corporations to run on their internal intranets. So the words I especially want to see are “personal cloud”. I already rent space on a web server; now I want to be able to install a calendaring service on hughstimson.org, in the same way I’ve already got blogging and photo gallery apps. And I especially want to install Mozilla Docs there. Mozilla, are you making Mozilla Docs?

Big question: if everybody has their own personal cloud running, can they work together? One of the major advantages of current cloud computing is collaboration. If I open a new Google Docs document here in Vancouver, my collaborators over the strait in Victoria can see it and edit it right away, using an interface they’re familiar with. If I were running a document application on hughstimson.org I could create that file, but other people probably don’t want to open an account on hughstimson.org to edit it, nor do they want to learn to use the interface for whatever editing application I’m running there.

I’m guessing there are technical solutions to this technical problem. People already care very much about standard formats in existing cloud computing, and if all of our clouds are able to speak to each other in a common language, then maybe collaboration across them isn’t such a big deal. I open a new spreadsheet, stored in .ods format on my own server, and start editing it on my web interface in my browser. Then I send out an invitation to an email address at Pink Sheep Media, and they open that document up in their own browsers using their own editing application running on the Pink Sheep Media cloud. Or maybe they’re still using Google Docs, and they access the file from hughstimson.org/docs, but edit it in the Google Docs interface. Maybe login access is handled using OpenID. Why not? It would mean having not just open standards for file formats, but also some common commands for editing functions. The editing could be done on their servers, and then the document would be saved back to mine, staying in the open standard file format the whole time. Is that hard? Does someone know?
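To make that concrete, here is roughly the plumbing I’m imagining on my end of such an exchange. It’s just a toy sketch in Python: the file name, the header, and the token list are all invented for the example, and a real personal cloud would presumably swap the token check for a proper OpenID handshake.

# A minimal sketch of "the document lives on my server, and gets edited from
# yours". Everything here is hypothetical: the path, the header name, and the
# token stand-in for real OpenID-style authentication.
from http.server import BaseHTTPRequestHandler, HTTPServer

DOC_PATH = "budget.ods"                  # the spreadsheet stored on my own server
ALLOWED_TOKENS = {"pink-sheep-media"}    # stand-in for a real login/OpenID check

class DocumentHandler(BaseHTTPRequestHandler):
    def _authorized(self):
        # A real personal cloud would do an OpenID (or similar) handshake here.
        return self.headers.get("X-Collaborator-Token") in ALLOWED_TOKENS

    def do_GET(self):
        # A collaborator's editing application fetches the document in its open format.
        if not self._authorized():
            self.send_error(403)
            return
        with open(DOC_PATH, "rb") as f:
            body = f.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/vnd.oasis.opendocument.spreadsheet")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_PUT(self):
        # After editing in whatever interface they like, they save the .ods straight back.
        if not self._authorized():
            self.send_error(403)
            return
        length = int(self.headers.get("Content-Length", 0))
        with open(DOC_PATH, "wb") as f:
            f.write(self.rfile.read(length))
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), DocumentHandler).serve_forever()

The point is only that the document never leaves my server, and never leaves its open format; whoever is collaborating just fetches it, edits it in their own interface, and saves it back.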

As far as I know, Mozilla is not working on Mozilla Docs. But they are doing some cool stuff in cloud computing. This one looks like a big opportunity to me. At least, I know I want it very much. So somebody, please, build me a personal cloud.

Jason Scott Is In Your Geocities, Rescuing Your Sh*t

Some time back, Jason Scott — the computing documentarian whom hughstimson.org readers may remember from the King of Kong controversy — “got angry like a fire gets burning” because AOL Hometown was shutting down and leaving its users without many options to save off their home pages. This is part of Jason’s abhorrence of “the cloud”, a general point of view I share. My way of doing something about that distrust is to soldier on operating a personally administered website and email account while even my own aging generation is consumed by Facebook. Jason’s way of doing something about it has been to get ever angrier and found the Archive Team, a loose affiliation of data wonks who are pledged to archiving all the nominally doomed data of the world. They take as their motto “We Are Going to Rescue Your Sh*t”.

So when the call went up that Geocities, perhaps the oldest and creakiest of the early-era personal website providers, was being shut down by now-owner Yahoo, the eyes of the world swiveled suddenly to Jason. Could he and the Archive Team rescue two decades’ worth of websites on Yahoo Geocities? Literally millions of websites? Even though Yahoo presumably had no interest in him doing so?

Well, Jason?

“And the answer, which I hope you would expect, is OF COURSE WE ARE.”

Good man. Go team. And yes I did. If you’ve spent much time around Geocities, you might now be asking, is it really worth saving? To which he offers this answer:

“Not because we love it. We hate it. But if you only save the things you love, your archive is a very poor reflection indeed.”

I suppose so. All of two days later, the Archive Team is now deep into the process, and offers an update, which I warn you is even more profane than some other Jason Scott discourses on computing and computing history. He reports that large swaths of the Cities appear to have simply been purged over the years, and those may be gone forever, but many more chunks are there and are being consumed into posterity as we speak. In fact, he estimates that they now have on their hard drives every pre-1999 site that hadn’t already been deleted.

Which made me wonder, was the first website I ever made still there? After all, I stopped updating it back in 1997, which was well before archive.org was doing really comprehensive internet mining. And indeed, it looks like it must have disappeared in the subsequent purges.

But don’t worry, world, and don’t worry, Archive Team: I performed a search of my own system and discovered I do indeed have a full backup of Where Even Richard Nixon Still Has Soul, manifestos, poems, and correspondence with Richard Nixon buffs intact. He’s still got it. Soul.


At the Height of My Programming Renown

I’ve submitted this photo to the Dork Yearbook:

“By the weight of honours strewn about me and my dad’s Franklin ACE 1000 (including, yes, the IBM Regional Computer Technology Award) you can tell I had just qualified for the Canadian National Science Fair. But they took me aside and explained that the National Fair was meant to be a gathering of peers, and since I was 2 years younger than anyone else and grotesquely small, I wasn’t anybody’s peer. They sent someone bigger. Childhood was a hard time for nerds, yes?”

You know, I don’t actually recall being bitter about that incident. Just a little confused and even relieved. Maybe I didn’t believe that writing a fairly straightforward bit of BASIC code could really put me in the National league of dorkness, and maybe I’ve just never been very competitive. As I recall, there was some weird deal where every province but Ontario had a provincial competition before the national level, and if I went I was going to be up against the hard-bitten survivors of the All-Manitoba and Trans-Yukon science fairs without having gone through that level of seasoning myself. Tough as I look, maybe I was just scared. Maybe I just wanted to go home and work on my real masterpiece, a generic text-adventure engine that was frankly too elegant for the judges to ever understand.

Roll Your Own Supercomputing Cluster

Amazon has activated a beta version of a new “Elastic MapReduce” service. The naming is a little obscure, but it looks like it’s essentially the Hadoop distributed computing framework running on their existing “Elastic Compute Cloud”, which is a distributed virtual computing environment running in turn on top of their Simple Storage Service, which is a distributed server network. In other words, Elastic MapReduce is a huge, amorphous operating system designed for large computation tasks like web indexing or heavy science, running on top of a huge, amorphous processor and hard drive farm.

Calling these things “elastic” is Amazon’s way of pointing out that you can buy any amount of operating system/processor/storage that you care to pay for, as little or as much, with the only notional limit being the entirety of Amazon’s many large data center buildings. No fuss, no muss: give them some money and you can run your computing application for a while. Give them more and you can run it faster, longer, bigger. All you need is an internet connection to interface with it all. They won’t bother you with details like what state of the union your processes are running in, or on what hardware, or how many people are employed servicing it and swapping out burnt-out drives and sweeping up the Diet Pepsi cans left by the programmers. It’s commodity supercomputing.
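For a feel of the programming model underneath all that, Hadoop’s “streaming” mode lets you write the map and reduce steps as ordinary scripts that read standard input and write standard output. The canonical toy example is a word count, something like the following (nothing Amazon-specific here, and the script names are just my own):

# mapper.py -- emit "word<TAB>1" for every word arriving on standard input
import sys

for line in sys.stdin:
    for word in line.split():
        print(word + "\t1")

# reducer.py -- the framework sorts by key before this runs, so all the counts
# for a given word arrive together and can be summed in a single pass
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(current_word + "\t" + str(current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print(current_word + "\t" + str(current_count))

The elastic part is that the same two little scripts run unchanged whether the input is a megabyte on one machine or a terabyte spread across however many machines you felt like paying for that afternoon.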

I wonder whether, had I been aware of this option, it would have changed how I structured my thesis research. Maybe I could have got another little research grant and bought myself $1000 worth of supercomputing (I don’t actually know how much supercomputing that would be). What could I have done with it? Then again, most of the software I ran for that project is Windows-only and presumably locked out of Hadoop clusters. ArcGIS, Imagine, Fragstats, Patch Analyst, Terrain Analysis System, GeoDA: all Windows and Windows only. Landscape ecology really needs to get off Windows; it doesn’t scale.

NCSA Mosaic and the Truly Vintage Web

Note: This post is the second part in a two-part “comprehensive history of computing” series, begun here.

These folks offer us “the Vintage Web”, websites that look like they haven’t noticed the last ten years. That’s great.

But 1998 is already late in World Wide Web history. What if you wanted to re-experience the truly vintage www? Even if we’ve forgotten it, a central tenet of HTML is that the display of content should be decided principally by the browser, not by the author. And there was a time, the truly vintage time, when that was still regarded as a feature of the web and not a bug. To really see the vintage web, you would need a truly vintage web browser.

Like, say, NCSA Mosaic. You’ll recall that Mosaic was the first browser for the World Wide Web to feature a Graphical User Interface (well, the first one not for lawyers). It opened the web to tens of thousands of newcomers in 1993, back before the browser wars had even begun. Netscape? Yet to be built on the bones of Mosaic. Internet Explorer? Isn’t “Explorer” the name Microsoft uses for both their file browser and their interface shell? Surely they aren’t using that name for a third application? Using Mosaic was a learning process: you weren’t just learning the interface of another browser, you were learning that a program could fetch hypertext-markup-language-encoded text pages from other computers on the Internet, and display them on your own computer screen.

Well hey, here it is to download and install.

A caveat: only a version updated somewhat in 1997 is available for download; you can’t get v2.0 from 1993. But there’s plenty of 1993 to be felt here. For instance, the download is available as a 3MB file, or as smaller “DISKS”. It will install by default to C:\mosaic\, an entirely sensible place. When you try to connect to a site, it will first advise you that it is “looking up Domain Name Server” (true enough, so it is). The top item in the drop-down selection box of recommended Web sites is “Gopher Servers”, followed by “Home Pages”. “World Wide Web Info” is fifth down. And then there’s this line in the user’s manual I downloaded:

“For full functionality, you need access to the Internet. If you do not have access, see Appendix D, “Access Providers” for information.”

Sadly, the one thing it won’t do is load a remote web page for me. I’m not sure why not, and I couldn’t find an active Mosaic user’s forum to help me with my technical issue.

It will load local web pages. Here’s what one of those looks like, albeit with a modernized “bgcolor” attribute set:

But I remember the way the web first looked through that grey window. Square 16-bit beveled icons, black serif text on a grey background, and the promise of universal access to all the geographically dispersed information in the world.

(Here’s a genuine Mosaic screen cap that has survived from almost that far back. Novell’s service and support web site, circa early 1995, held on to by Nathan Zeldes.)

The Long Road to Linux

I don’t remember the first time I installed linux. The earliest memories I can still call on are of installing a version of Red Hat on the Dell laptop I used for the last year of undergrad. I was using the same physical setup I do today: a laptop plugged into an external monitor, with attendant mouse/keyboard and stereo system. I was having trouble getting it working with the external monitor, and was hand-editing fstab files as root, and so on. At one point I pressed the Fn-F7 combination to switch the display over from the monitor screen to the laptop screen, and heard a delicate “pop”, and the laptop screen flashed white, and stayed that way. I guess I had summoned a little too much current through the display adapter. Linux was willing to let me channel that extra power to my laptop screen, even if the screen couldn’t really handle it. Linux does what it’s told. My laptop screen never worked again, although I carried the damn laptop around for years, using it with an external monitor.

Every year or so since, I’ve checked back into the world of GNU/linux to see if the time has yet come when the evergreen promises of grandmother friendliness, or at least non-CS-student friendliness, have come true. They never have yet, although they really do get closer every year. Ironically, it was Windows XP’s inability to consistently link my laptop and monitor’s display that most recently drove me back into linux-land. These days, Ubuntu is the hotness, and it is indeed getting close to general-purpose usability. I have a list of must-haves before I switch fully over, and Ubuntu 8.10 (“Intrepid Ibex”) crosses several of those needs off the list:

A working “suspend” mode.

  • This capability is not only present in “Intrepid”, it occasionally even works, unlike in previous releases. But not always.

The ability to use a laptop screen with an external monitor, preferably at the same time.

  • This is working like a charm in Intrepid. And doing it better and more stably than Windows XP, in fact.

The ability to use an external soundcard.

  • Baked right in. I didn’t even have to configure it. The first time Ubuntu loaded up, it played the logon sound through my full stereo system.

So most of the hardware stuff seems to be getting ironed out. If only the same could be said for software.

Music management.

  • Not music playing, mind you. There are now plenty of open source, linux-native music players, many of them as good as or better than the iTunes standard, and all of which will treat your system with more respect. I like Listen, Amarok, and Songbird very much for the playing of music files. But none of them seriously pretends to be a music manager. What I need is something to replace MediaMonkey. To be fair, there is really only one piece of software in the world that does persistent monitoring, user-controlled auto-file-organization and mass-metadata-manipulation of music files stored across disparate directories and hard drives well (i.e., MediaMonkey). But until linux has a MediaMonkey equivalent (or MediaMonkey itself), I am Windows-locked. (There’s a rough sketch of the file-organizing behaviour I mean after this list.) Songbird, are you guys listening?

Photographic workflow.

  • Similar to the situation with music, there are good linux-based applications available to display (and edit) photos, but not to manage them. GIMP never stops improving as an image editor, although it still doesn’t seem to quite keep up with Adobe in that regard. But for workflow: cataloging, mass-editing of metadata, and so on, there just isn’t anything to replace or even touch proprietary, non-linux programs like Lightroom.

Easy software installation.

  • This is one where linux now wins, hands down, no contest. Once upon a time, installing software on linux was an overwhelming task. Lots of open source software builds on bits and pieces of existing software to make something new: that’s one of the great advantages of working in open source, you can just do that. It’s encouraged. Unfortunately, if you want to install a little proggy that happens to depend on 4 other proggies, each of which depends on a few others… insanity lurks low over your poor head. But the linux people fixed that years ago, and oh how they fixed it. The first time you try to install software in a linux environment can be confusing, because it’s so different from Windows (or Mac). But after that first time, it’s hard to go back. Trust me, try it. And you won’t have to reboot, either.

GIS tools.

  • I could be wrong about this one soon enough. I know there are many smart people working days and weekends on moving open-source GIS towards being the ESRI-killer we so desperately want and need. But for a general purpose spatial analysis workstation, you still today need ArcGIS, and that means you still need Windows.

Science and stats tools.

  • R runs in linux of course, and so if you have the R skills, statistics is covered. But if you don’t have the R skills (and who does?), you’re screwed. And for all those random sciencey applications for habitat modeling, or PCR analysis, or radio collar telemetry, or what have you, there’s only a chance that someone will have released a linux version.
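About that music-management itch: here is the flavour of what I mean, reduced to a toy script. It leans on the mutagen tag-reading library, the two paths are invented for the example, and it only does one tiny slice of the job (filing tracks away into artist/album folders by their tags): a sketch of the itch, not a MediaMonkey replacement.

# organize_music.py -- a toy version of the "watch a folder, file tracks away
# by their tags" behaviour described above. Assumes the mutagen library;
# the source and destination paths are hypothetical.
import shutil
from pathlib import Path

from mutagen import File as MutagenFile

SOURCE = Path("/home/hugh/incoming-music")   # hypothetical dump directory
LIBRARY = Path("/home/hugh/music")           # hypothetical organized library

def tag(audio, key, default="Unknown"):
    # mutagen's "easy" tags are lists of strings; take the first, or fall back
    values = audio.get(key) if audio else None
    return values[0] if values else default

for path in SOURCE.rglob("*"):
    if path.suffix.lower() not in {".mp3", ".ogg", ".flac", ".m4a"}:
        continue
    audio = MutagenFile(path, easy=True)     # easy=True exposes plain artist/album keys
    artist = tag(audio, "artist")
    album = tag(audio, "album")
    destination = LIBRARY / artist / album
    destination.mkdir(parents=True, exist_ok=True)
    shutil.move(str(path), str(destination / path.name))
    print(path.name, "->", destination)

MediaMonkey does this sort of thing continuously and reversibly, with mass tag editing layered on top; the sketch is only meant to show the shape of the gap the linux players haven’t filled.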

For many of these complaints there exists an active project holding out hope of an eventual solution. For none of them is that solution going to arrive in the next point release. In some cases, the solutions are probably several years away from being equal to the respective Windows options.

This is not a pejorative complaint about GNU/linux. I understand that the entire ecosystem of open-source software is an extraordinary volunteer effort and an exemplar of non-profit capacity, and it has not ceased to blow my mind that linux exists at all, never mind whether it fulfills my personal computing needs. And I think it will happen: a few years ago, basic configuration and operation of linux was still an esoteric enough exercise that it wasn’t great even for basic internet and word processing, and that has since changed. But it looks like it will be a few more years until I can freely download a full-bodied multimedia processing and scientific analysis workstation. The road is longer than I had hoped a decade ago. But we’ll get there one fine day.
