Up Till 5am Scripting a Webmail Extractor

I have an old email account at fastmail.ca. I don’t use it anymore, but I did for years, and consequently there are years’ worth of emails from friends, partners and family in there. I can still access the account no problem, but it has no feature for searching for specific emails. Neither does it offer POP or IMAP access, so I can’t offload the emails for storage or easy reading. If fastmail.ca ever goes away, my email would go with it. I went and got my personal data stuck in somebody’s proprietary database. Again.

This problem has bugged me literally for years, ever since I stopped using that account. I have searched the interwebs in vain to find a ready solution. Late one evening last week I finally got annoyed enough that I decided to do something about it.

I got a copy of the AutoHotkey scripting platform and went to work. There are a number of scripting options out there, but AutoHotkey is open source and seems to have some decent commands for controlling mouse movement and program window interaction and such, so it seemed like a good option.

Mind you, it’s been a long time since I did any coding or scripting of any kind. Back in elementary school I was pretty good. I had a successful science fair program that did some elementary user-controlled graphics on my dad’s Franklin Ace 1000, but my personal proudest accomplishment was a general-purpose BASIC program I wrote for making text-adventure games. Its heart was a GOTO routine which could read variables from something like a flat file to generate descriptions of places and user options within those places. Change the flat file contents, change the game from a medieval forest to a future spaceship. Pretty cool for grade 7. Since then I’ve intentionally avoided programming as much as possible as something that could easily overtake my life. Not what I want to do professionally.

But apparently there’s something of riding a bicycle about doing grade-7-style programming. By midnight I had the syntax figured out and was getting the program to control my web browser sufficiently to flip through my emails as long as they didn’t have a varying number of recipients. By 2 my little script was able to copy emails out of my web browser, switch to my text editor, paste them there, flip back to the browser and occasionally even advance to the next email. By 4 it was chugging through dozens of emails at a time before hitting a snag it couldn’t handle. By 5 I had built in enough error-checking loops and failsafes that it could deal with a certain amount of variability in where the “next” link was going to show up on the screen and would at least bail out if something went wrong instead of endlessly copying the same email into my text file. By 5:30 I was ready to leave it running and go to bed. When I woke up it had moved about 500 emails into my text editor, and it got the rest out over the course of a few more runs through the day.

So there it is. An all-night coding marathon, my first since I had to take Programming for Biologists in university. I’d forgotten the satisfaction of writing and debugging a bit of code. And this is the first time I’ve written a truly useful program (okay, script). And now I’ve got all my old emails, with at least fragmentary headers intact, in a big honking text flat file that I can search and scan through. Maybe I should write a script to parse the headers and recopy them into a .csv spreadsheet. Maybe not.
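
That header-parsing idea could look something like the sketch below, written in AutoHotkey v1 syntax (v1.1.21 or later, for StrSplit and StrReplace) to match the scraper. It leans on some assumptions that nothing here confirms: that each pasted email contains lines starting with “From:”, “To:”, “Date:” and “Subject:”, that the >>>>>>>>> separator never turns up inside an email body, and that fastmail.csv is an acceptable (made-up) output name. Treat it as a starting point, not a finished tool.

```autohotkey
; Sketch only, AutoHotkey v1.1 syntax. Assumes every email in the text file
; has header lines beginning "From:", "To:", "Date:" and "Subject:".
InFile  := "C:\Documents and Settings\Owner\Desktop\fastmail.txt"
OutFile := "C:\Documents and Settings\Owner\Desktop\fastmail.csv"  ; hypothetical name

FileRead, AllMail, %InFile%
if (ErrorLevel)
{
    MsgBox, Could not read %InFile%
    ExitApp
}

FileDelete, %OutFile%            ; start from an empty CSV
Header := "From,To,Date,Subject`n"
FileAppend, %Header%, %OutFile%

Fields := ["From", "To", "Date", "Subject"]
; The scraper types ">>>>>>>>>" after each email, so split on that
for Index, Message in StrSplit(AllMail, ">>>>>>>>>")
{
    if (Trim(Message, " `t`r`n") = "")
        continue
    Row := ""
    for FieldIndex, Field in Fields
    {
        Value := ""
        ; i = ignore case, m = ^ and $ match per line, `a = accept any newline style
        if (RegExMatch(Message, "im`a)^" . Field . ":[ \t]*(.*)$", Match))
            Value := Match1
        ; CSV quoting: wrap in quotes and double any quotes inside
        Value := """" . StrReplace(Value, """", """""") . """"
        Row .= (FieldIndex > 1 ? "," : "") . Value
    }
    FileAppend, %Row%`n, %OutFile%
}
```

Only the first matching line for each field is kept, so a quoted “From:” further down in a reply is ignored. Emails whose headers were mangled in the copy simply get empty cells.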

A really elegant solution to this problem would be something that could submit HTTP requests directly to the web server and parse the responses to slot the emails into a database with “from”, “to”, “date”, “subject” and “contents” fields. If anybody does that, let me know.

In the meantime, if anyone out there is looking for a way to extract their old webmail emails, here’s an AutoHotkey script you could maybe modify to work with your webmail service. The biggest trick would be to figure out what range of pixels to search for the “next email” link within, and what colour it would be.

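Here, then, is my kludgey Fastmail Scraper Script for AutoHotkey:

; Counts emails pasted since the last save
saver := 1
loop
{
    ; Close any "External Protocol Request" dialog that has popped up
    IfWinExist, External Protocol Request,
    {
        WinClose, External Protocol Request,
    }
    IfWinNotActive, Mozilla Firefox, , WinActivate, Mozilla Firefox,
    WinWaitActive, Mozilla Firefox,

    ; Copy the whole page; pause if the clipboard is unchanged,
    ; which means the same email would be pasted twice
    ClipboardOld := Clipboard

    Send, {CTRLDOWN}a{CTRLUP}{CTRLDOWN}c{CTRLUP}

    if (Clipboard = ClipboardOld)
    {
        Pause
    }

    IfWinNotActive, UltraEdit-32 - [C:\Documents and Settings\Owner\Desktop\fastmail.txt*], , WinActivate, UltraEdit-32 - [C:\Documents and Settings\Owner\Desktop\fastmail.txt*],
    WinWaitActive, UltraEdit-32 - [C:\Documents and Settings\Owner\Desktop\fastmail.txt*],

    ; Paste the email into the text file, followed by a separator line
    Send, {CTRLDOWN}v{CTRLUP}{ENTER}
    Sleep, 1000
    Send {ENTER}>>>>>>>>>{ENTER}{ENTER}
    Sleep, 1000
    saver := saver+1

    ; Save the text file after every ninth email, then restart the count
    if (saver>9)
    {
        Send, {CTRLDOWN}s{CTRLUP}
        Send, {ENTER}
        saver := 1
    }
    IfWinNotActive, Mozilla Firefox, , WinActivate, Mozilla Firefox,
    WinWaitActive, Mozilla Firefox,

    ; Scan the column x=760 from y=140 down to y=700 for the colour of the "next" box
    PixelSearch, NextBoxX, NextBoxY, 760, 140, 760, 700, 0xAA080A, Fast

    If (ErrorLevel=1)
    {
        Pause
    }

    NextBoxYSearch := NextBoxY + 3

    ; Three pixels below that first match, look along the row for the white link text
    PixelSearch, NextTextX, NextTextY, 730, NextBoxYSearch, 800, NextBoxYSearch, 0xFFFFFF, Fast

    ; Pause before clicking if the link text was not found
    If (ErrorLevel>0)
    {
        Pause
    }

    MouseMove, NextTextX, NextTextY
    Click, left
    Sleep, 2500

    ; Look for a grey (0xAAAAAA) pixel in this strip; if one turns up,
    ; send a last Ctrl+S and Enter, then stop
    PixelSearch, Foo, Bar, 780, 140, 790, 800, 0xAAAAAA, Fast

    If (ErrorLevel=0)
    {
        Send, {CTRLDOWN}s{CTRLUP}
        Send, {ENTER}
        Exit
    }
}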
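
For anyone adapting the script, one way to work out the pixel range and colour of the “next email” link is a small helper hotkey. This is a minimal sketch in the same AutoHotkey v1 syntax as the scraper. The Ctrl+Shift+P hotkey and the variable names are arbitrary choices for illustration, not part of the original script.

```autohotkey
; Helper sketch: hover over the "next" link and press Ctrl+Shift+P.
; The hotkey and variable names are illustrative assumptions.
^+p::
    ; Position is relative to the active window by default, the same
    ; coordinate mode PixelSearch uses unless CoordMode says otherwise
    MouseGetPos, MouseX, MouseY
    ; Colour comes back in BGR order by default, which is also the
    ; default order PixelSearch expects
    PixelGetColor, FoundColour, %MouseX%, %MouseY%
    MsgBox, x=%MouseX% y=%MouseY% colour=%FoundColour%
return
```

Run it with the browser window active, note the coordinates at the top and bottom of the area where the link can appear, and plug those and the reported colour into the PixelSearch lines. Some pages recolour a link while the mouse hovers over it, so it may be worth sampling a pixel just beside the pointer as well.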
