June 19, 2007


Will Baird

I think you'll need to write a Perl script to get your data moved across. wget might work, though, if you have a Linux box.


There are Windows versions of wget. Problem is, it captures all the HTML, and it's a major pain to sort out the main text, each individual comment, and the metadata from that mishmash. Worse than cutting and pasting and cutting and pasting.

Also, some hidden information gets lost, like the location of the cut-line and the IP addresses of the comments. (True, that's going to happen with any web extraction utility.)

Ideally, I'd want something that extracts the pages already into Movable Type format, so I don't have to re-invent the wheel. I may end up using wget for emergency backup, but that's going to leave me with a lot of bracketed gibberish.
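
For what it's worth, here is a rough Python sketch of the "already into Movable Type format" half of that wish. It assumes the title, body, and comments have somehow been pulled out of the pages already, and it writes them in the plain-text import layout as I understand it: five hyphens between sections, eight hyphens after each entry. The dictionary keys (`author`, `title`, `extended`, `comments`, and so on) are made up for illustration, not anything Movable Type defines.

```python
from datetime import datetime


def mt_date(dt):
    # The import format uses MM/DD/YYYY hh:mm:ss AM|PM timestamps.
    return dt.strftime("%m/%d/%Y %I:%M:%S %p")


def to_mt_entry(entry):
    """Render one entry dict (hypothetical keys) as an MT import record."""
    lines = [
        f"AUTHOR: {entry['author']}",
        f"TITLE: {entry['title']}",
        f"DATE: {mt_date(entry['date'])}",
        "-----",
        "BODY:",
        entry["body"],
        "-----",
    ]
    # Text after the cut-line, if any, goes in its own section.
    if entry.get("extended"):
        lines += ["EXTENDED BODY:", entry["extended"], "-----"]
    for c in entry.get("comments", []):
        lines += [
            "COMMENT:",
            f"AUTHOR: {c['author']}",
            f"IP: {c.get('ip', '')}",
            f"DATE: {mt_date(c['date'])}",
            c["text"],
            "-----",
        ]
    lines.append("--------")  # eight hyphens close the entry
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {
        "author": "Example Author",
        "title": "Example post",
        "date": datetime(2007, 6, 19, 14, 30),
        "body": "Main text.",
        "comments": [
            {"author": "Reader", "ip": "10.0.0.1",
             "date": datetime(2007, 6, 19, 15, 0), "text": "Nice post."},
        ],
    }
    print(to_mt_entry(sample), end="")
```

The hard part, getting clean fields out of the HTML, is not solved here; this only shows that the output side is a few lines once the fields exist.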

Charlie Stross

Is the MT installation running on MySQL, or something else (Berkeley DB)?

If you're on MySQL it should be possible to drag all your comments out by running an appropriate SELECT against the database.
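
A minimal sketch of what such a SELECT might look like, in Python. The table and column names (`mt_comment`, `comment_entry_id`, `comment_author`, `comment_ip`, `comment_created_on`, `comment_text`) are my assumption about the Movable Type schema and should be checked against the real database. To keep the sketch runnable without a server, it uses an in-memory SQLite table as a stand-in; against MySQL the query would go through a MySQL driver, which typically uses `%s` rather than `?` as the placeholder.

```python
import sqlite3

# Assumed schema: names below are a guess at MT's comment table.
QUERY = """
SELECT comment_author, comment_ip, comment_created_on, comment_text
FROM mt_comment
WHERE comment_entry_id = ?
ORDER BY comment_created_on
"""


def comments_for_entry(conn, entry_id):
    """Return all comments on one entry, oldest first."""
    return conn.execute(QUERY, (entry_id,)).fetchall()


def demo_connection():
    """Build a throwaway SQLite database shaped like the assumed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mt_comment ("
        " comment_id INTEGER PRIMARY KEY,"
        " comment_entry_id INTEGER,"
        " comment_author TEXT,"
        " comment_ip TEXT,"
        " comment_created_on TEXT,"
        " comment_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO mt_comment (comment_entry_id, comment_author,"
        " comment_ip, comment_created_on, comment_text)"
        " VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Second", "10.0.0.2", "2007-06-19 15:00:00", "Later comment"),
            (1, "First", "10.0.0.1", "2007-06-19 14:00:00", "Earlier comment"),
            (2, "Other", "10.0.0.3", "2007-06-20 09:00:00", "Different entry"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in comments_for_entry(demo_connection(), 1):
        print(row)
```

Note that this route keeps the comment IP addresses, which a web scrape cannot recover.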

Also, if you want to back up your MT installation, having a MySQL-based setup makes things a bit easier. Something I haven't done yet (and need to, on my own blog) is to set up nightly MySQL dumps to a serial SQL file, then download it using rsync (so that only changed/new data is copied over the network).
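
The dump-then-rsync idea could be sketched roughly like this in Python. Everything specific here is hypothetical: the database name, the file paths, and the `user@example.com` host are placeholders, and credentials are assumed to live in the usual MySQL option file rather than on the command line. `--skip-extended-insert` writes one INSERT per row, which makes the dump bigger but should keep night-to-night differences small for rsync.

```python
import subprocess


def dump_command(db_name, out_file):
    """mysqldump invocation for the server side (run nightly, e.g. from cron)."""
    return [
        "mysqldump",
        "--skip-extended-insert",      # one INSERT per row
        f"--result-file={out_file}",   # write the SQL to this file
        db_name,
    ]


def rsync_command(remote_file, local_dir):
    """rsync invocation for the home machine: pull the dump over the network."""
    return ["rsync", "-az", remote_file, local_dir]


def run(cmd):
    # check=True raises CalledProcessError if the tool exits non-zero.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Printed rather than executed, since the names are placeholders.
    print(" ".join(dump_command("mt_blog", "/var/backups/mt_blog.sql")))
    print(" ".join(rsync_command("user@example.com:/var/backups/mt_blog.sql",
                                 "backups/")))
```

In practice two cron lines would do the same job; the Python wrapper is only there to show the two halves and which machine each runs on.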


Charlie, I don't know whether it's Berkeley DB or MySQL, because I don't have administrative access to that level. (From error messages, I believe it's MySQL.) I have a secondary Movable Type user's front end -- no Import & Export Entries option -- and no deeper access to bookcase.com. I don't have any way to get to the actual MT files. All I can do is go to the edit screen for each individual post of mine, and cut and paste and cut and paste.

It's a little frustrating. (That's Wisconsin understatement.)

James Angove

Yeah, you're pretty much screwed. (Although it seems to me that strategies of the style "Doug. Douugg! Help!" would be useful here.)

You're also probably pretty much done by now, I'd think. You're prolific, but you're not crazy prolific. If it's still worth the effort, you could probably do something like one mighty wget on all your posts and comments, redirect the whole thing into a single file, and open it in a browser. That would let you C&P the whole mass into a flat file of some sort and then edit it into something you like.
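
A variation on the same idea that skips the browser step, as a rough Python sketch: fetch each page, throw away the tags, and append the remaining text to one flat file. It uses only the standard library. The URL list is something you would have to supply yourself, and nothing here knows where the post ends and the comments begin, so the output still needs hand editing.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextOnly(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML page, one chunk per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch(url):
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def dump_pages(urls, out_path, fetch=fetch):
    """Append the text of every page to a single flat file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(f"=== {url} ===\n")
            out.write(html_to_text(fetch(url)))
            out.write("\n\n")
```

Calling `dump_pages(list_of_post_urls, "archive.txt")` would produce the flat file; the `fetch` parameter is there so a different downloader can be swapped in.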


You're also probably pretty much done by now, I'd think.

Oh, no. There are nine pages of entry titles, not counting 2007's. Most entries have at least a few comments, and it's the longest threads where I most want to keep all the information.

