Monday, October 26, 2009

Going Parallel in R

I recently needed to get some parallelism going in R. I was running some largeish-scale Monte Carlo and Markov chain Monte Carlo simulations of latent space network models for NIPS. My needs were quite simple: I just needed lots of repetitions, and I wanted an easy way to spread them across a few processors/cores, all on a single machine.

I found a number of options and started out with the multicore package. It worked wonderfully and easily when I ported my code over to it and tested it on my desktop. Of course, I neglected to notice the not-so-small print in the manual:
SystemRequirements: POSIX-compliant OS (essentially anything but Windows)
Now, I had never really encountered system-specific packages for R before, so it caught me by surprise when I scp'ed things over to our big ol' Windows machine and found that I couldn't install multicore. Eventually I wised up and replaced multicore with snowfall when running on Windows. I'm beginning to think I should have just used foreach, but that's another story.

Editing Mendeley Citation Keys

Mendeley is a nifty new tool for organizing and sharing academic references. I've been test driving it a bit, mainly using it to import files that I then transfer over to JabRef. This has worked out all right, since Mendeley can export to BibTeX, which is the backend for JabRef and what I ultimately use to write my papers. Still, I've been looking for a way to streamline the process, since I like Mendeley's interface a lot better and it has better search features, e.g. full-text search of all attached PDFs.

Unfortunately, Mendeley doesn't use BibTeX for its backend, but it does use SQLite and will even let you export your database as a zipped SQLite file. So far I haven't figured out where Mendeley stores my user files, so exporting seems the way to go.

Using the SQLite Database Browser, I managed to load up my Mendeley database. The layout is pretty obvious, with most of the things you'd want to futz with in the "Documents" table. For now I just removed all the citationKey values and generated my own in the format I wanted, but it's nice to know that it's not too miserable to get at the raw data from Mendeley.
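
If you'd rather script the key rewrite than click through a browser, something like the rough Python sketch below might do it. Only the "Documents" table and its citationKey column come from what I saw in the export; the id, title and year columns, the key format (first title word plus year), and the helper names make_key and regenerate_citation_keys are all assumptions made up for illustration. The sketch builds a small stand-in database in memory so it runs without a real Mendeley file.

```python
import re
import sqlite3


def make_key(title, year):
    """Build a key such as 'going2009' from the first title word and the year."""
    words = re.findall(r"[a-z0-9]+", (title or "").lower())
    first = words[0] if words else "untitled"
    return f"{first}{year if year is not None else ''}"


def regenerate_citation_keys(conn):
    """Overwrite every citationKey in Documents; repeats get -2, -3, ... appended."""
    seen = {}
    # Assumed columns: id, title, year. Only citationKey is known from the export.
    rows = conn.execute("SELECT id, title, year FROM Documents ORDER BY id").fetchall()
    for doc_id, title, year in rows:
        base = make_key(title, year)
        count = seen.get(base, 0)
        seen[base] = count + 1
        key = base if count == 0 else f"{base}-{count + 1}"
        conn.execute("UPDATE Documents SET citationKey = ? WHERE id = ?", (key, doc_id))
    conn.commit()


# Stand-in database with a hypothetical schema, not the real Mendeley one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Documents "
    "(id INTEGER PRIMARY KEY, title TEXT, year INTEGER, citationKey TEXT)"
)
conn.executemany(
    "INSERT INTO Documents (title, year, citationKey) VALUES (?, ?, ?)",
    [
        ("Latent space models", 2002, "Old1"),
        ("Latent variables", 2002, None),
        ("Going Parallel", 2009, "x"),
    ],
)

regenerate_citation_keys(conn)
keys = [row[0] for row in conn.execute("SELECT citationKey FROM Documents ORDER BY id")]
print(keys)
```

To try this on a real export, you would swap the in-memory connection for sqlite3.connect() on a copy of the exported file and adjust the SELECT to whatever columns the table actually has. Work on a copy, since the UPDATE overwrites every existing key.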

Note: You can also restore an edited zip of the SQLite data, but the archive has to be flat (no subfolders), exactly the way Mendeley exported it.