I am Mr. Rourke...er...Bill Dueber, your host.

2009 August 14
For the last few months, I've been working on rolling out a ridiculously modified version of Vufind, which we just launched as our primary OPAC, Mirlyn, with a slightly different version powering catalog.hathitrust.org, a temporary metadata search on the HathiTrust data until the OCLC takes it over at some undetermined date. (Yeah, the HathiTrust site is a lot better looking.) (Our Aleph-based catalog lives on at mirlyn-classic) -- I'll be interested to see how the traffic on the two differs as time goes on. ... More
2009 May 11
RefWorks has some okish documentation about how to deal with its callback import procedure, but I thought I’d put down how I’m doing it for our vufind install (mirlyn2-beta.lib.umich.edu) in case other folks are interested. The basic procedure is:
1. Send your user to a specific RefWorks URL along with a callback URL that can enumerate the record(s) you want to import in a supported form
2. Your user logs in (if need be) and gets to her RefWorks page
3. RefWorks calls up your system and requests the record(s)
4. The import happens, and your user does whatever she wants to do with them
Of course, there are lots of issues with doing this well (quick! ... More
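The first step of that procedure, handing the user off with a callback URL, might be sketched roughly like this. The endpoint, the `url` and `ids` parameter names, and the catalog address are all placeholders invented for illustration; they are not the real RefWorks interface, so check the RefWorks documentation for the actual values.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Placeholder endpoint: NOT the real RefWorks import URL.
REFWORKS_IMPORT_URL = "https://refworks.example.invalid/import"


def build_import_link(callback_base, record_ids):
    """Build a hand-off link that carries a callback URL listing the records."""
    # The callback is the URL the remote service will later request from us.
    # "ids" is a hypothetical parameter name.
    callback = callback_base + "?" + urlencode({"ids": ",".join(record_ids)})
    # The callback itself must be percent-encoded so it survives as one
    # query parameter. "url" is also a hypothetical parameter name.
    return REFWORKS_IMPORT_URL + "?" + urlencode({"url": callback})


link = build_import_link("https://catalog.example.org/export", ["000123", "000456"])

# Decoding the link again recovers the callback URL unchanged.
recovered = parse_qs(urlparse(link).query)["url"][0]
```

The point of the sketch is the nested encoding: the callback carries its own query string, so it has to be escaped as a whole before it is attached to the outer URL.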
2009 April 15
After a medium-sized discussion on #code4lib, we’ve collectively decided that…well, ok, no one really cares all that much, but a few people weighed in. The new format is: A list of arrays. If it’s got two elements, it’s a control field; if it’s got four, it’s a data field. SO….it’s like this now.
    {
      "type" : "marc-hash",
      "version" : [1, 0],
      "leader" : "leader string",
      "fields" : [
        ["001", "001 value"],
        ["002", "002 value"],
        ["010", " ", " ", [ ["a", "68009499"] ] ],
        ["035", " ", " ", [ ["a", "(RLIN)MIUG0000733-B"] ] ],
        ["035", " ", " ", [ ["a", "(CaOTULAS)159818014"] ] ],
        ["245", "1", "0", [
          ["a", "Capitalism, primitive and modern;"],
          ["b", "some aspects of Tolai economic growth"],
          ["c", "[by] T. ... More
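A minimal sketch of how a consumer might tell the two field shapes apart, assuming the record has already been parsed into Python lists and dicts. The helper names `classify_field` and `subfield_values` are made up for illustration and are not part of the format; the sample record reuses values from the excerpt above.

```python
def classify_field(field):
    """Return 'control' for [tag, value], 'data' for [tag, ind1, ind2, subfields]."""
    if len(field) == 2:
        return "control"
    if len(field) == 4:
        return "data"
    raise ValueError(f"unexpected field length {len(field)}: {field!r}")


def subfield_values(record, tag, code):
    """Collect values of subfield `code` from every data field `tag`, in record order."""
    values = []
    for field in record["fields"]:
        if field[0] == tag and classify_field(field) == "data":
            # field[3] is the ordered list of [code, value] pairs.
            values.extend(value for c, value in field[3] if c == code)
    return values


# Illustrative record; only the shapes matter here.
record = {
    "type": "marc-hash",
    "version": [1, 0],
    "leader": "leader string",
    "fields": [
        ["001", "001 value"],
        ["010", " ", " ", [["a", "68009499"]]],
        ["035", " ", " ", [["a", "(RLIN)MIUG0000733-B"]]],
        ["035", " ", " ", [["a", "(CaOTULAS)159818014"]]],
    ],
}

kinds = [classify_field(f) for f in record["fields"]]
system_numbers = subfield_values(record, "035", "a")
```

Because fields and subfields are plain lists, repeated tags such as the two 035s come back in the order they appear in the record.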
2009 April 15
Why do I ever, ever think that MARC might not rely on order? I don’t know. In any case, control fields will now be just an array of duples: control: [ ['001', 'value of the 001'], ['006', 'value of the 006'], ['006', 'another 006'] ]
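A small illustration of why an array of pairs matters here, written in Python as a sketch rather than as anything from the original code: a hash keyed on tag silently drops a repeated 006, while the list of pairs keeps every occurrence in order.

```python
control = [
    ["001", "value of the 001"],
    ["006", "value of the 006"],
    ["006", "another 006"],
]

# A plain hash keeps only the last value seen for a repeated tag.
as_hash = dict(control)

# The list of pairs keeps every occurrence, in the original order.
all_006 = [value for tag, value in control if tag == "006"]
```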
2009 April 13
In my first shot at MARC-in-JSON, which I appropriately (and prematurely) named MARC-JSON, I made a point of losing round-tripability (to and from MARC) in order to end up with a nice, easy-to-work-with data structure based mostly on hashes. “Who really cares what order the subfields come in?” I asked myself. Well, of course, it turns out some people do. Some even care about the order of the tags. “Only in the 500s…usually” I was told today. ... More
2009 March 30
[Only, of course, if you’re using Solr. Otherwise, that’d be dumb.] We’ve been working on Mirlyn2-Beta, our installation of VuFind, for some time now (don’t let the fancy-pants name scare you off), and the further we get into it, the more obvious it is that I want to move as much data normalization into Solr itself as possible. Arguments about how much business logic to move into the database layer, in the form of foreign-key requirements, cascading inserts and deletes, stored procedures, etc. ... More
2009 March 18
OK. I’m done with it, and this time I mean it. I’ve updated and improved the LC normalization code, documented the algorithm, and put it all into Google Code. In the next couple of weeks, I’ll be turning it into a Solr text filter so we can do some decent sorting on call-number search results.
2009 February 12
The good folks at ticTOCs heard the call for open data, and they responded…exactly as I asked them to. Which makes me think I should have asked for a pony, too, but I’m still very, very happy! Anyone can now download a simple tab-delimited text file describing all the journal table of contents RSS files they’ve assembled, for use however anyone wants. The data include ISSNs and eISSNs (where available), the title of the journal, and of course the URL of the RSS/Atom/Whatever feed. ... More
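A rough sketch of reading such a tab-delimited file into a lookup table by ISSN. The column order in `COLUMNS` is a guess made for illustration, and the sample row is invented; check both against the real file before relying on this.

```python
import csv
import io

# Hypothetical column order: verify against the actual download.
COLUMNS = ["title", "issn", "eissn", "feed_url"]


def read_feeds(handle):
    """Parse tab-delimited rows into dicts keyed by the assumed column names."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(COLUMNS, row)) for row in reader if row]


def index_by_issn(feeds):
    """Map every non-empty ISSN or eISSN to its feed record."""
    index = {}
    for feed in feeds:
        for key in ("issn", "eissn"):
            if feed.get(key):
                index[feed[key]] = feed
    return index


# Made-up sample row with an empty eISSN column.
sample = io.StringIO("Example Journal\t1234-5678\t\thttp://example.org/feed.rss\n")
feeds = read_feeds(sample)
by_issn = index_by_issn(feeds)
```

Indexing on both identifiers means a lookup works whether a catalog record carries the print or the electronic ISSN.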
2009 February 2
For those who haven’t heard, ticTOCs is a service that provides web-based access to a database of Journal RSS/Atom Table of Contents feeds. Awesome. In their blog at News from TicTocs, a post titled I want to be completely honest with you about ticTOCs notes that: As for the API - yes, we’ve been asked this several times, and the answer is that it is currently being written and should be available very soon. ... More
2009 January 25
[I’ve noticed that a sure way to get people to look at stuff (as measured by, say, digg) is to include a number. So I did. Five. ] Over at Bibliographic Wilderness, Jonathan Rochkind has a great followup to an ongoing discussion on the Blacklight list called How to build shared open source in which he tackles some of the differences between open-sourcing your code (a legal and distribution issue) and actually making it so someone else can usefully contribute to your code. ... More