I am Mr. Rourke...er...Bill Dueber, your host.

2010 November 3
There’s a common problem among developers of websites that paginate, including OPACs: how do you provide a single item view that can have links that go back to the search (or to the prev/next item) without making your URLs look ugly? The fundamental problem is that as soon as your user opens up a couple searches in separate tabs, your session data can’t keep track of which search she wants to “go back to” unless you put some random crap in the URL, which none of us want to do. ... More
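To make the problem concrete, here’s a minimal Sinatra-style sketch (routes and names are made up, not from the post) of why plain session state falls apart: both tabs share one session, so whichever tab searched last silently wins.

```ruby
# A sketch, not a recommendation: naive "remember the last search" via the session.
require 'sinatra'
enable :sessions

get '/search' do
  session[:last_search] = params[:q]   # a search in a second tab overwrites this
  "results for #{params[:q]}"
end

get '/record/:id' do
  # "Back to search" uses whichever tab searched most recently -- possibly the wrong one.
  "record #{params[:id]} (back to: #{session[:last_search]})"
end
```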
2010 September 29
Ross Singer recently updated ruby-marc to include a #to_hash method that creates a data structure that is (a) round-trippable without any data loss, and (b) amenable to serializing to JSON. He’s calling it marc-in-json (even though the serialization is up to the programmer, it’s expected most of us will use JSON), and I think it’s the way to go in terms of JSON-able MARC data. I wanted to take a quick look at the space/speed tradeoffs of using various means to serialize MARC records in the marc-in-json format compared to using binary MARC-21. ... More
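As a quick illustration of what round-trippable means here, this is a minimal sketch using ruby-marc’s #to_hash and its companion MARC::Record.new_from_hash (the filename is just a stand-in):

```ruby
require 'marc'
require 'json'

reader = MARC::Reader.new('records.mrc')   # 'records.mrc' is a stand-in filename
record = reader.first

hash = record.to_hash                      # the marc-in-json data structure
json = JSON.generate(hash)                 # serialize it however you like; JSON here

copy = MARC::Record.new_from_hash(JSON.parse(json))
puts copy.to_marc == record.to_marc        # round-trips back to identical MARC-21
```

The hash in the middle is the marc-in-json structure; swapping JSON out for some other serializer is just a matter of changing those two calls.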
2010 September 16
A couple weeks ago, representatives from UMich (that’d be me), Purdue, Notre Dame, UChicago, and our hosts at Western Michigan got together in lovely Kalamazoo to talk about our VuFind implementations. Eric Lease Morgan already wrote up his notes about the meeting, and I encourage you to go there for more info, but I’ll add my two cents here. So, in light of that meeting, here’s what I’m thinking about VuFind of late: ... More
2010 September 13
I needed some code to deal with ISBN10->ISBN13 conversion, so I wrote it, added a few other functions, and wrapped it all up in a gem called library_stdnums. It’s only 100 lines of code or so, plus some specs, but I put it out there in case others want to use it or add to it. Pull requests at the github repo are welcome. Functionality is all as module functions, as follows: ... More
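For the curious, the heart of the ISBN10->ISBN13 conversion is small enough to sketch inline. This isn’t the gem’s actual code, just the algorithm it implements: drop the old check digit, prepend the 978 prefix, and recompute the check digit with the ISBN-13 weights.

```ruby
def isbn10_to_isbn13(isbn10)
  # Strip hyphens/spaces, drop the ISBN-10 check digit, prepend '978'.
  digits = ('978' + isbn10.delete('- ')[0..8]).chars.map(&:to_i)
  # ISBN-13 check digit: alternate weights of 1 and 3.
  sum   = digits.each_with_index.inject(0) { |s, (d, i)| s + d * (i.even? ? 1 : 3) }
  check = (10 - sum % 10) % 10
  digits.join + check.to_s
end

isbn10_to_isbn13('0-306-40615-2')   # => "9780306406157"
```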
2010 August 18
[Note: I’ve since made a better explanation of, and solution for, this problem.] Here at UMich, we’re apparently in the minority in that we have Mirlyn, our catalog discovery interface (a very hacked version of VuFind), set up to find records that match only a subset of the query terms. Put more succinctly: everyone else seems to join all terms with ‘AND’, whereas we do a DisMax variant on ‘OR’. ... More
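In Solr terms, the difference comes down to the dismax mm (minimum-should-match) parameter. Here’s a rough sketch of the kind of request I mean; the host, field names, and mm value are placeholders, not our production config:

```ruby
require 'cgi'

params = {
  'defType' => 'dismax',
  'q'       => 'global warming economics policy',
  'qf'      => 'title^5 author^2 allfields',   # made-up field names
  'mm'      => '2<-1 5<80%'                    # longer queries may match on a subset of terms
}
url = 'http://localhost:8983/solr/select?' +
      params.map { |k, v| "#{k}=#{CGI.escape(v)}" }.join('&')
puts url
```

Setting mm to 100% gets you the everyone-else ‘AND’ behavior; anything looser lets a record match on only some of the query terms.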
2010 April 23
[Note: edited for clarity thanks to rsinger’s comment, below] Doomed, I say! DOOOOOOOOOOMMMMMMMED! My reasoning is simple: RDA will fail because it’s not “better enough.” Now, those of you who know me might be saying to yourselves, “Waitjustaminute. Bill doesn’t know anything at all about cataloging, or semantic representations, or the relative merits of various encapsulations of bibliographic metadata. I mean, sure, he knows a lot about…err….hmmm…well, in any case, he’s definitely talking out of his ass on this one. ... More
2010 April 20
Jonathan Rochkind, in response to a long (and, IMHO, mostly ridiculous) thread on NGC4Lib, has been exploring the boundaries between a data model and its expression/serialization (see here, here, and here) and I thought I’d jump in. What this post is not: There’s a lot to be said about a good domain model for bibliographic data. I’m so not the guy to say it. I know there are arguments for and against various aspects of AACR2 and RDA and FRBR, and I’m unable to go into them. ... More
2010 April 13
Library of Congress Subject Headings (LCSH) in particular. I’ve always been down on LCSH because I don’t understand them. They kinda look like a hierarchy, but they’re not really. Things get modifiers. Geography is inline and …weird. And, of course, in our faceting catalog when you click on a linked LCSH to do an automatic search, you often get nothing but the record you started from. Which is super-annoying. ... More
2010 March 12
[Edit 2011-July-1: I’ve written a jruby-specific version of threach, called jruby_threach, that takes advantage of better underlying Java libraries and is a much better option if you’re running JRuby] Lately on the #code4lib IRC channel, several of us have been knocking around different versions (in several programming languages) of programs to read in a ginormous file and do some processing on each line. I noted some speedups related to multi-threading, and someone (maybe rsinger? ... More
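The shape of the multi-threaded version is roughly this — a hand-rolled sketch of the pattern, not threach itself, with a made-up filename:

```ruby
require 'thread'

queue   = SizedQueue.new(1_000)          # bounded, so the reader can't outrun the workers
workers = Array.new(4) do
  Thread.new do
    while (line = queue.pop)             # a nil on the queue shuts the worker down
      # ... do some per-line processing here ...
      line.length
    end
  end
end

File.foreach('ginormous_file.txt') { |line| queue << line }   # stand-in filename
workers.size.times { queue << nil }                           # tell each worker to quit
workers.each(&:join)
```

Under MRI the global interpreter lock limits how much of a win this is for CPU-bound work, which is a big part of why the JRuby-specific version mentioned above exists.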
2010 March 4
[This is in response to a thread on the blacklight mailing list about getting MARC data into Solr.] What’s the question? The question came up: “How much time do we spend processing the MARC vs. trying to push it into Solr?” Bob Haschart found that even with a pretty damn complicated processing stage, pushing the data to Solr was still, at best, taking at least as long as the processing stage. ... More
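If you want to see the split for your own setup, a rough way to measure it looks like this; process_record is a stand-in for your real MARC-to-Solr-document mapping, and the commented-out push would be whatever Solr client you use (e.g. RSolr):

```ruby
require 'marc'
require 'benchmark'

# Stand-in for the real MARC -> Solr-document mapping code.
def process_record(record)
  { id:    record['001'] && record['001'].value,
    title: record['245'] && record['245']['a'] }
end

process_time = 0.0
docs = []

MARC::Reader.new('records.mrc').each do |rec|     # 'records.mrc' is a stand-in file
  process_time += Benchmark.realtime { docs << process_record(rec) }
end

push_time = Benchmark.realtime do
  # solr.add(docs); solr.commit   # e.g. with an RSolr connection
end

puts format('processing: %.2fs, solr push: %.2fs', process_time, push_time)
```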