• VuFind Midwest gathering

    A couple weeks ago, representatives from UMich (that’d be me), Purdue, Notre Dame, UChicago, and our hosts at Western Michigan got together in lovely Kalamazoo to talk about our VuFind implementations.

    Eric Lease Morgan already wrote up his notes about the meeting, and I encourage you to go there for more info, but I’ll add my two cents here.

    So, in light of that meeting, here’s what I’m thinking about VuFind of late:

  • Simple Ruby gem for dealing with ISBN/ISSN/LCCN

    I needed some code to deal with ISBN10->ISBN13 conversion, so I put in a few other functions and wrapped it all up in a gem called library_stdnums.

    It’s only 100 lines of code or so, plus some specs, but I put it out there in case others want to use it or add to it. Pull requests at the GitHub repo are welcome.

    All the functionality is exposed as module functions, as follows:

    ISBN

    ... <more>
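    The function list above gets cut off, but here’s a quick, illustrative taste of the interface (values shown are for flavor; the README has the real list):

        require 'library_stdnums'

        # The conversion that started it all: ISBN-10 -> ISBN-13
        StdNum::ISBN.convert_to_13('0-306-40615-2')  # => "9780306406157"

        # Validity checks recompute the check digit
        StdNum::ISBN.valid?('0306406152')            # => true
        StdNum::ISSN.valid?('0378-5955')             # => true

        # Normalization strips spaces/hyphens for indexing
        StdNum::LCCN.normalize('n 78-890351')        # => "n78890351"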
  • Solr: Forcing items with all query terms to the top of a Solr search

    [Note: I’ve since made a better explanation of, and solution for, this problem.]

    Here at UMich, we’re apparently in the minority in that we have Mirlyn, our catalog discovery interface (a very hacked version of VuFind), set up to find records that match only a subset of the query terms.

    Put more succinctly: everyone else seems to join all terms with ‘AND’, whereas we do a DisMax variant on ‘OR’.

    ... <more>
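    The excerpt ends there, but the general shape of the trick is sketchable (this is illustrative, not the actual Mirlyn config; the qf fields and boost values are invented): keep a permissive mm so partial matches still match, and add a boost query that requires every term, so records with all of them float to the top.

        # Illustrative DisMax params only -- field names and boosts are made up
        solr_params = {
          'defType' => 'dismax',
          'q'       => 'civil war maps',
          'qf'      => 'title^5 author^2 allfields',
          'mm'      => '1',  # OR-ish minimum-match: any one term is enough to match
          'bq'      => 'allfields:(+civil +war +maps)^10'  # big reward for all terms
        }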
  • Why RDA is doomed to failure

    [Note: edited for clarity thanks to rsinger’s comment, below]

    Doomed, I say! DOOOOOOOOOOMMMMMMMED!

    My reasoning is simple: RDA will fail because it’s not “better enough.”

    Now, those of you who know me might be saying to yourselves, “Waitjustaminute. Bill doesn’t know anything at all about cataloging, or semantic representations, or the relative merits of various encapsulations of bibliographic metadata. I mean, sure, he knows a lot about…err….hmmm…well, in any case, he’s definitely talking out of... <more>

  • Data structures and Serializations

    Jonathan Rochkind, in response to a long (and, IMHO, mostly ridiculous) thread on NGC4Lib, has been exploring the boundaries between a data model and its expression/serialization ( see here, here, and here ) and I thought I’d jump in.

    What this post is not

    There’s a lot to be said about a good domain model for bibliographic data. I’m so not the guy to say it. I know there are arguments... <more>

  • Stupid catalog tricks: Subject Headings and the Long Tail

    Library of Congress Subject Headings (LCSH) in particular.

    I’ve always been down on LCSH because I don’t understand them. They kinda look like a hierarchy, but they’re not really. Things get modifiers. Geography is inline and …weird.

    And, of course, in our faceting catalog when you click on a linked LCSH to do an automatic search, you often get nothing but the record you started from. Which is super-annoying.

    So, just for kicks, I ran... <more>
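    (The run itself is cut off above, but the basic tally is simple enough to sketch with ruby-marc — this is the shape of such a count, not the actual script: count each 650 heading and see how many show up exactly once.)

        require 'marc'

        counts = Hash.new(0)
        MARC::Reader.new('catalog.mrc').each do |rec|
          rec.fields('650').each do |f|
            # Flatten the subfields into a single heading string
            heading = f.subfields.map(&:value).join('--')
            counts[heading] += 1
          end
        end

        singletons = counts.count { |_heading, c| c == 1 }
        puts "#{counts.size} distinct headings; #{singletons} used exactly once"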

  • Why bother with threading in jruby? Because it's easy.

    [Edit 2011-July-1: I’ve written a jruby-specific threach, called jruby_threach, that takes advantage of better underlying Java libraries; it’s a much better option if you’re running JRuby]

    Lately on the #code4lib IRC channel, several of us have been knocking around different versions (in several programming languages) of programs to read in a ginormous file and do some processing on each line. I noted some speedups related to multi-threading, and someone (maybe rsinger?) said, basically,... <more>
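    The excerpt stops mid-story, but the pattern under discussion is the classic bounded producer/consumer, which threach wraps up into a single Enumerable call. A plain-Ruby sketch of the same idea (process_line is a stand-in for whatever your per-line work is):

        require 'thread'  # for SizedQueue on older rubies

        queue   = SizedQueue.new(100)  # bounded, so the reader can't outrun the workers
        workers = Array.new(4) do
          Thread.new do
            while (line = queue.pop)   # a nil means "we're done"
              process_line(line)       # hypothetical per-line work
            end
          end
        end

        File.foreach('ginormous_file.txt') { |line| queue << line }
        workers.size.times { queue << nil }  # one shutdown signal per worker
        workers.each(&:join)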

  • Pushing MARC to Solr; processing times and threading and such

    [This is in response to a thread on the blacklight mailing list about getting MARC data into Solr.]

    What’s the question?

    The question came up: “How much time do we spend processing the MARC vs. trying to push it into Solr?” Bob Haschart found that even with a pretty damn complicated processing stage, pushing the data to Solr was still, at best, taking at least as long as the processing stage.

    I’m interested... <more>
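    One variable that dominates that comparison is batching: one HTTP request per document makes the Solr side look far slower than it needs to. A sketch with the rsolr gem (the URL and batch size are invented, and each_processed_doc is a hypothetical source of finished Solr documents):

        require 'rsolr'

        solr  = RSolr.connect(:url => 'http://localhost:8983/solr')
        batch = []

        each_processed_doc do |doc|   # hypothetical: yields already-processed solr docs
          batch << doc
          if batch.size >= 1000       # send chunks, not one doc per round trip
            solr.add(batch)
            batch.clear
          end
        end
        solr.add(batch) unless batch.empty?
        solr.commit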

  • ruby-marc with pluggable readers

    I’ve been messing with easier ways of adding parsers to ruby-marc’s MARC::Reader object. The idea is that you can do this:

     require 'marc'
     require 'my_marc_stuff'
     mbreader = MARC::Reader.new('test.mrc') # => Stock marc binary reader
     mbreader = MARC::Reader.new('test.mrc', :readertype => :marcstrict) # => ditto
     MARC::Reader.register_parser(My::MARC::<more>
  • New interest in MARC-HASH / JSON

    EDIT: This is historical -- the recommended serialization for MARC in JSON is now Ross Singer's marc-in-json. The marc-in-json serialization has implementations in the core MARC libraries for Ruby and PHP, and add-ons for Perl and Java. C'mon, Python people!

    For reasons I’m still not entirely clear on (I wasn’t there), the Code4Lib 2010 conference this week inspired renewed interest in a JSON-based format for MARC data.

    When I initially... <more>
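    For the curious, this is the overall shape of a marc-in-json record, built here as a Ruby hash (a trimmed, made-up record for illustration, not an example from the spec):

        require 'json'

        record = {
          'leader' => '01471cjm a2200349 a 4500',
          'fields' => [
            { '001' => '5674874' },          # control fields are simple tag/value pairs
            { '245' => {                     # data fields carry indicators...
                'ind1' => '1', 'ind2' => '0',
                'subfields' => [             # ...and an ordered list of subfields
                  { 'a' => 'Sample title' },
                  { 'c' => 'Sample Author.' }
                ]
            } }
          ]
        }
        puts JSON.pretty_generate(record)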