  • How good is our relevancy ranking?

    For those of us who spend our days trying to tweak Mirlyn to make it better, one of the most important – and, in many ways, most opaque – questions is, “How good is our relevancy ranking?”

    Research from the UMich Library’s Usability Group (pdf; 600k) points to the importance of relevancy ranking for both known-item searches and discovery, but mapping search terms to the...

  • Ruby gem library_stdnums goes to version 1.0

    I just released another (this time pretty good) version of my gem for normalizing/validating library standard numbers, library_stdnums (github source / docs).

    In short, here are the available functions:

    • ISBN: get checkdigit, validate, convert isbn10 to/from isbn13, normalize (to 13-digit)
    • ISSN: get checkdigit, validate, normalize
    • LCCN: validate, normalize
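
    The check-digit items in that list all come down to small weighted-sum calculations. Here is a minimal sketch of the standard ISBN-10, ISBN-13 and ISSN arithmetic, plus 10/13 conversion and normalization to 13 digits. It is not library_stdnums's code or API: every method name below (clean_stdnum, isbn10_checkdigit, normalize_isbn, and so on) is hypothetical, and input cleanup is deliberately minimal.

```ruby
# Minimal sketch of standard-number check digits. All method names are
# hypothetical; this is NOT the library_stdnums API.

# Remove hyphens and whitespace, and upcase letters (so 'x' becomes 'X').
def clean_stdnum(str)
  str.to_s.gsub(/[\s-]/, '').upcase
end

# ISBN-10: weights 10 down to 2 on the first nine digits, mod 11; 10 is 'X'.
def isbn10_checkdigit(first9)
  sum = first9.chars.each_with_index.sum { |c, i| c.to_i * (10 - i) }
  check = (11 - sum % 11) % 11
  check == 10 ? 'X' : check.to_s
end

# ISBN-13: alternating weights 1 and 3 on the first twelve digits, mod 10.
def isbn13_checkdigit(first12)
  sum = first12.chars.each_with_index.sum { |c, i| c.to_i * (i.even? ? 1 : 3) }
  ((10 - sum % 10) % 10).to_s
end

# ISSN: weights 8 down to 2 on the first seven digits, mod 11; 10 is 'X'.
def issn_checkdigit(first7)
  sum = first7.chars.each_with_index.sum { |c, i| c.to_i * (8 - i) }
  check = (11 - sum % 11) % 11
  check == 10 ? 'X' : check.to_s
end

# True if str is a 10- or 13-character ISBN whose check digit matches.
def valid_isbn?(str)
  s = clean_stdnum(str)
  case s
  when /\A\d{9}[\dX]\z/ then isbn10_checkdigit(s[0, 9]) == s[9]
  when /\A\d{13}\z/     then isbn13_checkdigit(s[0, 12]) == s[12]
  else false
  end
end

# Prefix 978 to the first nine digits and recompute the check digit.
def isbn10_to_13(isbn10)
  base = '978' + clean_stdnum(isbn10)[0, 9]
  base + isbn13_checkdigit(base)
end

# Only 978-prefixed ISBN-13s have an ISBN-10 form; others return nil.
def isbn13_to_10(isbn13)
  s = clean_stdnum(isbn13)
  return nil unless s.start_with?('978')
  core = s[3, 9]
  core + isbn10_checkdigit(core)
end

# Normalize a valid ISBN to its 13-digit form; invalid input returns nil.
def normalize_isbn(str)
  s = clean_stdnum(str)
  return nil unless valid_isbn?(s)
  s.length == 10 ? isbn10_to_13(s) : s
end
```

    For example, normalize_isbn('0-306-40615-2') returns "9780306406157", and isbn13_to_10 turns that back into "0306406152".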

    Validation of LCCNs doesn’t involve a checkdigit; I basically just normalize whatever is sent in and then see if the result is syntactically...
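
    Since there is no check digit, an LCCN check has to be "normalize, then test the shape." Below is a minimal sketch of that idea. It assumes the normalization rules the Library of Congress publishes for LCCNs (remove blanks; drop a forward slash and everything after it; remove a hyphen and left-pad the digits after it with zeros to six) and assumes a normalized LCCN is up to three lowercase letters plus 8 digits, or up to two letters plus 10 digits. normalize_lccn and valid_lccn? are hypothetical names, not the gem's methods.

```ruby
# Minimal sketch of LCCN normalize-then-check, ASSUMING the Library of
# Congress normalization rules. normalize_lccn and valid_lccn? are
# hypothetical names, not library_stdnums methods.

# Assumed normalized syntax: up to three lowercase letters + 8 digits,
# or up to two lowercase letters + 10 digits.
LCCN_SYNTAX = /\A(?:[a-z]{0,3}\d{8}|[a-z]{0,2}\d{10})\z/

def normalize_lccn(str)
  s = str.to_s.gsub(/\s+/, '')        # 1. remove all blanks
  s = s.split('/', 2).first.to_s      # 2. drop a '/' and everything after it
  if s.include?('-')                  # 3. remove the hyphen and left-pad the
    prefix, serial = s.split('-', 2)  #    part after it with zeros to 6 digits
    # Design choice for this sketch: if that part isn't 1-6 digits, give up.
    return nil unless serial.to_s.match?(/\A\d{1,6}\z/)
    s = prefix + serial.rjust(6, '0')
  end
  s
end

# Valid if normalization succeeds and the result has the assumed shape.
def valid_lccn?(str)
  normalized = normalize_lccn(str)
  !normalized.nil? && LCCN_SYNTAX.match?(normalized)
end
```

    For example, normalize_lccn('n78-89035') returns "n78089035", and normalize_lccn('75-425165//r75') returns "75425165".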

  • A short Ruby diversion: cost of flow control under Ruby

    A couple of days ago I finally got back to working on threach to deal with the problems it had. Essentially, it didn’t handle non-local exits well, whether they came from a call to break or from something as simple as a NoMethodError.

    [BTW, I think I managed it. As near as I can tell, threach version 0.4 won’t deadlock anymore.]

    Along the way, while trying to figure out how threads affect the behavior...
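
    One way to get a feel for what different non-local exits cost in Ruby is to time the same early-exit search written three ways: with break, with catch/throw, and with raise/rescue. This is a hypothetical micro-benchmark sketch (method names, collection size and iteration count are made up), not the benchmark from the post, and it makes no claim about which approach wins; timings will vary by Ruby implementation and version.

```ruby
require 'benchmark'

# Three hypothetical ways to return the first element greater than +limit+
# (or nil), differing only in how they leave the iteration early.

def find_with_break(items, limit)
  found = nil
  items.each do |x|
    if x > limit
      found = x
      break            # leave the block, then each returns
    end
  end
  found
end

def find_with_throw(items, limit)
  catch(:found) do
    items.each { |x| throw :found, x if x > limit }  # unwinds to catch
    nil                                              # nothing matched
  end
end

# Exception class carrying the matched value out of the block.
class Found < StandardError
  attr_reader :value

  def initialize(value)
    @value = value
    super("found #{value}")
  end
end

def find_with_raise(items, limit)
  items.each { |x| raise Found.new(x) if x > limit }
  nil
rescue Found => e
  e.value
end

if __FILE__ == $0
  items = (1..100).to_a
  n = 50_000
  Benchmark.bm(8) do |bm|
    bm.report('break') { n.times { find_with_break(items, 10) } }
    bm.report('throw') { n.times { find_with_throw(items, 10) } }
    bm.report('raise') { n.times { find_with_raise(items, 10) } }
  end
end
```

    All three return the same answer for the same input, so differences in the timings mostly reflect the exit mechanism rather than the search itself.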

  • ISBN parenthetical notes: Bad MARC data #1

    Yesterday, I gave a brief overview of why free text is hard to deal with.

    Today, I’m turning my attention to a concrete example that drives me absolutely batshit crazy: taking a perfectly good unique-id field (in this case, the ISBN in the 020) and appending stuff onto the end of it.

    The point is not to mock anything. Mocking will, however, be included for free.
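
    To show the kind of cleanup this forces on anyone consuming the data, here is a hypothetical helper (split_isbn_field is not from any library) that splits a leading ISBN off an 020 $a value and hands back whatever was appended after it, untouched. It assumes the ISBN comes first and is 10 or 13 characters once hyphens are removed; anything else comes back with a nil ISBN.

```ruby
# Hypothetical helper: split a leading ISBN off an 020 $a value and return
# [isbn, rest], where rest is whatever was appended (nil if nothing was).
# Assumes the ISBN comes first; hyphens inside it are removed.
LEADING_ISBN = /\A\s*(\d[\d-]*[\dXx])\s*(.*)\z/m

def split_isbn_field(value)
  text = value.to_s.strip
  m = LEADING_ISBN.match(text)
  return [nil, text] unless m

  isbn = m[1].delete('-').upcase
  # Anything that isn't 10 or 13 characters isn't treated as an ISBN.
  return [nil, text] unless [10, 13].include?(isbn.length)

  rest = m[2].strip
  [isbn, rest.empty? ? nil : rest]
end
```

    For instance, split_isbn_field('0306406152 (pbk.)') returns ["0306406152", "(pbk.)"].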

    What’s supposed to be in the 020?

    ...

  • Why programmers hate free text in MARC records

    One of the frustrating things about dealing with MARC (née AACR2) data is how much nonsense is stored in free text when a unique identifier in a well-defined place would have done a much better job.

    A lot of people don’t seem to understand why.

    This post, then, is for all the catalogers out there who constantly answer my questions with, “Well, it depends” and don’t understand why that’s a problem.

    Description vs Findability

    ...

  • Corrected Code4Lib slides are up

    …at the same URL.

    I was, to put it mildly, incredibly excited about Code4Lib this year because, for once, I thought I had something to say. And I did have something to say. And I said it. But it was wrong.

    I presented a bunch of statistics drawn from nearly a year of Mirlyn logs. The most outlandish of my assertions, and the one that eventually turned out to be the most...

  • [RETRACTED] Code4Lib 2011 Lightning Talk Slides

    DANGER! I was trying to re-verify my numbers and found a glaring and hugely important mistake. I'll make a new post with the details, but basically I was counting about 180k sessions (out of only 735k) that I should have been ignoring. Please ignore my basic stats until further notice. See the new numbers and corrected slides for more accurate data.


    I did a little Lightning Talk at Code4Lib 2011 and cleaned...

  • Four things I hate about Ruby

    Don’t get me wrong. I use Ruby as my default language when possible. I love JRuby in a way that’s illegal in most states.

    But there are…issues. Every language and its environment has them. These are the ones that bug the crap out of me.

    • Ruby is slow. Let’s get this one out of the way right away. Ruby (at least the MRI 1.8.x implementation) is, for many things, slow. Sometimes not...

  • Does anyone use those prev/next/back-to-search links?

    There’s a common problem among developers of websites that paginate results, including OPACs: how do you give a single-item view links back to the search (or to the previous/next item) without making your URLs ugly?

    The fundamental problem is that as soon as your user opens a couple of searches in separate tabs, your session data can’t keep track of which search she wants to “go back to” unless...
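
    One common way around the multiple-tabs problem is to store each search in the session under its own id and carry that id in the item page's URL (say, /record/123?search=4), so each tab knows which result list it came from. The sketch below assumes that approach, which may or may not be where the post ends up; SearchHistory, its methods, the search parameter and the 20-search cap are all hypothetical, and a plain Hash stands in for the framework's session.

```ruby
# Hypothetical sketch: keep several searches in the session at once, keyed
# by id, so prev/next links work per tab. A plain Hash stands in for the
# web framework's session object.
class SearchHistory
  MAX_SEARCHES = 20 # arbitrary cap so the session can't grow without bound

  def initialize(session)
    @session = session
    @session[:searches] ||= {}
    @session[:next_search_id] ||= 1
  end

  # Remember a query and its ordered result ids; return the new search id,
  # which the item page would carry in its URL.
  def remember(query, result_ids)
    id = @session[:next_search_id]
    @session[:next_search_id] += 1
    @session[:searches][id] = { query: query, results: result_ids }
    # Hashes keep insertion order, so shift drops the oldest search.
    @session[:searches].shift while @session[:searches].size > MAX_SEARCHES
    id
  end

  # [previous, next] record ids around record_id within the given search.
  def neighbors(search_id, record_id)
    search = @session[:searches][search_id]
    return [nil, nil] unless search

    ids = search[:results]
    i = ids.index(record_id)
    return [nil, nil] unless i

    [i.zero? ? nil : ids[i - 1], ids[i + 1]]
  end

  # The query to rebuild a "back to search" link, or nil if it was evicted.
  def query_for(search_id)
    search = @session[:searches][search_id]
    search && search[:query]
  end
end
```

    Two tabs that ran different searches get different ids, so each item page can ask neighbors with its own id and get the right prev/next pair.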

  • Size/speed of various MARC serializations using ruby-marc

    Ross Singer recently updated ruby-marc to include a #to_hash method that creates a data structure that is (a) round-trippable without any data loss, and (b) amenable to serializing to JSON. He’s calling it marc-in-json (even though the serialization is up to the programmer, it’s expected most of us will use JSON), and I think it’s the way to go in terms of JSON-able MARC data.

    I wanted to take a quick look...
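
    For a feel of the structure being serialized, here is a hand-built record laid out the way marc-in-json is, as I understand the format: a leader string plus an ordered array of one-key field hashes. The sketch round-trips it through JSON and prints its JSON and Marshal sizes. It uses only Ruby's standard library, not ruby-marc; the record data and helper names (subfield_values, json_round_trip) are hypothetical, and the byte counts it prints say nothing about the measurements in the post.

```ruby
require 'json'

# Hypothetical sample record in the marc-in-json layout: a leader string
# plus an ordered array of one-key hashes, each keyed by field tag.
RECORD = {
  'leader' => '00000nam a2200000 a 4500',
  'fields' => [
    { '001' => 'ocm12345678' },
    { '020' => { 'ind1' => ' ', 'ind2' => ' ',
                 'subfields' => [{ 'a' => '0306406152' }] } },
    { '245' => { 'ind1' => '1', 'ind2' => '0',
                 'subfields' => [{ 'a' => 'An example title /' },
                                 { 'c' => 'by A. Person.' }] } }
  ]
}.freeze

# Every value of subfield +code+ in fields tagged +tag+, in record order.
# Control fields (plain string values) have no subfields and are skipped.
def subfield_values(record, tag, code)
  record['fields'].flat_map do |field|
    data = field[tag]
    next [] unless data.is_a?(Hash)
    data['subfields'].map { |sf| sf[code] }.compact
  end
end

# Serialize to JSON and parse it back; a lossless layout survives unchanged.
def json_round_trip(record)
  JSON.parse(JSON.generate(record))
end

if __FILE__ == $0
  json = JSON.generate(RECORD)
  puts "JSON bytes:    #{json.bytesize}"
  puts "Marshal bytes: #{Marshal.dump(RECORD).bytesize}"
  puts "round trip ok: #{json_round_trip(RECORD) == RECORD}"
  p subfield_values(RECORD, '245', 'a')
end
```

    Because every field keeps its tag, indicators and subfield order, the parsed JSON compares equal to the original hash, which is the round-trippable property described above.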