For those of us that spend our days trying to tweak Mirlyn to make it better, one of the most important – and, in many ways, most opaque – questions is, “How good is our relevancy ranking?”
The short version of the functions available:
- ISBN: get checkdigit, validate, convert isbn10 to/from isbn13, normalize (to 13-digit)
- ISSN: get checkdigit, validate, normalize
- LCCN: validate, normalize
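To make the ISBN items above concrete, here is a minimal sketch of the standard check-digit arithmetic and the 10-to-13 conversion. The module and method names (`ISBNSketch`, `checkdigit10`, `to_13`, and so on) are made up for illustration and are not the API of the library described here; the sketch also assumes input with nothing appended after the number.

```ruby
# Illustrative only: hypothetical names, not the real library's API.
module ISBNSketch
  module_function

  # Drop hyphens and spaces; upcase so a trailing "x" becomes "X".
  def clean(str)
    str.strip.upcase.delete('- ')
  end

  # ISBN-10: weights 10 down to 2 over the first nine digits, mod 11.
  def checkdigit10(digits9)
    sum = digits9.chars.each_with_index.sum { |d, i| d.to_i * (10 - i) }
    check = (11 - sum % 11) % 11
    check == 10 ? 'X' : check.to_s
  end

  # ISBN-13: alternating weights 1 and 3 over the first twelve digits, mod 10.
  def checkdigit13(digits12)
    sum = digits12.chars.each_with_index.sum { |d, i| d.to_i * (i.even? ? 1 : 3) }
    ((10 - sum % 10) % 10).to_s
  end

  # True only when the shape is right and the check digit matches.
  def valid?(isbn)
    s = clean(isbn)
    case s.length
    when 10 then s.match?(/\A\d{9}[\dX]\z/) && checkdigit10(s[0, 9]) == s[9]
    when 13 then s.match?(/\A\d{13}\z/) && checkdigit13(s[0, 12]) == s[12]
    else false
    end
  end

  # Convert a 10-digit ISBN to 13 digits: prefix 978, recompute the check digit.
  def to_13(isbn10)
    body = '978' + clean(isbn10)[0, 9]
    body + checkdigit13(body)
  end

  # Normalize any valid ISBN to its 13-digit form; nil if it does not validate.
  def normalize(isbn)
    return nil unless valid?(isbn)
    s = clean(isbn)
    s.length == 13 ? s : to_13(s)
  end
end

puts ISBNSketch.normalize('0-306-40615-2')
```

Returning nil for anything that fails validation is just one possible design choice; a real library might raise instead.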
Validation of LCCNs doesn’t involve a checkdigit; I basically just normalize whatever is sent in and then see if the result is syntactically... <more>
A couple days ago I decided to finally get back to working on threach to try to deal with problems it had – essentially, it didn’t deal well with non-local exits due to calls to break or even something simple like a
[BTW, I think I managed it. As near as I can tell, threach version 0.4 won’t deadlock anymore]
Along the way, while trying to figure out how threads affect the behavior... <more>
Yesterday, I gave a brief overview of why free text is hard to deal with.
Today, I’m turning my attention to a concrete example that drives me absolutely batshit crazy: taking a perfectly good unique-id field (in this case, the ISBN in the 020) and appending stuff onto the end of it.
The point is not to mock anything. Mocking will, however, be included for free.
What’s supposed to be in the 020?... <more>
One of the frustrating things about dealing with MARC (nee AACR2) data is how much nonsense is stored in free text when a unique identifier in a well-defined place would have done a much better job.
A lot of people seem to not understand why.
This post, then, is for all the catalogers out there who constantly answer my questions with, “Well, it depends” and don’t understand why that’s a problem.
Description vs Findability... <more>
…at the same URL.
I was, to put it mildly, incredibly excited about code4lib this year because, for once, I thought I had something to say. And I did have something to say. And I said it. But it was wrong.
DANGER! I was trying to re-verify my numbers and found a glaring and hugely important mistake. I'll make a new post with the details, but basically I was counting about 180k sessions (out of only 735k) that I should have been ignoring. Please ignore my basic stats until further notice. See the new numbers and corrected slides for more accurate data.
Don’t get me wrong. I use ruby as my default language when possible. I love JRuby in a way that’s illegal in most states.
But there are…issues. There are with any language and the associated environment. These are the ones that bug the crap out of me.
- Ruby is slow. Let’s get this one out of the way right away. Ruby (at least the MRI 1.8.x implementation) is, for many things, slow. Sometimes not... <more>
There’s a common problem among developers of websites that paginate, including OPACs: how do you provide a single item view that can have links that go back to the search (or to the prev/next item) without making your URLs look ugly?
The fundamental problem is that as soon as your user opens up a couple searches in separate tabs, your session data can’t keep track of which search she wants to “go back to” unless... <more>
Ross Singer recently updated ruby-marc to include a #to_hash method that creates a data structure that is (a) round-trippable without any data loss, and (b) amenable to serializing to JSON. He’s calling it marc-in-json (even though the serialization is up to the programmer, it’s expected most of us will use JSON), and I think it’s the way to go in terms of JSON-able MARC data.
I wanted to take a quick look... <more>
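For readers who haven't seen the format, here is a rough sketch of what a marc-in-json style structure looks like and why it survives a trip through JSON. The hash is built by hand, not produced by ruby-marc, and its shape (a leader string plus an ordered array of single-key field hashes) is my understanding of the format, not something checked against the gem. The record contents and the `subfield_values` helper are made up for illustration.

```ruby
require 'json'

# Hand-built record in the assumed marc-in-json shape (placeholder data).
record = {
  'leader' => '00000nam a2200000 a 4500',
  'fields' => [
    # Control fields map the tag straight to a string.
    { '001' => 'example001' },
    # Data fields carry indicators plus an ordered list of subfields.
    { '020' => { 'ind1' => ' ', 'ind2' => ' ',
                 'subfields' => [{ 'a' => '0306406152 (pbk.)' }] } },
    { '245' => { 'ind1' => '1', 'ind2' => '0',
                 'subfields' => [{ 'a' => 'An example title /' },
                                 { 'c' => 'by an example author.' }] } }
  ]
}

# Hypothetical helper: all values of one subfield code for one tag.
def subfield_values(rec, tag, code)
  rec['fields']
    .select { |f| f.key?(tag) && f[tag].is_a?(Hash) }
    .flat_map { |f| f[tag]['subfields'] }
    .select { |sf| sf.key?(code) }
    .map { |sf| sf[code] }
end

# Serialize and parse back; arrays keep field and subfield order intact.
round_tripped = JSON.parse(JSON.generate(record))

puts round_tripped == record
puts subfield_values(round_tripped, '020', 'a').inspect
```

Using arrays of one-key hashes, instead of a single hash keyed by tag, is what lets repeated fields and subfields keep their order.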