I am Mr. Rourke...er...Bill Dueber, your host.

2011 July 1
[Yes, another post about Ruby code; I’ll get back to library stuff soon.] Quite a while ago, I released a little gem called threach (for “threaded #each”). It allows you to easily process a block with multiple threads.

    # Process a CSV file with three threads
    File.open('data.csv').threach(3, :each_line) {|line| send_to_db(line)}

Nice, right? The problem is that I could never figure out a way to deal with a break or an Exception raised inside the block. ... More
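For anyone wondering what a “threaded #each” amounts to, here is a minimal sketch in plain Ruby. It is not threach's actual implementation; the method name `threaded_each`, the queue size, and the sentinel object are all assumptions made up for illustration. A producer walks the enumerable and pushes items onto a bounded queue while a few worker threads pop items and run the block.

```ruby
require 'thread' # harmless on modern Rubies, needed on older ones for Queue/SizedQueue

# Hypothetical helper for illustration only; NOT how the threach gem is written.
# Calls the block once per item, spreading the calls across thread_count workers.
def threaded_each(enum, thread_count, iterator = :each)
  queue = SizedQueue.new(thread_count * 2) # bounded, so the producer can't race far ahead
  done  = Object.new                       # unique sentinel that tells a worker to stop

  workers = Array.new(thread_count) do
    Thread.new do
      # Pop items until this worker sees the sentinel
      until (item = queue.pop).equal?(done)
        yield item
      end
    end
  end

  # Producer: feed every item to the workers, then one sentinel per worker
  enum.send(iterator) { |item| queue << item }
  thread_count.times { queue << done }

  # Thread#join re-raises any exception that killed a worker
  workers.each(&:join)
end

# Usage: double some numbers on three threads (completion order is not guaranteed)
doubled = Queue.new
threaded_each(1..10, 3) { |n| doubled << n * 2 }
```

Note that nothing in this sketch handles a break or an exception inside the block. A worker that dies simply stops pulling from the queue, and if enough of them die the producer can end up waiting on a full queue, which is the kind of trouble the post is about.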
2011 May 26
I spent way too long asking my friend, The Internet, how to get a normal DBI connection to SQLite3 using JRuby. Apparently, everyone except me is using ActiveRecord and/or Rails and doesn’t want to just connect to the database. But I do. Here’s how. First, get the gems:

    gem install dbi
    gem install dbd-jdbc
    gem install jdbc-sqlite3

Then you’re ready to load it up into DBI. require 'rubygems' # if you're using 1. ... More
2011 May 25
For those of us that spend our days trying to tweak Mirlyn to make it better, one of the most important – and, in many ways, most opaque – questions is, “How good is our relevancy ranking?” Research from the UMich Library’s Usability Group (pdf; 600k) points to the importance of relevancy ranking for both known-item searches and discovery, but mapping search terms to the “best” results involves crawling deep inside the searcher’s head to know what she’s looking for. ... More
2011 May 6
I just released another (this time pretty good) version of my gem for normalizing/validating library standard numbers, library_stdnums (github source / docs). The short version of the functions available:

ISBN: get checkdigit, validate, convert isbn10 to/from isbn13, normalize (to 13-digit)
ISSN: get checkdigit, validate, normalize
LCCN: validate, normalize

Validation of LCCNs doesn’t involve a checkdigit; I basically just normalize whatever is sent in and then see if the result is syntactically valid. ... More
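To make the ISBN operations concrete, here is a small sketch of the standard check digit arithmetic in plain Ruby. The method names below are hypothetical and are not the library_stdnums API; only the math (mod 11 with weights 10 down to 2 for ISBN-10, mod 10 with alternating weights 1 and 3 for ISBN-13, and the 978 prefix for conversion) comes from the ISBN standard itself.

```ruby
# Hypothetical helpers for illustration; these are NOT library_stdnums method names.

# Strip hyphens, spaces, and anything else that isn't a digit or an X
def clean_isbn(raw)
  raw.upcase.gsub(/[^0-9X]/, '')
end

# Check digit for the first nine digits of an ISBN-10:
# weights run 10 down to 2, and a value of 10 is written as 'X'
def isbn10_checkdigit(first_nine)
  sum = first_nine.chars.each_with_index.map { |ch, i| ch.to_i * (10 - i) }.sum
  check = (11 - sum % 11) % 11
  check == 10 ? 'X' : check.to_s
end

# Check digit for the first twelve digits of an ISBN-13:
# weights alternate 1, 3, 1, 3, ...
def isbn13_checkdigit(first_twelve)
  sum = first_twelve.chars.each_with_index.map { |ch, i| ch.to_i * (i.even? ? 1 : 3) }.sum
  ((10 - sum % 10) % 10).to_s
end

# Syntactically an ISBN-10, and the last character matches the computed check digit
def valid_isbn10?(raw)
  isbn = clean_isbn(raw)
  isbn.match?(/\A\d{9}[\dX]\z/) && isbn10_checkdigit(isbn[0, 9]) == isbn[9]
end

# Convert to ISBN-13: prefix 978, drop the old check digit, compute a new one
def isbn10_to_isbn13(raw)
  isbn = clean_isbn(raw)
  return nil unless valid_isbn10?(isbn)
  stem = '978' + isbn[0, 9]
  stem + isbn13_checkdigit(stem)
end

puts isbn10_to_isbn13('0-306-40615-2') # prints 9780306406157
puts valid_isbn10?('0306406153')       # prints false (wrong check digit)
```

Returning nil for an invalid ISBN-10 is just a choice made for this sketch; raising an error would be equally reasonable.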
2011 May 3
A couple days ago I decided to finally get back to working on threach to try to deal with problems it had – essentially, it didn’t deal well with non-local exits due to calls to break or even something simple like a NoMethodError. [BTW, I think I managed it. As near as I can tell, threach version 0.4 won’t deadlock anymore.] Along the way, while trying to figure out how threads affect the behavior of different non-local exits, I noticed that in some cases there was still work being done by one or more threads long after there was an exception raised. ... More
2011 April 12
Yesterday, I gave a brief overview of why free text is hard to deal with. Today, I’m turning my attention to a concrete example that drives me absolutely batshit crazy: taking a perfectly good unique-id field (in this case, the ISBN in the 020) and appending stuff onto the end of it. The point is not to mock anything. Mocking will, however, be included for free. What’s supposed to be in the 020? ... More
2011 April 11
One of the frustrating things about dealing with MARC (nee AACR2) data is how much nonsense is stored in free text when a unique identifier in a well-defined place would have done a much better job. A lot of people seem to not understand why. This post, then, is for all the catalogers out there who constantly answer my questions with, “Well, it depends” and don’t understand why that’s a problem. ... More
2011 February 15
…at the same URL. I was, to put it mildly, incredibly excited about code4lib this year because, for once, I thought I had something to say. And I did have something to say. And I said it. But it was wrong. I presented a bunch of statistics drawn from nearly a year of Mirlyn logs. The most outlandish of my assertions, and the one that eventually turned out to be the most incorrect, was that some 45% of all our user sessions consist of only one action: a search. ... More
2011 February 9
DANGER! I was trying to re-verify my numbers and found a glaring and hugely important mistake. I’ll make a new post with the details, but basically I was counting about 180k sessions (out of only 735k) that I should have been ignoring. Please ignore my basic stats until further notice. See the new numbers and corrected slides for more accurate data. I did a little Lightning Talk at Code4Lib 2011 and cleaned up (and heavily annotated) my slides for anyone interested in them. ... More
2011 January 13
Don’t get me wrong. I use Ruby as my default language when possible. I love JRuby in a way that’s illegal in most states. But there are…issues. There are with any language and the associated environment. These are the ones that bug the crap out of me. Ruby is slow. Let’s get this one out of the way right away. Ruby (at least the MRI 1.8.x implementation) is, for many things, slow. ... More