
Year: 2011

Solr and boolean operators

[Summary: ALWAYS ALWAYS ALWAYS USE PARENTHESES TO GROUP BOOLEANS IN SOLR!!!]

What does Solr do, given the following query?

    a OR b AND c

I’ll give you three guesses, but you’ll get the first two wrong and won’t have any idea how to generate a third, so don’t spend too much time on it.

Boolean algebra and operator precedence

Anyone who’s had even a passing introduction to boolean algebra knows that it specifies a strict order in which the operators bind: NOT before AND before OR. So, one might expect the following grouping:

    a OR (b AND c)

That’s…
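To make the summary's advice concrete, here is a minimal sketch of building query strings in which every boolean group carries its own parentheses. The helpers solr_and and solr_or are hypothetical names invented for this illustration; they are not part of Solr or of any Solr client library. The last few lines use plain Ruby booleans to show that the two possible groupings really do mean different things.

```ruby
# Hypothetical helpers (not part of Solr or of any Solr client library):
# each one wraps its clauses in parentheses so the grouping is explicit.
def solr_and(*clauses)
  "(#{clauses.join(' AND ')})"
end

def solr_or(*clauses)
  "(#{clauses.join(' OR ')})"
end

# The grouping that standard precedence (AND before OR) would suggest.
expected = solr_or('a', solr_and('b', 'c'))

# The other possible grouping, which means something different.
other = solr_and(solr_or('a', 'b'), 'c')

puts expected
puts other

# The two groupings are not equivalent: with a true and b, c false,
# the first evaluates to true and the second to false.
a = true
b = false
c = false
first_grouping  = a || (b && c)
second_grouping = (a || b) && c
puts first_grouping
puts second_grouping
```

Sending either fully parenthesized string leaves nothing for the query parser to decide about grouping, which is the whole point.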

Comments closed

A short personal note

We had another baby. 🙂 Shai Brown Dueber was born last Monday, the 3rd, at a very moderate 7lbs 7.2oz (his brothers were 9lbs and 9.5lbs). Mother, baby, and older brothers are all doing well. Father is freakin’ tired.


Even better, even simpler multithreading with JRuby

[Yes, another post about ruby code; I’ll get back to library stuff soon.] Quite a while ago, I released a little gem called threach (for “threaded #each”). It allows you to easily process a block with multiple threads.

    # Process a CSV file with three threads
    File.open('data.csv').threach(3, :each_line) {|line| send_to_db(line)}

Nice, right? The problem is that I could never figure out a way to deal with a break or an Exception raised inside the block. The core problem is that once a thread trying to push/pop from a ruby SizedQueue is blocking, there’s no way (I could find) to tell…
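For readers who haven't seen the pattern, here is a minimal sketch of a "threaded each" built on a SizedQueue. It is an illustration only, not threach's actual implementation; the name threaded_each and its argument order are assumptions made for this example. It deliberately does nothing about a break or an exception inside the block, so it has exactly the weakness described above: if a worker dies, the producer can end up blocked forever on a full queue.

```ruby
require 'thread' # harmless on newer Rubies, needed on older ones

# Hypothetical helper, NOT threach's real code: run the block on each item
# of enum using a fixed number of worker threads fed by a SizedQueue.
def threaded_each(enum, threads = 3, iterator = :each)
  queue = SizedQueue.new(threads * 2) # bounded, so the producer can't race ahead
  done  = Object.new                  # unique end-of-work marker

  workers = Array.new(threads) do
    Thread.new do
      # Each worker pops items until it sees the marker.
      until (item = queue.pop).equal?(done)
        yield item
      end
    end
  end

  # Producer: push every item, then one marker per worker.
  enum.send(iterator) { |item| queue << item }
  threads.times { queue << done }
  workers.each(&:join)
end

# Usage: double the numbers 1..10 on three threads.
results = Queue.new
threaded_each(1..10, 3) { |n| results << n * 2 }
doubled = Array.new(results.size) { results.pop }.sort
puts doubled.inspect
```

The items are processed in no particular order, which is why the usage example sorts the results before printing them.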


Using SQLite3 from JRuby without ActiveRecord

I spent way too long asking my friend, The Internet, how to get a normal DBI connection to SQLite3 using JRuby. Apparently, everyone except me is using ActiveRecord and/or Rails and doesn’t want to just connect to the database. But I do. Here’s how.

First, get the gems:

    gem install dbi
    gem install dbd-jdbc
    gem install jdbc-sqlite3

Then you’re ready to load it up into DBI.

    require 'rubygems' # if you're using 1.8 still
    require 'java'
    require 'dbi'
    require 'dbd/jdbc'
    require 'jdbc/sqlite3'

    databasefile = 'test.db'
    dbh = DBI.connect(
      "DBI:jdbc:sqlite:#{databasefile}", # connection string
      '', # no username for sqlite3
      '', #…


How good is our relevancy ranking?

For those of us who spend our days trying to tweak Mirlyn to make it better, one of the most important (and, in many ways, most opaque) questions is, “How good is our relevancy ranking?” Research from the UMich Library’s Usability Group (pdf; 600k) points to the importance of relevancy ranking for both known-item searches and discovery, but mapping search terms to the “best” results involves crawling deep inside the searcher’s head to know what she’s looking for. So, what can we do? Record interaction as a way of showing interest One possibility is to look at those…


Ruby gem library_stdnums goes to version 1.0

I just released another (this time pretty good) version of my gem for normalizing/validating library standard numbers, library_stdnums (github source / docs). The short version of the functions available:

ISBN: get checkdigit, validate, convert isbn10 to/from isbn13, normalize (to 13-digit)
ISSN: get checkdigit, validate, normalize
LCCN: validate, normalize

Validation of LCCNs doesn’t involve a checkdigit; I basically just normalize whatever is sent in and then see if the result is syntactically valid. My plan in my Copious Free Time is to do a Java version of these as well and then stick them into a new-style Solr v.3 filter so…
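As background on what "get checkdigit" and "convert isbn10 to isbn13" involve, here is a small sketch of the standard ISBN check-digit arithmetic. The function names below are made up for this illustration and are not the gem's API; see the gem's docs for its real interface. The sketch assumes well-formed input and does no validation.

```ruby
# Hypothetical function names for illustration; NOT the library_stdnums API.

# ISBN-10: weight the first nine digits 10 down to 2, then work mod 11.
# A check value of 10 is written as 'X'.
def isbn10_checkdigit(first9)
  sum = first9.chars.each_with_index.map { |ch, i| ch.to_i * (10 - i) }.sum
  check = (11 - sum % 11) % 11
  check == 10 ? 'X' : check.to_s
end

# ISBN-13: weight the first twelve digits 1, 3, 1, 3, ... and work mod 10.
def isbn13_checkdigit(first12)
  sum = first12.chars.each_with_index.map { |ch, i| ch.to_i * (i.even? ? 1 : 3) }.sum
  ((10 - sum % 10) % 10).to_s
end

# Convert by dropping the old check digit, prefixing 978, and recomputing.
def isbn10_to_13(isbn10)
  digits = isbn10.delete('-')
  stem = '978' + digits[0, 9]
  stem + isbn13_checkdigit(stem)
end

puts isbn10_checkdigit('030640615')   # prints 2
puts isbn13_checkdigit('978030640615') # prints 7
puts isbn10_to_13('0-306-40615-2')    # prints 9780306406157
```

Note that the ISBN-10 check digit is simply thrown away during conversion; the 13-digit form gets a freshly computed one.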


A short ruby diversion: cost of flow control under Ruby

A couple days ago I decided to finally get back to working on threach to try to deal with problems it had — essentially, it didn’t deal well with non-local exits due to calls to break or even something simple like a NoMethodError. [BTW, I think I managed it. As near as I can tell, threach version 0.4 won’t deadlock anymore] Along the way, while trying to figure out how threads affect the behavior of different non-local exits, I noticed that in some cases there was still work being done by one or more threads long after there was an…


ISBN parenthetical notes: Bad MARC data #1

Yesterday, I gave a brief overview of why free text is hard to deal with. Today, I’m turning my attention to a concrete example that drives me absolutely batshit crazy: taking a perfectly good unique-id field (in this case, the ISBN in the 020) and appending stuff onto the end of it. The point is not to mock anything. Mocking will, however, be included for free. What’s supposed to be in the 020? Well, for starters, an ISBN (10 or 13 digit, we’re not picky). Let’s not worry, for the moment, about the actual ISBN and whether it’s valid or…
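To show what "appending stuff onto the end" means for code that has to consume the field, here is a rough sketch that splits a value into a leading ISBN-looking token and whatever trails it. The sample value, the regular expression, and the name split_isbn_note are all assumptions made for this illustration; the pattern is deliberately loose and does not check the length or the check digit.

```ruby
# Hypothetical pattern: a run of digits/hyphens ending in a digit or X,
# followed by optional whitespace and any trailing text.
ISBN_WITH_NOTE = /\A\s*([0-9][0-9\-]{8,15}[0-9Xx])\s*(.*)\z/m

# Hypothetical helper: returns [isbn_or_nil, trailing_note].
def split_isbn_note(value)
  m = ISBN_WITH_NOTE.match(value)
  return [nil, value.strip] unless m

  [m[1].delete('-').upcase, m[2].strip]
end

# Made-up sample values, not taken from any real record.
p split_isbn_note('0306406152 (pbk. : alk. paper)')
p split_isbn_note('9780306406157')
p split_isbn_note('(pbk.)')
```

Anything that doesn't start with an ISBN-shaped token comes back with nil in the first slot, so the caller can decide what to do with the leftovers.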


Why programmers hate free text in MARC records

One of the frustrating things about dealing with MARC (nee AACR2) data is how much nonsense is stored in free text when a unique identifier in a well-defined place would have done a much better job. A lot of people seem to not understand why. This post, then, is for all the catalogers out there who constantly answer my questions with, “Well, it depends” and don’t understand why that’s a problem. Description vs Findability I’m surprised — and a little dismayed — by how often I talk to people in the library world who don’t understand the difference between description…


Corrected Code4Lib slides are up

…at the same URL. I was, to put it mildly, incredibly excited about code4lib this year because, for once, I thought I had something to say. And I did have something to say. And I said it. But it was wrong. I presented a bunch of statistics drawn from nearly a year of Mirlyn logs. The most outlandish of my assertions, and the one that eventually turned out to be the most incorrect, was that some 45% of all our user sessions consist of only one action: a search. Unfortunately, I’d missed a whole swath of things I should have…


[RETRACTED] Code4Lib 2011 Lightning Talk Slides

DANGER! I was trying to re-verify my numbers and found a glaring and hugely important mistake. I’ll make a new post with the details, but basically I was counting about 180k sessions (out of only 735k) that I should have been ignoring. Please ignore my basic stats until further notice. See the new numbers and corrected slides for more accurate data. I did a little Lightning Talk at Code4Lib 2011 and cleaned up (and heavily annotated) my slides for anyone interested in them. The focus was on some basic stats about usage of our OPAC, Mirlyn, in calendar 2010. I’ll…


Four things I hate about Ruby

Don’t get me wrong. I use ruby as my default language when possible. I love JRuby in a way that’s illegal in most states. But there are…issues. There are with any language and the associated environment. These are the ones that bug the crap out of me. Ruby is slow. Let’s get this one out of the way right away. Ruby (at least the MRI 1.8.x implementation) is, for many things, slow. Sometimes not much slower. Sometimes (e.g., numerics) a hell of a lot slower. Now, there’s nothing necessarily wrong with that. For what I do, MRI Ruby is usually…
