
Why RDA is doomed to failure

[Note: edited for clarity thanks to rsinger's comment, below]

Doomed, I say! DOOOOOOOOOOMMMMMMMED!

My reasoning is simple: RDA will fail because it’s not “better enough.”

Now, those of you who know me might be saying to yourselves, “Waitjustaminute. Bill doesn’t know anything at all about cataloging, or semantic representations, or the relative merits of various encapsulations of bibliographic [...]

Data Structures and Serializations

Jonathan Rochkind, in response to a long (and, IMHO, mostly ridiculous) thread on NGC4Lib, has been exploring the boundaries between a data model and its expression/serialization (see here, here, and here), and I thought I’d jump in.

What this post is not

There’s a lot to be said about a good domain model for bibliographic data. I’m [...]

Stupid catalog tricks: Subject Headings and the Long Tail

Library of Congress Subject Headings (LCSH) in particular.

I’ve always been down on LCSH because I don’t understand them. They kinda look like a hierarchy, but they’re not really. Things get modifiers. Geography is inline and …weird.

And, of course, in our faceting catalog when you click on a linked LCSH to do an automatic search, you [...]

Why bother with threading in JRuby? Because it’s easy.

Lately on the #code4lib IRC channel, several of us have been knocking around different versions (in several programming languages) of programs to read in a ginormous file and do some processing on each line. I noted some speedups related to multi-threading, and someone (maybe rsinger?) said, basically, that to bother with threading for a one-off [...]
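To make the “it’s easy” claim concrete, here’s a minimal sketch of one way to spread per-line work across threads: lines are read lazily, grouped into chunks, and each chunk is mapped in its own short-lived thread, with at most `max_threads` running at once. The method name `threaded_map_lines` and its parameters are hypothetical, not from any library. Under JRuby these are real JVM threads and can run in parallel; under MRI the global VM lock means CPU-bound work won’t speed up (I/O-bound work still can).

```ruby
# Hypothetical helper: map a block over every line of an IO using threads.
# Lines are pulled from the IO as needed, grouped into chunks of `chunk_size`,
# and up to `max_threads` chunks are processed concurrently.
# Results come back in the same order as the input lines.
def threaded_map_lines(io, chunk_size: 1000, max_threads: 4, &block)
  results = []
  io.each_line.each_slice(chunk_size).each_slice(max_threads) do |group|
    # One thread per chunk in this group; Thread#value joins and returns its result.
    threads = group.map { |chunk| Thread.new { chunk.map(&block) } }
    threads.each { |t| results.concat(t.value) }
  end
  results
end
```

Call it as `threaded_map_lines(File.open(path), max_threads: 8) { |line| ... }`. It’s deliberately simple rather than a true pool: every thread in a group must finish before the next group starts, so one slow chunk holds up its siblings, and all results are kept in memory.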

Pushing MARC to Solr; processing times and threading and such

[This is in response to a thread on the Blacklight mailing list about getting MARC data into Solr.]

What’s the question?

The question came up: “How much time do we spend processing the MARC vs. trying to push it into Solr?” Bob Haschart found that even with a pretty damn complicated processing stage, pushing the data to [...]
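One simple way to answer that question for your own data is to time the two phases separately. This is a hypothetical sketch: `time_phases`, `elapsed`, and the `process` and `push` callables are stand-ins for your MARC-to-document mapping and whatever actually sends documents to Solr. It uses a monotonic clock so system clock adjustments don’t skew the numbers.

```ruby
# Elapsed seconds for a block, measured with a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Hypothetical harness: `process` turns one record into a document,
# `push` sends a batch of documents (e.g. to Solr).
# Returns the total seconds spent in each phase.
def time_phases(records, process:, push:, batch_size: 100)
  totals = { process: 0.0, push: 0.0 }
  records.each_slice(batch_size) do |batch|
    docs = nil
    totals[:process] += elapsed { docs = batch.map { |rec| process.call(rec) } }
    totals[:push]    += elapsed { push.call(docs) }
  end
  totals
end
```

The phases run one after the other here, so this measures each in isolation. If `push` dominates, threading the processing stage won’t buy much; if `process` dominates, it will.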

ruby-marc with pluggable readers

I’ve been messing with easier ways of adding parsers to ruby-marc’s MARC::Reader object. The idea is that you can do this:

  require 'marc'
  require 'my_marc_stuff'

  mbreader = MARC::Reader.new('test.mrc') # => Stock marc binary reader
  mbreader = MARC::Reader.new('test.mrc', :readertype=>:marcstrict) # => ditto

  MARC::Reader.register_parser(My::MARC::Parser, :marcstrict)
  mbreader = MARC::Reader.new('test.mrc') # => Uses My::MARC::Parser now

  xmlreader [...]
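The core of the idea is just a lookup table from a reader type to a parser class, where registering a new class under an existing type replaces the old one. Here’s a minimal, hypothetical sketch of that pattern; `PluggableReader`, `reader_for`, and the `default:` flag are illustrative names, not ruby-marc’s actual API.

```ruby
# Hypothetical parser registry illustrating the "pluggable reader" idea.
# This is NOT ruby-marc's API; all names here are made up for the sketch.
module PluggableReader
  @parsers = {}        # reader type (Symbol) => parser class
  @default_type = nil  # type used when no readertype is given

  class << self
    # Register (or replace) the parser class for a reader type.
    # The first type registered becomes the default unless another is marked default.
    def register_parser(parser_class, type, default: false)
      @parsers[type] = parser_class
      @default_type = type if default || @default_type.nil?
    end

    # Build a parser for `source`, looking its class up by reader type.
    def reader_for(source, readertype: @default_type)
      parser_class = @parsers.fetch(readertype) { raise ArgumentError, "no parser registered for #{readertype.inspect}" }
      parser_class.new(source)
    end
  end
end
```

Registering a different class under `:marcstrict` changes what `reader_for('test.mrc')` returns without touching any calling code, which is the whole point.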

New interest in MARC-HASH / JSON

For reasons I’m still not entirely clear on (I wasn’t there), the Code4Lib 2010 conference this week inspired renewed interest in a JSON-based format for MARC data.

When I initially looked at MARC-HASH almost a year ago, I was mostly looking for something that wasn’t such a pain in the butt to work with, something that [...]
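The basic shape is easy to show: a leader plus an ordered array of fields, where control fields are `[tag, value]` pairs and data fields are `[tag, ind1, ind2, [[code, value], ...]]`. The sketch below builds that shape from a plain Ruby description of a made-up sample record and serializes it with Ruby’s standard `json` library. Treat the exact top-level keys (`type`, `leader`, `fields`) as an assumption about the layout, not a normative statement of the MARC-HASH spec; `marc_to_hash` is a hypothetical helper, not part of ruby-marc.

```ruby
require 'json'

# Hypothetical helper: turn a simple field list into a MARC-HASH-style hash.
#   control field { tag:, value: }                  -> [tag, value]
#   data field    { tag:, ind1:, ind2:, subfields: } -> [tag, ind1, ind2, [[code, value], ...]]
# The top-level keys are an assumption, not quoted from the spec.
def marc_to_hash(leader, fields)
  {
    'type'   => 'marc-hash',
    'leader' => leader,
    'fields' => fields.map { |f|
      if f.key?(:subfields)
        [f[:tag], f[:ind1], f[:ind2], f[:subfields]]
      else
        [f[:tag], f[:value]]
      end
    }
  }
end

# Made-up sample record, for illustration only.
record = marc_to_hash(
  '00000nam a2200000 a 4500',
  [
    { tag: '001', value: 'ocm12345678' },
    { tag: '245', ind1: '1', ind2: '0', subfields: [['a', 'A title :'], ['b', 'a subtitle.']] }
  ]
)
json = JSON.generate(record)
```

Because everything is arrays and strings, field and subfield order survive the trip through JSON, and any language with a JSON parser can read it back.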

OCLC still not (NO! They are!) normalizing their LCCNs

NOTE 2: It turns out that I did find a minor bug in the system, but in general LCCN normalization is working correctly. I just happened to hit a weirdness with a bad LCCN and a little bug in the parser on their end, which is getting fixed. So…good news all around, and huge [...]
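For reference, the Library of Congress publishes a short normalization algorithm for LCCNs (as part of its info:lccn documentation): remove all blanks; if there’s a forward slash, drop it and everything after it; if there’s a hyphen, remove it and left-pad the all-digit part that followed it with zeros to six digits. Here’s a minimal Ruby sketch of those rules. `normalize_lccn` is my own name for it, it does no validation beyond what the steps require, and it isn’t OCLC’s code.

```ruby
# Normalize an LCCN following the Library of Congress info:lccn rules:
#   1. remove all whitespace (blanks)
#   2. if there is a '/', remove it and everything to its right
#   3. if there is a '-', remove it and left-pad the part after it with
#      zeros to six characters (that part is expected to be all digits)
def normalize_lccn(raw)
  s = raw.gsub(/\s+/, '')          # step 1
  s = s.sub(%r{/.*}, '')            # step 2
  hyphen = s.index('-')
  if hyphen                        # step 3
    prefix = s[0...hyphen]
    suffix = s[(hyphen + 1)..-1]
    suffix = suffix.rjust(6, '0') if suffix.match?(/\A\d+\z/)
    s = prefix + suffix
  end
  s
end
```

For example, `"85-2 "` normalizes to `"85000002"`, and `"n78-89035"` to `"n78089035"`.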

Indexing data into Solr via JRuby (with threads!)

[Note: in this post I'm just going to focus on the "get stuff into Solr" part. My normal focus -- MARC data -- will make an appearance in the next post when I talk about using this in addition to / instead of solrmarc.]

Working with Solr

I love me the Solr. I love everything about it except [...]
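Stripped of threads, the “get stuff into Solr” step amounts to POSTing batches of documents to Solr’s update handler. Below is a minimal sketch using only Ruby’s standard library. It assumes a Solr whose `/update` handler accepts a JSON array of documents sent with `Content-Type: application/json` (current releases do); the URL, core name, and batch size are placeholders, and `solr_update_requests` and `post_all` are hypothetical helpers.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: build one POST request per batch of documents.
# Each request body is a JSON array of document hashes.
def solr_update_requests(docs, update_url, batch_size: 500)
  uri = URI(update_url)
  docs.each_slice(batch_size).map do |batch|
    req = Net::HTTP::Post.new(uri)
    req['Content-Type'] = 'application/json'
    req.body = JSON.generate(batch)
    req
  end
end

# Hypothetical helper: send the requests over a single connection.
# Net::HTTPResponse#value raises for any non-2xx response.
def post_all(requests, update_url)
  uri = URI(update_url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    requests.each { |req| http.request(req).value }
  end
end
```

Something like `post_all(solr_update_requests(docs, url), url)` with `url = 'http://localhost:8983/solr/mycore/update'` would send everything; you’d still need a commit (explicit, or via autoCommit) before the documents show up in searches.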

jruby_producer_consumer: dead-simple producer/consumer for JRuby

Yay! My first gem ever released!

[YUCK! It was a disaster in a few ways! Don't look at this! It's hideous! There's a new jruby_producer_consumer gem on gemcutter that is slightly different from this in that it works. Ignore the stuff below.]

[In working on a threaded JRuby-based MARC-to-Solr project, I realized that my threading stuff was...ugly. [...]
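The general pattern is small enough to sketch: a bounded queue sits between a producer (say, a MARC reader) and a fixed set of consumer threads, and a sentinel value tells each consumer when to stop. This is a hypothetical illustration built on Ruby’s built-in `SizedQueue`, not the API of the jruby_producer_consumer gem; `ProducerConsumer`, `<<`, and `finish` are made-up names. The bounded queue keeps a fast producer from reading a whole file into memory ahead of slow consumers.

```ruby
# Hypothetical minimal producer/consumer built on SizedQueue.
class ProducerConsumer
  STOP = Object.new  # sentinel, compared by identity so it can't collide with real items

  def initialize(consumers: 4, queue_size: 100, &work)
    @queue = SizedQueue.new(queue_size)  # push blocks when full (back-pressure)
    @workers = Array.new(consumers) do
      Thread.new do
        loop do
          item = @queue.pop
          break if item.equal?(STOP)
          work.call(item)
        end
      end
    end
  end

  # Producer side: enqueue one item (blocks while the queue is full).
  def <<(item)
    @queue << item
    self
  end

  # Send one sentinel per consumer, then wait for all of them to finish.
  # Thread#join re-raises any exception raised inside a consumer.
  def finish
    @workers.size.times { @queue << STOP }
    @workers.each(&:join)
  end
end
```

Consumers that share output need something thread-safe to write into, such as a `Queue` or an array guarded by a `Mutex`.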