Skip to content

Month: September 2009

An exercise in Solr and DataImportHandler: HathiTrust data

Many of the folks who read this blog (hi, both of you! Mom, say hello to Dad!) are aware, at least tangentially, of the HathiTrust. Currently hosted by us at the University of Michigan, the most public interface to its data is a VuFind installation you can access at (or, for you smart-phone types, at Once you do a metadata search, you get links into the actual page images or a chance to search the fulltext of the selected item (depending on its copyright status). It’s awesome. Seriously. Even in the absence of fulltext, being able to search…

Comments closed

Dead-easy (but extreme) AJAX logging in our VuFind install

One of the advantages of having complete control over the OPAC is that I change things pretty easily. The downside of that is that we need to know what to change. Many of you that work in libraries may have noticed that data are not necessarily the primary tool in decision-making. Or, say, even a part of the process. Or even thought about hard. Or even considered. For many decisions I see going on in the library world, the primary motivator is the anecdote. In fact, to be honest, the primary driver is the faculty anecdote. Those cliched three curmudgeonly…

Comments closed

The sad truths about journal bundle prices

[Notes taken during a talk today, Ted Bergstrom: “Some Economics of Saying Nix To Big Deals and the Terrible Fix”. My own thoughts are interspersed throughout; please don’t automatically ascribe everything to Dr. Bergstrom. Check out his stuff at Ted Bergstrom’s home page.] Journals are a weird market — libraries buy as agents of professors, using someone else’s money, in deals of enormous complexity and uncertain value from companies that basically have a monopoly. Similar to a few other situations: doctors prescribe drugs for patients using insurance money. Professors assign textbooks to students whose parents (in general) buy them. In…

Comments closed

More Ruby MARC Benchmarks: Adding in MARC-XML

It turns out that UVA’s reluctance to use the raw MARC data on the search results screen is driven more by processing time than parsing time. Even if they were to start with a fully-parsed MARC object, they’re doing enough screwing around with that data that the bottleneck on their end appears to be all the regex and string processing, not the parsing. Their specs for what gets displayed are complex enough that they want to do the work up-front. But I remain interested, at least partially because of the reason UVA is using MARC-XML: they have MARC records too…

Comments closed

Benchmarking MARC record parsing in Ruby

[Note: since I started writing this, I found out Bess & Co. store MARC-XML. That makes a difference, since XML in Ruby can be really, really slow] [UPADTE It turns out they don’t use MARC-XML. They use MARC-Binary just like the rest of us. Oops. ] [UP-UPDATE Well, no, they do use MARC-XML. I’m not afraid to constantly change my story. This is why I’m the best investigative reporter in the business] The other day on the blacklight mailing list, Bess Sadler wrote Yes, we do still include the full marc record, but the rule of thumb we’re currently using…

Comments closed