Category Archives: Uncategorized

Pushing MARC to Solr; processing times and threading and such

[This is in response to a thread on the Blacklight mailing list about getting MARC data into Solr.]

What’s the question?

The question came up, “How much time do we spend processing the MARC vs. trying to push it into Solr?” Bob Haschart found that even with a pretty damn complicated processing stage, pushing the data to [...]
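If you want to answer that question for your own pipeline, the simplest approach is to time the two stages separately. Here's a minimal sketch using Ruby's standard `Benchmark` module. `process_record` and `push_to_solr` are hypothetical stand-ins: one builds a hash, and the other only serializes it to JSON. Swap in your real MARC processing and your real Solr client.

```ruby
require 'benchmark'
require 'json'

# Hypothetical stand-in for the MARC processing stage: turn a raw record
# into a Solr document hash.
def process_record(raw)
  { 'id' => raw[:id], 'title' => raw[:title].upcase }
end

# Hypothetical stand-in for the Solr push: here it only serializes the
# document; a real version would send it to Solr over HTTP.
def push_to_solr(doc)
  JSON.generate(doc)
end

# Time each stage separately and return the two totals in seconds.
def time_stages(records)
  process_time = 0.0
  push_time = 0.0
  records.each do |raw|
    doc = nil
    process_time += Benchmark.realtime { doc = process_record(raw) }
    push_time += Benchmark.realtime { push_to_solr(doc) }
  end
  { process: process_time, push: push_time }
end

records = (1..1000).map { |i| { id: i.to_s, title: "title #{i}" } }
times = time_stages(records)
puts format('processing: %.4fs  pushing: %.4fs', times[:process], times[:push])
```

In a real run you'd probably time whole batches rather than individual records, since per-call timer overhead adds up when a stage is fast.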

ruby-marc with pluggable readers

I’ve been messing with easier ways of adding parsers to ruby-marc’s MARC::Reader object. The idea is that you can do this:

  require 'marc'
  require 'my_marc_stuff'

  mbreader = MARC::Reader.new('test.mrc') # => Stock MARC binary reader
  mbreader = MARC::Reader.new('test.mrc', :readertype=>:marcstrict) # => ditto

  MARC::Reader.register_parser(My::MARC::Parser, :marcstrict)
  mbreader = MARC::Reader.new('test.mrc') # => Uses My::MARC::Parser now

  xmlreader [...]
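The registry idea behind `register_parser` can be sketched in plain Ruby. This is an illustration, not ruby-marc's code. `SketchReader`, `StrictBinaryParser`, and `MyParser` are hypothetical names, and the toy "parsers" just label their input so you can see which one ran.

```ruby
# Hypothetical sketch of a reader that picks its parser from a registry
# keyed by :readertype. This is an illustration, not ruby-marc's code.
class SketchReader
  include Enumerable

  @parsers = {}

  class << self
    attr_reader :parsers

    # Register (or replace) the parser class used for a reader type.
    def register_parser(parser_class, readertype)
      @parsers[readertype] = parser_class
    end
  end

  # :marcstrict is the default type, mirroring the example above.
  def initialize(source, readertype: :marcstrict)
    parser_class = self.class.parsers.fetch(readertype) do
      raise ArgumentError, "no parser registered for #{readertype.inspect}"
    end
    @parser = parser_class.new(source)
  end

  # Delegate iteration to whichever parser was chosen.
  def each(&block)
    @parser.each(&block)
  end
end

# Two toy parsers that just label their source so we can see which one ran.
class StrictBinaryParser
  def initialize(source)
    @source = source
  end

  def each
    yield "strict:#{@source}"
  end
end

class MyParser
  def initialize(source)
    @source = source
  end

  def each
    yield "mine:#{@source}"
  end
end

SketchReader.register_parser(StrictBinaryParser, :marcstrict)
p SketchReader.new('test.mrc').to_a   # ["strict:test.mrc"]

# Re-registering :marcstrict swaps the parser for every later reader.
SketchReader.register_parser(MyParser, :marcstrict)
p SketchReader.new('test.mrc').to_a   # ["mine:test.mrc"]
```

Asking for a type that was never registered raises an `ArgumentError` rather than silently falling back to a default parser.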

New interest in MARC-HASH / JSON

For reasons I’m still not entirely clear on (I wasn’t there), the Code4Lib 2010 conference this week inspired renewed interest in a JSON-based format for MARC data.

When I initially looked at MARC-HASH almost a year ago, I was mostly looking for something that wasn’t such a pain in the butt to work with, something that [...]
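For anyone who hasn't seen it, the general shape of MARC-HASH is a plain hash with a type, a version, the leader, and an ordered array of fields. Control fields are `[tag, value]` pairs. Data fields are `[tag, ind1, ind2, [[code, value], ...]]`. The sketch below builds that shape from a toy record and round-trips it through Ruby's standard JSON library. The exact key names and version number are assumptions here, so check the MARC-HASH spec before relying on them. The `ControlField`/`DataField` structs and `to_marc_hash` are hypothetical.

```ruby
require 'json'

# Toy field types (hypothetical; in practice the data would come from ruby-marc).
ControlField = Struct.new(:tag, :value)
DataField = Struct.new(:tag, :ind1, :ind2, :subfields) # subfields: [[code, value], ...]

# Build a MARC-HASH-style hash. Key names and the version number are
# assumptions based on the MARC-HASH proposal.
def to_marc_hash(leader, fields)
  field_arrays = fields.map do |f|
    case f
    when ControlField then [f.tag, f.value]
    when DataField then [f.tag, f.ind1, f.ind2, f.subfields]
    else raise ArgumentError, "unknown field type: #{f.class}"
    end
  end
  { 'type' => 'marc-hash', 'version' => [1, 0], 'leader' => leader, 'fields' => field_arrays }
end

leader = '00000nam a2200000 a 4500'
fields = [
  ControlField.new('001', 'abc123'),
  DataField.new('245', '1', '0', [['a', 'A title'], ['c', 'by Someone']])
]

json = JSON.generate(to_marc_hash(leader, fields))
round_trip = JSON.parse(json)
puts json
```

Because everything is arrays of strings, the JSON round trip gives back exactly the same structure, with field order and subfield order preserved.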

OCLC still not (NO! They are!) normalizing their LCCNs

NOTE 2: It turns out that I did find a minor bug in the system, but in general LCCN normalization is working correctly. I just happened to hit a weirdness with a bad LCCN and a little bug in the parser on their end, which is getting fixed. So…good news all around, and huge [...]
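For reference, the Library of Congress publishes a normalization procedure for LCCNs (used for info:lccn identifiers). Remove all blanks. If there's a forward slash, drop it and everything after it. If there's a hyphen, remove it and left-pad the part after it with zeros to six digits. A minimal Ruby sketch of those steps follows. It's my own code, not OCLC's, and `normalize_lccn` is a hypothetical helper name.

```ruby
# Sketch of the LCCN normalization steps published by the Library of
# Congress. normalize_lccn is a hypothetical helper name.
def normalize_lccn(raw)
  # 1. Remove all blanks.
  s = raw.gsub(/\s+/, '')
  # 2. If there is a forward slash, drop it and everything to its right.
  s = s.split('/', 2).first.to_s
  # 3. If there is a hyphen, remove it; if what follows is 1-6 digits,
  #    left-pad it with zeros to six places.
  if s.include?('-')
    prefix, suffix = s.split('-', 2)
    suffix = suffix.rjust(6, '0') if suffix.match?(/\A\d{1,6}\z/)
    s = prefix + suffix
  end
  s
end

puts normalize_lccn('n78-890351')       # n78890351
puts normalize_lccn('85-2 ')            # 85000002
puts normalize_lccn('75-425165//r75')   # 75425165
puts normalize_lccn('65063380//r85')    # 65063380
```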

Indexing data into Solr via JRuby (with threads!)

[Note: in this post I'm just going to focus on the "get stuff into Solr" part. My normal focus -- MARC data -- will make an appearance in the next post when I talk about using this in addition to / instead of solrmarc.]

Working with Solr

I love me the Solr. I love everything about it except [...]
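At the HTTP level, getting stuff into Solr means POSTing update messages to its update handler. Here's a minimal sketch that builds a classic XML `<add>` message for a batch of documents. The field names (`id`, `title`) are hypothetical and have to match your schema. The code only builds the message string; it doesn't send anything.

```ruby
# Escape the characters that are special in XML text and attribute values.
def xml_escape(value)
  value.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
end

# Build a Solr XML update message for a batch of documents. Each document
# is a hash of field name => value, or => array of values for a
# multi-valued field (nil values are skipped).
def solr_add_xml(docs)
  parts = ['<add>']
  docs.each do |doc|
    parts << '<doc>'
    doc.each do |name, values|
      Array(values).each do |v|
        parts << %(<field name="#{xml_escape(name)}">#{xml_escape(v)}</field>)
      end
    end
    parts << '</doc>'
  end
  parts << '</add>'
  parts.join
end

docs = [
  { 'id' => '1', 'title' => 'Cats & Dogs' },
  { 'id' => '2', 'title' => ['First title', 'Second title'] }
]
xml = solr_add_xml(docs)
puts xml
```

You'd then POST that string to your Solr update URL (commonly `/solr/update`) with a `text/xml` content type, and send a `<commit/>` when the batch run is done. Batching several documents per request is usually much cheaper than one request per document.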

jruby_producer_consumer: dead-simple producer/consumer for JRuby

Yay! My first gem ever released!

[YUCK! It was a disaster in a few ways! Don't look at this! It's hideous! There's a new jruby_producer_consumer gem on gemcutter that is slightly different from this in that it works. Ignore the stuff below.]

[In working on a threaded JRuby-based MARC-to-Solr project, I realized that my threading stuff was...ugly. [...]
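The producer/consumer pattern itself needs nothing beyond Ruby's core `Thread` and `Queue`. This minimal sketch is not the jruby_producer_consumer gem's API. One producer pushes work items onto a queue, a fixed number of consumer threads pop and process them, and one `:done` sentinel per consumer tells each thread to stop. Under JRuby the consumers can run in parallel on real JVM threads. Under MRI the same code is still correct, but CPU-bound work won't run in parallel.

```ruby
# Minimal producer/consumer with a shared queue (core Ruby, no gems).
def run_pipeline(items, consumers: 4)
  queue = Queue.new
  results = Queue.new # Queue is thread-safe, so consumers can push results here

  workers = Array.new(consumers) do
    Thread.new do
      loop do
        item = queue.pop          # blocks until an item is available
        break if item == :done    # sentinel: this consumer is finished
        results << item * 2       # stand-in for "process record, send to Solr"
      end
    end
  end

  items.each { |item| queue << item }  # the producer
  consumers.times { queue << :done }   # one sentinel per consumer
  workers.each(&:join)

  out = []
  out << results.pop until results.empty?
  out
end

doubled = run_pipeline((1..100).to_a)
puts doubled.sort.first(5).inspect   # [2, 4, 6, 8, 10]
```

Results come back in whatever order the consumers finish, which is why the example sorts before printing.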

Still another look at MARC parsing in ruby and jruby

I’ve been looking at making a JRuby-based Solr indexer for MARC documents, and the first thing I wanted was a way to tell whether anything I did would actually be faster than our existing (solrmarc-based) setup.

Assertion: The upper bound on how fast I can process records and send them to Solr can be approximated by looking [...]

Beta version of the HathiTrust Volumes API available

MAJOR CHANGE

So, initially, this post said that the way to separate multiple simultaneous requests was with a nice, URL-like slash (/) character.

Then, I remembered that LCCNs can have embedded slashes, e.g., 65063380//r85.

So, we’re back to using pipe (|) characters to separate multiple requests; the examples below have been updated to reflect this.
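Here's the slash problem in miniature. If identifiers are joined with slashes, an LCCN like 65063380//r85 gets chopped apart when the list is split again, while joining with pipes keeps it intact. This is a small sketch only. The second identifier is made up, and the bare join is not the API's actual URL syntax.

```ruby
# Two identifiers to request at once; the first is an LCCN with embedded slashes.
ids = ['65063380//r85', '12345678']  # second ID is a made-up example

# Joining with "/" is ambiguous: splitting it back apart breaks the LCCN.
slash_split = ids.join('/').split('/')
puts slash_split.inspect   # ["65063380", "", "r85", "12345678"]

# Joining with "|" round-trips, as long as "|" never appears in an identifier.
pipe_split = ids.join('|').split('|')
puts pipe_split.inspect    # ["65063380//r85", "12345678"]
```

One wrinkle with real URLs is that `|` isn't a URL-safe character. Some HTTP clients will percent-encode it as `%7C`, so the server has to decode before splitting.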

Introduction

I’ve put up [...]

Running Blacklight under JRuby

I decided to see if I could get Blacklight working under JRuby, starting with running the test suite and working my way up from there.

There was much pain. Much, much pain. Exacerbated by my almost complete lack of knowledge about what I was doing.

This is the procedure I eventually arrived at; if there are places [...]

Setting up your OPAC for Zotero support using unAPI

unAPI is a very simple protocol that lets a machine find out what other formats a document is available in. Zotero is a bibliographic management tool (like EndNote or RefWorks) that runs as a Firefox plugin. And it speaks unAPI.

Let’s get them to play nice with each other!

How’s it all work?

Zotero looks for a well-constructed <link> [...]
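As I understand the unAPI spec, the server side answers three kinds of requests. With no parameters, it lists every format it knows (HTTP 200). With just an `id`, it lists the formats available for that record (HTTP 300). With an `id` and a `format`, it returns the record itself, or a 404/406 if the record or format is unknown. Here's a minimal, framework-free sketch of that dispatch logic. The record store, the `marcxml`/`mods` format names, and `unapi_response` are all hypothetical.

```ruby
# Hypothetical record store: id => { format name => serialized record }.
RECORDS = {
  'rec1' => { 'marcxml' => '<record/>', 'mods' => '<mods/>' }
}.freeze

# Hypothetical list of formats the server knows about: name => MIME type.
FORMATS = { 'marcxml' => 'application/xml', 'mods' => 'application/xml' }.freeze

# Build a <formats> list; ids here are assumed to be XML-safe.
def formats_xml(names, id = nil)
  id_attr = id ? %( id="#{id}") : ''
  lines = names.map { |n| %(  <format name="#{n}" type="#{FORMATS[n]}"/>) }
  [%(<?xml version="1.0" encoding="UTF-8"?>), "<formats#{id_attr}>", *lines, '</formats>'].join("\n")
end

# Return [http_status, body] for an unAPI request with optional id/format.
def unapi_response(id: nil, format: nil)
  return [200, formats_xml(FORMATS.keys)] if id.nil?
  record = RECORDS[id]
  return [404, 'Not Found'] if record.nil?
  return [300, formats_xml(record.keys, id)] if format.nil?
  return [406, 'Not Acceptable'] unless record.key?(format)
  [200, record[format]]
end

status, body = unapi_response(id: 'rec1')
puts status   # 300
puts body
```

On the OPAC page itself, the matching markup would be something like `<link rel="unapi-server" type="application/xml" title="unAPI" href="/unapi"/>` in the head and an `<abbr class="unapi-id" title="rec1"></abbr>` next to each record. The `/unapi` path is an assumption.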