[This is in response to a thread on the Blacklight mailing list about getting MARC data into Solr.]
What’s the question?
The question came up: “How much time do we spend processing the MARC vs. trying to push it into Solr?” Bob Haschart found that even with a pretty damn complicated processing stage, pushing the data to [...]
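One way to answer that for your own data is to time the two stages separately. Below is a minimal sketch using Ruby's standard `Benchmark.realtime`. `process_record` and `send_to_solr` are hypothetical stand-ins, and the "send" stage here does no network I/O at all, so a real comparison needs an actual Solr call in its place.

```ruby
require 'benchmark'

# Hypothetical stand-in for the MARC processing stage (parse + extract fields).
def process_record(rec)
  { 'id' => rec.to_s, 'title' => "Title #{rec}".upcase }
end

# Hypothetical stand-in for the Solr stage: it only serializes the doc.
# A real measurement would POST to Solr and include the HTTP round trip.
def send_to_solr(doc)
  doc.map { |k, v| "#{k}=#{v}" }.join('&')
end

# Time each stage separately so the two costs can be compared.
def time_stages(records)
  docs = nil
  process_secs = Benchmark.realtime { docs = records.map { |r| process_record(r) } }
  send_secs    = Benchmark.realtime { docs.each { |d| send_to_solr(d) } }
  { docs: docs.size, process: process_secs, send: send_secs }
end

result = time_stages((1..10_000).to_a)
puts format('%d docs: %.3fs processing, %.3fs sending',
            result[:docs], result[:process], result[:send])
```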
I’ve been messing with easier ways of adding parsers to ruby-marc’s MARC::Reader class. The idea is that you can do this:
require 'marc'
require 'my_marc_stuff'
mbreader = MARC::Reader.new('test.mrc') # => Stock marc binary reader
mbreader = MARC::Reader.new('test.mrc', :readertype=>:marcstrict) # => ditto
MARC::Reader.register_parser(My::MARC::Parser, :marcstrict)
mbreader = MARC::Reader.new('test.mrc') # => Uses My::MARC::Parser now
xmlreader [...]
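The `register_parser` idea is a proposal, not something ruby-marc does today as far as I know. Here's a minimal sketch of how such a registry could work, using a hypothetical `RegistryReader` class and stub parser classes instead of the real `MARC::Reader`: the reader looks up whichever parser class is currently registered for its `:readertype` and defaults to `:marcstrict`.

```ruby
# Hypothetical reader that picks its parser from a registry.
# This is NOT ruby-marc's MARC::Reader; all names are illustrative.
class RegistryReader
  @parsers = {} # class-level registry: readertype symbol => parser class

  class << self
    attr_reader :parsers

    # Associate (or replace) the parser class used for a reader type.
    def register_parser(parser_class, type)
      @parsers[type] = parser_class
    end
  end

  DEFAULT_TYPE = :marcstrict

  attr_reader :parser

  def initialize(source, readertype: DEFAULT_TYPE)
    klass = self.class.parsers.fetch(readertype) do
      raise ArgumentError, "no parser registered for #{readertype.inspect}"
    end
    @parser = klass.new(source)
  end
end

# Stub parsers standing in for a stock parser and My::MARC::Parser.
StrictParser = Struct.new(:source)
MyParser     = Struct.new(:source)

RegistryReader.register_parser(StrictParser, :marcstrict)
r1 = RegistryReader.new('test.mrc')
RegistryReader.register_parser(MyParser, :marcstrict)
r2 = RegistryReader.new('test.mrc')
p r1.parser.class
p r2.parser.class
```

Re-registering under the same symbol is what makes the last `MARC::Reader.new` call in the listing above pick up the new parser without any change at the call site.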
February 26, 2010 – 12:29 am
For reasons I’m still not entirely clear on (I wasn’t there), the Code4Lib 2010 conference this week inspired renewed interest in a JSON-based format for MARC data.
When I initially looked at MARC-HASH almost a year ago, I was mostly looking for something that wasn’t such a pain in the butt to work with, something that [...]
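For anyone who hasn't seen it, here's a sketch of what a MARC-HASH-style record might look like when serialized with Ruby's standard `json` library. The exact layout (the `type`/`version` keys, `[tag, value]` for control fields, `[tag, ind1, ind2, [[code, value], ...]]` for data fields) is my approximation and should be checked against the MARC-HASH proposal itself; `to_marc_hash` is a hypothetical helper.

```ruby
require 'json'

# Build a MARC-HASH-style structure (assumed layout, see lead-in):
#   control fields: [tag, value]
#   data fields:    [tag, ind1, ind2, [[code, value], ...]]
def to_marc_hash(leader, control_fields, data_fields)
  fields = control_fields.map { |tag, value| [tag, value] }
  data_fields.each do |f|
    fields << [f[:tag], f[:ind1], f[:ind2], f[:subfields]]
  end
  { 'type' => 'marc-hash', 'version' => [1, 0], 'leader' => leader, 'fields' => fields }
end

h = to_marc_hash(
  '00000nam a2200000 a 4500',
  [['001', 'abc123']],
  [{ tag: '245', ind1: '1', ind2: '0',
     subfields: [['a', 'A title :'], ['b', 'a subtitle']] }]
)

json = JSON.generate(h)     # plain arrays and strings, easy to consume anywhere
round_trip = JSON.parse(json)
puts json
```

The appeal is that everything is nested arrays and strings, so any language with a JSON parser can walk a record without a MARC library.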
February 18, 2010 – 10:58 am
NOTE 2: It turns out that I did find a minor bug in the system, but in general LCCN normalization is working correctly. I just happened to hit a weirdness with a bad LCCN and a little bug in the parser on their end, which is getting fixed. So…good news all around, and huge [...]
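The LCCN normalization rules the Library of Congress publishes for `info:lccn` identifiers are short enough to sketch. The function below is my reading of those rules (strip blanks, cut at the first slash, then remove a hyphen and left-pad what follows it to six digits), written as a hypothetical `normalize_lccn` helper. It skips the spec's validity checks on the digits, so a bad LCCN goes in and a bad normalized LCCN comes out.

```ruby
# Sketch of LCCN normalization, following the LC info:lccn rules as I read them:
#   1. remove all blanks (this sketch removes any whitespace)
#   2. remove a forward slash and everything to its right
#   3. if there's a hyphen, remove it and left-pad the part after it
#      with zeros to six characters
# No validation is done on the characters after the hyphen.
def normalize_lccn(raw)
  s = raw.gsub(/\s/, '')
  s = s.split('/', 2).first.to_s
  if s.include?('-')
    prefix, suffix = s.split('-', 2)
    s = prefix + suffix.rjust(6, '0')
  end
  s
end

p normalize_lccn('n78-89035')      # => "n78089035"
p normalize_lccn('75-425165//r75') # => "75425165"
```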
February 16, 2010 – 3:43 pm
[Note: in this post I'm just going to focus on the "get stuff into Solr" part. My normal focus (MARC data) will make an appearance in the next post, when I talk about using this in addition to, or instead of, solrmarc.]
Working with Solr
I love me the Solr. I love everything about it except [...]
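At the bottom of "get stuff into Solr" is an update message. Solr's classic XML update handler accepts `<add><doc><field name="...">` documents, with multi-valued fields written as repeated `<field>` elements. Below is a minimal sketch of building one from Ruby hashes. The field names and the `http://localhost:8983/solr/update` endpoint are assumptions for illustration, and a `<commit/>` is still needed before the documents show up in searches.

```ruby
require 'cgi'
require 'net/http'
require 'uri'

# Build <add><doc><field name="...">...</field></doc></add> from hashes.
# Array values become repeated <field> elements (multi-valued fields).
def solr_add_xml(docs)
  body = docs.map do |doc|
    fields = doc.flat_map do |name, values|
      Array(values).map do |v|
        %(<field name="#{CGI.escapeHTML(name.to_s)}">#{CGI.escapeHTML(v.to_s)}</field>)
      end
    end
    "<doc>#{fields.join}</doc>"
  end
  "<add>#{body.join}</add>"
end

# Assumed endpoint; adjust for your Solr install/core. Not called below.
def post_to_solr(xml, url = 'http://localhost:8983/solr/update')
  Net::HTTP.post(URI(url), xml, 'Content-Type' => 'text/xml; charset=utf-8')
end

xml = solr_add_xml([{ id: 'rec1', title: 'Cats & dogs', author: ['A', 'B'] }])
puts xml
```

Escaping every value matters here: MARC titles are full of `&` and `<`, and one unescaped character rejects the whole batch.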
February 5, 2010 – 3:46 pm
Yay! My first gem ever released!
[YUCK! It was a disaster in a few ways! Don't look at this! It's hideous! There's a new jruby_producer_consumer gem on gemcutter that is slightly different from this in that it works. Ignore the stuff below.]
[In working on a threaded JRuby-based MARC-to-Solr project, I realized that my threading stuff was...ugly. [...]
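For anyone who just wants the pattern, here's a minimal producer/consumer sketch using Ruby's built-in `SizedQueue` and threads. This is not the jruby_producer_consumer gem's API; `run_pipeline` and the sentinel-object shutdown are illustrative choices. Under JRuby the consumer threads can run truly in parallel; under MRI the global VM lock means the gain is mostly for I/O-bound work, such as waiting on Solr.

```ruby
# Minimal producer/consumer pipeline (illustrative, not the gem's API).
DONE = Object.new # sentinel: tells a consumer thread to stop

def run_pipeline(items, workers: 3, queue_size: 100, &work)
  queue   = SizedQueue.new(queue_size) # producer blocks when the queue is full
  results = Queue.new                  # thread-safe collection of outputs

  consumers = Array.new(workers) do
    Thread.new do
      loop do
        item = queue.pop
        break if item.equal?(DONE)
        results << work.call(item)
      end
    end
  end

  items.each { |i| queue << i }   # the producer
  workers.times { queue << DONE } # one sentinel per consumer
  consumers.each(&:join)          # re-raises any exception from a consumer

  Array.new(results.size) { results.pop } # order is not guaranteed
end

out = run_pipeline(1..20) { |n| n * n }
p out.sort.first(5) # => [1, 4, 9, 16, 25]
```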
January 29, 2010 – 11:51 am
I’ve been looking at making a JRuby-based Solr indexer for MARC documents, and started off wanting to make sure I could measure whether anything I did would be faster than our existing (solrmarc-based) setup.
Assertion: The upper bound on how fast I can process records and send them to Solr can be approximated by looking [...]
December 15, 2009 – 2:48 pm
MAJOR CHANGE
So, initially, this post said that the way to separate multiple simultaneous requests was with a nice, URL-like slash (/) character.
Then, I remembered that LCCNs can have embedded slashes, e.g., 65063380//r85.
So, we’re back to using pipe (|) characters to separate multiple calls — the examples below have been updated to reflect this.
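A quick illustration of why the slash had to go, using the LCCN from above plus a second sample LCCN. The `split_ids` helper is hypothetical, and this only shows the splitting step, not the service's actual URL layout. Note that some clients will percent-encode `|` as `%7C`, so a real handler should decode the segment first.

```ruby
# Split a multi-identifier segment on a separator, dropping empty pieces.
def split_ids(segment, sep)
  segment.split(sep).reject(&:empty?)
end

ids = '65063380//r85|2001-000002'
p split_ids(ids, '/') # => ["65063380", "r85|2001-000002"]  (LCCN torn apart)
p split_ids(ids, '|') # => ["65063380//r85", "2001-000002"]
```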
Introduction
I’ve put up [...]
November 17, 2009 – 11:35 pm
I decided to see if I could get Blacklight working under JRuby, starting with running the test suite and working my way up from there.
There was much pain. Much, much pain. Exacerbated by my almost complete lack of knowledge about what I was doing.
This is the procedure I eventually arrived at — if there are places [...]
November 6, 2009 – 4:43 pm
unAPI is a very simple protocol to let a machine know what other formats a document is available in. Zotero is a bibliographic management tool (like EndNote or RefWorks) that operates as a Firefox plugin. And it speaks unAPI.
Let’s get them to play nice with each other!
How’s it all work?
Zotero looks for a well-constructed <link> [...]
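For reference, here are the pieces involved, sketched from my reading of the unAPI spec. There's a `<link rel="unapi-server">` in the page head, an `<abbr class="unapi-id">` marking each item, and a server that answers `?id=...` with a `<formats>` list (sent with HTTP status 300 when an id is given). The server URL, the record id, and the `mods`/`marcxml` format names are hypothetical.

```ruby
require 'cgi'

# Hypothetical list of formats this server can produce.
FORMATS = { 'mods' => 'application/xml', 'marcxml' => 'application/xml' }.freeze

# Goes in the page <head>: tells clients like Zotero where the unAPI server is.
def unapi_link(server_url)
  %(<link rel="unapi-server" type="application/xml" title="unAPI" href="#{CGI.escapeHTML(server_url)}" />)
end

# Marks one item on the page with its unAPI identifier.
def unapi_id(id)
  %(<abbr class="unapi-id" title="#{CGI.escapeHTML(id)}"></abbr>)
end

# Body for a request with no format: the list of available formats.
def unapi_formats(id = nil)
  id_attr = id ? %( id="#{CGI.escapeHTML(id)}") : ''
  formats = FORMATS.map { |name, type| %(<format name="#{name}" type="#{type}" />) }
  %(<?xml version="1.0" encoding="UTF-8"?>\n<formats#{id_attr}>#{formats.join}</formats>)
end

puts unapi_link('http://example.org/unapi')
puts unapi_id('rec123')
puts unapi_formats('rec123')
```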