Lately on the #code4lib IRC channel, several of us have been knocking around different versions (in several programming languages) of programs to read in a ginormous file and do some processing on each line. I noted some speedups related to multi-threading, and someone (maybe rsinger?) said, basically, that to bother with threading for a one-off [...]
Monthly Archives: March 2010
Pushing MARC to Solr; processing times and threading and such
[This is in response to a thread on the blacklight mailing list about getting MARC data into Solr.]
What’s the question?
The question came up: “How much time do we spend processing the MARC vs. trying to push it into Solr?” Bob Haschart found that even with a pretty damn complicated processing stage, pushing the data to [...]
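One rough way to get at that question is to time the two stages separately. Here's a minimal sketch using Ruby's standard Benchmark module. The `process` and `push` lambdas and the fake records are hypothetical stand-ins I made up for illustration; they are not real MARC processing or a real Solr client, so swap in your own.

```ruby
require 'benchmark'

# Hypothetical stand-ins: replace these with your real MARC-to-document
# mapping and your real call that sends a batch of documents to Solr.
process = ->(record) { { id: record[:id], title: record[:title].strip.downcase } }
push    = ->(batch)  { batch.size } # pretends to send a batch; does no I/O

# Fake records standing in for parsed MARC records.
records = Array.new(1_000) { |i| { id: i, title: "  Title #{i} " } }

# Time the processing stage on its own.
docs = nil
process_time = Benchmark.realtime do
  docs = records.map { |r| process.call(r) }
end

# Time the push stage on its own, in batches of 100.
push_time = Benchmark.realtime do
  docs.each_slice(100) { |batch| push.call(batch) }
end

puts format('processing: %.4fs, pushing: %.4fs', process_time, push_time)
```

With real code plugged in, comparing the two numbers shows which stage dominates, which is the thing to know before bothering with threading either one.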
ruby-marc with pluggable readers
I’ve been messing with easier ways of adding parsers to ruby-marc’s MARC::Reader object. The idea is that you can do this:
    require 'marc'
    require 'my_marc_stuff'

    mbreader = MARC::Reader.new('test.mrc') # => Stock marc binary reader
    mbreader = MARC::Reader.new('test.mrc', :readertype=>:marcstrict) # => ditto

    MARC::Reader.register_parser(My::MARC::Parser, :marcstrict)
    mbreader = MARC::Reader.new('test.mrc') # => Uses My::MARC::Parser now

    xmlreader [...]
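To make the registration idea concrete, here's a minimal sketch of how a registry like this could work. This is not ruby-marc's actual implementation: `Sketch::Reader`, `StockParser`, and `MyParser` are hypothetical names made up for illustration, and the real library may well do it differently.

```ruby
# Hypothetical sketch of a pluggable-reader registry; NOT ruby-marc's code.
module Sketch
  class Reader
    @parsers = {}        # reader-type symbol => parser class
    @default_type = nil  # the first registered type becomes the default

    class << self
      # Register (or replace) the parser class used for a reader type.
      def register_parser(parser_class, type)
        @parsers[type] = parser_class
        @default_type ||= type
      end

      # Look up the parser for the requested type and hand the source to it.
      def new(source, opts = {})
        type = opts.fetch(:readertype, @default_type)
        parser_class = @parsers.fetch(type) do
          raise ArgumentError, "no parser registered for #{type.inspect}"
        end
        parser_class.new(source)
      end
    end
  end
end

# Hypothetical parser classes; neither one actually opens or reads the file.
class StockParser
  attr_reader :source

  def initialize(source)
    @source = source
  end
end

class MyParser < StockParser; end

Sketch::Reader.register_parser(StockParser, :marcstrict)
stock_reader = Sketch::Reader.new('test.mrc') # a StockParser

Sketch::Reader.register_parser(MyParser, :marcstrict)
my_reader = Sketch::Reader.new('test.mrc', :readertype => :marcstrict) # now a MyParser
```

The point of the design is that calling code never names a parser class directly, so registering a replacement under an existing type changes what every later `new` call returns.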