Librarian code – Robot Librarian

2015-02-19 / Bill Dueber

Reintroducing Traject: Traject 2.0

Traject 2.0.0 released! Now runs under MRI/RBX! traject is an ETL (extract/transform/load) system written in Ruby with a special view towards extracting fields from MARC data and writing them out to Solr. [Jonathan Rochkind](http://bibwild.wordpress.com) and I wrote this primarily out of frustration using other tools in this space (e.g., Solrmarc, or my own precursor to traject, marc2solr). Note: Catmandu is another, Perl-based system I don’t have any direct experience with. traject had its first release almost a year and a half ago (at least based on the date of my post introducing it), and I’ve used it literally…

2013-10-14 / Bill Dueber

Announcing “traject” indexing software

[Over the next few days I’ll be writing a series of posts that highlight a new indexing solution by Jonathan Rochkind and myself called traject that we’re using to index MARC data into Solr. This is the introduction.] Wow. Six months since I posted here. What have I been doing? Well, mostly parenting, but in the last few weeks I was lucky enough to get on board with a project started by Jonathan Rochkind for a new JRuby-based tool optimized for indexing MARC data into Solr. You know, kinda like Solrmarc, but JRuby. What’s it look like? I encourage you…

2011-05-06 / Bill Dueber

Ruby gem library_stdnums goes to version 1.0

I just released another (this time pretty good) version of my gem for normalizing/validating library standard numbers, library_stdnums (github source / docs). The short version of the functions available:

ISBN: get checkdigit, validate, convert isbn10 to/from isbn13, normalize (to 13-digit)
ISSN: get checkdigit, validate, normalize
LCCN: validate, normalize

Validation of LCCNs doesn’t involve a checkdigit; I basically just normalize whatever is sent in and then see if the result is syntactically valid. My plan in my Copious Free Time is to do a Java version of these as well and then stick them into a new-style Solr v.3 filter so…
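To make the "normalize, then check the syntax" idea for LCCNs concrete, here is a minimal standalone sketch. It is not the gem's code: the module name `LccnSketch` and both method names are made up for illustration. The normalization steps follow the Library of Congress rules as I understand them (drop whitespace, drop everything from the first slash, zero-pad the part after a hyphen to six digits), and the syntax check is a deliberately loose simplification.

```ruby
# Illustrative only: LccnSketch is a hypothetical name, not part of library_stdnums.
module LccnSketch
  module_function

  # Normalize an LCCN string; returns nil if the hyphenated part is malformed.
  def normalize(raw)
    s = raw.to_s.gsub(/\s+/, '')   # remove all whitespace
    s = s.sub(%r{/.*\z}, '')       # drop the first slash and everything after it
    if s.include?('-')
      left, right = s.split('-', 2)
      return nil unless right =~ /\A\d{1,6}\z/
      s = left + right.rjust(6, '0') # left-pad the serial number to six digits
    end
    s
  end

  # Loose syntactic check (an assumption, simpler than the full LoC rules):
  # an optional short alphabetic prefix followed by 8 or 10 digits.
  def valid?(raw)
    n = normalize(raw)
    return false if n.nil?
    !!(n =~ /\A[a-z]{0,3}\d{8}\z|\A[a-z]{0,2}\d{10}\z/i)
  end
end

puts LccnSketch.normalize('n78-890351')
puts LccnSketch.normalize('85-2 ')
```

Because there is no check digit, a normalized LCCN that passes this kind of test is only well-formed, not necessarily one that was ever assigned.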

2010-09-13 / Bill Dueber

Simple Ruby gem for dealing with ISBN/ISSN/LCCN

I needed some code to deal with ISBN10->ISBN13 conversion, so I put in a few other functions and wrapped it all up in a gem called library_stdnums. It’s only 100 lines of code or so and some specs, but I put it out there in case others want to use it or add to it. Pull requests at the github repo are welcome. Functionality is all as module functions, as follows:

ISBN

    char = StdNum::ISBN.checkdigit(ten-or-thirteen-digit-isbn)
    boolean = StdNum::ISBN.valid?(ten-or-thirteen-digit-isbn)
    thirteenDigitISBN = StdNum::ISBN.convert_to_13(ten-or-thirteen-digit-isbn)
    tenDigitISBN = StdNum::ISBN.convert_to_10(ten-or-thirteen-digit-isbn)

ISSN

    char = StdNum::ISSN.checkdigit(issn)
    boolean = StdNum::ISSN.valid?(issn)

LCCN

    normalizedLCCN = StdNum::LCCN.normalize(lccn)

Again, there’s nothing special here…
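For readers curious about the arithmetic behind check digits and ISBN10->ISBN13 conversion, the following is a rough standalone sketch using the standard ISBN formulas. It does not reproduce the gem's source: the module `IsbnSketch` and its helper names are hypothetical, and it only handles the 978 prefix when converting back to ten digits.

```ruby
# Illustrative only: IsbnSketch is a hypothetical name, not part of library_stdnums.
module IsbnSketch
  module_function

  # Strip hyphens, spaces and anything else that is not a digit or X.
  def digits_only(isbn)
    isbn.to_s.upcase.gsub(/[^0-9X]/, '')
  end

  # ISBN-10: weights 10 down to 2 over the first nine digits, modulo 11.
  def checkdigit10(first9)
    sum = first9.chars.each_with_index.sum { |ch, i| ch.to_i * (10 - i) }
    value = (11 - sum % 11) % 11
    value == 10 ? 'X' : value.to_s
  end

  # ISBN-13: alternating weights 1 and 3 over the first twelve digits, modulo 10.
  def checkdigit13(first12)
    sum = first12.chars.each_with_index.sum { |ch, i| ch.to_i * (i.even? ? 1 : 3) }
    ((10 - sum % 10) % 10).to_s
  end

  def valid?(isbn)
    s = digits_only(isbn)
    case s
    when /\A\d{9}[\dX]\z/ then checkdigit10(s[0, 9]) == s[9]
    when /\A\d{13}\z/     then checkdigit13(s[0, 12]) == s[12]
    else false
    end
  end

  # Prefix 978, keep the nine-digit stem, recompute the check digit.
  def convert_to_13(isbn)
    s = digits_only(isbn)
    return s if s.length == 13
    return nil unless s.length == 10
    stem = '978' + s[0, 9]
    stem + checkdigit13(stem)
  end

  # Only 978-prefixed ISBN-13s have a ten-digit equivalent.
  def convert_to_10(isbn)
    s = digits_only(isbn)
    return s if s.length == 10
    return nil unless s.length == 13 && s.start_with?('978')
    stem = s[3, 9]
    stem + checkdigit10(stem)
  end
end

puts IsbnSketch.convert_to_13('0-306-40615-2')
puts IsbnSketch.convert_to_10('978-0-306-40615-7')
```

Note that the check digit is never carried across a conversion; it has to be recomputed because the two formats use different weighting schemes.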

2010-03-02 / Bill Dueber

ruby-marc with pluggable readers

I’ve been messing with easier ways of adding parsers to ruby-marc’s MARC::Reader object. The idea is that you can do this:

    require 'marc'
    require 'my_marc_stuff'

    mbreader = MARC::Reader.new('test.mrc') # => Stock marc binary reader
    mbreader = MARC::Reader.new('test.mrc', :readertype=>:marcstrict) # => ditto

    MARC::Reader.register_parser(My::MARC::Parser, :marcstrict)
    mbreader = MARC::Reader.new('test.mrc') # => Uses My::MARC::Parser now

    xmlreader = MARC::Reader.new('test.xml', :readertype=>:marcxml)

    # …and maybe further on down the road
    asreader = MARC::Reader.new('test.seq', :readertype=>:alephsequential)
    mjreader = MARC::Reader.new('test.json', :readertype=>:marchashjson)

A parser need only implement #each and a module-level method #decode_from_string. Read all about it on the github page.
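The registration idea can be sketched in plain Ruby without touching ruby-marc at all. Everything below is hypothetical: `PluggableReader` and `LineParser` are toy stand-ins invented for illustration, and the code makes no claim about how MARC::Reader is (or would be) implemented. It only shows the contract described above: a parser class that responds to `decode_from_string` and whose instances implement `each`.

```ruby
require 'stringio'

# Hypothetical reader that dispatches to whichever parser is registered.
class PluggableReader
  include Enumerable

  @parsers = {}
  @default_type = nil

  class << self
    attr_reader :parsers, :default_type

    # Register (or replace) the parser class for a reader type.
    # The first type ever registered becomes the default.
    def register_parser(parser_class, type)
      @parsers[type] = parser_class
      @default_type ||= type
    end
  end

  def initialize(source, readertype: self.class.default_type)
    parser_class = self.class.parsers.fetch(readertype) do
      raise ArgumentError, "no parser registered for #{readertype.inspect}"
    end
    @parser = parser_class.new(source)
  end

  # Delegate iteration to the selected parser.
  def each(&block)
    return enum_for(:each) unless block
    @parser.each(&block)
  end
end

# Toy parser: one record per line, fields separated by "|".
class LineParser
  def self.decode_from_string(str)
    str.chomp.split('|')
  end

  def initialize(io)
    @io = io
  end

  def each
    @io.each_line { |line| yield self.class.decode_from_string(line) }
  end
end

PluggableReader.register_parser(LineParser, :lines)
reader = PluggableReader.new(StringIO.new("245|Title\n100|Author\n"))
reader.each { |record| p record }
```

Registering a different class under an existing type simply replaces the earlier one, which mirrors the "Uses My::MARC::Parser now" behavior in the example.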
