
Year: 2014

How good/bad is MARC data? The case of place-of-publication

I complain a lot about the MARC format, the way people put data in MARC records, the actual data themselves I find in MARC records, the inexplicably complex syntax for identifiers and, ironically, attempts to replace MARC with something else. One nice little beacon of hope was when I found that only roughly 0.26% of the ISBNs in the UMich catalog have invalid checksums. That’s not bad at all, and it’s worth digging into other things about which I might be likely to complain before I make a fool of myself. [Note: there will be some complaining at the end.…
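For readers wondering what an "invalid checksum" means here, the following is a minimal sketch of ISBN check-digit validation. It is not the code used to produce the 0.26% figure; the method name `valid_isbn?` is made up for illustration, and it assumes hyphens and spaces can simply be stripped before checking.

```ruby
# Hypothetical helper: true when the ISBN-10 or ISBN-13 check digit
# agrees with the other digits. Hyphens and spaces are ignored.
def valid_isbn?(raw)
  isbn = raw.to_s.delete("- ").upcase
  case isbn
  when /\A\d{9}[\dX]\z/
    # ISBN-10: weights run 10 down to 1; "X" means 10 in the last position.
    weighted = isbn.chars.each_with_index.map do |ch, i|
      (ch == "X" ? 10 : ch.to_i) * (10 - i)
    end
    (weighted.sum % 11).zero?
  when /\A\d{13}\z/
    # ISBN-13: weights alternate 1, 3, 1, 3, ...
    weighted = isbn.chars.each_with_index.map do |ch, i|
      ch.to_i * (i.even? ? 1 : 3)
    end
    (weighted.sum % 10).zero?
  else
    # Wrong length or stray characters: not a well-formed ISBN at all.
    false
  end
end
```

Anything that is not 10 or 13 characters after cleanup is reported as invalid, which lumps malformed strings together with genuine checksum failures; a real audit would probably want to count those separately.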


Ruby MARC serialization/deserialization revisited

A few years ago, I benchmarked various methods of serializing/deserializing MARC data using the ruby-marc gem. Given that I’m planning on starting fresh with my catalog setup, I thought I’d take a moment to revisit them. The biggest changes since that time have been (a) the continued speed improvements in JRuby, (b) the introduction of the Oj json parser for MRI ruby, and (c) wider availability of msgpack code in the wild. I also wondered what would happen if I tried ruby’s Marshal serialization; maybe it would be faster because I wouldn’t have to "manually" create a MARC::Record object from…
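To make the Marshal idea concrete, here is a rough sketch of the difference between the two round-trips. It uses a made-up `FakeRecord` struct as a stand-in rather than a real MARC::Record, so it only illustrates the shape of the comparison, not the benchmark itself.

```ruby
require "json"

# Hypothetical stand-in for a parsed record; NOT the ruby-marc class.
FakeRecord = Struct.new(:leader, :fields)

record = FakeRecord.new(
  "00000nam a2200000 a 4500",
  [["245", "Example title"], ["260", "Ann Arbor"]]
)

# Marshal: the object comes back fully formed, no rebuilding step.
# Only load Marshal data you produced yourself; it is unsafe on untrusted input.
marshaled    = Marshal.dump(record)
from_marshal = Marshal.load(marshaled)

# JSON: we get plain hashes and arrays back and must rebuild the object by hand.
json      = JSON.generate(record.to_h)
parsed    = JSON.parse(json)
from_json = FakeRecord.new(parsed["leader"], parsed["fields"])
```

Both paths end with an equal object; the open question is whether skipping the manual rebuild actually saves time once real records are involved.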


“Schemaless” solr with dynamicField and copyField

[Holy Kamoly, it’s been a long time since I blogged!] Recent versions of solr have the option to run in what they call "schemaless mode", wherein fields that aren’t recognized are actually added, automatically, to the schema as real named fields. I find this intriguing, but it’s not what I’m after right now. The problem I’m in the first stages of addressing is that my schema.xml is a huge mess: very little consistency, no naming conventions dictating what’s stored/indexed, etc. It grew "organically" (which is what I say when I mean I’ve been lazy and sloppy) and needs a full-on…


Help me test yet another LC Callnumber parser

Those who have followed this blog and my code for a while know that I have a long, slightly sad, and borderline abusive relationship with Library of Congress call numbers. They’re a freakin’ nightmare. They just are. But, based on the premise that Sisyphus was a quitter, I took another stab at it, this time writing a real (PEG-) parser instead of trying to futz with extended regular expressions. The results, so far, aren’t too bad. The gem is called lc_callnumber, but more importantly, I’ve put together a little heroku app to let you play with it, and then correct…
