• Ruby MARC serialization/deserialization revisited

    A few years ago, I benchmarked various methods of serializing/deserializing MARC data using the ruby-marc gem. Given that I’m planning on starting fresh with my catalog setup, I thought I’d take a moment to revisit them.

    The biggest changes since then have been (a) the continued speed improvements in JRuby, (b) the introduction of the Oj JSON parser for MRI Ruby, and (c) the wider availability of msgpack code in the wild.

    ...
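    To make the kind of comparison concrete, here is a minimal sketch of a round-trip timing harness that uses only Ruby's standard library. It is an assumption-laden stand-in, not the benchmark from the post: a hand-built hash loosely shaped like MARC-in-JSON replaces a real ruby-marc record, and only JSON and Marshal are compared. The Oj and msgpack gems mentioned above could be added as extra entries in the (hypothetical) STRATEGIES table, e.g. with Oj.dump/Oj.load or MessagePack.pack/MessagePack.unpack. No timings are claimed here.

```ruby
require 'json'
require 'benchmark'

# Hypothetical stand-in for a MARC record, loosely shaped like MARC-in-JSON
# (a leader plus an array of fields). A real test would use ruby-marc records.
RECORD = {
  'leader' => '00000cam a2200000 a 4500',
  'fields' => [
    { '001' => 'abc123' },
    { '245' => { 'ind1' => '1', 'ind2' => '0',
                 'subfields' => [{ 'a' => 'An example title /' },
                                 { 'c' => 'by Someone.' }] } }
  ]
}.freeze

# Each strategy is a [serialize, deserialize] pair of lambdas.
STRATEGIES = {
  'JSON'    => [->(h) { JSON.generate(h) }, ->(s) { JSON.parse(s) }],
  'Marshal' => [->(h) { Marshal.dump(h) },  ->(s) { Marshal.load(s) }]
}.freeze

# Time n full round trips (serialize, then deserialize); returns seconds.
def time_round_trips(dumper, loader, record, n)
  Benchmark.realtime { n.times { loader.call(dumper.call(record)) } }
end

if __FILE__ == $PROGRAM_NAME
  STRATEGIES.each do |name, (dumper, loader)|
    secs = time_round_trips(dumper, loader, RECORD, 10_000)
    puts format('%-8s %.3fs for 10,000 round trips', name, secs)
  end
end
```

    Running the file prints one line per strategy; actual numbers will depend heavily on the Ruby implementation (MRI vs. JRuby) and on record size and shape.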
  • "Schemaless" Solr with dynamicField and copyField

    [Holy Kamoly, it’s been a long time since I blogged!]

    Recent versions of Solr can run in what they call “schemaless mode”, in which unrecognized fields are automatically added to the schema as real, named fields.

    I find this intriguing, but it’s not what I’m after right now.

    The problem I’m in the first stages of addressing is that my schema.xml is a huge mess, with very little consistency,...
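    As a rough illustration of how dynamicField and copyField resolve field names, here is a small Ruby model. It is a sketch, not Solr code: the field names, types, and copy rules are all hypothetical. The precedence it encodes is the one described in Solr's example schema.xml comments, as I understand them: an explicitly declared field wins; otherwise the longest matching dynamicField pattern wins, with ties going to the pattern listed first. Each pattern carries a single "*" at its start or end.

```ruby
# Hypothetical mini-schema: explicit <field>s plus <dynamicField> patterns,
# listed in the order they would appear in schema.xml.
EXPLICIT_FIELDS = { 'id' => 'string', 'all_text' => 'text' }.freeze
DYNAMIC_FIELDS = [
  ['*_t',    'text'],
  ['*_s',    'string'],
  ['*_i',    'int'],
  ['attr_*', 'text_general']
].freeze
# Hypothetical <copyField source=... dest=...> rules.
COPY_FIELDS = [['*_t', 'all_text'], ['title_t', 'title_s']].freeze

# A dynamicField pattern has a single '*' at its start or its end.
def glob_match?(pattern, name)
  core = pattern.delete('*')
  pattern.start_with?('*') ? name.end_with?(core) : name.start_with?(core)
end

# Explicit fields win; otherwise the longest matching pattern wins,
# with ties going to the pattern listed first.
def field_type(name)
  return EXPLICIT_FIELDS[name] if EXPLICIT_FIELDS.key?(name)

  matches = DYNAMIC_FIELDS.each_with_index.select { |(pattern, _type), _idx| glob_match?(pattern, name) }
  return nil if matches.empty?

  (_pattern, type), _idx = matches.min_by { |(pattern, _type), idx| [-pattern.length, idx] }
  type
end

# Every dest field that a value indexed into `name` would also be copied to.
def copy_targets(name)
  COPY_FIELDS.select do |source, _dest|
    source.include?('*') ? glob_match?(source, name) : source == name
  end.map(&:last)
end
```

    For example, a hypothetical field named attr_pages_i matches both *_i and attr_*, and the longer attr_* pattern decides its type. Keeping the naming conventions this mechanical is what makes a big schema.xml tractable.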

  • Help me test yet another LC Callnumber parser

    Those who have followed this blog and my code for a while know that I have a long, slightly sad, and borderline abusive relationship with Library of Congress call numbers.

    They’re a freakin’ nightmare. They just are.

    But, based on the premise that Sisyphus was a quitter, I took another stab at it, this time writing a real (PEG) parser instead of trying to futz with extended regular expressions.

    The results, so...
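    To show why a grammar beats a single giant regex, here is a toy parser in the PEG style: each rule either matches and consumes input, or fails and rewinds. It is only a sketch under heavy assumptions. The grammar is hypothetical and far simpler than real LC call numbers, it uses Ruby's stdlib StringScanner rather than a PEG library, and it is not the parser from the post.

```ruby
require 'strscan'

# Hypothetical, heavily simplified LC call number grammar (PEG-style):
#   callnum <- letters ws? number cutter* ws? year? ws? END
#   letters <- [A-Z]{1,3}
#   number  <- [0-9]+ ('.' [0-9]+)?
#   cutter  <- ws? '.'? [A-Z] [0-9]+
#   year    <- [0-9]{4}
# Real call numbers need many more rules than this.
class TinyLCParser
  Result = Struct.new(:letters, :number, :cutters, :year)

  # Returns a Result, or nil if the whole string doesn't match.
  def parse(str)
    @s = StringScanner.new(str.strip.upcase)
    letters = @s.scan(/[A-Z]{1,3}/)
    return nil unless letters

    @s.skip(/\s+/)
    number = @s.scan(/\d+(?:\.\d+)?/)
    return nil unless number

    cutters = []
    while (c = cutter)
      cutters << c
    end
    @s.skip(/\s+/)
    year = @s.scan(/\d{4}/)
    @s.skip(/\s+/)
    @s.eos? ? Result.new(letters, number, cutters, year) : nil
  end

  private

  # Like a PEG rule: on failure, rewind so no input is consumed.
  def cutter
    start = @s.pos
    @s.skip(/\s+/)
    @s.skip(/\./)
    match = @s.scan(/[A-Z]\d+/)
    @s.pos = start unless match
    match
  end
end
```

    Under this toy grammar, "QA76.73 .R83 D84 2013" splits into class letters QA, number 76.73, cutters R83 and D84, and year 2013. Each new case becomes one more small rule instead of another tangle in one regular expression.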

  • New blog front- and back-end

    A while back, Dreamhost had some problems, and my blog and assorted other websites I help keep track of went down.

    For more than two weeks.

    Now, I understand that crap happens. And I understand that sometimes lots of things happen at once. But fundamentally, their infrastructure is such that they could lose everything on a machine and be unable to get it back for more than two weeks. I’m not a mathematician, but...

  • Announcing "traject" indexing software

    [Over the next few days I’ll be writing a series of posts that highlight traject, a new indexing solution by Jonathan Rochkind and me that we’re using to index MARC data into Solr. This is the introduction.]

    Wow. Six months since I posted here. What have I been doing?

    Well, mostly parenting, but in the last few weeks I was lucky enough to get on board with a project started by Jonathan Rochkind...

  • Come work at the University of Michigan

    The Library has three UX positions available right now – interface designer, interface developer, and a web content strategist.

    Come join me at what is easily the best place I’ve ever worked! Full details are over at Suz’s blog.

  • Please: don't return your books

    So, I’m at code4lib 2013 right now, where side conversations and informal exchanges tend to be the most interesting part.

    Last night I had a conversation with the inimitable Michael B. Klein, and after complaining about faculty members who keep books out for decades at a time, we ended up asking a simple question:

    How much more shelving would we need if everyone returned their books?

    Assuming we could get them...

  • Boosting on Exactish (anchored) phrase matching in Solr: (SST #4)

    [Check out the introduction to the Stupid Solr Tricks series if you’re just joining us.]

    Exact matching in Solr is easy. Use the default string type: all it does is, essentially, exact phrase matching. string is a great type for faceted values, where the only way we expect to search the index is via text pulled from the index itself. Query the index to get a value: use that value to re-query the...
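    The "query the index to get a value, then re-query with it" pattern for string fields can be sketched as a URL builder. This assumes Solr's {!term} query parser, which builds a single-term query from the raw value with no text analysis and no query syntax, so spaces, colons, and quotes in a facet value need no escaping. The field name subject_facet and the helper name are hypothetical; the base URL follows the localhost:8983 style used elsewhere on this blog.

```ruby
require 'uri'

# Build a Solr select URL that re-queries on an exact value taken from
# the index (e.g., a facet value from an earlier response).
# {!term f=...} asks Solr for a single raw term: no text analysis and no
# query syntax, so spaces, colons, and quotes in the value are safe.
def exact_filter_url(base, field, value, q: '*:*')
  params = { 'q' => q, 'fq' => "{!term f=#{field}}#{value}" }
  "#{base}/select?#{URI.encode_www_form(params)}"
end

puts exact_filter_url('http://localhost:8983/solr', 'subject_facet', 'Michigan: History "1900s"')
```

    Using a filter query (fq) rather than q keeps the exact-match constraint out of relevance scoring, which is usually what you want when drilling into a facet value.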

  • Requiring/Preferring searches that don't span multiple values (SST #3)

    [Check out the introduction to the Stupid Solr Tricks series if you’re just joining us.]

    Solr and multiValued fields

    Here’s another thing you need to understand about Solr: it doesn’t really have fields that can take multiple values.

    “But Bill,” you’re saying, “sure it does. I mean, hell, it even has a ‘multiValued’ parameter.”

    First off: watch your language.

    Second off: are you sure?

    Let’s do a quick test. Look at the...
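    Here is a simplified model of what "multiValued" means under the hood, under stated assumptions. All the values go into one token stream, and the only thing separating them is the fieldType's positionIncrementGap, which is added to the position of the first token of each subsequent value. The tokenizer (downcase and split on whitespace) and the helper names are hypothetical, and the phrase check handles only two terms in order.

```ruby
# Simplified model of a multiValued text field: every value is tokenized
# into ONE stream of positions, with positionIncrementGap added between
# values. (Tokenizing here is just downcase + split on whitespace.)
def token_positions(values, gap)
  positions = Hash.new { |h, k| h[k] = [] }
  pos = -1
  values.each_with_index do |value, i|
    pos += gap if i.positive?
    value.downcase.split.each do |token|
      pos += 1
      positions[token] << pos
    end
  end
  positions
end

# Two-term, in-order phrase check: "a b"~slop matches when b sits at most
# slop + 1 positions after a. (Real Lucene slop also allows reordering.)
def phrase_match?(positions, a, b, slop = 0)
  positions.fetch(a, []).any? do |pa|
    positions.fetch(b, []).any? { |pb| pb > pa && pb - pa <= slop + 1 }
  end
end

authors = ['William Dueber', 'Jonathan Rochkind']
p phrase_match?(token_positions(authors, 0), 'dueber', 'jonathan')   # => true
p phrase_match?(token_positions(authors, 100), 'dueber', 'jonathan') # => false
```

    With a gap of 0, the phrase "dueber jonathan" matches even though the words come from different values. A large gap only pushes the values apart; a sloppy enough phrase query can still reach across them.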

  • Using localparams in Solr (or, how to boost records that contain all terms) (SST #2)

    [Note: this isn’t so much a Stupid Solr Trick as a Thing You Should Probably Know; consider it required reading for the next SST. If you’re just joining us, check out the introduction to the Stupid Solr Tricks series.]

    What the heck is a localparams query?

    A garden-variety Solr query URL looks something like this:

     http://localhost:8983/solr/select?defType=dismax&qf=name^2 place^1&q=Dueber

    Which is fine, as far as it goes. But it’s easy...
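    For comparison, here is a sketch that builds both the plain dismax URL above and an equivalent localparams version, where the parser type and its qf option travel inside q as {!dismax qf='...'}. Localparams values that contain spaces must be quoted, hence the single quotes around the qf value. The helper names are hypothetical, and this assumes the q parameter is handled by a parser that honors localparams (the standard Lucene parser does).

```ruby
require 'uri'

SOLR_SELECT = 'http://localhost:8983/solr/select'

# Plain form: the query parser and its options are top-level parameters.
def plain_dismax_url(qf, q)
  "#{SOLR_SELECT}?#{URI.encode_www_form({ 'defType' => 'dismax', 'qf' => qf, 'q' => q })}"
end

# Localparams form: the parser and its options travel inside q itself.
def localparams_url(qf, q)
  "#{SOLR_SELECT}?#{URI.encode_www_form({ 'q' => "{!dismax qf='#{qf}'}#{q}" })}"
end

puts plain_dismax_url('name^2 place^1', 'Dueber')
puts localparams_url('name^2 place^1', 'Dueber')
```

    Both URLs are properly URL-encoded (spaces become "+", "^" becomes "%5E"). The point of the second form is that the query's configuration is now part of the query string itself, so it can vary per query, or per sub-query, without changing the request's top-level parameters.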