Bill Dueber

2017 September 1

[tl;dr: I tried Evernote and never really liked it. I’m now using Bear]

I tried to get on the Evernote train.

I really did. I installed it god-knows-how-many times, and would put stuff in it and pay my (then much lower) annual fee and put a bunch more crap in it and then, sooner or later, peter out and not bother with it and not be able to find stuff in it and give up.

I choose to view this as a technology problem and not a personal failing because I’m the one choosing, and I’m going to choose “technology problem.”

2017 August 17

In most contexts that I deal with, saying librarian is a convenient shorthand for someone with an MLS/MIS who works in a library or similar place.

Despite that, I used to officially be a librarian. Kind of. Maybe.

2017 August 11

It’s now been more than two and a half years since I blogged anything. During that time I’ve switched jobs, taken on different responsibilities, watched my kids grow, and put on a few pounds. I also got new glasses, but nobody ever says anything when I get new glasses, or a new haircut, and geez, would it kill you to notice these things and tell me I look nice once in a while?

What I haven’t done much of is write.

2015 February 19

traject is an ETL (extract/transform/load) system written in ruby with a special view towards extracting fields from MARC data and writing them out into Solr. Jonathan Rochkind and I wrote this primarily out of frustration using other tools in this space (e.g., Solrmarc, or my own precursor to traject, marc2solr[1]).

traject had its first release almost a year and a half ago (at least based on the date of my post introducing it), and I’ve used it literally every day since then indexing data for the University of Michigan and HathiTrust library catalogs.

2014 November 10

I complain a lot about the MARC format, the way people put data in MARC records, the actual data themselves I find in MARC records, the inexplicably complex syntax for identifiers and, ironically, attempts to replace MARC with something else. One nice little beacon of hope was when I found that only roughly 0.26% of the ISBNs in the UMich catalog have invalid checksums. That's not bad at all, and it's worth digging into other things about which I might be likely to complain before I make a fool of myself.
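
For anyone wondering what an "invalid checksum" means here, this is a minimal sketch of ISBN check-digit validation in plain ruby. It is not the code used against the UMich catalog; the module name IsbnCheck and its methods are made up for illustration. It assumes the standard rules: ISBN-10 uses weights 10 down to 1 modulo 11 (with X standing for 10 in the last position), and ISBN-13 uses alternating weights 1 and 3 modulo 10.

```ruby
# Hypothetical helper for illustration only; not taken from the original post.
module IsbnCheck
  module_function

  # Drop hyphens and spaces and upcase a trailing "x".
  def normalize(raw)
    raw.to_s.delete("- ").upcase
  end

  # ISBN-10: the weighted sum (weights 10..1) must be divisible by 11.
  def valid_isbn10?(isbn)
    return false unless isbn.match?(/\A\d{9}[\dX]\z/)

    sum = isbn.chars.each_with_index.sum do |ch, i|
      value = ch == "X" ? 10 : ch.to_i
      value * (10 - i)
    end
    (sum % 11).zero?
  end

  # ISBN-13: the weighted sum (weights 1,3,1,3,...) must be divisible by 10.
  def valid_isbn13?(isbn)
    return false unless isbn.match?(/\A\d{13}\z/)

    sum = isbn.chars.each_with_index.sum do |ch, i|
      ch.to_i * (i.even? ? 1 : 3)
    end
    (sum % 10).zero?
  end

  # True when the cleaned-up string passes either check.
  def valid?(raw)
    isbn = normalize(raw)
    valid_isbn10?(isbn) || valid_isbn13?(isbn)
  end
end
```

Counting the strings in a catalog dump for which IsbnCheck.valid? returns false, and dividing by the total, would give a percentage like the one quoted above.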

2014 October 9

A few years ago, I benchmarked various methods of serializing/deserializing MARC data using the ruby-marc gem. Given that I'm planning on starting fresh with my catalog setup, I thought I'd take a moment to revisit them. The biggest changes since that time have been (a) the continued speed improvements in JRuby, (b) the introduction of the Oj json parser for MRI ruby, and (c) wider availability of msgpack code in the wild.

2014 October 6

[Holy Kamoly, it's been a long time since I blogged!] Recent versions of solr have the option to run in what they call "schemaless mode", wherein fields that aren't recognized are actually added, automatically, to the schema as real named fields. I find this intriguing, but it's not what I'm after right now. The problem I'm in the first stages of addressing is that my schema.xml is a huge mess – very little consistency, no naming conventions dictating what's stored/indexed, etc.

2014 January 30

Those who have followed this blog and my code for a while know that I have a long, slightly sad, and borderline abusive relationship with Library of Congress call numbers. They're a freakin' nightmare. They just are. But, based on the premise that Sisyphus was a quitter, I took another stab at it, this time writing a real (PEG-) parser instead of trying to futz with extended regular expressions. The results, so far, aren't too bad.

2013 December 17

A while back, Dreamhost had some problems and my blog and assorted other websites I help keep track of went down. For more than two weeks. Now, I understand that crap happens. And I understand that sometimes lots of things happen at once. But fundamentally, their infrastructure is such that they could lose everything on a machine and be unable to get it back for more than two weeks. I'm not a mathematician, but that's not "five-nine" service.

2013 October 14

[Over the next few days I'll be writing a series of posts that highlight a new indexing solution by Jonathan Rochkind and myself called traject that we're using to index MARC data into Solr. This is the introduction.] Wow. Six months since I posted here. What have I been doing? Well, mostly parenting, but in the last few weeks I was lucky enough to get on board with a project started by Jonathan Rochkind for a new JRuby-based tool optimized for indexing MARC data into solr.