
Month: August 2009

Building a solr text filter for normalizing data

[Kind of part of a continuing series on our VUFind implementation; more of a sidebar, really.] In my last post I made the case that you should put as much data normalization into Solr as possible. The built-in text filters will get you a long, long way, but sometimes you want to have specialized code, and then you need to build your own filter. Huge Disclaimer: I’m putting this up not because I’m the best person to do so, but because it doesn’t look as if anyone else has. I don’t know what I’m doing. I don’t know why the…


Going with and “forking” VUFind

Note: This is the second in a series I’m doing about our VUFind installation, Mirlyn. Here I talk about how we got to where we are. Next I’ll start looking at specific technologies, how we solved various problems, and generally more nerd-centered stuff. When the University Library decided to go down the path of an open-source, solr-based OPAC, there were (and are, I guess) two big players: VUFind and Blacklight. I wasn’t involved in the decision, but it must have seemed like a no-brainer. VUFind was in production (at Villanova), seemed to be building a community of similar institutions around…


Easy Solr types for library data

[Yet another bit in a series about our VUFind installation] While I’m no longer shocked at the terrible state of our data every single day, I’m still shocked pretty often. We figured out pretty quickly that anything we could do to normalize data as it went into the Solr index (and, in fact, as queries were produced) would be a huge win. There’s a continuum of attitudes about how much “business logic” belongs in the database layer of any application. Some folks (including super-high throughput sites, but mostly people who have never used anything but MySQL) tend to…


Sending unicode email headers in PHP

I’m probably the last guy on earth to know this, but I’m recording it here just in case. I’m sending record titles in the subject line of emails, and of course they may be Unicode. The body takes care of itself, but you need to explicitly encode a header like “Subject.”

    // Plain headers; the body is sent as 8-bit UTF-8
    $headers['To'] = $to;
    $headers['From'] = $from;
    $headers['Content-Type'] = "text/plain; charset=utf-8";
    $headers['Content-Transfer-Encoding'] = "8bit";

    // The subject has to be wrapped as a base64 encoded-word
    $b64subject = "=?UTF-8?B?" . base64_encode($subject) . "?=";
    $headers['Subject'] = $b64subject;

    // Send via PEAR Mail
    $mail = Mail::factory('sendmail', array('host' => $host, 'port' => $port));
    $retval = $mail->send($to, $headers, $body);
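To make the subject-encoding step a little more reusable, here is a minimal sketch of the same idea as a standalone function. The name `encode_utf8_header` is hypothetical (it is not part of PEAR Mail or the original code), and the ASCII shortcut is my own assumption about what is convenient; the `=?UTF-8?B?…?=` wrapper is the same one used above.

```php
<?php
// Hypothetical helper, not from the original post: wrap a UTF-8 string
// as a base64 "encoded-word" so it can be used in a mail header.
function encode_utf8_header(string $text): string
{
    // Assumption: printable ASCII can go into the header untouched.
    if (!preg_match('/[^\x20-\x7E]/', $text)) {
        return $text;
    }
    // Same construction as the Subject line above.
    return '=?UTF-8?B?' . base64_encode($text) . '?=';
}

echo encode_utf8_header('Plain title'), "\n";         // left as-is
echo encode_utf8_header('Über die Bibliothek'), "\n"; // wrapped and base64-encoded
```

One caveat: a single encoded-word is supposed to stay short (75 characters under RFC 2047), so a very long title would need to be split across several of them. This sketch does not do that; if the mbstring extension is available, PHP’s built-in `mb_encode_mimeheader()` is probably the safer choice for long subjects.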


Rolling out UMich’s “VUFind”: Introduction and New Features

For the last few months, I’ve been working on rolling out a ridiculously modified version of VUFind, which we just launched as our primary OPAC, Mirlyn, with a slightly different version powering catalog.hathitrust.org, a temporary metadata search on the HathiTrust data until OCLC takes it over at some undetermined date. (Yeah, the HathiTrust site is a lot better looking.) [Our Aleph-based catalog lives on at mirlyn-classic; I’ll be interested to see how the traffic on the two differs as time goes on.] I’m going to spend a few posts talking about how and why we essentially forked VUFind, what sorts…
