Why do I ever, ever think that MARC might not rely on order? I don’t know. In any case, control fields will now be just an array of duples: control: [ ['001', 'value of the 001'], ['006', 'value of the 006'], ['006', 'another 006'] ]
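To make the point concrete, here is a minimal sketch in Python of the array-of-duples idea. The values are the same placeholders used above, and the helper name `control_values` is made up for illustration; nothing here is part of any MARC library.

```python
import json

# Control fields as an ordered list of [tag, value] pairs rather than a hash.
# The values are placeholders, not real MARC data.
control = [
    ["001", "value of the 001"],
    ["006", "value of the 006"],
    ["006", "another 006"],
]

def control_values(fields, tag):
    """Return every value for `tag`, in the order the fields appear.

    Hypothetical helper, named here only for illustration.
    """
    return [value for field_tag, value in fields if field_tag == tag]

# A plain dict keyed on tag keeps only one of the two 006 fields,
# which is exactly what the list of pairs avoids.
as_dict = dict(control)

# Lists of pairs survive a trip through JSON with order and repeats intact.
round_tripped = json.loads(json.dumps(control))
```

Lookup by tag gets a little clumsier than with a hash, but repeated tags and their order both survive.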
Comments closed
Author: Bill Dueber
MARC-Hash: a proposed format for JSON/YAML/Whatever-compatible MARC records
In my first shot at MARC-in-JSON, which I appropriately (and prematurely) named MARC-JSON, I made a point of losing round-tripability (to and from MARC) in order to end up with a nice, easy-to-work-with data structure based mostly on hashes. “Who really cares what order the subfields come in?” I asked myself. Well, of course, it turns out some people do. Some even care about the order of the tags. “Only in the 500s…usually,” I was told today. All my lovely dreams of using easy-to-access hashes went up in so much smoke. So…I’m suggesting we try something a little simpler. Something so…
A plea: use Solr to normalize your data
[Only, of course, if you’re using Solr. Otherwise, that’d be dumb.] We’ve been working on Mirlyn2-Beta, our installation of VuFind, for some time now (don’t let the fancy-pants name scare you off), and the further we get into it, the more obvious it is that I want to move as much data normalization into Solr itself as possible. Arguments about how much business logic to move into the database layer, in the form of foreign-key requirements, cascading inserts and deletes, stored procedures, etc., are as old as the features themselves. Solid arguments for and against are made on all sides,…
Enough with the freakin’ LC Call Number normalization!
OK. I’m done with it, and this time I mean it. I’ve updated and improved the LC normalization code, documented the algorithm, and put it all into Google Code. In the next couple of weeks, I’ll be turning it into a Solr text filter so we can do some decent sorting on call-number search results.
Ask, and you shall receive, and it shall be AWESOME!
The good folks at ticTOCs heard the call for open data, and they responded…exactly as I asked them to. Which makes me think I should have asked for a pony, too, but I’m still very, very happy! Anyone can now download a simple tab-delimited text file describing all the journal table of contents RSS files they’ve assembled, for use however anyone wants. The data include ISSNs and eISSNs (where available), the title of the journal, and of course the URL of the RSS/Atom/Whatever feed. The feeds themselves are all over the map — it’s whatever the publisher decides to provide,…
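For anyone wanting to poke at a file like that, here is a rough sketch of reading it in Python. The column order (title, ISSN, eISSN, feed URL), the sample rows, and the function name `read_feeds` are all assumptions made up for illustration; check the real file before trusting any of it.

```python
import csv
import io

# Hypothetical sample rows. The column order (title, ISSN, eISSN, feed URL)
# is an assumption for illustration, not the documented layout of the file.
SAMPLE = (
    "Example Journal A\t1234-5678\t\thttp://example.org/a/rss\n"
    "Example Journal B\t\t8765-4321\thttp://example.org/b/atom\n"
)

def read_feeds(handle):
    """Yield one dict per journal from a tab-delimited file handle."""
    columns = ["title", "issn", "eissn", "feed_url"]
    for row in csv.reader(handle, delimiter="\t"):
        record = dict(zip(columns, row))
        # ISSN and eISSN are only present where available; use None if blank.
        for key in ("issn", "eissn"):
            record[key] = record.get(key) or None
        yield record

feeds = list(read_feeds(io.StringIO(SAMPLE)))
```

Swapping `io.StringIO(SAMPLE)` for an open file handle would be the only change needed to run it against a downloaded copy, assuming the columns really are in that order.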
TicTocs: Give us a file! Pretty pretty pretty please!
For those who haven’t heard, ticTOCs is a service that provides web-based access to a database of Journal RSS/Atom Table of Contents feeds. Awesome. In their blog at News from TicTocs, a post titled I want to be completely honest with you about ticTOCs notes that: As for the API – yes, we’ve been asked this several times, and the answer is that it is currently being written and should be available very soon. That’s great, but writing in a comment on that post (after logging in with a very, very old OpenID — I used to have a blog named…
Five rules to make your open source more open
[I’ve noticed that a sure way to get people to look at stuff (as measured by, say, digg) is to include a number. So I did. Five.] Over at Bibliographic Wilderness, Jonathan Rochkind has a great followup to an ongoing discussion on the Blacklight list called How to build shared open source in which he tackles some of the differences between open-sourcing your code (a legal and distribution issue) and actually making it so someone else can usefully contribute to your code. The project I’m spending most of my time on right now, VuFind, is a great piece of…
And then I finally shut the hell up
I had a great — great! I tell you — 30 second conversation with Ken Varnum (of RSS4Lib fame) that went something like this (much paraphrasing, obviously): B: You’re gonna have to fix that interface. The standard header won’t work. K: Well, no, we’re going to leave it as it is. B: It’s not gonna work. K: We’ve decided to make it all consistent. B: OK, you can keep saying that, but I’m really, really smart and I say users are going to be confused. K: We’ve done user testing. They weren’t confused. And here’s our plan to see if they…
Normalizing LoC Call Numbers for sorting
Updated: I missed a ‘?’ in the original code that pushed a single cutter into the second-cutter position. Fixed below. Crap. Update 2: Initial letters can be three characters long. Regexp and output changed. LoC call numbers tend to be a mess, and I’ve been working this morning trying to normalize them for easy string comparison. The Perl function below takes a call number (with some level of sloppiness) and returns a string suitable for comparisons with other strings returned by the function. It outputs stuff like this: E E 0000.0000 0000 0000 E 184 .A1 G78 E 0184.0000A 1000G…
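To show the general idea (and only the general idea), here is a much-simplified sketch in Python. It is not the Perl function from the post and does not reproduce its output format: the regexp, the padding widths, and the name `sort_key` are all assumptions chosen for illustration, and it expects cutters to be separated by spaces or dots.

```python
import re

# Up to three initial letters, an optional whole number, an optional decimal
# part, then whatever is left (cutters and so on). Simplified on purpose.
CALLNUM = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+)?(?:\.(\d+))?\s*(.*)$")

def sort_key(callnum):
    """Return a string that sorts sensibly against other sort_key() results.

    Hypothetical helper: a sketch of fixed-width padding, nothing more.
    """
    cleaned = callnum.upper()
    match = CALLNUM.match(cleaned)
    if match is None:
        # Not something this sketch understands; fall back to the raw text.
        return cleaned.strip()
    letters, whole, frac, rest = match.groups()
    # Letters left-justified to 3, whole number zero-padded to 4 on the left,
    # decimal part zero-padded to 4 on the right.
    key = "{:<3}{:0>4}.{:0<4}".format(letters, whole or "", frac or "")
    # Keep only the alphanumeric chunks of the remainder, space-separated.
    cutters = " ".join(re.findall(r"[A-Z0-9]+", rest))
    return (key + " " + cutters).rstrip()

shelf = ["QA 76.73 .P22", "E 184 .A1 G78", "E 99 .C5", "Q 11"]
in_order = sorted(shelf, key=sort_key)
```

The padding is what does the work: a plain string sort puts `E 184` before `E 99`, while the padded keys compare `0099` against `0184` and get it right.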
How to rig an election
No matter where I’ve gone today and for the past few days, I keep running into people (on both sides) who are sure that if Their Guy Doesn’t Win, it’s going to be because of dirty tactics. I’m not an expert in this stuff. Not by a long shot. But I thought it would be fun to work out, for my own benefit, types of election fraud and what to really worry about. Note that how you might interpret all of this really depends on what you consider the greater evil: a vote cast that shouldn’t have been, or a…
Wanted: a better proxy server
We in the library world have a problem. We spend a zillion-with-a-Z dollars subscribing to online databases, purchases which presume our ability to make sure only authorized people can look at them. The alternative is to be in breach of contract law, which I’ve been assured is something we’d like to avoid. The problem I see is this: The limitations of our proxy server software restrict how we can write contracts with our vendors. The standard approach is to define two types of access: By IP address. The person is sitting in front of the right computer (or has hooked…
Planet Code4Lib in a snapshot
Inspired by the Inquiring Librarian, I just used Wordle to create a “tagcloud” of the current Planet Code4Lib feed. What kills me is the tiny little “Library” in the lower left-hand corner. <a href="http://wordle.net/gallery/wrdl/55861/Planet_Code4lib" title="Wordle: Planet Code4lib"><img alt="" src="http://wordle.net/thumb/wrdl/55861/Planet_Code4lib" style="padding:4px;border:1px solid #ddd" /></a>
Intuition-based librarianship?
Not long after I started working in the library, I heard someone talking about “Evidence Based Librarianship.” Like the good little kind-of-a-librarian I’d become, I looked it up and found this article which states that: EBL employs the best available evidence based upon library science research to arrive at sound decisions about solving practical problems in librarianship. My immediate response was, of course, What the $#!&% is everyone else doing? The sad truth, of course, is that in general folks working in libraries do not use the “best evidence” based on “library science research” because, like many of the practitioners…
The friend of my enemy’s friend’s enemy’s…err…
Move over, Axis of Evil! Our 43rd president, George W. Bush (and you gotta know that his dad hangs on to that ‘H.’ with two white-knuckled hands) is now in search of “the surest way to defeat the enemies of hatred.” Of course, we’re the best of friends with hatred here at Robot Librarian, so we should be safe.
Google Doctype — open documentation, open code
Because you can never have too many open encyclopedia-type-thingies, Google has launched Google Doctype, a “Google-sponsored open encyclopedia and reference library for developers of web applications. By web developers, for web developers.” It’s set up to use an open license (Creative Commons Attribution 3.0 Unported License) and, unlike other similar resources, is explicitly set up to include code for testing and browser-compatibility tables generated by running that code against different browsers. Simple, direct… what’s not to like?
JSON, JSON everywhere
Via Ajaxian, just saw an announcement for Persevere, a network-centric, JSON-based generic storage engine. It features: a REST-based interface over regular old HTTP; JSON as the native data going in and out, including circular references and such; a search interface based around JSONPath; an RPC interface based on JSON-RPC; seemingly buzzword compliant across the board. I’ve been thinking about these sorts of servers a lot lately (couchdb and strokedb are two others) in the context of the “not-the-catalog” data we track here at the library. For some stuff, clearly we need the power and speed of a real database. That power and…
Psst. We’re not printing cards anymore
[From a series I’m calling, “Things About The Library I Think Are Stoooopid”, part one of about a zillion.] I’m going to wallow in a little bit of hyperbole here, but only a little.
The problem
Suppose, just for a moment, that you’re a computer programmer working anytime in the last twenty years, and someone wants you to set up a data structure to deal with a timeless issue — how to keep track of who’s on which committees in a library.
If you’re a computer person
Easy enough. First off, what’s a committee?
Committee
Committee name (string)
Committee inception…
UPenn library has video “commercials”
The University of Pennsylvania Library has a set of video commercials touting their products — some of which are musicals! Worth a look-see.