<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Robot Librarian &#187; TALITAS</title>
	<atom:link href="http://robotlibrarian.billdueber.com/tag/talitas/feed/" rel="self" type="application/rss+xml" />
	<link>http://robotlibrarian.billdueber.com</link>
	<description>Disclaimer: I'm not actually a robot.</description>
	<lastBuildDate>Thu, 01 Dec 2011 16:37:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Wanted: a better proxy server</title>
		<link>http://robotlibrarian.billdueber.com/wanted-a-better-proxy-server/</link>
		<comments>http://robotlibrarian.billdueber.com/wanted-a-better-proxy-server/#comments</comments>
		<pubDate>Thu, 02 Oct 2008 16:01:40 +0000</pubDate>
		<dc:creator>Bill</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[TALITAS]]></category>

		<guid isPermaLink="false">http://robotlibrarian.billdueber.com/?p=16</guid>
		<description><![CDATA[We in the library world have a problem. We spend a zillion-with-a-Z dollars subscribing to online databases, purchases which presume our ability to make sure only authorized people can look at them. The alternative is to be in breach of contract law, which I&#8217;ve been assured is something we&#8217;d like to avoid. The problem I [...]]]></description>
			<content:encoded><![CDATA[<p>We in the library world have a problem. We spend a zillion-with-a-Z dollars subscribing to online databases, purchases which presume our ability to make sure only authorized people can look at them. The alternative is to be in breach of contract law, which I&#8217;ve been assured is something we&#8217;d like to avoid.</p>

<p>The problem I see is this: <em>The limitations of our proxy server software restrict how we can write contracts with our vendors</em>.</p>

<p>The standard approach is to define two types of access:</p>

<ol>
    <li>By <acronym title="Internet Protocol">IP</acronym> address. The person is sitting in front of the right computer (or has hooked up to the right wireless network) and is assumed to be &#8220;OK&#8221; based on either the location of the computer (e.g., in the library building) or the nature of the authN/authZ built into the computer&#8217;s login procedure. We tell our vendors, &#8220;Hey,&#8221; (all vendor-library conversations start with &#8216;Hey&#8217;) &#8220;here&#8217;s a list of <acronym title="Internet Protocol">IP</acronym> addresses that you should allow and associate with us.&#8221;</li>
    <li>By authenticating with a central mechanism and then sending everything through a rewriting proxy server, thus allowing us to tell the vendor, &#8220;Hey. Anything coming through our proxy server is OK. Honest.&#8221;</li>
</ol>

<p>The venerable EZProxy (now owned by <acronym title="Online Computer Library Center">OCLC</acronym>) has been the solution of choice for libraries for a long time. It does what it does very well.</p>

<p>But I want more. Much more. More more more.</p>

<p>The current model assumes there&#8217;s exactly one question: <em>Is this person authorized as a UM-Ann Arbor user?</em></p>

<p>But that&#8217;s a pretty crude question. Suppose the Business or Law school wants to buy access to stuff for only their students (news flash: they already do)? Or we want to subscribe to a journal but, because it&#8217;s so esoteric, restrict access to a couple of departments to save money. Or recognize when an Ann Arbor faculty member is sitting at a public computer on a different campus but still allow her to get full rights as an Ann Arbor faculty member instead of appearing to be Joe-Random-Dearborn student, a group which has significantly less access to online journals.</p>

<p>Why can&#8217;t people with roles on multiple campuses get the best of all worlds, getting the least restrictive access possible to a given title based on all their student/staff/faculty affiliations?</p>
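
<p>To make that concrete, here&#8217;s a minimal Python sketch of &#8220;least restrictive access wins.&#8221; Everything in it is made up for illustration: the <code>Access</code> levels, the <code>LICENSES</code> table, the affiliation tuples, and the <code>best_access</code> function are assumptions, not anything EZProxy or any other proxy server actually provides.</p>

```python
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical access levels, ordered from most to least restrictive."""
    NONE = 0
    ABSTRACT_ONLY = 1
    FULL_TEXT = 2


# Hypothetical license table: for each title, what each affiliation is granted.
LICENSES = {
    "Esoteric Journal": {
        ("ann-arbor", "faculty"): Access.FULL_TEXT,
        ("dearborn", "student"): Access.ABSTRACT_ONLY,
    },
}


def best_access(title, affiliations):
    """Return the least restrictive access granted by any of a person's affiliations."""
    grants = LICENSES.get(title, {})
    return max((grants.get(a, Access.NONE) for a in affiliations), default=Access.NONE)


# Someone with roles on two campuses gets the better of the two grants.
roles = {("ann-arbor", "faculty"), ("dearborn", "student")}
print(best_access("Esoteric Journal", roles).name)
```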

<p>Why can&#8217;t we negotiate access to given titles (or even articles???) in lieu of course packets (or online reserves), restricting access to only those enrolled in the class?</p>

<p>Here at UMich, we&#8217;re just starting to get an Enterprise Directory online where we&#8217;ll actually be able to ask some of these questions. But until we get a proxy server that&#8217;s smart enough to do something with all the information, it&#8217;ll just sit there and taunt me.</p>

<p>This isn&#8217;t an idle question. We already have databases that the Business School subscribes to alone that can only be accessed when you&#8217;re physically in the B-School at one of the approved-<acronym title="Internet Protocol">IP</acronym>-address computers. That&#8217;s freakin&#8217; ridiculous.</p>

<p>Of course, this all presumes that all-or-nothing contracts aren&#8217;t the best way to go, but shouldn&#8217;t we at least have the option?</p>]]></content:encoded>
			<wfw:commentRss>http://robotlibrarian.billdueber.com/wanted-a-better-proxy-server/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Psst. We&#8217;re not printing cards anymore</title>
		<link>http://robotlibrarian.billdueber.com/psst-were-not-printing-cards-anymore/</link>
		<comments>http://robotlibrarian.billdueber.com/psst-were-not-printing-cards-anymore/#comments</comments>
		<pubDate>Mon, 12 May 2008 11:59:06 +0000</pubDate>
		<dc:creator>Bill</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[code4lib]]></category>
		<category><![CDATA[TALITAS]]></category>

		<guid isPermaLink="false">http://robotlibrarian.billdueber.com/psst-were-not-printing-cards-anymore/</guid>
		<description><![CDATA[&#91;From a series I&#8217;m calling, &#8220;Things About The Library I Think Are Stoooopid&#8221;, part one of about a zillion.&#93; I&#8217;m going to wallow in a little bit of hyperbole here, but only a little. The problem Suppose, just for a moment, that you&#8217;re a computer programmer working anytime in the last twenty years, and someone [...]]]></description>
			<content:encoded><![CDATA[<p>&#91;From a series I&#8217;m calling, &#8220;Things About The Library I Think Are Stoooopid&#8221;, part one of about a zillion.&#93;</p>

<p>I&#8217;m going to wallow in a little bit of hyperbole here, but only a little.</p>

<h2>The problem</h2>

<p>Suppose, just for a moment, that you&#8217;re a computer programmer working anytime in the last twenty years, and 
someone wants you to set up a data structure to deal with a timeless issue &#8212; how to keep track of who&#8217;s on
which committees in a library.</p>

<h2>If you&#8217;re a computer person</h2>

<p>Easy enough. First off, what&#8217;s a committee?</p>

<p><strong>Committee</strong></p>

<ul>
<li>Committee name (string)</li>
<li>Committee inception date (date)</li>
<li>Chair (person)</li>
<li>Members (set of people)</li>
</ul>

<p>How about a person?</p>

<p><strong>Person</strong></p>

<ul>
<li>Last name (string)</li>
<li>First name (string)</li>
<li>Email address (email)</li>
</ul>

<p>Okeedokee. That looks ok so far, but we&#8217;ve got problems.</p>

<p>First off, everyone knows that committee names change. And everyone also knows that last names can change, preferred first
names can change, email addresses change, etc. We need some sort of unique identifier to represent the <em>abstract ideal</em> of 
a particular committee or a specific individual. Let&#8217;s be lazy and just throw in an integer ID that we&#8217;ll be careful not 
to reuse, ever, for any reason.</p>

<p>So, we&#8217;ll throw that in, and make sure our references are to these unique IDs, not names or whatnot.</p>

<p>That gives us this.</p>

<p><strong>Committee</strong></p>

<ul>
<li>cID (unique integer)</li>
<li>Committee name (string)</li>
<li>Committee inception date (date)</li>
<li>Chair (pID)</li>
<li>Members (set of pIDs)</li>
</ul>

<p>How about a person?</p>

<p><strong>Person</strong></p>

<ul>
<li>pID (unique integer)</li>
<li>Last name (string)</li>
<li>First name (string)</li>
<li>Email address (email)</li>
</ul>

<p>And the mapping, of course.</p>

<p><strong>Committee-Person Mapping</strong></p>

<ul>
<li>pID (unique integer pointing into the Person table)</li>
<li>cID (unique integer pointing into the Committee table)</li>
<li>dateTermStarted (date)</li>
<li>dateTermEnds (date)</li>
</ul>

<p>If this seems simple, well, it is. None of this is new: relational theory is almost forty years old, and common implementations of databases are at least twenty. We have well-defined unique keys, special types for dates and email addresses so we can do some sanity checking and order things and so forth, and a very, very simple mapping of people to committees where we keep track of start and end dates just to be complete.</p>
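
<p>For the code-inclined, here&#8217;s roughly how the model above might look as a Python sketch. The class and field names are illustrative choices, the email field is a plain string with no validation, and the sample data is invented. The committee&#8217;s member set is left out on the assumption that it can be derived from the mapping records.</p>

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Person:
    pid: int            # unique, never reused
    last_name: str
    first_name: str
    email: str          # plain string here; a real system would validate it


@dataclass(frozen=True)
class Committee:
    cid: int            # unique, never reused
    name: str
    inception: date
    chair_pid: int      # refers to Person.pid, not to a name


@dataclass(frozen=True)
class Membership:
    pid: int            # points at a Person
    cid: int            # points at a Committee
    term_started: date
    term_ends: date


# Invented sample data.
john = Person(pid=1, last_name="Smith", first_name="John", email="jsmith@example.org")
web = Committee(cid=10, name="Web Committee", inception=date(2008, 1, 15), chair_pid=john.pid)
term = Membership(pid=john.pid, cid=web.cid,
                  term_started=date(2008, 1, 15), term_ends=date(2009, 1, 14))

# A name change touches exactly one record; references by pID stay valid.
john_renamed = replace(john, last_name="Smythe")
print(term.pid == john_renamed.pid)
```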

<p>Most importantly, you know what&#8217;s not here? There&#8217;s nothing about how to print it out, or what format I&#8217;m going to store it in.
Those are afterthoughts. They don&#8217;t matter. Any well-specified data model can be machine-translated into pretty much anything
you need.</p>
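
<p>As a small illustration of that last point, here&#8217;s a sketch that takes one typed record definition and mechanically emits both JSON and CSV using only the Python standard library. The <code>Person</code> class and the sample people are invented for the example.</p>

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Person:
    pid: int
    last_name: str
    first_name: str
    email: str


# Invented sample data.
people = [
    Person(1, "Smith", "John", "jsmith@example.org"),
    Person(2, "Doe", "Jane", "jdoe@example.org"),
]

# The same records, rendered as JSON...
as_json = json.dumps([asdict(p) for p in people])

# ...and as CSV, with the header derived from the model itself.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Person)])
writer.writeheader()
writer.writerows(asdict(p) for p in people)
as_csv = buffer.getvalue()

print(as_json)
print(as_csv)
```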

<h2>If you&#8217;re writing a library spec</h2>

<p>As near as I can tell, the &#8220;library&#8221; way to write this would be as follows:</p>

<p><strong>Committee</strong></p>

<p>&#91;Let &#8220;hus&#8221; stand for &#8220;hopefully unique string created by ridiculously complex algorithm&#8221;]</p>

<ul>
<li>Committee name (hus)</li>
<li>Committee inception (string masquerading as a date in any of several formats)</li>
<li>Chair (hus)</li>
<li>Members

<ul>
<li>person1 (hus) $$b email address (string) $$c start date (date-like string) $$d end date (date-like string)</li>
<li>person2 (hus) $$b email address (string) $$c start date (date-like string) $$d end date (date-like string)</li>
</ul></li>
</ul>

<p>Ummmmm&#8230;strings. Nothing but strings. Short strings, long strings, fat strings, tall strings. Strings with dollar signs. 
Strings that look like dates. Strings that contain other strings. And, just for luck, a little bit of hierarchy, where
&#8220;hierarchy&#8221; means &#8220;two levels.&#8221;</p>

<p>If someone&#8217;s name changes, well, good luck finding all the occurrences and fixing them (and 
making sure you don&#8217;t get the wrong John Smith). Good luck parsing out all the dates, which rely not on machine syntax checking
but on a whole set of data-enterers trying to follow some sort of rule without making any mistakes. And good, <em>good</em> luck
getting a list of which committees a specific person belongs to.</p>
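
<p>For contrast, here&#8217;s what that last question could look like against the ID-based model from the previous section, sketched with Python&#8217;s built-in <code>sqlite3</code> module. The table names, column names, and sample rows are assumptions made up for the example. SQLite has no native date type, so ISO 8601 text stands in for dates here.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (pid INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, email TEXT);
    CREATE TABLE committee (cid INTEGER PRIMARY KEY, name TEXT, inception TEXT);
    CREATE TABLE membership (
        pid INTEGER REFERENCES person(pid),
        cid INTEGER REFERENCES committee(cid),
        date_term_started TEXT,
        date_term_ends TEXT
    );
    INSERT INTO person VALUES (1, 'Smith', 'John', 'jsmith@example.org'),
                              (2, 'Smith', 'John', 'john.smith@example.org');
    INSERT INTO committee VALUES (10, 'Web Committee', '2008-01-15'),
                                 (11, 'Cataloging Committee', '2007-09-01');
    INSERT INTO membership VALUES (1, 10, '2008-01-15', '2009-01-14'),
                                  (1, 11, '2007-09-01', '2008-08-31'),
                                  (2, 11, '2007-09-01', '2008-08-31');
""")


def committees_for(pid):
    """Return the names of every committee a given person ID belongs to."""
    rows = conn.execute(
        "SELECT c.name FROM committee c "
        "JOIN membership m ON m.cid = c.cid "
        "WHERE m.pid = ? ORDER BY c.name",
        (pid,),
    )
    return [name for (name,) in rows]


# Two different John Smiths, told apart by pID rather than by name.
print(committees_for(1))
print(committees_for(2))
```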

<h2>Why I bring it up</h2>

<p>One of the most eye-opening talks I heard at <a href="http://www.code4lib.org/conference/2008/">code4lib 2008</a> was a keynote
by <a href="http://www.kcoyle.net/">Karen Coyle</a> on <acronym title="Resource Description and Access">RDA</acronym> and its ongoing specification. You can <a href="http://www.code4lib.org/conference/2008/kcoyle">view the slides or watch the presentation</a> if you&#8217;d like.</p>

<p>In it, she makes the point that, when push comes to shove, <acronym title="Anglo-American Cataloguing Rules">AACR2</acronym> and <acronym title="Resource Description and Access">RDA</acronym> both ended up being tremendously focused on 
producing <em>text strings</em>.</p>

<p>Whaaaaa??</p>

<p>Was there no one on the <acronym title="Resource Description and Access">RDA</acronym> committee who had experience with anything even approaching modern data theory?</p>

<p>Of course there was. But the giant weight of history is crushing library data modeling like a skinless grape under
a dump truck.</p>

<p>Look, I understand that this is not a simple data modeling problem. I understand that there&#8217;s a whole set of issues, including a
demand (a specious one, I think) that the cataloged data accurately reflect the actual text in a real, physical object 
that&#8217;s sitting in front of you. I&#8217;m not so naive as to think this is an easy task.</p>

<p>But anyone who, in the 21st century, approaches the large-scale creation of data without <strong>first and foremost</strong> worrying about
machine-parsability, consistent data types with machine-checkable syntax (and even some semantics) and one-to-one mappings between
unique objects (an author, an editor, a publishing house, a work) and something that uniquely identifies that object in any 
reification is&#8230;.well, I don&#8217;t know what they&#8217;re smoking.</p>

<p>We&#8217;re not printing cards anymore, people.</p>

<ul>
<li>If something is only understandable if a human is reading it, it&#8217;s not understandable by any modern definition.</li>
<li>Punctuation doesn&#8217;t belong in the description of an object. Ever. Punctuation is a rendering issue. If you&#8217;re using
punctuation, or well-formed strings, instead of descriptive attributes, <em>you&#8217;re doing it wrong</em>.</li>
<li>Just because you know your data doesn&#8217;t mean you know how to model it. Get outside help from the smartest people you can find.</li>
</ul>
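
<p>To illustrate the punctuation point, here&#8217;s a rough Python sketch in which the record stores only descriptive attributes and each display format adds its own punctuation at render time. The <code>Publication</code> fields, both rendering functions, and the sample record are invented for illustration and are not drawn from <acronym title="Anglo-American Cataloguing Rules">AACR2</acronym>, <acronym title="Resource Description and Access">RDA</acronym>, or any real cataloging standard.</p>

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical descriptive attributes: no punctuation stored anywhere."""
    title: str
    subtitle: str
    place: str
    publisher: str
    year: int


def render_card(pub):
    """One possible rendering; the colons, commas and periods live here only."""
    return f"{pub.title} : {pub.subtitle}. {pub.place} : {pub.publisher}, {pub.year}."


def render_citation(pub):
    """A different rendering of the same attributes, with different punctuation."""
    return f"{pub.title}: {pub.subtitle} ({pub.publisher}, {pub.year})"


# Invented sample record.
book = Publication("An Example Title", "with a subtitle", "Ann Arbor", "Example Press", 2008)
print(render_card(book))
print(render_citation(book))
```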

<p>Whew! That felt good!</p>

<p>OK. Rant off.</p>]]></content:encoded>
			<wfw:commentRss>http://robotlibrarian.billdueber.com/psst-were-not-printing-cards-anymore/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>


