<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Robot Librarian &#187; code</title>
	<atom:link href="http://robotlibrarian.billdueber.com/tag/code/feed/" rel="self" type="application/rss+xml" />
	<link>http://robotlibrarian.billdueber.com</link>
	<description>Disclaimer: I'm not actually a robot.</description>
	<lastBuildDate>Thu, 01 Dec 2011 16:37:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Normalizing LoC Call Numbers for sorting</title>
		<link>http://robotlibrarian.billdueber.com/normalizing-loc-call-numbers-for-sorting/</link>
		<comments>http://robotlibrarian.billdueber.com/normalizing-loc-call-numbers-for-sorting/#comments</comments>
		<pubDate>Thu, 13 Nov 2008 19:14:34 +0000</pubDate>
		<dc:creator>Bill</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[callnumbers]]></category>
		<category><![CDATA[code]]></category>

		<guid isPermaLink="false">http://robotlibrarian.billdueber.com/?p=22</guid>
		<description><![CDATA[Updated: I missed a &#8216;?&#8217; in the original code that pushed a single cutter into the second-cutter position. Fixed below. Crap. Update 2: Initial letters can be three characters long. Regexp and output changed. LoC Call numbers tend to be a mess, and I&#8217;ve been working this morning trying to normalize them for easy string [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Updated</strong>: <em>I missed a &#8216;?&#8217; in the original code that pushed a single cutter into the second-cutter position. Fixed below.</em></p>

<p><strong>Crap. Update 2</strong>: The initial class letters can be up to three characters long. The regexp and the output format have changed.</p>

<p>LoC call numbers tend to be a mess, and I&#8217;ve been working this morning on normalizing them for easy string comparison.</p>
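<p>A quick illustration of why padding matters: Perl's default sort compares strings character by character, so unpadded class numbers come out in the wrong order ("E 1000" lands before "E 77" because "1" sorts before "7"). The sketch below is hypothetical and deliberately minimal. The sample call numbers and the pad_class helper are made up for illustration, and pad_class only space-pads the class letters to three characters and zero-pads the class number to four digits (the same widths normalizeLC uses), ignoring decimals, cutters, and anything else.</p>

```perl
use strict;
use warnings;

# Hypothetical sample data: class letters plus an unpadded class number.
my @raw = ('E 77', 'E 1000', 'E 184');

# Default sort is a character-by-character string comparison, so
# "E 1000" sorts before "E 184", which sorts before "E 77".
my @naive = sort @raw;

# Hypothetical helper (not part of normalizeLC): assumes the input is
# 1-3 class letters followed by digits. Space-pads the letters to three
# characters and zero-pads the number to four digits.
sub pad_class {
    my ($cn) = @_;
    my ($alpha, $num) = $cn =~ /^([A-Z]{1,3})\s*(\d+)/;
    return sprintf('%-3s%04d', $alpha, $num);
}

# Sort on the padded key instead of the raw string.
my @sorted = sort { pad_class($a) cmp pad_class($b) } @raw;

print join(', ', @naive), "\n";    # E 1000, E 184, E 77
print join(', ', @sorted), "\n";   # E 77, E 184, E 1000
```

<p>Plain sort gives E 1000, E 184, E 77; sorting on the padded key gives E 77, E 184, E 1000. normalizeLC applies the same idea to the class letters, the class number, and both cutters, right-padding the cutter digits with zeros so that cutters compare as decimals.</p>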

<p>The Perl function below takes a call number (tolerating some sloppiness in spacing, periods, and letter case) and returns a string that can be compared, using plain string comparison, with other strings returned by the function. It outputs stuff like this:</p>
<pre>E                          E  0000.0000  0000  0000
E 184 .A1 G78              E  0184.0000A 1000G 7800
E184.A2 G78 1967           E  0184.0000A 2000G 7800 1967
E184.A2 G78 1970           E  0184.0000A 2000G 7800 1970
EA                         EA 0000.0000  0000  0000
EA 10                      EA 0010.0000  0000  0000
EA 10 1970                 EA 0010.0000  0000  0000 1970
EA10 B7                    EA 0010.0000B 7000  0000
EA 10.B7.G8                EA 0010.0000B 7000G 8000
EA10.5                     EA 0010.5000  0000  0000</pre>
<p>The code, in Perl, follows:</p>

<div class="geshi no perl"><div class="head">sub normalizeLC {</div><ol><li class="li1"><div class="de1">&nbsp; <span class="kw1">my</span> <span class="re0">$lc</span> = <span class="kw3">uc</span><span class="br0">&#40;</span><span class="kw3">shift</span><span class="br0">&#41;</span>;</div></li>
<li class="li1"><div class="de1">&nbsp; <span class="re0">$lc</span> =~ <span class="sy0">/</span>^</div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \<span class="kw3">s</span><span class="sy0">*</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="br0">&#91;</span>A-Z<span class="br0">&#93;</span><span class="br0">&#123;</span><span class="nu0">1</span>,<span class="nu0">3</span><span class="br0">&#125;</span><span class="br0">&#41;</span> &nbsp;<span class="co1"># alpha</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \s<span class="sy0">*</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span> &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># optional numbers</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \d+</div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>?: \s<span class="sy0">*</span>\.\s<span class="sy0">*</span>\d+<span class="br0">&#41;</span>? &nbsp;<span class="co1"># &#8230;with optional decimal point</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#41;</span>?</div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \s<span class="sy0">*</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>?: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># optional cutter</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \.? \s<span class="sy0">*</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="br0">&#91;</span>A-Z<span class="br0">&#93;</span>+<span class="br0">&#41;</span> &nbsp; &nbsp; &nbsp;<span class="co1"># cutter letter</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \s<span class="sy0">*</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>\d+<span class="br0">&#41;</span>? &nbsp; &nbsp; &nbsp; &nbsp;<span class="co1"># cutter numbers</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#41;</span>?</div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>?: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># optional cutter</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \.? \s<span class="sy0">*</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="br0">&#91;</span>A-Z<span class="br0">&#93;</span>+<span class="br0">&#41;</span> &nbsp; &nbsp; &nbsp;<span class="co1"># cutter letter</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \s<span class="sy0">*</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>\d+<span class="br0">&#41;</span>? &nbsp; &nbsp; &nbsp; &nbsp;<span class="co1"># cutter numbers</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#41;</span>?</div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \s<span class="sy0">*</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>.<span class="sy0">*</span>?<span class="br0">&#41;</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="co1"># everthing else</span></div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \<span class="kw3">s</span><span class="sy0">*</span>$</div></li>
<li class="li1"><div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="sy0">/</span>x;</div></li>
<li class="li1"><div class="de1">&nbsp; <span class="kw1">my</span> <span class="br0">&#40;</span><span class="re0">$alpha</span>, <span class="re0">$num</span>, <span class="re0">$c1alpha</span>, <span class="re0">$c1num</span>, <span class="re0">$c2alpha</span>, <span class="re0">$c2num</span>, <span class="re0">$extra</span><span class="br0">&#41;</span> = <span class="br0">&#40;</span>$<span class="nu0">1</span>, $<span class="nu0">2</span>, $<span class="nu0">3</span>, $<span class="nu0">4</span>, $<span class="nu0">5</span>, $<span class="nu0">6</span>, $<span class="nu0">7</span><span class="br0">&#41;</span>;</div></li>
<li class="li1"><div class="de1">&nbsp; <span class="re0">$c1num</span> .= <span class="nu0">0</span> x <span class="br0">&#40;</span><span class="nu0">4</span> &#8211; <span class="kw3">length</span><span class="br0">&#40;</span><span class="re0">$c1num</span><span class="br0">&#41;</span><span class="br0">&#41;</span>; <span class="co1"># Pad out to four decimal places</span></div></li>
<li class="li1"><div class="de1">&nbsp; <span class="re0">$c2num</span> .= <span class="nu0">0</span> x <span class="br0">&#40;</span><span class="nu0">4</span> &#8211; <span class="kw3">length</span><span class="br0">&#40;</span><span class="re0">$c2num</span><span class="br0">&#41;</span><span class="br0">&#41;</span>; <span class="co1"># ditto</span></div></li>
<li class="li1"><div class="de1">&nbsp; <span class="re0">$extra</span> = <span class="st0">&#39; &#39;</span> . <span class="re0">$extra</span> <span class="kw1">if</span> <span class="br0">&#40;</span><span class="re0">$extra</span><span class="br0">&#41;</span>;</div></li>
<li class="li1"><div class="de1">&nbsp; <span class="kw3">return</span> <span class="kw3">sprintf</span><span class="br0">&#40;</span><span class="st0">&quot;%-3s%09.4f%-2s%4s%-2s%4s%s&quot;</span>, <span class="re0">$alpha</span>, <span class="re0">$num</span>, <span class="re0">$c1alpha</span>, <span class="re0">$c1num</span>, <span class="re0">$c2alpha</span>, <span class="re0">$c2num</span>, <span class="re0">$extra</span><span class="br0">&#41;</span>;</div></li>
<li class="li1"><div class="de1"><span class="br0">&#125;</span></div></li></ol></div>]]></content:encoded>
			<wfw:commentRss>http://robotlibrarian.billdueber.com/normalizing-loc-call-numbers-for-sorting/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>


