Normalizing LoC Call Numbers for sorting

November 13, 2008 at 3:14 pm
Category: Uncategorized

Updated: I missed a ‘?’ in the original code that pushed a single cutter into the second-cutter position. Fixed below.

Crap. Update 2: Initial letters can be three characters long. Regexp and output changed.

LoC Call numbers tend to be a mess, and I’ve been working this morning trying to normalize them for easy string comparison.

The Perl function below takes a call number (tolerating some sloppiness in spacing and punctuation) and returns a string that can be compared directly against other strings returned by the function. It outputs stuff like this:

Input                      Normalized
E                          E  0000.0000  0000  0000
E 184 .A1 G78              E  0184.0000A 1000G 7800
E184.A2 G78 1967           E  0184.0000A 2000G 7800 1967
E184.A2 G78 1970           E  0184.0000A 2000G 7800 1970
EA                         EA 0000.0000  0000  0000
EA 10                      EA 0010.0000  0000  0000
EA 10 1970                 EA 0010.0000  0000  0000 1970
EA10 B7                    EA 0010.0000B 7000  0000
EA 10.B7.G8                EA 0010.0000B 7000G 8000
EA10.5                     EA 0010.5000  0000  0000

The code, in Perl, follows:

sub normalizeLC {
  my $lc = uc(shift);
  # Match in list context so a failed match yields undefs, not stale $1..$7
  my ($alpha, $num, $c1alpha, $c1num, $c2alpha, $c2num, $extra) = $lc =~ /^
          \s*
          ([A-Z]{1,3})          # class letters, one to three
          \s*
          (                     # optional class number
            \d+
            (?: \s*\.\s*\d+)?   # ...with optional decimal part
          )?
          \s*
          (?:                   # optional first cutter
            \.? \s*
            ([A-Z]+)            # cutter letter
            \s*
            (\d+)?              # cutter digits
          )?
          (?:                   # optional second cutter
            \.? \s*
            ([A-Z]+)            # cutter letter
            \s*
            (\d+)?              # cutter digits
          )?
          \s*
          (.*?)                 # everything else
          \s*$
        /x;
  # Optional groups that did not match come back undef; give them defaults
  for ($alpha, $c1alpha, $c1num, $c2alpha, $c2num, $extra) { $_ = '' unless defined $_ }
  $num = 0 unless defined $num;
  $num =~ s/\s+//g;                        # "10 . 5" becomes "10.5"
  $c1num .= 0 x (4 - length($c1num));      # Pad out to four digits
  $c2num .= 0 x (4 - length($c2num));      # ditto
  $extra = ' ' . $extra if length($extra);
  return sprintf("%-3s%09.4f%-2s%4s%-2s%4s%s", $alpha, $num, $c1alpha, $c1num, $c2alpha, $c2num, $extra);
}
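
Since the whole point is plain string comparison, here is a minimal sketch of sorting raw call numbers by their normalized form. The block carries a condensed copy of normalizeLC so it runs on its own; sort_call_numbers and the @shelf data are names made up for this illustration, not part of the original code.

```perl
use strict;
use warnings;

# Condensed copy of normalizeLC (same regexp, whitespace squeezed) so this
# sketch is self-contained.
sub normalizeLC {
    my $lc = uc(shift);
    my ($alpha, $num, $c1a, $c1n, $c2a, $c2n, $extra) = $lc =~ /^
        \s* ([A-Z]{1,3}) \s*
        ( \d+ (?: \s*\.\s*\d+ )? )? \s*
        (?: \.? \s* ([A-Z]+) \s* (\d+)? )?
        (?: \.? \s* ([A-Z]+) \s* (\d+)? )?
        \s* (.*?) \s*$
    /x;
    for ($alpha, $c1a, $c1n, $c2a, $c2n, $extra) { $_ = '' unless defined $_ }
    $num = 0 unless defined $num;
    $num =~ s/\s+//g;
    $c1n .= 0 x (4 - length($c1n));
    $c2n .= 0 x (4 - length($c2n));
    $extra = " $extra" if length($extra);
    return sprintf("%-3s%09.4f%-2s%4s%-2s%4s%s",
                   $alpha, $num, $c1a, $c1n, $c2a, $c2n, $extra);
}

# Hypothetical helper: sort raw call numbers by normalized key.
# Each key is computed once, sorted on with cmp, then thrown away
# (the usual Schwartzian transform).
sub sort_call_numbers {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ normalizeLC($_), $_ ] } @_;
}

# Made-up sample data, taken from the example inputs above.
my @shelf = ('EA10.5', 'E184.A2 G78 1970', 'EA 10',
             'E 184 .A1 G78', 'E184.A2 G78 1967');
print "$_\n" for sort_call_numbers(@shelf);
```

The E entries come out ahead of the EA entries because the padding space after a one-letter class sorts before any letter, and the two 1967/1970 editions fall in year order because the trailing "everything else" part is compared last.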

3 Responses to “Normalizing LoC Call Numbers for sorting”

  1. Emily Lynema says:

    The first alphabetical characters can be 3 letters, not just 2. For example, KJA147 .M685 2007 (see http://www2.lib.ncsu.edu/catalog/record/NCSU2041714).

  2. Bill says:

    Emily — fixed. Thanks.

    David — I need normalized call numbers to check for inclusion in High Level Browse categories, which have call numbers as start- and end-points. Virtual browsing is another possible application, though — I’ll have to stick normalized call numbers into our VUfind installation.