Updated: I missed a ‘?’ in the original code that pushed a single cutter into the second-cutter position. Fixed below.

Crap. Update 2: Initial letters can be three characters long. Regexp and output changed.

LoC Call numbers tend to be a mess, and I’ve been working this morning trying to normalize them for easy string comparison.

The perl function below takes a call number (with some level of sloppiness) and returns a string suitable for comparisons with other strings returned by the function. It outputs stuff like this:


E                          E 0000.0000  0000  0000
E 184 .A1 G78              E 0184.0000A 1000G 7800
E184.A2 G78 1967           E 0184.0000A 2000G 7800 1967
E184.A2 G78 1970           E 0184.0000A 2000G 7800 1970
EA                         EA0000.0000  0000  0000
EA 10                      EA0010.0000  0000  0000
EA 10 1970                 EA0010.0000  0000  0000 1970
EA10 B7                    EA0010.0000B 7000  0000
EA 10.B7.G8                EA0010.0000B 7000G 8000
EA10.5                     EA0010.5000  0000  0000

The code, in perl, follows:

sub normalizeLC {
  my $lc = uc(shift);
  $lc =~ /^
          \s*
          ([A-Z]{1,3})  # alpha
          \s*
          (         # optional numbers
            \d+
            (?: \s*\.\s*\d+)?  # ...with optional decimal point
          )?
          \s*
          (?:               # optional cutter
            \.? \s*
            ([A-Z]+)      # cutter letter
            \s*
            (\d+)?        # cutter numbers
          )?
          (?:               # optional cutter
            \.? \s*
            ([A-Z]+)      # cutter letter
            \s*
            (\d+)?        # cutter numbers
          )?
          \s*
          (.*?)            # everthing else
          \s*$
        /x;
  my ($alpha, $num, $c1alpha, $c1num, $c2alpha, $c2num, $extra) = ($1, $2, $3, $4, $5, $6, $7);
  $c1num .= 0 x (4 - length($c1num)); # Pad out to four decimal places
  $c2num .= 0 x (4 - length($c2num)); # ditto
  $extra = ' ' . $extra if ($extra);
  return sprintf("%-3s%09.4f%-2s%4s%-2s%4s%s", $alpha, $num, $c1alpha, $c1num, $c2alpha, $c2num, $extra);
}