Archives: February 2011

Corrected Code4Lib slides are up

February 15, 2011 at 5:49 pm | Category: Uncategorized

…at the same URL.

I was, to put it mildly, incredibly excited about code4lib this year because, for once, I thought I had something to say. And I did have something to say. And I said it. But it was wrong.

I presented a bunch of statistics drawn from nearly a year of Mirlyn logs. The most outlandish of my assertions, and the one that eventually turned out to be the most incorrect, was that some 45% of all our user sessions consist of only one action: a search.

Unfortunately, I’d missed a whole swath of things I should have excluded. I’d remembered robots and stuff coming in from our link resolver and so on. I hadn’t counted on having to fight my own stupidity.

In short: catalog.hathitrust.org and mirlyn.lib.umich.edu share a common code base, as well as a Solr backend. I was correctly excluding all the HathiTrust stuff from my stats, with one exception: simple searches were slipping through. What I ended up with was a whole lotta sessions with nothing in them but that search. Luckily, I noticed waaaay too many people coming in via the HathiTrust site (which I know doesn’t have a link to Mirlyn) and did more digging.
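For anyone who wants to see how a gap like that inflates the one-search number, here is a minimal sketch. Everything in it is made up for illustration: the `(session_id, host, action)` row layout, the action names, and both filter functions are assumptions, not the real Mirlyn log format or the actual analysis code.

```python
from collections import defaultdict

# Hypothetical log rows: (session_id, host, action). The real log format
# is not described in this post; these fields are assumptions.
LOG = [
    ("s1", "mirlyn.lib.umich.edu", "search"),
    ("s2", "mirlyn.lib.umich.edu", "search"),
    ("s2", "mirlyn.lib.umich.edu", "record_view"),
    ("s3", "catalog.hathitrust.org", "search"),
    ("s3", "catalog.hathitrust.org", "record_view"),
    ("s4", "mirlyn.lib.umich.edu", "search"),
    ("s4", "mirlyn.lib.umich.edu", "add_facet"),
]


def single_search_share(rows, keep):
    """Fraction of sessions whose only kept action is a single search."""
    sessions = defaultdict(list)
    for session_id, host, action in rows:
        if keep(host, action):
            sessions[session_id].append(action)
    if not sessions:
        return 0.0
    single = sum(1 for actions in sessions.values() if actions == ["search"])
    return single / len(sessions)


def leaky_filter(host, action):
    # Drops HathiTrust rows except searches (an illustrative version of the gap).
    return host == "mirlyn.lib.umich.edu" or action == "search"


def strict_filter(host, action):
    # Keeps only rows that came in through the Mirlyn host.
    return host == "mirlyn.lib.umich.edu"


leaky = single_search_share(LOG, leaky_filter)
strict = single_search_share(LOG, strict_filter)
print(f"leaky filter:  {leaky:.0%} one-search sessions")
print(f"strict filter: {strict:.0%} one-search sessions")
```

With the leaky filter, session `s3` survives as a lone search because its other actions were thrown away, so it gets counted as a one-action session. The strict filter drops `s3` entirely, and the share goes down.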

The slides have been updated with correct numbers. Luckily, even though the adjustment was pretty extreme, I don’t think many of my conclusions are invalidated, especially given corroborating evidence from an extensive survey conducted by our usability team (PDF). They conclude, among other things, that known-item searching is prevalent and relevancy ranking is important across task boundaries.

The basic stats from the PowerPoint, for those who don’t want to read all my notes:

  • 17% of all sessions have one action: a search
  • In only 28% of all sessions does the user see the Record View
  • 75% of all logged actions that target an individual record (see the full record view, look at extended holdings, etc.) happen with a record in the top 6 search results
  • 7% of sessions involve a user adding a facet
  • 2% of sessions involve a user exporting records

2 Responses to “Corrected Code4Lib slides are up”

  1. Actually what this is is a really good lesson in the difficulty of getting valid results from usage statistics. “valid” meaning “actually answers the question you thought you were asking” in general.

    Doesn’t mean we shouldn’t look at usage statistics of course. But it takes care and time to get good numbers — and then more care and time to make sure the numbers actually allow you to draw the conclusions you want to draw.

    Not saying you didn’t do that here, but when my staff often asks me “Can’t we just get numbers to answer this question,” I will use this as an example of how you can often end up with numbers that don’t mean what you think they mean, and it takes non-trivial staff time to get and analyze the numbers to try to answer the questions you want to ask — so let’s be clear and intentional with the questions we’re asking, instead of just asking for ‘all the numbers’.


DANGER! I was trying to re-verify my numbers and found a glaring and hugely important mistake. I’ll make a new post with the details, but basically I was counting about 180k sessions (out of only 735k) that I should have been ignoring. Please ignore my basic stats until further notice. See the new numbers and corrected slides for more accurate data.

===============

I did a little Lightning Talk at Code4Lib 2011 and cleaned up (and heavily annotated) my slides for anyone interested in them.

The focus was on some basic stats about usage of our OPAC, Mirlyn, in calendar 2010.

I’ll be doing some posts and/or more rigorous writing on this stuff soon, but wanted to get these up in a timely fashion.

3 Responses to “[RETRACTED] Code4Lib 2011 Lightning Talk Slides”

  1. [...] This post was mentioned on Twitter by Dan Chudnov and Jennyann, Bill Dueber. Bill Dueber said: Slides from my lightning talk about OPAC stats are up. http://bit.ly/dMvgPb #c4l11 [...]

  2. Aaron Tay says:

    Really enjoyed your slides. Love the long presentation notes. most slides I can’t tell what the speaker is trying to say but for yours no such problems. Just added Google analytics to the catalogue so this is very timely, you have far better statistics but still good figures to compare.

    Just finished reading a year’s worth of your blog posts. Really really nice blog, I can’t fully follow 100% of the details (I’m technically a reference librarian), but really enjoy most of the entries and the statistics of the catalogue you report.

  3. I personally (speaking only for myself) think this would make a good short Code4Lib Journal article, and encourage you to submit one.
