Corrected Code4Lib slides are up
February 15, 2011 at 5:49 pm | Category: Uncategorized
…at the same URL.
I was, to put it mildly, incredibly excited about code4lib this year because, for once, I thought I had something to say. And I did have something to say. And I said it. But it was wrong.
I presented a bunch of statistics drawn from nearly a year of Mirlyn logs. The most outlandish of my assertions, and the one that eventually turned out to be the most incorrect, was that some 45% of all our user sessions consist of only one action: a search.
Unfortunately, I’d missed a whole swath of things I should have excluded. I’d remembered to exclude robots, traffic coming in from our link resolver, and so on. I hadn’t counted on having to fight my own stupidity.
In short: catalog.hathitrust.org and mirlyn.lib.umich.edu share a common code base, as well as a Solr backend. I was correctly excluding all the HathiTrust traffic from my stats except for simple searches, which slipped through. What I ended up with was a whole lotta sessions with nothing in them but that search. Luckily, I noticed waaaay too many people coming in via the HathiTrust site (which I know doesn’t link to Mirlyn) and did more digging.
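The analysis code isn't part of this post, so what follows is only a minimal Python sketch of the pitfall, under loudly hypothetical assumptions: each log record is a `(session_id, host, action)` tuple, and the action names and sample data are made up for illustration. Filtering individual actions by host while letting searches through leaves HathiTrust sessions that look like one-search sessions; dropping every session that touched the HathiTrust host avoids that.

```python
from collections import defaultdict

# Hypothetical log record: (session_id, host, action). Field layout and action
# names are illustrative assumptions, not the real Mirlyn log format.
LogRecord = tuple[str, str, str]

HATHI_HOST = "catalog.hathitrust.org"  # host named in the post


def group_sessions(records: list[LogRecord]) -> dict[str, list[tuple[str, str]]]:
    """Collect (host, action) pairs per session, preserving log order."""
    sessions: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for session_id, host, action in records:
        sessions[session_id].append((host, action))
    return dict(sessions)


def filter_actions_buggy(records: list[LogRecord]) -> list[LogRecord]:
    """The mistake: drop HathiTrust actions one by one, but let searches through."""
    return [r for r in records if r[1] != HATHI_HOST or r[2] == "search"]


def filter_sessions(sessions: dict[str, list[tuple[str, str]]], excluded_hosts: set[str]):
    """Drop a whole session if any of its actions came through an excluded host."""
    return {
        sid: actions
        for sid, actions in sessions.items()
        if not any(host in excluded_hosts for host, _ in actions)
    }


def single_search_share(sessions: dict[str, list[tuple[str, str]]]) -> float:
    """Fraction of sessions whose only action is a single search."""
    if not sessions:
        return 0.0
    single = sum(1 for actions in sessions.values() if [a for _, a in actions] == ["search"])
    return single / len(sessions)


# Made-up sample data: two Mirlyn sessions and two HathiTrust sessions.
records: list[LogRecord] = [
    ("s1", "mirlyn.lib.umich.edu", "search"),
    ("s1", "mirlyn.lib.umich.edu", "record_view"),
    ("s2", "mirlyn.lib.umich.edu", "search"),
    ("s3", "catalog.hathitrust.org", "search"),
    ("s3", "catalog.hathitrust.org", "record_view"),
    ("s4", "catalog.hathitrust.org", "search"),
]

buggy = single_search_share(group_sessions(filter_actions_buggy(records)))
fixed = single_search_share(filter_sessions(group_sessions(records), {HATHI_HOST}))
print(f"buggy: {buggy:.2f}, fixed: {fixed:.2f}")  # buggy: 0.75, fixed: 0.50
```

On this toy data the per-action filter reports three of four sessions as search-only, while the session-level filter reports one of two. When one code base serves more than one front end, filtering whole sessions by origin seems the safer choice.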
The slides have been updated with correct numbers. Fortunately, even though the adjustment was pretty extreme, I don’t think many of my conclusions are invalidated, especially given corroborating evidence from an extensive survey conducted by our usability team (PDF). They conclude, among other things, that known-item searching is prevalent and that relevancy ranking is important across task boundaries.
The basic stats from the PowerPoint, for those who don’t want to read all my notes:
- 17% of all sessions have one action: a search
- In only 28% of all sessions does the user see the Record View
- 75% of all logged actions that target an individual record (see the full record view, look at extended holdings, etc.) happen with a record in the top 6 search results
- 7% of sessions involve a user adding a facet
- 2% of sessions involve a user exporting records
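For anyone who wants to compute numbers like these from their own logs, here is a rough Python sketch of how the five statistics could be derived once the logs have been split into sessions and filtered. The `Action` class, the action labels, and the `rank` field (the record's 1-based position in the result list) are all hypothetical; the real Mirlyn logs may record this quite differently, and the sample sessions are invented.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    kind: str                   # hypothetical labels: "search", "record_view", "holdings", "add_facet", "export"
    rank: Optional[int] = None  # 1-based result position of the record acted on, if known


Session = list[Action]

# Hypothetical set of actions that target an individual record (full record view, extended holdings).
RECORD_ACTIONS = {"record_view", "holdings"}


def session_share(sessions: list[Session], predicate: Callable[[Session], bool]) -> float:
    """Fraction of sessions for which predicate holds."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if predicate(s)) / len(sessions)


def has(kind: str) -> Callable[[Session], bool]:
    """Predicate: the session contains at least one action of this kind."""
    return lambda s: any(a.kind == kind for a in s)


def summarize(sessions: list[Session]) -> dict[str, float]:
    record_actions = [a for s in sessions for a in s if a.kind in RECORD_ACTIONS]
    # Record actions with no known rank count in the denominator but never as "top 6".
    top6 = sum(1 for a in record_actions if a.rank is not None and a.rank <= 6)
    return {
        "single_search": session_share(sessions, lambda s: len(s) == 1 and s[0].kind == "search"),
        "saw_record_view": session_share(sessions, has("record_view")),
        "record_actions_in_top6": top6 / len(record_actions) if record_actions else 0.0,
        "added_facet": session_share(sessions, has("add_facet")),
        "exported": session_share(sessions, has("export")),
    }


# Invented sample sessions for illustration only.
sessions: list[Session] = [
    [Action("search")],
    [Action("search"), Action("record_view", rank=2), Action("holdings", rank=2)],
    [Action("search"), Action("add_facet"), Action("record_view", rank=9)],
    [Action("search"), Action("export")],
]

for name, value in summarize(sessions).items():
    print(f"{name}: {value:.0%}")
```

One choice worth calling out: record-targeted actions with no known result rank stay in the denominator of the top-6 figure, so that share isn't inflated by quietly dropping them.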
Actually, this is a really good lesson in the difficulty of getting valid results from usage statistics, where “valid” means, in general, “actually answers the question you thought you were asking.”
Doesn’t mean we shouldn’t look at usage statistics of course. But it takes care and time to get good numbers — and then more care and time to make sure the numbers actually allow you to draw the conclusions you want to draw.
Not saying you didn’t do that here, but when my staff asks me “Can’t we just get numbers to answer this question?”, I will use this as an example of how you can easily end up with numbers that don’t mean what you think they mean. It takes non-trivial staff time to get and analyze the numbers needed to answer the questions you want to ask, so let’s be clear and intentional about the questions we’re asking, instead of just asking for ‘all the numbers’.