[Holy Kamoly, it’s been a long time since I blogged!] Recent versions of solr have the option to run in what they call "schemaless mode", wherein fields that aren’t recognized are actually added, automatically, to the schema as real named fields. I find this intruguing, but it’s not what I’m after right now. The problem I’m in the first stages of addressing is that my schema.xml is huge mess — very little consistency, no naming conventions dictating what’s stored/indexed, etc. It grew "ogranically" (which is what I say when I mean I’ve been lazy and sloppy) and needs a full-on…
Comments closedTag: Solr
Boosting on Exactish (anchored) phrase matching in Solr: (SST #4)
Check out introduction to the Stupid Solr Tricks series if you’re just joining us.] Exact matching in Solr is easy. Use the default string type: all it does is, essentially, exact phrase matching. string is a great type for faceted values, where the only way we expect to search the index is via text pulled from the index itself. Query the index to get a value: use that value to re-query the index. Simple and self-contained. But much of the time, we don’t want exact matching. We want exactish matching. You know, where things are exactly the same except. Except…
Comments closedRequiring/Preferring searches that don’t span multiple values (SST #3)
Check out introduction to the Stupid Solr Tricks series if you’re just joining us.] Solr and multiValued fields Here’s another thing you need to understand about Solr: it doesn’t really have fields that can take multiple values. But Bill, you’re saying, sure it does. I mean, hell, it even has a ‘multiValued’ parameter. First off: watch your language. Second off: are you sure? Let’s do a quick test. Look at the following documents // exampledocs/names.json [ { "id":1, "title":"The Monkees", "name_text":[ "Peter Tork", "Mike Nesmith", "Micky Dolenz", "Davy Thomas Jones" ] }, { "id":2, "title":"Heros of the Wild West", "name_text":[…
Comments closedUsing localparams in Solr (or, how to boost records that contain all terms) (SST #2)
[Note: this isn’t so much a Stupid Solr Trick as a Thing You Should Probably Know; consider it required reading for the next SST. If you’re just joining us, check out the introduction to the Stupid Solr Tricks series] What the heck is a localparams query? A garden-variety Solr query URL looks something like this: http://localhost:8983/solr/select? defType=dismax &qf=name^2 place^1 &q=Dueber Which is fine, as far as it goes. But it’s easy to run into the limits of the standard query plugins (e.g., Dismax). Say, for example, you want something like this: title:Constructivism AND author:Dueber And furthermore, you have multiple underlying…
Solr Field Type for numeric(ish) IDs (SST #1)
[For the introduction to this series, take a quick gander at the introduction] Like everyone else in the library world, I’ve got a bunch of well-defined, well-controlled standard identifiers I need to keep track of and allow searching on. You know, well-vetted stuff like this: 1234-5678 123-4567-890 12-34-567-X 0012-0045 ISBN13: 1234567890123 ISSN: 1234567X (1998-99) ISSN (1998-99): 1234567X 1234567890 (hdk. 22 pgs) 9 Behind the 3rd floor desk Henry VIII [Note: some of these may be a titch exaggerated] How does your system deal with these on index? How about on query? Here’s an idea of how to use a custom…
Comments closedStupid Solr tricks: Introduction (SST #0)
Completed parts of the series: A Solr Field Type for numeric(ish) IDs Using localparams in Solr (or, how to boost records that contain all terms) Requiring/Preferring searches that don’t span multiple values Boosting on Exactish (anchored) phrase matching Those of you who read this blog regularly (Hi Mom!) know that while we do a lot of stuff at the University of Michigan Library, our bread-and-butter these days are projects that center around Solr. Right now, my production Solr is running an ancient nightly of version 1.4 (i.e., before 1.4 was even officially released), and reflects how much I didn’t know…
Comments closedSolr and boolean operators
[Summary: ALWAYS ALWAYS ALWAYS USE PARENTHESES TO GROUP BOOLEANS IN SOLR!!!] What does Solr do, given the following query? a OR b AND c I’ll give you three guesses, but you’ll get the first two wrong and won’t have any idea how to generate a third, so don’t spend too much time on it. Boolean algebra and operator precedence Anyone who’s had even a passing introduction to boolean alegebra knows that it specifies a strict order to how the operators are bound: NOT before AND before OR. So, one might expect the following grouping: a OR (b AND c) That’s…
Comments closedHow good is our relevancy ranking?
For those of us that spend our days trying to tweak Mirlyn to make it better, one of the most important — and, in many ways, most opaque — questions is, “How good is our relevancy ranking?” Research from the UMich Library’s Usability Group (pdf; 600k) points to the importance of relevancy ranking  for both known-item searches and discovery, but mapping search terms to the “best” results involves crawling deep inside the searcher’s head to know what she’s looking for. So, what can we do? Record interaction as a way of showing interest One possibility is to look at those…
Comments closedSolr: Forcing items with all query terms to the top of a Solr search
[Note: I’ve since made a better explanation of, and solution for, this problem.] Here at UMich, we’re apparently in the minority in that we have Mirlyn, our catalog discovery interface (a very hacked version of VuFind), set up to find records that match only a subset of the query terms. Put more succinctly: everyone else seem to join all terms with ‘AND’, whereas we do a DisMax variant on ‘OR’. Now, I’m actually quite proud of how our searching behaves. Reference desk anecdotes and our statistics all point to the idea that people tend to find what they’re looking for.…
Comments closedPushing MARC to Solr; processing times and threading and such
[This is in response to a thread on the blacklight mailing list about getting MARC data into Solr.] What’s the question? The question came up, “How much time do we spend processing the MARC vs trying to push it into Solr?”. Bob Haschart found that even with a pretty damn complicated processing stage, pushing the data to solr was still, at best, taking at least as long as the processing stage. I’m interested because I’ve been struggling to write a solrmarc-like system that runs under JRuby. Architecturally, the big difference between my stuff and solrmac is that I use the…
Comments closed