Archives: May 2011

Using SQLite3 from JRuby without ActiveRecord

May 26, 2011 at 2:20 pm | Category: Uncategorized

I spent way too long asking my friend, The Internet, how to get a normal DBI connection to SQLite3 using JRuby. Apparently, everyone except me is using ActiveRecord and/or Rails and doesn’t want to just connect to the database.

But I do. Here’s how.

First, get the gems:

  gem install dbi
  gem install dbd-jdbc
  gem install jdbc-sqlite3

Then you’re ready to load it up into DBI.

  require 'rubygems' # if you're using 1.8 still
  require 'java'
  require 'dbi'
  require 'dbd/jdbc' # a commenter below reports that dbd-jdbc 0.1.4 needs 'dbd/Jdbc'
  require 'jdbc/sqlite3'

  databasefile = 'test.db'
  dbh = DBI.connect(
    "DBI:jdbc:sqlite:#{databasefile}",  # connection string
    '',                                 # no username for sqlite3
    '',                                 # no password for sqlite3
    'driver' => 'org.sqlite.JDBC')      # need to set the driver

  # That's it. Everything below here is stock DBI

  dbh.do "create table squares (i integer, isquared integer)"

  ins = dbh.prepare("insert into squares values (?, ?)")
  (1..20).each do |i|
    ins.execute(i, i*i)
  end

3 Responses to “Using SQLite3 from JRuby without ActiveRecord”

  1. Matteo says:

    Thanks! Only a question : i’m unable to retrieve values stored in the ‘squares’ table..

      @dbh.execute("select * from squared";) do |row|
        puts row  # or .. puts row['squares']
      end

    and even with the prepare / execute statement but it always returns ‘nil’ or #

    What i’m missing?

    p.s. : I checked test.db with an external application and squares is correctly filled.

    jruby and jdbc are up to date.

  2. Matteo says:

    Sorry for bad formatting..and SQL string is : “select * from squares;”

  3. patches says:

    At least with dbd-jdbc 0.1.4 it looks like the line:

    require ‘dbd/jdbc’

    should have the ‘J’ capitalized:

    require ‘dbd/Jdbc’

How good is our relevancy ranking?

May 25, 2011 at 2:53 pm | Category: Uncategorized

For those of us that spend our days trying to tweak Mirlyn to make it better, one of the most important — and, in many ways, most opaque — questions is, “How good is our relevancy ranking?”

Research from the UMich Library’s Usability Group (pdf; 600k) points to the importance of relevancy ranking for both known-item searches and discovery, but mapping search terms to the “best” results involves crawling deep inside the searcher’s head to know what she’s looking for.

So, what can we do?

Record interaction as a way of showing interest

One possibility is to look at those records that are somehow “touched” by a user in such a way that we can log it. If a user bothers to interact with an individual record, we’ll assume the record is interesting to her in the context of the current search.

There are three links associated with an individual record that a user can click on from the search results:

  • (62% of all record interactions) The title
  • (28%) An external link (HathiTrust, Google Books, or one of our vendors)
  • (10%) The “see holdings” link for those items that have multiple holdings

Our first issue arises quickly: only about a quarter of Mirlyn sessions contain any of these actions. For a full 75% of sessions, we have no data about which records users are paying attention to. They get a call number — or determine they have a failed search — and move on.

Where on the page do users interact with items?

We don’t know how users that interact with items differ from those that don’t. But for those that do, more than half of all record interactions are with the first record.

Here are the numbers for the first five records:

  • First record: 54%
  • Second record: 12%
  • Third record: 6%
  • Fourth record: 3.7%
  • Fifth record: 2.5%

More than 75% of all record interactions are with the first four items on the first page of results.
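
The "more than 75%" figure is just the running total of the per-position shares listed above. A minimal sketch of that arithmetic, assuming the rounded percentages as published (the variable names are mine, and the underlying log counts are not shown here):

```ruby
# Per-position share of record interactions, as listed in the post (percent).
shares = {
  'first'  => 54.0,
  'second' => 12.0,
  'third'  => 6.0,
  'fourth' => 3.7,
  'fifth'  => 2.5
}

# Running total: what share of interactions falls at or above each position.
running = 0.0
cumulative = shares.map do |position, share|
  running += share
  [position, running.round(1)]
end

cumulative.each do |position, total|
  puts format('through the %-6s record: %5.1f%%', position, total)
end
```

Run as written, the total through the fourth record comes out at 75.7%, and through the fifth at 78.2%. Because the inputs are already rounded, the last digit should not be taken too seriously.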

What does it all mean?

Frustratingly, we don’t know. Several possibilities are obvious:

  • we’re doing a good job with relevancy ranking
  • people do mostly known-item searches
  • people don’t bother looking past the first few results
  • excellent general search engines (e.g., Google) have trained people to believe that the first result is always worth a closer look.

The interactions between these (and unknown other) factors are likely complex.

In the meantime, though, to the extent these data can be extended to the general case (which is not at all obvious), we’re not doing too bad a job.

I just released another (this time pretty good) version of my gem for normalizing/validating library standard numbers, library_stdnums (github source / docs).

The short version of the functions available:

  • ISBN: get checkdigit, validate, convert isbn10 to/from isbn13, normalize (to 13-digit)
  • ISSN: get checkdigit, validate, normalize
  • LCCN: validate, normalize
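
For readers who haven't met ISBN check digits before, here is a rough sketch of the arithmetic behind the ISBN operations in the list. This is plain Ruby written for illustration, not the library_stdnums API; every method name below is hypothetical.

```ruby
# Sketch only: the standard ISBN check-digit arithmetic, NOT the gem's API.

# Drop hyphens and whitespace; upcase so a trailing 'x' becomes 'X'.
def strip_isbn(raw)
  raw.gsub(/[\s-]/, '').upcase
end

# ISBN-13: weights alternate 1, 3 across the first 12 digits.
def isbn13_checkdigit(first12)
  sum = first12.chars.each_with_index.map do |char, index|
    char.to_i * (index.even? ? 1 : 3)
  end.sum
  ((10 - sum % 10) % 10).to_s
end

# ISBN-10: weights run 10 down to 2 across the first 9 digits; 10 is written 'X'.
def isbn10_checkdigit(first9)
  sum = first9.chars.each_with_index.map do |char, index|
    char.to_i * (10 - index)
  end.sum
  value = (11 - sum % 11) % 11
  value == 10 ? 'X' : value.to_s
end

# Convert by prefixing 978 to the first nine digits and recomputing the check digit.
def isbn10_to_isbn13(raw)
  core = '978' + strip_isbn(raw)[0, 9]
  core + isbn13_checkdigit(core)
end

# Valid means: right shape, and the last character matches the computed check digit.
def valid_isbn?(raw)
  isbn = strip_isbn(raw)
  case isbn.length
  when 10
    isbn.match?(/\A\d{9}[\dX]\z/) && isbn10_checkdigit(isbn[0, 9]) == isbn[9]
  when 13
    isbn.match?(/\A\d{13}\z/) && isbn13_checkdigit(isbn[0, 12]) == isbn[12]
  else
    false
  end
end

puts isbn10_to_isbn13('0-306-40615-2')
puts valid_isbn?('080442957X')
```

Normalizing to the 13-digit form, as the list describes, then amounts to stripping punctuation and converting any 10-digit input.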

Validation of LCCNs doesn’t involve a checkdigit; I basically just normalize whatever is sent in and then see if the result is syntactically valid.
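
To give a feel for what "normalize" means for an LCCN, here is a minimal sketch that follows the Library of Congress normalization steps as I understand them: remove blanks, drop anything from the first slash onward, then remove the hyphen and left-pad the part after it to six digits. The method name is hypothetical, this is not the gem's code, and the syntactic validity check is left out.

```ruby
# Sketch only: LCCN normalization steps, NOT the library_stdnums implementation.
def normalize_lccn(raw)
  lccn = raw.gsub(/\s+/, '')       # 1. remove all blanks
  lccn = lccn.sub(%r{/.*\z}, '')   # 2. drop the first slash and everything after it
  return lccn unless lccn.include?('-')

  # 3. remove the hyphen and zero-pad what followed it to six characters
  prefix, serial = lccn.split('-', 2)
  prefix + serial.rjust(6, '0')
end

['n78-890351', '85-2 ', '75-425165//r75', ' 79139101 /AC/r932'].each do |raw|
  puts "#{raw.inspect} -> #{normalize_lccn(raw)}"
end
```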

My plan in my Copious Free Time is to do a Java version of these as well and then stick them into a new-style Solr v.3 filter so I (and, by extension, you, if you’re interested) can have Solr do normalization during both index and search time.

A couple days ago I decided to finally get back to working on threach to try to deal with problems it had — essentially, it didn’t deal well with non-local exits due to calls to break or even something simple like a NoMethodError.

[BTW, I think I managed it. As near as I can tell, threach version 0.4 won't deadlock anymore]

Along the way, while trying to figure out how threads affect the behavior of different non-local exits, I noticed that in some cases there was still work being done by one or more threads long after there was an exception raised.

I re-discovered something that a lot of people already know: raise/rescue under MRI is slow, and under JRuby can be unbearably slow. How slow?

Let’s look at four simple blocks that exercise four different block exit strategies: break, catch and throw, raise with the normal single (or zero) arguments, as well as the three-argument version of raise.

Simple break:

  range.each do |i|
    break
  end

Catch/Throw:

  catch(:benchmarking) do
    range.each do |i|
      throw(:benchmarking)
    end
  end

Raise (1 arg):

  begin
    range.each do |i|
      raise StandardError
    end
  rescue
    # do nothing
  end

Raise (3 args):

  begin
    range.each do |i|
      raise StandardError, :hi, nil
    end
  rescue
    # do nothing
  end

In each case, we immediately exit the block without doing any work; the idea is to measure how long it takes to break out for each case.

So... let's run each of them 100K times and see what happens, shall we? Times are in seconds, averaged over two runs.

                 Ruby 1.8   Ruby 1.9   JRuby   JRuby --1.9
break                0.12       0.07    0.29          0.21
catch/throw          0.35       0.28    0.64          0.48
raise (1 arg)        1.78       2.10   26.60         22.06
raise (3 args)       1.85       2.13    0.45          0.45
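
The harness that produced these numbers isn't shown, so here is a minimal sketch of how timings like these could be gathered with the standard Benchmark module. The contents of range, the hash of lambdas, and the single (rather than averaged) run are my assumptions; expect different absolute numbers on a current Ruby or JRuby.

```ruby
require 'benchmark'

ITERATIONS = 100_000
range = (1..10) # assumption: any non-empty range works, since we exit on the first element

# One lambda per exit strategy; each leaves the block immediately.
strategies = {
  'break' => lambda {
    range.each { |_i| break }
  },
  'catch/throw' => lambda {
    catch(:benchmarking) { range.each { |_i| throw(:benchmarking) } }
  },
  'raise (1 arg)' => lambda {
    begin
      range.each { |_i| raise StandardError }
    rescue
      # do nothing
    end
  },
  'raise (3 args)' => lambda {
    begin
      range.each { |_i| raise StandardError, 'hi', nil }
    rescue
      # do nothing
    end
  }
}

# Time ITERATIONS calls of each strategy and collect [label, seconds] pairs.
timings = strategies.map do |label, strategy|
  seconds = Benchmark.realtime { ITERATIONS.times { strategy.call } }
  [label, seconds]
end

timings.each { |label, seconds| puts format('%-15s %6.2f', label, seconds) }
```

Wrapping each strategy in a lambda adds a small constant call overhead to every row, which is fine for comparing strategies against each other but means the figures are not directly comparable to the table.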

The first thing to note is that this is 100K iterations. Three of the strategies are fast enough that you'd have to work really, really hard to notice them. In terms of speed, raise (3 args), catch/throw, and break are fast enough that you shouldn't bother worrying about them (although you should choose the method that makes your code easy to understand).

The second thing to note is: Holy Camoli! JRuby is slow there!

This Jira ticket tells the tale: The creation of the backtrace is very, very expensive for JRuby. That nil at the end of the raise (3 args) call suppresses the creation of that backtrace, so the speed is fine.

Three things worth saying here:

  • If you're using raise/rescue for flow control, you're already doing it wrong. Reserve exceptions for, well, exceptional conditions that are only going to be raised once or twice, not all the time.
  • If you're writing code that, for some ungodly reason, is planning on raising a crapload of exceptions, use the three-arg version. I'm looking at you, gem authors.
  • If you're writing your code without worrying about how it will work under multiple threads, well, please don't do that. Everyone has multi-core systems these days, and it's silly to not be able to use them. Plus, counting on Matz to never move to a VM with real threads is a big gamble.
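
If you do need an early exit as part of normal flow, catch/throw is the construct built for it. A small sketch, with a made-up method and label, showing that the throw does not have to sit lexically inside the catch block; it only has to happen while that block is running:

```ruby
# Hypothetical helper: scans nested rows and bails out at the first negative value.
def scan_for_negative(rows)
  rows.each do |row|
    row.each do |value|
      # Unwinds both loops and this method, back to the matching catch.
      throw(:found, value) if value < 0
    end
  end
  nil # reached only when nothing was thrown
end

# catch returns the value handed to throw, or the block's own value otherwise.
first_negative = catch(:found) do
  scan_for_negative([[1, 2], [3, -4], [-5, 6]])
end

none_found = catch(:found) do
  scan_for_negative([[1, 2], [3, 4]])
end

puts first_negative.inspect
puts none_found.inspect
```

No exception object is created along the way, which is consistent with catch/throw sitting near break in the timings above.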

4 Responses to “A short ruby diversion: cost of flow control under Ruby”

  1. For flow control in ruby, there’s actually a throw/catch architecture, which is an entirely different beast from raise/rescue. Nobody hardly ever uses them, throw/catch, I never see em, never used em myself either.

    Note: raise/rescue DO correspond to JAVA’s throw/catch. ruby’s throw/catch is something different: It can only be used in ‘static scoped’ situations, basically where the catch is in a static code block that’s a parent of the throw. But if people are using raise/rescue for ‘flow control’ scenarios in places where throw/catch would work…. would be interesting to benchmark the performance of throw/catch. throw/catch at least is indeed actually intended for flow control.

    Maybe nobody uses em cause they smell suspiciously like the dreaded ‘goto’, but that’s essentially what you’re doing with raise/rescue if you’re using em for flow control too, and apparently that doesn’t stop some people? Very curious what code you saw that was using raise/rescue like this, it’s certainly not a recommended thing to do by anyone (I don’t think?).

  2. PS: Am I the only one that never uses those raise syntactic sugar shortcuts? I always actually create the Exception object myself:

    raise StandardError.new

    “raise StandardError” does the same thing, it’s just a shortcut. And:

    raise StandardError, “message” ==== raise StandardError.new(“message”)

    I don’t know the way to avoid backtrace generation when throwing an actually explicitly created Exception object, but there probably is one.

  3. And briefly looking up the documentation on throw/catch, I’m wrong about the catch having to be statically scoped in a block above the ‘throw’ (the page I found in the online old ruby book actually specifically tells you this isn’t the case even though you might think it is, heh). But I’m still confused about where throw/catch can actually be used. It’s like the least used ruby language feature ever. But if lots of people are using raise/rescue for flow control, maybe throw/catch ought to be marketed better.

  4. Another blog figures out the same thing, posted on reddit. You beat them to it! http://www.coffeepowered.net/2011/06/17/jruby-performance-exceptions-are-not-flow-control/
