<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Still another look at MARC parsing in ruby and jruby</title>
	<atom:link href="http://robotlibrarian.billdueber.com/still-another-look-at-marc-parsing-in-ruby-and-jruby/feed/" rel="self" type="application/rss+xml" />
	<link>http://robotlibrarian.billdueber.com/still-another-look-at-marc-parsing-in-ruby-and-jruby/</link>
	<description>Disclaimer: I'm not actually a robot.</description>
	<lastBuildDate>Thu, 08 Jul 2010 20:37:05 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Bill</title>
		<link>http://robotlibrarian.billdueber.com/still-another-look-at-marc-parsing-in-ruby-and-jruby/comment-page-1/#comment-292</link>
		<dc:creator>Bill</dc:creator>
		<pubDate>Mon, 01 Feb 2010 15:08:59 +0000</pubDate>
		<guid isPermaLink="false">http://robotlibrarian.billdueber.com/still-another-look-at-marc-parsing-in-ruby-and-jruby/#comment-292</guid>
		<description>&lt;p&gt;The XML parsing code for ruby-marc in jruby is a stax-based implementation I wrote from a position of intense ignorance. Nabbing the xml-parser out of the marc4j code is probably a next step when I get time. For the moment I&#039;ve just re-opened the various marc4j classes to add syntactic sugar where necessary.&lt;/p&gt;

&lt;p&gt;BTW, can I tell you how much jruby rocks????&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>The <acronym title="Extensible Markup Language">XML</acronym> parsing code for ruby-marc in jruby is a stax-based implementation I wrote from a position of intense ignorance. Nabbing the xml-parser out of the marc4j code is probably a next step when I get time. For the moment I&#8217;ve just re-opened the various marc4j classes to add syntactic sugar where necessary.</p>

<p><acronym title="By The Way">BTW</acronym>, can I tell you how much jruby rocks????</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Charles Oliver Nutte</title>
		<link>http://robotlibrarian.billdueber.com/still-another-look-at-marc-parsing-in-ruby-and-jruby/comment-page-1/#comment-291</link>
		<dc:creator>Charles Oliver Nutte</dc:creator>
		<pubDate>Mon, 01 Feb 2010 14:35:30 +0000</pubDate>
		<guid isPermaLink="false">http://robotlibrarian.billdueber.com/still-another-look-at-marc-parsing-in-ruby-and-jruby/#comment-291</guid>
		<description>&lt;p&gt;The XML performance in ruby-marc is almost certainly due to whatever XML parsing library it uses. Do you know what library that might be?&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>The <acronym title="Extensible Markup Language">XML</acronym> performance in ruby-marc is almost certainly due to whatever <acronym title="Extensible Markup Language">XML</acronym> parsing library it uses. Do you know what library that might be?</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Rochkind</title>
		<link>http://robotlibrarian.billdueber.com/still-another-look-at-marc-parsing-in-ruby-and-jruby/comment-page-1/#comment-289</link>
		<dc:creator>Jonathan Rochkind</dc:creator>
		<pubDate>Fri, 29 Jan 2010 22:36:25 +0000</pubDate>
		<guid isPermaLink="false">http://robotlibrarian.billdueber.com/still-another-look-at-marc-parsing-in-ruby-and-jruby/#comment-289</guid>
		<description>&lt;p&gt;So one thing about ruby-marc, not in parsing but in analyzing/processing, is that every time you ask for a tag (say, &#039;245&#039;), it&#039;s got to iterate through every field in the record and match each one to see if it&#039;s a 245. If you&#039;re doing a lot of &#039;mapping&#039; to a lot of records.... I&#039;ve wondered for a while if there&#039;s a bottleneck there, and how much difference it would make to build a hash &#039;index&#039; of tags. Almost all &#039;mapping&#039; operations begin by looking up tag numbers, and the majority pretty much end there too.&lt;/p&gt;

&lt;p&gt;But I&#039;ve never gotten around to trying to profile it.&lt;/p&gt;

&lt;p&gt;Curious how marc4j is implemented in terms of access to marc record fields by tag.&lt;/p&gt;

&lt;p&gt;Since your tests reveal that even ruby-marc is faster under jruby than mri, there&#039;s something about the environment itself that is speeding things up. But I wonder if there are optimizations (perhaps simple ones) that could be made to ruby-marc to make it catch up with marc4j.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>So one thing about ruby-marc, not in parsing but in analyzing/processing, is that every time you ask for a tag (say, &#8216;245&#8217;), it&#8217;s got to iterate through every field in the record and match each one to see if it&#8217;s a 245. If you&#8217;re doing a lot of &#8216;mapping&#8217; to a lot of records&#8230;. I&#8217;ve wondered for a while if there&#8217;s a bottleneck there, and how much difference it would make to build a hash &#8216;index&#8217; of tags. Almost all &#8216;mapping&#8217; operations begin by looking up tag numbers, and the majority pretty much end there too.</p>
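
<p>A minimal sketch of the hash &#8216;index&#8217; idea, in plain Ruby. The Field struct and the IndexedRecord class are hypothetical stand-ins invented for illustration, not ruby-marc&#8217;s actual classes, and whether this helps in practice is untested here. It only contrasts a per-call scan with a lookup table built once per record.</p>

```ruby
# Hypothetical stand-ins for illustration; these are NOT ruby-marc's classes.
Field = Struct.new(:tag, :value)

class IndexedRecord
  def initialize(fields)
    @fields = fields
    @by_tag = nil # index is built lazily on the first lookup
  end

  # Linear scan: compares every field's tag on every call.
  def scan(tag)
    @fields.select { |f| f.tag == tag }
  end

  # Hash lookup: one pass to build the index, then a hash access per tag.
  def lookup(tag)
    @by_tag ||= @fields.group_by(&:tag)
    @by_tag.fetch(tag, [])
  end

  # Adding a field discards the index so later lookups stay correct.
  def append(field)
    @fields << field
    @by_tag = nil
    self
  end
end

record = IndexedRecord.new([
  Field.new('245', 'Title statement'),
  Field.new('650', 'Subject A'),
  Field.new('650', 'Subject B')
])

title_fields   = record.lookup('245') # one field
subject_fields = record.lookup('650') # two fields, in record order
```

<p>The index has to be thrown away (or updated) whenever fields are added or removed, which is the main cost of this approach for records that get modified after parsing.</p>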

<p>But I&#8217;ve never gotten around to trying to profile it.</p>

<p>Curious how marc4j is implemented in terms of access to marc record fields by tag.</p>

<p>Since your tests reveal that even ruby-marc is faster under jruby than mri, there&#8217;s something about the environment itself that is speeding things up. But I wonder if there are optimizations (perhaps simple ones) that could be made to ruby-marc to make it catch up with marc4j.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Rochkind</title>
		<link>http://robotlibrarian.billdueber.com/still-another-look-at-marc-parsing-in-ruby-and-jruby/comment-page-1/#comment-288</link>
		<dc:creator>Jonathan Rochkind</dc:creator>
		<pubDate>Fri, 29 Jan 2010 22:12:11 +0000</pubDate>
		<guid isPermaLink="false">http://robotlibrarian.billdueber.com/still-another-look-at-marc-parsing-in-ruby-and-jruby/#comment-288</guid>
		<description>&lt;p&gt;Oh wait, I see it now! Never mind.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Oh wait, I see it now! Never mind.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Rochkind</title>
		<link>http://robotlibrarian.billdueber.com/still-another-look-at-marc-parsing-in-ruby-and-jruby/comment-page-1/#comment-287</link>
		<dc:creator>Jonathan Rochkind</dc:creator>
		<pubDate>Fri, 29 Jan 2010 22:10:48 +0000</pubDate>
		<guid isPermaLink="false">http://robotlibrarian.billdueber.com/still-another-look-at-marc-parsing-in-ruby-and-jruby/#comment-287</guid>
		<description>&lt;p&gt;These kinds of numbers always make my head swim a bit.&lt;/p&gt;

&lt;p&gt;The original question you had, if I understand right, was if you could parse MARC at 250/300 records per second. But your findings aren&#039;t expressed in records per second. Is it possible to say how they add up in records per second, so we know which methods are fast enough to meet your original criteria and which aren&#039;t?&lt;/p&gt;

&lt;p&gt;(You must have a really fast machine to get 250/300 records per second from SolrMarc. I only get 100 records per second on my machine.)&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>These kinds of numbers always make my head swim a bit.</p>

<p>The original question you had, if I understand right, was if you could parse <acronym title="MAchine Readable Cataloging">MARC</acronym> at 250/300 records per second. But your findings aren&#8217;t expressed in records per second. Is it possible to say how they add up in records per second, so we know which methods are fast enough to meet your original criteria and which aren&#8217;t?</p>

<p>(You must have a really fast machine to get 250/300 records per second from SolrMarc. I only get 100 records per second on my machine.)</p>]]></content:encoded>
	</item>
</channel>
</rss>

