Clip Man

daniele


Daniel Einspanjer's journal

Data warehousing, ETL, BI, and general hackery


Performance of Rhino JS engine and Janino library in Kettle
My friend Roland Bouman made an interesting blog post regarding the performance of a bit of JavaScript for Kettle that he saw on a different blog.

Given the large amounts of data that I shove through Kettle every day, I tend to be extremely concerned about performance. Even a small inefficiency can lead to dramatic slowdowns. So when I saw his post, I got to thinking about how I would approach the problem if it involved the large data sets I work with and therefore required extreme optimization.

I didn't have a lot of spare time to dedicate to this experiment, so I opted for a screen-cast instead of a nicely formatted blog post. That said, I think there is a certain benefit in being able to see the workflow of someone who is very comfortable with Kettle.

The screen-cast is currently in Apple QuickTime format. Bleh. I need to get a new Ogg Theora transcoder because the one that I tried to use last time is not happy with me and I didn't have time to fiddle with it.

So, if you use Kettle and are interested in these things, here is the screen-cast. Be warned it is 30 minutes long and probably not extremely exciting to anyone outside of the ETL field.

Kettle string transformation optimization walk-through

If you are familiar with developing plug-ins for Kettle and you'd like to take a look at the User Defined Java Class plug-in I demonstrated at the end of the screen-cast, you can pick it up from the Pentaho SVN plugins repository. Just wear gloves because it has rough edges.
User Defined Java Class plug-in

Hi!

Love the video - it's great! You pull a few wicked tricks there, and I am planning to add those to my repertoire. I might re-blog it over the weekend, in order to compare with the things I have already.

One thing about the video: at the end you mention a 100x performance increase. But it looked to me like you went from about 50,000 r/s to 500,000 r/s, which I would call a 10x increase. Am I missing something?

Yes, it is a 10x increase. Did I mention I sometimes have problems with basic math? I knew as soon as I started saying the words that they weren't right, but I wasn't about to go back and rerecord that part so I just mumbled and trailed off. ;)

This is interesting

(Anonymous)

2009-11-18 03:43 pm (UTC)

Hey Daniel this is Manish Maheshwari, from Northeastern...
Well, I started with Kettle and Pentaho BI recently... did a few experiments with Spoon and Kitchen... a few demos. As you advised, I downloaded the Jira solution and I am still facing issues with it (Software-quality), but I am sure going through your blog and other knowledgeable blogs will make my life easier... I liked the screencast... hopefully you will keep posting more. :)
