My friend Roland Bouman made an interesting blog post regarding the performance of a bit of JavaScript for Kettle that he saw on a different blog.
Given the large amounts of data that I am shoving through Kettle every day, I tend to be extremely concerned about performance. Even a small inefficiency can lead to dramatic slowdowns. So when I saw his post, I got to thinking about how I would approach the problem if it involved the kind of large data sets I work with and therefore required extreme optimization.
I didn't have a lot of spare time to dedicate to this experiment, so I opted for a screen-cast instead of a nicely formatted blog post. That said, I think there is a certain benefit in being able to see the workflow of someone who is very comfortable with Kettle.
The screen-cast is currently in Apple QuickTime format. Bleh. I need to get a new Ogg Theora transcoder because the one that I tried to use last time is not happy with me and I didn't have time to fiddle with it.
So, if you use Kettle and are interested in these things, here is the screen-cast. Be warned: it is 30 minutes long and probably not extremely exciting to anyone outside of the ETL field.
Kettle string transformation optimization walk-through
If you are familiar with developing plug-ins for Kettle and you'd like to take a look at the User Defined Java Class plug-in I demonstrated at the end of the screen-cast, you can pick it up from the Pentaho SVN plugins repository. Just wear gloves because it has rough edges.
User Defined Java Class plug-in
- Performance of Rhino JS engine and Janino library in Kettle
2009-11-17 09:50 am (UTC)
Love the video - it's great! You pull a few wicked tricks there, and I am planning to add those to my repertoire. I might re-blog it over the weekend, in order to compare with the things I have already.
One thing about the video: at the end you mention a 100x performance increase. But it looked to me like you went from about 50,000 r/s to 500,000 r/s, which I would call a 10x increase. Am I missing something?
2009-11-17 05:07 pm (UTC)
This is interesting
(Anonymous)
2009-11-18 03:43 pm (UTC)
Well, I started with Kettle and Pentaho BI recently... did a few experiments with Spoon and Kitchen... a few demos. As advised by you, I did download the Jira solution and I am still facing issues with it (Software-quality), but I am sure going through your blogs and other knowledgeable blogs will make my life easier... I liked the screencast... hopefully you will keep posting more. :)