Daniel Einspanjer (daniele) wrote,
Daniel Einspanjer

  • Mood:

Performance of Rhino JS engine and Janino library in Kettle

My friend Roland Bouman made an interesting blog post regarding the performance of a bit of JavaScript for Kettle that he saw on a different blog.

Given the large amounts of data that I am shoving through Kettle every day, I tend to be extremely concerned about performance. Even a small inefficiency can lead to dramatic slowdowns. Hence, when I saw his post, I got to thinking about how I would approach the problem if it were within the realm of the large data sets I work with and hence required extreme optimization.

I didn't have a lot of spare time to dedicate to this experiment, so I opted for a screen-cast instead of a nicely formatted blog post. That said, I think there is a certain benefit in being able to see the work flow of someone who is very comfortable with Kettle.

The screen-cast is currently in Apple QuickTime format. Bleh. I need to get a new Ogg Theora transcoder because the one that I tried to use last time is not happy with me and I didn't have time to fiddle with it.

So, if you use Kettle and are interested in these things, here is the screen-cast. Be warned it is 30 minutes long and probably not extremely exciting to anyone outside of the ETL field.

Kettle string transformation optimization walk-through

If you are familiar with developing plug-ins for Kettle and you'd like to take a look at the User Defined Java Class plug-in I demonstrated at the end of the screen-cast, you can pick it up from the Pentaho SVN plugins repository. Just wear gloves because it has rough edges.
User Defined Java Class plug-in
Tags: kettle
  • Post a new comment


    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded