
Daniel Einspanjer's journal

Data warehousing, ETL, BI, and general hackery


Performance improvements at the cost of complexity
I discovered something that I feel is a bit of a bug in the Sun Java implementation. 

If you pass in a string to the method InetAddress.getByName(), it does a bunch of testing to see if it is a domain name or a literal IP address.
If it is an IPv4 address, it will then use String.split() to split the four parts.  String.split() uses regexes to do its work.

That means that if you are querying for hundreds or millions of addresses in a tight loop (as I've been doing), the JVM is creating and compiling a regex object on every call, in addition to allocating a String array and four String objects per call.
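For illustration, here is a minimal sketch of what split-based parsing of an IPv4 literal looks like. This is not Sun's code; the class and method names are hypothetical. The second method shows that hoisting the regex into a precompiled Pattern avoids the per-call compilation, though it still allocates the String array and the part strings.

```java
import java.util.regex.Pattern;

public final class SplitIpParser {
    // Compiled once and reused; avoids recompiling the regex on every call.
    private static final Pattern DOT = Pattern.compile("\\.");

    // Baseline: String.split() takes a regex, so the pattern is handled per call.
    public static long parseWithSplit(String ip) {
        return combine(ip.split("\\.", -1), ip);
    }

    // Same logic, but with the precompiled pattern.
    public static long parseWithPattern(String ip) {
        return combine(DOT.split(ip, -1), ip);
    }

    // Folds four decimal octets into one long, most significant octet first.
    private static long combine(String[] parts, String ip) {
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not a dotted-quad IPv4 literal: " + ip);
        }
        long result = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range in: " + ip);
            }
            result = (result << 8) | octet;
        }
        return result;
    }
}
```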

So at first, I just worked around it by using indexOf() and substring() instead of splitting.  That gave me about a 100x performance improvement. But then I realized I was still generating four String objects for every call.
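A substring-based workaround along these lines might look like the sketch below. It is an assumption about the general shape, not the code from the linked test program, and the names are hypothetical. No regex is involved, but each call still creates four short-lived Strings via substring().

```java
public final class SubstringIpParser {
    // Parses a dotted-quad IPv4 literal into its 32-bit value, held in a long.
    public static long parse(String ip) {
        long result = 0;
        int start = 0;
        for (int part = 0; part < 4; part++) {
            // The first three octets end at a dot; the last ends at the end of the string.
            int end = (part < 3) ? ip.indexOf('.', start) : ip.length();
            if (end < 0) {
                throw new IllegalArgumentException("Not a dotted-quad IPv4 literal: " + ip);
            }
            // substring() allocates a new String for every octet.
            int octet = Integer.parseInt(ip.substring(start, end));
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range in: " + ip);
            }
            result = (result << 8) | octet;
            start = end + 1;
        }
        return result;
    }
}
```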

So I came up with this mapping method and it runs about 1000x faster with a near constant minimal memory footprint.

I pre-calculate a multidimensional array of shorts holding the numbers 0 to 255, where each dimension is indexed by one digit of the number: the literal character value minus 48 (the character code for '0').

With that array available, at run time, I can do a simple lookup of the short value and then do the math to get the long representation of the IP address.  I'm still generating a couple of references and a few intermediate int values, but the JIT optimizer can make quick work of that.
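The lookup idea can be sketched as follows. The table layout (a 10x10x10 array indexed by hundreds, tens, and units digits, with -1 marking values above 255) and all names are assumptions made for illustration; the linked test program may be organized differently. The parse loop reads characters directly and allocates no objects on the success path.

```java
public final class LookupIpParser {
    // OCTETS[h][t][u] holds h*100 + t*10 + u, or -1 when that exceeds 255.
    private static final short[][][] OCTETS = new short[10][10][10];

    static {
        for (int h = 0; h < 10; h++) {
            for (int t = 0; t < 10; t++) {
                for (int u = 0; u < 10; u++) {
                    int value = h * 100 + t * 10 + u;
                    OCTETS[h][t][u] = (short) (value <= 255 ? value : -1);
                }
            }
        }
    }

    public static long parse(String ip) {
        long result = 0;
        int h = 0, t = 0, u = 0; // digits of the current octet, right-aligned
        int digits = 0, parts = 0;
        int len = ip.length();
        for (int i = 0; i <= len; i++) {
            // A virtual trailing dot flushes the final octet.
            char c = (i < len) ? ip.charAt(i) : '.';
            if (c == '.') {
                if (digits == 0 || parts == 4) {
                    throw new IllegalArgumentException("Not a dotted-quad IPv4 literal: " + ip);
                }
                short octet = OCTETS[h][t][u];
                if (octet < 0) {
                    throw new IllegalArgumentException("Octet out of range in: " + ip);
                }
                result = (result << 8) | octet;
                parts++;
                h = t = u = 0;
                digits = 0;
            } else {
                int d = c - 48; // character value minus the code for '0'
                if (d < 0 || d > 9 || digits == 3) {
                    throw new IllegalArgumentException("Not a dotted-quad IPv4 literal: " + ip);
                }
                h = t; // shift the earlier digits left by one place
                t = u;
                u = d;
                digits++;
            }
        }
        if (parts != 4) {
            throw new IllegalArgumentException("Not a dotted-quad IPv4 literal: " + ip);
        }
        return result;
    }
}
```

For example, `LookupIpParser.parse("192.168.0.1")` returns 3232235521, the same value the split-based and substring-based approaches produce.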

Linked is the test program I created to play with the different methods:  InetAddressParse test





Re: Premature optimisation

I'm not sure what part of the conversations exactly you are referring to as potentially unnecessary, but I can describe a bit more of what I'm doing and maybe that will help clarify the situation I found myself in:

Java has an InetAddress class that can be used to do things like DNS lookups and also to establish a TCP/IP connection. My case was not really the most normal one. I was doing GeoIP lookups from raw IP addresses. Because I was performing this work on millions of IP addresses, the inefficiency I describe above with the String.split() method became very apparent when I ran some performance and memory profiling.

As I mentioned above, my particular solution is not the correct way to deal with the general case, but it was the optimal way to handle my case. I think the fact that I found this problem through performance testing, and that I validated the worth of my changes through more testing, clears it of the charge of "premature optimisation". Now I must admit, I'm not sure how many other people out there might be performing InetAddress lookups in tight loops, so maybe there isn't any need for Sun to change their code.

As to the part about the OS doing it for you anyway, I believe that comes down to the fact that there are few things Java can rely on the OS to do for it, since for each such thing they have to have appropriate code and tests in place for every OS they want to support.
