Saturday, October 4, 2008

Ehcache RMI Address Determination is Error Prone

I had to fix this problem with Ehcache at work. We use it with Hibernate in a clustered environment (more on that in another post). Recently we noticed that sometimes not all peers would discover each other.

The problem exists in all versions from 1.2.4 on, and is still unchanged in the current 1.5 release here, on line 166. Can you spot the problem?
return InetAddress.getLocalHost().getHostAddress();
Turns out Java doesn't specify exactly what this means on a machine with multiple network interfaces, and the result can vary between machines with similar hardware and operating system (as we found).

One of our cluster nodes had a setup that returned an IPv6 interface first, the loopback interface second, and an external IPv4 interface third. Java returned the loopback address from the above method.
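To check what a particular machine reports, a small diagnostic along the following lines can help. This is only a sketch: the class name LocalAddressReport and the classify helper are made up for illustration and are not part of Ehcache, and it assumes Java 5 or later for generics and the for-each loop. It prints the address chosen by getLocalHost() next to every address on every interface, in the order the JVM hands them back.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.Enumeration;

// Hypothetical diagnostic class; not part of Ehcache.
final class LocalAddressReport {

    // Labels an address so loopback and IPv6 entries stand out in the output.
    static String classify(InetAddress a) {
        String family = (a instanceof Inet6Address) ? "IPv6" : "IPv4";
        return a.isLoopbackAddress() ? family + " loopback" : family;
    }

    public static void main(String[] args) throws SocketException, UnknownHostException {
        // This is the call Ehcache relies on.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("getLocalHost() -> " + local.getHostAddress()
                + " (" + classify(local) + ")");

        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            System.out.println("No network interfaces found");
            return;
        }
        // Walk every interface in the order the JVM reports them.
        for (NetworkInterface nic : Collections.list(nics)) {
            for (InetAddress a : Collections.list(nic.getInetAddresses())) {
                System.out.println(nic.getName() + " -> " + a.getHostAddress()
                        + " (" + classify(a) + ")");
            }
        }
    }
}
```

Running it on each cluster node and comparing the first line of output is a quick way to find the nodes that would advertise a loopback address to their peers.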

Discussion of this Java issue can be found in many threads on the web. The fix relies on java.net.NetworkInterface, which wasn't available until Java 1.4, but since almost no one has to use anything older any more, it should be available to most applications. You need to implement your own subclass of net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory, overriding doCreateCachePeerListener to return a new instance of a subclass of net.sf.ehcache.distribution.RMICacheManagerPeerListener. That listener subclass overrides calculateHostAddress to discover the local host address in a more flexible manner. Finally, register the factory class in your ehcache.xml configuration file.
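For the registration step, the ehcache.xml entry could look roughly like the fragment below. The class name com.example.cache.MultiHomedPeerListenerFactory is a placeholder for your own factory subclass, and the port and timeout values are illustrative only. As far as I can tell, the listener only calls calculateHostAddress when no hostName property is given, so hostName is left out here on purpose.

```xml
<!-- Placeholder class name: substitute your own factory subclass. -->
<cacheManagerPeerListenerFactory
    class="com.example.cache.MultiHomedPeerListenerFactory"
    properties="port=40001, socketTimeoutMillis=2000"/>
```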

The new code can look something like this:
// Requires java.net.* and java.util.Enumeration.
Enumeration<NetworkInterface> interfaces;
try {
    interfaces = NetworkInterface.getNetworkInterfaces();
} catch (SocketException e) {
    throw new UnknownHostException("Error getting network interfaces: " + e.getLocalizedMessage());
}

if (interfaces == null) {
    throw new UnknownHostException("No network interfaces found");
}

InetAddress addrToUse = null;

while (interfaces.hasMoreElements()) {
    NetworkInterface i = interfaces.nextElement();
    Enumeration<InetAddress> addresses = i.getInetAddresses();
    if (addresses == null) continue;

    while (addresses.hasMoreElements()) {
        InetAddress a = addresses.nextElement();
        // Keep the first address that is neither loopback nor IPv6.
        if (addrToUse == null && !a.isLoopbackAddress() && !(a instanceof Inet6Address)) {
            addrToUse = a;
        }
    }
}
if (addrToUse == null) {
    throw new UnknownHostException("No IPv4 non-loopback address found for any interface.");
}
return addrToUse.getHostAddress();
This returns the String representation of the first IPv4, non-loopback address found. The assumption here is that Java will always return the interfaces and addresses in the same order, as long as the machine's network settings haven't changed.

If you want to fall back on the loopback address, if found, modify the above code accordingly. I'll leave that as an exercise for the reader (oops, my math background is showing).
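For anyone who would rather not do the exercise, here is one possible sketch of the loopback fallback. The class HostAddressChooser and its method names are hypothetical, and it assumes Java 5 or later. The selection is split into a choose method that works on a plain list, so it can be tried with hand-made addresses without touching the real network configuration.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper; the class and method names are illustrative only.
final class HostAddressChooser {

    // Returns the first non-loopback IPv4 address, else the first IPv4
    // loopback address, else null when the list holds no IPv4 address.
    static InetAddress choose(List<InetAddress> candidates) {
        InetAddress loopback = null;
        for (InetAddress a : candidates) {
            if (!(a instanceof Inet4Address)) {
                continue; // skip IPv6 addresses
            }
            if (!a.isLoopbackAddress()) {
                return a; // preferred: an externally reachable IPv4 address
            }
            if (loopback == null) {
                loopback = a; // remember the first loopback as a fallback
            }
        }
        return loopback;
    }

    // Gathers every address from every interface, in the JVM's order.
    static List<InetAddress> allAddresses() throws SocketException {
        List<InetAddress> result = new ArrayList<InetAddress>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result;
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            result.addAll(Collections.list(nic.getInetAddresses()));
        }
        return result;
    }

    // Same contract as the host lookup above, with the loopback fallback added.
    static String calculateHostAddress() throws UnknownHostException {
        InetAddress chosen;
        try {
            chosen = choose(allAddresses());
        } catch (SocketException e) {
            throw new UnknownHostException("Error getting network interfaces: " + e.getMessage());
        }
        if (chosen == null) {
            throw new UnknownHostException("No IPv4 address found for any interface.");
        }
        return chosen.getHostAddress();
    }
}
```

Keep in mind that a node which falls back to loopback still won't be reachable by its peers, so the fallback mainly helps on single-machine development setups.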

With this updated host lookup code our app didn't need different configuration files for each instance, and all nodes automatically discovered each other, even between machines with multiple network interfaces, such as modern rack servers with multiple NICs and automatic failover between them.

Eventually I'll post about all the pieces we use to run a single Java web application in multiple Tomcat instances on multiple servers with Hibernate and an L2 Ehcache on each one, without the overhead of Tomcat clustering or a full-blown application server. All nodes sharing the same database automatically discover each other and keep their cache contents in sync without any special configuration on individual nodes. Pretty cool.
