Wednesday, November 5, 2008

Calculating a Multicast Address for Cluster Partitioning

Say you want to deploy multiple instances of an application on multiple machines, all using a common data store.  Further, you want to let them all talk to each other to maintain some sort of shared state.  But you also want to have development and test clusters in the same subnet environment.  One common mechanism for cluster membership announcements is IP multicast.  This is one method that can be used by JGroups, and is the default used by EHCache.

But what if you don't want to have to bother remembering to configure the same/different addresses on each node to keep your test and development clusters independent?  That is just overhead grunt work that is prone to error, and can lead to data corruption if done incorrectly.

A simpler, automated approach is to calculate the multicast address on the fly, based on the configuration information you already have to configure by hand for the common data store.

For example, my company uses the JDBC URL for a MySQL database, combined with a user name and password for accessing it.  These are pieces of information necessary for each application node anyway, so using them to generate the multicast address removes the need to synchronize another configuration parameter.  In the case of EHCache with Hibernate, the multicast configuration parameters would even need to be in a different file than your main application configuration.  Too messy.

My assumption here will be that this is a cluster inside a private network, so the multicast address will come from the Administratively Scoped address space.  

This technique only varies the final two segments of the IP address.  I'll use the site-local scoped range of 239.255.*.* as my starting point, but any initial two-segment value in the administrative scope will work as a base address.
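
If you pick a different base, it is worth checking that it really falls in the scope you intended.  Here is a minimal sketch using the standard java.net.InetAddress scope checks; the class and method names (ScopeCheck, isSiteLocalMulticast) are made up for this illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class ScopeCheck {

    /** Returns true if the dotted-quad literal is a site-local scoped multicast address. */
    static boolean isSiteLocalMulticast(String literal) {
        try {
            // For an IP literal, getByName only validates the format; no DNS lookup happens.
            InetAddress address = InetAddress.getByName(literal);
            return address.isMulticastAddress() && address.isMCSiteLocal();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP literal: " + literal, e);
        }
    }
}
```

InetAddress also offers isMCOrgLocal() if you would rather build on an organization-local base.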

In Java, a String hash code is represented by an int, positive or negative.  The maximum positive value of an int is 2,147,483,647 (Integer.MAX_VALUE).  The address space available for site-local multicast is:
255 * 255 * 65535 + 255 * 65535 + 65535 = 4,278,190,335 > 2,147,483,647
    segment 3         segment 4     port
So, taking the absolute value of a String hash (with a sanity check for Integer.MIN_VALUE, as abs(MIN_VALUE) returns MIN_VALUE - see the JavaDoc) will give a fairly unique value that fits in this space.  Hash codes aren't truly unique, of course: if you examine the String.hashCode() implementation, you will see that collisions are possible even between short, similar strings ("Aa" and "BB" hash to the same value).  For longer configuration strings that differ in only a few places, such as two JDBC URLs pointing at different hosts, a collision is unlikely but not impossible, so in practice this value should work just fine for keeping multicast addresses distinct.
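
The Integer.MIN_VALUE sanity check can be sketched as below.  SafeHash and nonNegativeHash are hypothetical names, and mapping the edge case to 0 is just one reasonable choice, not the only one.

```java
final class SafeHash {

    /** Non-negative form of a String hash code. */
    static int nonNegativeHash(String key) {
        int hash = key.hashCode();
        // Math.abs(Integer.MIN_VALUE) overflows and is still negative,
        // so that single value is special-cased (here, mapped to 0).
        return (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
    }
}
```

Every node has to apply the same rule, since the result feeds directly into the address and port calculation.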

Now for the mapping math.
int hash = "config-with-DBHost/Port/User/Pwd".hashCode();

// sanity check: Math.abs(Integer.MIN_VALUE) is still negative
hash = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);

// hash modulo max port
int port = hash % 65535;

// hash divided by max port, modulo max segment
int segment4 = hash / 65535 % 255;

// hash divided by max port, divided by max segment
int segment3 = hash / 65535 / 255;
Put the address together like this:
base.address.segment3.segment4
See how easy that was?  The key is to make sure the string you are using for your hash is the same on all nodes you want to share a given multicast address/port combination.  That means if something like the database host is part of the string, make sure it isn't "localhost" on one node and a DNS name on all the others - that will obviously result in different hashes.
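
Putting the pieces together, here is a rough sketch of the whole calculation.  ClusterMulticast, key() and fromKey() are hypothetical names, the "|" separator and the sample JDBC URL are made up, and the 239.255 base is an assumption.  Note that the raw mapping can produce port 0 or a privileged port below 1024; this sketch does not handle that.

```java
final class ClusterMulticast {

    // Assumed base: the first two segments of the site-local multicast range.
    private static final String BASE = "239.255";

    final String address;
    final int port;

    private ClusterMulticast(String address, int port) {
        this.address = address;
        this.port = port;
    }

    /** Hypothetical helper: joins the shared data store settings into one key string. */
    static String key(String jdbcUrl, String user, String password) {
        return jdbcUrl + "|" + user + "|" + password;
    }

    /** Maps a key string to a multicast address and port using the hash arithmetic above. */
    static ClusterMulticast fromKey(String key) {
        int hash = key.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so special-case it.
        hash = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);

        int port = hash % 65535;           // hash modulo max port
        int segment4 = hash / 65535 % 255; // next "digit", modulo max segment
        int segment3 = hash / 65535 / 255; // whatever is left over

        return new ClusterMulticast(BASE + "." + segment3 + "." + segment4, port);
    }

    public static void main(String[] args) {
        ClusterMulticast m = fromKey(
                key("jdbc:mysql://db.example.com:3306/app", "appuser", "secret"));
        System.out.println(m.address + ":" + m.port);
    }
}
```

Any node that builds its key from the same three values ends up with the same address and port, with no extra configuration to keep in sync.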