Red Hat Bugzilla – Bug 780229
Provide alternative registry implementation
Last modified: 2013-01-23 05:28:02 EST
Help Desk Ticket Reference: https://na7.salesforce.com/500A00000045LkR
The issue here is: in a cluster environment, after a malfunction on JBoss (e.g. JVM crash, erroneous deployment, etc.) the registry is left holding invalid endpoints, and the ESB misbehaves by using those endpoints. Even your removeDeadEPR property is unusable, because as stated in the documentation "the end-point reference for a service that is [...] slow to respond may, inadvertently, be removed from the registry by mistake".
Since we have a cluster, manually cleaning the DB is not just a matter of dropping the tables and restarting the server. This is a major issue for us, because it can lead to erroneous service calls if a server node crashes.
Changed to a feature request to replace jUDDI as the registry implementation.
Investigation needed. See Kevin's comment.
Link: Added: This issue relates to JBESB-3629
When using jUDDI v3, it makes much more sense to use user-defined keys, and have part of the binding key be the host name and port. This way the existing binding keys are simply reused rather than a new set being created when the servers come back up. Also, the new jUDDI v3.1 client ships with a service cache, which is updated by a subscription (when a service or its endpoints change or are removed, the client caches are informed). This uses standard UDDI v3 functionality, so it should work with any v3 registry, and it makes the registry much more dynamic.
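To illustrate the user-defined key idea, here is a minimal sketch of deriving a stable binding key from the host and port. The class name `StableBindingKeys`, the key layout, and the `esb.example.org` partition are assumptions made for illustration and are not part of the jUDDI or JBoss ESB API. A UDDI v3 registry normally also requires the publisher to own a key generator for the partition, which is not shown here.

```java
import java.util.Locale;

// Hypothetical helper: builds the same binding key for the same node every
// time, so a restarted server overwrites its old entry instead of adding one.
final class StableBindingKeys {
    private StableBindingKeys() {}

    // Assumed layout: uddi:<domain>:binding:<category>:<service>:<host>:<port>
    static String bindingKey(String domain, String category, String service,
                             String host, int port) {
        return String.join(":",
                "uddi", normalize(domain), "binding",
                normalize(category), normalize(service),
                normalize(host), Integer.toString(port));
    }

    // Lower-cases the part and replaces anything outside [a-z0-9.-] with '_'
    // so that the result stays a single, predictable key segment.
    private static String normalize(String part) {
        return part.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9.\\-]", "_");
    }
}
```

Because the key depends only on the service identity and the node address, publishing again after a restart would target the existing binding rather than creating a new one.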
This reduces the problem to removing dead endpoint information for nodes that are really gone. I would think that a 'three strikes and you're out' policy in the ServiceInvoker should do the trick, combined with an optional periodic query ("am I in the registry?") that reregisters if the answer is no, to bring back services that could not be seen due to, for example, temporary network issues.
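The 'three strikes' policy could be sketched roughly as follows. This is not the ServiceInvoker's actual code: `StrikeTracker` and its methods are hypothetical names, and the sketch assumes that the caller reports every delivery result and removes the endpoint from the registry itself when told to.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: counts consecutive delivery failures per endpoint key.
final class StrikeTracker {
    private final int maxStrikes;
    private final Map<String, Integer> strikes = new ConcurrentHashMap<>();

    StrikeTracker(int maxStrikes) {
        if (maxStrikes < 1) {
            throw new IllegalArgumentException("maxStrikes must be at least 1");
        }
        this.maxStrikes = maxStrikes;
    }

    // Records a failed delivery. Returns true once the endpoint has failed
    // maxStrikes times in a row, meaning the caller should remove it.
    boolean recordFailure(String endpointKey) {
        int count = strikes.merge(endpointKey, 1, Integer::sum);
        if (count >= maxStrikes) {
            strikes.remove(endpointKey); // start fresh if it is registered again
            return true;
        }
        return false;
    }

    // A successful delivery clears the strikes, so only consecutive failures count.
    void recordSuccess(String endpointKey) {
        strikes.remove(endpointKey);
    }
}
```

With `new StrikeTracker(3)`, a slow endpoint that fails twice and then responds keeps its registration, while one that fails three times in a row is reported for removal.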
"The real solution is to have a 'service repository' that reacts dynamically to cluster members appearing/disappearing": if you have this info, then you can do registration/unregistration right there.
My 2 cents.
It is not the client behaviour that is the issue here, rather it is the missing 'dead endpoint information' of the nodes that have disappeared. The 'three strikes' doesn't work given the number of times this is raised by customers.
Having an alternate implementation which can react to the topology changes is preferable.
Sorry, I should clarify and say that the 'three strikes' wouldn't work given that customers are unhappy with the current implementation (possible removal after one).