Testing clustering with the 1.3 beta ...
- Applications and brokers all stopped.
- Start both brokers.
- Perform a couple of failovers to verify nodes leave & join the cluster successfully.
- Start our application suite.
- Shut down one broker.
- Some apps fail over successfully, some do not (theory: those that use durable topic subscriptions do not survive).
- Attempt to start the stopped broker. Cluster rejoin fails.
- Try once more, same failure.
- Shut down the application suite, except for a single monitoring app.
- Start the stopped broker. This time it successfully rejoins the cluster.
- Perform a few more failovers between the two brokers (checking connectivity via the monitoring app).
- Shut down both brokers.
Between them, the applications (approx. 20) make use of most features of qpid: fanout exchanges, LVQs, ring queues, durable topics, direct queues. All clients are Java apps.
Created attachment 435551
logs & conf files for both nodes
There are errors in the logs relating to locked exclusive queues, which I believe (as suggested above) relate to durable subscriptions from the JMS client. The sessions owning these queues are not detached at the point the clients fail over.
E.g. the first error in the log for amqb02 is at line 11912 (22:27:08), but the session owning that queue doesn't get detached until line 69705 (22:30:49).
The failure to join appears to be down to an inconsistent error during update:
E.g. in the other log (for amqb01):
2010-07-29 22:32:11 critical cluster(10.34.22.64:26810 CATCHUP/error) local error 34184 did not occur on member 10.34.22.65:4830: resource-locked: Cannot grant exclusive access to queue firstname.lastname@example.org_98760bd9-072f-4b25-91c3-42c4ab65a169 (qpid/broker/SessionAdapter.cpp:399)
2010-07-29 22:32:11 debug Exception constructed: local error did not occur on all cluster members : resource-locked: Cannot grant exclusive access to queue email@example.com_98760bd9-072f-4b25-91c3-42c4ab65a169 (qpid/broker/SessionAdapter.cpp:399) (qpid/cluster/ErrorCheck.cpp:89)
2010-07-29 22:32:11 critical Error delivering frames: local error did not occur on all cluster members : resource-locked: Cannot grant exclusive access to queue firstname.lastname@example.org_98760bd9-072f-4b25-91c3-42c4ab65a169 (qpid/broker/SessionAdapter.cpp:399) (qpid/cluster/ErrorCheck.cpp:89)
2010-07-29 22:32:11 notice cluster(10.34.22.64:26810 LEFT/error) leaving cluster intg1
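For anyone wanting to pull these errors out of the attached logs, here is a rough sketch of a scanner that groups "resource-locked" errors by queue name. It assumes every log line has the shape "date time level message" seen in the excerpt above; the function name and regexes are my own and are not part of qpid.

```python
import re
from collections import defaultdict

# Assumed line shape, based only on the excerpt above:
#   "<YYYY-MM-DD> <HH:MM:SS> <level> <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")

# Error text as it appears in the excerpt; the queue name is the next token.
QUEUE_RE = re.compile(
    r"resource-locked: Cannot grant exclusive access to queue (\S+)"
)


def find_locked_queue_errors(lines):
    """Return {queue_name: [(timestamp, level), ...]} for resource-locked errors."""
    hits = defaultdict(list)
    for line in lines:
        parsed = LINE_RE.match(line.rstrip("\n"))
        if parsed is None:
            continue  # skip lines that do not follow the assumed shape
        timestamp, level, message = parsed.groups()
        found = QUEUE_RE.search(message)
        if found:
            hits[found.group(1)].append((timestamp, level))
    return dict(hits)
```

Feeding it an open log file (e.g. `find_locked_queue_errors(open(path))`, where `path` is whichever broker log you are looking at) gives one entry per queue with the times the error was logged, which makes it easier to line these up against the session detach times.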
I think failover is not relevant to the lines I indicated at the start of the last comment. The errors appear to result from an attempt to use the same durable subscription IDs, and they occur before the node is shut down.
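To illustrate the kind of conflict I mean, here is a toy model of exclusive queue ownership. It is only a sketch of the behaviour described in this report, not the qpid broker's implementation; `ToyBroker`, `ResourceLocked` and the queue and session names are all made up for the example.

```python
class ResourceLocked(Exception):
    """Raised when a second session asks for a queue that is exclusively owned."""


class ToyBroker:
    """Hypothetical stand-in for a broker that tracks exclusive queue owners."""

    def __init__(self):
        self._owners = {}  # queue name -> owning session id

    def subscribe_exclusive(self, queue, session):
        owner = self._owners.get(queue)
        if owner is not None and owner != session:
            # Same queue (i.e. same durable subscription id), different session.
            raise ResourceLocked(
                f"Cannot grant exclusive access to queue {queue}"
            )
        self._owners[queue] = session

    def detach(self, session):
        # Release every queue owned by the detached session.
        for queue in [q for q, s in self._owners.items() if s == session]:
            del self._owners[queue]


broker = ToyBroker()
broker.subscribe_exclusive("sub-id-1", "session-a")
try:
    # A second session reusing the same subscription id is refused...
    broker.subscribe_exclusive("sub-id-1", "session-b")
    refused = False
except ResourceLocked:
    refused = True

# ...until the owning session has been detached.
broker.detach("session-a")
broker.subscribe_exclusive("sub-id-1", "session-b")
```

In this model the error depends only on whether the owning session is still attached, which is consistent with seeing it before any node is shut down.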
Although the symptoms are different, the cause is the same as bug 620418.
*** This bug has been marked as a duplicate of bug 620418 ***