Bug 619759 - cluster failover issues
Status: CLOSED DUPLICATE of bug 620418
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: beta
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: 1.3
Assigned To: Alan Conway
QA Contact: MRG Quality Engineering
Reported: 2010-07-30 09:16 EDT by Graham Biswell
Modified: 2014-08-14 21:43 EDT
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2010-08-03 11:40:55 EDT


Attachments
logs & conf files for both nodes (3.60 MB, application/x-tar)
2010-07-30 09:23 EDT, Graham Biswell
Description Graham Biswell 2010-07-30 09:16:59 EDT
Testing clustering with the 1.3 beta ...

- Applications and brokers all stopped.
- Start both brokers.
- Perform a couple of failovers to verify nodes leave and join the cluster successfully.
- Start our application suite.
- Shut down one broker.
- Some apps fail over successfully, some do not (theory: those that use durable topic subscriptions do not survive).
- Attempt to start the stopped broker. Cluster rejoin fails.
- Try once more; same failure.
- Shut down the application suite, except for a single monitoring app.
- Start the stopped broker. This time it successfully rejoins the cluster.
- Perform a few more failovers between the two brokers (checking connectivity via the monitoring app).
- Shut down both brokers.

Across the applications (approx. 20) we make use of most features of qpid: fanout exchanges, LVQs, ring queues, durable topics, and direct queues. All clients are Java apps.
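The durable-subscription theory can be sketched as a toy model: a JMS durable topic subscription is backed by an exclusive broker queue whose name embeds the subscription id (the logs show names like _admin@amq.topic_<uuid>), so a second live session using the same subscription id is refused until the first session detaches. This is a minimal Python sketch of that semantics for illustration only; the class and method names are hypothetical, not qpid API.

```python
# Toy model of the exclusive-queue semantics behind JMS durable topic
# subscriptions. Names are illustrative, not qpid code.

class ResourceLocked(Exception):
    """Mirrors the broker's resource-locked error."""

class Broker:
    def __init__(self):
        self.exclusive_owners = {}  # queue name -> owning session

    def subscribe_durable(self, session, sub_id):
        # A durable subscription is backed by an exclusive queue whose
        # name embeds the subscription id (uuid suffix omitted here).
        queue = f"_{sub_id}@amq.topic"
        owner = self.exclusive_owners.get(queue)
        if owner is not None and owner != session:
            raise ResourceLocked(
                f"Cannot grant exclusive access to queue {queue}")
        self.exclusive_owners[queue] = session
        return queue

    def detach(self, session):
        # Detaching a session releases its exclusive queues.
        self.exclusive_owners = {
            q: s for q, s in self.exclusive_owners.items() if s != session}

broker = Broker()
broker.subscribe_durable("session-1", "admin")
try:
    # Same subscription id while session-1 is still attached: refused.
    broker.subscribe_durable("session-2", "admin")
except ResourceLocked:
    pass
broker.detach("session-1")
broker.subscribe_durable("session-2", "admin")  # now succeeds
```

In this model the failure window matches the log observation: until the old session is detached, any reconnecting client reusing the same subscription id gets resource-locked.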
Comment 1 Graham Biswell 2010-07-30 09:23:13 EDT
Created attachment 435551 [details]
logs & conf files for both nodes
Comment 2 Gordon Sim 2010-07-30 09:35:37 EDT
There are errors in the logs relating to locked exclusive queues, which I believe (as suggested above) relate to durable subscriptions from the JMS client. The sessions owning these queues are not detached at the point the clients fail over.

E.g. first error in the log for amqb02 is at line 11912 (22:27:08), the session owning that queue doesn't get detached until line 69705 (22:30:49).

The failure to join appears to be down to an inconsistent error during update:

E.g. in the other log (for amqb01):

2010-07-29 22:32:11 critical cluster(10.34.22.64:26810 CATCHUP/error) local error 34184 did not occur on member 10.34.22.65:4830: resource-locked: Cannot grant exclusive access to queue _admin@amq.topic_98760bd9-072f-4b25-91c3-42c4ab65a169 (qpid/broker/SessionAdapter.cpp:399)
2010-07-29 22:32:11 debug Exception constructed: local error did not occur on all cluster members : resource-locked: Cannot grant exclusive access to queue _admin@amq.topic_98760bd9-072f-4b25-91c3-42c4ab65a169 (qpid/broker/SessionAdapter.cpp:399) (qpid/cluster/ErrorCheck.cpp:89)
2010-07-29 22:32:11 critical Error delivering frames: local error did not occur on all cluster members : resource-locked: Cannot grant exclusive access to queue _admin@amq.topic_98760bd9-072f-4b25-91c3-42c4ab65a169 (qpid/broker/SessionAdapter.cpp:399) (qpid/cluster/ErrorCheck.cpp:89)
2010-07-29 22:32:11 notice cluster(10.34.22.64:26810 LEFT/error) leaving cluster intg1
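The log excerpt above reflects the cluster's error consistency check: every member must see the same error for the same operation, and a member whose local error did not occur on all other members leaves the cluster. The following is a toy Python model of that rule for illustration; the function and exception names are hypothetical and do not reproduce the actual qpid/cluster/ErrorCheck.cpp logic.

```python
# Toy model of the cluster error consistency check: if a local error did
# not occur on every member, the member that saw it must leave.

class InconsistentError(Exception):
    """A local error did not occur on all cluster members."""

def check_error(local_error, outcomes_by_member):
    """local_error: error string seen locally, or None if none occurred.
    outcomes_by_member: {member id: error string or None} on other members.
    """
    for member, error in outcomes_by_member.items():
        if error != local_error:
            # Mirrors: "local error ... did not occur on member <x>"
            raise InconsistentError(
                f"local error {local_error!r} did not occur on member {member}")
    return True

# During the CATCHUP update the joining node hits resource-locked but the
# established member does not, so the joiner leaves (LEFT/error).
try:
    check_error("resource-locked", {"10.34.22.65:4830": None})
except InconsistentError:
    pass
```

Under this model the rejoin failure follows directly: the joining broker replays the durable-subscription operation, hits resource-locked locally while the running member does not, and is forced out of the cluster.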
Comment 3 Gordon Sim 2010-07-30 13:44:04 EDT
I think failover is not relevant to the lines I indicated at the start of the last comment. The errors appear to be a result of an attempt to use the same durable subscription ids, and they occur before the node is shut down.
Comment 4 Alan Conway 2010-08-03 11:40:55 EDT
Although the symptoms are different, the cause is the same as bug 620418.

*** This bug has been marked as a duplicate of bug 620418 ***
