Bug 859170 - Non-ready HA broker can be incorrectly promoted to primary
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: Development
Hardware: Unspecified Unspecified
Priority: high
Severity: unspecified
Target Milestone: 2.3
Target Release: ---
Assigned To: Alan Conway
QA Contact: MRG Quality Engineering
Keywords: OtherQA
Depends On:
Blocks: 698367
Reported: 2012-09-20 13:35 EDT by Jason Dillaman
Modified: 2013-03-19 12:38 EDT
Fixed In Version: qpid-cpp-0.18-2
Doc Type: Bug Fix
Last Closed: 2013-03-19 12:38:28 EDT
Type: Bug


External Trackers
Tracker      ID         Priority  Status  Summary  Last Updated
Apache JIRA  QPID-4360  None      None    None     2012-10-05 12:00:17 EDT

Description Jason Dillaman 2012-09-20 13:35:30 EDT
Description of problem:
rgmanager can promote a non-ready backup HA broker to primary when other backup brokers are available in the ready state.  This can result in loss of messages and broker configuration.  Additionally, this can cause the previously ready backups to throw exceptions when connecting to the new primary:

Sep 20 10:17:18 itcm12 qpidd[10871]: 2012-09-20 10:17:18 [HA] critical Backup queue Queue1: Replication failed: Invalid position move, preceeds messages
Sep 20 10:17:18 itcm12 qpidd[10871]: 2012-09-20 10:17:18 [Protocol] error Unexpected exception: Invalid position move, preceeds messages
Sep 20 10:17:18 itcm12 qpidd[10871]: 2012-09-20 10:17:18 [Broker] error Connection 10.3.100.12:43837-10.3.100.105:9006 closed by error: Invalid position move, preceeds messages(501)

Version-Release number of selected component (if applicable):
Qpid 0.18

How reproducible:
100%

Steps to Reproduce:
1. Start a primary broker and a backup broker
2. Inject messages into the primary and ensure the messages replicate to the backup
3. Restart the primary broker and manually re-promote it to primary
  
Actual results:
Restarted broker becomes primary

Expected results:
Restarted broker refuses to become primary because at least one ready backup was discovered within some timeout
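
The expected behaviour can be sketched as a small promotion guard. This is only an illustration of the check described above, not the qpidd implementation: the function name `may_promote`, the `probe_backups` callback, and the timeout handling are all hypothetical, and the status strings are assumed to match what the brokers report.

```python
import time
from typing import Callable, Iterable


def may_promote(
    probe_backups: Callable[[], Iterable[str]],
    timeout_s: float,
    poll_interval_s: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return False if any peer reports 'ready' before the timeout expires.

    probe_backups is a hypothetical callback that returns the current HA
    status string of every other broker in the cluster.
    """
    deadline = clock() + timeout_s
    while True:
        if any(status == "ready" for status in probe_backups()):
            # A ready backup exists, so this broker should refuse promotion.
            return False
        if clock() >= deadline:
            # No ready backup was discovered in time; promotion is allowed.
            return True
        sleep(poll_interval_s)
```

With a guard like this, the restarted broker in step 3 would refuse the manual promotion as long as the original backup still reports ready.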
Comment 1 Alan Conway 2012-09-21 08:38:23 EDT
Have you also seen this problem when using rgmanager?
If so, was the failing node rebooted, or was only qpidd restarted?
Comment 2 Jason Dillaman 2012-09-21 11:08:59 EDT
Yes, I have run into this problem without running the manual steps above.  We encounter a lot of flapping during our test startup due to the sheer number of connections and queues being created.  This causes 'qpid-ha' to time out on its QMF query, which in turn causes rgmanager to stop the primary promotion service and relocate it.
