Red Hat Bugzilla – Bug 859170
Non-ready HA broker can be incorrectly promoted to primary
Last modified: 2013-03-19 12:38:28 EDT
Description of problem:
rgmanager can promote a non-ready backup HA broker to primary even when other backup brokers in the ready state are available. This can result in loss of messages and broker configuration. It can also cause the previously ready backups to throw exceptions when connecting to the new primary:
Sep 20 10:17:18 itcm12 qpidd: 2012-09-20 10:17:18 [HA] critical Backup queue Queue1: Replication failed: Invalid position move, preceeds messages
Sep 20 10:17:18 itcm12 qpidd: 2012-09-20 10:17:18 [Protocol] error Unexpected exception: Invalid position move, preceeds messages
Sep 20 10:17:18 itcm12 qpidd: 2012-09-20 10:17:18 [Broker] error Connection 10.3.100.12:43837-10.3.100.105:9006 closed by error: Invalid position move, preceeds messages(501)
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start a primary broker and a backup broker
2. Inject messages into the primary and ensure the messages replicate to the backup
3. Restart the primary broker and manually re-promote it to primary
Actual results:
Restarted broker becomes primary
Expected results:
Restarted broker refuses to become primary because at least one ready backup was discovered within some timeout
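The promotion rule described here (a non-ready broker declines promotion if a ready backup is found within some timeout) could be sketched roughly as below. This is only an illustration, not the qpidd or rgmanager implementation: the function `may_promote`, the `probe_backups` callback, the status strings, and the timeout values are all hypothetical names and assumptions introduced for the example.

```python
import time
from typing import Callable, Iterable


def may_promote(candidate_ready: bool,
                probe_backups: Callable[[], Iterable[str]],
                timeout: float = 5.0,
                interval: float = 0.5,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Decide whether a broker may accept promotion to primary.

    candidate_ready: True if this broker was itself a ready backup.
    probe_backups:   hypothetical callback returning the status string of
                     every other backup it can currently see
                     (assumed values such as "ready" or "catchup").
    """
    # A ready backup is always a safe promotion target.
    if candidate_ready:
        return True

    # A non-ready broker looks for a better candidate until the deadline.
    deadline = clock() + timeout
    while True:
        if any(status == "ready" for status in probe_backups()):
            # A ready backup exists, so refuse promotion and let the
            # cluster manager pick that broker instead.
            return False
        if clock() >= deadline:
            # No ready backup was discovered within the timeout.
            return True
        sleep(interval)
```

In the reproduction steps above, the restarted broker would call this with `candidate_ready=False`, see the surviving backup report a ready status, and refuse the manual promotion.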
Have you also seen this problem when using rgmanager?
If so, was the failing node rebooted, or was only qpidd restarted?
Yes, I have run into this problem without running the manual steps above. We encounter a lot of flapping during our test startup due to the sheer number of connections and queues being created. This causes 'qpid-ha' to time out on its QMF query, which in turn causes rgmanager to stop the primary promotion service and relocate it.