Bug 860018 - Possible message loss if a cluster is partitioned
Status: NEW
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Priority: medium, Severity: medium
Assigned To: Alan Conway
QA Contact: MRG Quality Engineering
Reported: 2012-09-24 11:35 EDT by Alan Conway
Modified: 2013-02-22 13:18 EST
Doc Type: Bug Fix
Type: Bug

Attachments
Reproducer (120 bytes, application/gzip)
2012-10-04 06:49 EDT, Pavel Moravec

Description Alan Conway 2012-09-24 11:35:31 EDT
Description of problem: 

If a cluster is partitioned, it is theoretically possible for messages sent by a client of an inquorate broker to be lost because of a mismatch between the cman and corosync status.

Version-Release number of selected component (if applicable): 0.18

How reproducible: Never observed; this is a theoretical bug.

Steps to Reproduce: Unknown
Actual results: message loss

Expected results: no message loss

Additional info:

qpidd monitors cman for quorum changes and shuts the broker down if it becomes inquorate. This is intended to prevent message loss by forcing clients to fail over to a healthy broker and replay their unacknowledged messages.

However, there is a possible race condition between the point where qpidd checks the quorum status with cman and the point where it multicasts messages to corosync.

Each time a broker joins or leaves the cluster, the cluster is considered to be in a new configuration. Each configuration is identified by a sequence number called the ring-id. Although cman and corosync are dealing with the same cluster, they update their cluster status independently. As qpidd is currently coded, it is possible for it to see a cman status from an older configuration that indicates quorum, but to send corosync messages to a newer, inquorate configuration.

To be sure not to send messages to an inquorate cluster, qpidd needs to check before each mcast that the cman and corosync ring-ids are the same AND that cman indicates the cluster is quorate. If not, qpidd needs to wait until the sequence numbers converge before mcasting anything.
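
The check described above can be modelled in a few lines. This is only an illustrative sketch in Python, not qpidd code and not the cman or corosync API: `ClusterView`, `safe_to_mcast`, `mcast_when_safe` and the three callbacks are hypothetical names introduced here, and the model assumes qpidd can read both ring-ids and the quorum flag as one snapshot.

```python
from dataclasses import dataclass


@dataclass
class ClusterView:
    """Hypothetical snapshot of what a broker can observe at one instant."""
    cman_ring_id: int      # configuration that cman last reported on
    cman_quorate: bool     # quorum status for cman_ring_id
    corosync_ring_id: int  # configuration a multicast sent now would reach


def safe_to_mcast(view: ClusterView) -> bool:
    # The quorum flag only describes the configuration cman has seen,
    # so it is trusted only when both ring-ids agree.
    return view.cman_ring_id == view.corosync_ring_id and view.cman_quorate


def mcast_when_safe(read_view, send, wait_for_change, message):
    """Wait until both views agree on a quorate configuration, then send."""
    while not safe_to_mcast(read_view()):
        wait_for_change()  # hypothetical: blocks until either view changes
    send(message)
```

In this model a view such as `ClusterView(4, True, 5)` is rejected even though cman reports quorum, because the quorum flag belongs to ring-id 4 while a multicast would go to ring-id 5.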

The fix should be reasonably straightforward, but testing will probably be very difficult. I'm not sure how the problem could be reproduced.
Comment 2 Pavel Moravec 2012-10-04 06:49:29 EDT
Created attachment 621562

A "weak" reproducer: using the script test_bz860018.sh, I was able to get message loss only very rarely (once in many hours of running) and/or message duplication (twice in the same period).

The reproducer simply runs qpid-send and qpid-receive (with message loss and duplication checks on) against a broker where a network failure is emulated.
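
For reference, a loss and duplication check of this kind can be sketched as follows. This is an illustrative Python sketch, not how qpid-send or qpid-receive implement it: `check_sequence` is a hypothetical helper, and it assumes the sender numbers its messages 1..N and the receiver records every sequence number it sees.

```python
from collections import Counter


def check_sequence(sent_count, received):
    """Return (lost, duplicated) sequence numbers.

    Assumes the sender numbered its messages 1..sent_count and that
    `received` lists the sequence numbers in arrival order.
    """
    seen = Counter(received)
    lost = [n for n in range(1, sent_count + 1) if seen[n] == 0]
    duplicated = sorted(n for n, count in seen.items() if count > 1)
    return lost, duplicated


lost, duplicated = check_sequence(5, [1, 2, 2, 4, 5])
print("lost:", lost, "duplicated:", duplicated)
```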

The network failure is emulated following https://access.redhat.com/knowledge/solutions/79523, by dropping all traffic on the Ethernet interface used by corosync+cman (note: the test must be run on a machine with 2 NICs so that AMQP traffic keeps passing).

The reproducer has two flaws:
1) It takes a very long time to detect and recover from a split-brain. Usually a node reboot is required to un-fence. The script approximates this without reboots.

2) Message loss or duplication is seen quite rarely, so the test needs to run for a long time to verify a possible fix.
