Bug 860018
| Summary: | Possible message loss if a cluster is partitioned | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Alan Conway <aconway> | ||||
| Component: | qpid-cpp | Assignee: | messaging-bugs <messaging-bugs> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | MRG Quality Engineering <mrgqe-bugs> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 2.2 | CC: | jross, mcressma | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2019-11-27 23:15:45 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
|
Description
Alan Conway
2012-09-24 15:35:31 UTC
Created attachment 621562 [details] Reproducer A "weak" reproducer - using the script test_bz860018.sh, I was able to very few times to get message loss (once per many hours of run) and/or message duplicity (twice per the same time). The repro simply runs qpid-send and qpid-receive (with message loss&duplicity checks on) against a broker where network failure is emulated. The network failure is emulated following https://access.redhat.com/knowledge/solutions/79523 where it is dropped the whole traffic on the eth.interface used by corosync+cman (note, one needs to run this test on a machine with 2 NICs to keep AMQP traffic passing). The reproducer has two flaws: 1) It takes ages to detect and recover from a split-brain. Usually, node reboot is required to un-fence. The script somehow mimics this just without reboots. 2) Message loss or duplicity is seen quite rarely, needs to run for a long time to verify possible fix. |