Red Hat Bugzilla – Bug 464020
received flag not set properly in commit token results in lost messages.
Last modified: 2016-04-26 10:10:11 EDT
Description of problem:
Suppose a commit token is created with the following entries:
node=1 aru=1 ringid=4
node=2 aru=1ef ringid=8
node=3 aru=1fb ringid=8
node=4 aru=1fe ringid=8
What should happen is that node 4 resends all messages from the lowest aru (1ef) to the highest aru (1fe) among the nodes coming from ring 8. This is done by setting a received flag in the commit token. Currently this received flag is not always set properly.
Instead, node 2 is not delivered messages 1fa-1fe, and node 2 is not delivered messages 1fb-1fe. This results in message loss and possible corruption of multicast data when using services such as CPG or EVS.
Version-Release number of selected component (if applicable):
How reproducible:
More reproducible with a larger cluster, but detecting it requires manual inspection of the commit tokens. The keys to reproduction are that every node must be sending traffic and there must be at least 4 nodes, with 1 node being killed/restarted.
This could also result in a segfault, but I am not certain about this. Fixing this does not fix the checkpoint bug.
This definitely violates EVS.
Steps to Reproduce:
Actual results:
Messages are lost.
Expected results:
Messages should not be lost.
A patch to fix the problem is in hand and has passed Andrew Beekhof's CRM testing suite, which verifies that messages are correctly sent over 500 iterations, including node kills/restarts.