Bug 605313 - isolating a node and then removing the isolation at 30 seconds causes a protocol meltdown 15% of time
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.0
Hardware: All Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Steven Dake
QA Contact: Cluster QE
Depends On:
Blocks:
 
Reported: 2010-06-17 11:58 EDT by David Teigland
Modified: 2016-04-26 10:13 EDT
CC List: 5 users

See Also:
Fixed In Version: corosync-1.2.3-11.el6
Doc Type: Bug Fix
Doc Text:
Isolating a node and then removing the isolation potentially resulted in membership misbehavior.
Story Points: ---
Clone Of:
Cloned To: 610334 611129
Environment:
Last Closed: 2010-11-10 17:07:11 EST


Attachments
reproducer for corosync (1.17 KB, application/x-gzip), 2010-07-03 15:31 EDT, Steven Dake
flatiron revision 2985 to fix the problem (3.66 KB, patch), 2010-07-03 18:02 EDT, Steven Dake

Description David Teigland 2010-06-17 11:58:53 EDT
Description of problem:

Nate hit this running laryngitis.
cluster members 4,5,6,7,8
cpg members 4,5,6,7,8
followed by:


Jun 16 15:47:06 west-04 qarshd[17048]: Running cmdline: /sbin/iptables -D OUTPUT -m state --state NEW --proto udp --dport 5405 -j DROP

Jun 16 15:47:06 west-04 qarshd[17050]: Running cmdline: /sbin/iptables -D OUTPUT -m state --state NEW --proto udp --dport 5404 -j DROP


Cluster membership looks fine:

Jun 16 15:47:06 west-04 corosync[16944]:   [QUORUM] Members[1]: 4
Jun 16 15:47:06 west-04 corosync[16944]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 16 15:47:06 west-04 corosync[16944]:   [QUORUM] Members[2]: 4 5
Jun 16 15:47:06 west-04 corosync[16944]:   [QUORUM] Members[2]: 4 5
Jun 16 15:47:06 west-04 corosync[16944]:   [CMAN  ] quorum regained, resuming activity
Jun 16 15:47:06 west-04 corosync[16944]:   [QUORUM] This node is within the primary component and will provide service.
Jun 16 15:47:06 west-04 corosync[16944]:   [QUORUM] Members[3]: 4 5 6
Jun 16 15:47:06 west-04 corosync[16944]:   [QUORUM] Members[3]: 4 5 6
Jun 16 15:47:06 west-04 corosync[16944]:   [QUORUM] Members[4]: 4 5 6 7
Jun 16 15:47:06 west-04 corosync[16944]:   [QUORUM] Members[4]: 4 5 6 7
Jun 16 15:47:06 west-04 corosync[16944]:   [QUORUM] Members[5]: 4 5 6 7 8
Jun 16 15:47:06 west-04 corosync[16944]:   [QUORUM] Members[5]: 4 5 6 7 8
Jun 16 15:47:06 west-04 corosync[16944]:   [MAIN  ] Completed service synchronization, ready to provide service.

Jun 16 15:47:05 west-05 corosync[15640]:   [QUORUM] Members[4]: 5 6 7 8
Jun 16 15:47:05 west-05 corosync[15640]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 16 15:47:05 west-05 corosync[15640]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 16 15:47:05 west-05 fenced[15660]: fencing node west-04
Jun 16 15:47:05 west-05 fenced[15660]: fence west-04 success
Jun 16 15:47:06 west-05 corosync[15640]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 16 15:47:06 west-05 corosync[15640]:   [QUORUM] Members[5]: 4 5 6 7 8
Jun 16 15:47:06 west-05 corosync[15640]:   [QUORUM] Members[5]: 4 5 6 7 8
Jun 16 15:47:06 west-05 corosync[15640]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 16 15:47:06 west-05 fenced[15660]: cpg_mcast_joined error 12 handle 1d4ed43b00000001 start

Jun 16 15:47:05 west-06 corosync[10945]:   [QUORUM] Members[4]: 5 6 7 8
Jun 16 15:47:05 west-06 corosync[10945]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 16 15:47:05 west-06 corosync[10945]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 16 15:47:05 west-06 fenced[10965]: fencing deferred to west-05
Jun 16 15:47:06 west-06 corosync[10945]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 16 15:47:06 west-06 corosync[10945]:   [QUORUM] Members[5]: 4 5 6 7 8
Jun 16 15:47:06 west-06 corosync[10945]:   [QUORUM] Members[5]: 4 5 6 7 8
Jun 16 15:47:06 west-06 corosync[10945]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 16 15:47:06 west-06 fenced[10965]: cpg_mcast_joined error 12 handle 1d4ed43b00000001 start

Jun 16 15:47:05 west-07 corosync[31744]:   [QUORUM] Members[4]: 5 6 7 8
Jun 16 15:47:05 west-07 corosync[31744]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 16 15:47:05 west-07 corosync[31744]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 16 15:47:05 west-07 fenced[31763]: fencing deferred to west-05
Jun 16 15:47:06 west-07 corosync[31744]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 16 15:47:06 west-07 corosync[31744]:   [QUORUM] Members[5]: 4 5 6 7 8
Jun 16 15:47:06 west-07 corosync[31744]:   [QUORUM] Members[5]: 4 5 6 7 8
Jun 16 15:47:06 west-07 corosync[31744]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 16 15:47:06 west-07 fenced[31763]: cpg_mcast_joined error 12 handle 1d4ed43b00000001 start

Jun 16 15:47:05 west-08 corosync[10623]:   [QUORUM] Members[4]: 5 6 7 8
Jun 16 15:47:05 west-08 corosync[10623]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 16 15:47:05 west-08 corosync[10623]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 16 15:47:05 west-08 fenced[10643]: fencing deferred to west-05
Jun 16 15:47:06 west-08 corosync[10623]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 16 15:47:06 west-08 corosync[10623]:   [QUORUM] Members[5]: 4 5 6 7 8
Jun 16 15:47:06 west-08 corosync[10623]:   [QUORUM] Members[5]: 4 5 6 7 8
Jun 16 15:47:06 west-08 corosync[10623]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 16 15:47:06 west-08 fenced[10643]: cpg_mcast_joined error 12 handle 1d4ed43b00000001 start


cpg membership is wrong:

[root@west-05 ~]# cat fenced.txt | grep 'fenced:'
1276717590 cpg_join fenced:daemon ...
1276717590 fenced:daemon conf 2 1 0 memb 4 5 join 5 left
1276717590 fenced:daemon ring 4:3652 5 memb 4 5 6 7 8
1276717590 cpg_join fenced:default ...
1276717590 fenced:daemon conf 3 1 0 memb 4 5 6 join 6 left
1276717590 fenced:daemon conf 4 1 0 memb 4 5 6 8 join 8 left
1276717590 fenced:daemon conf 5 1 0 memb 4 5 6 7 8 join 7 left
1276717590 fenced:default ring 4:3652 5 memb 4 5 6 7 8
1276717590 fenced:default conf 2 1 0 memb 4 5 join 5 left
1276717591 fenced:default conf 3 1 0 memb 4 5 7 join 7 left
1276717591 fenced:default conf 4 1 0 memb 4 5 6 7 join 6 left
1276717591 fenced:default conf 5 1 0 memb 4 5 6 7 8 join 8 left
1276717625 fenced:daemon conf 4 0 1 memb 5 6 7 8 join left 4
1276717625 fenced:default conf 4 0 1 memb 5 6 7 8 join left 4
1276717625 fenced:daemon ring 5:3656 4 memb 5 6 7 8
1276717625 fenced:default ring 5:3656 4 memb 5 6 7 8
1276717626 fenced:daemon conf 3 0 1 memb 6 7 8 join left 5
1276717626 fenced:default conf 3 0 1 memb 6 7 8 join left 5
1276717626 fenced:daemon ring 4:3664 5 memb 4 5 6 7 8
1276717626 fenced:default ring 4:3664 5 memb 4 5 6 7 8
1276718259 fenced:daemon ring 5:3668 4 memb 5 6 7 8

m=4,5,6,7,8
-4
-5

[root@west-06 ~]# cat fenced.txt | grep 'fenced:'
1276717590 cpg_join fenced:daemon ...
1276717590 fenced:daemon conf 3 1 0 memb 4 5 6 join 6 left
1276717590 fenced:daemon ring 4:3652 5 memb 4 5 6 7 8
1276717590 fenced:daemon conf 4 1 0 memb 4 5 6 8 join 8 left
1276717590 fenced:daemon conf 5 1 0 memb 4 5 6 7 8 join 7 left
1276717591 cpg_join fenced:default ...
1276717591 fenced:default conf 4 1 0 memb 4 5 6 7 join 6 left
1276717591 fenced:default ring 4:3652 5 memb 4 5 6 7 8
1276717591 fenced:default conf 5 1 0 memb 4 5 6 7 8 join 8 left
1276717625 fenced:daemon conf 4 0 1 memb 5 6 7 8 join left 4
1276717625 fenced:default conf 4 0 1 memb 5 6 7 8 join left 4
1276717625 fenced:daemon ring 5:3656 4 memb 5 6 7 8
1276717625 fenced:default ring 5:3656 4 memb 5 6 7 8
1276717626 fenced:daemon conf 3 0 1 memb 6 7 8 join left 5
1276717626 fenced:daemon conf 2 0 1 memb 7 8 join left 6
1276717626 fenced:default conf 3 0 1 memb 6 7 8 join left 5
1276717626 fenced:default conf 2 0 1 memb 7 8 join left 6
1276717626 fenced:daemon ring 4:3664 5 memb 4 5 6 7 8
1276717626 fenced:default ring 4:3664 5 memb 4 5 6 7 8
1276718259 fenced:default ring 5:3668 4 memb 5 6 7 8
1276718259 fenced:daemon ring 5:3668 4 memb 5 6 7 8

m=4,5,6,7,8
-4
-5
-6

[root@west-07 ~]# cat fenced.txt | grep 'fenced:'
1276717590 cpg_join fenced:daemon ...
1276717590 fenced:daemon conf 5 1 0 memb 4 5 6 7 8 join 7 left
1276717590 fenced:daemon ring 4:3652 5 memb 4 5 6 7 8
1276717591 cpg_join fenced:default ...
1276717591 fenced:default conf 3 1 0 memb 4 5 7 join 7 left
1276717591 fenced:default ring 4:3652 5 memb 4 5 6 7 8
1276717591 fenced:default conf 4 1 0 memb 4 5 6 7 join 6 left
1276717591 fenced:default conf 5 1 0 memb 4 5 6 7 8 join 8 left
1276717625 fenced:daemon conf 4 0 1 memb 5 6 7 8 join left 4
1276717625 fenced:default conf 4 0 1 memb 5 6 7 8 join left 4
1276717625 fenced:daemon ring 5:3656 4 memb 5 6 7 8
1276717625 fenced:default ring 5:3656 4 memb 5 6 7 8
1276717626 fenced:daemon conf 3 0 1 memb 6 7 8 join left 5
1276717626 fenced:daemon conf 2 0 1 memb 7 8 join left 6
1276717626 fenced:daemon conf 1 0 1 memb 8 join left 7
1276717626 fenced:default conf 3 0 1 memb 6 7 8 join left 5
1276717626 fenced:default conf 2 0 1 memb 7 8 join left 6
1276717626 fenced:default conf 1 0 1 memb 8 join left 7
1276717626 fenced:daemon ring 4:3664 5 memb 4 5 6 7 8
1276717626 fenced:default ring 4:3664 5 memb 4 5 6 7 8
1276718259 fenced:daemon ring 5:3668 4 memb 5 6 7 8
1276718259 fenced:default ring 5:3668 4 memb 5 6 7 8

m=4,5,6,7,8
-4
-5
-6
-7


[root@west-08 ~]# cat fenced.txt | grep 'fenced:'
1276717590 cpg_join fenced:daemon ...
1276717590 fenced:daemon conf 4 1 0 memb 4 5 6 8 join 8 left
1276717590 fenced:daemon ring 4:3652 5 memb 4 5 6 7 8
1276717590 fenced:daemon conf 5 1 0 memb 4 5 6 7 8 join 7 left
1276717591 cpg_join fenced:default ...
1276717591 fenced:default conf 5 1 0 memb 4 5 6 7 8 join 8 left
1276717591 fenced:default ring 4:3652 5 memb 4 5 6 7 8
1276717625 fenced:daemon conf 4 0 1 memb 5 6 7 8 join left 4
1276717625 fenced:default conf 4 0 1 memb 5 6 7 8 join left 4
1276717625 fenced:default ring 5:3656 4 memb 5 6 7 8
1276717625 fenced:daemon ring 5:3656 4 memb 5 6 7 8
1276717626 fenced:daemon conf 3 0 1 memb 6 7 8 join left 5
1276717626 fenced:daemon conf 2 0 1 memb 7 8 join left 6
1276717626 fenced:daemon conf 1 0 1 memb 8 join left 7
1276717626 fenced:daemon conf 0 0 1 memb join left 8
1276717626 fenced:default conf 3 0 1 memb 6 7 8 join left 5
1276717626 fenced:default conf 2 0 1 memb 7 8 join left 6
1276717626 fenced:default conf 1 0 1 memb 8 join left 7
1276717626 fenced:default conf 0 0 1 memb join left 8
1276717626 fenced:daemon ring 4:3664 5 memb 4 5 6 7 8
1276717626 fenced:default ring 4:3664 5 memb 4 5 6 7 8
1276718259 fenced:daemon ring 5:3668 4 memb 5 6 7 8
1276718259 fenced:default ring 5:3668 4 memb 5 6 7 8

m=4,5,6,7,8
-4
-5
-6
-7
-8


Version-Release number of selected component (if applicable):

# rpm -q corosync
corosync-1.2.3-1.el6.x86_64

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
Comment 2 RHEL Product and Program Management 2010-06-17 12:13:23 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 3 Steven Dake 2010-06-17 13:45:55 EDT
Angus,

Is this fixed upstream by your recent cpg work?
Comment 4 Angus Salkeld 2010-06-17 22:21:24 EDT
Nice test name :)

I suspect it is, but it is not super clear to me what is happening above.
We might still be sensitive to removing the lowest nodeid on flatiron.

David, can you confirm my understanding of the output/problem below?
1) what is "fenced:daemon" and "fenced:default"?
2) what message is coming from what node?

<all joined, 4 leaving> - {all ok}
1276717625 fenced:daemon conf 4 0 1 memb 5 6 7 8 join left 4
1276717625 fenced:default conf 4 0 1 memb 5 6 7 8 join left 4
1276717625 fenced:daemon ring 5:3656 4 memb 5 6 7 8
1276717625 fenced:default ring 5:3656 4 memb 5 6 7 8

<5 leaving> - {all ok}
1276717626 fenced:daemon conf 3 0 1 memb 6 7 8 join left 5
1276717626 fenced:default conf 3 0 1 memb 6 7 8 join left 5

Problem:
node 4 thinks there are still 5 members? {should be 1 - node 4 is isolated}
node 5 thinks there are still 4 members? {should be 1 - node 5 is isolated}

1276717626 fenced:daemon ring 4:3664 5 memb 4 5 6 7 8
1276717626 fenced:default ring 4:3664 5 memb 4 5 6 7 8
1276718259 fenced:daemon ring 5:3668 4 memb 5 6 7 8

m=4,5,6,7,8
-4
-5

Angus
Comment 5 David Teigland 2010-06-18 11:13:34 EDT
fenced:daemon and fenced:default are the names of two cpgs used by fenced.
fenced:foo conf ... is the confchg callback for cpg fenced:foo.
fenced:foo ring ... is the ringid callback for cpg fenced:foo.
There are no messages in the logs I pasted.

conf <memb_count> <join_count> <left_count> memb <memb_ids> join <join_ids> left <left_ids>

ring <ringid> <memb_count> memb <memb_ids>

Only node 4 was partitioned/merged.  The correct output on nodes 5-8 would be:

1276717591 fenced:default conf 5 1 0 memb 4 5 6 7 8 join 8 left (starting state)
1276717625 fenced:default conf 4 0 1 memb 5 6 7 8 join left 4   (-4, partition)
1276717626 fenced:default conf 5 1 0 memb 4 5 6 7 8 join 4 left (+4, merge)

and at this point nodes 5-8 would kill node 4, so they'd see another:
?????????? fenced:default conf 4 0 1 memb 5 6 7 8 join left 4   (-4, killed) 

The ringid callbacks appear to be fine except that node 5 is missing:
1276718259 fenced:default ring 5:3668 4 memb 5 6 7 8
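
As an aid to reading the dumps above, here is a small, hypothetical Python sketch (not part of fenced or corosync) that parses these conf/ring debug lines according to the field layout described in this comment; the record and function names are invented for illustration.

# Hypothetical parser for the fenced debug lines quoted in this bug,
# following the conf/ring field layout described in comment 5.
from collections import namedtuple

ConfChg = namedtuple("ConfChg", "ts cpg memb joined left")
RingId  = namedtuple("RingId",  "ts cpg ringid memb")

def parse_fenced_line(line):
    """Return a ConfChg or RingId record, or None for unrelated lines."""
    fields = line.split()
    if len(fields) < 4 or not fields[1].startswith("fenced:"):
        return None
    ts, cpg, kind, rest = int(fields[0]), fields[1], fields[2], fields[3:]
    if kind == "conf":
        # conf <memb_count> <join_count> <left_count> memb <ids> join <ids> left <ids>
        memb_n, join_n, left_n = (int(x) for x in rest[0:3])
        ids = rest[3:]                      # 'memb', ids..., 'join', ids..., 'left', ids...
        j, l = ids.index("join"), ids.index("left")
        memb   = [int(x) for x in ids[1:j]]
        joined = [int(x) for x in ids[j + 1:l]]
        left   = [int(x) for x in ids[l + 1:]]
        assert (len(memb), len(joined), len(left)) == (memb_n, join_n, left_n)
        return ConfChg(ts, cpg, memb, joined, left)
    if kind == "ring":
        # ring <ringid> <memb_count> memb <ids>
        memb = [int(x) for x in rest[3:]]
        assert len(memb) == int(rest[1])
        return RingId(ts, cpg, rest[0], memb)
    return None

# Example: the two callback types seen on west-05 around the partition.
print(parse_fenced_line("1276717625 fenced:daemon conf 4 0 1 memb 5 6 7 8 join left 4"))
print(parse_fenced_line("1276717626 fenced:daemon ring 4:3664 5 memb 4 5 6 7 8"))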
Comment 6 Steven Dake 2010-06-19 21:20:43 EDT
Is there a test case for this problem from QE?
Comment 7 Nate Straz 2010-06-21 09:38:00 EDT
Yes, laryngitis reproduces this easily.
Comment 9 Steven Dake 2010-06-22 14:44:46 EDT
Nate reproduced with corosync-1.2.3-4.el6 using laryngitis.  This version has all the fixes for the cpg issues we are currently aware of.

Angus, can you look into fixing this in the laryngitis environment?
Comment 16 Steven Dake 2010-07-02 02:05:20 EDT
When isolating a node via iptables for more than 30 seconds and then removing that isolation, the totem membership becomes unstable 15% of the time.

token = 10 sec
consensus = 20 sec

boundary = 30 seconds (token+consensus)
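
For context, a 10-second token timeout and a 20-second consensus timeout would correspond to a totem stanza along these lines in corosync.conf (both values are given in milliseconds). This is an illustrative sketch only, not the configuration shipped in the attached reproducer.

totem {
        version: 2
        # token loss is declared after 10 seconds without the token
        token: 10000
        # a new membership must reach consensus within 20 seconds of token loss
        consensus: 20000
}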
Comment 18 Steven Dake 2010-07-03 15:29:44 EDT
Based upon Nate's test, I've distilled this down to a faster reproducer with only corosync (and attached the code).  To reproduce, run a 3-node cluster with the attached reproduce.tar.gz config file.  On one of the nodes, run the loop script in the reproducer directory.  This script loops, isolating and then removing the isolation from one of the nodes.  On the remaining two nodes, apply the "force-exit.patch", which triggers corosync to exit when the configuration melts down.

It takes about 45 seconds to 1 minute to trigger a meltdown.
Comment 19 Steven Dake 2010-07-03 15:31:56 EDT
Created attachment 429290 [details]
reproducer for corosync
Comment 20 Steven Dake 2010-07-03 16:07:50 EDT
Based upon instrumentation of the membership protocol, the Totem specification that corosync implements is in error for a particular case of its membership algorithm, specifically page 17, figure 4:

"
Join message from processor q received:
do some work to process the join mesage
...
if my_id in message.fail_set then
  add message.sender_id to my_fail_set
else 
  merge message.fail_set into my_fail_set


The purpose of this code is to allow two processors to see each other as faulty and form singleton rings.  Unfortunately, the specification does not account for the fact that there may be other members that receive the same message and execute the else branch when the fail set is in fact invalid for the current membership.  This doesn't occur all the time because the if branch is sometimes executed first.  Since there is no synchrony in the membership algorithm, different timings can expose this problem in the protocol.

Here is an example:
partition 1 my_proc_list = A
partition 2 my_proc_list = B,C

partition 2 is in gather state
partition 1 is in gather state
isolation removed
B sends join message with proc set=A,B,C, fail set=A
A receives the join message.  my_id is in the fail set, so A adds the sender B to its fail set (i.e. executes the if branch).
C receives the join message (executes the else branch, merging A into its fail set).
C and B agree on consensus and form a configuration.
Sometime before A receives a join message from C (which would put it in consensus and trigger it to start a new membership round):
A sends a join message with proc set=A,B,C fail set=B.
C receives this message (while in the operational state).
C adds B (which it is happily communicating with at the moment) to its fail set (the else/merge branch of the conditional).
C sends a join message to A,B,C with proc set=A,B,C fail set=B.
A receives fail set=B, adding B to its fail set.
B receives C's join message (proc=A,B,C fail=B) and adds C to its fail set (if branch).
A sends a join message (proc=A,B,C fail=B).
B receives A's join message, which adds A to B's fail set (if branch).

At this point, every processor ends up in a singleton ring, and further confusion follows in the totem membership protocol.

The proper solution is to use the last known good membership when making a decision to add a failed processor to the failed list.
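
To make the race easier to follow, here is a toy Python replay of the exchange described above, applying the quoted rule verbatim.  The class and method names are invented for illustration; this is not the corosync totemsrp code or the attached flatiron patch, which instead consults the last known good membership before accepting such failure reports.

# Toy model of the join-message fail-set rule quoted from the specification.
class Proc:
    def __init__(self, name):
        self.name = name
        self.fail_set = set()

    def on_join(self, sender, msg_fail_set):
        if self.name in msg_fail_set:
            self.fail_set.add(sender)           # "if" branch: blame the sender
        else:
            self.fail_set |= msg_fail_set       # "else" branch: merge, with no
                                                # check against current membership

A, B, C = Proc("A"), Proc("B"), Proc("C")

# Partition 1 = {A}, partition 2 = {B, C}; both in gather; isolation removed.
# B sends a join with fail set {A}; A and C receive it.
A.on_join("B", {"A"})      # A is named in the fail set -> A blames sender B
C.on_join("B", {"A"})      # C merges {A}
# B and C reach consensus and form ring {B, C}.  Meanwhile A, still gathering,
# sends a join with fail set {B}.
C.on_join("A", {"B"})      # C, now operational, blames its live partner B
# C answers with its own join carrying fail set {B}; A and B receive it.
A.on_join("C", {"B"})
B.on_join("C", {"B"})      # B is named -> B blames sender C
# A's join with fail set {B} also reaches B.
B.on_join("A", {"B"})      # B is named -> B blames sender A

for p in (A, B, C):
    print(p.name, sorted(p.fail_set))
# A ['B']
# B ['A', 'C']
# C ['A', 'B']
# The fail sets are mutually inconsistent, so each node falls back toward a
# singleton ring, which is the meltdown observed by the reproducer.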
Comment 23 Steven Dake 2010-07-03 18:02:04 EDT
Created attachment 429298 [details]
flatiron revision 2985 to fix the problem
Comment 24 David Teigland 2010-07-06 12:17:19 EDT
I'm trying to understand how the bug described in comments 18-23 relates to what I first reported.  In the original description, the totem membership appeared to be ok (see [QUORUM] and ringid messages), but the cpg membership was bad.  But maybe the appearance is misleading?  The most recent tests don't include cpg, and only focus on totem errors.  Are there totem errors in the original bug that just aren't visible in the log messages?  I'm guessing it was the test in comment 19 that ran for 12 hours?  Have we tried the original test to verify that's fixed also?
Comment 25 Steven Dake 2010-07-06 15:19:02 EDT
Angus tried laryngitis for 100 iterations and it passed (previously it failed at 5-7 iterations).  Yes dct, after investigation with laryngitis, we found a condition where totem entered a singleton node configuration, which broke the rest of the stack.  Perhaps cpg should also behave better in this condition.
Comment 28 Nate Straz 2010-08-24 15:10:06 EDT
Ran through laryngitis just fine with corosync-1.2.3-19.el6.
Comment 29 releng-rhel@redhat.com 2010-11-10 17:07:11 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
Comment 30 Douglas Silas 2011-01-11 18:12:59 EST
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Isolating a node and then removing the isolation potentially resulted in membership misbehavior.
