Bug 472786

Summary: cluster view inconsistent after "service cman stop; service cman start"
Product: Red Hat Enterprise Linux 5
Reporter: Nate Straz <nstraz>
Component: cman
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: urgent
Version: 5.3
CC: cfeist, cluster-maint, cward, edamato, jplans, matt, mrappa, rlerch, tao
Target Milestone: rc
Keywords: ZStream
Hardware: All
OS: Linux
Fixed In Version: cman-2.0.100-1.el5
Doc Type: Bug Fix
Doc Text:
Cause: When a node leaves the cluster normally, it sends a message to the other nodes, which set its state to LEAVING. Only when the node actually disappears from openais is its state set to DOWN.
Consequence: If the node is restarted quickly, the node-UP message arrives before the expected node-DOWN message (which is cancelled). But cman only looks for DOWN nodes when marking nodes as up again, so the node appears to stay DOWN.
Fix: The check for a node transitioning to the UP state now matches nodes in the LEAVING state as well as the DOWN state.
Result: Quickly restarting a node with "cman_tool leave; cman_tool join" correctly updates the node state in cman.
Last Closed: 2009-09-02 11:06:09 UTC
Bug Blocks: 510510
Attachments: /var/log/messages from all marathon nodes

Description Nate Straz 2008-11-24 16:38:31 UTC
Created attachment 324502
/var/log/messages from all marathon nodes

Description of problem:

After quickly stopping and starting the cman service on one node in a cluster, the cluster membership becomes inconsistent across the cluster.

Version-Release number of selected component (if applicable):
cman-2.0.95-1.el5
openais-0.80.3-19.el5


How reproducible:
100%

Steps to Reproduce:
1. Start a cluster
2. On one node run `service cman stop && service cman start`
3. Check cman_tool nodes on all cluster nodes
  
Actual results:

After running the command on marathon-01, marathon-03, and marathon-05 in sequence, `cman_tool nodes` showed:

marathon-01:
Node  Sts   Inc   Joined               Name
   1   M   5336   2008-11-24 09:47:33  marathon-01
   2   M   5340   2008-11-24 09:47:33  marathon-02
   3   X   5340                        marathon-03
   4   M   5340   2008-11-24 09:47:33  marathon-04
   5   X   5340                        marathon-05
marathon-02:
Node  Sts   Inc   Joined               Name
   1   X   5332                        marathon-01
   2   M   5312   2008-11-24 09:33:13  marathon-02
   3   X   5324                        marathon-03
   4   M   5328   2008-11-24 09:33:14  marathon-04
   5   X   5320                        marathon-05
marathon-03:
Node  Sts   Inc   Joined               Name
   1   M   5348   2008-11-24 10:12:44  marathon-01
   2   M   5348   2008-11-24 10:12:44  marathon-02
   3   M   5344   2008-11-24 10:12:44  marathon-03
   4   M   5348   2008-11-24 10:12:44  marathon-04
   5   X   5348                        marathon-05
marathon-04:
Node  Sts   Inc   Joined               Name
   1   X   5332                        marathon-01
   2   M   5328   2008-11-24 09:33:18  marathon-02
   3   X   5328                        marathon-03
   4   M   5316   2008-11-24 09:33:17  marathon-04
   5   X   5328                        marathon-05
marathon-05:
Node  Sts   Inc   Joined               Name
   1   M   5356   2008-11-24 10:14:55  marathon-01
   2   M   5356   2008-11-24 10:14:55  marathon-02
   3   M   5356   2008-11-24 10:14:55  marathon-03
   4   M   5356   2008-11-24 10:14:55  marathon-04
   5   M   5352   2008-11-24 10:14:54  marathon-05


The corresponding `cman_tool status` output from each node:
marathon-01:
Version: 6.1.0
Config Version: 1
Cluster Name: marathon
Cluster Id: 27036
Cluster Member: Yes
Cluster Generation: 5356
Membership state: Cluster-Member
Nodes: 5
Expected votes: 5
Total votes: 3
Quorum: 3  
Active subsystems: 7
Flags: Dirty 
Ports Bound: 0  
Node name: marathon-01
Node ID: 1
Multicast addresses: 239.192.105.6 
Node addresses: 10.15.89.71 

marathon-02:
Version: 6.1.0
Config Version: 1
Cluster Name: marathon
Cluster Id: 27036
Cluster Member: Yes
Cluster Generation: 5356
Membership state: Cluster-Member
Nodes: 5
Expected votes: 5
Total votes: 2
Quorum: 3 Activity blocked
Active subsystems: 7
Flags: Dirty 
Ports Bound: 0  
Node name: marathon-02
Node ID: 2
Multicast addresses: 239.192.105.6 
Node addresses: 10.15.89.72 

marathon-03:
Version: 6.1.0
Config Version: 1
Cluster Name: marathon
Cluster Id: 27036
Cluster Member: Yes
Cluster Generation: 5356
Membership state: Cluster-Member
Nodes: 5
Expected votes: 5
Total votes: 4
Quorum: 3  
Active subsystems: 7
Flags: Dirty 
Ports Bound: 0  
Node name: marathon-03
Node ID: 3
Multicast addresses: 239.192.105.6 
Node addresses: 10.15.89.73 

marathon-04:
Version: 6.1.0
Config Version: 1
Cluster Name: marathon
Cluster Id: 27036
Cluster Member: Yes
Cluster Generation: 5356
Membership state: Cluster-Member
Nodes: 5
Expected votes: 5
Total votes: 2
Quorum: 3 Activity blocked
Active subsystems: 7
Flags: Dirty 
Ports Bound: 0  
Node name: marathon-04
Node ID: 4
Multicast addresses: 239.192.105.6 
Node addresses: 10.15.89.74 

marathon-05:
Version: 6.1.0
Config Version: 1
Cluster Name: marathon
Cluster Id: 27036
Cluster Member: Yes
Cluster Generation: 5356
Membership state: Cluster-Member
Nodes: 5
Expected votes: 5
Total votes: 5
Quorum: 3  
Active subsystems: 7
Flags: Dirty 
Ports Bound: 0  
Node name: marathon-05
Node ID: 5
Multicast addresses: 239.192.105.6 
Node addresses: 10.15.89.75 


Expected results:
Cluster membership should remain consistent across all nodes, even after a quick stop and start of the cman service.

Additional info:

Comment 1 Nate Straz 2008-11-25 17:02:35 UTC
I was able to reproduce this easily on RHEL 5.2. You can use `service cman restart` instead of the compound command.

Comment 2 Christine Caulfield 2008-12-03 10:50:19 UTC
This bug was fixed quite some time ago in STABLE2, but the patch never made it into RHEL. It has now:

commit 6325f9d1d135d2a86974a3ffc36f62d0693080d1
Author: Christine Caulfield <ccaulfie>
Date:   Wed Dec 3 10:46:20 2008 +0000

    cman: Fix inconsistent state if a node leaves/joins quickly

This is in git for 5.4. Do we want a 5.3 patch too? It's not on the blocker list, but it's a pretty stupid bug with a small fix.

Comment 4 Matthew Kent 2009-01-27 17:16:04 UTC
Glad I found this to verify I wasn't going crazy. Two days of fighting with this.

I'm seeing the same issue as described here and can reproduce it in my simple 4-node cluster. The twist I'm seeing is what happens when you add clvmd via

service clvmd stop; service cman stop; service cman start; service clvmd start

and it becomes a real mess.

The node that was removed and rejoined looks good:

Node  Sts   Inc   Joined               Name
   1   M   1344   2009-01-27 08:30:24  www
   2   M   1352   2009-01-27 08:30:27  xxx
   3   M   1348   2009-01-27 08:30:26  yyy
   4   M   1340   2009-01-27 08:30:24  zzz

but the three remaining nodes, not so much:

Node  Sts   Inc   Joined               Name
   1   M   1340   2009-01-27 08:30:21  www
   2   M   1352   2009-01-27 08:30:27  xxx
   3   M   1348   2009-01-27 08:30:26  yyy
   4   X   1344                        zzz

one of them saying:

Jan 27 08:31:40 www kernel: [  236.425837] dlm: connect from non cluster node

dlm_send then pegs the CPU on the three remaining nodes:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9500 root      20  -5     0    0    0 R  100  0.0   4:39.65 dlm_send

Task list:

dlm_send      R  running task       0  9317    175          9318  9316 (L-TLB)
dlm_recoverd  D ffff81000102df80     0  9318    175                9317 (L-TLB)
 ffff81045c51ddd0 0000000000000046 ffff81000102f5a0 ffff81045c18f580
 ffff81000001dc00 000000000000000a ffff810451d89860 ffff81010f71c100
 00000096af1198d5 0000000000000181 ffff810451d89a48 0000000500000000
Call Trace:
 [<ffffffff8843ccb0>] :dlm:rcom_response+0x0/0xb
 [<ffffffff8843dd48>] :dlm:dlm_wait_function+0xdc/0x135
 [<ffffffff8009db21>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8843d59f>] :dlm:dlm_rcom_status+0xa4/0x179
 [<ffffffff884399a2>] :dlm:dlm_recover_members+0x36d/0x45c
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8843e817>] :dlm:dlm_recoverd+0x11d/0x47f
 [<ffffffff8843e6fa>] :dlm:dlm_recoverd+0x0/0x47f
 [<ffffffff80032360>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032262>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

and after about 9 minutes these three remaining nodes panic when the oom-killer starts a massacre:

automount invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0

Call Trace:
 [<ffffffff800c39dd>] out_of_memory+0x8e/0x2f5
 [<ffffffff8000f2eb>] __alloc_pages+0x245/0x2ce
 [<ffffffff8003213d>] read_swap_cache_async+0x45/0xd8
 [<ffffffff800c9472>] swapin_readahead+0x60/0xd3
 [<ffffffff80009027>] __handle_mm_fault+0x9bc/0xe5c
 [<ffffffff80066b9a>] do_page_fault+0x4cb/0x830
 [<ffffffff80030d9a>] do_fork+0x148/0x1c1
 [<ffffffff8005dde9>] error_exit+0x0/0x84
etc.


The patch fixes the issue; I suggest updating ASAP so no one has to endure my pain :)

Comment 7 Christine Caulfield 2009-05-19 06:57:39 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: When a node leaves the cluster normally, it sends a message to the other nodes, which set its state to LEAVING. Only when the node actually disappears from openais is its state set to DOWN.

Consequence: If the node is restarted quickly, the node-UP message arrives before the expected node-DOWN message (which is cancelled). But cman only looks for DOWN nodes when marking nodes as up again, so the node appears to stay DOWN.

Fix: The check for a node transitioning to the UP state now matches nodes in the LEAVING state as well as the DOWN state.

Result: Quickly restarting a node with `cman_tool leave; cman_tool join` correctly updates the node state in cman.
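
To illustrate the race described above, here is a minimal, self-contained C sketch of the widened state check. All names here (node_state_t, cluster_node, node_up) are hypothetical, chosen only for this example, and are not taken from the actual cman source:

/*
 * Sketch of the node-UP handling described in this comment.
 * Hypothetical names; NOT the real cman code.
 */
#include <stdio.h>

typedef enum { NODE_DOWN, NODE_LEAVING, NODE_MEMBER } node_state_t;

struct cluster_node {
    int id;
    node_state_t state;
};

/* Handler for an incoming node-UP (join) message. */
static void node_up(struct cluster_node *node)
{
    /*
     * Buggy version:   if (node->state == NODE_DOWN)
     * If the node rejoins before its LEAVING -> DOWN transition has
     * completed (the pending DOWN message was cancelled), the UP
     * message is ignored and the node stays marked "X".
     *
     * Fixed version: treat LEAVING the same as DOWN.
     */
    if (node->state == NODE_DOWN || node->state == NODE_LEAVING)
        node->state = NODE_MEMBER;
}

int main(void)
{
    struct cluster_node n = { 3, NODE_MEMBER };

    n.state = NODE_LEAVING;  /* "cman_tool leave" broadcast received */
    node_up(&n);             /* quick rejoin: UP arrives before DOWN */

    printf("node %d: %s\n", n.id, n.state == NODE_MEMBER ? "M" : "X");
    return 0;
}

With only the DOWN test, the quick leave/join sequence leaves the node stuck in LEAVING, which is why it kept showing as "X" in the cman_tool nodes output above.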

Comment 8 Chris Ward 2009-07-03 18:14:06 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results and, if available, update the Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 12 Nate Straz 2009-07-24 20:29:22 UTC
Verified with cman-2.0.110-1.el5 on i386 using the new test case hokeypokey. I made it through 326 restarts before stopping the test. We will include 20 iterations of hokeypokey in future regression testing.

Comment 15 errata-xmlrpc 2009-09-02 11:06:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html