
Bug 373711

Summary: recovery "stuck" on 3 remaining nodes after fourth node is shot.
Product: [Retired] Red Hat Cluster Suite
Component: cman-kernel
Version: 4
Hardware: ia64
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: medium
Reporter: Dean Jansa <djansa>
Assignee: Christine Caulfield <ccaulfie>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Last Closed: 2007-11-14 14:18:05 UTC
Attachments:
Description      Flags
link-13 stack    none
link-14 stack    none
link-15 stack    none
link-16 stack    none

Description Dean Jansa 2007-11-09 20:30:37 UTC
Description of problem:

Four node cluster: link-{13,14,15,16}
link-13 is shot during recovery testing; the remaining 3 nodes never complete
recovery.  Dave dug about and found that link-{14,15,16} are all waiting for a
barrier to complete, which isn't happening for some unknown reason.


Version-Release number of selected component (if applicable):

2.6.9-55.0.12.ELlargesmp
cman-kernel-largesmp-2.6.9-50.2.0.6
dlm-kernel-largesmp-2.6.9-46.16.0.12



How reproducible:

Haven't tried at this point.


Steps to Reproduce:
1.  Run revolver with a single GFS filesystem on a 4-node cluster.
 
----------------------------
[root@link-13 ~]# cat /proc/cluster/services 
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 join      S-4,4,1
[2 3 4 1]

DLM Lock Space:  "clvmd"                             4   3 join      S-4,4,1
[2 3 4 1]

User:            "usrm::manager"                    11   4 run       -
[1]


---------
[root@link-14 ~]# cat /proc/cluster/services 
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 run       U-1,10,1
[4 2 3]

DLM Lock Space:  "clvmd"                             4   4 run       U-1,10,1
[4 2 3]

DLM Lock Space:  "link_ia640"                        5   5 run       -
[4 2 3]

DLM Lock Space:  "link_ia641"                        7   7 run       -
[4 2 3]

DLM Lock Space:  "link_ia642"                        9   9 recover 4 -
[4 2 3]

GFS Mount Group: "link_ia640"                        6   6 recover 0 -
[4 2 3]

GFS Mount Group: "link_ia641"                        8   8 recover 0 -
[4 2 3]

GFS Mount Group: "link_ia642"                       10  10 recover 0 -
[4 2 3]

--------------------
[root@link-15 ~]# cat /proc/cluster/services 
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 run       U-1,10,1
[2 4 3]

DLM Lock Space:  "clvmd"                             4   4 run       U-1,10,1
[2 4 3]

DLM Lock Space:  "link_ia640"                        5   5 run       -
[2 4 3]

DLM Lock Space:  "link_ia641"                        7   7 run       -
[2 4 3]

DLM Lock Space:  "link_ia642"                        9   9 recover 4 -
[2 4 3]

GFS Mount Group: "link_ia640"                        6   6 recover 0 -
[2 4 3]

GFS Mount Group: "link_ia641"                        8   8 recover 0 -
[2 4 3]

GFS Mount Group: "link_ia642"                       10  10 recover 0 -
[2 4 3]

--------------------

[root@link-16 ~]# cat /proc/cluster/services 
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 run       U-1,10,1
[2 3 4]

DLM Lock Space:  "clvmd"                             4   4 run       U-1,10,1
[2 3 4]

DLM Lock Space:  "link_ia640"                        5   5 run       -
[2 3 4]

DLM Lock Space:  "link_ia641"                        7   7 run       -
[2 3 4]

DLM Lock Space:  "link_ia642"                        9   9 recover 4 -
[2 3 4]

GFS Mount Group: "link_ia640"                        6   6 recover 0 -
[2 3 4]

GFS Mount Group: "link_ia641"                        8   8 recover 0 -
[2 3 4]

GFS Mount Group: "link_ia642"                       10  10 recover 0 -
[2 3 4]

Comment 1 Dean Jansa 2007-11-09 20:30:37 UTC
Created attachment 253351 [details]
link-13 stack

Comment 2 Dean Jansa 2007-11-09 20:32:36 UTC
Created attachment 253361 [details]
link-14 stack

Comment 3 Dean Jansa 2007-11-09 20:33:20 UTC
Created attachment 253371 [details]
link-15 stack

Comment 4 Dean Jansa 2007-11-09 20:33:47 UTC
Created attachment 253381 [details]
link-16 stack

Comment 5 Christine Caulfield 2007-11-12 09:50:23 UTC
I strongly suspect this is the same as
https://bugzilla.redhat.com/show_bug.cgi?id=299061#c39

Comment 6 Christine Caulfield 2007-11-14 14:18:05 UTC
The above now has its own BZ, so I'll make this a duplicate of that bug. The
patch is ready.

*** This bug has been marked as a duplicate of 373671 ***