Bug 131002

Summary: second and third mount attempts on recovered node hang
Product: [Retired] Red Hat Cluster Suite
Reporter: Corey Marthaler <cmarthal>
Component: gfs
Assignee: David Teigland <teigland>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: GFS Bugs <gfs-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 4
CC: ccaulfie, djuran
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-01-20 20:41:45 UTC

Description Corey Marthaler 2004-08-26 15:54:35 UTC
Description of problem:
After morph-06 panicked, the cluster went through recovery and
continued with I/O. The filesystems continued to be accessible.

I then brought morph-06 back into the cluster and attempted to mount
the 5 filesystems, but after the first filesystem mounted successfully,
the remaining attempts hung.

How reproducible:
Didn't try

Comment 1 Corey Marthaler 2004-08-26 15:55:06 UTC
morph-01:
[root@morph-01 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[3 4 5 6 2 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[3 2 4 5 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[3 4 5 6 2 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[3 4 5 6 2 1]

DLM Lock Space:  "corey2"                            7   8 run       -
[3 4 5 6 2]

DLM Lock Space:  "corey3"                            9  10 run       -
[3 4 5 6 2]

DLM Lock Space:  "corey4"                           11  12 run       -
[3 4 5 6 2]

GFS Mount Group: "corey0"                            4   5 run       -
[3 4 5 6 2 1]

GFS Mount Group: "corey1"                            6   7 update   
U-4,1,1
[3 4 5 6 2 1]

GFS Mount Group: "corey2"                            8   9 run       -
[3 4 5 6 2]

GFS Mount Group: "corey3"                           10  11 run       -
[3 4 5 6 2]

GFS Mount Group: "corey4"                           12  13 run       -
[3 4 5 6 2]


morph-02:
[root@morph-02 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[3 4 5 6 2 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[3 2 4 5 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[3 4 5 6 2 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[3 4 5 6 2 1]

DLM Lock Space:  "corey2"                            7   8 run       -
[3 4 5 6 2]

DLM Lock Space:  "corey3"                            9  10 run       -
[3 4 5 6 2]

DLM Lock Space:  "corey4"                           11  12 run       -
[3 4 5 6 2]

GFS Mount Group: "corey0"                            4   5 run       -
[3 4 5 6 2 1]

GFS Mount Group: "corey1"                            6   7 update   
U-4,1,1
[3 4 5 6 2 1]

GFS Mount Group: "corey2"                            8   9 run       -
[3 4 5 6 2]

GFS Mount Group: "corey3"                           10  11 run       -
[3 4 5 6 2]

GFS Mount Group: "corey4"                           12  13 run       -
[3 4 5 6 2]


morph-03:
[root@morph-03 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[4 3 5 6 2 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[4 2 3 5 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[4 3 5 6 2 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[4 3 5 6 2 1]

DLM Lock Space:  "corey2"                            7   8 run       -
[4 3 5 6 2]

DLM Lock Space:  "corey3"                            9  10 run       -
[4 3 5 6 2]

DLM Lock Space:  "corey4"                           11  12 run       -
[4 3 5 6 2]

GFS Mount Group: "corey0"                            4   5 run       -
[4 3 5 6 2 1]

GFS Mount Group: "corey1"                            6   7 update   
U-4,1,1
[4 3 5 6 2 1]

GFS Mount Group: "corey2"                            8   9 run       -
[4 3 5 6 2]

GFS Mount Group: "corey3"                           10  11 run       -
[4 3 5 6 2]

GFS Mount Group: "corey4"                           12  13 run       -
[4 3 5 6 2]


morph-04:
[root@morph-04 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[5 3 4 6 2 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[5 3 2 4 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[5 3 4 6 2 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[5 3 4 6 2 1]

DLM Lock Space:  "corey2"                            7   8 run       -
[5 3 4 6 2]

DLM Lock Space:  "corey3"                            9  10 run       -
[5 3 4 6 2]

DLM Lock Space:  "corey4"                           11  12 run       -
[5 3 4 6 2]

GFS Mount Group: "corey0"                            4   5 run       -
[5 3 4 6 2 1]

GFS Mount Group: "corey1"                            6   7 update   
U-4,1,1
[5 3 4 6 2 1]

GFS Mount Group: "corey2"                            8   9 run       -
[5 3 4 6 2]

GFS Mount Group: "corey3"                           10  11 run       -
[5 3 4 6 2]

GFS Mount Group: "corey4"                           12  13 run       -
[5 3 4 6 2]


morph-05:
[root@morph-05 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "corey2"                            7   8 run       -
[2 3 4 5 6]

DLM Lock Space:  "corey3"                            9  10 run       -
[2 3 4 5 6]

DLM Lock Space:  "corey4"                           11  12 run       -
[2 3 4 5 6]

GFS Mount Group: "corey0"                            4   5 run       -
[2 3 4 5 6 1]

GFS Mount Group: "corey1"                            6   7 update   
U-4,1,1
[2 3 4 5 6 1]

GFS Mount Group: "corey2"                            8   9 run       -
[2 3 4 5 6]

GFS Mount Group: "corey3"                           10  11 run       -
[2 3 4 5 6]

GFS Mount Group: "corey4"                           12  13 run       -
[2 3 4 5 6]


morph-06:
[root@morph-06 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[2 3 4 5 6 1]

GFS Mount Group: "corey0"                            4   5 run       -
[2 3 4 5 6 1]

GFS Mount Group: "corey1"                            6   7 join     
S-6,20,6
[2 3 4 5 6 1]
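
For triage, a small filter along these lines could pull out the groups that are not in the "run" state from dumps like the ones above. This is only a sketch: stuck_services is a made-up helper name, not part of the cluster tools, and the column layout (type, quoted name, GID, LID, State, Code) is assumed from the output pasted in this comment.

```shell
#!/bin/bash
# Hypothetical helper (not part of the cluster tools): print every service
# whose State column is something other than "run".
# Assumed line layout, taken from the dumps above:
#   <Type>: "<name>"   GID LID State [Code]
stuck_services() {
    awk -F'"' '
        NF >= 3 {                      # only lines that carry a quoted service name
            n = split($3, col, " ")    # columns after the name: GID LID State [Code]
            if (n >= 3 && col[3] != "run")
                print $1 "\"" $2 "\" state=" col[3]
        }
    '
}
```

On a node it would be fed from the same file used above, for example `stuck_services < /proc/cluster/services`. Against the morph-06 dump it should report only the "corey1" mount group, which is in the join state.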


Comment 2 Corey Marthaler 2004-08-27 20:26:15 UTC
I was able to reproduce this mount hang using revolver, and also by just
shooting one node.

Comment 3 Corey Marthaler 2004-08-27 20:28:44 UTC
[root@morph-06 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[5 4 3 2 6 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[5 4 3 2 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[5 4 3 2 6 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[5 4 3 2 6 1]

DLM Lock Space:  "corey2"                            7   8 run       -
[5 4 3 2 6 1]

GFS Mount Group: "corey0"                            4   5 run       -
[5 4 3 2 6 1]

GFS Mount Group: "corey1"                            6   7 run       -
[5 4 3 2 6 1]

GFS Mount Group: "corey2"                            8   9 join     
S-6,20,6
[5 4 3 2 6 1]




Comment 4 David Teigland 2004-10-26 05:53:02 UTC
I recently fixed a dlm bug that could cause any gfs mount to hang.
It could be the culprit here.


Comment 5 Corey Marthaler 2004-10-29 21:40:10 UTC
Unable to reproduce. Marking fixed.

Comment 6 Kiersten (Kerri) Anderson 2004-11-16 19:02:40 UTC
Updating version to the right level in the defects.  Sorry for the storm.

Comment 7 Wade Mealing 2007-08-02 07:27:29 UTC
I think I have been able to reproduce this. I ran the script below on each of
the nodes. After about 5 hours, two of the nodes were still able to mount and
unmount, but one was not: it hung while mounting. Approximately one hour later,
a second node in my three-node cluster hung while unmounting.

Admittedly, this is a bit brutish, but I think it may expose the same problem.
I have no access to revolver.

Kernel 2.6.9-55.0.2.EL and related packages.

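#!/bin/bash
# Mount/unmount stress loop. Runs until interrupted or until a step hangs.

# i is never incremented, so the loop below is intentionally infinite.
i=0

while [ "$i" -lt 1 ]
do
    echo "Mounting ... "
    mount -t gfs /dev/hdb1 /mnt/test
    echo "Unmounting ..."
    umount /mnt/test
done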
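
Since the hang only showed up after hours of looping, a variant that logs a timestamp and an iteration counter before each step might make it easier to see where things stalled. The following is a sketch, not what was actually run: stress_loop, log, and the MAX_ITER knob are hypothetical names, and DEVICE and MOUNTPOINT simply default to the values used in the script above.

```shell
#!/bin/bash
# Sketch only: same mount/unmount loop, but each step is logged with a
# timestamp and iteration number, so the last log line shows the stalled step.
# DEVICE and MOUNTPOINT default to the values from the original script.
# MAX_ITER is a hypothetical knob: 0 means loop forever.
DEVICE=${DEVICE:-/dev/hdb1}
MOUNTPOINT=${MOUNTPOINT:-/mnt/test}
MAX_ITER=${MAX_ITER:-0}

# Print "<date> <time> iter=<n> <message>".
log() { echo "$(date '+%F %T') iter=$1 $2"; }

stress_loop() {
    local i=0
    while [ "$MAX_ITER" -eq 0 ] || [ "$i" -lt "$MAX_ITER" ]; do
        i=$((i + 1))
        log "$i" "mounting"
        mount -t gfs "$DEVICE" "$MOUNTPOINT" || { log "$i" "mount failed"; return 1; }
        log "$i" "unmounting"
        umount "$MOUNTPOINT" || { log "$i" "umount failed"; return 1; }
    done
}
```

Calling `stress_loop` at the end of the file, with output redirected to a log, starts the run. Note that logging only records the last step started; it does not recover from a mount or umount that never returns.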




Comment 8 David Teigland 2009-01-20 20:41:45 UTC
Closing again; this was fixed/closed in 2004.
Comment 7 would have been something different.