Bug 131002 - second and third mount attempts on recovered node hangs
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 4
Hardware: i686 Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: David Teigland
QA Contact: GFS Bugs
URL:
Whiteboard:
Keywords: Reopened
Depends On:
Blocks:
Reported: 2004-08-26 15:54 UTC by Corey Marthaler
Modified: 2010-01-12 02:57 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-01-20 20:41:45 UTC



Description Corey Marthaler 2004-08-26 15:54:35 UTC
Description of problem:
After morph-06 panicked, the cluster went through recovery and
continued with I/O. The filesystems continued to be accessible.

I then brought morph-06 back into the cluster and attempted to mount
the 5 filesystems but after the first filesystem mounted successfully
the remaining attempts hung.

How reproducible:
Didn't try

Comment 1 Corey Marthaler 2004-08-26 15:55:06 UTC
morph-01:
[root@morph-01 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[3 4 5 6 2 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[3 2 4 5 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[3 4 5 6 2 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[3 4 5 6 2 1]

DLM Lock Space:  "corey2"                            7   8 run       -
[3 4 5 6 2]

DLM Lock Space:  "corey3"                            9  10 run       -
[3 4 5 6 2]

DLM Lock Space:  "corey4"                           11  12 run       -
[3 4 5 6 2]

GFS Mount Group: "corey0"                            4   5 run       -
[3 4 5 6 2 1]

GFS Mount Group: "corey1"                            6   7 update   
U-4,1,1
[3 4 5 6 2 1]

GFS Mount Group: "corey2"                            8   9 run       -
[3 4 5 6 2]

GFS Mount Group: "corey3"                           10  11 run       -
[3 4 5 6 2]

GFS Mount Group: "corey4"                           12  13 run       -
[3 4 5 6 2]


morph-02:
[root@morph-02 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[3 4 5 6 2 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[3 2 4 5 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[3 4 5 6 2 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[3 4 5 6 2 1]

DLM Lock Space:  "corey2"                            7   8 run       -
[3 4 5 6 2]

DLM Lock Space:  "corey3"                            9  10 run       -
[3 4 5 6 2]

DLM Lock Space:  "corey4"                           11  12 run       -
[3 4 5 6 2]

GFS Mount Group: "corey0"                            4   5 run       -
[3 4 5 6 2 1]

GFS Mount Group: "corey1"                            6   7 update   
U-4,1,1
[3 4 5 6 2 1]

GFS Mount Group: "corey2"                            8   9 run       -
[3 4 5 6 2]

GFS Mount Group: "corey3"                           10  11 run       -
[3 4 5 6 2]

GFS Mount Group: "corey4"                           12  13 run       -
[3 4 5 6 2]


morph-03:
[root@morph-03 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[4 3 5 6 2 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[4 2 3 5 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[4 3 5 6 2 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[4 3 5 6 2 1]

DLM Lock Space:  "corey2"                            7   8 run       -
[4 3 5 6 2]

DLM Lock Space:  "corey3"                            9  10 run       -
[4 3 5 6 2]

DLM Lock Space:  "corey4"                           11  12 run       -
[4 3 5 6 2]

GFS Mount Group: "corey0"                            4   5 run       -
[4 3 5 6 2 1]

GFS Mount Group: "corey1"                            6   7 update   
U-4,1,1
[4 3 5 6 2 1]

GFS Mount Group: "corey2"                            8   9 run       -
[4 3 5 6 2]

GFS Mount Group: "corey3"                           10  11 run       -
[4 3 5 6 2]

GFS Mount Group: "corey4"                           12  13 run       -
[4 3 5 6 2]


morph-04:
[root@morph-04 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[5 3 4 6 2 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[5 3 2 4 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[5 3 4 6 2 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[5 3 4 6 2 1]

DLM Lock Space:  "corey2"                            7   8 run       -
[5 3 4 6 2]

DLM Lock Space:  "corey3"                            9  10 run       -
[5 3 4 6 2]

DLM Lock Space:  "corey4"                           11  12 run       -
[5 3 4 6 2]

GFS Mount Group: "corey0"                            4   5 run       -
[5 3 4 6 2 1]

GFS Mount Group: "corey1"                            6   7 update   
U-4,1,1
[5 3 4 6 2 1]

GFS Mount Group: "corey2"                            8   9 run       -
[5 3 4 6 2]

GFS Mount Group: "corey3"                           10  11 run       -
[5 3 4 6 2]

GFS Mount Group: "corey4"                           12  13 run       -
[5 3 4 6 2]


morph-05:
[root@morph-05 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "corey2"                            7   8 run       -
[2 3 4 5 6]

DLM Lock Space:  "corey3"                            9  10 run       -
[2 3 4 5 6]

DLM Lock Space:  "corey4"                           11  12 run       -
[2 3 4 5 6]

GFS Mount Group: "corey0"                            4   5 run       -
[2 3 4 5 6 1]

GFS Mount Group: "corey1"                            6   7 update   
U-4,1,1
[2 3 4 5 6 1]

GFS Mount Group: "corey2"                            8   9 run       -
[2 3 4 5 6]

GFS Mount Group: "corey3"                           10  11 run       -
[2 3 4 5 6]

GFS Mount Group: "corey4"                           12  13 run       -
[2 3 4 5 6]


morph-06:
[root@morph-06 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[2 3 4 5 6 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[2 3 4 5 6 1]

GFS Mount Group: "corey0"                            4   5 run       -
[2 3 4 5 6 1]

GFS Mount Group: "corey1"                            6   7 join     
S-6,20,6
[2 3 4 5 6 1]
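
The six listings above differ in only a few groups, which is easy to miss by eye. The following is a minimal sketch for picking out the groups that are not in the "run" state. It assumes the column layout shown in this report (service type, quoted name, GID, LID, State, Code); non_run_groups is a hypothetical helper name, not part of the cluster tools.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not a cluster utility): print every
# service group whose State column is not "run".
# $1 = file to read; defaults to /proc/cluster/services as used in this report.
non_run_groups() {
    awk '
        # Only group lines contain a quoted name; the header, member
        # lists and code lines are skipped.
        /"/ {
            split($0, parts, "\"")      # parts[1]=type, parts[2]=name, parts[3]=rest
            split(parts[3], f, " ")     # f[1]=GID, f[2]=LID, f[3]=State
            sub(/: *$/, "", parts[1])   # drop the colon and padding after the type
            if (f[3] != "run")
                print parts[1] ": " parts[2] " " f[3]
        }
    ' "${1:-/proc/cluster/services}"
}
```

Fed the morph-06 listing above, this prints "GFS Mount Group: corey1 join"; fed any of the morph-01 to morph-05 listings, it prints "GFS Mount Group: corey1 update".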


Comment 2 Corey Marthaler 2004-08-27 20:26:15 UTC
I was able to reproduce this mount hang using revolver and by just
shooting one node.

Comment 3 Corey Marthaler 2004-08-27 20:28:44 UTC
[root@morph-06 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[5 4 3 2 6 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[5 4 3 2 6 1]

DLM Lock Space:  "corey0"                            3   4 run       -
[5 4 3 2 6 1]

DLM Lock Space:  "corey1"                            5   6 run       -
[5 4 3 2 6 1]

DLM Lock Space:  "corey2"                            7   8 run       -
[5 4 3 2 6 1]

GFS Mount Group: "corey0"                            4   5 run       -
[5 4 3 2 6 1]

GFS Mount Group: "corey1"                            6   7 run       -
[5 4 3 2 6 1]

GFS Mount Group: "corey2"                            8   9 join     
S-6,20,6
[5 4 3 2 6 1]




Comment 4 David Teigland 2004-10-26 05:53:02 UTC
I recently fixed a dlm bug that could cause any gfs mount to hang.
It could be the culprit here.


Comment 5 Corey Marthaler 2004-10-29 21:40:10 UTC
Unable to reproduce. Marking fixed.

Comment 6 Kiersten (Kerri) Anderson 2004-11-16 19:02:40 UTC
Updating version to the right level in the defects.  Sorry for the storm.

Comment 7 Wade Mealing 2007-08-02 07:27:29 UTC
I think I have been able to reproduce this. I ran the script below on each of
the nodes. After about 5 hours, two of the nodes were still able to mount and
unmount, but one was not: it hung while mounting. Approximately one hour later,
a second node of my three-node cluster hung while unmounting.

Admittedly, this is a bit brutish, but I think it may expose the same problem. I
have no access to revolver.

Kernel 2.6.9-55.0.2.EL and related packages.

#!/bin/bash

# i is never incremented, so this loop runs until a mount or umount hangs.
i="0"

while [ "$i" -lt 1 ]
do
    echo "Mounting ... "
    mount -t gfs /dev/hdb1 /mnt/test
    echo "Unmounting ..."
    umount /mnt/test
done
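
A run of several hours leaves no record of which iteration or which step stalled. The following is a hedged sketch of the same mount/umount cycle with a timestamped log line before each step, so the last line printed shows where the hang occurred. The names DEV, MNT, log_step and mount_cycle are illustrative assumptions; the device and mount point are the ones used in the script above.

```shell
#!/bin/bash
# Sketch only: the mount/umount cycle from this comment, with logging added.
# DEV and MNT default to the device and mount point used in the report.
DEV=${DEV:-/dev/hdb1}
MNT=${MNT:-/mnt/test}

log_step() {
    # $1 = iteration number, $2 = step label
    printf '%s iteration=%d step=%s\n' "$(date '+%F %T')" "$1" "$2"
}

mount_cycle() {
    # $1 = number of iterations to run; stops early if a step fails.
    local n=1
    while [ "$n" -le "$1" ]; do
        log_step "$n" mount
        mount -t gfs "$DEV" "$MNT" || return 1
        log_step "$n" umount
        umount "$MNT" || return 1
        n=$((n + 1))
    done
}
```

For example, "mount_cycle 100000" runs a bounded number of cycles. If a step hangs, the script stops producing output, and the final log line names the iteration and whether the stall was in mount or umount.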




Comment 8 David Teigland 2009-01-20 20:41:45 UTC
Closing again; this was fixed and closed in 2004.
Comment 7 would have been something different.

