Bug 252209 - mount attempt deadlocks after gulm recovery
Status: CLOSED WONTFIX
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gulm
Version: 4
Hardware/OS: All Linux
Priority: low
Severity: low
Assigned To: Chris Feist
QA Contact: Cluster QE
Duplicates: 382671
Depends On:
Blocks:
Reported: 2007-08-14 15:00 EDT by Corey Marthaler
Modified: 2010-04-20 11:03 EDT
Doc Type: Bug Fix
Last Closed: 2010-04-20 11:03:38 EDT

Attachments
kern dump from taft-04 (95.35 KB, text/plain)
2007-08-14 15:03 EDT, Corey Marthaler
kernel dump from taft-04 during the hung mnt attempt (86.73 KB, text/plain)
2008-04-29 09:57 EDT, Corey Marthaler

Description Corey Marthaler 2007-08-14 15:00:30 EDT
Description of problem:
This appears to be the same as closed bz 183383.

[revolver]
================================================================================
[revolver] Senario iteration 0.6 started at Mon Aug 13 17:30:58 CDT 2007
[revolver] Sleeping 2 minute(s) to let the I/O get its lock count up...
[revolver]      Gulm Status
[revolver]      ===========
[revolver]      taft-02: Master
[revolver]      taft-01: Slave
[revolver]      taft-03: Slave
[revolver]      taft-04: Client
[revolver] Senario: GULM kill all Clients and Slaves
[revolver]
[revolver] Those picked to face the revolver... taft-04 taft-03 taft-01
[revolver] Feeling lucky taft-04? Well do ya? Go'head make my day...
[revolver] Didn't receive heartbeat for 5 seconds
[revolver] Feeling lucky taft-03? Well do ya? Go'head make my day...
[revolver] Didn't receive heartbeat for 5 seconds
[revolver] Feeling lucky taft-01? Well do ya? Go'head make my day...
[revolver] Didn't receive heartbeat for 5 seconds
[revolver]
[revolver] Verify that taft-04 has been removed from cluster on remaining nodes
[revolver] Verify that taft-03 has been removed from cluster on remaining nodes
[revolver] Verify that taft-01 has been removed from cluster on remaining nodes
[revolver] Verifying that the dueler(s) are alive
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] Still not all alive, sleeping another 10 seconds
[...]
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] All killed nodes are back up, making sure they're qarshable...
[revolver] Verifying that recovery properly took place on the node(s) which stayed in the cluster
[revolver] checking Gulm recovery...
[revolver] Verifying that clvmd was started properly on the dueler(s)
[revolver] mounting /dev/mapper/TAFT_CLUSTER-TAFT_CLUSTER0 on /mnt/TAFT_CLUSTER0 on taft-04
[revolver] mounting /dev/mapper/TAFT_CLUSTER-TAFT_CLUSTER1 on /mnt/TAFT_CLUSTER1 on taft-04
[revolver] mounting /dev/mapper/TAFT_CLUSTER-TAFT_CLUSTER2 on /mnt/TAFT_CLUSTER2 on taft-04
PAN2 caught SIGINT: ALL STOP!!!


[root@taft-04 ~]# ps -ef | grep mount
root      5258  5257  0 Aug13 ?        00:00:00 mount -t gfs -o debug /dev/mappe2
root     21267 31714  0 13:51 ttyS0    00:00:00 grep mount

Version-Release number of selected component (if applicable):
gulm-1.0.10-0


I'll attach the stack traces...
Comment 1 Corey Marthaler 2007-08-14 15:03:37 EDT
Created attachment 161297
kern dump from taft-04
Comment 2 Chris Feist 2007-08-20 18:20:17 EDT
This is caused by a limitation in the protocol: it does not tell us whether a
node is rejoining the cluster or joining it for the first time.  I'm working on
a fix for this issue that does not require changing the protocol.
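
A minimal sketch of one way a rejoin could be told apart from a first join
without a protocol change, assuming each node can keep its own record of the
node names it has already seen. The class and method names below
(MembershipTracker, on_join, on_expire) are hypothetical and are not taken
from the gulm source; this is an illustration, not the fix referred to above.

```python
class MembershipTracker:
    """Hypothetical local bookkeeping; not gulm's actual implementation."""

    def __init__(self):
        self.seen = set()    # every node name that has ever joined
        self.active = set()  # nodes currently believed to be members

    def on_join(self, node):
        """Classify a join event using only locally kept state."""
        if node in self.active:
            kind = "duplicate"   # join for a node we already think is up
        elif node in self.seen:
            kind = "rejoin"      # node was a member earlier and has since left
        else:
            kind = "first_join"  # node has never been seen by this tracker
        self.seen.add(node)
        self.active.add(node)
        return kind

    def on_expire(self, node):
        """Record that a node left or was expired from the cluster."""
        self.active.discard(node)


tracker = MembershipTracker()
print(tracker.on_join("taft-04"))  # first time this name is seen
tracker.on_expire("taft-04")       # node is killed and expired
print(tracker.on_join("taft-04"))  # same name comes back after reboot
```

The point of the sketch is only that the distinction can be derived from
state held on the receiving side, so the join message itself does not need a
new field.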
Comment 3 Corey Marthaler 2007-11-26 11:37:11 EST
FYI - hit this bug again during 4.6 regression testing.

2.6.9-67.ELsmp
gulm-1.0.10-0
Comment 4 Corey Marthaler 2008-04-29 09:53:02 EDT
Hit this issue again during 4.6.Z testing. Note, this may be the same issue as
bz 382671.

================================================================================
[revolver] Senario iteration 0.6 started at Tue Apr 29 00:21:14 CDT 2008
[revolver] Sleeping 5 minute(s) to let the I/O get its lock count up...
[revolver]      Gulm Status
[revolver]      ===========
[revolver]      taft-02: Master
[revolver]      taft-03: Slave
[revolver]      taft-01: Slave
[revolver]      taft-04: Client
[revolver] Senario: GULM kill all Clients and Slaves
[revolver]
[revolver] Those picked to face the revolver... taft-04 taft-01 taft-03
[revolver] Feeling lucky taft-04? Well do ya? Go'head make my day...
[revolver] Didn't receive heartbeat for 5 seconds
[revolver] Feeling lucky taft-01? Well do ya? Go'head make my day...
[revolver] Didn't receive heartbeat for 5 seconds
[revolver] Feeling lucky taft-03? Well do ya? Go'head make my day...
[revolver] Didn't receive heartbeat for 5 seconds
[revolver]
[revolver] Verify that taft-04 has been removed from cluster on remaining nodes
[revolver] Verify that taft-01 has been removed from cluster on remaining nodes
[revolver] Verify that taft-03 has been removed from cluster on remaining nodes
[revolver] Verifying that the dueler(s) are alive
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] All killed nodes are back up, making sure they're qarshable...
[revolver] Verifying that recovery properly took place on the node(s) which stayed in the cluster
[revolver] checking Gulm recovery...
[revolver] Verifying that clvmd was started properly on the dueler(s)
[revolver] mounting /dev/mapper/TAFT_CLUSTER-TAFT_CLUSTER0 on /mnt/TAFT_CLUSTER0 on taft-04
[STUCK]

root      5640  5639  0 00:28 ?        00:00:00 mount -t gfs -o debug /dev/mapper/TAFT_CLUSTER-TAFT_CLUSTER0 /mnt/TAFT_CLUSTER0
Comment 5 Corey Marthaler 2008-04-29 09:57:43 EDT
Created attachment 304116
kernel dump from taft-04 during the hung mnt attempt
Comment 6 Corey Marthaler 2008-07-17 11:03:06 EDT
This appears to have reproduced again during 4.7 GA regression testing. It
requires all Slaves to be killed, leaving only the Master.
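
The reproduction condition can be checked against the "Gulm Status" block that
revolver prints. This is only a sketch: the function names are hypothetical,
and the input is assumed to look like the "[revolver]      node: Role" lines
quoted in this report.

```python
import re

# Matches lines such as "[revolver]      taft-02: Master".
STATUS_LINE = re.compile(r"^\[revolver\]\s+(\S+):\s+(Master|Slave|Client)\s*$")


def parse_gulm_status(log_text):
    """Return a {node: role} mapping from revolver's Gulm Status lines."""
    roles = {}
    for line in log_text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            roles[match.group(1)] = match.group(2)
    return roles


def leaves_only_master(roles, killed):
    """True if every surviving node is a Master and at least one survives."""
    survivors = [role for node, role in roles.items() if node not in killed]
    return bool(survivors) and all(role == "Master" for role in survivors)


sample = """\
[revolver]      Gulm Status
[revolver]      ===========
[revolver]      taft-02: Master
[revolver]      taft-01: Slave
[revolver]      taft-03: Slave
[revolver]      taft-04: Client
[revolver] Senario: GULM kill all Clients and Slaves
"""

roles = parse_gulm_status(sample)
print(leaves_only_master(roles, {"taft-04", "taft-03", "taft-01"}))
```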
Comment 7 Nate Straz 2008-07-17 11:21:18 EDT
*** Bug 382671 has been marked as a duplicate of this bug. ***
Comment 8 Corey Marthaler 2008-09-03 17:19:20 EDT
Reproduced during 4.7.Z testing.

[revolver] Senario iteration 0.4 started at Wed Sep  3 10:24:21 CDT 2008
[revolver] Sleeping 3 minute(s) to let the I/O get its lock count up...
[revolver]      Gulm Status
[revolver]      ===========
[revolver]      grant-02: Slave
[revolver]      grant-03: Master
[revolver]      grant-01: Slave
[revolver] Senario: GULM kill all Slaves
Comment 9 RHEL Product and Program Management 2010-04-20 11:03:38 EDT
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.
