Bug 252209
Summary: | mount attempt deadlocks after gulm recovery | |
---|---|---|---
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal>
Component: | gulm | Assignee: | Chris Feist <cfeist>
Status: | CLOSED WONTFIX | QA Contact: | Cluster QE <mspqa-list>
Severity: | low | Priority: | low
Version: | 4 | CC: | cluster-maint, nstraz
Hardware: | All | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2010-04-20 15:03:38 UTC

Description
Corey Marthaler
2007-08-14 19:00:30 UTC
Created attachment 161297: kern dump from taft-04

This is caused by a problem with the protocol which doesn't notify us if a node is rejoining the cluster or is joining the cluster for the first time. I'm working on a solution to this issue without changing the protocol.

FYI - hit this bug again during 4.6 regression testing.
2.6.9-67.ELsmp
gulm-1.0.10-0

Hit this issue again during 4.6.Z testing. Note, this may be the same issue as bz 382671.

================================================================================
[revolver] Senario iteration 0.6 started at Tue Apr 29 00:21:14 CDT 2008
[revolver] Sleeping 5 minute(s) to let the I/O get its lock count up...
[revolver] Gulm Status
[revolver] ===========
[revolver] taft-02: Master
[revolver] taft-03: Slave
[revolver] taft-01: Slave
[revolver] taft-04: Client
[revolver] Senario: GULM kill all Clients and Slaves
[revolver]
[revolver] Those picked to face the revolver... taft-04 taft-01 taft-03
[revolver] Feeling lucky taft-04? Well do ya? Go'head make my day...
[revolver] Didn't receive heartbeat for 5 seconds
[revolver] Feeling lucky taft-01? Well do ya? Go'head make my day...
[revolver] Didn't receive heartbeat for 5 seconds
[revolver] Feeling lucky taft-03? Well do ya? Go'head make my day...
[revolver] Didn't receive heartbeat for 5 seconds
[revolver]
[revolver] Verify that taft-04 has been removed from cluster on remaining nodes
[revolver] Verify that taft-01 has been removed from cluster on remaining nodes
[revolver] Verify that taft-03 has been removed from cluster on remaining nodes
[revolver] Verifying that the dueler(s) are alive
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] All killed nodes are back up, making sure they're qarshable...
[revolver] Verifying that recovery properly took place on the node(s) which stayed in the cluster
[revolver] checking Gulm recovery...
[revolver] Verifying that clvmd was started properly on the dueler(s)
[revolver] mounting /dev/mapper/TAFT_CLUSTER-TAFT_CLUSTER0 on /mnt/TAFT_CLUSTER0 on taft-04
[STUCK]
root 5640 5639 0 00:28 ? 00:00:00 mount -t gfs -o debug /dev/mapper/TAFT_CLUSTER-TAFT_CLUSTER0 /mnt/TAFT_CLUSTER0

Created attachment 304116: kernel dump from taft-04 during the hung mnt attempt

Appear to have reproduced this again during 4.7 GA regression testing. It requires all Slaves being killed (leaving only the Master).

*** Bug 382671 has been marked as a duplicate of this bug. ***

Reproduced during 4.7.Z testing.

[revolver] Senario iteration 0.4 started at Wed Sep 3 10:24:21 CDT 2008
[revolver] Sleeping 3 minute(s) to let the I/O get its lock count up...
[revolver] Gulm Status
[revolver] ===========
[revolver] grant-02: Slave
[revolver] grant-03: Master
[revolver] grant-01: Slave
[revolver] Senario: GULM kill all Slaves

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
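
Illustration of the root cause named in the first developer comment (the protocol does not say whether a joining node is new or is rejoining): one general way a receiver can infer the difference without a protocol change is to remember locally which node names it has already seen. The sketch below is in Python and is only an assumption about how such bookkeeping could look. It is not gulm code and not the fix the assignee was working on; `MembershipTracker`, `JoinKind`, `on_join`, and `on_leave` are hypothetical names.

```python
from enum import Enum


class JoinKind(Enum):
    FIRST_JOIN = "first_join"   # node name never seen before
    REJOIN = "rejoin"           # node was seen, left, and is back
    DUPLICATE = "duplicate"     # node is already considered present


class MembershipTracker:
    """Hypothetical local bookkeeping; not part of gulm."""

    def __init__(self):
        self._ever_seen = set()  # every node name that has ever joined
        self._present = set()    # node names currently in the cluster

    def on_join(self, node):
        """Classify a join notification that carries only the node name."""
        if node in self._present:
            return JoinKind.DUPLICATE
        kind = JoinKind.REJOIN if node in self._ever_seen else JoinKind.FIRST_JOIN
        self._ever_seen.add(node)
        self._present.add(node)
        return kind

    def on_leave(self, node):
        """Record that a node left (or was fenced); unknown names are ignored."""
        self._present.discard(node)
```

A caveat of this approach: the history lives only in the memory of the node doing the tracking, so it is lost if that node itself restarts. That matters in a scenario like the one reported here, where every node except the Master is killed.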