Bug 382671

Summary: mount hung after recovery, lock_gulmd_LT in busy wait
Product: [Retired] Red Hat Cluster Suite
Reporter: Nate Straz <nstraz>
Component: gulm
Assignee: Chris Feist <cfeist>
Status: CLOSED DUPLICATE
QA Contact: Cluster QE <mspqa-list>
Severity: low
Priority: low
Version: 4
CC: cluster-maint
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-07-17 15:21:17 UTC
Attachments:
  gzipped tcpdump of port 41040 from morph-01 (flags: none)

Description Nate Straz 2007-11-14 15:52:03 UTC
Description of problem:

While doing GFS recovery testing with lock_gulm, a mount hung after only one
node had been shot.  The lock_gulmd_LT threads on all nodes are very busy,
apparently stuck in a busy wait.  I captured the traffic on port 41040 on the
master in the hope that it sheds some light on what is going on.


Version-Release number of selected component (if applicable):
gulm-1.0.10-0
kernel-hugemem-2.6.9-67.EL
GFS-6.1.15-1
GFS-kernel-hugemem-2.6.9-75.9


How reproducible:
Unknown

Actual results:

Senario iteration 1.1 started at Tue Nov 13 16:09:51 CST 2007
Sleeping 5 minute(s) to let the I/O get its lock count up...
        Gulm Status
        ===========
        morph-02: Client
        morph-04: Client
        morph-05: Master
        morph-03: Slave
        morph-01: Slave
Senario: GULM kill Master

Those picked to face the revolver... morph-05 
...
checking Gulm recovery...
Verifying that clvmd was started properly on the dueler(s)
mounting /dev/mapper/morph--cluster-morph--cluster0 on /mnt/morph-cluster0 on morph-05
mounting /dev/mapper/morph--cluster-morph--cluster1 on /mnt/morph-cluster1 on morph-05
(hung)

Expected results:
The mount should not hang.

Additional info:

The recovery was done with an I/O load running on each node.

Comment 1 Nate Straz 2007-11-14 15:55:52 UTC
Created attachment 258231 [details]
gzipped tcpdump of port 41040 from morph-01
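A minimal sketch for summarizing a gzipped capture like this one, to see which node pairs generate most of the port 41040 traffic while lock_gulmd_LT spins. It assumes the attachment is a classic libpcap file (not pcapng) with Ethernet framing and IPv4/TCP packets; the function name summarize_pcap and the file name in the usage line are hypothetical.

```python
import gzip
import struct
from collections import Counter

GULM_PORT = 41040  # port captured in this bug report

def summarize_pcap(path, port=GULM_PORT):
    """Count TCP packets per (src, dst) endpoint pair where either side uses `port`.

    Assumption: classic libpcap format, optionally gzipped, Ethernet link
    type (1), IPv4 only. Other packets are skipped.
    """
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rb") as f:
        header = f.read(24)  # pcap global header
        magic = header[:4]
        if magic == b"\xd4\xc3\xb2\xa1":
            endian = "<"  # file written little-endian
        elif magic == b"\xa1\xb2\xc3\xd4":
            endian = ">"  # file written big-endian
        else:
            raise ValueError("not a classic pcap file")
        (linktype,) = struct.unpack(endian + "I", header[20:24])
        if linktype != 1:
            raise ValueError("expected Ethernet link type")
        while True:
            rec = f.read(16)  # per-packet record header
            if len(rec) < 16:
                break
            _sec, _usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
            frame = f.read(incl_len)
            # 14-byte Ethernet header; ethertype 0x0800 means IPv4.
            if len(frame) < 34 or frame[12:14] != b"\x08\x00":
                continue
            ip = frame[14:]
            ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
            if ip[9] != 6 or len(ip) < ihl + 4:  # protocol 6 = TCP
                continue
            src = ".".join(str(b) for b in ip[12:16])
            dst = ".".join(str(b) for b in ip[16:20])
            sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
            if port in (sport, dport):
                counts[(f"{src}:{sport}", f"{dst}:{dport}")] += 1
    return counts
```

For example, `summarize_pcap("morph-01-port41040.pcap.gz").most_common(10)` would list the ten busiest directions of lock traffic; a heavily lopsided pair would point at the node driving the busy wait.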

Comment 2 Nate Straz 2007-11-15 14:30:35 UTC
I think I've hit this twice now.  The most recent time, I thought the mount was
hung because quorum had been lost, but even after re-fencing the nodes that had
been fenced, the mount did not continue.

Comment 3 Nate Straz 2008-07-17 15:21:17 UTC

*** This bug has been marked as a duplicate of 252209 ***