Bug 324881 - first mount can hang during parallel mounts
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: GFS-kernel
Version: 4
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: David Teigland
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-10-09 13:47 UTC by David Teigland
Modified: 2010-01-12 03:19 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-03-12 19:54:47 UTC
Embargoed:


Attachments
possible patch (600 bytes, text/plain), 2007-10-15 13:39 UTC, David Teigland

Description David Teigland 2007-10-09 13:47:17 UTC
Description of problem:

This was found while trying to reproduce bz 299061. Running the test in
comment 17 of that bz reproduces this bug quite easily, at least on my
xen cluster, which is not smp. The bug may be harder to hit on smp
machines.

It's easy to tell if you've hit this bug, because a message like this will
always appear in /var/log/messages:

SM: 02000378 ignoring service callback id=2000144 event=1324

If you look at /proc/cluster/lock_dlm/debug on this node at this point,
you'll see something like this at the end, which shows what the problem
is:

others_may_mount start_done 1322 b

The event_id that others_may_mount uses when calling kcl_start_done()
is incorrect; it's using 1322 when it should be 1324.

I believe the fix is for others_may_mount() to read the event_id
after taking the umount_lock semaphore, which serializes
others_may_mount() with a start callback from the lock_dlm thread.
In this case, I believe the start callback changes the event_id
after others_may_mount() reads it and before others_may_mount()
acquires the umount_lock semaphore.
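
The ordering problem described above can be illustrated with a small
user-space sketch. This is not the lock_dlm code and not the attached
patch: struct lockspace, start_callback(), start_done() and both
others_may_mount_*() variants are hypothetical stand-ins, a pthread mutex
stands in for the umount_lock semaphore, and the racing start callback is
invoked inline at the point where the interleaving is suspected, so the
outcome is deterministic.

```c
#include <pthread.h>

/* Hypothetical stand-in for the lockspace state; not the real lock_dlm struct. */
struct lockspace {
    pthread_mutex_t umount_lock; /* models the umount_lock semaphore */
    int event_id;                /* id of the most recent start event */
    int acked_id;                /* id last passed to the start_done() stand-in */
};

/* Models a start callback from the lock_dlm thread delivering a new event. */
static void start_callback(struct lockspace *ls, int new_id)
{
    pthread_mutex_lock(&ls->umount_lock);
    ls->event_id = new_id;
    pthread_mutex_unlock(&ls->umount_lock);
}

/* Stand-in for kcl_start_done(): only records which event was acknowledged. */
static void start_done(struct lockspace *ls, int id)
{
    ls->acked_id = id;
}

/* Suspected racy ordering: event_id is sampled before the lock is taken. */
int others_may_mount_racy(struct lockspace *ls, int racing_id)
{
    int id = ls->event_id;         /* read too early */
    start_callback(ls, racing_id); /* simulated interleaving before the lock */
    pthread_mutex_lock(&ls->umount_lock);
    start_done(ls, id);            /* acknowledges the stale event */
    pthread_mutex_unlock(&ls->umount_lock);
    return ls->acked_id;
}

/* Proposed ordering: event_id is read only while holding the lock. */
int others_may_mount_fixed(struct lockspace *ls, int racing_id)
{
    int id;
    start_callback(ls, racing_id); /* same interleaving as above */
    pthread_mutex_lock(&ls->umount_lock);
    id = ls->event_id;             /* read under the lock, so it is current */
    start_done(ls, id);
    pthread_mutex_unlock(&ls->umount_lock);
    return ls->acked_id;
}
```

With event_id starting at 1322 and a racing start event 1324, the racy
variant acknowledges 1322 while the fixed variant acknowledges 1324,
which matches the mismatch seen in the debug output above.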

Comment 1 David Teigland 2007-10-10 14:08:40 UTC
This has not been reproducible on smp machines so far.


Comment 2 David Teigland 2007-10-15 13:39:07 UTC
Created attachment 227681
possible patch

This patch seems to fix the problem.

Comment 3 David Teigland 2008-01-14 15:36:03 UTC
Fix checked into the RHEL4 branch.

Checking in mount.c;
/cvs/cluster/cluster/gfs-kernel/src/dlm/Attic/mount.c,v  <--  mount.c
new revision: 1.11.2.4; previous revision: 1.11.2.3


Comment 4 Steve Whitehouse 2009-01-20 15:19:41 UTC
This seemed to be missing flags.

Comment 5 Chris Feist 2009-03-12 19:54:47 UTC
This was fixed way back in January of 2008; it's already in 4.7.

