Bug 324881 - first mount can hang during parallel mounts
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: GFS-kernel
Hardware: All Linux
Priority: low Severity: low
Assigned To: David Teigland
QA Contact: Cluster QE
Reported: 2007-10-09 09:47 EDT by David Teigland
Modified: 2010-01-11 22:19 EST (History)
Doc Type: Bug Fix
Last Closed: 2009-03-12 15:54:47 EDT

Attachments
possible patch (600 bytes, text/plain)
2007-10-15 09:39 EDT, David Teigland

Description David Teigland 2007-10-09 09:47:17 EDT
Description of problem:

This was found while trying to reproduce bz 299061. Running the test in
comment 17 of that bz seems to reproduce this bug quite easily, at least
in my xen cluster, which is not smp. This bug may be harder to hit on
smp machines.

It's easy to tell if you've hit this bug, because a message like this will
always appear in /var/log/messages:

SM: 02000378 ignoring service callback id=2000144 event=1324

If you look at /proc/cluster/lock_dlm/debug on this node at this point,
you'll see something like this at the end, which shows what the problem
is:

others_may_mount start_done 1322 b

The event_id that others_may_mount uses when calling kcl_start_done()
is incorrect; it's using 1322 when it should be 1324.

I believe the fix is for others_may_mount() to read the event_id
after taking the umount_lock semaphore, which serializes
others_may_mount() with a start callback from the lock_dlm thread.
In this case, I believe the start callback is changing the event_id
after others_may_mount() reads it and before others_may_mount() gets
the umount_lock semaphore.
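
The ordering problem described above can be sketched in user-space C. This
is only an illustration, not the attached patch or the lock_dlm source: a
pthread mutex stands in for the umount_lock semaphore, the struct and its
start_event_id field are hypothetical, and the start callback is invoked
inline at the point where the race window would be, so the interleaving is
deterministic. The event ids 1322 and 1324 are the ones from the debug
output above.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the per-mount state; not the real lock_dlm struct. */
struct mount_state {
    pthread_mutex_t umount_lock;   /* stands in for the umount_lock semaphore */
    unsigned int start_event_id;   /* hypothetical field: id of the latest start event */
};

static void mount_state_init(struct mount_state *ms, unsigned int event_id)
{
    pthread_mutex_init(&ms->umount_lock, NULL);
    ms->start_event_id = event_id;
}

/* Models a start callback: updates the event id while holding the lock. */
static void start_callback(struct mount_state *ms, unsigned int event_id)
{
    pthread_mutex_lock(&ms->umount_lock);
    ms->start_event_id = event_id;
    pthread_mutex_unlock(&ms->umount_lock);
}

/* Racy ordering: the id is read BEFORE the lock is taken.
 * Returns the id that would be handed to kcl_start_done(). */
static unsigned int others_may_mount_racy(struct mount_state *ms,
                                          unsigned int next_event)
{
    unsigned int id = ms->start_event_id;   /* read too early */

    start_callback(ms, next_event);         /* callback lands in the window */

    pthread_mutex_lock(&ms->umount_lock);
    /* the real code would report start-done with `id` here */
    pthread_mutex_unlock(&ms->umount_lock);
    return id;                              /* stale id */
}

/* Proposed ordering: the id is read AFTER the lock is taken. */
static unsigned int others_may_mount_fixed(struct mount_state *ms,
                                           unsigned int next_event)
{
    unsigned int id;

    start_callback(ms, next_event);         /* same callback, same timing */

    pthread_mutex_lock(&ms->umount_lock);
    id = ms->start_event_id;                /* sees the callback's update */
    pthread_mutex_unlock(&ms->umount_lock);
    return id;
}

/* Runs both orderings against the event ids seen in this report. */
static void demo(void)
{
    struct mount_state a, b;

    mount_state_init(&a, 1322);
    mount_state_init(&b, 1322);
    assert(others_may_mount_racy(&a, 1324) == 1322);   /* stale: the bug */
    assert(others_may_mount_fixed(&b, 1324) == 1324);  /* current: the fix */
}
```

With the early read, the function ends up reporting 1322 even though the
start callback has already moved the event to 1324, which matches the
"ignoring service callback" symptom. Reading under the lock removes that
window.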

Comment 1 David Teigland 2007-10-10 10:08:40 EDT
This has not been reproducible on smp machines so far.
Comment 2 David Teigland 2007-10-15 09:39:07 EDT
Created attachment 227681
possible patch

This patch seems to fix the problem.
Comment 3 David Teigland 2008-01-14 10:36:03 EST
fix checked into RHEL4 branch

Checking in mount.c;
/cvs/cluster/cluster/gfs-kernel/src/dlm/Attic/mount.c,v  <--  mount.c
new revision:; previous revision:
Comment 4 Steve Whitehouse 2009-01-20 10:19:41 EST
This seemed to be missing flags.
Comment 5 Chris Feist 2009-03-12 15:54:47 EDT
This was fixed way back in January of 2008; it's already in 4.7.