Red Hat Bugzilla – Bug 191222
read flock broken on single-node
Last modified: 2010-01-11 22:11:24 EST
Description of problem:
Remember this is all on a single node.
When a READ FLOCK is already held on a file, the first additional request for a
READ FLOCK succeeds, but the second such request hangs or returns an error.
Version-Release number of selected component (if applicable):
How reproducible: All the time
Steps to Reproduce:
1. Process1 opens and acquires a READ FLOCK on a gfs file foo and goes to sleep.
2. Process2 opens, acquires a READ FLOCK on foo, UNFLOCKS and closes. No errors.
3. Process3 opens, tries to acquire a READ FLOCK on foo
Process3 blocks, or returns a "Resource temporarily unavailable" (EWOULDBLOCK)
error, depending on whether the LOCK_NB flag was passed to flock().
All READ FLOCKs should be compatible with each other: no blocking, no errors.
Process3 should behave exactly the same way as Process2.
Attached is a test program that can be used to simulate this scenario.
Created attachment 128812 [details]
test-program to simulate bug
At least part of the problem is that the GL_NOCACHE flag used on
flock glocks assumes that there's only a single glock holder, so
when a NOCACHE holder is dequeued the glock is unlocked without
any thought that other holders may still exist.
Created attachment 129069 [details]
Patch to potentially fix this bz
This patch ensures that a GL_NOCACHE glock is removed from the cache only when
gfs_glock_dq is called on the last holder. I haven't seen any ill effects from
this patch, but will feel more comfortable once it goes through a round of QA.
Committed above patch into RHEL4, HEAD and STABLE branches.
A little explanation of FLOCKS, GL_NOCACHE etc
1. Why do flocks need GL_NOCACHE flag turned on for its glocks?
If FLOCK glocks are cached on one node after use, another node requesting
a conflicting FLOCK coupled with the LOCK_NB flag will be denied. The first
node has already used and released the FLOCK and should not conflict with the
second node's request. The GL_NOCACHE flag ensures this.
2. In RHEL3, there was no GL_NOCACHE flag. How were flocks working then?
Without the GL_NOCACHE flag, the release of the glock depends on a timeout
value associated with FLOCK glocks. This timeout mechanism (flock_demote_ok())
was never actually implemented, so the glock gets released immediately.
But there is a correctness issue here: the release of the glock does not
happen synchronously. The problem described in question 1 could still occur if
the second node requests the flock within the small window between the release
of the flock and the release of the glock.
The solution is a correct implementation of GL_NOCACHE, which this patch
attempts to accomplish.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
Just stumbled upon this bug myself using RHEL4U3. The symptoms I saw were high
traffic on the heartbeat (DLM) network and poor(er) performance on nodes which
were not the first to mount the filesystem.
The first mounter obtained journal locks, then dequeued them while they still
had holders. From that moment on, the other nodes had to do network DLM
transactions to get the locks and could never cache them locally.
This fix solved the performance problem.