Description of problem:
Remember, this is all on a single node. With a READ FLOCK held on a file, the first request for another READ FLOCK on that file succeeds, but the second request hangs or returns an error.

Version-Release number of selected component (if applicable):

How reproducible:
All the time

Steps to Reproduce:
1. Process1 opens a GFS file foo, acquires a READ FLOCK on it, and goes to sleep.
2. Process2 opens foo, acquires a READ FLOCK, UNFLOCKs, and closes. No errors.
3. Process3 opens foo and tries to acquire a READ FLOCK on it.

Actual results:
Process3 blocks, or returns a "resource not available" error (depending on the blocking flag used with flock()).

Expected results:
READ FLOCKs should all be compatible with each other, with no blocking or errors. Process3 should behave exactly the same way as Process2.

Additional info:
Attached is a test program that can be used to simulate this scenario.
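The reproduction steps above can be sketched in a single process, since separate open() calls create separate open file descriptions and flock() locking is per-description. This is a minimal sketch, not the attached test program; the path is a placeholder, and on the real cluster it would be a file on the GFS mount.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns 0 if all three READ FLOCKs are granted (the expected
   behavior), or the errno from the failing step-3 request
   (EWOULDBLOCK on the buggy GFS). */
int repro_read_flocks(const char *path)
{
    /* Step 1: "Process1" takes a READ FLOCK and keeps holding it. */
    int fd1 = open(path, O_CREAT | O_RDWR, 0644);
    if (fd1 < 0)
        return errno;
    if (flock(fd1, LOCK_SH) != 0) { int e = errno; close(fd1); return e; }

    /* Step 2: "Process2" takes a READ FLOCK, unlocks, and closes. */
    int fd2 = open(path, O_RDWR);
    if (fd2 < 0) { int e = errno; close(fd1); return e; }
    if (flock(fd2, LOCK_SH) != 0) {
        int e = errno; close(fd2); close(fd1); return e;
    }
    flock(fd2, LOCK_UN);
    close(fd2);

    /* Step 3: "Process3" requests a non-blocking READ FLOCK.
       Shared locks are compatible, so this should always succeed. */
    int fd3 = open(path, O_RDWR);
    if (fd3 < 0) { int e = errno; close(fd1); return e; }
    int rc = 0;
    if (flock(fd3, LOCK_SH | LOCK_NB) != 0)
        rc = errno;

    close(fd3);
    flock(fd1, LOCK_UN);
    close(fd1);
    unlink(path);
    return rc;
}
```

On a correctly behaving filesystem, `repro_read_flocks()` returns 0; on the buggy GFS, step 3 fails with EWOULDBLOCK (or blocks, if LOCK_NB is dropped).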
Created attachment 128812 [details] test-program to simulate bug
At least part of the problem is that the GL_NOCACHE flag used on flock glocks assumes that there's only a single glock holder, so when a NOCACHE holder is dequeued the glock is unlocked without any thought that other holders may still exist.
Created attachment 129069 [details] Patch to potentially fix this bz

This patch ensures that a GL_NOCACHE glock is removed from the cache only when gfs_glock_dq is called on its last holder. I haven't seen any ill effects from this patch, but I will feel more comfortable once it has been through a round of QA.
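The idea behind the fix can be modeled with a toy glock structure. This is an illustrative sketch, not the actual gfs source: the struct, field names, and functions here are invented for the demonstration, and real glocks carry far more state.

```c
#include <stdbool.h>

/* Toy model of a glock: a holder count plus a "cached" flag. */
struct toy_glock {
    int  holders;
    bool cached;
};

/* Enqueue a holder: the glock is acquired and cached. */
void toy_glock_nq(struct toy_glock *gl)
{
    gl->holders++;
    gl->cached = true;
}

/* Pre-patch behavior: dequeuing any GL_NOCACHE holder drops the
   glock immediately, even though other holders may still exist. */
void toy_glock_dq_buggy(struct toy_glock *gl)
{
    gl->holders--;
    gl->cached = false;   /* bug: remaining holders lose the lock */
}

/* Patched behavior: the glock is dropped from the cache only when
   gfs_glock_dq is called on the last holder. */
void toy_glock_dq_fixed(struct toy_glock *gl)
{
    gl->holders--;
    if (gl->holders == 0)
        gl->cached = false;
}
```

With two shared holders, the buggy dequeue releases the glock on the first dq, which is what made Process3's READ FLOCK request above fail; the fixed version waits for the last holder.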
Committed above patch into RHEL4, HEAD and STABLE branches.
A little explanation of FLOCKs, GL_NOCACHE, etc.

1. Why do flock glocks need the GL_NOCACHE flag?
If FLOCK glocks are cached on one node after use, another node requesting a conflicting FLOCK with the LOCK_NB flag will be denied, even though the first node has already used and released the FLOCK and should no longer conflict with the second node's request. The GL_NOCACHE flag ensures the glock is not cached.

2. In RHEL3 there was no GL_NOCACHE flag. How did flocks work then?
Without the GL_NOCACHE flag, release of the glock depends on a timeout value associated with FLOCK glocks. This timeout mechanism (flock_demote_ok()) is not implemented, so the glock gets released immediately. But there is a correctness issue here: the release of the glock does not happen synchronously, so the problem in 1. can still occur if the second node requests the flock within the small window between the release of the flock and the release of the glock. The solution is a correct implementation of GL_NOCACHE, which this patch attempts to accomplish.
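The LOCK_NB denial described in 1. is ordinary flock() semantics: a non-blocking request against a conflicting lock fails with EWOULDBLOCK instead of waiting. A minimal local demonstration (the path is a placeholder; within one process, two open() calls give two open file descriptions, so the locks genuinely conflict):

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns the errno from a non-blocking exclusive flock attempt on
   'path' while another open file description holds a shared lock,
   or 0 if the lock was (unexpectedly) granted. */
int conflicting_nb_flock(const char *path)
{
    /* The "first node": holds a shared lock. */
    int holder = open(path, O_CREAT | O_RDWR, 0644);
    if (holder < 0)
        return errno;
    if (flock(holder, LOCK_SH) != 0) {
        int e = errno; close(holder); return e;
    }

    /* The "second node": a conflicting non-blocking request. */
    int requester = open(path, O_RDWR);
    if (requester < 0) {
        int e = errno; close(holder); return e;
    }
    int rc = 0;
    if (flock(requester, LOCK_EX | LOCK_NB) != 0)
        rc = errno;   /* expect EWOULDBLOCK while the lock is held */

    close(requester);
    flock(holder, LOCK_UN);
    close(holder);
    unlink(path);
    return rc;
}
```

Once the holder has unlocked, the same request would succeed; the bug was that a cached (or not-yet-released) glock made it look as if the lock were still held.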
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0561.html
Just stumbled upon this bug myself using RHEL4U3. The symptoms I saw were that traffic on the heartbeat (DLM) network was high and performance was poorer on nodes that were not the first to mount the filesystem. The first mounter obtained journal locks and then dequeued them while they still had holders. From that moment on, the other nodes had to do network DLM transactions to get the locks and could never cache them locally. This fix solved the performance problem.