Description of problem:
This is a bug pointed out by Matthew Brookover on the linux-cluster mailing list.
https://www.redhat.com/archives/linux-cluster/2007-August/msg00093.html
It exists in both GFS and GFS2.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Have one process, P1, hold an exclusive flock on a gfs2 file 'foo'.
2. Have another process, P2, continuously attempt to get an exclusive flock on 'foo' in a loop.
3. Have P1 request a shared flock on 'foo'.

Actual results:
P2 is able to sneak in and grab the exclusive lock on the file before P1 can demote its lock to shared.

Expected results:
Compared with ext3, the expected behavior is for P1 to go ahead and obtain the shared lock (in step 3, above). There shouldn't be a window of opportunity for P2 to obtain its exclusive lock.

Additional info:
From a brief look at the code, it appears that gfs performs an unlock on the existing exclusive lock followed by a shared flock request. I'll need to investigate more to be sure. I'm attaching the test programs to the bugzilla below.
Created attachment 161204 [details] test program to do exclusive followed by shared flock
Created attachment 161205 [details] test program to do continuous exclusive flock requests
Using the programs (from the email to linux-cluster):

Compile both programs:

[mbrookov@imagine locktest]$ cc -o flock_EX_SH flock_EX_SH.c
[mbrookov@imagine locktest]$ cc -o flockwritelock flockwritelock.c
[mbrookov@imagine locktest]$

EXT3 test:

Start up xterm twice and cd to the directory where you compiled the 2 programs. On my system, /tmp is an EXT3 file system. In the first xterm, run 'flock_EX_SH /tmp/bar' and hit return. In the second xterm, run 'flockwritelock /tmp/bar' and hit return. The flockwritelock process will block waiting for an exclusive lock on the file /tmp/bar.

On the first xterm, hit return. The flock_EX_SH process will attempt to demote the exclusive lock to a shared lock and display a prompt. The flockwritelock process on the second xterm will stay blocked. In the first xterm, hit return again. The flock_EX_SH process will free the lock, close the file and exit. The flockwritelock process will then receive the exclusive lock on /tmp/bar and display a prompt. Hit return in the second xterm to get flockwritelock to close and exit.

Output on first xterm:

[mbrookov@imagine locktest]$ ./flock_EX_SH /tmp/bar
Have exclusive lock, hit return to free write lock on /tmp/bar and exit
Attempt to demote lock on /tmp/bar to shared lock
Have shared lock, hit return to free lock on /tmp/bar and exit
[mbrookov@imagine locktest]$

Output on second xterm:

[mbrookov@imagine locktest]$ ./flockwritelock /tmp/bar
Have write lock, hit return to free write lock on /tmp/bar and exit
[mbrookov@imagine locktest]$

GFS test:

Start up xterm twice and cd to the directory where you compiled the 2 programs. On my system, the locktest directory is on a GFS file system. In the first xterm, run 'flock_EX_SH bar' and hit return. In the second xterm, run 'flockwritelock bar' and hit return. The flockwritelock process will block waiting for an exclusive lock on the file bar.
On the first xterm, hit return. The flock_EX_SH process will attempt to demote the exclusive lock on bar to a shared lock, but will fail because the flock system call frees the lock, allowing the flockwritelock process to get an exclusive lock. The flock_EX_SH process will exit. Hit return on the second xterm; flockwritelock will close bar and exit.

Output on first xterm:

[mbrookov@imagine locktest]$ ./flock_EX_SH bar
Have exclusive lock, hit return to free write lock on bar and exit
Attempt to demote lock on bar to shared lock
Could not demote to shared lock on file bar, Resource temporarily unavailable
[mbrookov@imagine locktest]$

Output on second xterm:

[mbrookov@imagine locktest]$ ./flockwritelock bar
Have write lock, hit return to free write lock on bar and exit
[mbrookov@imagine locktest]$

The results for flock on GFS are the same whether you run the two programs on the same node or on 2 different nodes. The locks (shared, exclusive, blocking, non-blocking) also work correctly on both file systems. The problem is the case where GFS frees the exclusive lock and returns an error instead of demoting the exclusive lock to a shared lock. The program depends on the EXT3 flock behavior: the exclusive lock can be demoted to a shared lock without the possibility that another process, blocked waiting for an exclusive lock, receives the lock.
There is a similar issue with promoting from a shared lock to an exclusive lock. On EXT3, a process can use flock to get a shared lock on a file. A second process can block on an attempt to get an exclusive lock on the same file. The first process can promote the shared lock to exclusive, and the second process stays blocked. When the first process frees the exclusive lock, the second process will unblock and get the exclusive lock. On GFS, when the first process tries to promote the shared lock to an exclusive lock, it will block and the second process will get the lock.

Test procedure:

Start up two xterms and compile the programs:

cc -o flock_SH_EX flock_SH_EX.c
cc -o flockwritelock flockwritelock.c

In the first xterm, run "flock_SH_EX foo" to create the file named foo and get a shared lock on the file.
In the second xterm, run "flockwritelock foo". The process will block.
In the first xterm, hit return. The process will attempt to promote the lock from shared to exclusive. On EXT3, the process will get the exclusive lock. On GFS, the process will block and the second process in the other xterm will get the exclusive lock.

I would prefer that GFS have the EXT3 behavior, where the process that holds the shared lock gets priority over the process that is blocked waiting for the exclusive lock.
Created attachment 161405 [details] Get a shared lock then attempt to promote to an exclusive lock
Created attachment 161406 [details] Get an exclusive lock on a file