The script in the description has the

----------------------------------------------------------------------
#echo('Shared lock attempt.'."\n");
#flock($fh, LOCK_SH);
#echo('Locked as shared.'."\n");
#sleep(10);
----------------------------------------------------------------------

lines commented out with #. Uncomment them before running, or use the following:

----------------------------------------------------------------------
#!/usr/bin/php
<?
$fh = fopen('gluster.test', 'ab+');
echo('Opened.'."\n");
sleep(2);
#echo('Shared lock attempt.'."\n");
#flock($fh, LOCK_SH);
#echo('Locked as shared.'."\n");
#sleep(10);
echo('Exclusive lock attempt.'."\n");
flock($fh, LOCK_EX);
echo('Locked exclusively.'."\n");
sleep(10);
flock($fh, LOCK_UN);
echo('Unlocked.'."\n");
sleep(2);
fclose($fh);
echo('Closed.'."\n");
sleep(1);
?>
----------------------------------------------------------------------
Oops, I posted the same one again. It is:

----------------------------------------------------------------------
#!/usr/bin/php
<?
$fh = fopen('gluster.test', 'ab+');
echo('Opened.'."\n");
sleep(2);
echo('Shared lock attempt.'."\n");
flock($fh, LOCK_SH);
echo('Locked as shared.'."\n");
sleep(10);
echo('Exclusive lock attempt.'."\n");
flock($fh, LOCK_EX);
echo('Locked exclusively.'."\n");
sleep(10);
flock($fh, LOCK_UN);
echo('Unlocked.'."\n");
sleep(2);
fclose($fh);
echo('Closed.'."\n");
sleep(1);
?>
----------------------------------------------------------------------

Now for sure. Sorry for the crap above, it's just the end of the working day.
I am trying to put 3.0.4 into production, but have hit a showstopper bug.

My config is: CentOS 5.5 x86_64 (vanilla up-to-date 2.6.18 kernel), custom FUSE module and userspace built from fuse-2.7.4glfs11-1.tar.gz, vanilla GlusterFS from the glusterfs-*-3.0.4-1.x86_64 RPMs.

The bug: when a file lock is upgraded from LOCK_SH to LOCK_EX while another node holds LOCK_SH, we get a deadlock on the client that leaves the file unable to ever be locked with LOCK_EX again (any attempt to LOCK_EX it deadlocks). This also sometimes happens if we just upgrade the lock on one node. The script deadlocks and never exits.

Side effect: after the deadlock, only umount --force and fusermount -u can unmount the file system. A regular umount says the file system is busy even if I manage to kill the test scripts.

How to repeat: use this script

----------------------------------------------------------------------
#!/usr/bin/php
<?
$fh = fopen('gluster.test', 'ab+');
echo('Opened.'."\n");
sleep(2);
#echo('Shared lock attempt.'."\n");
#flock($fh, LOCK_SH);
#echo('Locked as shared.'."\n");
#sleep(10);
echo('Exclusive lock attempt.'."\n");
flock($fh, LOCK_EX);
echo('Locked exclusively.'."\n");
sleep(10);
flock($fh, LOCK_UN);
echo('Unlocked.'."\n");
sleep(2);
fclose($fh);
echo('Closed.'."\n");
sleep(1);
?>
----------------------------------------------------------------------

Run it on two nodes with a slight delay (1-2 sec), and you'll get the first deadlock after both nodes reach the "Exclusive lock attempt" state. CTRL-C can break the scripts, but it is already fatal: run the script on any one node (or on both) again, and after the next "Exclusive lock attempt" you'll get a complete deadlock with the script going defunct. umount --force and then fusermount -u helps, though.
That's the config I use (passwords removed, IPs changed, volume names changed).

First server:

----------------------------------------------------------------------
volume a1_posix
  type storage/posix
  option directory /glusterfs/a0
  option background-unlink yes
end-volume

volume a1
  type features/locks
  subvolumes a1_posix
end-volume

volume a1_server
  type protocol/server
  option transport-type tcp
  option transport.socket.listen-port 6996
  option auth.addr.bigweb_ttknw.allow 10.1.1.*
  subvolumes a1
end-volume
----------------------------------------------------------------------

The second server config is identical, just with the names "a2" instead of "a1".

Client:

----------------------------------------------------------------------
volume a1
  type protocol/client
  option transport-type tcp
  option remote-host 10.1.1.2
  option remote-port 6996
  option remote-subvolume a1
end-volume

volume a2
  type protocol/client
  option transport-type tcp
  option remote-host 10.1.1.4
  option remote-port 6996
  option remote-subvolume a2
end-volume

volume a0
  type cluster/replicate
  subvolumes a1 a2
end-volume
----------------------------------------------------------------------
Reporting in: GlusterFS 3.0.5 still has that problem.
Created attachment 295 Tarball contains sample ctdb setup configuration files. These files need to be modified as per the test setup environment.
Most of the self-heal (replicate related) bugs are now fixed in the 3.1.0 branch. As we are just a week behind the GA release date, we would like you to test this particular bug against the 3.1.0 RC releases and let us know if it's fixed.
PATCH: http://patches.gluster.com/patch/5285 in master (features/locks: Handle lock upgrade and downgrade properly in locks.)
PATCH: http://patches.gluster.com/patch/5507 in release-3.0 (features/locks: Handle upgrade/downgrade of locks properly.)
The GlusterFS 3.1 release still fails on this, even patched.

How to repeat:
1. Create one replicated volume (2 replicas) according to the manual.
2. Put the test script on the volume.
3. Run the script simultaneously on both nodes (with a 1-2 second interval).
4. See the script hang (even SIGKILL fails).
It's better than before: the FS can be unmounted with -f after a few tries, and the scripts terminate when one of the affected nodes unmounts the FS. But it still deadlocks on a shared->exclusive upgrade if the second node tries to do the same at the same time.
PATCH: http://patches.gluster.com/patch/5552 in release-3.0 (cluster/afr: Do a broadcast unlock in replicate to eliminate deadlock during upgrade/downgrade.)
Applied 5552 to 3.1 with a small filename change and ignoring whitespace. The test passes now, but... it really fails. Look:

1. First node locks the file as SHARED.
2. Second node locks the file as SHARED; this is correct.
3. First node attempts to lock the file as EXCLUSIVE and waits for the shared lock to be removed.
4. Second node attempts to lock the file as EXCLUSIVE, and then everything goes wrong: the second node is allowed to take the EXCLUSIVE lock. It must not be, because the first node still holds its SHARED lock.

Why this is wrong: the first node expects the file to be unmodified between the SHARED and EXCLUSIVE locks. Yes, there will be some deadlock, but it must not be in the filesystem core. The file system must not hang, and it must allow the processes to be terminated, at least with SIGKILL. In real life, processes fall back if the lock upgrade is ungrantable. The filesystem must not hang; it should just report that such a lock is ungrantable at the time of the request.

Okay. Now we have a semi-working configuration that is good enough, but it can lead to corrupted files in cases where a lock upgrade is wrongly confirmed as granted. I will also post a second test in a while.
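The fallback pattern described above relies on flock(2) failing a non-blocking upgrade instead of hanging. A minimal sketch of the correct local-filesystem behavior, written in Python (whose fcntl.flock wraps the same flock(2) call that PHP's flock() uses on Linux); the two open() calls stand in for the two nodes:

```python
import fcntl
import os
import tempfile

# Two independent open file descriptions play the roles of the two nodes.
fd, path = tempfile.mkstemp()
os.close(fd)
f1 = open(path, 'ab+')
f2 = open(path, 'ab+')

fcntl.flock(f1, fcntl.LOCK_SH)  # "node 1": shared lock
fcntl.flock(f2, fcntl.LOCK_SH)  # "node 2": shared lock, compatible

# "Node 1" tries to upgrade SH -> EX without blocking. Since "node 2"
# still holds its shared lock, the upgrade is ungrantable and fails
# immediately with EWOULDBLOCK instead of deadlocking.
try:
    fcntl.flock(f1, fcntl.LOCK_EX | fcntl.LOCK_NB)
    upgraded = True
except BlockingIOError:
    upgraded = False

print(upgraded)  # False: the caller can now fall back gracefully
```

Note the flock(2) caveat that conversion is not guaranteed atomic: the existing lock is removed before the new one is requested, so after a failed non-blocking upgrade f1 may no longer hold even its shared lock.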
Hmm. Here is another test suite:

--------------------------------------------------------------------------
test2-1.php
--------------------------------------------------------------------------
#!/usr/bin/php
<?
@unlink('gluster.test');
$fh = fopen('gluster.test', 'ab+');
echo('Opened.'."\n");
sleep(2);
echo('Shared lock attempt.'."\n");
flock($fh, LOCK_SH);
echo('Locked as shared.'."\n");
sleep(10);
echo('Exclusive lock attempt.'."\n");
flock($fh, LOCK_EX);
echo('Locked exclusively.'."\n");
sleep(10);
flock($fh, LOCK_UN);
echo('Unlocked.'."\n");
sleep(2);
fclose($fh);
echo('Closed.'."\n");
sleep(1);
echo('Result: '.file_get_contents('gluster.test')."\n");
?>
--------------------------------------------------------------------------
test2-2.php
--------------------------------------------------------------------------
#!/usr/bin/php
<?
$fh = fopen('gluster.test', 'ab+');
echo('Opened.'."\n");
sleep(2);
echo('Exclusive lock attempt.'."\n");
flock($fh, LOCK_EX);
echo('Locked exclusively.'."\n");
sleep(10);
ftruncate($fh, 0);
fwrite($fh, 'WRONG!');
flock($fh, LOCK_UN);
echo('Unlocked.'."\n");
sleep(2);
fclose($fh);
echo('Closed.'."\n");
sleep(1);
?>
--------------------------------------------------------------------------
One mistake: please place

echo('Result: '.file_get_contents('gluster.test')."\n");

just after sleep(10); in the code.

How to use:
1. Place test2-1.php and test2-2.php onto a GlusterFS volume.
2. Run test2-1.php on one node.
3. Wait about 1-2 seconds.
4. Run test2-2.php on another node.
5. Wait for the result.

What must happen:
1. Node 1 locks the file as SHARED.
2. Node 2 attempts to lock the file as EXCLUSIVE. It must wait, because there is a shared lock on the file.
3. Node 1 upgrades its lock to EXCLUSIVE. It is permitted to do so, because there are no other locks on the file.
4. Node 1 prints the file contents.
5. Node 1 unlocks the file.
6. Node 2 is allowed to do its job.

What happens currently:
1. Node 1 locks the file as SHARED.
2. Node 2 attempts to lock the file as EXCLUSIVE. It must wait, because there is a shared lock on the file.
3. Node 1 tries to upgrade its lock to EXCLUSIVE.
4. Oops, Node 2 is allowed to take the lock and do as it wishes... this is bad, it corrupts the file with "WRONG!".
5. Node 2 unlocks the file.
6. Node 1 is allowed to take the lock.
7. Node 1 prints the WRONG file contents.
8. Node 1 unlocks the file.
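For contrast, the grantable half of the "What must happen" sequence can be checked on any local filesystem. A sketch in Python (fcntl.flock wraps the same flock(2) call as PHP's flock()); the key point is that a waiter which has merely attempted an exclusive lock holds nothing, so the shared holder's upgrade must be granted:

```python
import fcntl
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
node1 = open(path, 'ab+')  # stands in for Node 1
node2 = open(path, 'ab+')  # stands in for Node 2

fcntl.flock(node1, fcntl.LOCK_SH)  # step 1: Node 1 takes a shared lock

# Step 2: Node 2's exclusive attempt is refused (it would block), so it
# holds no lock at all while waiting.
try:
    fcntl.flock(node2, fcntl.LOCK_EX | fcntl.LOCK_NB)
    node2_locked = True
except BlockingIOError:
    node2_locked = False

# Step 3: Node 1 upgrades to exclusive. Nobody else holds a lock on the
# file, so this succeeds; if it raised, the semantics would be broken.
fcntl.flock(node1, fcntl.LOCK_EX | fcntl.LOCK_NB)

print(node2_locked)  # False: the waiter never got in ahead of the upgrade
```

This is exactly the ordering the report says GlusterFS inverts: in step 4 above, the waiter is granted the lock ahead of the upgrading shared holder.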
IRL, such lock upgrades are rare, because they pose a deadlock danger. I still think this patch is a must as a starting point, while we take the time to debug locking. Alas, I'm not too well versed in GlusterFS internals, but I will try to understand them in the meantime so I can be of help (maybe).
PATCH: http://patches.gluster.com/patch/5742 in master (features/locks: Send prelock unlock only if it is not grantable and is a blocking lock call.)
Reporting in: The test from Comment #2 still fails in gluster 3.1.1. It hangs on exclusive locking.
Please update the status of this bug, as it's been more than 6 months since it was filed (bug id < 2000). Please resolve it with the proper resolution if it's no longer valid. If it's still valid and not critical, move it to 'enhancement' severity.
A Pivotal Tracker story has been created for this Bug: http://www.pivotaltracker.com/story/show/18852795
Dave Garnett deleted the linked story in Pivotal Tracker
GlusterFS 3.2.4 (FUSE) fails the script in Comment 2.
3.2.4 also hangs on that script, leaving the script process defunct. Killing the glusterfs processes on one of the nodes resolves the hang on the other node. Something is still dead wrong in the locking code.
hi Alex, I tested the deadlock from comment-2 on the release-3.3 branch and the deadlock did not happen. I also want to test the use-case in comments 12 and 13. Could you tell me if your suggestion in comment-13 is for test2-1.php or test2-2.php? Could you attach the files instead of copy/pasting? If you want to play with the new changes in the locks xlator yourself, use: http://bits.gluster.com/pub/gluster/glusterfs/src/glusterfs-3.3.0qa43.tar.gz Pranith.
Yes, I would like to check it again. I will check in a short while (it may take one to two days to build the test case), then I will post the test results along with the test scripts used. Thanks for your assistance.
Thanks Alex. Could you give me the scripts you used in comments 12 and 13, so that I can test that case while you build the other test cases? Pranith.
Alex, I am moving this bug to MODIFIED state, as the deadlock from the first test case no longer happens on 3.3. Please feel free to open new bugs if you come up with new cases that cause problems.
Checked with glusterfs-3.3.0. Executed the mentioned php script from 2 clients, and the tests completed on both sides.
Sorry for the long absence; I had some health issues to cope with.

Tested script #1 (at the end of this message) on 3.3.0, running two copies of the script on different hosts in the same mount dir within a 1-3 second interval.

Result: locking succeeds, but the workflow is impaired. When both scripts race for the exclusive lock, the glusterfs FUSE mount blocks completely until one of the processes is SIGKILL'ed. Normal behavior would also deliver SIGHUP/SIGTERM to the process during lock waits. Otherwise, locking works as desired. Nonblocking locks work. Shared/exclusive mechanics work.

Verdict: usable, but needs some love to the signal handling during lock waits.

--------------------------------------------------------------------------
Script #1
--------------------------------------------------------------------------
#!/usr/bin/php
<?
$fh = fopen('gluster.test', 'ab+');
echo('Opened.'."\n");
sleep(2);
echo('Shared lock attempt.'."\n");
flock($fh, LOCK_SH);
echo('Locked as shared.'."\n");
sleep(10);
echo('Exclusive lock attempt.'."\n");
flock($fh, LOCK_EX);
echo('Locked exclusively.'."\n");
sleep(10);
flock($fh, LOCK_UN);
echo('Unlocked.'."\n");
sleep(2);
fclose($fh);
echo('Closed.'."\n");
sleep(1);
?>
--------------------------------------------------------------------------
Script #2
Welcome back. Thanks for the verification. Could you explain a bit more about the signal handling part.
Yes. When waiting for a lock (or racing for a lock) on regular filesystems, SIGINT/SIGHUP/SIGTERM and other signals reach the process waiting for the lock. When waiting for a lock on GlusterFS, these signals do not reach the process, and the process can never be signalled by external means. I found that SIGKILL now works, but SIGKILL is not quite a regular signal. It's not fatal for everyday ops, but it prevents process termination, e.g. from the console (SIGINT) when debugging.
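The expected behavior is easy to demonstrate on a local filesystem: a process blocked in flock(2) is woken by an ordinary signal. A minimal sketch in Python (fcntl.flock wraps the same flock(2) call PHP's flock() uses); SIGALRM stands in for the SIGINT/SIGTERM case, and a forked child plays the other lock holder:

```python
import fcntl
import os
import signal
import tempfile
import time

fd, path = tempfile.mkstemp()
os.close(fd)

pid = os.fork()
if pid == 0:
    # Child: take an exclusive lock on its own file description and hold it.
    holder = open(path, 'ab')
    fcntl.flock(holder, fcntl.LOCK_EX)
    time.sleep(5)
    os._exit(0)

time.sleep(0.5)  # give the child time to acquire the lock

class Interrupted(Exception):
    pass

def on_alarm(signum, frame):
    raise Interrupted()

# Arrange for a signal to arrive while we sleep inside flock(2).
signal.signal(signal.SIGALRM, on_alarm)
signal.alarm(1)

waiter = open(path, 'ab')
interrupted = False
try:
    fcntl.flock(waiter, fcntl.LOCK_EX)  # blocks: the child holds the lock
except Interrupted:
    interrupted = True

signal.alarm(0)
os.kill(pid, signal.SIGKILL)
os.waitpid(pid, 0)
print(interrupted)
```

On a local filesystem this prints True: the signal interrupts the lock wait. The report above says the same wait on a GlusterFS FUSE mount is never interrupted except by SIGKILL.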