Description of problem:
While running locktests, trying to set a write lock on a write lock returned EAGAIN. This was on a distributed-replicate setup with 4 bricks.

How reproducible:
Consistently

Steps to Reproduce:
1. ./locktests -n 5 -f /mnt/gluster/file

Actual results:
TEST : TRY TO WRITE ON A READ LOCK:=====
TEST : TRY TO WRITE ON A WRITE LOCK:=====
TEST : TRY TO READ ON A READ LOCK:=====
TEST : TRY TO READ ON A WRITE LOCK:=====
TEST : TRY TO SET A READ LOCK ON A READ LOCK:=====
TEST : TRY TO SET A WRITE LOCK ON A WRITE LOCK:Master: can't set lock : Resource temporarily unavailable
Echec : Resource temporarily unavailable
Init process initalization

Expected results:
All tests should pass

Additional info:
Client log:
[2012-03-06 14:38:48.594050] D [afr-transaction.c:1002:afr_post_nonblocking_inodelk_cbk] 0-test2-replicate-1: Non blocking inodelks failed. Proceeding to blocking
[2012-03-06 14:38:48.594110] D [afr-lk-common.c:406:transaction_lk_op] 0-test2-replicate-1: lk op is for a transaction
[2012-03-06 14:38:48.594134] D [afr-lk-common.c:607:afr_unlock_inodelk] 0-test2-replicate-1: attempting data unlock range 0 0 by 280967ffff7f0000
[2012-03-06 14:38:48.594152] D [afr-transaction.c:1002:afr_post_nonblocking_inodelk_cbk] 0-test2-replicate-1: Non blocking inodelks failed. Proceeding to blocking
[2012-03-06 14:38:48.594418] D [afr-lk-common.c:1427:afr_nonblocking_inodelk] 0-test2-replicate-1: attempting data lock range 0 62 by 244267ffff7f0000
[2012-03-06 14:38:48.594597] D [afr-lk-common.c:406:transaction_lk_op] 0-test2-replicate-1: lk op is for a transaction
[2012-03-06 14:38:48.594625] D [afr-lk-common.c:607:afr_unlock_inodelk] 0-test2-replicate-1: attempting data unlock range 0 0 by ff3167ffff7f0000
[2012-03-06 14:38:48.633362] D [fuse-bridge.c:3157:fuse_setlk_cbk] 0-glusterfs-fuse: Returning EAGAIN Flock: start=0, len=0, pid=7926, lk-owner=10ffff320eff775e
[2012-03-06 14:38:48.633650] D [afr-lk-common.c:406:transaction_lk_op] 0-test2-replicate-1: lk op is for a transaction
[2012-03-06 14:38:48.633681] D [afr-lk-common.c:607:afr_unlock_inodelk] 0-test2-replicate-1: attempting data unlock range 0 62 by 244267ffff7f0000
[2012-03-06 14:38:48.633702] D [afr-transaction.c:1002:afr_post_nonblocking_inodelk_cbk] 0-test2-replicate-1: Non blocking inodelks failed. Proceeding to blocking
[2012-03-06 14:38:48.634019] D [afr-lk-common.c:1013:afr_lock_blocking] 0-test2-replicate-1: we're done locking
[2012-03-06 14:38:48.634051] D [afr-transaction.c:982:afr_post_blocking_inodelk_cbk] 0-test2-replicate-1: Blocking inodelks done. Proceeding to FOP
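For reference, the POSIX semantics the failing test exercises can be sketched with a few lines of Python (the filename and lock ranges here are illustrative, not from locktests itself): a process that already holds a write lock on a region may take another write lock on the same region, since fcntl locks are per-process and the second request is a lock conversion, not a conflict. On a correct filesystem this should never return EAGAIN.

```python
import errno
import fcntl
import os
import tempfile

# Temporary file standing in for a file on the GlusterFS mount.
fd, path = tempfile.mkstemp()

# First whole-file write lock (LOCK_NB maps to non-blocking F_SETLK).
fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)

ok = False
try:
    # Same process, same fd: POSIX requires this to succeed as a
    # lock conversion rather than fail with EAGAIN.
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    ok = True
except OSError as e:
    print("unexpected failure:", errno.errorcode[e.errno])

os.close(fd)
os.unlink(path)
```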
I think it's expected behavior. Can you turn flush-behind off and try again? It should not return EAGAIN with flush-behind off.
I tried with flush-behind off using volume set, but still see the same problem.
Anush, can you please confirm behavior on qa33?
Hi Vijay, still see this issue on qa33.
locktests doesn't fail while attempting to set WRITE LOCK on WRITE LOCK on 17b0814243b4ccd56c0ce570b7f42d5e572e1e71 (git commit-id); it 'fails' in "READ LOCK THE WHOLE FILE BYTE BY BYTE" and its WRITE counterpart. I am changing the synopsis to reflect this. The commit log message explains why the issue is observed.
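The failing test can be approximated with a short Python sketch (file size and path are illustrative assumptions, not taken from locktests): take a one-byte shared lock at every offset of the file in turn, which is exactly the kind of many-small-ranges pattern that stresses the locks xlator.

```python
import fcntl
import os
import tempfile

SIZE = 16  # small illustrative file size

# Temporary file standing in for a file on the GlusterFS mount.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * SIZE)

locked = 0
for offset in range(SIZE):
    # One-byte read (shared) lock at this offset; LOCK_NB makes a
    # conflict surface immediately as an OSError instead of blocking.
    fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB, 1, offset)
    locked += 1

os.close(fd)
os.unlink(path)
```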
CHANGE: http://review.gluster.com/3306 (locks: Set flock.l_type on successful F_UNLCK to signal last unlock on fd.) merged in master by Anand Avati (avati)
Created attachment 583779 [details]
Python script that helps in testing "last unlock based unref'ing" in NLM.

Usage: ./simplepy /path/to/file

The script takes two write locks (fcntl) on adjacent regions of an fd and then unlocks them. To verify that the last unlock on the fd has flock.l_type set to F_UNLCK (and F_RDLCK otherwise), attach gdb to one of the 'lock servers' (bricks) and set a breakpoint on pl_lk.
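A minimal sketch of what the attached script is described to do (the region sizes and file contents below are assumptions for illustration): two fcntl write locks on adjacent byte ranges of one fd, then two unlocks, so the second F_UNLCK is the last lock released on the fd and can be observed at the pl_lk breakpoint on the brick.

```python
import fcntl
import os
import tempfile

# Temporary file standing in for /path/to/file on the mount.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * 128)

# Two write locks on adjacent regions of the same fd.
fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 64, 0)   # bytes 0..63
fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 64, 64)  # bytes 64..127

# Unlock both; the second unlock is the last one on this fd, which is
# where the fix expects flock.l_type to come back as F_UNLCK.
fcntl.lockf(fd, fcntl.LOCK_UN, 64, 0)
fcntl.lockf(fd, fcntl.LOCK_UN, 64, 64)
done = True

os.close(fd)
os.unlink(path)
```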
This is still seen on a FUSE mount (latest commit id 0cdb1d147afd644153855f6557bf7e809e5444f0). I had filed this bug for the FUSE mount.
(In reply to comment #8)
> This is still seen on fuse mount ( latest commit
> id-0cdb1d147afd644153855f6557bf7e809e5444f0). I had filed this bug for fuse
> mount.

Anush,

The fix is agnostic to the type of mount. The mention of NLM in the bug and commit log is because of what NLM expects of the locks xlator for its smooth functioning. I am unable to recreate the issue on the commit-id you've mentioned. It would be better if you could provide access to the setup where you are seeing this issue. It would also help if you could attach the output of locktests, which says which test(s) in the collection failed.
Unable to recreate this issue in a couple of different setups. Removing target milestone 3.3.0, since it is not reproducible consistently.
Closing as WORKSFORME, since we could not hit it again in any of our unit testing. If seen again, please reopen.