Bug 800300 - locktests fail in "READ LOCK THE WHOLE FILE BYTE BY BYTE" test case.
Status: CLOSED WORKSFORME
Product: GlusterFS
Classification: Community
Component: locks
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: krishnan parthasarathi
Depends On:
Blocks:
Reported: 2012-03-06 04:20 EST by Anush Shetty
Modified: 2015-11-03 18:04 EST
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-11-21 04:52:51 EST
Type: ---
Regression: ---
Mount Type: fuse
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
The Python script helps in testing "last unlock based unref'ing" in NLM. (667 bytes, text/x-python)
2012-05-11 05:33 EDT, krishnan parthasarathi

Description Anush Shetty 2012-03-06 04:20:40 EST
Description of problem: While running locktests, trying to set a write lock on a write lock returned EAGAIN. This was on a distributed-replicate setup with 4 bricks.
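The failing operation can be sketched generically with POSIX record locks. The following is a hedged illustration (it is not the locktests source): one process holds a write lock while another attempts a conflicting non-blocking write lock (F_SETLK), which fails with EAGAIN (or EACCES on some systems).

```python
import errno
import fcntl
import os
import tempfile

# Hedged sketch, not the locktests source: a child process holds a
# write lock while the parent attempts a conflicting non-blocking
# write lock, which fails with EAGAIN (or EACCES on some systems).
fd0, path = tempfile.mkstemp()
os.write(fd0, b"\0" * 64)
os.close(fd0)

c2p_r, c2p_w = os.pipe()   # child -> parent: "lock is held"
p2c_r, p2c_w = os.pipe()   # parent -> child: "you may exit"

pid = os.fork()
if pid == 0:               # child: take and hold the write lock
    fd = os.open(path, os.O_RDWR)
    fcntl.lockf(fd, fcntl.LOCK_EX, 64, 0)          # write-lock bytes 0..63
    os.write(c2p_w, b"x")
    os.read(p2c_r, 1)      # hold the lock until told to exit
    os._exit(0)

os.read(c2p_r, 1)          # wait until the child's lock is in place
fd = os.open(path, os.O_RDWR)
try:
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 64, 0)
    result = "no conflict"
except OSError as e:
    result = errno.errorcode[e.errno]
print(result)

os.write(p2c_w, b"x")
os.waitpid(pid, 0)
os.unlink(path)
```

On a local filesystem this prints EAGAIN (or EACCES); the bug is that the same non-blocking attempt through the fuse mount returned EAGAIN in a case where locktests expected it to succeed.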


How reproducible: Consistently


Steps to Reproduce:
1. ./locktests -n 5 -f /mnt/gluster/file
  
Actual results:

TEST : TRY TO WRITE ON A READ  LOCK:=====
TEST : TRY TO WRITE ON A WRITE LOCK:=====
TEST : TRY TO READ  ON A READ  LOCK:=====
TEST : TRY TO READ  ON A WRITE LOCK:=====
TEST : TRY TO SET A READ  LOCK ON A READ  LOCK:=====
TEST : TRY TO SET A WRITE LOCK ON A WRITE LOCK:Master: can't set lock
: Resource temporarily unavailable
Echec
: Resource temporarily unavailable
Init
process initalization


Expected results:

All tests should pass


Additional info:

Client log-
[2012-03-06 14:38:48.594050] D [afr-transaction.c:1002:afr_post_nonblocking_inodelk_cbk] 0-test2-replicate-1: Non blocking inodelks failed. Proceeding to blocking
[2012-03-06 14:38:48.594110] D [afr-lk-common.c:406:transaction_lk_op] 0-test2-replicate-1: lk op is for a transaction
[2012-03-06 14:38:48.594134] D [afr-lk-common.c:607:afr_unlock_inodelk] 0-test2-replicate-1: attempting data unlock range 0 0 by 280967ffff7f0000
[2012-03-06 14:38:48.594152] D [afr-transaction.c:1002:afr_post_nonblocking_inodelk_cbk] 0-test2-replicate-1: Non blocking inodelks failed. Proceeding to blocking
[2012-03-06 14:38:48.594418] D [afr-lk-common.c:1427:afr_nonblocking_inodelk] 0-test2-replicate-1: attempting data lock range 0 62 by 244267ffff7f0000
[2012-03-06 14:38:48.594597] D [afr-lk-common.c:406:transaction_lk_op] 0-test2-replicate-1: lk op is for a transaction
[2012-03-06 14:38:48.594625] D [afr-lk-common.c:607:afr_unlock_inodelk] 0-test2-replicate-1: attempting data unlock range 0 0 by ff3167ffff7f0000
[2012-03-06 14:38:48.633362] D [fuse-bridge.c:3157:fuse_setlk_cbk] 0-glusterfs-fuse: Returning EAGAIN Flock: start=0, len=0, pid=7926, lk-owner=10ffff320eff775e
[2012-03-06 14:38:48.633650] D [afr-lk-common.c:406:transaction_lk_op] 0-test2-replicate-1: lk op is for a transaction
[2012-03-06 14:38:48.633681] D [afr-lk-common.c:607:afr_unlock_inodelk] 0-test2-replicate-1: attempting data unlock range 0 62 by 244267ffff7f0000
[2012-03-06 14:38:48.633702] D [afr-transaction.c:1002:afr_post_nonblocking_inodelk_cbk] 0-test2-replicate-1: Non blocking inodelks failed. Proceeding to blocking
[2012-03-06 14:38:48.634019] D [afr-lk-common.c:1013:afr_lock_blocking] 0-test2-replicate-1: we're done locking
[2012-03-06 14:38:48.634051] D [afr-transaction.c:982:afr_post_blocking_inodelk_cbk] 0-test2-replicate-1: Blocking inodelks done. Proceeding to FOP
Comment 1 Raghavendra Bhat 2012-03-06 04:28:47 EST
I think it's expected behavior. Can you turn flush-behind off and try again? It should not return EAGAIN with flush-behind off.
Comment 2 Anush Shetty 2012-03-06 04:39:44 EST
I tried with flush-behind off using volume set, but I still see the same problem.
Comment 3 Vijay Bellur 2012-04-04 03:45:53 EDT
Anush, can you please confirm behavior on qa33?
Comment 4 Anush Shetty 2012-04-04 05:03:23 EDT
Hi Vijay, still see this issue on qa33.
Comment 5 krishnan parthasarathi 2012-05-10 06:31:14 EDT
locktests doesn't fail while attempting to set WRITE LOCK on WRITE LOCK on 17b0814243b4ccd56c0ce570b7f42d5e572e1e71 (git commit-id); it 'fails' in "READ LOCK THE WHOLE FILE BYTE BY BYTE" and its WRITE counterpart. I am changing the synopsis to reflect this. The commit log message explains why the issue is observed.
Comment 6 Anand Avati 2012-05-10 18:26:45 EDT
CHANGE: http://review.gluster.com/3306 (locks: Set flock.l_type on successful F_UNLCK to signal last unlock on fd.) merged in master by Anand Avati (avati@redhat.com)
Comment 7 krishnan parthasarathi 2012-05-11 05:33:33 EDT
Created attachment 583779 [details]
The Python script helps in testing "last unlock based unref'ing" in NLM.

./simplepy /path/to/file

The script takes two write locks (fcntl) on 'adjacent' regions of an fd and then unlocks them. To verify that the last unlock on the fd has flock.l_type set to F_UNLCK (and F_RDLCK otherwise), one must attach gdb to one of the 'lock servers' (bricks) and set a breakpoint on pl_lk.
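The attachment itself is not reproduced here, but a minimal sketch of the behavior the comment describes might look like the following. This is a hedged reconstruction, not the original attachment; the byte-range sizes are illustrative.

```python
import fcntl
import os
import sys
import tempfile

# Hedged reconstruction of the attached script's described behavior,
# not the original attachment: take two fcntl write locks on adjacent
# byte ranges of a single fd, then unlock both. The second unlock is
# the last unlock on the fd, so per the fix the brick should see
# flock.l_type set to F_UNLCK for it (F_RDLCK for the earlier one).
path = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkstemp()[1]
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
os.write(fd, b"\0" * 20)

fcntl.lockf(fd, fcntl.LOCK_EX, 10, 0)    # write lock bytes 0..9
fcntl.lockf(fd, fcntl.LOCK_EX, 10, 10)   # write lock bytes 10..19 (adjacent)
fcntl.lockf(fd, fcntl.LOCK_UN, 10, 0)    # first unlock
fcntl.lockf(fd, fcntl.LOCK_UN, 10, 10)   # last unlock on this fd
os.close(fd)
status = "unlocked"
print(status)
```

Run against a file on the mount while a gdb session with a breakpoint on pl_lk is attached to a brick process, as described above.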
Comment 8 Anush Shetty 2012-05-18 09:31:19 EDT
This is still seen on a fuse mount (latest commit id 0cdb1d147afd644153855f6557bf7e809e5444f0). I had filed this bug for a fuse mount.
Comment 9 krishnan parthasarathi 2012-05-20 01:36:31 EDT
(In reply to comment #8)
> This is still seen on fuse mount ( latest commit
> id-0cdb1d147afd644153855f6557bf7e809e5444f0). I had filed this bug for fuse
> mount.

Anush,
The fix is agnostic to the type of mount. The mention of NLM in the bug and commit log is because of what NLM expects of the locks xlator for its smooth functioning.
I am unable to recreate the issue on the commit-id you've mentioned. It would be better if you could provide access to the setup where you are seeing this issue. It would also help if you could attach the output of locktests, which shows which test(s) in the collection failed.
Comment 10 krishnan parthasarathi 2012-05-29 06:16:29 EDT
Unable to recreate this issue in a couple of different setups. Removing target milestone 3.3.0, since it is not reproducible consistently.
Comment 11 Amar Tumballi 2012-11-21 04:52:51 EST
Closing as WORKSFORME since we could not hit this again in any of our unit testing. If seen again, please re-open.
