Bug 769283 - POSIX lock test failure for stripe-replicate volume
Summary: POSIX lock test failure for stripe-replicate volume
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: write-behind
Version: mainline
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-12-20 12:57 UTC by shylesh
Modified: 2015-10-22 15:46 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-22 15:46:38 UTC
Regression: ---
Mount Type: ---
Documentation: DP
CRM:
Verified Versions:



Description shylesh 2011-12-20 12:57:33 UTC
Description of problem:
The POSIX lock test fails on a stripe-replicate volume.

Version-Release number of selected component (if applicable):
Mainline

How reproducible:
often

Steps to Reproduce:
1. Create a stripe-replicate volume.
2. Mount it.
3. Run the lock tests.
  
Actual results:

Init
process initalization
....................
--------------------------------------

TEST : TRY TO WRITE ON A READ  LOCK:==========
TEST : TRY TO WRITE ON A WRITE LOCK:==========
TEST : TRY TO READ  ON A READ  LOCK:==========
TEST : TRY TO READ  ON A WRITE LOCK:==========
TEST : TRY TO SET A READ  LOCK ON A READ  LOCK:==========
TEST : TRY TO SET A WRITE LOCK ON A WRITE LOCK:Master: can't set lock
: Resource temporarily unavailable
Echec
: Resource temporarily unavailable


Expected results:
The lock test should run to completion without aborting.

Additional info:

Comment 1 shishir gowda 2011-12-29 07:22:34 UTC
This issue seems to be specific to replicate; the test passes on stripe and dht volumes.
----------------------
strace output:
for non replica:


open("test", O_RDWR|O_CREAT|O_SYNC, 0600) = 25
write(1, "\n", 1)                       = 1
write(0, "TEST : TRY TO SET A WRITE LOCK O"..., 47) = 47
write(25, "Ceci est une phrase test \303\251crite"..., 62) = 62
fcntl(25, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}) = 0
...
-----------------
for replica:

open("test", O_RDWR|O_CREAT|O_SYNC, 0600) = 25
write(1, "\n", 1)                       = 1
write(0, "TEST : TRY TO SET A WRITE LOCK O"..., 47) = 47
write(25, "Ceci est une phrase test \303\251crite"..., 62) = 62
fcntl(25, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 EAGAIN (Resource temporarily unavailable)
dup(2)                                  = 26
fcntl(26, F_GETFL)                      = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(26, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fde74e64000
lseek(26, 0, SEEK_CUR)                  = -1 ESPIPE (Illegal seek)
write(26, "Master: can't set lock\n", 23) = 23
write(26, ": Resource temporarily unavailab"..., 35) = 35

------------
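The failing call sequence in the replica trace can be approximated outside the locktests tool. The following is a minimal sketch, not the locktests source: the helper names (`open_test_file`, `try_whole_file_lock`, `write_then_lock`) are hypothetical, and only the open flags, the write-then-lock ordering, and the `F_SETLK` arguments are taken from the strace output above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open (creating if needed) with the flags seen in the trace. */
int open_test_file(const char *path)
{
    return open(path, O_RDWR | O_CREAT | O_SYNC, 0600);
}

/*
 * Try a non-blocking lock over the whole file (start=0, len=0),
 * as in fcntl(25, F_SETLK, {... start=0, len=0}).
 * Returns 0 on success, otherwise the errno set by fcntl().
 */
int try_whole_file_lock(int fd, short type)
{
    struct flock fl;

    assert(type == F_RDLCK || type == F_WRLCK || type == F_UNLCK);
    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;

    return fcntl(fd, F_SETLK, &fl) == -1 ? errno : 0;
}

/*
 * Write a buffer and immediately request a lock, mirroring the
 * write(25, ...) followed by fcntl(25, F_SETLK, ...) pair in the trace.
 * Returns 0 on success, otherwise an errno value.
 */
int write_then_lock(int fd, const char *buf, size_t len, short type)
{
    errno = 0;
    if (write(fd, buf, len) != (ssize_t)len)
        return errno ? errno : EIO;
    return try_whole_file_lock(fd, type);
}
```

Calling `write_then_lock(fd, buf, 62, F_WRLCK)` on a file under the replicate mount is where the trace shows EAGAIN; on a local filesystem the same call is expected to return 0.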

Actual run:
root@shishirng:/mnt# strace -o /tmp/trace.afr /opt/qa/tools/locks/locktests -n 10 -f test
Init
process initalization
....................
--------------------------------------

TEST : TRY TO WRITE ON A READ  LOCK:==========
TEST : TRY TO WRITE ON A WRITE LOCK:==========
TEST : TRY TO READ  ON A READ  LOCK:==========
TEST : TRY TO READ  ON A WRITE LOCK:==========
TEST : TRY TO SET A READ  LOCK ON A READ  LOCK:==========
TEST : TRY TO SET A WRITE LOCK ON A WRITE LOCK:Master: can't set lock
: Resource temporarily unavailable
Echec
: Resource temporarily unavailable

---------------------

Error logs:


[2011-12-29 12:43:53.939349] D [afr-lk-common.c:405:transaction_lk_op] 0-new1-replicate-0: lk op is for a transaction
[2011-12-29 12:43:53.939367] D [afr-lk-common.c:606:afr_unlock_inodelk] 0-new1-replicate-0: attempting data unlock range 0 0 by 139822922292356
[2011-12-29 12:43:53.939802] D [afr-lk-common.c:1426:afr_nonblocking_inodelk] 0-new1-replicate-0: attempting data lock range 0 62 by 139822922293936
[2011-12-29 12:43:53.940253] D [fuse-bridge.c:3043:fuse_setlk_cbk] 0-glusterfs-fuse: Returning EAGAIN Flock: start=0, len=0, pid=542, lk-owner=14830901126550067613

Comment 2 Anand Avati 2012-01-06 21:00:45 UTC
The issue is that write-behind does not hold lk calls behind a barrier until the flush-"behind" call has completed. Extending write-behind's barrier to cover all background operations, and making the lk() call enter the wb queue, will fix this problem the right way.
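The ordering rule described here can be illustrated with a toy model. This is only a sketch under stated assumptions: the types and functions below (`struct wb_queue`, `wb_enqueue`, `wb_complete`, `wb_can_dispatch`) are hypothetical and are not taken from the write-behind translator or from the patch under review. It shows one way an lk request placed in the same queue as background writes and flushes could be held until everything queued before it has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request kinds; not GlusterFS's real types. */
enum req_kind { REQ_WRITE, REQ_FLUSH, REQ_LK };

struct req {
    enum req_kind kind;
    bool done;              /* set once the operation has completed */
};

#define QUEUE_MAX 16

struct wb_queue {
    struct req reqs[QUEUE_MAX];
    size_t count;
};

/* Append a request in arrival order and return its index. */
size_t wb_enqueue(struct wb_queue *q, enum req_kind kind)
{
    assert(q->count < QUEUE_MAX);
    q->reqs[q->count].kind = kind;
    q->reqs[q->count].done = false;
    return q->count++;
}

/* Mark a queued request as completed. */
void wb_complete(struct wb_queue *q, size_t idx)
{
    assert(idx < q->count);
    q->reqs[idx].done = true;
}

/*
 * Barrier rule of this toy model: an lk request may be dispatched only
 * when every request queued before it has completed. Writes and flushes
 * are never held back here.
 */
bool wb_can_dispatch(const struct wb_queue *q, size_t idx)
{
    assert(idx < q->count);
    if (q->reqs[idx].kind != REQ_LK)
        return true;
    for (size_t i = 0; i < idx; i++)
        if (!q->reqs[i].done)
            return false;
    return true;
}
```

With a write, a flush, and an lk enqueued in that order, `wb_can_dispatch` returns false for the lk until both earlier requests have been passed to `wb_complete`.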

Comment 3 Amar Tumballi 2012-02-28 03:24:49 UTC
Du, can you please resend your patches with rebase, review comments? (for master branch only for now).

Comment 4 Raghavendra G 2012-03-20 02:12:45 UTC
(In reply to comment #3)
> Du, can you please resend your patches with rebase, review comments? (for
> master branch only for now).

It's been done; the patch applies cleanly on 65c6e3706f529094717992 and passes the tests.

regards,
Raghavendra.

Comment 5 Amar Tumballi 2012-10-11 10:07:46 UTC
http://review.gluster.org/2610 needs a rebase.

Comment 6 Vijay Bellur 2013-04-01 12:25:34 UTC
REVIEW: http://review.gluster.org/2610 (performance/write-behind: implement lk.) posted (#11) for review on master by Raghavendra G (raghavendra)

Comment 7 Vijay Bellur 2013-04-02 02:27:29 UTC
REVIEW: http://review.gluster.org/2610 (performance/write-behind: implement lk.) posted (#12) for review on master by Raghavendra G (raghavendra)

Comment 8 Vijay Bellur 2013-04-02 02:47:25 UTC
REVIEW: http://review.gluster.org/2610 (performance/write-behind: implement lk.) posted (#13) for review on master by Raghavendra G (raghavendra)

Comment 9 Kaleb KEITHLEY 2015-10-22 15:46:38 UTC
Because of the large number of bugs filed against it, the "mainline" version is ambiguous and about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.

