Bug 769283

Summary: POSIX lock test failure for stripe-replicate volume
Product: [Community] GlusterFS
Reporter: shylesh <shmohan>
Component: write-behind
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED EOL
Severity: high
Priority: high
Version: mainline
CC: chrisw, gluster-bugs, pkarampu, redhat.bugs, rgowdapp, rwheeler, vbellur
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2015-10-22 15:46:38 UTC

Description shylesh 2011-12-20 12:57:33 UTC
Description of problem:
The POSIX lock test fails on a stripe-replicate volume.

Version-Release number of selected component (if applicable):
Mainline

How reproducible:
Often

Steps to Reproduce:
1. Create a stripe-replicate volume.
2. Mount it.
3. Run the lock tests.
  
Actual results:

Init
process initalization
....................
--------------------------------------

TEST : TRY TO WRITE ON A READ  LOCK:==========
TEST : TRY TO WRITE ON A WRITE LOCK:==========
TEST : TRY TO READ  ON A READ  LOCK:==========
TEST : TRY TO READ  ON A WRITE LOCK:==========
TEST : TRY TO SET A READ  LOCK ON A READ  LOCK:==========
TEST : TRY TO SET A WRITE LOCK ON A WRITE LOCK:Master: can't set lock
: Resource temporarily unavailable
Echec
: Resource temporarily unavailable


Expected results:
The lock tests should run to completion without aborting.

Additional info:

Comment 1 shishir gowda 2011-12-29 07:22:34 UTC
This issue seems to be specific to replicate. The test passes on stripe and dht volumes.
----------------------
strace output:
for non replica:


open("test", O_RDWR|O_CREAT|O_SYNC, 0600) = 25
write(1, "\n", 1)                       = 1
write(0, "TEST : TRY TO SET A WRITE LOCK O"..., 47) = 47
write(25, "Ceci est une phrase test \303\251crite"..., 62) = 62
fcntl(25, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}) = 0
...
-----------------
for replica:

open("test", O_RDWR|O_CREAT|O_SYNC, 0600) = 25
write(1, "\n", 1)                       = 1
write(0, "TEST : TRY TO SET A WRITE LOCK O"..., 47) = 47
write(25, "Ceci est une phrase test \303\251crite"..., 62) = 62
fcntl(25, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 EAGAIN (Resource temporarily unavailable)
dup(2)                                  = 26
fcntl(26, F_GETFL)                      = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(26, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fde74e64000
lseek(26, 0, SEEK_CUR)                  = -1 ESPIPE (Illegal seek)
write(26, "Master: can't set lock\n", 23) = 23
write(26, ": Resource temporarily unavailab"..., 35) = 35
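
The failing sequence in the replica trace (open, one 62-byte write, then a non-blocking whole-file write lock) can be sketched in a few lines of Python. This is only an approximation of what the master process does: the function name and the payload are placeholders, and the trace does not show what the other locktests processes were doing, so running it alone on a replicate mount may or may not hit the EAGAIN.

```python
import errno
import fcntl
import os

# Same length as the write in the trace (62 bytes); the content is a placeholder.
PAYLOAD = b"x" * 62


def try_write_lock_after_write(path):
    """Return 0 if the whole-file write lock is granted, else the errno."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_SYNC, 0o600)
    try:
        os.write(fd, PAYLOAD)
        try:
            # LOCK_EX | LOCK_NB is issued as fcntl(fd, F_SETLK, {type=F_WRLCK});
            # the default start=0, len=0 covers the whole file, as in the trace.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        # Closing the descriptor also releases this process's POSIX locks on the file.
        os.close(fd)


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "test"
    err = try_write_lock_after_write(target)
    if err == 0:
        print("lock granted")
    else:
        print("lock refused:", errno.errorcode.get(err, str(err)))
```

Run from inside the mount point, a result of EAGAIN would match the replica trace, and "lock granted" would match the behaviour seen on the other volume types.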

------------

Actual run:
root@shishirng:/mnt# strace -o /tmp/trace.afr /opt/qa/tools/locks/locktests -n 10 -f test
Init
process initalization
....................
--------------------------------------

TEST : TRY TO WRITE ON A READ  LOCK:==========
TEST : TRY TO WRITE ON A WRITE LOCK:==========
TEST : TRY TO READ  ON A READ  LOCK:==========
TEST : TRY TO READ  ON A WRITE LOCK:==========
TEST : TRY TO SET A READ  LOCK ON A READ  LOCK:==========
TEST : TRY TO SET A WRITE LOCK ON A WRITE LOCK:Master: can't set lock
: Resource temporarily unavailable
Echec
: Resource temporarily unavailable

---------------------

Error logs:


[2011-12-29 12:43:53.939349] D [afr-lk-common.c:405:transaction_lk_op] 0-new1-replicate-0: lk op is for a transaction
[2011-12-29 12:43:53.939367] D [afr-lk-common.c:606:afr_unlock_inodelk] 0-new1-replicate-0: attempting data unlock range 0 0 by 139822922292356
[2011-12-29 12:43:53.939802] D [afr-lk-common.c:1426:afr_nonblocking_inodelk] 0-new1-replicate-0: attempting data lock range 0 62 by 139822922293936
[2011-12-29 12:43:53.940253] D [fuse-bridge.c:3043:fuse_setlk_cbk] 0-glusterfs-fuse: Returning EAGAIN Flock: start=0, len=0, pid=542, lk-owner=14830901126550067613

Comment 2 Anand Avati 2012-01-06 21:00:45 UTC
The issue is that write-behind does not barrier lk calls until the flush-"behind" call has completed. Extending write-behind's barrier to cover all background operations, and making the lk() call enter the wb queue, will fix this problem the right way.
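
As a rough illustration of the ordering described here, the toy model below holds an lk request in a queue until every background operation submitted before it has completed. It is not GlusterFS code: the class, its methods, and the event strings are all hypothetical, and the real write-behind translator is written in C and tracks far more state.

```python
from collections import deque


class ToyWriteBehindQueue:
    """Hypothetical model of a barrier for lk requests; not the real translator."""

    def __init__(self):
        self.pending_background = set()  # background ops sent but not yet completed
        self.waiting = deque()           # lk requests held behind the barrier
        self.events = []                 # order in which things reach the lower layer

    def submit_background(self, name):
        # A "behind" operation: the caller is answered at once, completion comes later.
        self.pending_background.add(name)
        self.events.append(name + ":sent")

    def submit_lk(self, name):
        # The lk request enters the queue instead of bypassing it.
        self.waiting.append(name)
        self._drain()

    def complete_background(self, name):
        self.pending_background.discard(name)
        self.events.append(name + ":done")
        self._drain()

    def _drain(self):
        # Queued lk requests go down only once no background op is outstanding.
        if not self.pending_background:
            while self.waiting:
                self.events.append(self.waiting.popleft() + ":sent")


if __name__ == "__main__":
    q = ToyWriteBehindQueue()
    q.submit_background("flush")
    q.submit_lk("lk")             # held: the flush has not completed yet
    q.complete_background("flush")
    print(q.events)
```

In this model the lk request is always sent after the completion of the flush, which is the ordering the comment asks write-behind to guarantee.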

Comment 3 Amar Tumballi 2012-02-28 03:24:49 UTC
Du, can you please rebase your patches, address the review comments, and resend them? (For the master branch only, for now.)

Comment 4 Raghavendra G 2012-03-20 02:12:45 UTC
(In reply to comment #3)
> Du, can you please resend your patches with rebase, review comments? (for
> master branch only for now).

It's been done. The patch applies cleanly on 65c6e3706f529094717992 and passes the tests.

regards,
Raghavendra.

Comment 5 Amar Tumballi 2012-10-11 10:07:46 UTC
http://review.gluster.org/2610 needs a rebase.

Comment 6 Vijay Bellur 2013-04-01 12:25:34 UTC
REVIEW: http://review.gluster.org/2610 (performance/write-behind: implement lk.) posted (#11) for review on master by Raghavendra G (raghavendra)

Comment 7 Vijay Bellur 2013-04-02 02:27:29 UTC
REVIEW: http://review.gluster.org/2610 (performance/write-behind: implement lk.) posted (#12) for review on master by Raghavendra G (raghavendra)

Comment 8 Vijay Bellur 2013-04-02 02:47:25 UTC
REVIEW: http://review.gluster.org/2610 (performance/write-behind: implement lk.) posted (#13) for review on master by Raghavendra G (raghavendra)

Comment 9 Kaleb KEITHLEY 2015-10-22 15:46:38 UTC
Because of the large number of bugs filed against it, the mainline version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.