Bug 1408705

Summary: [GNFS+EC] Cthon failures/issues with Lock/Special Test cases on disperse volume with GNFS mount
Product: Red Hat Gluster Storage Reporter: Manisha Saini <msaini>
Component: disperse    Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED ERRATA QA Contact: Manisha Saini <msaini>
Severity: unspecified Docs Contact:
Priority: urgent    
Version: rhgs-3.2CC: amukherj, aspandey, jthottan, kkeithle, msaini, ndevos, pkarampu, rcyriac, rhinduja, rhs-bugs, skoduri, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-14 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1410425 (view as bug list)    Environment:
Last Closed: 2017-03-23 06:00:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 1351528, 1410425    

Description Manisha Saini 2016-12-26 13:55:09 UTC
Description of problem:
Cthon lock test cases fail on an EC (disperse) volume with a GNFS mount.
The same lock test cases pass on a Distributed-Replicate volume with a GNFS mount.

Cthon special test cases pass on an EC volume with a GNFS mount, but warnings are observed in /var/log/glusterfs/nfs.log and in the brick logs.
These warnings are not observed with a Distributed-Replicate volume.

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-10.el7rhgs.x86_64

How reproducible:
Consistently

Steps to Reproduce:
1. Create a 2 x (4 + 2) disperse volume (disperseVol) and enable md-cache and GNFS on it.
2. Mount the volume on a client over NFSv3 and run the cthon lock and special test cases.
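The steps above can be sketched as follows. This is an assumed reproduction script, not taken from the report: hostnames, brick paths, and the exact md-cache option set are placeholders; consult the gluster(8) CLI and RHGS documentation for the authoritative option names.

```shell
# Step 1: 2 x (4 + 2) disperse volume needs 12 bricks (hosts/paths are placeholders)
gluster volume create disperseVol disperse 6 redundancy 2 \
    server{1..6}:/gluster/brick1/1 server{1..6}:/gluster/brick2/2

# md-cache-related options (option names as documented for RHGS 3.2)
gluster volume set disperseVol features.cache-invalidation on
gluster volume set disperseVol performance.md-cache-timeout 600

# Enable gluster/NFS (GNFS) and start the volume
gluster volume set disperseVol nfs.disable off
gluster volume start disperseVol

# Step 2: NFSv3 mount on the client, then run the cthon suites
mount -t nfs -o vers=3 server1:/disperseVol /mnt/gnfs
```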


==========
Lock Test cases
==========

#cd /cthon04/ && ./server -l -o vers=3 -p /disperseVol -m /mnt/gnfs -N 1 dhcp47-58.lab.eng.blr.redhat.com

Test #1 - Test regions of an unlocked file.
	Parent: 1.1  - F_TEST  [               0,               1] PASSED.
	Parent: 1.2  - F_TEST  [               0,          ENDING] PASSED.
	Parent: 1.3  - F_TEST  [               0,7fffffffffffffff] PASSED.
	Parent: 1.4  - F_TEST  [               1,               1] PASSED.
	Parent: 1.5  - F_TEST  [               1,          ENDING] PASSED.
	Parent: 1.6  - F_TEST  [               1,7fffffffffffffff] PASSED.
	Parent: 1.7  - F_TEST  [7fffffffffffffff,               1] PASSED.
	Parent: 1.8  - F_TEST  [7fffffffffffffff,          ENDING] PASSED.
	Parent: 1.9  - F_TEST  [7fffffffffffffff,7fffffffffffffff] PASSED.

Test #2 - Try to lock the whole file.
	Parent: 2.0  - F_TLOCK [               0,          ENDING] PASSED.
	Child:  2.1  - F_TEST  [               0,               1] PASSED.
	Child:  2.2  - F_TEST  [               0,          ENDING] PASSED.
	Child:  2.3  - F_TEST  [               0,7fffffffffffffff] PASSED.
	Child:  2.4  - F_TEST  [               1,               1] PASSED.
	Child:  2.5  - F_TEST  [               1,          ENDING] PASSED.
	Child:  2.6  - F_TEST  [               1,7fffffffffffffff] PASSED.
	Child:  2.7  - F_TEST  [7fffffffffffffff,               1] PASSED.
	Child:  2.8  - F_TEST  [7fffffffffffffff,          ENDING] PASSED.
	Child:  2.9  - F_TEST  [7fffffffffffffff,7fffffffffffffff] PASSED.
	Parent: 2.10 - F_ULOCK [               0,          ENDING] PASSED.

Test #3 - Try to lock just the 1st byte.
	Parent: 3.0  - F_TLOCK [               0,               1] PASSED.
	Child:  3.1  - F_TEST  [               0,               1] PASSED.
	Child:  3.2  - F_TEST  [               0,          ENDING] PASSED.
	Child:  3.3  - F_TEST  [               1,               1] FAILED!
	Child:  **** Expected success, returned EACCES...
	Child:  **** Probably implementation error.

**  CHILD pass 1 results: 11/11 pass, 0/0 warn, 1/1 fail (pass/total).
	Parent: Child died

** PARENT pass 1 results: 12/12 pass, 0/0 warn, 0/0 fail (pass/total).
lock tests failed

==========
Special test cases
==========

#cd /cthon04/ && ./server -s -o vers=3 -p /disperseVol -m /mnt/gnfs -N 1 dhcp47-58.lab.eng.blr.redhat.com

#tailf /var/log/glusterfs/nfs.log
[2016-12-26 13:42:57.652491] W [MSGID: 114031] [client-rpc-fops.c:1830:client3_3_fxattrop_cbk] 0-disperseVol-client-1: remote operation failed [No such file or directory]
[2016-12-26 13:42:57.684467] W [MSGID: 114031] [client-rpc-fops.c:1830:client3_3_fxattrop_cbk] 0-disperseVol-client-9: remote operation failed [No such file or directory]
[2016-12-26 13:42:57.684567] W [MSGID: 114031] [client-rpc-fops.c:1830:client3_3_fxattrop_cbk] 0-disperseVol-client-10: remote operation failed [No such file or directory]
[2016-12-26 13:42:57.704277] W [MSGID: 114031] [client-rpc-fops.c:1830:client3_3_fxattrop_cbk] 0-disperseVol-client-8: remote operation failed [No such file or directory]
[2016-12-26 13:42:57.704363] W [MSGID: 114031] [client-rpc-fops.c:1830:client3_3_fxattrop_cbk] 0-disperseVol-client-7: remote operation failed [No such file or directory]
[2016-12-26 13:42:57.721634] W [MSGID: 114031] [client-rpc-fops.c:1830:client3_3_fxattrop_cbk] 0-disperseVol-client-11: remote operation failed [No such file or directory]

#tailf /var/log/glusterfs/bricks/gluster-brick1-1.log

[2016-12-26 13:42:55.651782] W [MSGID: 113055] [posix.c:5643:do_xattrop] 0-disperseVol-posix: failed to get pfd from fd=0x7f9e697a2ea4 [Exec format error]
[2016-12-26 13:42:55.651978] E [MSGID: 113040] [posix-helpers.c:1672:__posix_fd_ctx_get] 0-disperseVol-posix: Failed to get anonymous real_path: /gluster/brick1/1/.glusterfs/a5/f9/a5f98535-a5f7-4107-aa34-7e041ac3418a _fd = -1 [No such file or directory]
[2016-12-26 13:42:55.651995] W [MSGID: 113055] [posix.c:5643:do_xattrop] 0-disperseVol-posix: failed to get pfd from fd=0x7f9e697a2f30 [Exec format error]
[2016-12-26 13:42:57.651493] W [MSGID: 113055] [posix.c:5643:do_xattrop] 0-disperseVol-posix: failed to get pfd from fd=0x7f9e697a2be8 [Exec format error]


Actual results:
Cthon lock test cases fail with an EC volume and GNFS mount; cthon special test cases pass but show warnings in nfs.log and the brick logs.

Expected results:
Cthon test cases should pass without any warnings or failures.

Additional info:

Comment 3 Atin Mukherjee 2016-12-27 04:06:05 UTC
@Pranith - could you take a look at it? Looks like this only fails for EC volumes.

Comment 4 Soumya Koduri 2016-12-28 06:44:01 UTC
Also, could you please check whether this is a regression from the previous RHGS release?

Comment 6 Soumya Koduri 2017-01-04 10:32:44 UTC
Looks like there is an issue with POSIX locks in the EC xlator.

Test #3 - Try to lock just the 1st byte.
	Parent: 3.0  - F_TLOCK [               0,               1] PASSED.
	Child:  3.1  - F_TEST  [               0,               1] PASSED.
	Child:  3.2  - F_TEST  [               0,          ENDING] PASSED.
	Child:  3.3  - F_TEST  [               1,               1] FAILED!
	Child:  **** Expected success, returned EACCES...
	Child:  **** Probably implementation error.

Here the parent process holds a lock starting at byte '0' with length '1'. The child process tries to take a lock starting at byte '1' with length '1', which should ideally have been granted.


(gdb) bt
#0  client3_3_lk (frame=0x7f8e5388bdbc, this=0x7f8e44020b00, data=0x7f8e48db7ec0) at client-rpc-fops.c:5349
#1  0x00007f8e4836e4be in client_lk (frame=0x7f8e5388bdbc, this=<optimized out>, fd=<optimized out>, cmd=5, 
    lock=<optimized out>, xdata=<optimized out>) at client.c:1652
#2  0x00007f8e4370204d in ec_wind_lk (ec=0x7f8e44067700, fop=0x7f8e380dc06c, idx=0) at ec-locks.c:1003
#3  0x00007f8e436f7f08 in ec_dispatch_mask (fop=fop@entry=0x7f8e380dc06c, mask=63) at ec-common.c:521
#4  0x00007f8e436f8159 in ec_dispatch_all (fop=fop@entry=0x7f8e380dc06c) at ec-common.c:597
#5  0x00007f8e43704280 in ec_manager_lk (fop=0x7f8e380dc06c, state=<optimized out>) at ec-locks.c:1028
#6  0x00007f8e436f779b in __ec_manager (fop=0x7f8e380dc06c, error=0) at ec-common.c:2287
#7  0x00007f8e436f148c in ec_gf_lk (frame=<optimized out>, this=<optimized out>, fd=<optimized out>, 
    cmd=<optimized out>, flock=<optimized out>, xdata=<optimized out>) at ec.c:888
#8  0x00007f8e48141865 in dht_lk (frame=frame@entry=0x7f8e53894808, this=this@entry=0x7f8e440320f0, 
    fd=fd@entry=0x7f8e5609d0f8, cmd=cmd@entry=5, flock=flock@entry=0x7f8e48db87a0, xdata=xdata@entry=0x0)
    at dht-inode-read.c:1067
#9  0x00007f8e55e24e39 in default_lk (frame=0x7f8e53894808, this=<optimized out>, fd=0x7f8e5609d0f8, cmd=5, 
    lock=0x7f8e48db87a0, xdata=0x0) at defaults.c:2524
#10 0x00007f8e43bd39e4 in io_stats_lk (frame=0x7f8e53889a20, this=0x7f8e44033e70, fd=0x7f8e5609d0f8, cmd=5, 
    lock=0x7f8e48db87a0, xdata=0x0) at io-stats.c:3345
#11 0x00007f8e43495752 in nfs_fop_lk (nfsx=<optimized out>, xl=0x7f8e44033e70, nfu=nfu@entry=0x7f8e48db8bc0, 
    fd=0x7f8e5609d0f8, cmd=cmd@entry=5, flock=flock@entry=0x7f8e48db87a0, cbk=0x7f8e434be0c0 <nlm4svc_test_cbk>, 
    local=0x7f8e398cbd04) at nfs-fops.c:1565
#12 0x00007f8e43498105 in nfs_lk (nfsx=<optimized out>, xl=<optimized out>, nfu=nfu@entry=0x7f8e48db8bc0, 
    fd=<optimized out>, cmd=cmd@entry=5, flock=flock@entry=0x7f8e48db87a0, 
    cbk=cbk@entry=0x7f8e434be0c0 <nlm4svc_test_cbk>, local=local@entry=0x7f8e398cbd04) at nfs-generics.c:151
#13 0x00007f8e434be6bc in nlm4_test_fd_resume (carg=carg@entry=0x7f8e398cbd04) at nlm4.c:806
#14 0x00007f8e434be74a in nlm4_test_resume (carg=0x7f8e398cbd04) at nlm4.c:830
#15 0x00007f8e434baa9c in nfs3_fh_resolve_inode_done (cs=cs@entry=0x7f8e398cbd04, inode=inode@entry=0x7f8e41e25128)
    at nfs3-helpers.c:3619
#16 0x00007f8e434bb2db in nfs3_fh_resolve_inode (cs=0x7f8e398cbd04) at nfs3-helpers.c:3828
#17 0x00007f8e434bb385 in nfs3_fh_resolve_resume (cs=cs@entry=0x7f8e398cbd04) at nfs3-helpers.c:3860
#18 0x00007f8e434bb5a8 in nfs3_fh_resolve_root (cs=cs@entry=0x7f8e398cbd04) at nfs3-helpers.c:3915
#19 0x00007f8e434bb7f1 in nfs3_fh_resolve_and_resume (cs=cs@entry=0x7f8e398cbd04, fh=fh@entry=0x7f8e48db98f0, 
    entry=entry@entry=0x0, resum_fn=resum_fn@entry=0x7f8e434be6f0 <nlm4_test_resume>) at nfs3-helpers.c:4011
#20 0x00007f8e434be311 in nlm4svc_test (req=0x7f8e48040b70) at nlm4.c:887
#21 0x00007f8e55b6f775 in rpcsvc_handle_rpc_call (svc=0x7f8e440480e0, trans=trans@entry=0x7f8e44522130, 
    msg=<optimized out>) at rpcsvc.c:695
#22 0x00007f8e55b6f95b in rpcsvc_notify (trans=0x7f8e44522130, mydata=<optimized out>, event=<optimized out>, 
    data=<optimized out>) at rpcsvc.c:789
#23 0x00007f8e55b71893 in rpc_transport_notify (this=this@entry=0x7f8e44522130, 
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f8e44422750) at rpc-transport.c:538
#24 0x00007f8e4a6632d4 in socket_event_poll_in (this=this@entry=0x7f8e44522130) at socket.c:2267
#25 0x00007f8e4a665785 in socket_event_handler (fd=<optimized out>, idx=31, data=0x7f8e44522130, poll_in=1, 
    poll_out=0, poll_err=0) at socket.c:2397
#26 0x00007f8e55e05650 in event_dispatch_epoll_handler (event=0x7f8e48db9e80, event_pool=0x7f8e5791df00)
    at event-epoll.c:571
#27 event_dispatch_epoll_worker (data=0x7f8e579708b0) at event-epoll.c:674
#28 0x00007f8e54c0cdc5 in start_thread () from /lib64/libpthread.so.0
#29 0x00007f8e5455173d in clone () from /lib64/libc.so.6
(gdb) f 5
#5  0x00007f8e43704280 in ec_manager_lk (fop=0x7f8e380dc06c, state=<optimized out>) at ec-locks.c:1028
1028	            ec_dispatch_all(fop);
(gdb) l
1023	            }
1024	
1025	        /* Fall through */
1026	
1027	        case EC_STATE_DISPATCH:
1028	            ec_dispatch_all(fop);
1029	
1030	            return EC_STATE_PREPARE_ANSWER;
1031	
1032	        case EC_STATE_PREPARE_ANSWER:
(gdb) f 4
#4  0x00007f8e436f8159 in ec_dispatch_all (fop=fop@entry=0x7f8e380dc06c) at ec-common.c:597
597	                ec_dispatch_mask(fop, fop->remaining);
(gdb) l
592	
593	        if (ec_child_select(fop)) {
594	                fop->expected = gf_bits_count(fop->remaining);
595	                fop->first = 0;
596	
597	                ec_dispatch_mask(fop, fop->remaining);
598	        }
599	}
600	
601	void ec_dispatch_min(ec_fop_data_t * fop)
(gdb) f 3
#3  0x00007f8e436f7f08 in ec_dispatch_mask (fop=fop@entry=0x7f8e380dc06c, mask=63) at ec-common.c:521
521	            fop->wind(ec, fop, idx);
(gdb) l
516	    idx = 0;
517	    while (mask != 0)
518	    {
519	        if ((mask & 1) != 0)
520	        {
521	            fop->wind(ec, fop, idx);
522	        }
523	        idx++;
524	        mask >>= 1;
525	    }
(gdb) f 2
#2  0x00007f8e4370204d in ec_wind_lk (ec=0x7f8e44067700, fop=0x7f8e380dc06c, idx=0) at ec-locks.c:1003
1003	    STACK_WIND_COOKIE(fop->frame, ec_lk_cbk, (void *)(uintptr_t)idx,
(gdb) p fop->flock
$11 = {l_type = 1, l_whence = 0, l_start = 0, l_len = 512, l_pid = 164, l_owner = {len = 36, 
    data = "164@dhcp46-30.lab.eng.blr.redhat.com", '\000' <repeats 987 times>}}
(gdb) f 8
#8  0x00007f8e48141865 in dht_lk (frame=frame@entry=0x7f8e53894808, this=this@entry=0x7f8e440320f0, 
    fd=fd@entry=0x7f8e5609d0f8, cmd=cmd@entry=5, flock=flock@entry=0x7f8e48db87a0, xdata=xdata@entry=0x0)
    at dht-inode-read.c:1067
1067	        STACK_WIND (frame, dht_lk_cbk, lock_subvol, lock_subvol->fops->lk, fd,
(gdb) p *flock
$12 = {l_type = 1, l_whence = 0, l_start = 1, l_len = 1, l_pid = 164, l_owner = {len = 36, 
    data = "164@dhcp46-30.lab.eng.blr.redhat.com", '\000' <repeats 987 times>}}

>>>> As we can see above, up to the DHT layer flock->l_start was '1' and l_len was '1'.

(gdb) f 7
#7  0x00007f8e436f148c in ec_gf_lk (frame=<optimized out>, this=<optimized out>, fd=<optimized out>, 
    cmd=<optimized out>, flock=<optimized out>, xdata=<optimized out>) at ec.c:888
888	    ec_lk(frame, this, -1, minimum, default_lk_cbk, NULL, fd, cmd,
(gdb) p *flock
value has been optimized out
(gdb) p fop
No symbol "fop" in current context.
(gdb) l
883	                 int32_t cmd, struct gf_flock * flock, dict_t * xdata)
884	{
885	    int32_t minimum = EC_MINIMUM_ALL;
886	    if (flock->l_type == F_UNLCK)
887	            minimum = EC_MINIMUM_ONE;
888	    ec_lk(frame, this, -1, minimum, default_lk_cbk, NULL, fd, cmd,
889	          flock, xdata);
890	
891	    return 0;
892	}
(gdb) p *flock
value has been optimized out
(gdb) f 6
#6  0x00007f8e436f779b in __ec_manager (fop=0x7f8e380dc06c, error=0) at ec-common.c:2287
2287	        fop->state = fop->handler(fop, fop->state);
(gdb) p fop
$13 = (ec_fop_data_t *) 0x7f8e380dc06c
(gdb) p fop->flock
$14 = {l_type = 1, l_whence = 0, l_start = 0, l_len = 512, l_pid = 164, l_owner = {len = 36, 
    data = "164@dhcp46-30.lab.eng.blr.redhat.com", '\000' <repeats 987 times>}}
(gdb) 

>>> But in the EC layer, fop->flock.l_start was reset to '0' and the lock length to '512', resulting in access being denied to the other process trying to take a lock in a non-conflicting range.

I have checked with Pranith on this. Since this issue seems to be in the EC xlator, I am correcting the component.

Comment 7 Niels de Vos 2017-01-04 11:30:37 UTC
(In reply to Soumya Koduri from comment #6)
> Looks like there is an issue with posix locks in EC xlator.
...
> >>> But in EC layer, fop->flock->start got reset to '0' and lock length to 512 resulting in  access denied for the other process trying to take lock in non-conflicting range.
> 
> Have check with Pranith on this. Since this issue seem to be in EC xlator,
> correcting the components.

I suspect that this is expected behaviour. EC encodes blocks of data and (most likely) cannot encode random partial blocks. When a write happens, a whole block needs to be encoded, requiring a lock on the whole data range. This should fail when a conflicting lock is present...

Ashish?

Comment 11 Pranith Kumar K 2017-01-20 13:08:43 UTC
http://review.gluster.org/#/c/16445 - upstream patch

Comment 14 Manisha Saini 2017-02-07 12:06:33 UTC
Verified this bug on glusterfs-ganesha-3.8.4-14.el7rhgs.x86_64.
The cthon lock test suite with EC+GNFS passed successfully.

Comment 16 errata-xmlrpc 2017-03-23 06:00:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html