Bug 1408705
| Summary: | [GNFS+EC] Cthon failures/issues with Lock/Special Test cases on disperse volume with GNFS mount | | |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Manisha Saini <msaini> |
| Component: | disperse | Assignee: | Pranith Kumar K <pkarampu> |
| Status: | CLOSED ERRATA | QA Contact: | Manisha Saini <msaini> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | urgent | | |
| Version: | rhgs-3.2 | CC: | amukherj, aspandey, jthottan, kkeithle, msaini, ndevos, pkarampu, rcyriac, rhinduja, rhs-bugs, skoduri, storage-qa-internal |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8.4-14 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1410425 (view as bug list) | Environment: | |
| Last Closed: | 2017-03-23 06:00:37 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1351528, 1410425 | | |
Description Manisha Saini 2016-12-26 13:55:09 UTC
@Pranith - could you take a look at it? This appears to fail only for EC volumes. Could you also please check whether this is a regression from the previous RHGS release?

Looks like there is an issue with POSIX locks in the EC xlator.

```
Test #3 - Try to lock just the 1st byte.
Parent: 3.0 - F_TLOCK [ 0, 1] PASSED.
Child:  3.1 - F_TEST  [ 0, 1] PASSED.
Child:  3.2 - F_TEST  [ 0, ENDING] PASSED.
Child:  3.3 - F_TEST  [ 1, 1] FAILED!
Child:  **** Expected success, returned EACCES...
Child:  **** Probably implementation error.
```

Here the parent process holds a lock starting at byte 0 with length 1, while the child process is trying to take a lock starting at byte 1 with length 1, which should ideally have been granted.
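For reference, the semantics that test 3.3 exercises can be reproduced with a few lines of C. This is a minimal sketch (not the actual Cthon source; the lockfile name is made up): POSIX byte-range locks conflict only when the ranges overlap, so a lock on [0,1] must not block a test of [1,1]. Running it from a GNFS mount of the disperse volume should reproduce the reported EACCES.

```c
/* Minimal sketch of what Cthon test 3.3 checks (not the Cthon source;
 * "cthon_lockfile" is a made-up name). A lock on [0,1] must not block
 * a test of the non-overlapping range [1,1]. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd = open("cthon_lockfile", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Parent: F_TLOCK [0,1] -- lock just the first byte. */
    lseek(fd, 0, SEEK_SET);
    if (lockf(fd, F_TLOCK, 1) < 0) { perror("F_TLOCK [0,1]"); return 1; }

    if (fork() == 0) {
        /* Child: F_TEST [1,1] -- a non-overlapping range, so this must
         * succeed; with this bug it fails with EACCES. */
        lseek(fd, 1, SEEK_SET);
        if (lockf(fd, F_TEST, 1) < 0)
            perror("F_TEST [1,1] FAILED");
        else
            puts("F_TEST [1,1] PASSED");
        _exit(0);
    }
    wait(NULL);
    return 0;
}
```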
The NLM TEST request takes the following path through the gluster NFS server:

```
(gdb) bt
#0  client3_3_lk (frame=0x7f8e5388bdbc, this=0x7f8e44020b00, data=0x7f8e48db7ec0) at client-rpc-fops.c:5349
#1  0x00007f8e4836e4be in client_lk (frame=0x7f8e5388bdbc, this=<optimized out>, fd=<optimized out>, cmd=5, lock=<optimized out>, xdata=<optimized out>) at client.c:1652
#2  0x00007f8e4370204d in ec_wind_lk (ec=0x7f8e44067700, fop=0x7f8e380dc06c, idx=0) at ec-locks.c:1003
#3  0x00007f8e436f7f08 in ec_dispatch_mask (fop=fop@entry=0x7f8e380dc06c, mask=63) at ec-common.c:521
#4  0x00007f8e436f8159 in ec_dispatch_all (fop=fop@entry=0x7f8e380dc06c) at ec-common.c:597
#5  0x00007f8e43704280 in ec_manager_lk (fop=0x7f8e380dc06c, state=<optimized out>) at ec-locks.c:1028
#6  0x00007f8e436f779b in __ec_manager (fop=0x7f8e380dc06c, error=0) at ec-common.c:2287
#7  0x00007f8e436f148c in ec_gf_lk (frame=<optimized out>, this=<optimized out>, fd=<optimized out>, cmd=<optimized out>, flock=<optimized out>, xdata=<optimized out>) at ec.c:888
#8  0x00007f8e48141865 in dht_lk (frame=frame@entry=0x7f8e53894808, this=this@entry=0x7f8e440320f0, fd=fd@entry=0x7f8e5609d0f8, cmd=cmd@entry=5, flock=flock@entry=0x7f8e48db87a0, xdata=xdata@entry=0x0) at dht-inode-read.c:1067
#9  0x00007f8e55e24e39 in default_lk (frame=0x7f8e53894808, this=<optimized out>, fd=0x7f8e5609d0f8, cmd=5, lock=0x7f8e48db87a0, xdata=0x0) at defaults.c:2524
#10 0x00007f8e43bd39e4 in io_stats_lk (frame=0x7f8e53889a20, this=0x7f8e44033e70, fd=0x7f8e5609d0f8, cmd=5, lock=0x7f8e48db87a0, xdata=0x0) at io-stats.c:3345
#11 0x00007f8e43495752 in nfs_fop_lk (nfsx=<optimized out>, xl=0x7f8e44033e70, nfu=nfu@entry=0x7f8e48db8bc0, fd=0x7f8e5609d0f8, cmd=cmd@entry=5, flock=flock@entry=0x7f8e48db87a0, cbk=0x7f8e434be0c0 <nlm4svc_test_cbk>, local=0x7f8e398cbd04) at nfs-fops.c:1565
#12 0x00007f8e43498105 in nfs_lk (nfsx=<optimized out>, xl=<optimized out>, nfu=nfu@entry=0x7f8e48db8bc0, fd=<optimized out>, cmd=cmd@entry=5, flock=flock@entry=0x7f8e48db87a0, cbk=cbk@entry=0x7f8e434be0c0 <nlm4svc_test_cbk>, local=local@entry=0x7f8e398cbd04) at nfs-generics.c:151
#13 0x00007f8e434be6bc in nlm4_test_fd_resume (carg=carg@entry=0x7f8e398cbd04) at nlm4.c:806
#14 0x00007f8e434be74a in nlm4_test_resume (carg=0x7f8e398cbd04) at nlm4.c:830
#15 0x00007f8e434baa9c in nfs3_fh_resolve_inode_done (cs=cs@entry=0x7f8e398cbd04, inode=inode@entry=0x7f8e41e25128) at nfs3-helpers.c:3619
#16 0x00007f8e434bb2db in nfs3_fh_resolve_inode (cs=0x7f8e398cbd04) at nfs3-helpers.c:3828
#17 0x00007f8e434bb385 in nfs3_fh_resolve_resume (cs=cs@entry=0x7f8e398cbd04) at nfs3-helpers.c:3860
#18 0x00007f8e434bb5a8 in nfs3_fh_resolve_root (cs=cs@entry=0x7f8e398cbd04) at nfs3-helpers.c:3915
#19 0x00007f8e434bb7f1 in nfs3_fh_resolve_and_resume (cs=cs@entry=0x7f8e398cbd04, fh=fh@entry=0x7f8e48db98f0, entry=entry@entry=0x0, resum_fn=resum_fn@entry=0x7f8e434be6f0 <nlm4_test_resume>) at nfs3-helpers.c:4011
#20 0x00007f8e434be311 in nlm4svc_test (req=0x7f8e48040b70) at nlm4.c:887
#21 0x00007f8e55b6f775 in rpcsvc_handle_rpc_call (svc=0x7f8e440480e0, trans=trans@entry=0x7f8e44522130, msg=<optimized out>) at rpcsvc.c:695
#22 0x00007f8e55b6f95b in rpcsvc_notify (trans=0x7f8e44522130, mydata=<optimized out>, event=<optimized out>, data=<optimized out>) at rpcsvc.c:789
#23 0x00007f8e55b71893 in rpc_transport_notify (this=this@entry=0x7f8e44522130, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f8e44422750) at rpc-transport.c:538
#24 0x00007f8e4a6632d4 in socket_event_poll_in (this=this@entry=0x7f8e44522130) at socket.c:2267
#25 0x00007f8e4a665785 in socket_event_handler (fd=<optimized out>, idx=31, data=0x7f8e44522130, poll_in=1, poll_out=0, poll_err=0) at socket.c:2397
#26 0x00007f8e55e05650 in event_dispatch_epoll_handler (event=0x7f8e48db9e80, event_pool=0x7f8e5791df00) at event-epoll.c:571
#27 event_dispatch_epoll_worker (data=0x7f8e579708b0) at event-epoll.c:674
#28 0x00007f8e54c0cdc5 in start_thread () from /lib64/libpthread.so.0
#29 0x00007f8e5455173d in clone () from /lib64/libc.so.6
```

Inspecting the frames from the EC dispatch path down to the client xlator:

```
(gdb) f 5
#5  0x00007f8e43704280 in ec_manager_lk (fop=0x7f8e380dc06c, state=<optimized out>) at ec-locks.c:1028
1028            ec_dispatch_all(fop);
(gdb) l
1023            }
1024
1025        /* Fall through */
1026
1027        case EC_STATE_DISPATCH:
1028            ec_dispatch_all(fop);
1029
1030            return EC_STATE_PREPARE_ANSWER;
1031
1032        case EC_STATE_PREPARE_ANSWER:
(gdb) f 4
#4  0x00007f8e436f8159 in ec_dispatch_all (fop=fop@entry=0x7f8e380dc06c) at ec-common.c:597
597             ec_dispatch_mask(fop, fop->remaining);
(gdb) l
592
593         if (ec_child_select(fop)) {
594             fop->expected = gf_bits_count(fop->remaining);
595             fop->first = 0;
596
597             ec_dispatch_mask(fop, fop->remaining);
598         }
599     }
600
601     void ec_dispatch_min(ec_fop_data_t * fop)
(gdb) f 3
#3  0x00007f8e436f7f08 in ec_dispatch_mask (fop=fop@entry=0x7f8e380dc06c, mask=63) at ec-common.c:521
521                 fop->wind(ec, fop, idx);
(gdb) l
516         idx = 0;
517         while (mask != 0)
518         {
519             if ((mask & 1) != 0)
520             {
521                 fop->wind(ec, fop, idx);
522             }
523             idx++;
524             mask >>= 1;
525         }
(gdb) f 2
#2  0x00007f8e4370204d in ec_wind_lk (ec=0x7f8e44067700, fop=0x7f8e380dc06c, idx=0) at ec-locks.c:1003
1003        STACK_WIND_COOKIE(fop->frame, ec_lk_cbk, (void *)(uintptr_t)idx,
(gdb) p fop->flock
$11 = {l_type = 1, l_whence = 0, l_start = 0, l_len = 512, l_pid = 164, l_owner = {len = 36, data = "164.eng.blr.redhat.com", '\000' <repeats 987 times>}}
(gdb) f 8
#8  0x00007f8e48141865 in dht_lk (frame=frame@entry=0x7f8e53894808, this=this@entry=0x7f8e440320f0, fd=fd@entry=0x7f8e5609d0f8, cmd=cmd@entry=5, flock=flock@entry=0x7f8e48db87a0, xdata=xdata@entry=0x0) at dht-inode-read.c:1067
1067        STACK_WIND (frame, dht_lk_cbk, lock_subvol, lock_subvol->fops->lk, fd,
(gdb) p *flock
$12 = {l_type = 1, l_whence = 0, l_start = 1, l_len = 1, l_pid = 164, l_owner = {len = 36, data = "164.eng.blr.redhat.com", '\000' <repeats 987 times>}}
```

As we can see above, down to the DHT layer flock->l_start is still 1 and l_len is 1.

```
(gdb) f 7
#7  0x00007f8e436f148c in ec_gf_lk (frame=<optimized out>, this=<optimized out>, fd=<optimized out>, cmd=<optimized out>, flock=<optimized out>, xdata=<optimized out>) at ec.c:888
888         ec_lk(frame, this, -1, minimum, default_lk_cbk, NULL, fd, cmd,
(gdb) p *flock
value has been optimized out
(gdb) p fop
No symbol "fop" in current context.
(gdb) l 883
883                      int32_t cmd, struct gf_flock * flock, dict_t * xdata)
884     {
885         int32_t minimum = EC_MINIMUM_ALL;
886         if (flock->l_type == F_UNLCK)
887             minimum = EC_MINIMUM_ONE;
888         ec_lk(frame, this, -1, minimum, default_lk_cbk, NULL, fd, cmd,
889             flock, xdata);
890
891         return 0;
892     }
(gdb) p *flock
value has been optimized out
(gdb) f 6
#6  0x00007f8e436f779b in __ec_manager (fop=0x7f8e380dc06c, error=0) at ec-common.c:2287
2287            fop->state = fop->handler(fop, fop->state);
(gdb) p fop
$13 = (ec_fop_data_t *) 0x7f8e380dc06c
(gdb) p fop->flock
$14 = {l_type = 1, l_whence = 0, l_start = 0, l_len = 512, l_pid = 164, l_owner = {len = 36, data = "164.eng.blr.redhat.com", '\000' <repeats 987 times>}}
```

But in the EC layer, fop->flock.l_start got reset to 0 and the lock length to 512, resulting in access denied for the other process trying to take a lock in a non-conflicting range.
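That [1,1] to [0,512] change is consistent with the EC xlator rounding the requested lock range out to stripe boundaries before winding it to the bricks. A simplified illustration of that arithmetic (not the actual ec code; the 512-byte stripe size is an assumption taken from the values seen in gdb):

```c
/* Simplified illustration (not the actual ec xlator code) of widening a
 * byte-range lock to stripe boundaries. With an assumed 512-byte stripe,
 * the child's request [start=1, len=1] becomes [start=0, len=512], which
 * now conflicts with the parent's lock on byte 0. */
#include <stdio.h>
#include <inttypes.h>

#define STRIPE 512ULL /* assumed stripe size, matching the gdb output */

static void widen(uint64_t start, uint64_t len,
                  uint64_t *wstart, uint64_t *wlen)
{
    uint64_t end = start + len;

    *wstart = start - (start % STRIPE);                       /* align start down */
    *wlen = ((end + STRIPE - 1) / STRIPE) * STRIPE - *wstart; /* round end up */
}

int main(void)
{
    uint64_t s, l;

    widen(1, 1, &s, &l);
    printf("[1,1] -> [%" PRIu64 ",%" PRIu64 "]\n", s, l); /* prints [0,512] */
    return 0;
}
```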
Have checked with Pranith on this. Since this issue seems to be in the EC xlator, correcting the component.

(In reply to Soumya Koduri from comment #6)
> Looks like there is an issue with POSIX locks in the EC xlator.
...
> But in the EC layer, fop->flock.l_start got reset to 0 and the lock length
> to 512, resulting in access denied for the other process trying to take a
> lock in a non-conflicting range.
>
> Have checked with Pranith on this. Since this issue seems to be in the EC
> xlator, correcting the component.

I suspect that this is expected behaviour. EC encodes blocks of data and (most likely) cannot encode random partial blocks. When a write is happening, a whole block needs to be encoded, requiring a lock on the whole data range. This should fail in case a conflicting lock is present... Ashish?

http://review.gluster.org/#/c/16445 - upstream patch

Verified this bug on glusterfs-ganesha-3.8.4-14.el7rhgs.x86_64. The Cthon lock test suite with EC+GNFS passed successfully.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html