Bug 1484709
| Field | Value |
|---|---|
| Summary | [geo-rep+qr]: Crashes observed at slave from qr_lookup_sbk during rename/hardlink/rebalance cases |
| Product | [Community] GlusterFS |
| Component | distribute |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | unspecified |
| Version | mainline |
| Hardware | x86_64 |
| OS | Linux |
| Reporter | Nithya Balachandran <nbalacha> |
| Assignee | bugs <bugs> |
| CC | atumball, bugs, nbalacha, pkarampu, ravishankar, rhinduja, rhs-bugs |
| Fixed In Version | glusterfs-3.13.0 |
| Clone Of | 1484113 |
| Clones | 1486538 (view as bug list) |
| Bug Blocks | 1484113, 1486538 |
| Type | Bug |
| Last Closed | 2017-12-08 17:39:07 UTC |
Description (Nithya Balachandran, 2017-08-24 08:00:31 UTC)
REVIEW: https://review.gluster.org/18115 (perf/qr: Use a ref-ed data to extract content) posted (#1) for review on master by N Balachandran (nbalacha)

COMMIT: https://review.gluster.org/18115 committed in master by Raghavendra G (rgowdapp)

------
commit 414d3e92fc56f08e320a3aa65b6b18e65b384551
Author: N Balachandran <nbalacha>
Date:   Thu Aug 24 13:39:07 2017 +0530

    perf/qr: Use a ref-ed data to extract content

    qr_content_extract used dict_get to get the value of the
    GF_CONTENT_KEY key. dict_get does not ref the data before
    returning it, so QR could be acting on freed memory if another
    thread deleted the key in the meantime. This patch also fixes
    a race in dict_get_with_ref.

    Fix: Use dict_get_with_ref to retrieve the file contents.

    Change-Id: Ib1a7a70bb92eed7e70747ec530e0b3edc53127ec
    BUG: 1484709
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/18115
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Amar Tumballi <amarts>
    Tested-by: Raghavendra G <rgowdapp>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>

> I think the underlying cause of this is that qr_content_extract uses
> dict_get() to get the data value. As the data structure has not been
> ref-ed, it is possible that it was deleted elsewhere, causing qr to use
> a freed structure.

> I see afr_lookup_done calls
>     dict_del (replies[read_subvol].xdata, GF_CONTENT_KEY);
> I need to check with the AFR team if this can be called in this code
> path. In the meantime, use dict_get_with_ref in qr_content_extract.

Since the bug describes a distribute-replicate setup, I checked the code of afr_lookup_done. GF_CONTENT_KEY is deleted before afr calls STACK_UNWIND, so this cannot happen in parallel to qr_content_extract.
RCA:

```
(gdb) bt
#0  0x00007f2152cebaeb in __memcpy_sse2 () from /lib64/libc.so.6
#1  0x00007f2145b9b824 in memcpy (__len=<optimized out>, __src=<optimized out>,
    __dest=<optimized out>) at /usr/include/bits/string3.h:51
#2  qr_content_extract (xdata=xdata@entry=0x7f21340658c0) at quick-read.c:279
#3  0x00007f2145b9bdaa in qr_lookup_cbk (frame=0x7f212c05fff0, cookie=<optimized out>,
    this=0x7f214001e4b0, op_ret=0, op_errno=117, inode_ret=0x7f212c05bb60,
    buf=0x7f212c002b58, xdata=0x7f21340658c0, postparent=0x7f212c002d88) at quick-read.c:423
#4  0x00007f2145da7de7 in ioc_lookup_cbk (frame=0x7f212c057f00, cookie=<optimized out>,
    this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>,
    inode=0x7f212c05bb60, stbuf=0x7f212c002b58, xdata=0x7f21340658c0,
    postparent=0x7f212c002d88) at io-cache.c:267
#5  0x00007f2146614627 in dht_discover_complete (this=this@entry=0x7f2140016f40,
    discover_frame=discover_frame@entry=0x7f212c061500) at dht-common.c:577
#6  0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500, cookie=0x7f21400135b0,
    this=0x7f2140016f40, op_ret=<optimized out>, op_errno=117, inode=0x7f212c05bb60,
    stbuf=0x7f212c05e648, xattr=0x7f21340658c0, postparent=0x7f212c05e6b8)
    at dht-common.c:700
#7  0x00007f21468c4d10 in afr_discover_done (this=<optimized out>, frame=0x7f212c0546c0)
    at afr-common.c:2624
#8  afr_discover_cbk (frame=frame@entry=0x7f212c0546c0, cookie=<optimized out>,
    this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>,
    inode=inode@entry=0x7f212c05bb60, buf=buf@entry=0x7f2147537920, xdata=0x7f21400adca0,
    postparent=postparent@entry=0x7f2147537990) at afr-common.c:2669
#9  0x00007f2146b012c7 in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>,
    count=<optimized out>, myframe=0x7f212c061380) at client-rpc-fops.c:2947
#10 0x00007f21543f2840 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2140080400,
    pollin=pollin@entry=0x7f21400ee830) at rpc-clnt.c:794
#11 0x00007f21543f2b27 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f2140080430,
    event=<optimized out>, data=0x7f21400ee830) at rpc-clnt.c:987
#12 0x00007f21543ee9e3 in rpc_transport_notify (this=this@entry=0x7f21400805a0,
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f21400ee830)
    at rpc-transport.c:538
#13 0x00007f2148fef3d6 in socket_event_poll_in (this=this@entry=0x7f21400805a0,
    notify_handled=<optimized out>) at socket.c:2306
#14 0x00007f2148ff197c in socket_event_handler (fd=13, idx=3, gen=1, data=0x7f21400805a0,
    poll_in=1, poll_out=0, poll_err=0) at socket.c:2458
#15 0x00007f21546841e6 in event_dispatch_epoll_handler (event=0x7f2147537e80,
    event_pool=0x56504ff3cee0) at event-epoll.c:572
#16 event_dispatch_epoll_worker (data=0x56504ff95d60) at event-epoll.c:648
#17 0x00007f2153488e25 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f2152d5534d in clone () from /lib64/libc.so.6

(gdb) f 6
#6  0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500, cookie=0x7f21400135b0,
    this=0x7f2140016f40, op_ret=<optimized out>, op_errno=117, inode=0x7f212c05bb60,
    stbuf=0x7f212c05e648, xattr=0x7f21340658c0, postparent=0x7f212c05e6b8)
    at dht-common.c:700
700             dht_discover_complete (this, frame);
(gdb) l
695              * which calls STACK_DESTROY (frame). In the case of "attempt_unwind",
696              * this makes sure that the thread don't call dht_frame_return, till
697              * call to dht_discover_complete is done.
698              */
699             if (attempt_unwind) {
700                     dht_discover_complete (this, frame);    <--- Here
701             }
702
703             this_call_cnt = dht_frame_return (frame);
704
```

dht_discover_cbk has found a file, so it tries to unwind before all subvols have responded. It is therefore possible that other subvols have not yet responded.
Checking the frame for dht_discover_cbk:

```
(gdb) f 6
#6  0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500, cookie=0x7f21400135b0,
    this=0x7f2140016f40, op_ret=<optimized out>, op_errno=117, inode=0x7f212c05bb60,
    stbuf=0x7f212c05e648, xattr=0x7f21340658c0, postparent=0x7f212c05e6b8)
    at dht-common.c:700
700             dht_discover_complete (this, frame);
(gdb) p local->file_count
$6 = 3
```

Strangely, a file with this gfid was found on _all_ 3 children. As local->file_count does not distinguish between data files and linkto files, it seems there was a stale (?) linkto file on at least one subvol. So at this point all 3 subvols have unwound at least as far as dht_discover_cbk.

```
(gdb) p local->dir_count
$7 = 0
(gdb) p is_linkfile
$8 = <optimized out>
(gdb) p local->cached_subvol
$9 = (xlator_t *) 0x7f2140014d20
(gdb) p local->hashed_subvol
$10 = (xlator_t *) 0x0
(gdb) p *local->cached_subvol
$11 = {name = 0x7f21400158c0 "Slave-replicate-1", type = 0x7f2140015a40 "cluster/replicate",
  instance_name = 0x0, next = 0x7f21400135b0, prev = 0x7f2140015e30,
  parents = 0x7f2140018630, children = 0x7f2140015d10, options = 0x7f2140015910,
  dlhandle = 0x7f2140014320, fops = 0x7f2146ad91a0 <fops>, cbks = 0x7f2146ad90e0 <cbks>,
  dumpops = 0x7f2146ad9140 <dumpops>, volume_options = {next = 0x7f2140015b40,
    prev = 0x7f2140015b40}, fini = 0x7f21468c6ac0 <fini>, init = 0x7f21468c5e10 <init>,
  reconfigure = 0x7f21468c5740 <reconfigure>, mem_acct_init = 0x7f21468c5670 <mem_acct_init>,
  notify = 0x7f21468c5610 <notify>, loglevel = GF_LOG_NONE, latencies = {{min = 0, max = 0,
      total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>}, history = 0x0,
  ctx = 0x56504ff06010, graph = 0x7f2140003f30, itable = 0x7f214005e0a0,
  init_succeeded = 1 '\001', private = 0x7f214005db00, mem_acct = 0x7f214005a0b0,
  winds = 0, switched = 0 '\000', local_pool = 0x7f21548f2ab8 <pools+280>,
  is_autoloaded = _gf_false, volfile_id = 0x0}
(gdb) p *prev
$12 = {name = 0x7f2140012740 "Slave-replicate-0", type = 0x7f2140014260 "cluster/replicate",
  instance_name = 0x0, next = 0x7f21400118a0, prev = 0x7f2140014d20,
  parents = 0x7f2140018480, children = 0x7f2140014c00, options = 0x7f2140014130,
  dlhandle = 0x7f2140014320, fops = 0x7f2146ad91a0 <fops>, cbks = 0x7f2146ad90e0 <cbks>,
  dumpops = 0x7f2146ad9140 <dumpops>, volume_options = {next = 0x7f2140014a30,
    prev = 0x7f2140014a30}, fini = 0x7f21468c6ac0 <fini>, init = 0x7f21468c5e10 <init>,
  reconfigure = 0x7f21468c5740 <reconfigure>, mem_acct_init = 0x7f21468c5670 <mem_acct_init>,
  notify = 0x7f21468c5610 <notify>, loglevel = GF_LOG_NONE, latencies = {{min = 0, max = 0,
      total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>}, history = 0x0,
  ctx = 0x56504ff06010, graph = 0x7f2140003f30, itable = 0x7f2140065080,
  init_succeeded = 1 '\001', private = 0x7f2140064ae0, mem_acct = 0x7f2140061090,
  winds = 0, switched = 0 '\000', local_pool = 0x7f21548f2ab8 <pools+280>,
  is_autoloaded = _gf_false, volfile_id = 0x0}
```

As local->cached_subvol does not match prev, it is clear that local->cached_subvol was updated after this thread proceeded to unwind to the higher xlators. This means a data file was found on at least 2 subvols; this could be because a linkto file was in the process of being created when the lookup hit the brick.

In dht_discover_complete ():

```
        local = discover_frame->local;
        layout = local->layout;
        conf = this->private;

        LOCK(&discover_frame->lock);
        {
                main_frame = local->main_frame;
                local->main_frame = NULL;
        }
        UNLOCK(&discover_frame->lock);

        if (!main_frame)
                return 0;    <--- So far so good - only one thread will unwind here.
```

However, discover_frame->local is valid and _still_ accessible to other threads unwinding to dht_discover_cbk.

In dht_discover_cbk:

```
        ...
        local = frame->local;
        ...
        LOCK (&frame->lock);
        {
                ...
                if (is_dir) {
                        local->dir_count++;
                } else {
                        local->file_count++;

                        if (!is_linkfile) {
                                /* real file */
                                local->cached_subvol = prev;
                                attempt_unwind = 1;
                        } else {
                                goto unlock;
                        }
                }
                ...
                if (local->xattr == NULL) {
                        local->xattr = dict_ref (xattr);     <-- say thread 1 (the one that
                                                                 crashed) did this
                } else {
                        dht_aggregate_xattr (local->xattr,   <--- and thread 2 called this
                                             xattr);             (this should _only_ be done
                                                                 for directories)
                }
```

In dht_aggregate_xattr () -> dht_aggregate ():

```
        ret = dict_set (dst, key, value);
        if (ret) {
                gf_msg ("dht", GF_LOG_WARNING, 0, DHT_MSG_DICT_SET_FAILED,
                        "Failed to set dictionary value: key = %s", key);
        }
```

dict_set _replaces_ the key, which unrefs the old data; its ref-count drops to 0 and the data is destroyed.

So here is probably what happened:

| Time | subvol-0 | subvol-1 |
|---|---|---|
| t0 | dht_discover_cbk: local->xattr = dict_ref (dict0); dict_set (dict0, GF_CONTENT_KEY, data0); attempt_unwind = 1 | |
| t1 | dht_discover_complete | |
| t2 | qr_content_extract gets data0 | |
| t3 | | dht_discover_cbk: dht_aggregate_xattr (local->xattr, xattr) -> dict_set (dict0, GF_CONTENT_KEY, data1) -> data_unref (data0); as ref-count = 0, data0 is destroyed; attempt_unwind = 1 |
| t4 | qr_content_extract accesses data0 and crashes | |

REVIEW: https://review.gluster.org/18137 (cluster/dht: Aggregate xattrs only for dirs in dht_discover_cbk) posted (#1) for review on master by N Balachandran (nbalacha)

COMMIT: https://review.gluster.org/18137 committed in master by Raghavendra G (rgowdapp)

------
commit 9420022df0962b1fa4f3ea3774249be81bc945cc
Author: N Balachandran <nbalacha>
Date:   Tue Aug 29 21:12:51 2017 +0530

    cluster/dht: Aggregate xattrs only for dirs in dht_discover_cbk

    If dht_discover finds data files on more than one subvol, racing
    calls to dht_discover_cbk could end up calling dht_aggregate_xattr,
    which could delete dictionary data that is being accessed by higher
    layer translators. Fixed to call dht_aggregate_xattr only for
    directories and to consider only the first file found.
    Change-Id: I4f3d2a405ec735d4f1bb33a04b7255eb2d179f8a
    BUG: 1484709
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/18137
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>

This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/