Bug 1484709
| Summary: | [geo-rep+qr]: Crashes observed at slave from qr_lookup_sbk during rename/hardlink/rebalance cases | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Nithya Balachandran <nbalacha> |
| Component: | distribute | Assignee: | bugs <bugs> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | mainline | CC: | atumball, bugs, nbalacha, pkarampu, ravishankar, rhinduja, rhs-bugs |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.13.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1484113 | Environment: | |
| Clones: | 1486538 (view as bug list) | | |
| Last Closed: | 2017-12-08 17:39:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1484113, 1486538 | | |
Description: Nithya Balachandran, 2017-08-24 08:00:31 UTC
REVIEW: https://review.gluster.org/18115 (perf/qr: Use a ref-ed data to extract content) posted (#1) for review on master by N Balachandran (nbalacha)

COMMIT: https://review.gluster.org/18115 committed in master by Raghavendra G (rgowdapp)

------
commit 414d3e92fc56f08e320a3aa65b6b18e65b384551
Author: N Balachandran <nbalacha>
Date:   Thu Aug 24 13:39:07 2017 +0530

    perf/qr: Use a ref-ed data to extract content

    qr_content_extract used dict_get to get the value of the GF_CONTENT_KEY
    key. dict_get does not ref the data before returning it, so QR could be
    acting on freed memory if another thread deletes the key first. This
    patch also fixes a race in dict_get_with_ref.

    Fix: Use dict_get_with_ref to retrieve the file contents.

    Change-Id: Ib1a7a70bb92eed7e70747ec530e0b3edc53127ec
    BUG: 1484709
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/18115
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Amar Tumballi <amarts>
    Tested-by: Raghavendra G <rgowdapp>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>

> I think the underlying cause of this is that qr_content_extract uses dict_get() to get the data value. As the data structure has not been ref-ed, it is possible that it was deleted elsewhere, causing qr to use a freed structure.
> I see afr_lookup_done calls
>     dict_del (replies[read_subvol].xdata, GF_CONTENT_KEY);
> I need to check with the AFR team if this can be called in this code path. In the meantime, use dict_get_with_ref in qr_content_extract.

Since the bug describes a distribute-replicate setup, I checked the code of afr_lookup_done. GF_CONTENT_KEY is deleted before afr calls STACK_UNWIND, so this cannot happen in parallel to qr_content_extract.
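A minimal sketch of the unsafe and safe patterns described above, assuming the libglusterfs dict API (dict_get, dict_get_with_ref, data_unref) and the GF_CONTENT_KEY define; illustrative only, not the literal patch:

```c
#include <string.h>
#include "glusterfs.h"   /* GF_CONTENT_KEY */
#include "dict.h"        /* dict_t, data_t, dict_get, dict_get_with_ref */

/* Unsafe: dict_get() returns the data_t without taking a ref, so a
 * concurrent dict_set()/dict_del() on the same key can free it while
 * we are still copying from it. */
static void
extract_unsafe (dict_t *xdata, char *dst)
{
        data_t *data = dict_get (xdata, GF_CONTENT_KEY);

        if (data)
                memcpy (dst, data->data, data->len); /* may read freed memory */
}

/* Safe: dict_get_with_ref() takes a ref on the data_t under the dict
 * lock, so the data stays alive until we drop our own ref. */
static void
extract_safe (dict_t *xdata, char *dst)
{
        data_t *data = NULL;

        if (dict_get_with_ref (xdata, GF_CONTENT_KEY, &data) == 0 && data) {
                memcpy (dst, data->data, data->len);
                data_unref (data);
        }
}
```

RCA: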
(gdb) bt
#0 0x00007f2152cebaeb in __memcpy_sse2 () from /lib64/libc.so.6
#1 0x00007f2145b9b824 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#2 qr_content_extract (xdata=xdata@entry=0x7f21340658c0) at quick-read.c:279
#3 0x00007f2145b9bdaa in qr_lookup_cbk (frame=0x7f212c05fff0, cookie=<optimized out>, this=0x7f214001e4b0, op_ret=0, op_errno=117, inode_ret=0x7f212c05bb60,
buf=0x7f212c002b58, xdata=0x7f21340658c0, postparent=0x7f212c002d88) at quick-read.c:423
#4 0x00007f2145da7de7 in ioc_lookup_cbk (frame=0x7f212c057f00, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>,
inode=0x7f212c05bb60, stbuf=0x7f212c002b58, xdata=0x7f21340658c0, postparent=0x7f212c002d88) at io-cache.c:267
#5 0x00007f2146614627 in dht_discover_complete (this=this@entry=0x7f2140016f40, discover_frame=discover_frame@entry=0x7f212c061500) at dht-common.c:577
#6 0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500, cookie=0x7f21400135b0, this=0x7f2140016f40, op_ret=<optimized out>, op_errno=117,
inode=0x7f212c05bb60, stbuf=0x7f212c05e648, xattr=0x7f21340658c0, postparent=0x7f212c05e6b8) at dht-common.c:700
#7 0x00007f21468c4d10 in afr_discover_done (this=<optimized out>, frame=0x7f212c0546c0) at afr-common.c:2624
#8 afr_discover_cbk (frame=frame@entry=0x7f212c0546c0, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>,
inode=inode@entry=0x7f212c05bb60, buf=buf@entry=0x7f2147537920, xdata=0x7f21400adca0, postparent=postparent@entry=0x7f2147537990) at afr-common.c:2669
#9 0x00007f2146b012c7 in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f212c061380)
at client-rpc-fops.c:2947
#10 0x00007f21543f2840 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2140080400, pollin=pollin@entry=0x7f21400ee830) at rpc-clnt.c:794
#11 0x00007f21543f2b27 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f2140080430, event=<optimized out>, data=0x7f21400ee830) at rpc-clnt.c:987
#12 0x00007f21543ee9e3 in rpc_transport_notify (this=this@entry=0x7f21400805a0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f21400ee830)
at rpc-transport.c:538
#13 0x00007f2148fef3d6 in socket_event_poll_in (this=this@entry=0x7f21400805a0, notify_handled=<optimized out>) at socket.c:2306
#14 0x00007f2148ff197c in socket_event_handler (fd=13, idx=3, gen=1, data=0x7f21400805a0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2458
#15 0x00007f21546841e6 in event_dispatch_epoll_handler (event=0x7f2147537e80, event_pool=0x56504ff3cee0) at event-epoll.c:572
#16 event_dispatch_epoll_worker (data=0x56504ff95d60) at event-epoll.c:648
#17 0x00007f2153488e25 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f2152d5534d in clone () from /lib64/libc.so.6
(gdb) f 6
#6 0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500, cookie=0x7f21400135b0, this=0x7f2140016f40, op_ret=<optimized out>, op_errno=117,
inode=0x7f212c05bb60, stbuf=0x7f212c05e648, xattr=0x7f21340658c0, postparent=0x7f212c05e6b8) at dht-common.c:700
700 dht_discover_complete (this, frame);
(gdb) l
695 * which calls STACK_DESTROY (frame). In the case of "attempt_unwind",
696 * this makes sure that the thread don't call dht_frame_return, till
697 * call to dht_discover_complete is done.
698 */
699 if (attempt_unwind) {
700 dht_discover_complete (this, frame); <--- Here
701 }
702
703 this_call_cnt = dht_frame_return (frame);
704
dht_discover_cbk found a data file, so it attempts to unwind to the higher xlators before all subvols have responded; the remaining subvols may therefore still unwind to dht_discover_cbk in parallel.
Checking the frame for dht_discover_cbk:
(gdb) f 6
#6 0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500, cookie=0x7f21400135b0, this=0x7f2140016f40, op_ret=<optimized out>, op_errno=117,
inode=0x7f212c05bb60, stbuf=0x7f212c05e648, xattr=0x7f21340658c0, postparent=0x7f212c05e6b8) at dht-common.c:700
700 dht_discover_complete (this, frame);
(gdb) p local->file_count
$6 = 3
Strangely, a file with this gfid was found on _all_ 3 children. As local->file_count does not distinguish between data files and linkto files, it seems there was a stale (?) linkto file on at least one subvol.
So at this point all 3 subvols have at least unwound to dht_discover_cbk.
(gdb) p local->dir_count
$7 = 0
(gdb) p is_linkfile
$8 = <optimized out>
(gdb) p local->cached_subvol
$9 = (xlator_t *) 0x7f2140014d20
(gdb) p local->hashed_subvol
$10 = (xlator_t *) 0x0
(gdb) p *local->cached_subvol
$11 = {name = 0x7f21400158c0 "Slave-replicate-1", type = 0x7f2140015a40 "cluster/replicate", instance_name = 0x0, next = 0x7f21400135b0, prev = 0x7f2140015e30,
parents = 0x7f2140018630, children = 0x7f2140015d10, options = 0x7f2140015910, dlhandle = 0x7f2140014320, fops = 0x7f2146ad91a0 <fops>,
cbks = 0x7f2146ad90e0 <cbks>, dumpops = 0x7f2146ad9140 <dumpops>, volume_options = {next = 0x7f2140015b40, prev = 0x7f2140015b40},
fini = 0x7f21468c6ac0 <fini>, init = 0x7f21468c5e10 <init>, reconfigure = 0x7f21468c5740 <reconfigure>, mem_acct_init = 0x7f21468c5670 <mem_acct_init>,
notify = 0x7f21468c5610 <notify>, loglevel = GF_LOG_NONE, latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>},
history = 0x0, ctx = 0x56504ff06010, graph = 0x7f2140003f30, itable = 0x7f214005e0a0, init_succeeded = 1 '\001', private = 0x7f214005db00,
mem_acct = 0x7f214005a0b0, winds = 0, switched = 0 '\000', local_pool = 0x7f21548f2ab8 <pools+280>, is_autoloaded = _gf_false, volfile_id = 0x0}
(gdb) p *prev
$12 = {name = 0x7f2140012740 "Slave-replicate-0", type = 0x7f2140014260 "cluster/replicate", instance_name = 0x0, next = 0x7f21400118a0, prev = 0x7f2140014d20,
parents = 0x7f2140018480, children = 0x7f2140014c00, options = 0x7f2140014130, dlhandle = 0x7f2140014320, fops = 0x7f2146ad91a0 <fops>,
cbks = 0x7f2146ad90e0 <cbks>, dumpops = 0x7f2146ad9140 <dumpops>, volume_options = {next = 0x7f2140014a30, prev = 0x7f2140014a30},
fini = 0x7f21468c6ac0 <fini>, init = 0x7f21468c5e10 <init>, reconfigure = 0x7f21468c5740 <reconfigure>, mem_acct_init = 0x7f21468c5670 <mem_acct_init>,
notify = 0x7f21468c5610 <notify>, loglevel = GF_LOG_NONE, latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>},
history = 0x0, ctx = 0x56504ff06010, graph = 0x7f2140003f30, itable = 0x7f2140065080, init_succeeded = 1 '\001', private = 0x7f2140064ae0,
mem_acct = 0x7f2140061090, winds = 0, switched = 0 '\000', local_pool = 0x7f21548f2ab8 <pools+280>, is_autoloaded = _gf_false, volfile_id = 0x0}
As local->cached_subvol does not match prev, it is clear that local->cached_subvol was updated after this thread proceeded to unwind to the higher xlators. This means a data file was found on at least 2 subvols - possibly because a linkto file was in the process of being created when the lookup hit the brick.
In dht_discover_complete ():

        local = discover_frame->local;
        layout = local->layout;
        conf = this->private;

        LOCK(&discover_frame->lock);
        {
                main_frame = local->main_frame;
                local->main_frame = NULL;
        }
        UNLOCK(&discover_frame->lock);

        if (!main_frame)
                return 0;        <--- So far so good - only one thread will unwind here.
However, discover_frame->local is valid and _still_ accessible to other threads unwinding to dht_discover_cbk.
In dht_discover_cbk:

        ...
        local = frame->local;
        ...
        LOCK (&frame->lock);
        {
                ...
                if (is_dir) {
                        local->dir_count++;
                } else {
                        local->file_count++;

                        if (!is_linkfile) {
                                /* real file */
                                local->cached_subvol = prev;
                                attempt_unwind = 1;
                        } else {
                                goto unlock;
                        }
                }
                ...
                if (local->xattr == NULL) {
                        local->xattr = dict_ref (xattr);            <-- say thread 1 (the one that crashed) did this
                } else {
                        dht_aggregate_xattr (local->xattr, xattr);  <--- and thread 2 called this (this should _only_ be done for directories.)
                }
In dht_aggregate_xattr () -> dht_aggregate ():

        ret = dict_set (dst, key, value);
        if (ret) {
                gf_msg ("dht", GF_LOG_WARNING, 0,
                        DHT_MSG_DICT_SET_FAILED,
                        "Failed to set dictionary value: key = %s",
                        key);
        }
dict_set _replaces_ the key, which unrefs the old data; its ref-count drops to 0 and the data is destroyed.
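Illustrative only (assumed ref-count semantics of the libglusterfs dict/data API, simplified):

```c
#include "glusterfs.h"   /* GF_CONTENT_KEY */
#include "dict.h"        /* dict_t, data_t, dict_get, dict_set */

static void
replace_frees_old_value (dict_t *dict0, data_t *data1)
{
        /* no ref taken: raw aliases the data_t currently owned by dict0 */
        data_t *raw = dict_get (dict0, GF_CONTENT_KEY);

        /* replacing the key unrefs the old data_t; if dict0 held the
         * only ref, the old value is freed right here */
        dict_set (dict0, GF_CONTENT_KEY, data1);

        /* raw now dangles: memcpy (dst, raw->data, raw->len) is a
         * use-after-free -- exactly frame #2 in the backtrace above */
        (void)raw;
}
```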
So here is probably what happened:

| Time | subvol-0 thread | subvol-1 thread |
|---|---|---|
| t0 | dht_discover_cbk: local->xattr = dict_ref (dict0); dict_set (dict0, GF_CONTENT_KEY, data0); attempt_unwind = 1 | |
| t1 | dht_discover_complete | |
| t2 | qr_content_extract gets data0 | |
| t3 | | dht_discover_cbk: dht_aggregate_xattr (local->xattr, xattr) -> dict_set (dict0, GF_CONTENT_KEY, data1) -> data_unref (data0); ref-count hits 0, so data0 is destroyed; attempt_unwind = 1 |
| t4 | qr_content_extract accesses data0 and crashes | |
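A sketch of the shape of the fix (hedged; not the literal diff - see the commit below): in dht_discover_cbk, aggregate xattrs only for directories and let the first file response win.

```c
if (is_dir) {
        /* directories legitimately need xattrs merged from all subvols */
        if (local->xattr == NULL)
                local->xattr = dict_ref (xattr);
        else
                dht_aggregate_xattr (local->xattr, xattr);
} else if (local->xattr == NULL) {
        /* files: keep the first response only; never rewrite a dict
         * that may already have been unwound to higher xlators */
        local->xattr = dict_ref (xattr);
}
```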
REVIEW: https://review.gluster.org/18137 (cluster/dht: Aggregate xattrs only for dirs in dht_discover_cbk) posted (#1) for review on master by N Balachandran (nbalacha)

COMMIT: https://review.gluster.org/18137 committed in master by Raghavendra G (rgowdapp)

------
commit 9420022df0962b1fa4f3ea3774249be81bc945cc
Author: N Balachandran <nbalacha>
Date:   Tue Aug 29 21:12:51 2017 +0530

    cluster/dht: Aggregate xattrs only for dirs in dht_discover_cbk

    If dht_discover finds data files on more than one subvol, racing
    calls to dht_discover_cbk could end up calling dht_aggregate_xattr,
    which could delete dictionary data that is being accessed by higher
    layer translators. Fixed to call dht_aggregate_xattr only for
    directories and consider only the first file to be found.

    Change-Id: I4f3d2a405ec735d4f1bb33a04b7255eb2d179f8a
    BUG: 1484709
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/18137
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/