+++ This bug was initially created as a clone of Bug #1484709 +++
+++ This bug was initially created as a clone of Bug #1484113 +++

Description of problem:
=======================
While running a rebalance as part of remove-brick, with an IO pattern containing hardlinks and renames, multiple crashes occurred at the slave with the following backtrace:

(gdb) bt
#0  0x00007f2152cebaeb in __memcpy_sse2 () from /lib64/libc.so.6
#1  0x00007f2145b9b824 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#2  qr_content_extract (xdata=xdata@entry=0x7f21340658c0) at quick-read.c:279
#3  0x00007f2145b9bdaa in qr_lookup_cbk (frame=0x7f212c05fff0, cookie=<optimized out>, this=0x7f214001e4b0, op_ret=0, op_errno=117, inode_ret=0x7f212c05bb60, buf=0x7f212c002b58, xdata=0x7f21340658c0, postparent=0x7f212c002d88) at quick-read.c:423
#4  0x00007f2145da7de7 in ioc_lookup_cbk (frame=0x7f212c057f00, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=0x7f212c05bb60, stbuf=0x7f212c002b58, xdata=0x7f21340658c0, postparent=0x7f212c002d88) at io-cache.c:267
#5  0x00007f2146614627 in dht_discover_complete (this=this@entry=0x7f2140016f40, discover_frame=discover_frame@entry=0x7f212c061500) at dht-common.c:577
#6  0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500, cookie=0x7f21400135b0, this=0x7f2140016f40, op_ret=<optimized out>, op_errno=117, inode=0x7f212c05bb60, stbuf=0x7f212c05e648, xattr=0x7f21340658c0, postparent=0x7f212c05e6b8) at dht-common.c:700
#7  0x00007f21468c4d10 in afr_discover_done (this=<optimized out>, frame=0x7f212c0546c0) at afr-common.c:2624
#8  afr_discover_cbk (frame=frame@entry=0x7f212c0546c0, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7f212c05bb60, buf=buf@entry=0x7f2147537920, xdata=0x7f21400adca0, postparent=postparent@entry=0x7f2147537990) at afr-common.c:2669
#9  0x00007f2146b012c7 in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f212c061380) at client-rpc-fops.c:2947
#10 0x00007f21543f2840 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2140080400, pollin=pollin@entry=0x7f21400ee830) at rpc-clnt.c:794
#11 0x00007f21543f2b27 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f2140080430, event=<optimized out>, data=0x7f21400ee830) at rpc-clnt.c:987
#12 0x00007f21543ee9e3 in rpc_transport_notify (this=this@entry=0x7f21400805a0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f21400ee830) at rpc-transport.c:538
#13 0x00007f2148fef3d6 in socket_event_poll_in (this=this@entry=0x7f21400805a0, notify_handled=<optimized out>) at socket.c:2306
#14 0x00007f2148ff197c in socket_event_handler (fd=13, idx=3, gen=1, data=0x7f21400805a0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2458
#15 0x00007f21546841e6 in event_dispatch_epoll_handler (event=0x7f2147537e80, event_pool=0x56504ff3cee0) at event-epoll.c:572
#16 event_dispatch_epoll_worker (data=0x56504ff95d60) at event-epoll.c:648
#17 0x00007f2153488e25 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f2152d5534d in clone () from /lib64/libc.so.6
(gdb)

A similar bug was seen when the Master volume was tiered and the slave was DR, as mentioned in BZ 1291617. However, that bug was verified after 3 retries on the same setup. Raising this as a new bug, without a tier volume and with a heavy IO pattern.

How reproducible:
=================
2/2

Steps Carried:
==============
1. Create Master volume (5x2) and Slave volume (4x2)
2. Setup geo-rep session between master and slave
3. Mount the Master volume and start the following IO:
   for i in {create,chmod,hardlink,chgrp,symlink,hardlink,truncate,hardlink,rename,hardlink,symlink,hardlink,rename,chown,rename,create,hardlink,hardlink,symlink,rename}; do crefi --multi -n 5 -b 10 -d 10 --max=10K --min=500 --random -T 10 -t text --fop=$i /mnt/master_nfs/b ; sleep 10 ; done
4.
Wait a couple of minutes for geo-rep to start syncing these files to the slave.
5. While syncing is in progress, remove one subvolume from the Slave volume (3x2) via rebalance start
6. After about 10 mins, remove one subvolume from the Master volume too (4x2)
7. Keep checking the rebalance status; once it has completed at the slave, commit the remove-brick
8. Keep checking the rebalance status; once it has completed at the Master, stop the geo-replication, commit the remove-brick and start the geo-replication again

Actual results:
===============
Crashes observed at the slave

Expected results:
=================
No crashes should be seen

--- Additional comment from Nithya Balachandran on 2017-08-23 06:09:56 EDT ---

Here is what I have so far:

gdb -c /core.4537 /usr/sbin/glusterfs
Core was generated by `/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f2152cebaeb in __memcpy_sse2 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7.x86_64 libselinux-2.5-11.el7.x86_64 libuuid-2.23.2-43.el7.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.15.2-50.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007f2152cebaeb in __memcpy_sse2 () from /lib64/libc.so.6
#1  0x00007f2145b9b824 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#2  qr_content_extract (xdata=xdata@entry=0x7f21340658c0) at quick-read.c:279
#3  0x00007f2145b9bdaa in qr_lookup_cbk (frame=0x7f212c05fff0, cookie=<optimized out>, this=0x7f214001e4b0, op_ret=0, op_errno=117, inode_ret=0x7f212c05bb60, buf=0x7f212c002b58, xdata=0x7f21340658c0, postparent=0x7f212c002d88) at quick-read.c:423
#4  0x00007f2145da7de7 in ioc_lookup_cbk (frame=0x7f212c057f00, cookie=<optimized out>, this=<optimized out>,
op_ret=<optimized out>, op_errno=<optimized out>, inode=0x7f212c05bb60, stbuf=0x7f212c002b58, xdata=0x7f21340658c0, postparent=0x7f212c002d88) at io-cache.c:267
#5  0x00007f2146614627 in dht_discover_complete (this=this@entry=0x7f2140016f40, discover_frame=discover_frame@entry=0x7f212c061500) at dht-common.c:577
#6  0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500, cookie=0x7f21400135b0, this=0x7f2140016f40, op_ret=<optimized out>, op_errno=117, inode=0x7f212c05bb60, stbuf=0x7f212c05e648, xattr=0x7f21340658c0, postparent=0x7f212c05e6b8) at dht-common.c:700
#7  0x00007f21468c4d10 in afr_discover_done (this=<optimized out>, frame=0x7f212c0546c0) at afr-common.c:2624
#8  afr_discover_cbk (frame=frame@entry=0x7f212c0546c0, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7f212c05bb60, buf=buf@entry=0x7f2147537920, xdata=0x7f21400adca0, postparent=postparent@entry=0x7f2147537990) at afr-common.c:2669
#9  0x00007f2146b012c7 in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f212c061380) at client-rpc-fops.c:2947
#10 0x00007f21543f2840 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2140080400, pollin=pollin@entry=0x7f21400ee830) at rpc-clnt.c:794
#11 0x00007f21543f2b27 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f2140080430, event=<optimized out>, data=0x7f21400ee830) at rpc-clnt.c:987
#12 0x00007f21543ee9e3 in rpc_transport_notify (this=this@entry=0x7f21400805a0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f21400ee830) at rpc-transport.c:538
#13 0x00007f2148fef3d6 in socket_event_poll_in (this=this@entry=0x7f21400805a0, notify_handled=<optimized out>) at socket.c:2306
#14 0x00007f2148ff197c in socket_event_handler (fd=13, idx=3, gen=1, data=0x7f21400805a0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2458
#15 0x00007f21546841e6 in event_dispatch_epoll_handler (event=0x7f2147537e80, event_pool=0x56504ff3cee0) at event-epoll.c:572
#16 event_dispatch_epoll_worker (data=0x56504ff95d60) at event-epoll.c:648
#17 0x00007f2153488e25 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f2152d5534d in clone () from /lib64/libc.so.6
(gdb)

The process has crashed in __memcpy_sse2, which was called from qr_content_extract.

(gdb) f 2
#2  qr_content_extract (xdata=xdata@entry=0x7f21340658c0) at quick-read.c:279
279             memcpy (content, data->data, data->len);
(gdb) p *data
$1 = {is_static = 0 '\000', is_const = 0 '\000', len = -1162167622, data = 0x7f2134000cf0 "0", refcount = 0, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}}}
(gdb) p/x *data
$2 = {is_static = 0x0, is_const = 0x0, len = 0xbabababa, data = 0x7f2134000cf0, refcount = 0x0, lock = {spinlock = 0x0, mutex = {__data = {__lock = 0x0, __count = 0x0, __owner = 0x0, __nusers = 0x0, __kind = 0xffffffff, __spins = 0x0, __elision = 0x0, __list = {__prev = 0x0, __next = 0x0}}, __size = {0x0 <repeats 16 times>, 0xff, 0xff, 0xff, 0xff, 0x0 <repeats 20 times>}, __align = 0x0}}}
(gdb)

This data struct looks like it has already been freed. The memcpy was given the wrong length (data->len = 0xbabababa), which causes the crash.
Analysing the core
-------------------
First, check if the client returns the correct xdata:

(gdb) f 9
#9  0x00007f2146b012c7 in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f212c061380) at client-rpc-fops.c:2947
2947            CLIENT_STACK_UNWIND (lookup, frame, rsp.op_ret, rsp.op_errno, inode,
(gdb) p rsp.op_ret
$3 = 0
(gdb) p *xdata
$4 = {is_static = 0 '\000', hash_size = 1, count = 5, refcount = 2, members = 0x7f21400adcf8, members_list = 0x7f21400ce980, extra_free = 0x0, extra_stdfree = 0x0, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, members_internal = 0x7f21400ce980, free_pair = {hash_next = 0x0, prev = 0x7f2140003b70, next = 0x0, value = 0x7f21400da6c0, key = 0x7f21400ea0f0 "link-count"}, free_pair_in_use = _gf_true}
(gdb) p *xdata->members[0]
$5 = {hash_next = 0x7f2140097640, prev = 0x0, next = 0x7f2140097640, value = 0x7f2140070740, key = 0x7f21400f0400 "glusterfs.content"}
(gdb) p *xdata->members[0]->value
$6 = {is_static = 0 '\000', is_const = 0 '\000', len = 5223, data = 0x7f21400f6aa0 "# /etc/services:\n# $Id: services,v 1.55 2013/04/14 ovasik Exp $\n#\n# Network services, Internet style\n# IANA services version: last updated 2013-04-10\n#\n# Note that it is presently the policy of IANA t"..., refcount = 1, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}}
(gdb)

The value of "glusterfs.content" in the xdata returned by this client is correct.

However, afr chooses a different subvol as the read subvol and returns a different xdata to the higher layer, as can be seen by comparing the xdata returned to dht_discover_cbk in f6 (xdata=0x7f21340658c0) with the xdata this client returned to afr in f8 (xdata=0x7f21400adca0). xdata=0x7f21340658c0 does not have the file contents.

(gdb) f 7
#7  0x00007f21468c4d10 in afr_discover_done (this=<optimized out>, frame=0x7f212c0546c0) at afr-common.c:2624
2624            AFR_STACK_UNWIND (lookup, frame, local->op_ret, local->op_errno,
(gdb) p local->replies[0]->xdata
$16 = (dict_t *) 0x7f21400adca0    <---- good one
(gdb) p local->replies[1]->xdata
$17 = (dict_t *) 0x7f21340658c0    <---- "bad" one

AFR returns local->replies[1]->xdata.

(gdb) p *local->replies[1]->xdata
$18 = {is_static = 0 '\000', hash_size = 1, count = 5, refcount = 2, members = 0x7f2134065918, members_list = 0x7f21340387f0, extra_free = 0x0, extra_stdfree = 0x0, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, members_internal = 0x7f21340387f0, free_pair = {hash_next = 0x0, prev = 0x7f213406d630, next = 0x0, value = 0x7f21400f54e0, key = 0x7f213401b230 "link-count"}, free_pair_in_use = _gf_true}
(gdb) p *local->replies[1]->xdata->members[0]
$19 = {hash_next = 0x7f21340398d0, prev = 0x0, next = 0x7f21340398d0, value = 0x7f2140002b50, key = 0x7f213401af10 "glusterfs.content"}
(gdb) p *local->replies[1]->xdata->members[0]->value
$20 = {is_static = 0 '\000', is_const = 0 '\000', len = 0, data = 0x7f21400c35e0 <incomplete sequence \360\255\272>, refcount = 1, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}}
(gdb) p local->replies[1]->xdata->members[0]->value
$21 = (data_t *) 0x7f2140002b50
(gdb) p *local->replies[1]->xdata->members[0]->value
$24 = {is_static = 0 '\000', is_const = 0 '\000', len = 0, data = 0x7f21400c35e0 <incomplete sequence \360\255\272>, refcount = 1, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}}
(gdb)

qr_content_extract should have got this data ((data_t *) 0x7f2140002b50) when it called
data = dict_get (xdata, GF_CONTENT_KEY);
However, the data it has here is a completely different value, one that does not exist in this xdata.

(gdb) f 2
#2  qr_content_extract (xdata=xdata@entry=0x7f21340658c0) at quick-read.c:279
279             memcpy (content, data->data, data->len);
(gdb) p data
$22 = (data_t *) 0x7f2134015300

Why it got this value for data is still not clear. Will continue to look into this.

--- Additional comment from Nithya Balachandran on 2017-08-23 06:22:16 EDT ---

I see a lot of such messages in the brick logs:

rhs-brick1-b2.log:[2017-08-19 12:36:12.886812] W [MSGID: 113096] [posix-handle.c:793:posix_handle_hard] 0-Slave-posix: mismatching ino/dev between file /rhs/brick1/b2/b/thread6/level00/level10/level20/level30/level40/level50/level60/level70/level80/5997ff7f%%3F00F9OEG1 (165686/64774) and handle /rhs/brick1/b2/.glusterfs/cc/de/ccde18b3-e913-40b8-9183-ae39457d2e62 (10535691/64774)

Are these a cause for concern?

--- Additional comment from Amar Tumballi on 2017-08-23 09:18:50 EDT ---

>> I see a lot of such messages in the brick logs:
>> rhs-brick1-b2.log:[2017-08-19 12:36:12.886812] W [MSGID: 113096] [posix-handle.c:793:posix_handle_hard] 0-Slave-posix: mismatching ino/dev between file /rhs/brick1/b2/b/thread6/level00/level10/level20/level30/level40/level50/level60/level70/level80/5997ff7f%%3F00F9OEG1 (165686/64774) and handle /rhs/brick1/b2/.glusterfs/cc/de/ccde18b3-e913-40b8-9183-ae39457d2e62 (10535691/64774)

Looks to be concerning.
Happens when we have a separate mount point inside... but how did the hardlink succeed?

--- Additional comment from Nithya Balachandran on 2017-08-24 01:34:24 EDT ---

Ravi,

Any idea why AFR returned the xdata with the 0-len glusterfs.content? In other words, why is the file with no content considered the good copy?
Unfortunately the setup is no longer available, so we cannot check the files on the bricks.

Regards,
Nithya

--- Additional comment from Nithya Balachandran on 2017-08-24 03:01:13 EDT ---

I think the underlying cause of this is that qr_content_extract uses dict_get() to get the data value. As the data structure has not been ref-ed, it is possible that it was deleted elsewhere, causing qr to use a freed structure.

I see afr_lookup_done calls
dict_del (replies[read_subvol].xdata, GF_CONTENT_KEY);
I need to check with the AFR team if this can be called in this code path. In the meantime, use dict_get_with_ref in qr_content_extract.

--- Additional comment from Nithya Balachandran on 2017-08-24 03:43:05 EDT ---

It looks like there are some nasty races in the dict code:

static int
dict_get_with_ref (dict_t *this, char *key, data_t **data)
{
...
        hash = SuperFastHash (key, strlen (key));

        LOCK (&this->lock);
        {
                pair = dict_lookup_common (this, key, hash);
        }
        UNLOCK (&this->lock);   <-- at this point, pair might have been deleted, as the dictionary is no longer locked. However, the function proceeds to use it anyway, and we could end up accessing a different key-value pair if it has been reused in the meantime.

        if (pair) {
                ret = 0;
                *data = data_ref (pair->value);
        }
err:
        return ret;
}

For functions like

int
dict_get_ptr (dict_t *this, char *key, void **ptr)
{
        data_t *data = NULL;
        int     ret = 0;

        if (!this || !key || !ptr) {
                ret = -EINVAL;
                goto err;
        }

        ret = dict_get_with_ref (this, key, &data);
        if (ret != 0) {
                goto err;
        }

        ret = data_to_ptr_common (data, ptr);
        if (ret != 0) {
                goto err;
        }
err:
        if (data)
                data_unref (data);   <-- if this key is deleted here, the data can be unrefed and the pointers being returned are no longer valid.

        return ret;
}

--- Additional comment from Worker Ant on 2017-08-24 04:17:09 EDT ---

REVIEW: https://review.gluster.org/18115 (perf/qr: Use a ref-ed data to extract content) posted (#1) for review on master by N Balachandran (nbalacha)

--- Additional comment from Worker Ant on 2017-08-28 02:42:31 EDT ---

COMMIT: https://review.gluster.org/18115 committed in master by Raghavendra G (rgowdapp)
------
commit 414d3e92fc56f08e320a3aa65b6b18e65b384551
Author: N Balachandran <nbalacha>
Date:   Thu Aug 24 13:39:07 2017 +0530

    perf/qr: Use a ref-ed data to extract content

    qr_content_extract used dict_get to get the value of the
    GF_CONTENT_KEY key. dict_get does not ref the data before
    returning it, so QR could be acting on freed memory if another
    thread deletes the key before then.
    This patch also fixes a race in dict_get_with_ref.

    Fix: Use dict_get_with_ref to retrieve the file contents.

    Change-Id: Ib1a7a70bb92eed7e70747ec530e0b3edc53127ec
    BUG: 1484709
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/18115
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Amar Tumballi <amarts>
    Tested-by: Raghavendra G <rgowdapp>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>

--- Additional comment from Raghavendra G on 2017-08-28 02:45:44 EDT ---

> I think the underlying cause of this is that qr_content_extract uses dict_get() to get the data value.
> As the data structure has not been ref-ed, it is possible that it was deleted elsewhere causing qr to use a freed structure.
> I see afr_lookup_done calls
> dict_del (replies[read_subvol].xdata, GF_CONTENT_KEY);
> I need to check with the AFR team if this can be called in this code path. In the meantime, use dict_get_with_ref in qr_content_extract.

Since the bug describes a distribute-replicate setup, I checked the code of afr_lookup_done. GF_CONTENT_KEY is deleted before afr calls STACK_UNWIND. So this can't happen in parallel to qr_content_extract.

--- Additional comment from Nithya Balachandran on 2017-08-29 10:48:50 EDT ---

RCA:

(gdb) bt
#0  0x00007f2152cebaeb in __memcpy_sse2 () from /lib64/libc.so.6
#1  0x00007f2145b9b824 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#2  qr_content_extract (xdata=xdata@entry=0x7f21340658c0) at quick-read.c:279
#3  0x00007f2145b9bdaa in qr_lookup_cbk (frame=0x7f212c05fff0, cookie=<optimized out>, this=0x7f214001e4b0, op_ret=0, op_errno=117, inode_ret=0x7f212c05bb60, buf=0x7f212c002b58, xdata=0x7f21340658c0, postparent=0x7f212c002d88) at quick-read.c:423
#4  0x00007f2145da7de7 in ioc_lookup_cbk (frame=0x7f212c057f00, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=0x7f212c05bb60, stbuf=0x7f212c002b58, xdata=0x7f21340658c0, postparent=0x7f212c002d88) at io-cache.c:267
#5  0x00007f2146614627 in dht_discover_complete (this=this@entry=0x7f2140016f40, discover_frame=discover_frame@entry=0x7f212c061500) at dht-common.c:577
#6  0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500, cookie=0x7f21400135b0, this=0x7f2140016f40, op_ret=<optimized out>, op_errno=117, inode=0x7f212c05bb60, stbuf=0x7f212c05e648, xattr=0x7f21340658c0, postparent=0x7f212c05e6b8) at dht-common.c:700
#7  0x00007f21468c4d10 in afr_discover_done (this=<optimized out>, frame=0x7f212c0546c0) at afr-common.c:2624
#8  afr_discover_cbk (frame=frame@entry=0x7f212c0546c0, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7f212c05bb60, buf=buf@entry=0x7f2147537920, xdata=0x7f21400adca0, postparent=postparent@entry=0x7f2147537990) at afr-common.c:2669
#9  0x00007f2146b012c7 in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f212c061380) at client-rpc-fops.c:2947
#10 0x00007f21543f2840 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2140080400, pollin=pollin@entry=0x7f21400ee830) at rpc-clnt.c:794
#11 0x00007f21543f2b27 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f2140080430, event=<optimized out>, data=0x7f21400ee830) at rpc-clnt.c:987
#12 0x00007f21543ee9e3 in rpc_transport_notify (this=this@entry=0x7f21400805a0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f21400ee830) at rpc-transport.c:538
#13 0x00007f2148fef3d6 in socket_event_poll_in (this=this@entry=0x7f21400805a0, notify_handled=<optimized out>) at socket.c:2306
#14 0x00007f2148ff197c in socket_event_handler (fd=13, idx=3, gen=1, data=0x7f21400805a0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2458
#15 0x00007f21546841e6 in event_dispatch_epoll_handler (event=0x7f2147537e80, event_pool=0x56504ff3cee0) at event-epoll.c:572
#16 event_dispatch_epoll_worker (data=0x56504ff95d60) at event-epoll.c:648
#17 0x00007f2153488e25 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f2152d5534d in clone () from /lib64/libc.so.6
(gdb) f 6
#6  0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500, cookie=0x7f21400135b0, this=0x7f2140016f40, op_ret=<optimized out>, op_errno=117, inode=0x7f212c05bb60, stbuf=0x7f212c05e648, xattr=0x7f21340658c0, postparent=0x7f212c05e6b8) at dht-common.c:700
700                     dht_discover_complete (this, frame);
(gdb) l
695              * which calls STACK_DESTROY (frame).
In the case of "attempt_unwind",
696              * this makes sure that the thread don't call dht_frame_return, till
697              * call to dht_discover_complete is done.
698              */
699             if (attempt_unwind) {
700                     dht_discover_complete (this, frame);   <--- Here
701             }
702
703             this_call_cnt = dht_frame_return (frame);
704

dht_discover_cbk has found a file, so it tries to unwind before all subvols have responded. It is therefore possible that other subvols have not yet responded.

Checking the frame for dht_discover_cbk:

(gdb) f 6
#6  0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500, cookie=0x7f21400135b0, this=0x7f2140016f40, op_ret=<optimized out>, op_errno=117, inode=0x7f212c05bb60, stbuf=0x7f212c05e648, xattr=0x7f21340658c0, postparent=0x7f212c05e6b8) at dht-common.c:700
700                     dht_discover_complete (this, frame);
(gdb) p local->file_count
$6 = 3

Strangely, a file with this gfid was found on _all_ 3 children. As local->file_count does not distinguish between data files and linkto files, it seems there was a stale (?) linkto file on at least one subvol. So at this point all 3 subvols have at least unwound to dht_discover_cbk.
(gdb) p local->dir_count
$7 = 0
(gdb) p is_linkfile
$8 = <optimized out>
(gdb) p local->cached_subvol
$9 = (xlator_t *) 0x7f2140014d20
(gdb) p local->hashed_subvol
$10 = (xlator_t *) 0x0
(gdb) p *local->cached_subvol
$11 = {name = 0x7f21400158c0 "Slave-replicate-1", type = 0x7f2140015a40 "cluster/replicate", instance_name = 0x0, next = 0x7f21400135b0, prev = 0x7f2140015e30, parents = 0x7f2140018630, children = 0x7f2140015d10, options = 0x7f2140015910, dlhandle = 0x7f2140014320, fops = 0x7f2146ad91a0 <fops>, cbks = 0x7f2146ad90e0 <cbks>, dumpops = 0x7f2146ad9140 <dumpops>, volume_options = {next = 0x7f2140015b40, prev = 0x7f2140015b40}, fini = 0x7f21468c6ac0 <fini>, init = 0x7f21468c5e10 <init>, reconfigure = 0x7f21468c5740 <reconfigure>, mem_acct_init = 0x7f21468c5670 <mem_acct_init>, notify = 0x7f21468c5610 <notify>, loglevel = GF_LOG_NONE, latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>}, history = 0x0, ctx = 0x56504ff06010, graph = 0x7f2140003f30, itable = 0x7f214005e0a0, init_succeeded = 1 '\001', private = 0x7f214005db00, mem_acct = 0x7f214005a0b0, winds = 0, switched = 0 '\000', local_pool = 0x7f21548f2ab8 <pools+280>, is_autoloaded = _gf_false, volfile_id = 0x0}
(gdb) p *prev
$12 = {name = 0x7f2140012740 "Slave-replicate-0", type = 0x7f2140014260 "cluster/replicate", instance_name = 0x0, next = 0x7f21400118a0, prev = 0x7f2140014d20, parents = 0x7f2140018480, children = 0x7f2140014c00, options = 0x7f2140014130, dlhandle = 0x7f2140014320, fops = 0x7f2146ad91a0 <fops>, cbks = 0x7f2146ad90e0 <cbks>, dumpops = 0x7f2146ad9140 <dumpops>, volume_options = {next = 0x7f2140014a30, prev = 0x7f2140014a30}, fini = 0x7f21468c6ac0 <fini>, init = 0x7f21468c5e10 <init>, reconfigure = 0x7f21468c5740 <reconfigure>, mem_acct_init = 0x7f21468c5670 <mem_acct_init>, notify = 0x7f21468c5610 <notify>, loglevel = GF_LOG_NONE, latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>}, history = 0x0, ctx =
0x56504ff06010, graph = 0x7f2140003f30, itable = 0x7f2140065080, init_succeeded = 1 '\001', private = 0x7f2140064ae0, mem_acct = 0x7f2140061090, winds = 0, switched = 0 '\000', local_pool = 0x7f21548f2ab8 <pools+280>, is_autoloaded = _gf_false, volfile_id = 0x0}

As local->cached_subvol does not match prev, it is clear that local->cached_subvol was updated after this thread proceeded to unwind to the higher xlators. This means a data file was found on at least 2 subvols; this could be because a linkto file was in the process of being created when the lookup hit the brick.

In dht_discover_complete ():

        local = discover_frame->local;
        layout = local->layout;
        conf = this->private;

        LOCK(&discover_frame->lock);
        {
                main_frame = local->main_frame;
                local->main_frame = NULL;
        }
        UNLOCK(&discover_frame->lock);

        if (!main_frame)
                return 0;   <--- So far so good - only one thread will unwind here.

However, discover_frame->local is valid and _still_ accessible to other threads unwinding to dht_discover_cbk.

In dht_discover_cbk:

        ...
        local = frame->local;
        ...
        LOCK (&frame->lock);
        {
                ...
                if (is_dir) {
                        local->dir_count ++;
                } else {
                        local->file_count ++;
                        if (!is_linkfile) {
                                /* real file */
                                local->cached_subvol = prev;
                                attempt_unwind = 1;
                        } else {
                                goto unlock;
                        }
                }
                ...
                if (local->xattr == NULL) {
                        local->xattr = dict_ref (xattr);   <-- say thread 1 (the one that crashed) did this
                } else {
                        dht_aggregate_xattr (local->xattr, xattr);   <--- and thread 2 called this (this should _only_ be done for directories)
                }

In dht_aggregate_xattr () -> dht_aggregate ():

        ret = dict_set (dst, key, value);
        if (ret) {
                gf_msg ("dht", GF_LOG_WARNING, 0, DHT_MSG_DICT_SET_FAILED,
                        "Failed to set dictionary value: key = %s", key);
        }

dict_set _replaces_ the key, which unrefs the old data, causing its ref-count to become 0 and hence destroying it.
So here is probably what happened:

t0  [thread 1, subvol-0 reply]  dht_discover_cbk:
        local->xattr = dict_ref (dict0);
        dict_set (dict0, GF_CONTENT_KEY, data0);
        attempt_unwind = 1
t1  [thread 1]  dht_discover_complete
t2  [thread 1]  qr_content_extract gets data0
t3  [thread 2, subvol-1 reply]  dht_discover_cbk:
        dht_aggregate_xattr (local->xattr, xattr)
        |-> dict_set (dict0, GF_CONTENT_KEY, data1)
            |-> data_unref (data0)   <-- as its ref-count drops to 0, data0 is destroyed
        attempt_unwind = 1
t4  [thread 1]  qr_content_extract accesses data0 and crashes

--- Additional comment from Worker Ant on 2017-08-29 11:51:49 EDT ---

REVIEW: https://review.gluster.org/18137 (cluster/dht: Aggregate xattrs only for dirs in dht_discover_cbk) posted (#1) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/18145 (perf/qr: Use a ref-ed data to extract content) posted (#1) for review on release-3.12 by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/18146 (cluster/dht: Aggregate xattrs only for dirs in dht_discover_cbk) posted (#1) for review on release-3.12 by N Balachandran (nbalacha)
COMMIT: https://review.gluster.org/18145 committed in release-3.12 by jiffin tony Thottan (jthottan)
------
commit a862ca969cfd4dc9309e8123e6e90c4e47f9b89b
Author: N Balachandran <nbalacha>
Date:   Wed Aug 30 10:01:54 2017 +0530

    perf/qr: Use a ref-ed data to extract content

    qr_content_extract used dict_get to get the value of the
    GF_CONTENT_KEY key. dict_get does not ref the data before
    returning it, so QR could be acting on freed memory if another
    thread deletes the key before then.
    This patch also fixes a race in dict_get_with_ref.

    Fix: Use dict_get_with_ref to retrieve the file contents.

    > BUG: 1484709
    > Signed-off-by: N Balachandran <nbalacha>
    > Reviewed-on: https://review.gluster.org/18115
    > Smoke: Gluster Build System <jenkins.org>
    > Reviewed-by: Amar Tumballi <amarts>
    > Tested-by: Raghavendra G <rgowdapp>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Raghavendra G <rgowdapp>

    Change-Id: Ib1a7a70bb92eed7e70747ec530e0b3edc53127ec
    BUG: 1486538
    (cherry picked from commit 414d3e92fc56f08e320a3aa65b6b18e65b384551)
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/18145
    Reviewed-by: Raghavendra G <rgowdapp>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: MOHIT AGRAWAL <moagrawa>
Moving this back to Post as https://review.gluster.org/#/c/18146/ has not yet been merged.
COMMIT: https://review.gluster.org/18146 committed in release-3.12 by jiffin tony Thottan (jthottan)
------
commit 268bb635aca729a24242412c6f9dd9d5ba2a2654
Author: N Balachandran <nbalacha>
Date:   Wed Aug 30 10:36:45 2017 +0530

    cluster/dht: Aggregate xattrs only for dirs in dht_discover_cbk

    If dht_discover finds data files on more than one subvol, racing
    calls to dht_discover_cbk could end up calling dht_aggregate_xattr,
    which could delete dictionary data that is being accessed by
    higher-layer translators. Fixed to call dht_aggregate_xattr only
    for directories and to consider only the first file to be found.

    > BUG: 1484709
    > Signed-off-by: N Balachandran <nbalacha>
    > Reviewed-on: https://review.gluster.org/18137
    > Smoke: Gluster Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Raghavendra G <rgowdapp>

    Change-Id: I4f3d2a405ec735d4f1bb33a04b7255eb2d179f8a
    BUG: 1486538
    (cherry picked from commit 9420022df0962b1fa4f3ea3774249be81bc945cc)
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/18146
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Amar Tumballi <amarts>
    CentOS-regression: Gluster Build System <jenkins.org>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.1, please open a new bug report.

glusterfs-3.12.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-September/032441.html
[2] https://www.gluster.org/pipermail/gluster-users/