+++ This bug was initially created as a clone of Bug #1466110 +++

Description of problem:
In https://build.gluster.org/job/centos6-regression/5186/console
./tests/bugs/distribute/bug-1117851.t: 1 new core files

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

--- Additional comment from Nithya Balachandran on 2017-06-29 01:15:05 EDT ---

Thanks to Jeff Darcy for debugging this:

Core was generated by `glusterfs --entry-timeout=0 --attribute-timeout=0 -s slave1.cloud.gluster.org -'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f00df0dfbb1 in dht_rename_lock_cbk (frame=0x7f00d80ea130, cookie=0x0, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-rename.c:1581
1581            STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,
(gdb) bt
#0  0x00007f00df0dfbb1 in dht_rename_lock_cbk (frame=0x7f00d80ea130, cookie=0x0, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-rename.c:1581
#1  0x00007f00df1496c3 in dht_inodelk_done (lock_frame=0x7f00d80f9690) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-lock.c:684
#2  0x00007f00df14b073 in dht_blocking_inodelk_cbk (frame=0x7f00d80f9690, cookie=0x1, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-lock.c:1066
#3  0x00007f00df3e17ce in afr_fop_lock_unwind (frame=0x7f00d0056f10, op=GF_FOP_INODELK, op_ret=0, op_errno=0, xdata=0x0) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3557
#4  0x00007f00df3e3ca4 in afr_fop_lock_done (frame=0x7f00d0056f10, this=0x7f00d801a800) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3831
#5  0x00007f00df3e4050 in afr_parallel_lock_cbk (frame=0x7f00d0056f10, cookie=0x1, this=0x7f00d801a800, op_ret=0, op_errno=0, xdata=0x0) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3923
#6  0x00007f00df636ece in client3_3_inodelk_cbk (req=0x7f00d00877c0, iov=0x7f00d0087800, count=1, myframe=0x7f00d00749c0) at /home/jenkins/root/workspace/centos6-regression/xlators/protocol/client/src/client-rpc-fops.c:1510
#7  0x00007f00ec4f584d in rpc_clnt_handle_reply (clnt=0x7f00d806aa30, pollin=0x7f00d0075490) at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-clnt.c:778
#8  0x00007f00ec4f5e17 in rpc_clnt_notify (trans=0x7f00d806ac60, mydata=0x7f00d806aa60, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f00d0075490) at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-clnt.c:971
#9  0x00007f00ec4f1dac in rpc_transport_notify (this=0x7f00d806ac60, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f00d0075490) at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-transport.c:538
#10 0x00007f00e1aa456a in socket_event_poll_in (this=0x7f00d806ac60, notify_handled=_gf_true) at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-transport/socket/src/socket.c:2315
#11 0x00007f00e1aa4bb5 in socket_event_handler (fd=10, idx=1, gen=10, data=0x7f00d806ac60, poll_in=1, poll_out=0, poll_err=0) at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-transport/socket/src/socket.c:2467
#12 0x00007f00ec7a153a in event_dispatch_epoll_handler (event_pool=0x23bd050, event=0x7f00dd147e70) at /home/jenkins/root/workspace/centos6-regression/libglusterfs/src/event-epoll.c:572
#13 0x00007f00ec7a183c in event_dispatch_epoll_worker (data=0x7f00d806a770) at /home/jenkins/root/workspace/centos6-regression/libglusterfs/src/event-epoll.c:648
#14 0x00007f00eba08aa1 in start_thread () from ./lib64/libpthread.so.0
#15 0x00007f00eb370bcd in clone () from ./lib64/libc.so.6

(gdb) l
1576     * do a gfid based resolution). So once a lock is granted, make sure the file
1577     * exists with the name that the client requested with.
1578     * */
1579
1580            for (i = 0; i < local->lock[0].layout.parent_layout.lk_count; i++) {
1581                    STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,
1582                                       local->lock[0].layout.parent_layout.locks[i]->xl,
1583                                       local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup,
1584                                       ((gf_uuid_compare (local->loc.gfid, \
1585                                       local->lock[0].layout.parent_layout.locks[i]->loc.gfid) == 0) ?

(gdb) p frame
$1 = (call_frame_t *) 0x7f00d80ea130

(gdb) p *frame
$2 = {root = 0x7f00deadc0de00, parent = 0x7f00d80382c000, frames = {next = 0x7f000000003000, prev = 0xe000}, local = 0x7f00d80ea13000, this = 0x7f00deadc0de00, ret = 0x7f00d80382c000, ref_count = 12288, lock = {spinlock = 57344, mutex = {__data = {__lock = 57344, __count = 0, __owner = 245444608, __nusers = 8323288, __kind = -1379869184, __spins = 8323294, __list = {__prev = 0x7f00d80382c000, __next = 0x7f000000003000}}, __size = "\000\340\000\000\000\000\000\000\000\060\241\016\330\000\177\000\000\336\300\255\336\000\177\000\000\300\202\003\330\000\177\000\000\060\000\000\000\000\177", __align = 57344}}, cookie = 0xe000, complete = (unknown: 245444608), op = 8323288, begin = {tv_sec = 35748278440091136, tv_usec = 35748249814089728}, end = {tv_sec = 35747322042265600, tv_usec = 57344}, wind_from = 0x7f00d80ea13000 <Address 0x7f00d80ea13000 out of bounds>, wind_to = 0x7f00deadc0de00 <Address 0x7f00deadc0de00 out of bounds>, unwind_from = 0x7f00d80382c000 <Address 0x7f00d80382c000 out of bounds>, unwind_to = 0x7f000000003000 <Address 0x7f000000003000 out of bounds>}

(gdb) l
1576     * do a gfid based resolution). So once a lock is granted, make sure the file
1577     * exists with the name that the client requested with.
1578     * */
1579
1580            for (i = 0; i < local->lock[0].layout.parent_layout.lk_count; i++) {
1581                    STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,
1582                                       local->lock[0].layout.parent_layout.locks[i]->xl,
1583                                       local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup,
1584                                       ((gf_uuid_compare (local->loc.gfid, \
1585                                       local->lock[0].layout.parent_layout.locks[i]->loc.gfid) == 0) ?

(gdb) p local->lock[0].layout.parent_layout.lk_count
$1 = 79368192
(gdb) p local->loc.gfid
$2 = "\000\336\300\255\336\000\177\000\000\020\273\004\330\000\177"
(gdb) p local->lock[0].layout.parent_layout.locks[i]->loc.gfid
Cannot access memory at address 0x7f00deadc0de10
(gdb) p local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup
Cannot access memory at address 0x7f00deadc0de10
(gdb) p local->lock[0].layout.parent_layout.locks[i]->xl
Cannot access memory at address 0x7f00deadc0de10

Both frame and local have clearly been freed. The issue is that the for loop uses a member of local (i.e. frame->local) as its loop bound. This is unsafe: the frame, and local with it, can be released by the callbacks wound from inside the loop, so a later evaluation of the loop condition reads freed memory.
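To make the failure mode concrete, the following is a minimal, self-contained C model of the pattern described above (a hypothetical illustration, not GlusterFS source; the ctx struct and dispatch() function are invented stand-ins for frame->local and a STACK_WIND_COOKIE whose callback may complete synchronously). The last callback to complete releases the shared context, yet the loop condition keeps re-reading a field of that context. The model sets a flag instead of actually calling free() so the stale read can be reported rather than crash.

#include <stdio.h>

/* "ctx" stands in for frame->local. */
struct ctx {
        int lk_count;   /* plays the role of parent_layout.lk_count */
        int pending;    /* outstanding callbacks                    */
        int freed;      /* set when the context would be destroyed  */
};

/* The last completing callback releases the context, just as the
 * final lookup callback can destroy frame->local. */
static void
dispatch (struct ctx *c)
{
        if (--c->pending == 0)
                c->freed = 1;   /* real code would free the context here */
}

int
main (void)
{
        struct ctx c = { .lk_count = 2, .pending = 2, .freed = 0 };
        int i;

        /* UNSAFE pattern: the loop condition re-reads c.lk_count after
         * the last dispatch() may already have released c. */
        for (i = 0; i < c.lk_count; i++) {
                dispatch (&c);
                if (c.freed)
                        printf ("iteration %d: the next loop test would "
                                "read freed memory\n", i);
        }
        return 0;
}

In the crashed process the same stale read returned a garbage lk_count (79368192 above), so the loop kept iterating and then dereferenced the already-freed locks[i].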
--- Additional comment from Worker Ant on 2017-06-29 01:24:38 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#1) for review on master by N Balachandran (nbalacha)

--- Additional comment from Worker Ant on 2017-06-29 03:08:42 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#2) for review on master by Ji-Hyeon Gim

--- Additional comment from Worker Ant on 2017-06-29 03:34:54 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#2) for review on master by Ji-Hyeon Gim (potatogim)

--- Additional comment from Worker Ant on 2017-06-29 04:11:20 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#3) for review on master by Nigel Babu (nigelb)

--- Additional comment from Worker Ant on 2017-06-29 13:09:47 EDT ---

COMMIT: https://review.gluster.org/17645 committed in master by Shyamsundar Ranganathan (srangana)
------
commit 56da27cf5dc6ef54c7fa5282dedd6700d35a0ab0
Author: N Balachandran <nbalacha>
Date:   Thu Jun 29 10:52:37 2017 +0530

    cluster:dht Fix crash in dht_rename_lock_cbk

    Use a local variable to store the call count in the STACK_WIND
    for loop. Using frame->local is dangerous as it could be freed
    while the loop is still being processed

    Change-Id: Ie65cdcfb7868509b4a83bc2a5b5d6304eabfbc8e
    BUG: 1466110
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17645
    Smoke: Gluster Build System <jenkins.org>
    Tested-by: Nigel Babu <nigelb>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Jeff Darcy <jeff.us>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
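For reference, the shape of the fix is roughly the following. This is a sketch reconstructed from the source lines quoted in the backtrace and from the commit message above, not the literal patch (see https://review.gluster.org/17645 for the actual change); the remaining lookup arguments are elided. The call count is copied into a stack variable before the loop so that the loop condition never touches frame->local again:

        /* Sketch only: snapshot the count while local is still valid. */
        int call_cnt = local->lock[0].layout.parent_layout.lk_count;

        for (i = 0; i < call_cnt; i++) {
                STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,
                                   local->lock[0].layout.parent_layout.locks[i]->xl,
                                   local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup,
                                   ...);
        }

Assuming local is only released once all call_cnt wound calls have unwound (the usual call-count pattern in this code), the reads of locks[i] inside the loop body still happen while local is valid; only the loop bound needed to move off frame->local.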
REVIEW: https://review.gluster.org/17664 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#1) for review on release-3.11 by N Balachandran (nbalacha)
COMMIT: https://review.gluster.org/17664 committed in release-3.11 by Shyamsundar Ranganathan (srangana)
------
commit d4adffdb8da96dfbbe68a8d325fc28941e1f8627
Author: N Balachandran <nbalacha>
Date:   Thu Jun 29 10:52:37 2017 +0530

    cluster:dht Fix crash in dht_rename_lock_cbk

    Use a local variable to store the call count in the STACK_WIND
    for loop. Using frame->local is dangerous as it could be freed
    while the loop is still being processed

    > BUG: 1466110
    > Signed-off-by: N Balachandran <nbalacha>
    > Reviewed-on: https://review.gluster.org/17645
    > Smoke: Gluster Build System <jenkins.org>
    > Tested-by: Nigel Babu <nigelb>
    > Reviewed-by: Amar Tumballi <amarts>
    > Reviewed-by: Jeff Darcy <jeff.us>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Shyamsundar Ranganathan <srangana>

    (cherry picked from commit 56da27cf5dc6ef54c7fa5282dedd6700d35a0ab0)

    Change-Id: Ie65cdcfb7868509b4a83bc2a5b5d6304eabfbc8e
    BUG: 1466859
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17664
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>
    CentOS-regression: Gluster Build System <jenkins.org>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.11.2, please open a new bug report.

glusterfs-3.11.2 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-July/031908.html
[2] https://www.gluster.org/pipermail/gluster-users/