Bug 1466859 - dht_rename_lock_cbk crashes in upstream regression test
Summary: dht_rename_lock_cbk crashes in upstream regression test
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1466110 1466863
Blocks: 1466321
 
Reported: 2017-06-30 15:09 UTC by Nithya Balachandran
Modified: 2017-08-12 13:07 UTC
CC List: 3 users

Fixed In Version: glusterfs-3.11.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1466110
Environment:
Last Closed: 2017-08-12 13:07:33 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Nithya Balachandran 2017-06-30 15:09:31 UTC
+++ This bug was initially created as a clone of Bug #1466110 +++

Description of problem:

In https://build.gluster.org/job/centos6-regression/5186/console

./tests/bugs/distribute/bug-1117851.t: 1 new core files


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Nithya Balachandran on 2017-06-29 01:15:05 EDT ---

Thanks to Jeff Darcy for debugging this:

Core was generated by `glusterfs --entry-timeout=0 --attribute-timeout=0 -s slave1.cloud.gluster.org -'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f00df0dfbb1 in dht_rename_lock_cbk (frame=0x7f00d80ea130, cookie=0x0, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-rename.c:1581
1581	                STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,


(gdb) bt
#0  0x00007f00df0dfbb1 in dht_rename_lock_cbk (frame=0x7f00d80ea130, cookie=0x0, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-rename.c:1581
#1  0x00007f00df1496c3 in dht_inodelk_done (lock_frame=0x7f00d80f9690) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-lock.c:684
#2  0x00007f00df14b073 in dht_blocking_inodelk_cbk (frame=0x7f00d80f9690, cookie=0x1, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-lock.c:1066
#3  0x00007f00df3e17ce in afr_fop_lock_unwind (frame=0x7f00d0056f10, op=GF_FOP_INODELK, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3557
#4  0x00007f00df3e3ca4 in afr_fop_lock_done (frame=0x7f00d0056f10, this=0x7f00d801a800)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3831
#5  0x00007f00df3e4050 in afr_parallel_lock_cbk (frame=0x7f00d0056f10, cookie=0x1, this=0x7f00d801a800, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3923
#6  0x00007f00df636ece in client3_3_inodelk_cbk (req=0x7f00d00877c0, iov=0x7f00d0087800, count=1, myframe=0x7f00d00749c0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/protocol/client/src/client-rpc-fops.c:1510
#7  0x00007f00ec4f584d in rpc_clnt_handle_reply (clnt=0x7f00d806aa30, pollin=0x7f00d0075490)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-clnt.c:778
#8  0x00007f00ec4f5e17 in rpc_clnt_notify (trans=0x7f00d806ac60, mydata=0x7f00d806aa60, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f00d0075490)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-clnt.c:971
#9  0x00007f00ec4f1dac in rpc_transport_notify (this=0x7f00d806ac60, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f00d0075490)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-transport.c:538
#10 0x00007f00e1aa456a in socket_event_poll_in (this=0x7f00d806ac60, notify_handled=_gf_true)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-transport/socket/src/socket.c:2315
#11 0x00007f00e1aa4bb5 in socket_event_handler (fd=10, idx=1, gen=10, data=0x7f00d806ac60, poll_in=1, poll_out=0, poll_err=0)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-transport/socket/src/socket.c:2467
#12 0x00007f00ec7a153a in event_dispatch_epoll_handler (event_pool=0x23bd050, event=0x7f00dd147e70)
    at /home/jenkins/root/workspace/centos6-regression/libglusterfs/src/event-epoll.c:572
#13 0x00007f00ec7a183c in event_dispatch_epoll_worker (data=0x7f00d806a770) at /home/jenkins/root/workspace/centos6-regression/libglusterfs/src/event-epoll.c:648
#14 0x00007f00eba08aa1 in start_thread () from ./lib64/libpthread.so.0
#15 0x00007f00eb370bcd in clone () from ./lib64/libc.so.6
(gdb) l
1576	         * do a gfid based resolution). So once a lock is granted, make sure the file
1577	         * exists with the name that the client requested with.
1578	         * */
1579	
1580	        for (i = 0; i < local->lock[0].layout.parent_layout.lk_count; i++) {
1581	                STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,
1582	                                   local->lock[0].layout.parent_layout.locks[i]->xl,
1583	                                   local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup,
1584	                                   ((gf_uuid_compare (local->loc.gfid, \
1585	                                     local->lock[0].layout.parent_layout.locks[i]->loc.gfid) == 0) ?
(gdb) p frame
$1 = (call_frame_t *) 0x7f00d80ea130
(gdb) p *frame
$2 = {root = 0x7f00deadc0de00, parent = 0x7f00d80382c000, frames = {next = 0x7f000000003000, prev = 0xe000}, local = 0x7f00d80ea13000, this = 0x7f00deadc0de00, 
  ret = 0x7f00d80382c000, ref_count = 12288, lock = {spinlock = 57344, mutex = {__data = {__lock = 57344, __count = 0, __owner = 245444608, __nusers = 8323288, 
        __kind = -1379869184, __spins = 8323294, __list = {__prev = 0x7f00d80382c000, __next = 0x7f000000003000}}, 
      __size = "\000\340\000\000\000\000\000\000\000\060\241\016\330\000\177\000\000\336\300\255\336\000\177\000\000\300\202\003\330\000\177\000\000\060\000\000\000\000\177", __align = 57344}}, cookie = 0xe000, complete = (unknown: 245444608), op = 8323288, begin = {tv_sec = 35748278440091136, tv_usec = 35748249814089728}, 
  end = {tv_sec = 35747322042265600, tv_usec = 57344}, wind_from = 0x7f00d80ea13000 <Address 0x7f00d80ea13000 out of bounds>, 
  wind_to = 0x7f00deadc0de00 <Address 0x7f00deadc0de00 out of bounds>, unwind_from = 0x7f00d80382c000 <Address 0x7f00d80382c000 out of bounds>, 
  unwind_to = 0x7f000000003000 <Address 0x7f000000003000 out of bounds>}





(gdb) l
1576	         * do a gfid based resolution). So once a lock is granted, make sure the file
1577	         * exists with the name that the client requested with.
1578	         * */
1579	
1580	        for (i = 0; i < local->lock[0].layout.parent_layout.lk_count; i++) {
1581	                STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,
1582	                                   local->lock[0].layout.parent_layout.locks[i]->xl,
1583	                                   local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup,
1584	                                   ((gf_uuid_compare (local->loc.gfid, \
1585	                                     local->lock[0].layout.parent_layout.locks[i]->loc.gfid) == 0) ?
(gdb) p local->lock[0].layout.parent_layout.lk_count
$1 = 79368192
(gdb) p local->loc.gfid
$2 = "\000\336\300\255\336\000\177\000\000\020\273\004\330\000\177"
(gdb) p local->lock[0].layout.parent_layout.locks[i]->loc.gfid
Cannot access memory at address 0x7f00deadc0de10
(gdb) p local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup
Cannot access memory at address 0x7f00deadc0de10
(gdb) p local->lock[0].layout.parent_layout.locks[i]->xl
Cannot access memory at address 0x7f00deadc0de10

Both frame and local have obviously been freed.

The issue here is that the for loop re-reads its bound, local->lock[0].layout.parent_layout.lk_count, from frame->local on every iteration. This is unsafe: the lookups wound inside the loop can complete and unwind, freeing the frame and its local, before the loop finishes, so the loop condition ends up dereferencing freed memory.
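
To make the fix pattern concrete, here is a minimal standalone C sketch (not the actual GlusterFS code; fake_frame, fake_local, lookup_cbk and the wind_* helpers are hypothetical stand-ins). It contrasts a loop that re-reads its bound from frame->local, which the callback for the last wind may already have freed, with the fixed pattern of copying the count into a stack variable before winding.

/* Minimal standalone sketch (NOT the GlusterFS source): why reading the
 * loop bound from frame->local on every iteration can crash, and the
 * pattern the fix uses. All names here are hypothetical stand-ins. */
#include <stdlib.h>

struct fake_local {
        int call_cnt;                 /* stands in for parent_layout.lk_count */
};

struct fake_frame {
        struct fake_local *local;
};

/* Models the lookup callback: when the last reply arrives, the frame's
 * local is destroyed (in the real code the whole frame is unwound). */
static void
lookup_cbk (struct fake_frame *frame, int is_last)
{
        if (is_last) {
                free (frame->local);
                frame->local = NULL;  /* the real code does not even NULL it */
        }
}

/* UNSAFE: the loop condition dereferences frame->local on every pass,
 * but the callback for the last wind may already have freed it. */
static void
wind_unsafe (struct fake_frame *frame)
{
        int i;

        for (i = 0; i < frame->local->call_cnt; i++)   /* use-after-free */
                lookup_cbk (frame, i == frame->local->call_cnt - 1);
}

/* SAFE (the pattern of the fix): snapshot the count into a stack
 * variable first, so the loop never touches local again. */
static void
wind_safe (struct fake_frame *frame)
{
        int i;
        int count = frame->local->call_cnt;

        for (i = 0; i < count; i++)
                lookup_cbk (frame, i == count - 1);
}

int
main (void)
{
        struct fake_frame frame = {0};

        frame.local = calloc (1, sizeof (*frame.local));
        frame.local->call_cnt = 2;

        wind_safe (&frame);           /* fine: bound captured up front */
        /* wind_unsafe (&frame) would dereference freed/NULL local when
         * re-evaluating the loop condition after the last callback.   */
        return 0;
}

This is exactly what the patch below does: the call count is read from frame->local once, before any STACK_WIND_COOKIE, so the frame being freed mid-loop no longer corrupts the loop bound.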

--- Additional comment from Worker Ant on 2017-06-29 01:24:38 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#1) for review on master by N Balachandran (nbalacha)

--- Additional comment from Worker Ant on 2017-06-29 03:08:42 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#2) for review on master by Ji-Hyeon Gim

--- Additional comment from Worker Ant on 2017-06-29 03:34:54 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#2) for review on master by Ji-Hyeon Gim (potatogim)

--- Additional comment from Worker Ant on 2017-06-29 04:11:20 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#3) for review on master by Nigel Babu (nigelb)

--- Additional comment from Worker Ant on 2017-06-29 13:09:47 EDT ---

COMMIT: https://review.gluster.org/17645 committed in master by Shyamsundar Ranganathan (srangana) 
------
commit 56da27cf5dc6ef54c7fa5282dedd6700d35a0ab0
Author: N Balachandran <nbalacha>
Date:   Thu Jun 29 10:52:37 2017 +0530

    cluster:dht Fix crash in dht_rename_lock_cbk
    
    Use a local variable to store the call count
    in the STACK_WIND for loop. Using frame->local
    is dangerous as it could be freed while the loop
    is still being processed
    
    Change-Id: Ie65cdcfb7868509b4a83bc2a5b5d6304eabfbc8e
    BUG: 1466110
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17645
    Smoke: Gluster Build System <jenkins.org>
    Tested-by: Nigel Babu <nigelb>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Jeff Darcy <jeff.us>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 1 Worker Ant 2017-06-30 15:36:01 UTC
REVIEW: https://review.gluster.org/17664 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#1) for review on release-3.11 by N Balachandran (nbalacha)

Comment 2 Worker Ant 2017-07-05 14:10:24 UTC
COMMIT: https://review.gluster.org/17664 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit d4adffdb8da96dfbbe68a8d325fc28941e1f8627
Author: N Balachandran <nbalacha>
Date:   Thu Jun 29 10:52:37 2017 +0530

    cluster:dht Fix crash in dht_rename_lock_cbk
    
    Use a local variable to store the call count
    in the STACK_WIND for loop. Using frame->local
    is dangerous as it could be freed while the loop
    is still being processed
    
    > BUG: 1466110
    > Signed-off-by: N Balachandran <nbalacha>
    > Reviewed-on: https://review.gluster.org/17645
    > Smoke: Gluster Build System <jenkins.org>
    > Tested-by: Nigel Babu <nigelb>
    > Reviewed-by: Amar Tumballi <amarts>
    > Reviewed-by: Jeff Darcy <jeff.us>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Shyamsundar Ranganathan <srangana>
    
    (cherry picked from commit 56da27cf5dc6ef54c7fa5282dedd6700d35a0ab0)
    Change-Id: Ie65cdcfb7868509b4a83bc2a5b5d6304eabfbc8e
    BUG: 1466859
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17664
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 3 Shyamsundar 2017-08-12 13:07:33 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.2, please open a new bug report.

glusterfs-3.11.2 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-July/031908.html
[2] https://www.gluster.org/pipermail/gluster-users/

