Bug 1466863 - dht_rename_lock_cbk crashes in upstream regression test
Summary: dht_rename_lock_cbk crashes in upstream regression test
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nithya Balachandran
QA Contact:
URL:
Whiteboard:
Depends On: 1466110
Blocks: glusterfs-3.10.4 1466321 1466859
 
Reported: 2017-06-30 15:11 UTC by Nithya Balachandran
Modified: 2018-06-20 18:28 UTC
CC: 3 users

Fixed In Version:
Clone Of: 1466110
Environment:
Last Closed: 2018-06-20 18:28:30 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Nithya Balachandran 2017-06-30 15:11:35 UTC
+++ This bug was initially created as a clone of Bug #1466110 +++

Description of problem:

In https://build.gluster.org/job/centos6-regression/5186/console

./tests/bugs/distribute/bug-1117851.t: 1 new core files


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Nithya Balachandran on 2017-06-29 01:15:05 EDT ---

Thanks to Jeff Darcy for debugging this:

Core was generated by `glusterfs --entry-timeout=0 --attribute-timeout=0 -s slave1.cloud.gluster.org -'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f00df0dfbb1 in dht_rename_lock_cbk (frame=0x7f00d80ea130, cookie=0x0, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-rename.c:1581
1581	                STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,


(gdb) bt
#0  0x00007f00df0dfbb1 in dht_rename_lock_cbk (frame=0x7f00d80ea130, cookie=0x0, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-rename.c:1581
#1  0x00007f00df1496c3 in dht_inodelk_done (lock_frame=0x7f00d80f9690) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-lock.c:684
#2  0x00007f00df14b073 in dht_blocking_inodelk_cbk (frame=0x7f00d80f9690, cookie=0x1, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-lock.c:1066
#3  0x00007f00df3e17ce in afr_fop_lock_unwind (frame=0x7f00d0056f10, op=GF_FOP_INODELK, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3557
#4  0x00007f00df3e3ca4 in afr_fop_lock_done (frame=0x7f00d0056f10, this=0x7f00d801a800)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3831
#5  0x00007f00df3e4050 in afr_parallel_lock_cbk (frame=0x7f00d0056f10, cookie=0x1, this=0x7f00d801a800, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3923
#6  0x00007f00df636ece in client3_3_inodelk_cbk (req=0x7f00d00877c0, iov=0x7f00d0087800, count=1, myframe=0x7f00d00749c0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/protocol/client/src/client-rpc-fops.c:1510
#7  0x00007f00ec4f584d in rpc_clnt_handle_reply (clnt=0x7f00d806aa30, pollin=0x7f00d0075490)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-clnt.c:778
#8  0x00007f00ec4f5e17 in rpc_clnt_notify (trans=0x7f00d806ac60, mydata=0x7f00d806aa60, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f00d0075490)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-clnt.c:971
#9  0x00007f00ec4f1dac in rpc_transport_notify (this=0x7f00d806ac60, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f00d0075490)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-transport.c:538
#10 0x00007f00e1aa456a in socket_event_poll_in (this=0x7f00d806ac60, notify_handled=_gf_true)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-transport/socket/src/socket.c:2315
#11 0x00007f00e1aa4bb5 in socket_event_handler (fd=10, idx=1, gen=10, data=0x7f00d806ac60, poll_in=1, poll_out=0, poll_err=0)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-transport/socket/src/socket.c:2467
#12 0x00007f00ec7a153a in event_dispatch_epoll_handler (event_pool=0x23bd050, event=0x7f00dd147e70)
    at /home/jenkins/root/workspace/centos6-regression/libglusterfs/src/event-epoll.c:572
#13 0x00007f00ec7a183c in event_dispatch_epoll_worker (data=0x7f00d806a770) at /home/jenkins/root/workspace/centos6-regression/libglusterfs/src/event-epoll.c:648
#14 0x00007f00eba08aa1 in start_thread () from ./lib64/libpthread.so.0
#15 0x00007f00eb370bcd in clone () from ./lib64/libc.so.6
(gdb) l
1576	         * do a gfid based resolution). So once a lock is granted, make sure the file
1577	         * exists with the name that the client requested with.
1578	         * */
1579	
1580	        for (i = 0; i < local->lock[0].layout.parent_layout.lk_count; i++) {
1581	                STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,
1582	                                   local->lock[0].layout.parent_layout.locks[i]->xl,
1583	                                   local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup,
1584	                                   ((gf_uuid_compare (local->loc.gfid, \
1585	                                     local->lock[0].layout.parent_layout.locks[i]->loc.gfid) == 0) ?
(gdb) p frame
$1 = (call_frame_t *) 0x7f00d80ea130
(gdb) p *frame
$2 = {root = 0x7f00deadc0de00, parent = 0x7f00d80382c000, frames = {next = 0x7f000000003000, prev = 0xe000}, local = 0x7f00d80ea13000, this = 0x7f00deadc0de00, 
  ret = 0x7f00d80382c000, ref_count = 12288, lock = {spinlock = 57344, mutex = {__data = {__lock = 57344, __count = 0, __owner = 245444608, __nusers = 8323288, 
        __kind = -1379869184, __spins = 8323294, __list = {__prev = 0x7f00d80382c000, __next = 0x7f000000003000}}, 
      __size = "\000\340\000\000\000\000\000\000\000\060\241\016\330\000\177\000\000\336\300\255\336\000\177\000\000\300\202\003\330\000\177\000\000\060\000\000\000\000\177", __align = 57344}}, cookie = 0xe000, complete = (unknown: 245444608), op = 8323288, begin = {tv_sec = 35748278440091136, tv_usec = 35748249814089728}, 
  end = {tv_sec = 35747322042265600, tv_usec = 57344}, wind_from = 0x7f00d80ea13000 <Address 0x7f00d80ea13000 out of bounds>, 
  wind_to = 0x7f00deadc0de00 <Address 0x7f00deadc0de00 out of bounds>, unwind_from = 0x7f00d80382c000 <Address 0x7f00d80382c000 out of bounds>, 
  unwind_to = 0x7f000000003000 <Address 0x7f000000003000 out of bounds>}





(gdb) l
1576	         * do a gfid based resolution). So once a lock is granted, make sure the file
1577	         * exists with the name that the client requested with.
1578	         * */
1579	
1580	        for (i = 0; i < local->lock[0].layout.parent_layout.lk_count; i++) {
1581	                STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,
1582	                                   local->lock[0].layout.parent_layout.locks[i]->xl,
1583	                                   local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup,
1584	                                   ((gf_uuid_compare (local->loc.gfid, \
1585	                                     local->lock[0].layout.parent_layout.locks[i]->loc.gfid) == 0) ?
(gdb) p local->lock[0].layout.parent_layout.lk_count
$1 = 79368192
(gdb) p local->loc.gfid
$2 = "\000\336\300\255\336\000\177\000\000\020\273\004\330\000\177"
(gdb) p local->lock[0].layout.parent_layout.locks[i]->loc.gfid
Cannot access memory at address 0x7f00deadc0de10
(gdb) p local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup
Cannot access memory at address 0x7f00deadc0de10
(gdb) p local->lock[0].layout.parent_layout.locks[i]->xl
Cannot access memory at address 0x7f00deadc0de10

Both frame and local have clearly been freed: the repeated 0xdeadc0de bytes in the frame dump above look like a poison pattern written into released memory, and local->lock[0].layout.parent_layout.lk_count has become garbage (79368192).

The issue is that the for loop uses a member of local (frame->local) as its loop bound. This is unsafe: the wound-to lookups can complete and unwind while the loop is still running, and once the last dht_rename_lookup_cbk fires, the frame (and with it local) can be destroyed, so the next read of local->lock[0].layout.parent_layout.lk_count in the loop condition touches freed memory.
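
For reference, here is a minimal, self-contained model of the bug and the fix. The struct definitions and wind_lookup are hypothetical stand-ins for call_frame_t, dht_local_t, and STACK_WIND_COOKIE/dht_rename_lookup_cbk (the real types live in libglusterfs and the DHT xlator); in the actual crash the lookups complete on other epoll threads rather than synchronously, but the read-after-free on the loop bound is the same:

#include <stdlib.h>

/* Hypothetical stand-ins for dht_local_t and call_frame_t; only the
 * shape of the bug is modelled here. */
struct local_t { int lk_count; };
struct frame_t { struct local_t *local; };

/* Models STACK_WIND_COOKIE with a synchronously completing callback:
 * once the last dht_rename_lookup_cbk runs, the frame is destroyed,
 * freeing local along with it. */
static void
wind_lookup (struct frame_t *frame, int i, int is_last)
{
        (void) i;
        if (is_last) {
                free (frame->local);
                free (frame);
        }
}

int
main (void)
{
        struct frame_t *frame = calloc (1, sizeof (*frame));
        struct local_t *local = calloc (1, sizeof (*local));

        frame->local = local;
        local->lk_count = 3;

        /* Buggy pattern (dht_rename_lock_cbk before the fix): the loop
         * bound re-reads local->lk_count on every iteration, so the
         * condition check after the last wind dereferences freed memory:
         *
         *     for (i = 0; i < local->lk_count; i++)
         *             wind_lookup (frame, i, ...);
         */

        /* Fixed pattern: snapshot the count into a stack variable before
         * winding anything; the loop never touches local again. */
        int call_cnt = local->lk_count;
        int i;

        for (i = 0; i < call_cnt; i++)
                wind_lookup (frame, i, (i == call_cnt - 1));

        return 0;
}

Compiling with -fsanitize=address and swapping in the commented-out buggy loop makes the use-after-free on the loop-condition read visible immediately; the snapshot variant is clean.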

--- Additional comment from Worker Ant on 2017-06-29 01:24:38 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#1) for review on master by N Balachandran (nbalacha)

--- Additional comment from Worker Ant on 2017-06-29 03:08:42 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#2) for review on master by Ji-Hyeon Gim

--- Additional comment from Worker Ant on 2017-06-29 03:34:54 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#2) for review on master by Ji-Hyeon Gim (potatogim)

--- Additional comment from Worker Ant on 2017-06-29 04:11:20 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#3) for review on master by Nigel Babu (nigelb)

--- Additional comment from Worker Ant on 2017-06-29 13:09:47 EDT ---

COMMIT: https://review.gluster.org/17645 committed in master by Shyamsundar Ranganathan (srangana) 
------
commit 56da27cf5dc6ef54c7fa5282dedd6700d35a0ab0
Author: N Balachandran <nbalacha>
Date:   Thu Jun 29 10:52:37 2017 +0530

    cluster:dht Fix crash in dht_rename_lock_cbk
    
    Use a local variable to store the call count
    in the STACK_WIND for loop. Using frame->local
    is dangerous as it could be freed while the loop
    is still being processed
    
    Change-Id: Ie65cdcfb7868509b4a83bc2a5b5d6304eabfbc8e
    BUG: 1466110
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17645
    Smoke: Gluster Build System <jenkins.org>
    Tested-by: Nigel Babu <nigelb>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Jeff Darcy <jeff.us>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 1 Worker Ant 2017-06-30 15:52:01 UTC
REVIEW: https://review.gluster.org/17665 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#1) for review on release-3.10 by N Balachandran (nbalacha)

Comment 2 Worker Ant 2017-07-01 15:54:23 UTC
COMMIT: https://review.gluster.org/17665 committed in release-3.10 by Raghavendra Talur (rtalur) 
------
commit 9eb41658918f6e0d01879e5cdf69b401cabfd60d
Author: N Balachandran <nbalacha>
Date:   Thu Jun 29 10:52:37 2017 +0530

    cluster:dht Fix crash in dht_rename_lock_cbk
    
    Use a local variable to store the call count
    in the STACK_WIND for loop. Using frame->local
    is dangerous as it could be freed while the loop
    is still being processed
    
    > BUG: 1466110
    > Signed-off-by: N Balachandran <nbalacha>
    > Reviewed-on: https://review.gluster.org/17645
    > Smoke: Gluster Build System <jenkins.org>
    > Tested-by: Nigel Babu <nigelb>
    > Reviewed-by: Amar Tumballi <amarts>
    > Reviewed-by: Jeff Darcy <jeff.us>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Shyamsundar Ranganathan <srangana>
    (cherry picked from commit 56da27cf5dc6ef54c7fa5282dedd6700d35a0ab0)
    Change-Id: Ie65cdcfb7868509b4a83bc2a5b5d6304eabfbc8e
    BUG: 1466863
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17665
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 3 Shyamsundar 2018-06-20 18:28:30 UTC
This bug was reported against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result, this bug is being closed.

If the bug persists on a maintained version of Gluster or against the mainline
Gluster repository, please request that it be reopened and mark the Version
field appropriately.

