Bug 1466110
| Summary: | dht_rename_lock_cbk crashes in upstream regression test |
| --- | --- |
| Product: | [Community] GlusterFS |
| Component: | distribute |
| Version: | mainline |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | unspecified |
| Priority: | unspecified |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | Nithya Balachandran <nbalacha> |
| Assignee: | Nithya Balachandran <nbalacha> |
| CC: | bugs, jeff, nbalacha |
| Target Milestone: | --- |
| Target Release: | --- |
| Fixed In Version: | glusterfs-3.12.0 |
| Doc Type: | If docs needed, set a value |
| Story Points: | --- |
| Clones: | 1466321, 1466859, 1466863 |
| Bug Blocks: | 1466321, 1466859, 1466863 |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| oVirt Team: | --- |
| Cloudforms Team: | --- |
| Last Closed: | 2017-09-05 17:35:13 UTC |
Description (Nithya Balachandran, 2017-06-29 05:09:07 UTC)
Thanks to Jeff Darcy for debugging this:

```
Core was generated by `glusterfs --entry-timeout=0 --attribute-timeout=0 -s slave1.cloud.gluster.org -'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f00df0dfbb1 in dht_rename_lock_cbk (frame=0x7f00d80ea130, cookie=0x0, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-rename.c:1581
1581            STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,

(gdb) bt
#0  0x00007f00df0dfbb1 in dht_rename_lock_cbk (frame=0x7f00d80ea130, cookie=0x0, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-rename.c:1581
#1  0x00007f00df1496c3 in dht_inodelk_done (lock_frame=0x7f00d80f9690)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-lock.c:684
#2  0x00007f00df14b073 in dht_blocking_inodelk_cbk (frame=0x7f00d80f9690, cookie=0x1, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-lock.c:1066
#3  0x00007f00df3e17ce in afr_fop_lock_unwind (frame=0x7f00d0056f10, op=GF_FOP_INODELK, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3557
#4  0x00007f00df3e3ca4 in afr_fop_lock_done (frame=0x7f00d0056f10, this=0x7f00d801a800)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3831
#5  0x00007f00df3e4050 in afr_parallel_lock_cbk (frame=0x7f00d0056f10, cookie=0x1, this=0x7f00d801a800, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3923
#6  0x00007f00df636ece in client3_3_inodelk_cbk (req=0x7f00d00877c0, iov=0x7f00d0087800, count=1, myframe=0x7f00d00749c0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/protocol/client/src/client-rpc-fops.c:1510
#7  0x00007f00ec4f584d in rpc_clnt_handle_reply (clnt=0x7f00d806aa30, pollin=0x7f00d0075490)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-clnt.c:778
#8  0x00007f00ec4f5e17 in rpc_clnt_notify (trans=0x7f00d806ac60, mydata=0x7f00d806aa60, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f00d0075490)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-clnt.c:971
#9  0x00007f00ec4f1dac in rpc_transport_notify (this=0x7f00d806ac60, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f00d0075490)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-transport.c:538
#10 0x00007f00e1aa456a in socket_event_poll_in (this=0x7f00d806ac60, notify_handled=_gf_true)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-transport/socket/src/socket.c:2315
#11 0x00007f00e1aa4bb5 in socket_event_handler (fd=10, idx=1, gen=10, data=0x7f00d806ac60, poll_in=1, poll_out=0, poll_err=0)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-transport/socket/src/socket.c:2467
#12 0x00007f00ec7a153a in event_dispatch_epoll_handler (event_pool=0x23bd050, event=0x7f00dd147e70)
    at /home/jenkins/root/workspace/centos6-regression/libglusterfs/src/event-epoll.c:572
#13 0x00007f00ec7a183c in event_dispatch_epoll_worker (data=0x7f00d806a770)
    at /home/jenkins/root/workspace/centos6-regression/libglusterfs/src/event-epoll.c:648
#14 0x00007f00eba08aa1 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f00eb370bcd in clone () from /lib64/libc.so.6

(gdb) l
1576             * do a gfid based resolution). So once a lock is granted, make sure the file
1577             * exists with the name that the client requested with.
1578             * */
1579
1580            for (i = 0; i < local->lock[0].layout.parent_layout.lk_count; i++) {
1581                    STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,
1582                                       local->lock[0].layout.parent_layout.locks[i]->xl,
1583                                       local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup,
1584                                       ((gf_uuid_compare (local->loc.gfid, \
1585                                       local->lock[0].layout.parent_layout.locks[i]->loc.gfid) == 0) ?

(gdb) p frame
$1 = (call_frame_t *) 0x7f00d80ea130
(gdb) p *frame
$2 = {root = 0x7f00deadc0de00, parent = 0x7f00d80382c000, frames = {next = 0x7f000000003000, prev = 0xe000},
  local = 0x7f00d80ea13000, this = 0x7f00deadc0de00, ret = 0x7f00d80382c000, ref_count = 12288,
  lock = {spinlock = 57344, mutex = {__data = {__lock = 57344, __count = 0, __owner = 245444608,
        __nusers = 8323288, __kind = -1379869184, __spins = 8323294,
        __list = {__prev = 0x7f00d80382c000, __next = 0x7f000000003000}},
      __size = "\000\340\000\000\000\000\000\000\000\060\241\016\330\000\177\000\000\336\300\255\336\000\177\000\000\300\202\003\330\000\177\000\000\060\000\000\000\000\177",
      __align = 57344}},
  cookie = 0xe000, complete = (unknown: 245444608), op = 8323288,
  begin = {tv_sec = 35748278440091136, tv_usec = 35748249814089728},
  end = {tv_sec = 35747322042265600, tv_usec = 57344},
  wind_from = 0x7f00d80ea13000 <Address 0x7f00d80ea13000 out of bounds>,
  wind_to = 0x7f00deadc0de00 <Address 0x7f00deadc0de00 out of bounds>,
  unwind_from = 0x7f00d80382c000 <Address 0x7f00d80382c000 out of bounds>,
  unwind_to = 0x7f000000003000 <Address 0x7f000000003000 out of bounds>}

(gdb) l
1576             * do a gfid based resolution). So once a lock is granted, make sure the file
1577             * exists with the name that the client requested with.
1578             * */
1579
1580            for (i = 0; i < local->lock[0].layout.parent_layout.lk_count; i++) {
1581                    STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,
1582                                       local->lock[0].layout.parent_layout.locks[i]->xl,
1583                                       local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup,
1584                                       ((gf_uuid_compare (local->loc.gfid, \
1585                                       local->lock[0].layout.parent_layout.locks[i]->loc.gfid) == 0) ?

(gdb) p local->lock[0].layout.parent_layout.lk_count
$1 = 79368192
(gdb) p local->loc.gfid
$2 = "\000\336\300\255\336\000\177\000\000\020\273\004\330\000\177"
(gdb) p local->lock[0].layout.parent_layout.locks[i]->loc.gfid
Cannot access memory at address 0x7f00deadc0de10
(gdb) p local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup
Cannot access memory at address 0x7f00deadc0de10
(gdb) p local->lock[0].layout.parent_layout.locks[i]->xl
Cannot access memory at address 0x7f00deadc0de10
```

Both frame and local have obviously been freed; note the 0xdeadc0de pattern throughout the dumped frame, a tell-tale sign of released memory. The issue is that the for loop re-reads its bound, local->lock[0].layout.parent_layout.lk_count, from frame->local on every iteration. Each STACK_WIND_COOKIE can complete (and unwind) synchronously, so by the time the loop condition is evaluated again, the frame and its local may already have been released, and the condition dereferences freed memory.
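To make the failure mode concrete, here is a minimal standalone C sketch of the pattern. This is not GlusterFS code: `frame_t`, `local_t`, and `wind()` are invented stand-ins for `call_frame_t`, `dht_local_t`, and a `STACK_WIND_COOKIE` whose callback runs synchronously.

```c
#include <stdlib.h>

typedef struct {
    int lk_count;             /* loop bound, lives inside the frame */
} local_t;

typedef struct {
    local_t *local;
    int      pending;         /* callbacks still outstanding */
} frame_t;

/* Models a wind whose callback fires before control returns to the
 * loop; the last callback tears the frame down, as the final
 * dht_rename_lookup_cbk ultimately does. */
static void wind(frame_t *frame)
{
    if (--frame->pending == 0) {
        free(frame->local);
        free(frame);
    }
}

int main(void)
{
    frame_t *frame = calloc(1, sizeof(*frame));
    frame->local = calloc(1, sizeof(*frame->local));
    frame->local->lk_count = 2;
    frame->pending = frame->local->lk_count;

    /* BUG: the loop condition re-reads frame->local->lk_count even
     * after the second wind() has freed both local and frame. */
    for (int i = 0; i < frame->local->lk_count; i++)
        wind(frame);

    return 0;
}
```

Run under AddressSanitizer, this sketch reports a heap-use-after-free at the final evaluation of the loop condition; in the crashed process the same stale read showed up as the garbage lk_count value (79368192) above.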
REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#1) for review on master by N Balachandran (nbalacha)

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#2) for review on master by Ji-Hyeon Gim (potatogim)

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in dht_rename_lock_cbk) posted (#3) for review on master by Nigel Babu (nigelb)

COMMIT: https://review.gluster.org/17645 committed in master by Shyamsundar Ranganathan (srangana)

```
commit 56da27cf5dc6ef54c7fa5282dedd6700d35a0ab0
Author: N Balachandran <nbalacha>
Date:   Thu Jun 29 10:52:37 2017 +0530

    cluster:dht Fix crash in dht_rename_lock_cbk

    Use a local variable to store the call count in the STACK_WIND
    for loop. Using frame->local is dangerous as it could be freed
    while the loop is still being processed.

    Change-Id: Ie65cdcfb7868509b4a83bc2a5b5d6304eabfbc8e
    BUG: 1466110
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17645
    Smoke: Gluster Build System <jenkins.org>
    Tested-by: Nigel Babu <nigelb>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Jeff Darcy <jeff.us>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
```

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/
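The committed change (https://review.gluster.org/17645) stores the call count in a stack variable before entering the STACK_WIND loop. In the toy model from the description, the corrected loop has this shape (a sketch of the approach, not the actual patch):

```c
/* Sketch of the fix, reusing the toy frame_t/local_t/wind() from the
 * description: snapshot the call count into a stack variable before
 * winding, so nothing dereferences frame->local after the final
 * callback may have freed it. */
static void wind_all(frame_t *frame)
{
    int call_count = frame->local->lk_count;   /* read once, up front */

    for (int i = 0; i < call_count; i++)
        wind(frame);   /* frame may be freed inside the final call,
                          but the loop no longer reads through it */
}
```

The invariant that makes this safe: the frame can only be destroyed by the last of the wound callbacks, and with the bound held in a local variable, the last iteration is also the last time the frame pointer is used.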