+++ This bug was initially created as a clone of Bug #1471918 +++

Description of problem:
=============================
The following crash is seen upon rmdirs:

(gdb) bt
#0  0x00007f2d9b00bf1e in dht_build_child_loc (this=this@entry=0x7f2d9401cdd0, child=child@entry=0x7f2d98ea1da8, parent=parent@entry=0x7f2d98ea2adc, name=name@entry=0x7f2d9451ecd8 "596ae61d%%050WLDDC18") at dht-helper.c:974
#1  0x00007f2d9b04dfed in dht_rmdir_is_subvol_empty (frame=0x7f2da6e67b84, this=this@entry=0x7f2d9401cdd0, entries=0x7f2d9bf598f0, src=0x7f2d94019010) at dht-common.c:8223
#2  0x00007f2d9b04ec4b in dht_rmdir_readdirp_cbk (frame=0x7f2da6e67b84, cookie=0x7f2da6e665fc, this=0x7f2d9401cdd0, op_ret=4, op_errno=<optimized out>, entries=<optimized out>, xdata=0x0) at dht-common.c:8345
#3  0x00007f2d9b29097c in afr_readdir_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=4, op_errno=2, subvol_entries=<optimized out>, xdata=0x0) at afr-dir-read.c:234
#4  0x00007f2d9b5217a1 in client3_3_readdirp_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f2da6e62214) at client-rpc-fops.c:2650
#5  0x00007f2da91cb860 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2d941da920, pollin=pollin@entry=0x7f2d944f6de0) at rpc-clnt.c:794
#6  0x00007f2da91cbb4f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f2d941da950, event=<optimized out>, data=0x7f2d944f6de0) at rpc-clnt.c:987
#7  0x00007f2da91c79f3 in rpc_transport_notify (this=this@entry=0x7f2d941ea5e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f2d944f6de0) at rpc-transport.c:538
#8  0x00007f2d9da11314 in socket_event_poll_in (this=this@entry=0x7f2d941ea5e0) at socket.c:2272
#9  0x00007f2d9da137c5 in socket_event_handler (fd=<optimized out>, idx=2, data=0x7f2d941ea5e0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2402
#10 0x00007f2da945b9e0 in event_dispatch_epoll_handler (event=0x7f2d9bf59e80, event_pool=0x55a569c27e10) at event-epoll.c:571
#11 event_dispatch_epoll_worker (data=0x55a569c890d0) at event-epoll.c:674
#12 0x00007f2da8262e25 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f2da7b2f34d in clone () from /lib64/libc.so.6

(gdb) p parent
$1 = (loc_t *) 0x7f2d98ea2adc
(gdb) p parent->inode
$2 = (inode_t *) 0x0
(gdb)

[root@dhcp43-27 ~]# file /core.4510
/core.4510: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/glusterfs', platform: 'x86_64'
[root@dhcp43-27 ~]#

Steps to Reproduce:
====================
Ran the geo-replication automation cases, which run the following fops on the master, in order, with different crawl methods:
{create, chmod, chown, chgrp, hardlink, softlink, truncate, rename, remove}

Actual results:
================
Crash seen upon rmdirs.

Expected results:
==================
There should be no crash.

--- Additional comment from Nithya Balachandran on 2017-07-19 01:41:50 EDT ---

RCA:
====
[root@dhcp43-27 ~]# gdb -c /core.4510 /usr/sbin/glusterfsd
...
Core was generated by `/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f2d9b00bf1e in dht_build_child_loc (this=this@entry=0x7f2d9401cdd0, child=child@entry=0x7f2d98ea1da8, parent=parent@entry=0x7f2d98ea2adc, name=name@entry=0x7f2d9451ecd8 "596ae61d%%050WLDDC18") at dht-helper.c:974
974             child->inode = inode_new (parent->inode->table);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7.x86_64 libselinux-2.5-11.el7.x86_64 libuuid-2.23.2-43.el7.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.15.2-50.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007f2d9b00bf1e in dht_build_child_loc (this=this@entry=0x7f2d9401cdd0, child=child@entry=0x7f2d98ea1da8, parent=parent@entry=0x7f2d98ea2adc, name=name@entry=0x7f2d9451ecd8 "596ae61d%%050WLDDC18") at dht-helper.c:974
#1  0x00007f2d9b04dfed in dht_rmdir_is_subvol_empty (frame=0x7f2da6e67b84, this=this@entry=0x7f2d9401cdd0, entries=0x7f2d9bf598f0, src=0x7f2d94019010) at dht-common.c:8223
#2  0x00007f2d9b04ec4b in dht_rmdir_readdirp_cbk (frame=0x7f2da6e67b84, cookie=0x7f2da6e665fc, this=0x7f2d9401cdd0, op_ret=4, op_errno=<optimized out>, entries=<optimized out>, xdata=0x0) at dht-common.c:8345
#3  0x00007f2d9b29097c in afr_readdir_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=4, op_errno=2, subvol_entries=<optimized out>, xdata=0x0) at afr-dir-read.c:234
#4  0x00007f2d9b5217a1 in client3_3_readdirp_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f2da6e62214) at client-rpc-fops.c:2650
#5  0x00007f2da91cb860 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2d941da920, pollin=pollin@entry=0x7f2d944f6de0) at rpc-clnt.c:794
#6  0x00007f2da91cbb4f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f2d941da950, event=<optimized out>, data=0x7f2d944f6de0) at rpc-clnt.c:987
#7  0x00007f2da91c79f3 in rpc_transport_notify (this=this@entry=0x7f2d941ea5e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f2d944f6de0) at rpc-transport.c:538
#8  0x00007f2d9da11314 in socket_event_poll_in (this=this@entry=0x7f2d941ea5e0) at socket.c:2272
#9  0x00007f2d9da137c5 in socket_event_handler (fd=<optimized out>, idx=2, data=0x7f2d941ea5e0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2402
#10 0x00007f2da945b9e0 in event_dispatch_epoll_handler (event=0x7f2d9bf59e80, event_pool=0x55a569c27e10) at event-epoll.c:571
#11 event_dispatch_epoll_worker (data=0x55a569c890d0) at event-epoll.c:674
#12 0x00007f2da8262e25 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f2da7b2f34d in clone () from /lib64/libc.so.6

(gdb) l
969                     child->name = strrchr (child->path, '/');
970                     if (child->name)
971                             child->name++;
972
973                     child->parent = inode_ref (parent->inode);
974                     child->inode = inode_new (parent->inode->table);
975
976                     if (!child->inode) {
977                             goto err;
978                     }
(gdb) p parent->inode
$1 = (inode_t *) 0x0
(gdb) p parent->inode->table
Cannot access memory at address 0x0

(gdb) f 1
#1  0x00007f2d9b04dfed in dht_rmdir_is_subvol_empty (frame=0x7f2da6e67b84, this=this@entry=0x7f2d9401cdd0, entries=0x7f2d9bf598f0, src=0x7f2d94019010) at dht-common.c:8223
8223            build_ret = dht_build_child_loc (this, &lookup_local->loc,
(gdb) l
8218
8219                    lookup_frame->local = lookup_local;
8220                    lookup_local->main_frame = frame;
8221                    lookup_local->hashed_subvol = src;
8222
8223                    build_ret = dht_build_child_loc (this, &lookup_local->loc,
8224                                                     &local->loc, trav->d_name);
8225                    if (build_ret != 0)
8226                            goto err;
8227
(gdb) p local->loc
$2 = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}
(gdb) p *frame
$3 = {root = 0x7f2da6c985b4, parent = 0x0, frames = {next = 0x7f2da6e67b94, prev = 0x7f2da6e67b94}, local = 0x0, this = 0x7f2d9401cdd0, ret = 0x0, ref_count = 0, lock = {spinlock = 1, mutex = {__data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, cookie = 0x0, complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x0, wind_to = 0x0, unwind_from = 0x0, unwind_to = 0x0}
(gdb) p *local
$4 = {call_cnt = 0, loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, loc2 = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, op_ret = -1, op_errno = 39, layout_mismatch = 0, stbuf = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {

It looks like the frame has been freed: frame->local is already NULL, local->call_cnt is 0, and local->loc has been zeroed out, so the parent loc passed to dht_build_child_loc() no longer carries a valid inode.
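To make the failure mode easier to follow: the crash site dereferences parent->inode (to reach its inode table) before anything verifies it is non-NULL, so a zeroed parent loc from a freed local turns straight into SIGSEGV. Below is a minimal, self-contained C sketch of that chain. The demo_* types and demo_inode_new() are simplified stand-ins, not the libglusterfs definitions, and the NULL guard shown here only illustrates the condition; it is not the fix that was merged (see the commit quoted further down).

/* Minimal sketch of the failure mode described in the RCA above. */
#include <stdio.h>

typedef struct demo_inode_table { int nlru; } demo_inode_table_t;
typedef struct demo_inode       { demo_inode_table_t *table; } demo_inode_t;
typedef struct demo_loc         { demo_inode_t *inode; } demo_loc_t;

/* stand-in: the real inode_new() allocates an inode inside 'table' */
static demo_inode_t *demo_inode_new (demo_inode_table_t *table)
{
        printf ("new inode in table %p\n", (void *) table);
        return NULL;
}

int main (void)
{
        /* what the core shows: the parent loc of the freed local is zeroed */
        demo_loc_t parent = { .inode = NULL };   /* gdb: $2 = (inode_t *) 0x0 */

        /* dht-helper.c:974 does the equivalent of
         *     child->inode = inode_new (parent->inode->table);
         * i.e. parent->inode is dereferenced with no NULL check, so a
         * freed/zeroed parent loc crashes there.  The guard below only
         * demonstrates the condition being hit. */
        if (parent.inode == NULL) {
                fprintf (stderr, "parent loc has no inode, bailing out\n");
                return 1;
        }
        demo_inode_new (parent.inode->table);
        return 0;
}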
REVIEW: https://review.gluster.org/17825 (cluster/dht: Fixed crash in dht_rmdir_is_subvol_empty) posted (#1) for review on master by N Balachandran (nbalacha)
COMMIT: https://review.gluster.org/17825 committed in master by Shyamsundar Ranganathan (srangana)
------
commit fdc431063f33cf4f5572771742e5502565f2a3ca
Author: N Balachandran <nbalacha>
Date:   Wed Jul 19 21:44:55 2017 +0530

    cluster/dht: Fixed crash in dht_rmdir_is_subvol_empty

    The local->call_cnt was being accessed and updated inside
    the loop where the entries were being processed and the
    calls were being wound. This could end up in a scenario
    where the local->call_cnt became 0 before the processing
    was complete, causing the crash when the next entry was
    being processed.

    Change-Id: I930f61f1a1d1948f90d4e58e80b7d6680cf27f2f
    BUG: 1472949
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17825
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
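For context, here is a hedged sketch of the "count first, then wind" pattern the commit message describes: local->call_cnt is set to the full number of calls before any call is wound, so an early (possibly synchronous) reply cannot drive the counter to zero and destroy the frame's local while later entries are still being processed. The demo_* structures and functions below are simplified stand-ins, not the actual dht-common.c code or the merged patch.

/* Sketch of the counting pattern described in the commit message above. */
#include <stdio.h>

struct demo_entry { const char *name; };
struct demo_local { int call_cnt; };

/* stand-in for winding one call; in GlusterFS the callback can fire (and
 * decrement call_cnt) before this function even returns */
static void demo_wind_one (struct demo_local *local, const struct demo_entry *e)
{
        printf ("winding call for %s (outstanding = %d)\n",
                e->name, local->call_cnt);
}

static void demo_process_entries (struct demo_local *local,
                                  const struct demo_entry *entries, int n)
{
        /* Pass 1: fix call_cnt BEFORE winding anything.  The buggy version
         * updated it inside the winding loop, so a quick reply to the first
         * entry could see call_cnt drop to 0 and free the local while the
         * loop was still iterating over the remaining entries. */
        local->call_cnt = n;

        /* Pass 2: wind the calls; call_cnt can only reach 0 after all n
         * replies have come back, no matter how fast they arrive. */
        for (int i = 0; i < n; i++)
                demo_wind_one (local, &entries[i]);
}

int main (void)
{
        struct demo_entry entries[] = { { "a" }, { "b" }, { "c" } };
        struct demo_local local = { 0 };

        demo_process_entries (&local, entries, 3);
        return 0;
}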
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/