Description of problem:
=============================

While validating the 3.2.0 async build () for geo-replication, the following crash was seen upon rmdirs.

(gdb) bt
#0  0x00007f2d9b00bf1e in dht_build_child_loc (this=this@entry=0x7f2d9401cdd0, child=child@entry=0x7f2d98ea1da8, parent=parent@entry=0x7f2d98ea2adc, name=name@entry=0x7f2d9451ecd8 "596ae61d%%050WLDDC18")
    at dht-helper.c:974
#1  0x00007f2d9b04dfed in dht_rmdir_is_subvol_empty (frame=0x7f2da6e67b84, this=this@entry=0x7f2d9401cdd0, entries=0x7f2d9bf598f0, src=0x7f2d94019010)
    at dht-common.c:8223
#2  0x00007f2d9b04ec4b in dht_rmdir_readdirp_cbk (frame=0x7f2da6e67b84, cookie=0x7f2da6e665fc, this=0x7f2d9401cdd0, op_ret=4, op_errno=<optimized out>, entries=<optimized out>, xdata=0x0)
    at dht-common.c:8345
#3  0x00007f2d9b29097c in afr_readdir_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=4, op_errno=2, subvol_entries=<optimized out>, xdata=0x0)
    at afr-dir-read.c:234
#4  0x00007f2d9b5217a1 in client3_3_readdirp_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f2da6e62214)
    at client-rpc-fops.c:2650
#5  0x00007f2da91cb860 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2d941da920, pollin=pollin@entry=0x7f2d944f6de0)
    at rpc-clnt.c:794
#6  0x00007f2da91cbb4f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f2d941da950, event=<optimized out>, data=0x7f2d944f6de0)
    at rpc-clnt.c:987
#7  0x00007f2da91c79f3 in rpc_transport_notify (this=this@entry=0x7f2d941ea5e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f2d944f6de0)
    at rpc-transport.c:538
#8  0x00007f2d9da11314 in socket_event_poll_in (this=this@entry=0x7f2d941ea5e0)
    at socket.c:2272
#9  0x00007f2d9da137c5 in socket_event_handler (fd=<optimized out>, idx=2, data=0x7f2d941ea5e0, poll_in=1, poll_out=0, poll_err=0)
    at socket.c:2402
#10 0x00007f2da945b9e0 in event_dispatch_epoll_handler (event=0x7f2d9bf59e80, event_pool=0x55a569c27e10)
    at event-epoll.c:571
#11 event_dispatch_epoll_worker (data=0x55a569c890d0)
    at event-epoll.c:674
#12 0x00007f2da8262e25 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f2da7b2f34d in clone () from /lib64/libc.so.6
(gdb) p parent
$1 = (loc_t *) 0x7f2d98ea2adc
(gdb) p parent->inode
$2 = (inode_t *) 0x0
(gdb)

[root@dhcp43-27 ~]# file /core.4510
/core.4510: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/glusterfs', platform: 'x86_64'
[root@dhcp43-27 ~]#

Version-Release number of selected component (if applicable):
=========================================================

glusterfs-geo-replication-3.8.4-18.5.el7rhgs.x86_64

How reproducible:
=====================

1/4 times

Steps to Reproduce:
====================

Ran the geo-replication automation cases, which perform the following fops on the master, in order, with different crawl methods:
{create, chmod, chown, chgrp, hardlink, softlink, truncate, rename, remove}

Actual results:
================

Crashes seen upon rmdirs

Expected results:
==================

There should be no crash
We have hit this issue again on 3.3.0 builds (glusterfs-3.8.4-35.el7rhgs.x86_64), during the rmdir fop of our automation regression sanity check. The backtrace is below:

[root@dhcp42-177 ~]# gdb glusterfs /core.26013
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.

warning: core file may not match specified executable file.
[New LWP 26027]
[New LWP 26016]
[New LWP 26013]
[New LWP 26065]
[New LWP 26014]
[New LWP 26015]
[New LWP 26018]
[New LWP 26019]
[New LWP 26064]
[New LWP 26021]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f15baef31ee in dht_build_child_loc (this=this@entry=0x7f15b4025ad0, child=child@entry=0x7f15b0232a58, parent=parent@entry=0x7f15b42cce68, name=name@entry=0x7f15b0229ef8 "59769be5%%O6KZ5MRVGT")
    at dht-helper.c:1275
1275            child->inode = inode_new (parent->inode->table);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7.x86_64 libselinux-2.5-11.el7.x86_64 libuuid-2.23.2-43.el7.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.15.2-50.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007f15baef31ee in dht_build_child_loc (this=this@entry=0x7f15b4025ad0, child=child@entry=0x7f15b0232a58, parent=parent@entry=0x7f15b42cce68, name=name@entry=0x7f15b0229ef8 "59769be5%%O6KZ5MRVGT")
    at dht-helper.c:1275
#1  0x00007f15baf3828d in dht_rmdir_is_subvol_empty (frame=0x7f15b42f8260, this=this@entry=0x7f15b4025ad0, entries=0x7f15b88448d0, src=src@entry=0x7f15b40249c0)
    at dht-common.c:8568
#2  0x00007f15baf38ef1 in dht_rmdir_readdirp_cbk (frame=0x7f15b42f8260, cookie=0x7f15b40249c0, this=0x7f15b4025ad0, op_ret=4, op_errno=<optimized out>, entries=<optimized out>, xdata=0x0)
    at dht-common.c:8688
#3  0x00007f15bb17b996 in afr_readdir_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=4, op_errno=2, subvol_entries=<optimized out>, xdata=0x0)
    at afr-dir-read.c:234
#4  0x00007f15bb40c4ba in client3_3_readdirp_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f15b42de4d0)
    at client-rpc-fops.c:2652
#5  0x00007f15c8cfe840 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f15b4090610, pollin=pollin@entry=0x7f15b022a4b0)
    at rpc-clnt.c:794
#6  0x00007f15c8cfeb27 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f15b4090640, event=<optimized out>, data=0x7f15b022a4b0)
    at rpc-clnt.c:987
#7  0x00007f15c8cfa9e3 in rpc_transport_notify (this=this@entry=0x7f15b4090810, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f15b022a4b0)
    at rpc-transport.c:538
#8  0x00007f15bd8fb3d6 in socket_event_poll_in (this=this@entry=0x7f15b4090810, notify_handled=<optimized out>)
    at socket.c:2306
#9  0x00007f15bd8fd97c in socket_event_handler (fd=13, idx=2, gen=7, data=0x7f15b4090810, poll_in=1, poll_out=0, poll_err=0)
    at socket.c:2458
#10 0x00007f15c8f900f6 in event_dispatch_epoll_handler (event=0x7f15b8844e80, event_pool=0x55a3ccc77ee0)
    at event-epoll.c:572
#11 event_dispatch_epoll_worker (data=0x7f15b4090340)
    at event-epoll.c:648
#12 0x00007f15c7d94e25 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f15c766134d in clone () from /lib64/libc.so.6
(gdb) quit

[root@dhcp42-177 ~]# file /core.26013
/core.26013: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/glusterfs', platform: 'x86_64'
[root@dhcp42-177 ~]#
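
For readers following the backtraces: the faulting statement is `child->inode = inode_new (parent->inode->table);`, and the first core shows `parent->inode` as `0x0`, so the segfault is a NULL dereference of the parent loc's inode. A minimal sketch of the kind of guard that avoids this class of crash is below. It is illustrative only and is not the fix shipped in the advisory: the structs are simplified stand-ins for `loc_t`, `inode_t` and the inode table, and `sketch_inode_new` and `sketch_build_child_loc` are hypothetical names, not GlusterFS functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for illustration; NOT the real GlusterFS types. */
typedef struct inode_table { int unused; } inode_table_t;
typedef struct inode { inode_table_t *table; } inode_t;
typedef struct loc {
    inode_t    *inode;   /* inode this loc refers to (may be NULL) */
    inode_t    *parent;  /* parent directory inode */
    const char *name;    /* entry name within the parent */
} loc_t;

/* Hypothetical allocator standing in for inode_new (). */
inode_t *
sketch_inode_new (inode_table_t *table)
{
    inode_t *inode = calloc (1, sizeof (*inode));

    if (inode)
        inode->table = table;
    return inode;
}

/*
 * Hypothetical guarded variant of the child-loc construction.
 * Returns 0 on success, -EINVAL on a missing argument or a parent
 * loc without an inode, and -ENOMEM if allocation fails.
 */
int
sketch_build_child_loc (loc_t *child, const loc_t *parent, const char *name)
{
    if (!child || !parent || !name)
        return -EINVAL;

    /* The state seen in the core: parent is valid, parent->inode is NULL.
     * Bail out here instead of dereferencing parent->inode->table. */
    if (!parent->inode)
        return -EINVAL;

    child->inode = sketch_inode_new (parent->inode->table);
    if (!child->inode)
        return -ENOMEM;

    child->parent = parent->inode;
    child->name   = name;
    return 0;
}
```

With a guard like this the caller gets an error for the entry instead of a crash. Why `parent->inode` was NULL in the rmdir readdirp callback in the first place is a separate question that this sketch does not address.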
Tried the rm / rmdir cases more than 5 times on builds glusterfs-3.8.4-37 and glusterfs-3.8.4-38. No crash was seen. Moving this bug to verified; will reopen if it is seen again.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774
*** Bug 1605230 has been marked as a duplicate of this bug. ***