Bug 1471918 - [distribute] crashes seen upon rmdirs
Summary: [distribute] crashes seen upon rmdirs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Nithya Balachandran
QA Contact: Rochelle
URL:
Whiteboard:
Duplicates: 1605230
Depends On:
Blocks: 1417151 1472949
Reported: 2017-07-17 16:27 UTC by Rochelle
Modified: 2022-03-13 14:21 UTC (History)

Fixed In Version: glusterfs-3.8.4-36
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1472949
Environment:
Last Closed: 2017-09-21 05:02:13 UTC
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2017:2774
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: glusterfs bug fix and enhancement update
Last Updated: 2017-09-21 08:16:29 UTC

Description Rochelle 2017-07-17 16:27:43 UTC
Description of problem:
=============================
While validating the 3.2.0 async build () for geo-replication, the following crash was seen upon rmdirs.



(gdb) bt
#0  0x00007f2d9b00bf1e in dht_build_child_loc (this=this@entry=0x7f2d9401cdd0, child=child@entry=0x7f2d98ea1da8, parent=parent@entry=0x7f2d98ea2adc,
    name=name@entry=0x7f2d9451ecd8 "596ae61d%%050WLDDC18") at dht-helper.c:974
#1  0x00007f2d9b04dfed in dht_rmdir_is_subvol_empty (frame=0x7f2da6e67b84, this=this@entry=0x7f2d9401cdd0, entries=0x7f2d9bf598f0, src=0x7f2d94019010)
    at dht-common.c:8223
#2  0x00007f2d9b04ec4b in dht_rmdir_readdirp_cbk (frame=0x7f2da6e67b84, cookie=0x7f2da6e665fc, this=0x7f2d9401cdd0, op_ret=4, op_errno=<optimized out>,
    entries=<optimized out>, xdata=0x0) at dht-common.c:8345
#3  0x00007f2d9b29097c in afr_readdir_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=4, op_errno=2, subvol_entries=<optimized out>,
    xdata=0x0) at afr-dir-read.c:234
#4  0x00007f2d9b5217a1 in client3_3_readdirp_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f2da6e62214) at client-rpc-fops.c:2650
#5  0x00007f2da91cb860 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2d941da920, pollin=pollin@entry=0x7f2d944f6de0) at rpc-clnt.c:794
#6  0x00007f2da91cbb4f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f2d941da950, event=<optimized out>, data=0x7f2d944f6de0) at rpc-clnt.c:987
#7  0x00007f2da91c79f3 in rpc_transport_notify (this=this@entry=0x7f2d941ea5e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f2d944f6de0)
    at rpc-transport.c:538
#8  0x00007f2d9da11314 in socket_event_poll_in (this=this@entry=0x7f2d941ea5e0) at socket.c:2272
#9  0x00007f2d9da137c5 in socket_event_handler (fd=<optimized out>, idx=2, data=0x7f2d941ea5e0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2402
#10 0x00007f2da945b9e0 in event_dispatch_epoll_handler (event=0x7f2d9bf59e80, event_pool=0x55a569c27e10) at event-epoll.c:571
#11 event_dispatch_epoll_worker (data=0x55a569c890d0) at event-epoll.c:674
#12 0x00007f2da8262e25 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f2da7b2f34d in clone () from /lib64/libc.so.6
(gdb) p parent
$1 = (loc_t *) 0x7f2d98ea2adc
(gdb) p parent->inode
$2 = (inode_t *) 0x0
(gdb)

[root@dhcp43-27 ~]# file /core.4510
/core.4510: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/glusterfs', platform: 'x86_64'
[root@dhcp43-27 ~]#





Version-Release number of selected component (if applicable):
=========================================================

glusterfs-geo-replication-3.8.4-18.5.el7rhgs.x86_64

How reproducible:
=====================

1/4 times


Steps to Reproduce:
====================

Ran the geo-replication automation cases, which perform the following fops on the master, in order, with different crawl methods: {create, chmod, chown, chgrp, hardlink, softlink, truncate, rename, remove}
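
For readers without access to the automation suite, the fop order listed above can be approximated with plain POSIX calls. This is only a rough sketch: the helper name `sketch_run_fops` and the `d`, `f`, `hard`, `soft`, `moved` entry names are hypothetical, the real test cases are not shown in this report, and running it once is not expected to reproduce the crash by itself (the report notes roughly 1 in 4 runs of the full suite).

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: runs the fop sequence once under `base`
 * (e.g. a directory on the master mount). Returns 0 on success,
 * -1 on the first failing call. */
static int sketch_run_fops(const char *base)
{
    char dir[PATH_MAX], file[PATH_MAX], hard[PATH_MAX];
    char soft[PATH_MAX], moved[PATH_MAX];
    int fd;

    snprintf(dir, sizeof dir, "%s/d", base);
    snprintf(file, sizeof file, "%s/d/f", base);
    snprintf(hard, sizeof hard, "%s/d/hard", base);
    snprintf(soft, sizeof soft, "%s/d/soft", base);
    snprintf(moved, sizeof moved, "%s/d/moved", base);

    if (mkdir(dir, 0755) != 0)
        return -1;

    fd = open(file, O_CREAT | O_WRONLY | O_EXCL, 0644);   /* create */
    if (fd < 0)
        return -1;
    close(fd);

    if (chmod(file, 0600) != 0)                           /* chmod */
        return -1;
    if (chown(file, getuid(), (gid_t)-1) != 0)            /* chown */
        return -1;
    if (chown(file, (uid_t)-1, getgid()) != 0)            /* chgrp */
        return -1;
    if (link(file, hard) != 0)                            /* hardlink */
        return -1;
    if (symlink(file, soft) != 0)                         /* softlink */
        return -1;
    if (truncate(file, 0) != 0)                           /* truncate */
        return -1;
    if (rename(file, moved) != 0)                         /* rename */
        return -1;

    /* remove: delete every entry, then the directory itself (rmdir) */
    if (unlink(soft) != 0 || unlink(hard) != 0 || unlink(moved) != 0)
        return -1;
    return rmdir(dir);
}
```

Pointing `base` at a directory on the geo-replication master volume and looping the call would exercise the same kinds of fops, ending in the rmdir step where the crash was observed.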

Actual results:
================
Crashes seen upon rmdirs

Expected results:
==================
There should be no crash

Comment 8 Rochelle 2017-07-25 06:36:24 UTC
We hit this issue again on a 3.3.0 build (glusterfs-3.8.4-35.el7rhgs.x86_64) during the rmdir fop of our automated regression sanity run.

Backtrace:

[root@dhcp42-177 ~]# gdb glusterfs /core.26013 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.

warning: core file may not match specified executable file.
[New LWP 26027]
[New LWP 26016]
[New LWP 26013]
[New LWP 26065]
[New LWP 26014]
[New LWP 26015]
[New LWP 26018]
[New LWP 26019]
[New LWP 26064]
[New LWP 26021]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f15baef31ee in dht_build_child_loc (this=this@entry=0x7f15b4025ad0, child=child@entry=0x7f15b0232a58, parent=parent@entry=0x7f15b42cce68, 
    name=name@entry=0x7f15b0229ef8 "59769be5%%O6KZ5MRVGT") at dht-helper.c:1275
1275	        child->inode = inode_new (parent->inode->table);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7.x86_64 libselinux-2.5-11.el7.x86_64 libuuid-2.23.2-43.el7.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.15.2-50.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007f15baef31ee in dht_build_child_loc (this=this@entry=0x7f15b4025ad0, child=child@entry=0x7f15b0232a58, parent=parent@entry=0x7f15b42cce68, 
    name=name@entry=0x7f15b0229ef8 "59769be5%%O6KZ5MRVGT") at dht-helper.c:1275
#1  0x00007f15baf3828d in dht_rmdir_is_subvol_empty (frame=0x7f15b42f8260, this=this@entry=0x7f15b4025ad0, entries=0x7f15b88448d0, 
    src=src@entry=0x7f15b40249c0) at dht-common.c:8568
#2  0x00007f15baf38ef1 in dht_rmdir_readdirp_cbk (frame=0x7f15b42f8260, cookie=0x7f15b40249c0, this=0x7f15b4025ad0, op_ret=4, 
    op_errno=<optimized out>, entries=<optimized out>, xdata=0x0) at dht-common.c:8688
#3  0x00007f15bb17b996 in afr_readdir_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=4, op_errno=2, 
    subvol_entries=<optimized out>, xdata=0x0) at afr-dir-read.c:234
#4  0x00007f15bb40c4ba in client3_3_readdirp_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f15b42de4d0)
    at client-rpc-fops.c:2652
#5  0x00007f15c8cfe840 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f15b4090610, pollin=pollin@entry=0x7f15b022a4b0) at rpc-clnt.c:794
#6  0x00007f15c8cfeb27 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f15b4090640, event=<optimized out>, data=0x7f15b022a4b0)
    at rpc-clnt.c:987
#7  0x00007f15c8cfa9e3 in rpc_transport_notify (this=this@entry=0x7f15b4090810, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, 
    data=data@entry=0x7f15b022a4b0) at rpc-transport.c:538
#8  0x00007f15bd8fb3d6 in socket_event_poll_in (this=this@entry=0x7f15b4090810, notify_handled=<optimized out>) at socket.c:2306
#9  0x00007f15bd8fd97c in socket_event_handler (fd=13, idx=2, gen=7, data=0x7f15b4090810, poll_in=1, poll_out=0, poll_err=0) at socket.c:2458
#10 0x00007f15c8f900f6 in event_dispatch_epoll_handler (event=0x7f15b8844e80, event_pool=0x55a3ccc77ee0) at event-epoll.c:572
#11 event_dispatch_epoll_worker (data=0x7f15b4090340) at event-epoll.c:648
#12 0x00007f15c7d94e25 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f15c766134d in clone () from /lib64/libc.so.6
(gdb) quit 
[root@dhcp42-177 ~]# file /core.26013 
/core.26013: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/glusterfs', platform: 'x86_64'
[root@dhcp42-177 ~]#
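
The statement gdb reports at frame #0, `child->inode = inode_new (parent->inode->table);`, dereferences `parent->inode`, which the first core showed to be NULL (`$2 = (inode_t *) 0x0`). Below is a minimal sketch of a defensive guard for that pattern. It assumes simplified stand-in types and hypothetical `sketch_*` names, not the real GlusterFS definitions, and it is not necessarily the change that shipped in glusterfs-3.8.4-36, which this report does not show.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT the real GlusterFS structures. */
typedef struct inode_table { int unused; } inode_table_t;
typedef struct inode { inode_table_t *table; } inode_t;
typedef struct loc { inode_t *inode; const char *name; } loc_t;

/* Stand-in for inode_new(): allocate an inode bound to a table. */
static inode_t *sketch_inode_new(inode_table_t *table)
{
    inode_t *inode = calloc(1, sizeof(*inode));
    if (inode)
        inode->table = table;
    return inode;
}

/* Builds a child loc under `parent`.
 * Returns 0 on success, -1 if the parent cannot be used. */
static int sketch_build_child_loc(loc_t *child, const loc_t *parent,
                                  const char *name)
{
    /* The crashing path reached the dereference below with
     * parent->inode == NULL; bail out instead of faulting. */
    if (!child || !parent || !parent->inode)
        return -1;

    child->inode = sketch_inode_new(parent->inode->table);
    if (!child->inode)
        return -1;

    child->name = name;
    return 0;
}
```

With a guard like this, a caller walking readdirp entries would see an error return for a parent whose inode has already gone away, rather than a segmentation fault.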

Comment 18 Rochelle 2017-08-09 07:10:26 UTC
Ran the rm / rmdir cases more than 5 times on builds glusterfs-3.8.4-37 and glusterfs-3.8.4-38; no crash was seen.
Moving this bug to verified.
Will reopen if it is seen again.

Comment 20 errata-xmlrpc 2017-09-21 05:02:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

Comment 21 Nithya Balachandran 2018-09-05 03:37:57 UTC
*** Bug 1605230 has been marked as a duplicate of this bug. ***

