Bug 1472949

Summary: [distribute] crashes seen upon rmdirs
Product: [Community] GlusterFS
Reporter: Nithya Balachandran <nbalacha>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Version: mainline
CC: bugs, rallan, rhs-bugs, storage-qa-internal, tdesala
Fixed In Version: glusterfs-3.12.0
Clone Of: 1471918
Last Closed: 2017-09-05 17:37:29 UTC
Type: Bug
Bug Depends On: 1471918

Description Nithya Balachandran 2017-07-19 16:13:38 UTC
+++ This bug was initially created as a clone of Bug #1471918 +++

Description of problem:
=============================
The following crash is seen upon rmdirs:

(gdb) bt
#0  0x00007f2d9b00bf1e in dht_build_child_loc (this=this@entry=0x7f2d9401cdd0, child=child@entry=0x7f2d98ea1da8, parent=parent@entry=0x7f2d98ea2adc,
    name=name@entry=0x7f2d9451ecd8 "596ae61d%%050WLDDC18") at dht-helper.c:974
#1  0x00007f2d9b04dfed in dht_rmdir_is_subvol_empty (frame=0x7f2da6e67b84, this=this@entry=0x7f2d9401cdd0, entries=0x7f2d9bf598f0, src=0x7f2d94019010)
    at dht-common.c:8223
#2  0x00007f2d9b04ec4b in dht_rmdir_readdirp_cbk (frame=0x7f2da6e67b84, cookie=0x7f2da6e665fc, this=0x7f2d9401cdd0, op_ret=4, op_errno=<optimized out>,
    entries=<optimized out>, xdata=0x0) at dht-common.c:8345
#3  0x00007f2d9b29097c in afr_readdir_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=4, op_errno=2, subvol_entries=<optimized out>,
    xdata=0x0) at afr-dir-read.c:234
#4  0x00007f2d9b5217a1 in client3_3_readdirp_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f2da6e62214) at client-rpc-fops.c:2650
#5  0x00007f2da91cb860 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2d941da920, pollin=pollin@entry=0x7f2d944f6de0) at rpc-clnt.c:794
#6  0x00007f2da91cbb4f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f2d941da950, event=<optimized out>, data=0x7f2d944f6de0) at rpc-clnt.c:987
#7  0x00007f2da91c79f3 in rpc_transport_notify (this=this@entry=0x7f2d941ea5e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f2d944f6de0)
    at rpc-transport.c:538
#8  0x00007f2d9da11314 in socket_event_poll_in (this=this@entry=0x7f2d941ea5e0) at socket.c:2272
#9  0x00007f2d9da137c5 in socket_event_handler (fd=<optimized out>, idx=2, data=0x7f2d941ea5e0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2402
#10 0x00007f2da945b9e0 in event_dispatch_epoll_handler (event=0x7f2d9bf59e80, event_pool=0x55a569c27e10) at event-epoll.c:571
#11 event_dispatch_epoll_worker (data=0x55a569c890d0) at event-epoll.c:674
#12 0x00007f2da8262e25 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f2da7b2f34d in clone () from /lib64/libc.so.6
(gdb) p parent
$1 = (loc_t *) 0x7f2d98ea2adc
(gdb) p parent->inode
$2 = (inode_t *) 0x0
(gdb)

[root@dhcp43-27 ~]# file /core.4510
/core.4510: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/glusterfs', platform: 'x86_64'
[root@dhcp43-27 ~]#

Steps to Reproduce:
====================

Ran geo-replication automation cases, which perform the following fops on the master in order, using different crawl methods: {create, chmod, chown, chgrp, hardlink, softlink, truncate, rename, remove}.

Actual results:
================
Crashes seen upon rmdirs

Expected results:
==================
There should be no crash

--- Additional comment from Nithya Balachandran on 2017-07-19 01:41:50 EDT ---

RCA:
====

[root@dhcp43-27 ~]# gdb -c /core.4510 /usr/sbin/glusterfsd

...
Core was generated by `/usr/sbin/glusterfs --aux-gfid-mount --acl --log-file=/var/log/glusterfs/geo-re'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f2d9b00bf1e in dht_build_child_loc (this=this@entry=0x7f2d9401cdd0, child=child@entry=0x7f2d98ea1da8, parent=parent@entry=0x7f2d98ea2adc, 
    name=name@entry=0x7f2d9451ecd8 "596ae61d%%050WLDDC18") at dht-helper.c:974
974	        child->inode = inode_new (parent->inode->table);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7.x86_64 libselinux-2.5-11.el7.x86_64 libuuid-2.23.2-43.el7.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.15.2-50.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007f2d9b00bf1e in dht_build_child_loc (this=this@entry=0x7f2d9401cdd0, child=child@entry=0x7f2d98ea1da8, parent=parent@entry=0x7f2d98ea2adc, 
    name=name@entry=0x7f2d9451ecd8 "596ae61d%%050WLDDC18") at dht-helper.c:974
#1  0x00007f2d9b04dfed in dht_rmdir_is_subvol_empty (frame=0x7f2da6e67b84, this=this@entry=0x7f2d9401cdd0, entries=0x7f2d9bf598f0, src=0x7f2d94019010)
    at dht-common.c:8223
#2  0x00007f2d9b04ec4b in dht_rmdir_readdirp_cbk (frame=0x7f2da6e67b84, cookie=0x7f2da6e665fc, this=0x7f2d9401cdd0, op_ret=4, op_errno=<optimized out>, 
    entries=<optimized out>, xdata=0x0) at dht-common.c:8345
#3  0x00007f2d9b29097c in afr_readdir_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=4, op_errno=2, 
    subvol_entries=<optimized out>, xdata=0x0) at afr-dir-read.c:234
#4  0x00007f2d9b5217a1 in client3_3_readdirp_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f2da6e62214)
    at client-rpc-fops.c:2650
#5  0x00007f2da91cb860 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2d941da920, pollin=pollin@entry=0x7f2d944f6de0) at rpc-clnt.c:794
#6  0x00007f2da91cbb4f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f2d941da950, event=<optimized out>, data=0x7f2d944f6de0) at rpc-clnt.c:987
#7  0x00007f2da91c79f3 in rpc_transport_notify (this=this@entry=0x7f2d941ea5e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, 
    data=data@entry=0x7f2d944f6de0) at rpc-transport.c:538
#8  0x00007f2d9da11314 in socket_event_poll_in (this=this@entry=0x7f2d941ea5e0) at socket.c:2272
#9  0x00007f2d9da137c5 in socket_event_handler (fd=<optimized out>, idx=2, data=0x7f2d941ea5e0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2402
#10 0x00007f2da945b9e0 in event_dispatch_epoll_handler (event=0x7f2d9bf59e80, event_pool=0x55a569c27e10) at event-epoll.c:571
#11 event_dispatch_epoll_worker (data=0x55a569c890d0) at event-epoll.c:674
#12 0x00007f2da8262e25 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f2da7b2f34d in clone () from /lib64/libc.so.6

(gdb) l
969	        child->name = strrchr (child->path, '/');
970	        if (child->name)
971	                child->name++;
972	
973	        child->parent = inode_ref (parent->inode);
974	        child->inode = inode_new (parent->inode->table);
975	
976	        if (!child->inode) {
977	                goto err;
978	        }
(gdb) p parent->inode
$1 = (inode_t *) 0x0
(gdb) p parent->inode->table
Cannot access memory at address 0x0

(gdb) f 1
#1  0x00007f2d9b04dfed in dht_rmdir_is_subvol_empty (frame=0x7f2da6e67b84, this=this@entry=0x7f2d9401cdd0, entries=0x7f2d9bf598f0, src=0x7f2d94019010)
    at dht-common.c:8223
8223	                build_ret = dht_build_child_loc (this, &lookup_local->loc,
(gdb) l
8218	
8219	                lookup_frame->local = lookup_local;
8220	                lookup_local->main_frame = frame;
8221	                lookup_local->hashed_subvol = src;
8222	
8223	                build_ret = dht_build_child_loc (this, &lookup_local->loc,
8224	                                                 &local->loc, trav->d_name);
8225	                if (build_ret != 0)
8226	                        goto err;
8227	
(gdb) p local->loc
$2 = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}
(gdb) 
(gdb) p *frame
$3 = {root = 0x7f2da6c985b4, parent = 0x0, frames = {next = 0x7f2da6e67b94, prev = 0x7f2da6e67b94}, local = 0x0, this = 0x7f2d9401cdd0, ret = 0x0, 
  ref_count = 0, lock = {spinlock = 1, mutex = {__data = {__lock = 1, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, 
        __list = {__prev = 0x0, __next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, cookie = 0x0, complete = _gf_false, 
  op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x0, wind_to = 0x0, unwind_from = 0x0, 
  unwind_to = 0x0}

(gdb) p *local
$4 = {call_cnt = 0, loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, 
  loc2 = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, op_ret = -1, 
  op_errno = 39, layout_mismatch = 0, stbuf = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {

It looks like the frame has already been freed: the frame dump shows local = 0x0 and ref_count = 0, and the loc in the local captured by the callback has been wiped, which is why parent->inode is NULL when dht_build_child_loc dereferences parent->inode->table.
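
The mechanism is sketched minimally below: local->call_cnt is incremented inside the same loop that winds the lookups, so a reply that arrives synchronously can drive it to 0 and destroy the frame while later entries are still being processed. This is not the gluster code; fake_local, wind_lookup and on_lookup_done are hypothetical stand-ins for frame->local, STACK_WIND and the lookup callback.

/* Self-contained C sketch of the racy pattern; all names are
 * stand-ins, not gluster APIs. */
#include <stdio.h>

struct fake_local { int call_cnt; };

static void on_lookup_done (struct fake_local *local)
{
        if (--local->call_cnt == 0)
                printf ("call_cnt hit 0: the frame would be destroyed here\n");
}

static void wind_lookup (struct fake_local *local)
{
        on_lookup_done (local); /* worst case: the reply is synchronous */
}

int main (void)
{
        struct fake_local local = {0};
        const char *entries[] = {"596ae61d%%050WLDDC18", "another-entry"};
        int i;

        for (i = 0; i < 2; i++) {
                local.call_cnt++;       /* BUG: bumped per entry, inside the loop */
                wind_lookup (&local);
                if (local.call_cnt == 0)
                        printf ("after '%s': frame already gone; the next "
                                "iteration would read freed memory (here, "
                                "parent->inode from a wiped loc)\n",
                                entries[i]);
        }
        return 0;
}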

Comment 1 Worker Ant 2017-07-19 16:20:18 UTC
REVIEW: https://review.gluster.org/17825 (cluster/dht: Fixed crash in dht_rmdir_is_subvol_empty) posted (#1) for review on master by N Balachandran (nbalacha)

Comment 2 Worker Ant 2017-07-20 12:46:03 UTC
COMMIT: https://review.gluster.org/17825 committed in master by Shyamsundar Ranganathan (srangana) 
------
commit fdc431063f33cf4f5572771742e5502565f2a3ca
Author: N Balachandran <nbalacha>
Date:   Wed Jul 19 21:44:55 2017 +0530

    cluster/dht: Fixed crash in dht_rmdir_is_subvol_empty
    
    The local->call_cnt was being accessed and updated inside
    the loop where the entries were being processed and the calls
    were being wound.
    This could end up in a scenario where the local->call_cnt became
    0 before the processing was complete, causing a crash when the
    next entry was processed.
    
    Change-Id: I930f61f1a1d1948f90d4e58e80b7d6680cf27f2f
    BUG: 1472949
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17825
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
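
The pattern the fix describes, as a minimal sketch (again with hypothetical stand-ins; the actual change is in review 17825): count the calls in a first pass and set local->call_cnt once, before anything is wound, so call_cnt cannot reach 0 until the final reply.

/* Self-contained C sketch of the "count first, wind after" pattern. */
#include <stdio.h>

struct fake_local { int call_cnt; };

static int needs_lookup (const char *name)
{
        /* Stand-in for the real per-entry check. */
        return (name[0] != '.');
}

static void on_lookup_done (struct fake_local *local)
{
        if (--local->call_cnt == 0)
                printf ("last reply in: safe to destroy the frame now\n");
}

static void wind_lookup (struct fake_local *local)
{
        on_lookup_done (local); /* worst case: the reply is synchronous */
}

int main (void)
{
        struct fake_local local = {0};
        const char *entries[] = {".", "..", "file-a", "file-b"};
        int i;

        /* Pass 1: count and set call_cnt once, before winding. */
        for (i = 0; i < 4; i++)
                if (needs_lookup (entries[i]))
                        local.call_cnt++;

        /* Pass 2: wind the calls.  Even if every reply arrives
         * synchronously, call_cnt stays positive until the last
         * one, so the frame outlives the loop. */
        for (i = 0; i < 4; i++)
                if (needs_lookup (entries[i]))
                        wind_lookup (&local);

        return 0;
}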

Comment 3 Shyamsundar 2017-09-05 17:37:29 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/