Bug 1183988

Summary:	DHT:Quota:- brick process crashed after deleting .glusterfs from backend
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	shylesh <shmohan>
Component:	quota	Assignee:	Vijaikumar Mallikarjuna <vmallika>
Status:	CLOSED ERRATA	QA Contact:	Anil Shah <ashah>
Severity:	medium	Docs Contact:
Priority:	high
Version:	2.1	CC:	achauras, annair, asrivast, nlevinki, nsathyan, rcyriac, rhs-bugs, smohan, storage-qa-internal, vagarwal, vbellur
Target Milestone:	---
Target Release:	RHGS 3.1.0
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	glusterfs-3.7.1-3	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	1203629		Environment:
Last Closed:	2015-07-29 04:37:56 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1203629
Bug Blocks:	1202842, 1217419

Description shylesh 2015-01-20 11:27:06 UTC

Description of problem:
After an add-brick without a rebalance, trying to delete a directory from the mount fails. If, to clean up, we then delete the directory directly from the backend, including .glusterfs, the brick process crashes.

Version-Release number of selected component (if applicable):
3.4.0.72rhs-1.el6rhs.x86_64

How reproducible:


Steps to Reproduce:
1. Created a 6x2 distributed-replicate volume and enabled quota and quota-deem-statfs.
2. Untarred the Linux kernel on the mount.
3. Ran add-brick and rebalance.
4. Ran add-brick again and tried to delete the kernel directory.
5. Deleted all the files from the backend, including .glusterfs.


Actual results:
After some time, the brick process crashed. Although this is not a supported scenario, bricks should still be able to handle such errors and should not crash.

Expected results:


Additional info:

Program terminated with signal 11, Segmentation fault.
#0  uuid_copy (dst=0x7f8db4368430 "", src=<value optimized out>) at ../../contrib/uuid/copy.c:44
44                      *cp1++ = *cp2++;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.6.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.6.x86_64 libaio-0.3.107-10.el6.x86_64 libcom_err-1.41.12-14.el6_4.4.x86_64 libgcc-4.4.7-3.1.el6_4.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64 openssl-1.0.1e-16.el6_5.14.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  uuid_copy (dst=0x7f8db4368430 "", src=<value optimized out>) at ../../contrib/uuid/copy.c:44
#1  0x00007f8db7377e92 in posix_make_ancestral_node (priv_base_path=0x9a89c0 "/home/adp13", path=0x7f8db43696c0 "/linux-2.6.32.65/Documentation/ABI/testing/",
    pathsize=<value optimized out>, head=0x7f8d98194e50, dir_name=0x7f8db4368501 "testing/", iabuf=0x7f8db4369590, inode=0x0, type=3, xdata=0x7f8dba7a5238)
    at posix-handle.c:89
#2  0x00007f8db73783c2 in posix_make_ancestryfromgfid (this=0x97a6d0, path=0x7f8db43696c0 "/linux-2.6.32.65/Documentation/ABI/testing/", pathsize=4097,
    head=0x7f8d98194e50, type=3, gfid=<value optimized out>, handle_size=66, priv_base_path=0x9a89c0 "/home/adp13", itable=0x9b3b60, parent=0x7f8db43696b8,
    xdata=0x7f8dba7a5238) at posix-handle.c:179
#3  0x00007f8db7371962 in posix_get_ancestry_directory (this=0x97a6d0, real_path=<value optimized out>, loc=0x7f8dba83106c, dict=0x7f8dba7a5724, type=2,
    op_errno=<value optimized out>, xdata=0x7f8dba7a5238) at posix.c:2921
#4  0x00007f8db737265c in posix_getxattr (frame=0x7f8dbadad560, this=0x97a6d0, loc=0x7f8dba83106c, name=<value optimized out>, xdata=0x7f8dba7a5238) at posix.c:3428
#5  0x00000038f0e1e4fb in default_getxattr (frame=0x7f8dbadad560, this=0x97bf80, loc=0x7f8dba83106c, name=0xa9aaa0 "glusterfs.ancestry.dentry",
    xdata=<value optimized out>) at defaults.c:1083
#6  0x00007f8db6d41fd3 in posix_acl_getxattr (frame=0x7f8dbadab1c4, this=0x97d7e0, loc=0x7f8dba83106c, name=0xa9aaa0 "glusterfs.ancestry.dentry",
    xdata=0x7f8dba7a5238) at posix-acl.c:1945
#7  0x00007f8db6b2f28b in pl_getxattr (frame=0x7f8dbadaeae0, this=0x97e8c0, loc=0x7f8dba83106c, name=<value optimized out>, xdata=0x7f8dba7a5238) at posix.c:502
#8  0x00007f8db6912fea in iot_getxattr_wrapper (frame=0x7f8dbadb3824, this=0x97f9b0, loc=0x7f8dba83106c, name=0xa9aaa0 "glusterfs.ancestry.dentry",
    xdata=0x7f8dba7a5238) at io-threads.c:1743
#9  0x00000038f0e32789 in call_resume_wind (stub=0x7f8dba83102c) at call-stub.c:2248
#10 call_resume (stub=0x7f8dba83102c) at call-stub.c:2645
#11 0x00007f8db691aad8 in iot_worker (data=0x9a4480) at io-threads.c:191
#12 0x0000003ea9607851 in start_thread () from /lib64/libpthread.so.0
#13 0x0000003ea92e85ad in clone () from /lib64/libc.so.6


(gdb) f 1
#1  0x00007f8db7377e92 in posix_make_ancestral_node (priv_base_path=0x9a89c0 "/home/adp13", path=0x7f8db43696c0 "/linux-2.6.32.65/Documentation/ABI/testing/", 
    pathsize=<value optimized out>, head=0x7f8d98194e50, dir_name=0x7f8db4368501 "testing/", iabuf=0x7f8db4369590, inode=0x0, type=3, xdata=0x7f8dba7a5238)
    at posix-handle.c:89
89                      uuid_copy (loc.gfid, inode->gfid);
(gdb) p inode
$1 = (inode_t *) 0x0



For some reason inode is NULL, hence the crash.
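
The dereference at posix-handle.c:89 can be illustrated with a minimal sketch. This is not the upstream fix: demo_inode, demo_loc and demo_fill_loc_gfid are hypothetical stand-ins for the GlusterFS types and code. The only point shown is that the inode pointer is checked before its gfid is copied, so a missing inode becomes an error return rather than a segmentation fault.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for inode_t and loc_t; not the GlusterFS definitions. */
struct demo_inode {
    void *table;              /* assumed: one pointer precedes the gfid */
    unsigned char gfid[16];
};

struct demo_loc {
    unsigned char gfid[16];
};

/*
 * Copy the inode's gfid into loc only when both pointers are valid.
 * Returns 0 on success and -EINVAL when loc or inode is NULL, so the
 * caller can fail the request instead of dereferencing a NULL pointer.
 */
int demo_fill_loc_gfid(struct demo_loc *loc, const struct demo_inode *inode)
{
    if (loc == NULL || inode == NULL)
        return -EINVAL;

    memcpy(loc->gfid, inode->gfid, sizeof(loc->gfid));
    return 0;
}
```

With inode = 0x0, as in frame #1, such a guard would return -EINVAL and the getxattr request could be failed with an error.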


bt full
========
#0  uuid_copy (dst=0x7f8db4368430 "", src=<value optimized out>) at ../../contrib/uuid/copy.c:44
        cp1 = 0x7f8db4368430 ""
        cp2 = 0x8 <Address 0x8 out of bounds>
        i = <value optimized out>
#1  0x00007f8db7377e92 in posix_make_ancestral_node (priv_base_path=0x9a89c0 "/home/adp13", path=0x7f8db43696c0 "/linux-2.6.32.65/Documentation/ABI/testing/", 
    pathsize=<value optimized out>, head=0x7f8d98194e50, dir_name=0x7f8db4368501 "testing/", iabuf=0x7f8db4369590, inode=0x0, type=3, xdata=0x7f8dba7a5238)
    at posix-handle.c:89
        entry = 0x7f8d981a9a80
        real_path = "/home/adp13//linux-2.6.32.65/Documentation/ABI/testing/", '\000' <repeats 4041 times>
        len = <value optimized out>
        loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}
        ret = -1
        __FUNCTION__ = "posix_make_ancestral_node"
#2  0x00007f8db73783c2 in posix_make_ancestryfromgfid (this=0x97a6d0, path=0x7f8db43696c0 "/linux-2.6.32.65/Documentation/ABI/testing/", pathsize=4097, 
    head=0x7f8d98194e50, type=3, gfid=<value optimized out>, handle_size=66, priv_base_path=0x9a89c0 "/home/adp13", itable=0x9b3b60, parent=0x7f8db43696b8, 
    xdata=0x7f8dba7a5238) at posix-handle.c:179
        linkname = 0x7f8db43684d0 "../../3f/95/3f95e550-2482-40ed-8387-eeb10176e136"
        dir_handle = 0x7f8db43694e0 "/home/adp13/.glusterfs/2f/25/2f254d78-a443-4c8a-aaa8-e25380429c9a"
        dir_name = <value optimized out>
        pgfidstr = <value optimized out>
        saveptr = <value optimized out>
        len = <value optimized out>
        inode = 0x0
        iabuf = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', 
            owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', 
              write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, 
          ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}
        ret = <value optimized out>
        tmp_gfid = "?\225\345P$\202@탇\356\261\001v\341\066"
        __FUNCTION__ = "posix_make_ancestryfromgfid"
#3  0x00007f8db7371962 in posix_get_ancestry_directory (this=0x97a6d0, real_path=<value optimized out>, loc=0x7f8dba83106c, dict=0x7f8dba7a5724, type=2, 
    op_errno=<value optimized out>, xdata=0x7f8dba7a5238) at posix.c:2921
        size = 0
        handle_size = 66
        value = 0x0
        priv = <value optimized out>
        head = 0x7f8d98194e50
        dirpath = "/linux-2.6.32.65/Documentation/ABI/testing/", '\000' <repeats 4053 times>
        inode = 0x7f8db4cfecf8
        ret = -1
        __FUNCTION__ = "posix_get_ancestry_directory"
#4  0x00007f8db737265c in posix_getxattr (frame=0x7f8dbadad560, this=0x97a6d0, loc=0x7f8dba83106c, name=<value optimized out>, xdata=0x7f8dba7a5238) at posix.c:3428
        type = <value optimized out>
        priv = 0x9a8760
        op_ret = -1
        op_errno = 0
        host_buf = '\000' <repeats 1023 times>
        value = 0x0
        real_path = 0x7f8db436a730 "/home/adp13/.glusterfs/3f/95/3f95e550-2482-40ed-8387-eeb10176e136/testing"
        dict = 0x7f8dba7a5724
        file_contents = 0x0
        ret = <value optimized out>
        path = 0x0
        rpath = 0x0
        dyn_rpath = 0x0
        size = 0
        list = 0x0
        list_offset = 0
        remaining_size = 0
        key = '\000' <repeats 4095 times>
        __FUNCTION__ = "posix_getxattr"
#5  0x00000038f0e1e4fb in default_getxattr (frame=0x7f8dbadad560, this=0x97bf80, loc=0x7f8dba83106c, name=0xa9aaa0 "glusterfs.ancestry.dentry", 
    xdata=<value optimized out>) at defaults.c:1083
        old_THIS = 0x97bf80
#6  0x00007f8db6d41fd3 in posix_acl_getxattr (frame=0x7f8dbadab1c4, this=0x97d7e0, loc=0x7f8dba83106c, name=0xa9aaa0 "glusterfs.ancestry.dentry", 
    xdata=0x7f8dba7a5238) at posix-acl.c:1945
        _new = 0x7f8dbadad560
        old_THIS = 0x97d7e0
        tmp_cbk = 0x7f8db6d3ce80 <posix_acl_getxattr_cbk>
        __FUNCTION__ = "posix_acl_getxattr"
#7  0x00007f8db6b2f28b in pl_getxattr (frame=0x7f8dbadaeae0, this=0x97e8c0, loc=0x7f8dba83106c, name=<value optimized out>, xdata=0x7f8dba7a5238) at posix.c:502
        _new = 0x7f8dbadab1c4
        old_THIS = 0x97e8c0
        tmp_cbk = 0x7f8db6b28e20 <pl_getxattr_cbk>
        op_errno = 22
        op_ret = -1
        bcount = 0
        gcount = 0
        key = '\000' <repeats 4095 times>
        lk_summary = 0x0
        pl_inode = 0x0
        dict = 0x0
        args = {type = 0, kind = 0, opts = 0x0}
        brickname = 0x0
        __FUNCTION__ = "pl_getxattr"
#8  0x00007f8db6912fea in iot_getxattr_wrapper (frame=0x7f8dbadb3824, this=0x97f9b0, loc=0x7f8dba83106c, name=0xa9aaa0 "glusterfs.ancestry.dentry", 
    xdata=0x7f8dba7a5238) at io-threads.c:1743
        _new = 0x7f8dbadaeae0
        old_THIS = 0x97f9b0
        tmp_cbk = 0x7f8db690f320 <iot_getxattr_cbk>
        __FUNCTION__ = "iot_getxattr_wrapper"
#9  0x00000038f0e32789 in call_resume_wind (stub=0x7f8dba83102c) at call-stub.c:2248
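
The cp2 = 0x8 value in frame #0 is consistent with taking the address of a field through a NULL struct pointer: the resulting address equals the field's offset within the struct. The sketch below assumes a layout in which a single pointer precedes the gfid. demo_inode_layout is a hypothetical struct used only for illustration, not the real inode_t definition.

```c
#include <stddef.h>

/* Hypothetical layout: one pointer member followed by a 16-byte gfid. */
struct demo_inode_layout {
    void *table;
    unsigned char gfid[16];
};

/* Offset of gfid, i.e. the address &inode->gfid would have if inode were NULL. */
size_t demo_gfid_offset(void)
{
    return offsetof(struct demo_inode_layout, gfid);
}
```

On an x86_64 build, where pointers are 8 bytes, this offset is 8, which matches the faulting source address in the trace.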

Comment 4 Amit Chaurasia 2015-03-05 08:17:17 UTC

A little investigation indicates that the issue is only with a folder:
 ls -ltrh /rhs/brick1/gv0/data1/shd/gluster/test/

The getfattr of the folder shows that there is no gfid for the folder.

[root@dht-rhs-19 ~]# getfattr -d -m . -e hex /rhs/brick1/gv0/data1/shd/gluster/test/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/gv0/data1/shd/gluster/test/
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.quota.d11e1ed1-88d4-4cf2-9d10-8c77a63cb206.contri=0x00000000000f9e00
trusted.glusterfs.quota.dirty=0x3100
trusted.glusterfs.quota.size=0x00000000000f9e00

As part of one of my test cases, I deleted this gfid using setfattr. Ideally, the xattr should have been healed through lookup. Instead of being healed, it dumps core.
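
The same check as the getfattr output above can be sketched in C. This is a minimal illustration, assuming Linux, that the gfid is stored in a 16-byte trusted.gfid xattr, and that the caller is privileged enough to read the trusted.* namespace (an unprivileged caller sees the attribute as absent). demo_has_gfid_xattr is a hypothetical helper, not part of GlusterFS.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Report whether a backend path carries a 16-byte trusted.gfid xattr.
 * Returns 1 if present with the expected size, 0 if absent or shorter
 * than 16 bytes, and -1 on any other error (errno is left as set by
 * lgetxattr, e.g. ENOENT for a missing path).
 */
int demo_has_gfid_xattr(const char *path)
{
    unsigned char gfid[16];
    ssize_t n = lgetxattr(path, "trusted.gfid", gfid, sizeof(gfid));

    if (n == (ssize_t)sizeof(gfid))
        return 1;
    if (n >= 0)
        return 0;   /* present but too short to be a gfid */
    if (errno == ENODATA)
        return 0;   /* attribute does not exist on this path */
    return -1;
}
```

Run as root against the folder listed above, a helper like this would be expected to return 0, since the getfattr output lists no gfid xattr.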

Comment 5 Vijaikumar Mallikarjuna 2015-03-19 10:26:52 UTC

Patch submitted upstream: http://review.gluster.com/#/c/3842/

Comment 7 Vijaikumar Mallikarjuna 2015-06-09 06:06:56 UTC

The upstream patch below fixes the issue:
http://review.gluster.org/#/c/9941/

Comment 8 Anil Shah 2015-07-07 06:51:18 UTC

Deleted the .glusterfs folder from the backend. Did not see any core.
However, the brick process was killed. This is expected behaviour, as discussed with the developer.

Earlier, the brick process was getting killed unexpectedly with a core file generated. Now it fails gracefully.

Brick logs
========================================================
[2015-07-07 06:19:32.297714] I [MSGID: 115029] [server-handshake.c:610:server_setvolume] 0-vol0-server: accepted client from darkknight-3012-2015/07/07-06:19:31:178889-vol0-client-0-0-0 (version: 3.7.1)
[2015-07-07 06:20:27.232896] W [MSGID: 113075] [posix-helpers.c:1676:posix_fs_health_check] 0-vol0-posix: open() on /rhs/brick1/b001/.glusterfs/health_check returned [No such file or directory]
[2015-07-07 06:20:27.232989] W [MSGID: 113075] [posix-helpers.c:1741:posix_health_check_thread_proc] 0-vol0-posix: health_check on /rhs/brick1/b001 returned [No such file or directory]
[2015-07-07 06:20:27.233018] M [MSGID: 113075] [posix-helpers.c:1762:posix_health_check_thread_proc] 0-vol0-posix: health-check failed, going down
[2015-07-07 06:20:57.234507] M [MSGID: 113075] [posix-helpers.c:1768:posix_health_check_thread_proc] 0-vol0-posix: still alive! -> SIGTERM

Bug verified on build glusterfs-3.7.1-7.el6rhs.x86_64
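
The failing open() reported by posix_fs_health_check in the brick logs above can be reproduced in outline with the sketch below. It is a simplification under stated assumptions: demo_health_check_probe is a hypothetical helper, it only opens the file read-only, and the real health check may do more than this. Only the .glusterfs/health_check path is taken from the log.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Try to open <brick_path>/.glusterfs/health_check.
 * Returns 0 when the file can be opened, otherwise the errno from
 * open(), or ENAMETOOLONG if the path does not fit the local buffer.
 */
int demo_health_check_probe(const char *brick_path)
{
    char file_path[4096];
    int n = snprintf(file_path, sizeof(file_path),
                     "%s/.glusterfs/health_check", brick_path);

    if (n < 0 || (size_t)n >= sizeof(file_path))
        return ENAMETOOLONG;

    int fd = open(file_path, O_RDONLY);
    if (fd < 0)
        return errno;   /* ENOENT once .glusterfs has been removed */

    close(fd);
    return 0;
}
```

Once .glusterfs is deleted, a probe like this returns ENOENT, which corresponds to the "No such file or directory" warnings. In the log, SIGTERM follows about 30 seconds after the failed check.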

Comment 10 errata-xmlrpc 2015-07-29 04:37:56 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html