Description of problem:
After add-brick without rebalance, trying to delete a directory from the mount fails. To clean up, if the directory is then deleted directly from the backend, including its .glusterfs entries, the brick process crashes.

Version-Release number of selected component (if applicable):
3.4.0.72rhs-1.el6rhs.x86_64

How reproducible:

Steps to Reproduce:
1. Create a 6x2 distributed-replicate volume and enable quota and quota-deem-statfs.
2. Untar the Linux kernel source on the mount.
3. Add-brick and rebalance.
4. Add-brick again and try to delete the kernel directory.
5. Now delete all the files from the backend, including .glusterfs.

Actual results:
After some time the brick process crashed. Though this is not a supported scenario, the bricks should still be able to handle the errors and should not crash.

Expected results:

Additional info:
Program terminated with signal 11, Segmentation fault.
#0 uuid_copy (dst=0x7f8db4368430 "", src=<value optimized out>) at ../../contrib/uuid/copy.c:44
44 *cp1++ = *cp2++;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.6.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.6.x86_64 libaio-0.3.107-10.el6.x86_64 libcom_err-1.41.12-14.el6_4.4.x86_64 libgcc-4.4.7-3.1.el6_4.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64 openssl-1.0.1e-16.el6_5.14.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 uuid_copy (dst=0x7f8db4368430 "", src=<value optimized out>) at ../../contrib/uuid/copy.c:44
#1 0x00007f8db7377e92 in posix_make_ancestral_node (priv_base_path=0x9a89c0 "/home/adp13", path=0x7f8db43696c0 "/linux-2.6.32.65/Documentation/ABI/testing/", pathsize=<value optimized out>, head=0x7f8d98194e50, dir_name=0x7f8db4368501 "testing/", iabuf=0x7f8db4369590, inode=0x0, type=3, xdata=0x7f8dba7a5238) at posix-handle.c:89
#2 0x00007f8db73783c2 in posix_make_ancestryfromgfid (this=0x97a6d0, path=0x7f8db43696c0 "/linux-2.6.32.65/Documentation/ABI/testing/", pathsize=4097, head=0x7f8d98194e50, type=3, gfid=<value optimized out>, handle_size=66, priv_base_path=0x9a89c0
"/home/adp13", itable=0x9b3b60, parent=0x7f8db43696b8, xdata=0x7f8dba7a5238) at posix-handle.c:179
#3 0x00007f8db7371962 in posix_get_ancestry_directory (this=0x97a6d0, real_path=<value optimized out>, loc=0x7f8dba83106c, dict=0x7f8dba7a5724, type=2, op_errno=<value optimized out>, xdata=0x7f8dba7a5238) at posix.c:2921
#4 0x00007f8db737265c in posix_getxattr (frame=0x7f8dbadad560, this=0x97a6d0, loc=0x7f8dba83106c, name=<value optimized out>, xdata=0x7f8dba7a5238) at posix.c:3428
#5 0x00000038f0e1e4fb in default_getxattr (frame=0x7f8dbadad560, this=0x97bf80, loc=0x7f8dba83106c, name=0xa9aaa0 "glusterfs.ancestry.dentry", xdata=<value optimized out>) at defaults.c:1083
#6 0x00007f8db6d41fd3 in posix_acl_getxattr (frame=0x7f8dbadab1c4, this=0x97d7e0, loc=0x7f8dba83106c, name=0xa9aaa0 "glusterfs.ancestry.dentry", xdata=0x7f8dba7a5238) at posix-acl.c:1945
#7 0x00007f8db6b2f28b in pl_getxattr (frame=0x7f8dbadaeae0, this=0x97e8c0, loc=0x7f8dba83106c, name=<value optimized out>, xdata=0x7f8dba7a5238) at posix.c:502
#8 0x00007f8db6912fea in iot_getxattr_wrapper (frame=0x7f8dbadb3824, this=0x97f9b0, loc=0x7f8dba83106c, name=0xa9aaa0 "glusterfs.ancestry.dentry", xdata=0x7f8dba7a5238) at io-threads.c:1743
#9 0x00000038f0e32789 in call_resume_wind (stub=0x7f8dba83102c) at call-stub.c:2248
#10 call_resume (stub=0x7f8dba83102c) at call-stub.c:2645
#11 0x00007f8db691aad8 in iot_worker (data=0x9a4480) at io-threads.c:191
#12 0x0000003ea9607851 in start_thread () from /lib64/libpthread.so.0
#13 0x0000003ea92e85ad in clone () from /lib64/libc.so.6
(gdb) f 1
#1 0x00007f8db7377e92 in posix_make_ancestral_node (priv_base_path=0x9a89c0 "/home/adp13", path=0x7f8db43696c0 "/linux-2.6.32.65/Documentation/ABI/testing/", pathsize=<value optimized out>, head=0x7f8d98194e50, dir_name=0x7f8db4368501 "testing/", iabuf=0x7f8db4369590, inode=0x0, type=3, xdata=0x7f8dba7a5238) at posix-handle.c:89
89 uuid_copy (loc.gfid, inode->gfid);
(gdb) p inode
$1 = (inode_t *) 0x0

For some reason the inode is NULL,
hence the crash.

bt full
========
#0 uuid_copy (dst=0x7f8db4368430 "", src=<value optimized out>) at ../../contrib/uuid/copy.c:44
        cp1 = 0x7f8db4368430 ""
        cp2 = 0x8 <Address 0x8 out of bounds>
        i = <value optimized out>
#1 0x00007f8db7377e92 in posix_make_ancestral_node (priv_base_path=0x9a89c0 "/home/adp13", path=0x7f8db43696c0 "/linux-2.6.32.65/Documentation/ABI/testing/", pathsize=<value optimized out>, head=0x7f8d98194e50, dir_name=0x7f8db4368501 "testing/", iabuf=0x7f8db4369590, inode=0x0, type=3, xdata=0x7f8dba7a5238) at posix-handle.c:89
        entry = 0x7f8d981a9a80
        real_path = "/home/adp13//linux-2.6.32.65/Documentation/ABI/testing/", '\000' <repeats 4041 times>
        len = <value optimized out>
        loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}
        ret = -1
        __FUNCTION__ = "posix_make_ancestral_node"
#2 0x00007f8db73783c2 in posix_make_ancestryfromgfid (this=0x97a6d0, path=0x7f8db43696c0 "/linux-2.6.32.65/Documentation/ABI/testing/", pathsize=4097, head=0x7f8d98194e50, type=3, gfid=<value optimized out>, handle_size=66, priv_base_path=0x9a89c0 "/home/adp13", itable=0x9b3b60, parent=0x7f8db43696b8, xdata=0x7f8dba7a5238) at posix-handle.c:179
        linkname = 0x7f8db43684d0 "../../3f/95/3f95e550-2482-40ed-8387-eeb10176e136"
        dir_handle = 0x7f8db43694e0 "/home/adp13/.glusterfs/2f/25/2f254d78-a443-4c8a-aaa8-e25380429c9a"
        dir_name = <value optimized out>
        pgfidstr = <value optimized out>
        saveptr = <value optimized out>
        len = <value optimized out>
        inode = 0x0
        iabuf = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}
        ret = <value optimized out>
        tmp_gfid = "?\225\345P$\202@탇\356\261\001v\341\066"
        __FUNCTION__ = "posix_make_ancestryfromgfid"
#3 0x00007f8db7371962 in posix_get_ancestry_directory (this=0x97a6d0, real_path=<value optimized out>, loc=0x7f8dba83106c, dict=0x7f8dba7a5724, type=2, op_errno=<value optimized out>, xdata=0x7f8dba7a5238) at posix.c:2921
        size = 0
        handle_size = 66
        value = 0x0
        priv = <value optimized out>
        head = 0x7f8d98194e50
        dirpath = "/linux-2.6.32.65/Documentation/ABI/testing/", '\000' <repeats 4053 times>
        inode = 0x7f8db4cfecf8
        ret = -1
        __FUNCTION__ = "posix_get_ancestry_directory"
#4 0x00007f8db737265c in posix_getxattr (frame=0x7f8dbadad560, this=0x97a6d0, loc=0x7f8dba83106c, name=<value optimized out>, xdata=0x7f8dba7a5238) at posix.c:3428
        type = <value optimized out>
        priv = 0x9a8760
        op_ret = -1
        op_errno = 0
        host_buf = '\000' <repeats 1023 times>
        value = 0x0
        real_path = 0x7f8db436a730 "/home/adp13/.glusterfs/3f/95/3f95e550-2482-40ed-8387-eeb10176e136/testing"
        dict = 0x7f8dba7a5724
        file_contents = 0x0
        ret = <value optimized out>
        path = 0x0
        rpath = 0x0
        dyn_rpath = 0x0
        size = 0
        list = 0x0
        list_offset = 0
        remaining_size = 0
        key = '\000' <repeats 4095 times>
        __FUNCTION__ = "posix_getxattr"
#5 0x00000038f0e1e4fb in default_getxattr (frame=0x7f8dbadad560, this=0x97bf80, loc=0x7f8dba83106c, name=0xa9aaa0 "glusterfs.ancestry.dentry", xdata=<value optimized out>) at defaults.c:1083
        old_THIS = 0x97bf80
#6 0x00007f8db6d41fd3 in posix_acl_getxattr (frame=0x7f8dbadab1c4, this=0x97d7e0, loc=0x7f8dba83106c, name=0xa9aaa0 "glusterfs.ancestry.dentry", xdata=0x7f8dba7a5238) at posix-acl.c:1945
        _new = 0x7f8dbadad560
        old_THIS = 0x97d7e0
        tmp_cbk = 0x7f8db6d3ce80 <posix_acl_getxattr_cbk>
        __FUNCTION__ = "posix_acl_getxattr"
#7 0x00007f8db6b2f28b in pl_getxattr (frame=0x7f8dbadaeae0, this=0x97e8c0, loc=0x7f8dba83106c, name=<value optimized out>, xdata=0x7f8dba7a5238) at posix.c:502
        _new = 0x7f8dbadab1c4
        old_THIS = 0x97e8c0
        tmp_cbk = 0x7f8db6b28e20 <pl_getxattr_cbk>
        op_errno = 22
        op_ret = -1
        bcount = 0
        gcount = 0
        key = '\000' <repeats 4095 times>
        lk_summary = 0x0
        pl_inode = 0x0
        dict = 0x0
        args = {type = 0, kind = 0, opts = 0x0}
        brickname = 0x0
        __FUNCTION__ = "pl_getxattr"
#8 0x00007f8db6912fea in iot_getxattr_wrapper (frame=0x7f8dbadb3824, this=0x97f9b0, loc=0x7f8dba83106c, name=0xa9aaa0 "glusterfs.ancestry.dentry", xdata=0x7f8dba7a5238) at io-threads.c:1743
        _new = 0x7f8dbadaeae0
        old_THIS = 0x97f9b0
        tmp_cbk = 0x7f8db690f320 <iot_getxattr_cbk>
        __FUNCTION__ = "iot_getxattr_wrapper"
#9 0x00000038f0e32789 in call_resume_wind (stub=0x7f8dba83102c) at call-stub.c:2248
A little analysis indicates that the issue occurs only with one folder:

ls -ltrh /rhs/brick1/gv0/data1/shd/gluster/test/

The getfattr output shows that there is no gfid xattr on the folder:

[root@dht-rhs-19 ~]# getfattr -d -m . -e hex /rhs/brick1/gv0/data1/shd/gluster/test/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/gv0/data1/shd/gluster/test/
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.quota.d11e1ed1-88d4-4cf2-9d10-8c77a63cb206.contri=0x00000000000f9e00
trusted.glusterfs.quota.dirty=0x3100
trusted.glusterfs.quota.size=0x00000000000f9e00

As part of one of my test cases, I deleted this gfid xattr using setfattr. Ideally, the xattr should have been healed through lookup. But instead of healing, the brick dumps core.
Patch submitted upstream: http://review.gluster.com/#/c/3842/
The upstream patch below fixes the issue: http://review.gluster.org/#/c/9941/
Deleted the .glusterfs folder from the backend. Did not see any core. However, the brick process was killed; this is expected behaviour as discussed with the developer. Earlier the brick process was getting killed unexpectedly with a core file generated; now it fails gracefully.

Brick logs
========================================================
[2015-07-07 06:19:32.297714] I [MSGID: 115029] [server-handshake.c:610:server_setvolume] 0-vol0-server: accepted client from darkknight-3012-2015/07/07-06:19:31:178889-vol0-client-0-0-0 (version: 3.7.1)
[2015-07-07 06:20:27.232896] W [MSGID: 113075] [posix-helpers.c:1676:posix_fs_health_check] 0-vol0-posix: open() on /rhs/brick1/b001/.glusterfs/health_check returned [No such file or directory]
[2015-07-07 06:20:27.232989] W [MSGID: 113075] [posix-helpers.c:1741:posix_health_check_thread_proc] 0-vol0-posix: health_check on /rhs/brick1/b001 returned [No such file or directory]
[2015-07-07 06:20:27.233018] M [MSGID: 113075] [posix-helpers.c:1762:posix_health_check_thread_proc] 0-vol0-posix: health-check failed, going down
[2015-07-07 06:20:57.234507] M [MSGID: 113075] [posix-helpers.c:1768:posix_health_check_thread_proc] 0-vol0-posix: still alive! -> SIGTERM

Bug verified on build glusterfs-3.7.1-7.el6rhs.x86_64.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html