Description of problem:
After executing many quota commands, all the bricks of a distributed-replicate volume eventually crashed.

Version-Release number of selected component (if applicable):
[root@gqac024 tmp]# rpm -qa| grep gluster
glusterfs-libs-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-api-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-devel-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-server-3.4.0.20rhsquota5-1.el6rhs.x86_64

How reproducible:

Steps to Reproduce:
1. Create a 4x2 distributed-replicate volume.
2. Randomly limit usage on the root and on subdirectories while executing the "quota list" command.

Actual results:
Eventually all the bricks of the volume crashed.

Additional info:
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 5926689f-441d-40e7-b65a-e890eb886b09
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gqac022.sbu.lab.eng.bos.redhat.com:/home/dr0
Brick2: gqac023.sbu.lab.eng.bos.redhat.com:/home/dr0
Brick3: gqac024.sbu.lab.eng.bos.redhat.com:/home/dr1
Brick4: gqac025.sbu.lab.eng.bos.redhat.com:/home/dr1
Brick5: gqac022.sbu.lab.eng.bos.redhat.com:/home/dr2
Brick6: gqac023.sbu.lab.eng.bos.redhat.com:/home/dr2
Brick7: gqac024.sbu.lab.eng.bos.redhat.com:/home/dr3
Brick8: gqac025.sbu.lab.eng.bos.redhat.com:/home/dr3
Options Reconfigured:
features.quota: on

cluster info
=========
gqac022.sbu.lab.eng.bos.redhat.com:
gqac023.sbu.lab.eng.bos.redhat.com:
gqac024.sbu.lab.eng.bos.redhat.com:
gqac025.sbu.lab.eng.bos.redhat.com:

mounted on
============
gqac022.sbu.lab.eng.bos.redhat.com

mount point
==========
/mnt

Core was generated by `/usr/sbin/glusterfsd -s gqac024.sbu.lab.eng.bos.redhat.com --volfile-id dist-re'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003178332d5f in __strlen_sse42 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.2.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.4.x86_64 libaio-0.3.107-10.el6.x86_64 libcom_err-1.41.12-14.el6_4.2.x86_64 libgcc-4.4.7-3.el6.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64 openssl-1.0.0-27.el6_4.2.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x0000003178332d5f in __strlen_sse42 () from /lib64/libc.so.6
#1  0x00007f463da5e629 in posix_make_ancestryfromgfid (this=0xa20d70, path=0x7f46374179b0 "", pathsize=4097, head=0x7f4620003350, type=1, gfid=<value optimized out>, handle_size=64, priv_base_path=0xa4e4d0 "/home/dr3", itable=0xa595f0, parent=0x7f46374189b8, xdata=0x7f4640c8564c) at posix-handle.c:155
#2  0x00007f463da583dd in posix_get_ancestry_directory (this=0xa20d70, real_path=<value optimized out>, loc=0x7f4640d0dc70, dict=0x7f4640c85994, type=1, op_errno=<value optimized out>, xdata=0x7f4640c8564c) at posix.c:2748
#3  0x00007f463da5b216 in _posix_xattr_get_set (xattr_req=0x7f4640c8564c, key=<value optimized out>, data=0x7f4640aa9200, xattrargs=0x7f4637418a80) at posix-helpers.c:319
#4  0x00007f4642478025 in dict_foreach (dict=0x7f4640c8564c, fn=0x7f463da5afc0 <_posix_xattr_get_set>, data=0x7f4637418a80) at dict.c:1109
#5  0x00007f463da5a8f5 in posix_lookup_xattr_fill (this=0xa20d70, real_path=0x7f4637418b20 "/home/dr3/another/", loc=0x7f4640d0dc70, xattr_req=0x7f4640c8564c, buf=<value optimized out>) at posix-helpers.c:558
#6  0x00007f463da57401 in posix_lookup (frame=0x7f46412d3a68, this=0xa20d70, loc=0x7f4640d0dc70, xdata=0x7f4640c8564c) at posix.c:153
#7  0x00007f464247fefd in default_lookup (frame=0x7f46412d3a68, this=0xa225a0, loc=0x7f4640d0dc70, xdata=0x7f4640c8564c) at defaults.c:1253
#8  0x00007f463d42a8c2 in posix_acl_lookup (frame=0x7f46412d39bc, this=0xa23760, loc=0x7f4640d0dc70, xattr=<value optimized out>) at posix-acl.c:793
#9  0x00007f463d212892 in pl_lookup (frame=0x7f46412d3910, this=0xa248c0, loc=0x7f4640d0dc70, xdata=0x7f4640c8564c) at posix.c:2081
#10 0x00007f463cffeebc in iot_lookup_wrapper (frame=0x7f46412d3864, this=0xa258e0, loc=0x7f4640d0dc70, xdata=0x7f4640c8564c) at io-threads.c:346
#11 0x00007f4642494172 in call_resume_wind (stub=0x7f4640d0dc30) at call-stub.c:2312
#12 call_resume (stub=0x7f4640d0dc30) at call-stub.c:2645
#13 0x00007f463d0039f8 in iot_worker (data=0xa49f90) at io-threads.c:191
#14 0x0000003178a07851 in start_thread () from /lib64/libpthread.so.0
#15 0x00000031782e890d in clone () from /lib64/libc.so.6

bt full
=======
(gdb) bt full
#0  0x0000003178332d5f in __strlen_sse42 () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f463da5e629 in posix_make_ancestryfromgfid (this=0xa20d70, path=0x7f46374179b0 "", pathsize=4097, head=0x7f4620003350, type=1, gfid=<value optimized out>, handle_size=64, priv_base_path=0xa4e4d0 "/home/dr3", itable=0xa595f0, parent=0x7f46374189b8, xdata=0x7f4640c8564c) at posix-handle.c:155
        linkname = 0x7f4637417620 ""
        dir_handle = <value optimized out>
        dir_name = 0x0
        pgfidstr = <value optimized out>
        saveptr = <value optimized out>
        len = <value optimized out>
        inode = 0x0
        iabuf = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}
        ret = -1
        tmp_gfid = '\000' <repeats 15 times>
#2  0x00007f463da583dd in posix_get_ancestry_directory (this=0xa20d70, real_path=<value optimized out>, loc=0x7f4640d0dc70, dict=0x7f4640c85994, type=1, op_errno=<value optimized out>, xdata=0x7f4640c8564c) at posix.c:2748
        size = 0
        handle_size = 64
        value = 0x0
        priv = <value optimized out>
        head = 0x7f4620003350
        dirpath = '\000' <repeats 4096 times>
        inode = 0x0
        ret = -1
        __FUNCTION__ = "posix_get_ancestry_directory"
#3  0x00007f463da5b216 in _posix_xattr_get_set (xattr_req=0x7f4640c8564c, key=<value optimized out>, data=0x7f4640aa9200, xattrargs=0x7f4637418a80) at posix-helpers.c:319
        filler = 0x7f4637418a80
        ret = -1
        databuf = 0x0
        _fd = -1
        loc = 0x0
        req_size = 0
        __FUNCTION__ = "_posix_xattr_get_set"
#4  0x00007f4642478025 in dict_foreach (dict=0x7f4640c8564c, fn=0x7f463da5afc0 <_posix_xattr_get_set>, data=0x7f4637418a80) at dict.c:1109
        __FUNCTION__ = "dict_foreach"
        ret = <value optimized out>
        pairs = <value optimized out>
        next = 0x7f4640b7a9fc
#5  0x00007f463da5a8f5 in posix_lookup_xattr_fill (this=0xa20d70, real_path=0x7f4637418b20 "/home/dr3/another/", loc=0x7f4640d0dc70, xattr_req=0x7f4640c8564c, buf=<value optimized out>) at posix-helpers.c:558
        xattr = 0x7f4640c85994
        filler = {this = 0xa20d70, real_path = 0x7f4637418b20 "/home/dr3/another/", xattr = 0x7f4640c85994, stbuf = 0x7f4637418c00, loc = 0x7f4640d0dc70, inode = 0x0, fd = 0, flags = 0, op_errno = 0}
#6  0x00007f463da57401 in posix_lookup (frame=0x7f46412d3a68, this=0xa20d70, loc=0x7f4640d0dc70, xdata=0x7f4640c8564c) at posix.c:153
        buf = {ia_ino = 9402059841903500362, ia_gfid = "\261TJ\032\070|F\247\202zӜ\251\237\234J", ia_dev = 64770, ia_type = IA_IFDIR, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, group = {read = 1 '\001', write = 0 '\000', exec = 1 '\001'}, other = {read = 1 '\001', write = 0 '\000', exec = 1 '\001'}}, ia_nlink = 2, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 6, ia_blksize = 4096, ia_blocks = 0, ia_atime = 1377670599, ia_atime_nsec = 276521591, ia_mtime = 1377670599, ia_mtime_nsec = 276521591, ia_ctime = 1377670599, ia_ctime_nsec = 281521788}
        op_ret = 0
        entry_ret = 0
        op_errno = 0
        xattr = 0x0
        real_path = 0x7f4637418b20 "/home/dr3/another/"
        par_path = 0x0
        postparent = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}
        gfidless = 1
        pgfid_xattr_key = 0x0
        nlink_samepgfid = 0
        __FUNCTION__ = "posix_lookup"
#7  0x00007f464247fefd in default_lookup (frame=0x7f46412d3a68, this=0xa225a0, loc=0x7f4640d0dc70, xdata=0x7f4640c8564c) at defaults.c:1253
        old_THIS = 0xa225a0
#8  0x00007f463d42a8c2 in posix_acl_lookup (frame=0x7f46412d39bc, this=0xa23760, loc=0x7f4640d0dc70, xattr=<value optimized out>) at posix-acl.c:793
        _new = 0x7f46412d3a68
        old_THIS = 0xa23760
        tmp_cbk = 0x7f463d42bf20 <posix_acl_lookup_cbk>
        ret = <value optimized out>
        my_xattr = 0x7f4640c8564c
        __FUNCTION__ = "posix_acl_lookup"
#9  0x00007f463d212892 in pl_lookup (frame=0x7f46412d3910, this=0xa248c0, loc=0x7f4640d0dc70, xdata=0x7f4640c8564c) at posix.c:2081
        _new = 0x7f46412d39bc
        old_THIS = 0xa248c0
        tmp_cbk = 0x7f463d212fd0 <pl_lookup_cbk>
        local = <value optimized out>
        __FUNCTION__ = "pl_lookup"
#10 0x00007f463cffeebc in iot_lookup_wrapper (frame=0x7f46412d3864, this=0xa258e0, loc=0x7f4640d0dc70, xdata=0x7f4640c8564c) at io-threads.c:346
        _new = 0x7f46412d3910
        old_THIS = 0xa258e0
        tmp_cbk = 0x7f463cffa4e0 <iot_lookup_cbk>
        __FUNCTION__ = "iot_lookup_wrapper"
#11 0x00007f4642494172 in call_resume_wind (stub=0x7f4640d0dc30) at call-stub.c:2312
No locals.
#12 call_resume (stub=0x7f4640d0dc30) at call-stub.c:2645
        old_THIS = 0xa258e0
        __FUNCTION__ = "call_resume"
#13 0x00007f463d0039f8 in iot_worker (data=0xa49f90) at io-threads.c:191
        conf = 0xa49f90
        this = <value optimized out>
        stub = <value optimized out>
        sleep_till = {tv_sec = 1377670722, tv_nsec = 0}
        ret = <value optimized out>
        pri = 0
        timeout = 0 '\000'
        bye = 0 '\000'
        sleep = {tv_sec = 0, tv_nsec = 0}
        __FUNCTION__ = "iot_worker"
#14 0x0000003178a07851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#15 0x00000031782e890d in clone () from /lib64/libc.so.6
No symbol table info available.

attached the sosreports
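
Note on the trace: frame #0 is strlen, and the locals in frame #1 show dir_name = 0x0, linkname = "" and ret = -1. That is consistent with a NULL pointer reaching a string function after an earlier step produced no usable link target. The following is only a rough sketch of the kind of guard that avoids this class of crash. It is not the patch referenced in this bug and not GlusterFS code; the helper name split_link_target and the assumed "<parent-gfid>/<dir-name>" layout are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not GlusterFS code. Splits, in place, a link target
 * that is ASSUMED to look like "<parent-gfid>/<dir-name>".
 *
 * Returns 0 and sets both outputs on success. Returns -1 and leaves both
 * outputs NULL when the target is NULL, empty, or lacks either component,
 * so the caller can bail out instead of passing NULL to strlen()/strcat().
 */
int
split_link_target (char *target, char **pgfid, char **dir_name)
{
        char *slash = NULL;

        if (pgfid == NULL || dir_name == NULL)
                return -1;

        *pgfid = NULL;
        *dir_name = NULL;

        /* An empty buffer is what frame #1 shows for linkname. */
        if (target == NULL || target[0] == '\0')
                return -1;

        /* Both components must be present and non-empty. */
        slash = strchr (target, '/');
        if (slash == NULL || slash == target || slash[1] == '\0')
                return -1;

        *slash = '\0';
        *pgfid = target;
        *dir_name = slash + 1;
        return 0;
}
```

A caller would check the return value before touching dir_name, for example returning an error for the lookup rather than continuing with a NULL component.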
Should be fixed in v3.4.0.30rhs. If the release was made on the bigbend-quota downstream branch, the following patch fixes the issue:
https://code.engineering.redhat.com/gerrit/12036

Most likely a duplicate of BZ 1001631.

*** This bug has been marked as a duplicate of bug 1001631 ***