Description of problem:
With ACLs and quota enabled on the volume, data creation hung because nfs-ganesha dumped core. During this I/O, add-brick and rebalance were also attempted, and both completed successfully.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-14.el7rhgs.x86_64
nfs-ganesha-2.2.0-7.el7rhgs.x86_64

How reproducible:
The test was executed only once.

Steps to Reproduce:
1. Create a volume of type 6x2 and start it.
2. Configure nfs-ganesha.
3. Enable ACLs for the volume.
4. Enable quota on the volume and set a limit of 100GB on "/".
5. Mount the volume and start creating data.
6. While data creation is in progress, execute add-brick and rebalance.

Actual results:
Data creation hung because nfs-ganesha dumped core.

(gdb) bt
#0  0x00007fef17b045d7 in raise () from /lib64/libc.so.6
#1  0x00007fef17b05cc8 in abort () from /lib64/libc.so.6
#2  0x00007fef17b44e07 in __libc_message () from /lib64/libc.so.6
#3  0x00007fef17b4c1fd in _int_free () from /lib64/libc.so.6
#4  0x00000000004f9ae5 in gsh_free ()
#5  0x00000000004f9eb3 in nfs4_ace_free ()
#6  0x00000000004f9ee7 in nfs4_acl_free ()
#7  0x00000000004faca5 in nfs4_acl_release_entry ()
#8  0x00000000004d9a0e in cache_inode_refresh_attrs ()
#9  0x00000000004dcc69 in cache_inode_lock_trust_attrs ()
#10 0x000000000047015a in cache_inode_get_changeid4 ()
#11 0x00000000004735b8 in nfs4_op_open ()
#12 0x000000000045eab5 in nfs4_Compound ()
#13 0x0000000000453a01 in nfs_rpc_execute ()
#14 0x00000000004545ad in worker_run ()
#15 0x000000000050afeb in fridgethr_start_routine ()
#16 0x00007fef1809fdf5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007fef17bc51ad in clone () from /lib64/libc.so.6

Expected results:
NFS-ganesha should not crash with the operations mentioned above.

Additional info:
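For anyone trying to reproduce this, the steps to reproduce can be sketched as a small dry-run script that only builds and prints the commands. This is a rough sketch, not the exact commands used in the original test: the volume name, host names, brick paths, mount host, and mount point are all hypothetical, and the nfs-ganesha enablement commands are assumed to be the glusterfs 3.7-era CLI forms. Enabling ACLs (step 3) is a change to the ganesha export configuration and is left as a manual step here.

```python
import shlex


def repro_commands(volname="testvol", hosts=("server1", "server2"),
                   brick_root="/bricks", mount_host="vip1",
                   mount_point="/mnt/testvol"):
    """Build the reproduction steps as argv lists. Nothing is executed.

    All default names are hypothetical placeholders, not values from the report.
    """
    def brick(i):
        # Alternate hosts so that each consecutive replica pair spans both hosts.
        return f"{hosts[i % len(hosts)]}:{brick_root}/b{i}"

    initial = [brick(i) for i in range(12)]    # 6x2 = 12 bricks, replica 2
    added = [brick(i) for i in range(12, 14)]  # one extra replica pair

    return [
        # Step 1: create a 6x2 volume and start it.
        ["gluster", "volume", "create", volname, "replica", "2", *initial],
        ["gluster", "volume", "start", volname],
        # Step 2: configure nfs-ganesha (assumed 3.7-era CLI).
        ["gluster", "nfs-ganesha", "enable"],
        ["gluster", "volume", "set", volname, "ganesha.enable", "on"],
        # Step 3 (enable ACLs) is a manual export-config edit, omitted here.
        # Step 4: enable quota and limit "/" to 100GB.
        ["gluster", "volume", "quota", volname, "enable"],
        ["gluster", "volume", "quota", volname, "limit-usage", "/", "100GB"],
        # Step 5: mount over NFSv4; data creation is then started by hand.
        ["mount", "-t", "nfs", "-o", "vers=4",
         f"{mount_host}:/{volname}", mount_point],
        # Step 6: add a replica pair and rebalance while I/O is running.
        ["gluster", "volume", "add-brick", volname, *added],
        ["gluster", "volume", "rebalance", volname, "start"],
    ]


if __name__ == "__main__":
    for argv in repro_commands():
        print(shlex.join(argv))
```

The script prints the commands instead of running them so they can be reviewed and adapted to the actual cluster before anything is executed.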
I cannot reproduce the issue in my setup using the steps mentioned above. However, I got the same backtrace and a similar problem, but only when the quota limit was exceeded. I ran it twice and it happened once.
I was able to see the same coredump again with steps similar to those in the description.
I have not done an RCA for this bug yet. However, with the latest ACL-related changes included (already merged upstream), it is no longer reproducible. So we may need to backport two more ganesha patches to downstream:
https://review.gerrithub.io/#/c/236924/ (BZ1251471)
https://review.gerrithub.io/#/c/240757/ (BZ1242148)
These bugs were deferred to 3.1.2.
Tested on nfs-ganesha-2.2.0-10.el7rhgs.x86_64 and glusterfs-3.7.5-0.3.el7rhgs.x86_64 with similar steps.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html