Description of problem:
The nfs-ganesha process crashed. Here is the backtrace:

(gdb) bt
#0  0x0000003e96632625 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003e96633e05 in abort () at abort.c:92
#2  0x0000003e96670537 in __libc_message (do_abort=2, fmt=0x3e96758900 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x0000003e96675f4e in malloc_printerr (action=3, str=0x3e96758c28 "free(): invalid next size (fast)", ptr=<value optimized out>, ar_ptr=<value optimized out>) at malloc.c:6350
#4  0x0000003e96678ca0 in _int_free (av=0x7fa61c000020, p=0x7fa61c03a1c0, have_lock=0) at malloc.c:4836
#5  0x00000000004fbff1 in gsh_free ()
#6  0x00000000004fc3bd in nfs4_ace_free ()
#7  0x00000000004fc3f2 in nfs4_acl_free ()
#8  0x00000000004fd20f in nfs4_acl_release_entry ()
#9  0x00000000004dbdd7 in cache_inode_refresh_attrs ()
#10 0x00000000004df070 in cache_inode_lock_trust_attrs ()
#11 0x00000000004cd4cc in cache_inode_getattr ()
#12 0x000000000048d1cc in cache_entry_To_Fattr ()
#13 0x0000000000466d53 in nfs4_op_getattr ()
#14 0x000000000045fe0d in nfs4_Compound ()
#15 0x00000000004549a1 in nfs_rpc_execute ()
#16 0x000000000045562a in worker_run ()
#17 0x000000000050d836 in fridgethr_start_routine ()
#18 0x0000003e96a07a51 in start_thread (arg=0x7fa635f31700) at pthread_create.c:301
#19 0x0000003e966e896d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-9.el6rhs.x86_64
nfs-ganesha-2.2.0-5.el6rhs.x86_64

How reproducible:
Seen only once.

Steps to Reproduce:
I am not sure of the exact steps. I was running test cases related to nfs4_setfacl and nfs4_getfacl. After that, I ran "kill -9 <pid of ganesha>" in order to trigger a failover while I/O was in progress.

Actual results:
The nfs-ganesha process crashed, as shown in the backtrace above.

Expected results:
No crash of the nfs-ganesha process.

Additional info:
Created attachment 1051850 [details] nfs11 ganesha.log
Please collect ganesha-gfapi.log and the coredump from this URL: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1242957/
Verified this bug by running several ACL test cases followed by failover/failback scenarios; no crash was seen on any node of the cluster. Based on this observation, marking this bug as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2016:1288