Created attachment 1084605 [details]
vm1 messages

Description of problem:
I created a tiered volume and started I/O on the nfs-ganesha mount with vers=4, the I/O being the LTP test suite. The tests hang: the nfs-ganesha server process segfaults and failover happens, but I/O still does not resume.

Version-Release number of selected component (if applicable):
nfs-ganesha-2.3-0.rc6.el7.centos.x86_64
glusterfs-3.7.5-1.el7.x86_64

How reproducible:
Segfault seen on the first attempt.

Steps to Reproduce:
1. Create a volume of type dist-rep with tiering enabled.
2. Export the volume over nfs-ganesha and mount it with vers=4.
3. Execute the fs-sanity test suite.

Actual results:
While the LTP test suite is executing, the nfs-ganesha process segfaults, as can be seen in /var/log/messages:

Oct 20 05:21:20 vm1 kernel: ganesha.nfsd[9750]: segfault at 0 ip 00000000004b0ede sp 00007f59122a0ae0 error 4 in ganesha.nfsd[400000+1df000]
Oct 20 05:21:21 vm1 systemd: nfs-ganesha.service: main process exited, code=killed, status=11/SEGV
Oct 20 05:21:21 vm1 systemd: Unit nfs-ganesha.service entered failed state.
Oct 20 05:21:31 vm1 cibadmin[21227]: notice: Additional logging available in /var/log/pacemaker.log
Oct 20 05:21:31 vm1 cibadmin[21227]: notice: Invoked: /usr/sbin/cibadmin --replace -o configuration -V --xml-pipe
Oct 20 05:21:31 vm1 crmd[19954]: notice: Operation vm1-dead_ip-1_monitor_0: not running (node=vm1, call=119, rc=7, cib-update=142, confirmed=true)
Oct 20 05:21:31 vm1 crmd[19954]: notice: Operation vm1-dead_ip-1_start_0: ok (node=vm1, call=120, rc=0, cib-update=143, confirmed=true)
Oct 20 05:21:38 vm1 IPaddr(vm1-cluster_ip-1)[21296]: INFO: IP status = ok, IP_CIP=
Oct 20 05:21:38 vm1 crmd[19954]: notice: Operation vm1-cluster_ip-1_stop_0: ok (node=vm1, call=123, rc=0, cib-update=145, confirmed=true)
Oct 20 05:21:38 vm1 crmd[19954]: notice: Operation nfs-grace_stop_0: ok (node=vm1, call=125, rc=0, cib-update=146, confirmed=true)
Oct 20 05:21:38 vm1 crmd[19954]: notice: Operation vm1-trigger_ip-1_stop_0: ok (node=vm1, call=127, rc=0, cib-update=147, confirmed=true)
Oct 20 05:21:38 vm1 crmd[19954]: notice: Operation nfs-grace_start_0: ok (node=vm1, call=128, rc=0, cib-update=148, confirmed=true)
Oct 20 05:21:48 vm1 logger: warning: pcs resource create vm1-dead_ip-1 ocf:heartbeat:Dummy failed

Failover does happen, as per the pcs status below:

 vm1-cluster_ip-1   (ocf::heartbeat:IPaddr):  Started vm4
 vm1-trigger_ip-1   (ocf::heartbeat:Dummy):   Started vm4
 vm2-cluster_ip-1   (ocf::heartbeat:IPaddr):  Started vm2
 vm2-trigger_ip-1   (ocf::heartbeat:Dummy):   Started vm2
 vm3-cluster_ip-1   (ocf::heartbeat:IPaddr):  Started vm3
 vm3-trigger_ip-1   (ocf::heartbeat:Dummy):   Started vm3
 vm4-cluster_ip-1   (ocf::heartbeat:IPaddr):  Started vm4
 vm4-trigger_ip-1   (ocf::heartbeat:Dummy):   Started vm4
 vm1-dead_ip-1      (ocf::heartbeat:Dummy):   Started vm1

But even after failover, I/O does not resume; the nfs-ganesha log on the failed-over node shows the following errors:

19/10/2015 23:49:57 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume vol3 exported at : '/'
20/10/2015 05:22:16 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-16] file_close :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
20/10/2015 05:22:16 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-16] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 37(CACHE_INODE_SERVERFAULT) for entry 0x7f781c0109c0
20/10/2015 05:22:16 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-12] file_close :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
20/10/2015 05:22:16 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-12] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 37(CACHE_INODE_SERVERFAULT) for entry 0x7f7814026c30
20/10/2015 05:22:21 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-14] file_close :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
20/10/2015 05:22:21 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-14] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 37(CACHE_INODE_SERVERFAULT) for entry 0x7f781000b1d0
20/10/2015 05:22:24 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-8] file_close :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
20/10/2015 05:22:24 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-8] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 37(CACHE_INODE_SERVERFAULT) for entry 0x7f77f803ee00
20/10/2015 05:22:24 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-14] file_close :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
20/10/2015 05:22:24 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-14] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 37(CACHE_INODE_SERVERFAULT) for entry 0x7f78180352c0
20/10/2015 05:22:24 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-16] file_close :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
20/10/2015 05:22:25 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-16] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 37(CACHE_INODE_SERVERFAULT) for entry 0x7f77f8020bd0
20/10/2015 05:22:25 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-9] file_close :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
20/10/2015 05:22:25 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-9] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 37(CACHE_INODE_SERVERFAULT) for entry 0x7f77f803ee00
20/10/2015 05:22:25 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-10] file_close :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
20/10/2015 05:22:25 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-10] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 37(CACHE_INODE_SERVERFAULT) for entry 0x7f784c037f50
20/10/2015 05:22:25 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-9] file_close :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
20/10/2015 05:22:25 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-9] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 37(CACHE_INODE_SERVERFAULT) for entry 0x7f784c037f50
20/10/2015 05:22:25 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-4] file_close :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
20/10/2015 05:22:25 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-4] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 37(CACHE_INODE_SERVERFAULT) for entry 0x7f77f803ee00
20/10/2015 05:22:26 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-9] file_close :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
20/10/2015 05:22:26 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-9] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 37(CACHE_INODE_SERVERFAULT) for entry 0x7f77f803ee00
20/10/2015 05:22:28 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-1] cache_inode_lookup_impl :INODE :EVENT :FSAL returned STALE from a lookup.
20/10/2015 05:22:28 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[work-3] cache_inode_lookup_impl :INODE :EVENT :FSAL returned STALE from a lookup.
20/10/2015 06:09:51 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
20/10/2015 06:11:06 : epoch 56215bb5 : vm4 : ganesha.nfsd-16329[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat

Expected results:
Even if nfs-ganesha has segfaulted, the failover should let the I/O resume. The segfault itself also needs to be fixed.

Additional info:
The coredump for the segfault was not found; I will run the test again and see if it can be reproduced.
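For reference, the reproduction steps above correspond roughly to the following command sketch. The brick paths, replica layout, and mount point are assumptions for illustration, not the exact ones used in this run; only the volume name vol3 and the hosts vm1-vm4 come from the logs above.

```sh
# 1. Create a dist-rep volume and attach a hot tier to enable tiering
#    (brick paths and replica counts are hypothetical)
gluster volume create vol3 replica 2 \
    vm1:/bricks/b1 vm2:/bricks/b1 vm3:/bricks/b1 vm4:/bricks/b1
gluster volume attach-tier vol3 replica 2 \
    vm1:/bricks/hot vm2:/bricks/hot
gluster volume start vol3

# 2. Export the volume over nfs-ganesha (HA cluster already set up via
#    "gluster nfs-ganesha enable") and mount it with NFSv4
gluster volume set vol3 ganesha.enable on
mount -t nfs -o vers=4 <cluster-VIP>:/vol3 /mnt/vol3

# 3. Run the test suite against /mnt/vol3
```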
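Since the coredump was not found, it may help to make sure cores are actually captured on the next run. A minimal sketch for RHEL/CentOS 7, assuming abrt is not wanted in the pipeline and /var/crash is a writable location (both are assumptions):

```sh
# Lift the core-size limit for the nfs-ganesha service via a systemd drop-in
mkdir -p /etc/systemd/system/nfs-ganesha.service.d
printf '[Service]\nLimitCORE=infinity\n' \
    > /etc/systemd/system/nfs-ganesha.service.d/core.conf
systemctl daemon-reload

# Write cores to a known path instead of piping them to abrt
echo '/var/crash/core.%e.%p' > /proc/sys/kernel/core_pattern

systemctl restart nfs-ganesha
```

With this in place, a repeat of the segfault should leave a core file under /var/crash that can be attached to this bug.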
Created attachment 1084606 [details]
vm4 messages
This bug is being closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check whether it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen it against the newer release.