Description of problem:
As per the HA functionality, I/O should resume after the grace period. We are now providing the nfs-ganesha cluster with HA functionality, so in-flight I/O should resume even when the nfs-ganesha process is killed: once the grace period completes, the failover to another node should have taken place. On the present setup this resumption does not happen, and that is the problem.

Version-Release number of selected component (if applicable):
nfs-ganesha-2.2-0.rc8.el6.x86_64
glusterfs-3.7dev-0.1017.git7fb85e3.el6.x86_64

How reproducible:
Tried HA for the first time.

Steps to Reproduce:
1. Do the cluster setup for nfs-ganesha, as per the guidelines.
2. Once nfs-ganesha is up, mount the volume on a client.
3. Start iozone.
4. Kill the nfs-ganesha process on the server node.

Actual results:
Step 1: cluster setup done; nfs-ganesha came up on only 2 of the four nodes.
Step 2: mount done using vers=4.
Step 3: iozone started.
Step 4: killed the nfs-ganesha process with the kill command; iozone is stuck on the mount point and does not move ahead.

Expected results:
I/O should move ahead, as the HA functionality should allow failover of the nfs-ganesha process to the other node.

Additional info:
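The reproduction can be sketched as a shell session. This is a sketch, not the exact commands used: the VIP, volume name /vol0, and mount point are taken from this setup's outputs below, and the pid-file path is the one visible on the ganesha.nfsd command line.

```shell
# On the client: mount the volume over NFSv4 via one of the cluster VIPs
mount -t nfs -o vers=4 10.70.36.217:/vol0 /mnt/nfs

# Start I/O on the mount point (iozone automatic mode, as in step 3)
cd /mnt/nfs && iozone -a &

# On the server currently holding the VIP: kill the nfs-ganesha process (step 4)
kill -9 "$(cat /var/run/ganesha.nfsd.pid)"

# Expected: after the grace period the VIP fails over to another node and
# iozone resumes. Observed: iozone stays stuck on the mount point.
```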
Need the following information:
1. showmount -e VIP output
2. NFS-Ganesha logs
3. pcs status output
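The requested information could be collected on each node with something like the following. The log path is an assumption taken from the -L option on the ganesha.nfsd command line in the ps output below; the VIP is one of the addresses from this setup.

```shell
# 1. Export list as seen through a cluster VIP
showmount -e 10.70.36.217

# 2. Recent NFS-Ganesha log entries (path from the -L option of ganesha.nfsd)
tail -n 200 /var/log/ganesha.log

# 3. Pacemaker cluster and resource state
pcs status
```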
I have four nodes, nfs1 through nfs4. nfs-ganesha came up only on nfs2 and nfs3, and I have now killed the nfs-ganesha process on nfs2, so I collected the showmount output from nfs3:

[root@nfs3 ~]# showmount -e 10.70.36.217
Export list for 10.70.36.217:
/vol0 (everyone)
[root@nfs3 ~]# showmount -e 10.70.36.218
Export list for 10.70.36.218:
/vol0 (everyone)
[root@nfs3 ~]# showmount -e 10.70.36.219
Export list for 10.70.36.219:
/vol0 (everyone)
[root@nfs3 ~]# showmount -e 10.70.36.220
Export list for 10.70.36.220:
/vol0 (everyone)

node 1,
#####################################
[root@nfs1 ~]# ps -eaf | grep nfs
root      5338  6760  0 14:57 pts/0    00:00:00 grep nfs
[root@nfs1 ~]# pcs status
Cluster name: ganesha-ha-2
Last updated: Mon Apr 20 14:58:03 2015
Last change: Mon Apr 20 12:28:04 2015
Stack: cman
Current DC: nfs1 - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
22 Resources configured

Online: [ nfs1 nfs2 nfs3 nfs4 ]

Full list of resources:

 Clone Set: nfs_start-clone [nfs_start]
     nfs_start  (ocf::heartbeat:ganesha_nfsd):  FAILED nfs3 (unmanaged)
     nfs_start  (ocf::heartbeat:ganesha_nfsd):  FAILED nfs1 (unmanaged)
     nfs_start  (ocf::heartbeat:ganesha_nfsd):  FAILED nfs2 (unmanaged)
     Stopped: [ nfs4 ]
 nfs1-dead_ip-1 (ocf::heartbeat:Dummy): Started nfs4
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs1 nfs2 nfs3 nfs4 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs1 nfs2 nfs3 nfs4 ]
 nfs1-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs3
 nfs1-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs3
 nfs2-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs2
 nfs2-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs2
 nfs3-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs3
 nfs3-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs3
 nfs4-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs3
 nfs4-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs3
 nfs4-dead_ip-1 (ocf::heartbeat:Dummy): Started nfs1

Failed actions:
    nfs_start_stop_0 on nfs3 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms
    nfs_start_stop_0 on nfs3 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms
    nfs_start_stop_0 on nfs1 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40001ms
    nfs_start_stop_0 on nfs1 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40001ms
    nfs_start_stop_0 on nfs2 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms
    nfs_start_stop_0 on nfs2 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms

node 2,
##########################################
[root@nfs2 ~]# ps -eaf | grep nfs
root      5260 16826  0 14:58 pts/0    00:00:00 grep nfs
root      6216     1  0 12:27 ?        00:00:05 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
[root@nfs2 ~]# pcs status
Cluster name: ganesha-ha-2
Last updated: Mon Apr 20 14:58:49 2015
Last change: Mon Apr 20 12:28:04 2015
Stack: cman
Current DC: nfs1 - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
22 Resources configured

Online: [ nfs1 nfs2 nfs3 nfs4 ]

Full list of resources:

 Clone Set: nfs_start-clone [nfs_start]
     nfs_start  (ocf::heartbeat:ganesha_nfsd):  FAILED nfs3 (unmanaged)
     nfs_start  (ocf::heartbeat:ganesha_nfsd):  FAILED nfs1 (unmanaged)
     nfs_start  (ocf::heartbeat:ganesha_nfsd):  FAILED nfs2 (unmanaged)
     Stopped: [ nfs4 ]
 nfs1-dead_ip-1 (ocf::heartbeat:Dummy): Started nfs4
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs1 nfs2 nfs3 nfs4 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs1 nfs2 nfs3 nfs4 ]
 nfs1-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs3
 nfs1-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs3
 nfs2-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs2
 nfs2-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs2
 nfs3-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs3
 nfs3-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs3
 nfs4-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs3
 nfs4-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs3
 nfs4-dead_ip-1 (ocf::heartbeat:Dummy): Started nfs1

Failed actions:
    nfs_start_stop_0 on nfs3 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms
    nfs_start_stop_0 on nfs3 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms
    nfs_start_stop_0 on nfs1 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40001ms
    nfs_start_stop_0 on nfs1 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40001ms
    nfs_start_stop_0 on nfs2 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms
    nfs_start_stop_0 on nfs2 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms

node 3,
#############################################
[root@nfs3 ~]# ps -eaf | grep nfs
root     20901 18085  0 14:59 pts/0    00:00:00 grep nfs
root     26369     1  0 12:27 ?        00:00:05 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
[root@nfs3 ~]# pcs status
Cluster name: ganesha-ha-2
Last updated: Mon Apr 20 14:59:22 2015
Last change: Mon Apr 20 12:28:04 2015
Stack: cman
Current DC: nfs1 - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
22 Resources configured

Online: [ nfs1 nfs2 nfs3 nfs4 ]

Full list of resources:

 Clone Set: nfs_start-clone [nfs_start]
     nfs_start  (ocf::heartbeat:ganesha_nfsd):  FAILED nfs3 (unmanaged)
     nfs_start  (ocf::heartbeat:ganesha_nfsd):  FAILED nfs1 (unmanaged)
     nfs_start  (ocf::heartbeat:ganesha_nfsd):  FAILED nfs2 (unmanaged)
     Stopped: [ nfs4 ]
 nfs1-dead_ip-1 (ocf::heartbeat:Dummy): Started nfs4
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs1 nfs2 nfs3 nfs4 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs1 nfs2 nfs3 nfs4 ]
 nfs1-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs3
 nfs1-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs3
 nfs2-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs2
 nfs2-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs2
 nfs3-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs3
 nfs3-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs3
 nfs4-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs3
 nfs4-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs3
 nfs4-dead_ip-1 (ocf::heartbeat:Dummy): Started nfs1

Failed actions:
    nfs_start_stop_0 on nfs3 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms
    nfs_start_stop_0 on nfs3 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms
    nfs_start_stop_0 on nfs1 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40001ms
    nfs_start_stop_0 on nfs1 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40001ms
    nfs_start_stop_0 on nfs2 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms
    nfs_start_stop_0 on nfs2 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms

node 4,
######################################
[root@nfs4 ~]# ps -eaf | grep nfs
root     16073 27004  0 04:12 pts/0    00:00:00 grep nfs
[root@nfs4 ~]# pcs status
Cluster name: ganesha-ha-2
Last updated: Mon Apr 20 04:13:00 2015
Last change: Mon Apr 20 01:41:11 2015
Stack: cman
Current DC: nfs1 - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
22 Resources configured

Online: [ nfs1 nfs2 nfs3 nfs4 ]

Full list of resources:

 Clone Set: nfs_start-clone [nfs_start]
     nfs_start  (ocf::heartbeat:ganesha_nfsd):  FAILED nfs3 (unmanaged)
     nfs_start  (ocf::heartbeat:ganesha_nfsd):  FAILED nfs1 (unmanaged)
     nfs_start  (ocf::heartbeat:ganesha_nfsd):  FAILED nfs2 (unmanaged)
     Stopped: [ nfs4 ]
 nfs1-dead_ip-1 (ocf::heartbeat:Dummy): Started nfs4
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs1 nfs2 nfs3 nfs4 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs1 nfs2 nfs3 nfs4 ]
 nfs1-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs3
 nfs1-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs3
 nfs2-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs2
 nfs2-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs2
 nfs3-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs3
 nfs3-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs3
 nfs4-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs3
 nfs4-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs3
 nfs4-dead_ip-1 (ocf::heartbeat:Dummy): Started nfs1

Failed actions:
    nfs_start_stop_0 on nfs3 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms
    nfs_start_stop_0 on nfs3 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms
    nfs_start_stop_0 on nfs1 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40001ms
    nfs_start_stop_0 on nfs1 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40001ms
    nfs_start_stop_0 on nfs2 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms
    nfs_start_stop_0 on nfs2 'unknown error' (1): call=20, status=Timed Out,
        last-rc-change='Mon Apr 20 12:27:09 2015', queued=0ms, exec=40002ms
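As a side note, the repeated nfs_start_stop_0 timeouts above leave the nfs_start clone FAILED and unmanaged on every node. Once the underlying start/stop timeout is understood, the recorded failures can be cleared so Pacemaker re-attempts the resource. A minimal sketch, using the resource name from the pcs output above:

```shell
# Clear the failure history for the ganesha clone on all nodes so
# Pacemaker retries starting it, then re-check the cluster state.
pcs resource cleanup nfs_start-clone
pcs status
```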
Created attachment 1016358 [details] nfs-ganesha logs from nfs2
Created attachment 1016359 [details] nfs-ganesha logs from nfs3
Saurabh, can you confirm that you used the server's VIP to mount the volume on the client? Without it, failover will invariably fail.
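For reference, the distinction is between mounting through a floating cluster VIP, which the IPaddr resources can move to a surviving node on failover, and a node's static IP, which cannot move. A sketch, using a VIP and export from this setup (the mount point is a placeholder):

```shell
# HA-capable: mount via a cluster VIP managed by an IPaddr resource;
# on failover the VIP migrates and client I/O can resume after grace.
mount -t nfs -o vers=4 10.70.36.217:/vol0 /mnt/nfs

# Not HA-capable: mounting via a node's fixed address pins the client
# to that node, so killing ganesha there leaves the mount hung.
```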
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days