Bug 1370090 - [GSS] - Unable to Failover Gluster NFS with CTDB
Summary: [GSS] - Unable to Failover Gluster NFS with CTDB
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: ctdb
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Michael Adam
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On: 1371178
Blocks: 1408949 RHGS-3.4-GSS-proposed-tracker
 
Reported: 2016-08-25 09:51 UTC by Mukul Malhotra
Modified: 2022-03-13 14:05 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-03 12:52:21 UTC
Embargoed:



Description Mukul Malhotra 2016-08-25 09:51:09 UTC
Description of problem:

A couple of customers are currently testing Gluster NFS failover with CTDB on RHEL 6 and RHEL 7, but the complete Gluster NFS failover configuration with CTDB is not documented in the Admin Guide, and it is unclear whether this feature has been fully tested by QE, so I wanted to prioritise this bz with engineering.

I am currently working on a customer case with a similar requirement and have tested this feature in-house.

Here are the configuration and test steps:

RHEL 6 (Gluster nfs with CTDB):
 
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.8 (Santiago)
 
# cat /etc/redhat-storage-release
Red Hat Gluster Storage Server 3.1 Update 3
 
# rpm -qa ctdb
ctdb-4.4.3-8.el6rhs.x86_64
 
Tested with two scenarios:

1). Without adding the "CTDB_MANAGES_NFS=yes" and NFS_HOSTNAME="nfs_ctdb" parameters in the /etc/sysconfig/nfs file
 
i. Of the two nodes, only one was healthy (OK); the other was UNHEALTHY.
 
ii. Even though one node was unhealthy, failover took place without any problem when a node went down.
 
iii. After the failover, the node's status changed to Healthy (OK).
 
iv. When the nfs process was killed manually, failover did not take place.

-------------------------------------------------------------------------------------------------------------
 
2). Adding the CTDB_MANAGES_NFS=yes and NFS_HOSTNAME="nfs_ctdb" parameters in /etc/sysconfig/nfs (see the config sketch after this list)
 
i. Of the two nodes, only one was healthy (OK); the other was UNHEALTHY.
 
ii. Even though one node was unhealthy, failover took place without any problem when a node went down.
 
iii. After the failover, the node's status changed to Healthy (OK).
 
iv. When the nfs process was killed manually, failover did not take place.
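
For reference, the only difference between the two scenarios is whether the following two lines are present in /etc/sysconfig/nfs. This is an illustrative sketch of the parameters under test (values as used in this test setup), not a QE-verified configuration:

# grep -E 'CTDB_MANAGES_NFS|NFS_HOSTNAME' /etc/sysconfig/nfs
CTDB_MANAGES_NFS=yes
NFS_HOSTNAME="nfs_ctdb"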

==============================================

RHEL 7 (Gluster nfs with CTDB):
 
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
 
# cat /etc/redhat-storage-release
Red Hat Gluster Storage Server 3.1 Update 3

# rpm -qa ctdb
ctdb-4.4.3-8.el7rhgs.x86_64
 
Tested with two scenarios:

1). Without adding the CTDB_MANAGES_NFS=yes parameter in /etc/sysconfig/nfs

i. All nodes were Healthy. The public IP was running on one of the nodes.
 
ii. On the client side, mounted the volume over NFS (vers=3).
 
iii. When the nfs process was killed, the client mount went stale (see the sketch after this list) and did not recover even after some time. Because the Gluster NFS service is not monitored by CTDB, no failover takes place.
 
iv. Even though the nfs service was killed on one of the nodes, ctdb status still showed all nodes as HEALTHY.
 
v. The client started working again only after the glusterd daemon was restarted or the node was rebooted.
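
The "nfs process was killed" step above was performed along the following lines. This is a sketch of the test procedure; the pid file path used for the gNFS process is an assumption for this setup (the PID can also be read from 'gluster volume status'):

Show the gNFS server PID on each node:
# gluster volume status | grep -i "NFS Server"

Kill the gNFS process on one node (assumed gNFS pid file location):
# kill $(cat /var/lib/glusterd/nfs/run/nfs.pid)

Check CTDB; as observed above, all nodes still report HEALTHY because gNFS is not monitored:
# ctdb status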
 
======================================================
 
2). Adding the CTDB_MANAGES_NFS=yes parameter in /etc/sysconfig/nfs
 
i. All nodes were Healthy. The public IP was running on one of the nodes.
 
ii. On the client side, mounted the volume over NFS (vers=3).
 
iii. When the nfs process was killed, the client went into a hung state.
 
iv. Failover of the public IP took place within approximately 30 seconds, after which the client worked fine.
 
Version-Release number of selected component (if applicable):

RHGS 3.1.3
ctdb-4.4.3-8.el6rhs.x86_64

How reproducible:

Gluster NFS failover works on RHEL 7 but not on RHEL 6.

Actual results:

Gluster NFS node failover as well as nfs process failover do not work on RHEL 6.

Expected results:

Gluster NFS failover and process failover should work on RHEL 6.

1. Complete configuration steps for Gluster NFS with CTDB, verified by QE, are required.

2. A tuning parameter is required to improve the failover time, which is currently about 30 seconds.
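
Regarding point 2, CTDB exposes runtime tunables that influence how quickly a failure is detected and the public IP is taken over. The values below are illustrative assumptions only, not recommended or QE-verified settings:

List the current tunables and their values:
# ctdb listvars

Example: lower the monitor interval at runtime on a node (illustrative value):
# ctdb setvar MonitorInterval 5

To make such a setting persistent, it can be added to /etc/sysconfig/ctdb, e.g.:
CTDB_SET_MonitorInterval=5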

Additional info:

Thanks
Mukul

Comment 5 Mukul Malhotra 2016-08-25 14:46:01 UTC
Neils/Surabhi,

During my test on RHEL 6, I also observed that after applying the CTDB_MANAGES_NFS=yes parameter in "/etc/sysconfig/nfs" or "/etc/sysconfig/ctdb", the failover behaviour does not change: failover only works when a node is rebooted, not when the gnfs process fails.

Mukul

Comment 7 Mukul Malhotra 2016-08-26 10:03:13 UTC
Surabhi,

RHEL 7:

#  grep -v ^# /etc/sysconfig/nfs
RPCNFSDARGS=""
RPCMOUNTDOPTS=""
STATDARG=""
SMNOTIFYARGS=""
RPCIDMAPDARGS=""
RPCGSSDARGS=""
GSS_USE_PROXY="yes"
RPCSVCGSSDARGS=""
BLKMAPDARGS=""
CTDB_MANAGES_NFS=yes

---------------------------------------

RHEL 6:

# grep -v ^# /etc/sysconfig/nfs
CTDB_MANAGES_NFS=yes

Thanks
Mukul

Comment 9 Mukul Malhotra 2016-08-26 15:21:42 UTC
Thanks Surabhi

>The node reboot and shutdown cases works fine.

Yes, this has already been tested and was working. So the "CTDB_MANAGES_NFS=yes" parameter does not make any difference, as it is for kernel NFS.

>For ctdb to monitor gluster-nfs process, there might be additional configuration or settings needs to be done which I am not aware of atm.

OK, this is the primary concern: we require guidelines or configuration steps, verified by QE, which I can provide to the customer.

Thanks
Mukul

Comment 10 Niels de Vos 2016-08-28 09:46:30 UTC
(In reply to Mukul Malhotra from comment #9)
...
> >For ctdb to monitor gluster-nfs process, there might be additional configuration or settings needs to be done which I am not aware of atm.
> 
> Ok, this is the primary concern & require guidlines or configuration steps
> after verified by QE which I can provide to the customer.

I'm still missing a pointer to the script that CTDB uses to monitor the NFS-server. It should be sufficient for such a script to check the output of 'showmount -e localhost', as this is handled by the same process as the actual NFSv3 operations.
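
A minimal sketch of what such a monitor hook could look like, written as a CTDB event script (the name 61.glusternfs and its placement under /etc/ctdb/events.d/ are assumptions for illustration; this is not a shipped or QE-verified script):

#!/bin/sh
# Hypothetical /etc/ctdb/events.d/61.glusternfs -- illustrative sketch only.
# CTDB invokes its event scripts with the event name as the first argument.
# A non-zero exit from the "monitor" event marks this node UNHEALTHY, which
# makes CTDB fail the public IP over to a healthy node.

case "$1" in
    monitor)
        # 'showmount -e' is answered by the same glusterfs process that serves
        # the NFSv3 operations, so a failed query implies the Gluster/NFS
        # service is down.
        if ! showmount -e localhost >/dev/null 2>&1; then
            echo "Gluster/NFS not responding to 'showmount -e localhost'"
            exit 1
        fi
        ;;
esac

exit 0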

In general, the CTDB configuration for Gluster/NFS is less mature than the newer NFS-Ganesha integration with pacemaker. Gluster/NFS is going to be deprecated in favor of NFS-Ganesha with pacemaker; any problems or questions about that solution need to be reported and addressed separately (file other bugs for them).

Could you let us know if all your questions/concerns have been addressed? If there is something missing, please let me know. Otherwise you can close this :)

Comment 19 Mukul Malhotra 2016-08-29 13:54:01 UTC
Thanks Michael.

I have opened an RFE, bz#1371178, to extend the gnfs process failover capability with CTDB.

Mukul

