Bug 1566569 - nfs-ganesha not failing back post reboot on setup deployed by colonizer
Summary: nfs-ganesha not failing back post reboot on setup deployed by colonizer
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: gluster-colonizer
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Dustin Black
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On: 1551186
Blocks:
 
Reported: 2018-04-12 14:23 UTC by nchilaka
Modified: 2018-11-20 05:34 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-20 05:34:40 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description nchilaka 2018-04-12 14:23:42 UTC
Description of problem:
----------------------
I deployed an nfs-ganesha + GP-NAS setup from the colonizer successfully.
When a node is rebooted, the VIP fails over to one of the remaining nodes, as expected.
Once the node is back up, the VIP should fail back to it.
However, this is not happening for a ganesha cluster deployed using the colonizer.

It seems the problem is that multiple duplicate entries are being made in the conf file below:


[root@g1-1 ~]# tail /run/gluster/shared_storage/nfs-ganesha/ganesha.conf 
%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"
%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"
%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"
%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"

Only one entry should be made.
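As a workaround while triaging, the duplicate include lines can be collapsed in place. This is a minimal shell sketch, not a fix shipped by the colonizer; the path is taken from the output above:

```shell
# Collapse duplicate lines in the shared ganesha.conf while preserving
# their order. CONF path comes from this bug report; adjust if your
# shared storage is mounted elsewhere.
CONF=/run/gluster/shared_storage/nfs-ganesha/ganesha.conf

# List any duplicated lines first, for inspection.
sort "$CONF" | uniq -d

# Rewrite the file keeping only the first occurrence of each line.
awk '!seen[$0]++' "$CONF" > "$CONF.dedup" && mv "$CONF.dedup" "$CONF"
```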


Version-Release number of selected component (if applicable):
----------------
colonizer-1.1-2

How reproducible:
--------------
always

Steps to Reproduce:
1. Set up a gpnas+ganesha deployment on 4 nodes through the colonizer.
2. After setup is successful, mount the arbiter volume over the NFS protocol using one of the VIPs (say node3's).
3. Reboot node3.

Actual results:
--------------
The VIP fails over to another node successfully, but doesn't fail back to node3 after it comes back online.

Comment 2 Dustin Black 2018-04-13 17:59:18 UTC
(In reply to nchilaka from comment #0)
> Seem like the mistake is due to the fact that we are making multiple
> duplicate entries in below conf file
> 
> 
> [root@g1-1 ~]# tail /run/gluster/shared_storage/nfs-ganesha/ganesha.conf 
> %include
> "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"
> %include
> "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"
> %include
> "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"
> %include
> "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"

I can't reproduce this situation in my lab -- there is only a single include entry in the ganesha.conf file after an NFS deployment on the GP-NAS configuration.

I also don't see what in the gluster-colonizer could lead to this. Adding that include line is not an operation performed by the colonizer plays, but rather the result of the standard nfs-ganesha enablement. It is possible, though, that a play that should execute on only one node is instead executing on multiple nodes, leading to this result.
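If that multi-node theory is right, the duplication is easy to model: an unguarded append to a file on shared storage, run once per node, adds the same %include line each time, while a guard keeps it idempotent. This is a hypothetical shell analogue of the plays, not the colonizer's actual code:

```shell
# Hypothetical analogue of the suspected behavior: the enablement step
# appends an %include to a file on *shared* storage, so if it runs on
# every node instead of once, the line is appended once per node.
CONF=/run/gluster/shared_storage/nfs-ganesha/ganesha.conf
LINE='%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"'

# Unguarded append (what repeated per-node runs would do):
#   echo "$LINE" >> "$CONF"
# Idempotent append: only add the line if it is not already present.
grep -qxF "$LINE" "$CONF" || echo "$LINE" >> "$CONF"
```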

Can you provide the gluster-colonizer.log file from a run that included this effect of multiple entries in the ganesha.conf file?

Comment 3 Dustin Black 2018-04-13 18:07:45 UTC
Lab tests also show that fail-back is working when the shutdown node comes back online. We'll need more information about the problem specifics to triage this further.

Comment 6 Dustin Black 2018-06-20 18:06:56 UTC
This has popped up in another physical lab. I've tried to step through the playbooks to watch for where the problem happens, but when I do, it doesn't happen at all. So the reproducer for the problem remains elusive.

One option is to simply proactively mitigate the problem with an additional play or two.

