Description of problem:
-----------------------
I deployed an nfs-ganesha + GP-NAS setup from the colonizer successfully. When a node is rebooted, the VIP fails over to one of the remaining nodes, as expected. However, once the node is back up, the VIP should fail back to it. This is not happening for a ganesha cluster deployed using the colonizer.

The problem appears to be that multiple duplicate entries are made in the conf file below:

[root@g1-1 ~]# tail /run/gluster/shared_storage/nfs-ganesha/ganesha.conf
%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"
%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"
%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"
%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"

Only one entry should be made.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
colonizer-1.1-2

How reproducible:
-----------------
Always

Steps to Reproduce:
1. Set up a GP-NAS + ganesha deployment on 4 nodes through the colonizer.
2. After the setup is successful, mount the arbiter volume over NFS using one of the VIPs (say that of node3).
3. Reboot node3.

Actual results:
---------------
The VIP fails over to another node successfully, but does not fail back to node3 after it comes back online.

Expected results:
-----------------
The VIP fails back to node3 once it is back online.
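For reference, a minimal diagnostic sketch that would flag this condition on an affected cluster. This is my own Ansible task, not part of the colonizer; the only thing taken from the report above is the file path:

# Diagnostic sketch: fail if any %include line appears more than once in
# the shared ganesha.conf.
- name: Check for duplicate export includes (diagnostic sketch)
  shell: grep '^%include' /run/gluster/shared_storage/nfs-ganesha/ganesha.conf | sort | uniq -d
  register: dup_includes
  changed_when: false
  failed_when: dup_includes.stdout != ""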
(In reply to nchilaka from comment #0)
> The problem appears to be that multiple duplicate entries are made in the
> conf file below:
>
> [root@g1-1 ~]# tail /run/gluster/shared_storage/nfs-ganesha/ganesha.conf
> %include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"
> %include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"
> %include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"
> %include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"

I can't reproduce this situation in my lab -- there is only a single %include entry in the ganesha.conf file after an NFS deployment on the GP-NAS configuration. I also don't see what in the gluster-colonizer could lead to this: adding that include line is not an operation performed by the colonizer plays, but rather the result of the standard nfs-ganesha enablement.

It is possible, though, that a play that should execute on only one node is instead executing on multiple nodes, leading to this result.

Can you provide the gluster-colonizer.log file from a run that produced these duplicate entries in the ganesha.conf file?
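For illustration only, here is a hedged sketch of the kind of task that would produce exactly this pattern if it ran on every host instead of once. The task name and module choice are my assumptions, not taken from the colonizer plays:

# Illustrative sketch only -- not an actual colonizer play. Without
# run_once, every one of the four nodes would run the append against the
# shared ganesha.conf and add its own copy of the %include line.
- name: Append export include to the shared ganesha.conf (sketch)
  shell: >
    echo '%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.gluster1.conf"'
    >> /run/gluster/shared_storage/nfs-ganesha/ganesha.conf
  run_once: true    # confines the append to a single host

If a task of this shape lost its run_once (or an equivalent delegate_to) somewhere, four identical includes would be the expected outcome on a four-node deployment.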
Lab tests also show that fail-back works when the shut-down node comes back online. We'll need more specific information about the problem to triage this further.
This has popped up in another physical lab. I've tried to step through the playbooks to watch for where the problem happens, but when I do, it doesn't happen at all, so the reproducer remains elusive. One option is to proactively mitigate the problem with an additional play or two.
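A hedged sketch of what such a mitigation play could look like, assuming the duplicates only ever appear among the %include lines; the awk filter and file handling are my own, not existing colonizer code:

# Mitigation sketch: keep the first occurrence of each %include line in
# the shared ganesha.conf and drop any later repeats.
- name: Remove duplicate export includes from ganesha.conf (mitigation sketch)
  shell: |
    conf=/run/gluster/shared_storage/nfs-ganesha/ganesha.conf
    awk '!(/^%include/ && seen[$0]++)' "$conf" > "$conf.dedup"
    # overwrite in place so ownership and permissions are preserved
    cat "$conf.dedup" > "$conf"
    rm -f "$conf.dedup"
  run_once: true    # the file lives on shared storage, so one node is enough

Since the file is on shared storage, run_once keeps the cleanup from racing across nodes. This only masks the symptom, though; finding the real source of the duplicates would still be preferable.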