Bug 1425919 - [GANESHA] Adding a node to existing ganesha cluster is failing on rhel 6.9
Summary: [GANESHA] Adding a node to existing ganesha cluster is failing on rhel 6.9
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: common-ha
Version: 3.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Assignee: Jiffin
QA Contact:
URL:
Whiteboard:
Depends On: 1425748
Blocks:
 
Reported: 2017-02-22 18:28 UTC by Jiffin
Modified: 2017-03-06 17:46 UTC

Fixed In Version: glusterfs-3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1425748
Environment:
Last Closed: 2017-03-06 17:46:49 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Jiffin 2017-02-22 18:28:57 UTC
Description of problem:
Adding a node to an existing 4-node ganesha cluster is failing.

Version-Release number of selected component (if applicable):
# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.9 Beta (Santiago)

glusterfs-ganesha-3.8.4-14.el6rhs.x86_64


How reproducible:
Consistently

Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Perform the pre-requisites for adding a node to the existing cluster.
3. Perform add-node from one of the nodes in the existing cluster:

#/usr/libexec/ganesha/ganesha-ha.sh --add /var/run/gluster/shared_storage/nfs-ganesha/ dhcp42-191.lab.eng.blr.redhat.com 10.70.42.135

PCS Status on 5th Node

# pcs status
Cluster name: ganesha-ha-360
WARNING: no stonith devices and stonith-enabled is not false
Stack: cman
Current DC: dhcp42-191.lab.eng.blr.redhat.com (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
Last updated: Wed Feb 22 20:18:59 2017		Last change: Wed Feb 22 20:13:27 2017 by root via crmd on dhcp42-191.lab.eng.blr.redhat.com

5 nodes and 0 resources configured

Node dhcp42-237.lab.eng.blr.redhat.com: UNCLEAN (offline)
Node dhcp43-151.lab.eng.blr.redhat.com: UNCLEAN (offline)
Node dhcp43-171.lab.eng.blr.redhat.com: UNCLEAN (offline)
Node dhcp43-235.lab.eng.blr.redhat.com: UNCLEAN (offline)
Online: [ dhcp42-191.lab.eng.blr.redhat.com ]

No resources


Daemon Status:
  cman: active/disabled
  corosync: active/disabled
  pacemaker: active/enabled
  pcsd: active/enabled

Actual results:
Add node is not successful.

Expected results:
Add node should be successful.

Additional info:

While running add-node, a warning is displayed asking to restart the cluster after the node addition:

============================
# /usr/libexec/ganesha/ganesha-ha.sh --add /var/run/gluster/shared_storage/nfs-ganesha/ dhcp42-191.lab.eng.blr.redhat.com 10.70.42.135
Starting ganesha.nfsd: [  OK  ]
Disabling SBD service...
dhcp42-191.lab.eng.blr.redhat.com: sbd disabled
dhcp42-237.lab.eng.blr.redhat.com: Corosync updated
dhcp43-151.lab.eng.blr.redhat.com: Corosync updated
dhcp43-235.lab.eng.blr.redhat.com: Corosync updated
dhcp43-171.lab.eng.blr.redhat.com: Corosync updated
Setting up corosync...
dhcp42-191.lab.eng.blr.redhat.com: Updated cluster.conf...
dhcp42-191.lab.eng.blr.redhat.com: Starting Cluster...
Synchronizing pcsd certificates on nodes dhcp42-191.lab.eng.blr.redhat.com...
dhcp42-191.lab.eng.blr.redhat.com: Success

Restarting pcsd on the nodes in order to reload the certificates...
dhcp42-191.lab.eng.blr.redhat.com: Success
Warning: Using udpu transport on a RHEL 6 cluster, cluster restart is required to apply node addition
dhcp42-191.lab.eng.blr.redhat.com: Starting Cluster...
Removing group: dhcp42-237.lab.eng.blr.redhat.com-group (and all resources within group)
Stopping all resources in group: dhcp42-237.lab.eng.blr.redhat.com-group...


==========================
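As the warning in the output above indicates, on a RHEL 6 (cman/udpu) cluster the node addition only takes effect after a full cluster restart. A hedged sketch of the manual workaround, written as a dry-run helper so the commands can be inspected before running them as root on a cluster node (the `run` helper, the function name, and the DRY_RUN flag are illustrative, not part of ganesha-ha.sh):

```shell
# Dry-run sketch of the manual workaround: restart the whole pcs cluster
# so the udpu node addition in cluster.conf takes effect on RHEL 6.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

restart_ha_cluster() {
    # Stop pacemaker/corosync (cman) on every node, then start them again.
    run pcs cluster stop --all
    run pcs cluster start --all
    # Verify all five nodes come back Online rather than UNCLEAN (offline).
    run pcs status
}

DRY_RUN=1 restart_ha_cluster
```

Note that `pcs cluster stop --all` takes the whole HA cluster down briefly, so NFS exports are unavailable during the restart.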



--- Additional comment from Jiffin on 2017-02-22 05:58:56 EST ---

From the warning message "Warning: Using udpu transport on a RHEL 6 cluster, cluster restart is required to apply node addition",

it looks like a regression caused by this change: http://review.gluster.org/16122
Requesting Kaleb to have a look.

Comment 1 Worker Ant 2017-02-22 18:29:24 UTC
REVIEW: https://review.gluster.org/16721 (ganesha/scripts : restart pcs cluster during add node) posted (#1) for review on release-3.10 by jiffin tony Thottan (jthottan)

Comment 2 Worker Ant 2017-02-23 10:31:03 UTC
REVIEW: https://review.gluster.org/16721 (ganesha/scripts : restart pcs cluster during add node) posted (#2) for review on release-3.10 by jiffin tony Thottan (jthottan)

Comment 3 Worker Ant 2017-02-23 10:56:04 UTC
REVIEW: https://review.gluster.org/16721 (ganesha/scripts : restart pcs cluster during add node) posted (#3) for review on release-3.10 by jiffin tony Thottan (jthottan)

Comment 4 Worker Ant 2017-02-23 15:27:19 UTC
COMMIT: https://review.gluster.org/16721 committed in release-3.10 by Kaleb KEITHLEY (kkeithle) 
------
commit 24eed72f7359db33df9fe4b02fe3f2d3ce4a5665
Author: Jiffin Tony Thottan <jthottan>
Date:   Wed Feb 22 18:30:53 2017 +0530

    ganesha/scripts : restart pcs cluster during add node
    
    In RHEL 6, due to this change https://review.gluster.org/#/c/16122/,
    a restart of the HA cluster became a requirement after adding a node
    to the cluster. After "pcs cluster node add <hostname>" the following
    message comes up:
    Warning: Using udpu transport on a RHEL 6 cluster, cluster restart is required to apply node addition
    
    Thanks to Manisha for finding the issue and suggesting the fix.
    
    Change-Id: I9e55d4ba04ed2555d27f26f71b95b8bd6a67f94c
    BUG: 1425919
    Signed-off-by: Jiffin Tony Thottan <jthottan>
    Reviewed-on: https://review.gluster.org/16721
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Tested-by: Kaleb KEITHLEY <kkeithle>
    Reviewed-by: soumya k <skoduri>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
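In essence, the patch makes the add-node path restart the pcs cluster right after the new node is added on RHEL 6. A minimal illustrative sketch of that logic (the `add_node` function, the release-file parameter, and detection via /etc/redhat-release are assumptions for illustration, not the literal ganesha-ha.sh code):

```shell
# Illustrative sketch of the fix's logic (not the literal ganesha-ha.sh
# code): after "pcs cluster node add", restart the whole cluster when
# running on RHEL 6, because the udpu transport only picks up new nodes
# after a restart.
add_node() {
    node=$1
    release_file=${2:-/etc/redhat-release}   # overridable for testing

    pcs cluster node add "$node"

    if grep -q "release 6" "$release_file" 2>/dev/null; then
        # udpu/cman on RHEL 6: the cluster.conf change needs a restart
        pcs cluster stop --all
        pcs cluster start --all
    fi
}
```

On RHEL 7 and later (corosync 2.x) the restart branch is skipped, since node addition takes effect without restarting the cluster.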

Comment 5 Shyamsundar 2017-03-06 17:46:49 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/

