1426324 – common-ha: setup after teardown often fails

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	common-ha
Sub Component:
Version:	rhgs-3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	RHGS 3.2.0
Assignee:	Kaleb KEITHLEY
QA Contact:	surabhi
Docs Contact:
URL:
Whiteboard:
Depends On:	1426323
Blocks:	1351528

Reported:	2017-02-23 17:03 UTC by Kaleb KEITHLEY
Modified:	2017-03-23 06:05 UTC
CC List:	10 users
Fixed In Version:	glusterfs-3.8.4-16
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1426323
Environment:
Last Closed:	2017-03-23 06:05:39 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:0486	0	normal	SHIPPED_LIVE	Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update	2017-03-23 09:18:45 UTC

Description Kaleb KEITHLEY 2017-02-23 17:03:00 UTC

+++ This bug was initially created as a clone of Bug #1426323 +++

Description of problem:

Tomas Jelinek tells me on IRC:

 there's no need to stop or remove nodes one by one if you use destroy --all

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Kaleb KEITHLEY 2017-02-23 19:54:26 UTC

Note: we would only take the change
-    pcs cluster destroy
+    pcs cluster destroy --all

Comment 3 Ambarish 2017-02-24 04:11:43 UTC

Proposing this as a blocker post discussion with Kaleb.

Comment 4 Soumya Koduri 2017-02-24 06:40:04 UTC

I understand the fix may be the cleaner approach, but I would like to understand whether it is a blocker for this release at this point.

We do run --cleanup internally as part of the "gluster nfs-ganesha disable" command, which does the teardown. The only issue was that if the setup fails or ends up in an inconsistent state, the admin needs to manually run --teardown and --cleanup prior to re-setup, which is being documented as part of bug 1399122 in the Troubleshooting section. Are those steps not enough?

Comment 5 Kaleb KEITHLEY 2017-02-24 22:39:28 UTC

It appears in testing that the loop at line 326 is not always run, thus the
  ...
  pcs cluster stop $server --force
  pcs cluster node remove $server
  ...
aren't being performed. As a result, those nodes remain members of the cluster, and since cleanup only runs on $this node, the other nodes' membership in the cluster is "remembered" the next time a cluster is created.

Tomas Jelinek, the pcs/pcsd developer, says shutting down by stopping and removing one node at a time is problematic, e.g. quorum state could cause (unspecified) issues. He says `pcs cluster destroy --all` is a better implementation.
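
For illustration only, the two teardown approaches discussed in this comment could be sketched roughly as below. This is not the actual common-ha script: the function names and the PCS variable are hypothetical, and PCS defaults to "echo pcs" so the commands are printed rather than executed. Only the pcs subcommands themselves are taken from the comments in this bug.

```shell
#!/bin/bash
# Sketch under stated assumptions: PCS, teardown_per_node and teardown_all
# are hypothetical names, not taken from the shipped HA script.
# PCS defaults to "echo pcs" (dry run); set PCS=pcs to run for real.
PCS="${PCS:-echo pcs}"

# Per-node approach: stop and remove each listed server in turn.
# If the server list is empty, the loop body never runs and every
# node stays a member of the cluster.
teardown_per_node() {
    local server
    for server in "$@"; do
        $PCS cluster stop "$server" --force
        $PCS cluster node remove "$server"
    done
}

# Suggested approach: one command, no dependence on a node list.
teardown_all() {
    $PCS cluster destroy --all
}
```

Calling teardown_per_node with no arguments prints nothing, which mirrors the failure mode described above where the loop is skipped, whereas teardown_all issues the same single command regardless of how the node list was built.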

Comment 6 Atin Mukherjee 2017-02-25 04:32:49 UTC

upstream patch : https://review.gluster.org/#/c/16737/

Comment 7 Atin Mukherjee 2017-03-01 08:51:10 UTC

downstream patch : https://code.engineering.redhat.com/gerrit/#/c/98648/

Comment 9 surabhi 2017-03-09 15:51:58 UTC

Tried teardown with "gluster nfs-ganesha disable" and then enabled it again on both RHEL6 and RHEL7; I don't see any issue with bringing up the cluster again.

With this fix there is no need to do manual cleanup after teardown and before enabling ganesha.
Marking the BZ verified.

Comment 11 errata-xmlrpc 2017-03-23 06:05:39 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html