Bug 1426324

Summary: common-ha: setup after teardown often fails
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Kaleb KEITHLEY <kkeithle>
Component: common-ha
Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED ERRATA
QA Contact: surabhi <sbhaloth>
Severity: unspecified
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, asoman, bturner, bugs, jthottan, rcyriac, rhinduja, rhs-bugs, skoduri, storage-qa-internal
Target Milestone: ---
Keywords: Triaged
Target Release: RHGS 3.2.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.8.4-16
Doc Type: If docs needed, set a value
Story Points: ---
Clone Of: 1426323
Last Closed: 2017-03-23 06:05:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Depends On: 1426323
Bug Blocks: 1351528

Description Kaleb KEITHLEY 2017-02-23 17:03:00 UTC
+++ This bug was initially created as a clone of Bug #1426323 +++

Description of problem:

Tomas Jelinek tells me in IRC

 there's no need to stop or remove nodes one by one if you use destroy --all

Comment 2 Kaleb KEITHLEY 2017-02-23 19:54:26 UTC
Note: we would only take the change
-    pcs cluster destroy
+    pcs cluster destroy --all
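
For context, a minimal sketch of the difference, assuming standard pcs semantics where a plain destroy acts only on the local node:

    # acts only on the node where it is run; other nodes keep their
    # cluster configuration
    pcs cluster destroy

    # destroys the cluster configuration on all configured nodes at once
    pcs cluster destroy --all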

Comment 3 Ambarish 2017-02-24 04:11:43 UTC
Proposing this as a blocker after a discussion with Kaleb.

Comment 4 Soumya Koduri 2017-02-24 06:40:04 UTC
I understand the fix may be the cleaner approach, but I would like to understand whether it is a blocker for this release at this point.

We already run --cleanup internally as part of the "gluster nfs-ganesha disable" command, which does the teardown. The only issue was that if the setup fails or comes up in an inconsistent state, the admin needs to manually run --teardown & --cleanup before re-running setup; this is being documented as part of bug 1399122 in the Troubleshooting section. Are those steps not enough?
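
For reference, the manual re-setup steps mentioned above would look roughly like this (the ganesha-ha.sh path and the shared-storage config directory are assumptions about a typical RHGS install, not taken from this bug):

    # hypothetical paths; adjust for the actual installation
    /usr/libexec/ganesha/ganesha-ha.sh --teardown /var/run/gluster/shared_storage/nfs-ganesha
    /usr/libexec/ganesha/ganesha-ha.sh --cleanup /var/run/gluster/shared_storage/nfs-ganesha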

Comment 5 Kaleb KEITHLEY 2017-02-24 22:39:28 UTC
It appears in testing that the loop at line 326 is not always run, and thus the
  ...
  pcs cluster stop $server --force
  pcs cluster node remove $server
  ...
steps aren't being performed. As a result, those nodes remain members of the cluster, and since cleanup only runs on $this node, the other nodes' membership in the cluster is "remembered" the next time a cluster is created.

Tom Jelinek, the pcs/pcsd developer, says that shutting down by stopping+removing one node at a time is problematic, e.g. the quorum state could cause (unspecified) issues. He says `pcs cluster destroy --all` is a better implementation.
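
A rough sketch of what the change to the script's teardown path amounts to (function and variable names are illustrative, extrapolated from the excerpt above rather than copied from the actual patch):

  teardown_cluster()
  {
      # old approach: stop and remove members one at a time; if the loop
      # is skipped or interrupted, the surviving nodes keep their
      # membership and the next setup inherits stale state
      #for server in ${HA_SERVERS}; do
      #    pcs cluster stop ${server} --force
      #    pcs cluster node remove ${server}
      #done
      #pcs cluster destroy

      # new approach: one call tears down the cluster configuration on
      # every node, avoiding per-node quorum transitions during shutdown
      pcs cluster destroy --all
  }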

Comment 6 Atin Mukherjee 2017-02-25 04:32:49 UTC
upstream patch : https://review.gluster.org/#/c/16737/

Comment 7 Atin Mukherjee 2017-03-01 08:51:10 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/98648/

Comment 9 surabhi 2017-03-09 15:51:58 UTC
Tried teardown with gluster nfs-ganesha disable and then enabled it again on both RHEL6 and RHEL7; I don't see any issue with bringing up the cluster again.

With this fix, there is no need to do a manual cleanup after teardown and before re-enabling ganesha.
Marking the BZ verified.
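
A minimal verification sequence along the lines described above (the final pcs status sanity check is an assumed extra step, not from the test run):

    # tear down the HA cluster; with the fix this also cleans up fully
    gluster nfs-ganesha disable

    # bring the cluster back up without any manual cleanup in between
    gluster nfs-ganesha enable

    # assumed sanity check: all nodes should appear in the new cluster
    pcs status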

Comment 11 errata-xmlrpc 2017-03-23 06:05:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html