Bug 1426324 - common-ha: setup after teardown often fails
Summary: common-ha: setup after teardown often fails
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: common-ha
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Kaleb KEITHLEY
QA Contact: surabhi
Depends On: 1426323
Blocks: 1351528
Reported: 2017-02-23 17:03 UTC by Kaleb KEITHLEY
Modified: 2017-03-23 06:05 UTC
CC: 10 users

Fixed In Version: glusterfs-3.8.4-16
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1426323
Last Closed: 2017-03-23 06:05:39 UTC


System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Kaleb KEITHLEY 2017-02-23 17:03:00 UTC
+++ This bug was initially created as a clone of Bug #1426323 +++

Description of problem:

Tomas Jelinek tells me in IRC:

 there's no need to stop or remove nodes one by one if you use `destroy --all`

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Comment 2 Kaleb KEITHLEY 2017-02-23 19:54:26 UTC
Note: we would only take this one-line change:
-    pcs cluster destroy
+    pcs cluster destroy --all

Comment 3 Ambarish 2017-02-24 04:11:43 UTC
Proposing this as a blocker post discussion with Kaleb.

Comment 4 Soumya Koduri 2017-02-24 06:40:04 UTC
I understand the fix may be the cleaner approach, but I would like to understand whether it is a blocker for this release at this point.

We already run --cleanup internally as part of the "gluster nfs-ganesha disable" command, which performs the teardown. The only issue was that if the setup fails or ends up in an inconsistent state, the admin needs to manually run --teardown & --cleanup before re-running setup, which is being documented as part of bug 1399122 in the Troubleshooting section. Are those steps not enough?
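For reference, the manual recovery flow described above would look roughly like the sketch below. This is a dry-run (it only echoes the commands); the HA config directory path is an assumption based on the usual shared-storage layout, not taken from this bug:

```shell
# Dry-run sketch of the manual recovery an admin needed before this fix.
# CONFDIR is an assumed shared-storage default, not confirmed by this BZ.
CONFDIR=/var/run/gluster/shared_storage/nfs-ganesha

recover() {
    # After a failed or inconsistent setup: tear the HA cluster down,
    # clean residual state on this node, then re-enable ganesha.
    echo "ganesha-ha.sh --teardown $CONFDIR"
    echo "ganesha-ha.sh --cleanup $CONFDIR"
    echo "gluster nfs-ganesha enable"
}
```

Running `recover` prints the three steps in order; with the fix, the first two become unnecessary.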

Comment 5 Kaleb KEITHLEY 2017-02-24 22:39:28 UTC
It appears in testing that the loop at line 326 is not always run, so the
  pcs cluster stop $server --force
  pcs cluster node remove $server
commands aren't being performed. As a result, those servers remain members of the cluster, and since cleanup only runs on $this node, the other nodes' membership in the cluster is "remembered" the next time a cluster is created.

Tom Jelinek, the pcs/pcsd developer, says shutting down by stopping+removing one node at a time is problematic, e.g. quorum state could cause (unspecified) issues. He says `pcs cluster destroy --all` is a better implementation.
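The change can be sketched as follows (dry-run with echo so the contrast is visible; the node list and function names are illustrative, not the actual ganesha-ha.sh code):

```shell
# Illustrative node list; the real script derives this from the HA config.
servers="node1 node2 node3"

teardown_old() {
    # Old approach: stop and remove nodes one at a time. If this loop is
    # skipped, the remaining servers stay members of the cluster.
    local server
    for server in $servers; do
        echo "pcs cluster stop $server --force"
        echo "pcs cluster node remove $server"
    done
    echo "pcs cluster destroy"
}

teardown_new() {
    # Fixed approach: a single command destroys the cluster on all nodes,
    # avoiding the quorum-related issues of incremental removal.
    echo "pcs cluster destroy --all"
}
```

The new form also no longer depends on the per-node loop running at all, which is what made the old teardown unreliable.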

Comment 6 Atin Mukherjee 2017-02-25 04:32:49 UTC
upstream patch : https://review.gluster.org/#/c/16737/

Comment 7 Atin Mukherjee 2017-03-01 08:51:10 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/98648/

Comment 9 surabhi 2017-03-09 15:51:58 UTC
Tried teardown with gluster nfs-ganesha disable and then enabled it again on both RHEL6 and RHEL7; no issues bringing up the cluster again.

With this fix there is no need to do a manual cleanup after teardown and before enabling ganesha.
Marking the BZ verified.

Comment 11 errata-xmlrpc 2017-03-23 06:05:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

