Bug 1516211

Summary: describe in detail what to do when ImportCluster task fails
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Martin Bukatovic <mbukatov>
Component: doc-RHGS_Web_Administration
Assignee: Rakesh <rghatvis>
Status: CLOSED CURRENTRELEASE
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: asriram, mbukatov, nthomas, pmulay, rcyriac, rghatvis, rhinduja, rhs-bugs, sanandpa, sankarshan, srmukher, ssaha, storage-doc, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.3.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-30 17:59:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Martin Bukatovic 2017-11-22 09:49:55 UTC
Document URL
============

Related to documentation for RHGS WA.

Describe the issue
==================

Since Tendrl can't recover from an import cluster failure[1], we need to document
what to do when one is stuck with an unimportable cluster.

There are multiple aspects of this problem that we need to tackle:

* ImportCluster failed for the first cluster I'm trying to import
* ImportCluster failed, but I already have another cluster imported

* ImportCluster failed, and I can see the reason in one of the error
  messages in the events log (on the Task Details page)
* ImportCluster failed, but I don't see the reason in the events log
  on the Task Details page

[1] as described in BZ 1516135

Suggestions for improvement
===========================

Given the current limitations, describe what to do in each of the use cases
listed above, with additional notes for any peculiar combinations if needed.

The QE team would need to retry the scenarios during verification.

As of today, we are not aware of a recovery option that covers all the use
cases listed above. The only description available[2] so far states:

> In case of failed imports as of today, users should refer Tendrl
> documentation (TODO) for cleaning up Tendrl central store and re-trying
> the Import after that.

The TODO item listed above probably refers to this section in upstream docs:

https://github.com/Tendrl/documentation/wiki/Tendrl-release-v1.5.4-(install-guide)#uninstall-tendrl

which doesn't provide a full list of:

* services to be stopped
* packages to be removed
* directories/files to be deleted

It notes the possibility of backing up etcd and removing the etcd database,
but doesn't discuss how the backup could be restored (or whether that is
possible at all). This means the solution is applicable only to the first
use case described above (see the sketch below).
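
For illustration only, here is a minimal sketch of the service-stop and etcd
backup/removal part of such a cleanup, assuming a typical tendrl-ansible
deployment. The service names, the etcd data directory and the backup location
are assumptions and have to be checked against the actual environment and the
official uninstall steps; as noted above, this only fits the first use case,
because it wipes all RHGS WA state:

    #!/usr/bin/env python3
    # Hypothetical cleanup sketch for a failed ImportCluster run.
    # Service and path names are assumptions based on a typical
    # tendrl-ansible deployment; verify them before running anything.
    import shutil
    import subprocess
    import time

    # Assumed RHGS WA server services (names are an assumption).
    SERVER_SERVICES = [
        "tendrl-api",
        "tendrl-monitoring-integration",
        "tendrl-notifier",
        "tendrl-node-agent",
        "etcd",
    ]

    # Assumed etcd data directory.
    ETCD_DATA_DIR = "/var/lib/etcd"

    def run(cmd):
        """Run a command, echoing it first so every step is auditable."""
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    def stop_services():
        """Stop the RHGS WA server services (and etcd) before touching data."""
        for unit in SERVER_SERVICES:
            run(["systemctl", "stop", unit])

    def backup_etcd_data():
        """File-level copy of the etcd data directory while etcd is stopped.

        Whether such a copy can later be restored into a fresh RHGS WA
        installation is exactly the open question in this BZ, so treat
        the copy as a safety net rather than a supported restore path.
        """
        backup_dir = "/root/etcd-backup-" + time.strftime("%Y%m%d-%H%M%S")
        shutil.copytree(ETCD_DATA_DIR, backup_dir)
        print("etcd data copied to " + backup_dir)

    def remove_etcd_data():
        """Delete the etcd database so a re-import starts from a clean slate.

        This wipes state for *all* clusters known to RHGS WA, which is
        why the procedure only fits the first use case listed above.
        """
        shutil.rmtree(ETCD_DATA_DIR)

    if __name__ == "__main__":
        stop_services()
        backup_etcd_data()
        remove_etcd_data()

After such a cleanup, RHGS WA would have to be reinstalled before retrying
the import, as the quoted upstream note suggests.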

Moreover, we should also note that the events log on the task details page
will be cleaned up in a few days, as stated in BZ 1514178.

[2] https://github.com/Tendrl/node-agent/issues/662#issuecomment-345279345

Comment 2 Martin Bukatovic 2017-11-22 09:52:12 UTC
bobbGT, feel free to reassign this BZ to the correct component

Comment 3 Martin Bukatovic 2017-11-22 09:53:29 UTC
Could you provide recovery details for all use cases listed in this BZ?

The work on this BZ is blocked until this information is provided.

Comment 4 Lubos Trilety 2017-11-24 09:26:07 UTC
*** Bug 1513003 has been marked as a duplicate of this bug. ***

Comment 7 Rakesh 2017-12-11 22:00:03 UTC
Moving to ON_QA to follow BZ: 1502877.

Comment 8 Lubos Trilety 2017-12-12 09:10:32 UTC
The steps described in the monitoring guide
https://access.qa.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/monitoring_guide/#troubleshooting
are pretty easy, as they simply say to uninstall RHGSWA and install it again for all the mentioned scenarios. That will work, but if, for example, there is already some cluster imported and functional, RHGSWA loses that cluster too, because one of the uninstall RHGSWA / Unmanage Cluster steps removes data on the RHGSWA server and another removes etcd completely. Both are mentioned as optional, but for a failed import I am pretty sure they have to be done, at least the etcd removal.

Other issues are already listed in Bug 1502877.
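
For the "another cluster is already imported" concern above, a hedged
pre-check like the following could at least make the data loss explicit
before etcd is wiped. It assumes etcd listens on localhost:2379 with the v2
keys API enabled and that Tendrl records cluster objects under a /clusters
key prefix; both are assumptions, not documented guarantees:

    #!/usr/bin/env python3
    # Hypothetical pre-check before wiping etcd on the RHGS WA server.
    # The endpoint and the /clusters key prefix are assumptions.
    import json
    import sys
    import urllib.error
    import urllib.request

    ETCD_CLUSTERS_URL = "http://127.0.0.1:2379/v2/keys/clusters"

    def managed_clusters():
        """Return the etcd keys of clusters RHGS WA currently knows about."""
        try:
            with urllib.request.urlopen(ETCD_CLUSTERS_URL) as resp:
                data = json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code == 404:  # no /clusters key exists at all
                return []
            raise
        return [node["key"] for node in data.get("node", {}).get("nodes", [])]

    if __name__ == "__main__":
        clusters = managed_clusters()
        if clusters:
            print("etcd still holds state for: " + ", ".join(clusters))
            print("Wiping etcd will also un-manage these clusters.")
            sys.exit(1)
        print("No managed clusters found; wiping etcd only affects the failed import.")

If the check reports any clusters, wiping etcd means those clusters will have
to be re-imported afterwards.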

Comment 18 Nishanth Thomas 2017-12-15 09:21:01 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1526338

Comment 19 Pratik Mulay 2017-12-15 10:14:48 UTC
Hi Team,

I've made the required change. Following is the link to the updated content:

https://doc-stage.usersys.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/monitoring_guide/#unmanaging_cluster

Let me know in case of any concerns.

Comment 21 Lubos Trilety 2017-12-15 11:10:31 UTC
All demands in this BZ were processed properly. There's a note that un-managing one cluster will result in all clusters currently managed by Web Administration being un-managed.