1516211 – describe in detail what to do when ImportCluster task fails

Summary: describe in detail what to do when ImportCluster task fails

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	doc-RHGS_Web_Administration
Sub Component:
Version:	rhgs-3.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.3.1
Assignee:	Rakesh
QA Contact:	storage-qa-internal@redhat.com
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1513003
Depends On:
Blocks:

Reported:	2017-11-22 09:49 UTC by Martin Bukatovic
Modified:	2018-05-30 17:59 UTC
CC List:	14 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-05-30 17:59:43 UTC
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	Tendrl documentation issues 94	None	None	None	2017-11-24 09:57:43 UTC
Github	Tendrl node-agent issues 662	None	None	None	2017-11-24 09:58:25 UTC
Red Hat Bugzilla	1514178	unspecified	CLOSED	document that task details page (on other details) are no longer available after few days	2021-02-22 00:41:40 UTC
Red Hat Bugzilla	1516135	unspecified	CLOSED	When import fails, the import button should be accessible only after unmanage	2021-12-10 15:25:43 UTC
Red Hat Bugzilla	1517065	unspecified	CLOSED	[Web-Admin] Add a new Chapter describing "Cluster Expansion"	2021-02-22 00:41:40 UTC

Internal Links: 1514178 1516135 1517065

Description Martin Bukatovic 2017-11-22 09:49:55 UTC

Document URL
============

Related to documentation for RHGS WA.

Describe the issue
==================

Since Tendrl can't recover from an import cluster failure[1], we need to document
what to do when one is stuck with an unimportable cluster.

There are multiple different aspects of this problem we need to tackle:

* ImportCluster failed for 1st cluster I'm trying to import
* ImportCluster failed, but I already have another cluster imported

* ImportCluster failed, and I can see the reason in the events log (on the
  Task Details page), in some of the error messages there
* ImportCluster failed, but I don't see the reason in the events log
  on the Task Details page

[1] as described in BZ 1516135

Suggestions for improvement
===========================

Given the current limitation, describe what to do in all the use cases listed
above, with additional notes for any peculiar combinations if needed.

QE team would need to retry the scenarios during verification.
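
As a rough aid for that verification, the use cases above can be enumerated as a small matrix. This is only a sketch: it assumes the two groups of bullets are independent axes (cluster state and visibility of the failure reason), and the labels and the `scenario_matrix` helper are made up for illustration, not taken from Tendrl or RHGS WA.

```python
from itertools import product

# Hypothetical labels for the two groups of use cases listed in this BZ.
CLUSTER_STATE = (
    "first cluster import",
    "another cluster already imported",
)
FAILURE_REASON = (
    "reason visible in events log",
    "reason not visible in events log",
)


def scenario_matrix():
    """Return every combination of cluster state and failure visibility."""
    return [
        {"cluster_state": state, "failure_reason": reason}
        for state, reason in product(CLUSTER_STATE, FAILURE_REASON)
    ]


if __name__ == "__main__":
    # Print a numbered checklist of scenarios to retry.
    for number, scenario in enumerate(scenario_matrix(), start=1):
        print(f"{number}. {scenario['cluster_state']} / {scenario['failure_reason']}")
```

Each entry is one scenario to retry; any peculiar combination that needs its own note in the docs would show up as a separate row.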

As of today, we are not aware of any recovery option for all the use cases
listed above. The only description available[2] so far states:

> In case of failed imports as of today, users should refer Tendrl
> documentation (TODO) for cleaning up Tendrl central store and re-trying
> the Import after that.

The TODO item listed above probably refers to this section in upstream docs:

https://github.com/Tendrl/documentation/wiki/Tendrl-release-v1.5.4-(install-guide)#uninstall-tendrl

which doesn't provide a full list of:

* services to be stopped
* packages to be removed
* directories/files to be deleted

It also notes the possibility of backing up etcd and removing the etcd
database, but doesn't discuss how to restore the backup (or whether that is
possible).

This means that this solution is applicable only to the 1st use case
described above.

Moreover, we should also note that the events log on the task details page
will be cleaned after a few days, as stated in BZ 1514178.

[2] https://github.com/Tendrl/node-agent/issues/662#issuecomment-345279345

Comment 2 Martin Bukatovic 2017-11-22 09:52:12 UTC

bobbGT, feel free to reassign this BZ to the correct component.

Comment 3 Martin Bukatovic 2017-11-22 09:53:29 UTC

Could you provide recovery details for all use cases listed in this BZ?

The work on this BZ is blocked until this information is provided.

Comment 4 Lubos Trilety 2017-11-24 09:26:07 UTC

*** Bug 1513003 has been marked as a duplicate of this bug. ***

Comment 7 Rakesh 2017-12-11 22:00:03 UTC

Moving to ON_QA to follow BZ: 1502877.

Comment 8 Lubos Trilety 2017-12-12 09:10:32 UTC

The steps described in the monitoring guide
https://access.qa.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/monitoring_guide/#troubleshooting
are pretty simple: for all the mentioned scenarios, the guide just says to uninstall RHGSWA and install it again. That will work, but if, for example, another cluster is already imported and functional, RHGSWA loses it too, because one of the uninstall RHGSWA/Unmanage Cluster steps removes data on the RHGSWA server and a second one removes etcd completely. Both are mentioned as optional, but for a failed import I am pretty sure they have to be done, at least the etcd removal.

Other issues are already listed in Bug 1502877.

Comment 18 Nishanth Thomas 2017-12-15 09:21:01 UTC

https://bugzilla.redhat.com/show_bug.cgi?id=1526338

Comment 19 Pratik Mulay 2017-12-15 10:14:48 UTC

Hi Team,

I've made the required change. Following is the link to the updated content:

https://doc-stage.usersys.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/monitoring_guide/#unmanaging_cluster

Let me know in case of any concerns.

Comment 21 Lubos Trilety 2017-12-15 11:10:31 UTC

All requests in this BZ were processed properly. There is a note that un-managing one cluster results in all the clusters currently managed by Web Administration being un-managed.