Bug 1514442 - Successive attempts to import the same cluster on the same webadmin server fail
Summary: Successive attempts to import the same cluster on the same webadmin server fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: web-admin-tendrl-monitoring-integration
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Shubhendu Tripathi
QA Contact: Filip Balák
URL:
Whiteboard:
Depends On:
Blocks: 1502877 1503134 1516135
 
Reported: 2017-11-17 12:39 UTC by Sweta Anandpara
Modified: 2018-09-04 06:59 UTC (History)
5 users

Fixed In Version: tendrl-ansible-1.6.1-2.el7rhgs.noarch.rpm, tendrl-api-1.6.1-1.el7rhgs.noarch.rpm, tendrl-commons-1.6.1-1.el7rhgs.noarch.rpm, tendrl-monitoring-integration-1.6.1-1.el7rhgs.noarch.rpm, tendrl-node-agent-1.6.1-1.el7, tendrl-ui-1.6.1-1.el7rhgs.noarch.rpm
Doc Type: Bug Fix
Doc Text:
Cause: Previously, if a cluster import failed in RHGS-WA, there was no way to trigger it again from the UI. The user had to clean up the etcd details and fire the import again.
Consequence: Successive attempts to import the same cluster failed repeatedly.
Fix: With the latest changes around cluster import and the new feature to un-manage a cluster and import it again, if an import fails because invalid repos are configured on the storage nodes for installation of components, the user can correct the issue on the underlying nodes and then re-run the import for the cluster. The un-manage cluster flow also helps here, as the cluster can be un-managed and then re-imported. The import job now succeeds only if every node reports that all required components are installed and running and that the first round of sync has completed for the node; if any node fails to report this, the import cluster job fails. The user can then correct the reported issues and re-run the cluster import from the RHGS-WA UI.
Result: If an import fails now, the user can correct the issues on the underlying nodes and re-run the import for the cluster.
Clone Of:
Environment:
Last Closed: 2018-09-04 06:58:45 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github Tendrl node-agent issues 662 0 None closed When cluster import task fails and the cluster is shown as ok in Tendrl UI, the problem is not reported anywhere else or... 2020-03-05 03:07:57 UTC
Red Hat Bugzilla 1570048 0 unspecified CLOSED unmanaged task always fails after import failure 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHSA-2018:2616 0 None None None 2018-09-04 06:59:57 UTC

Internal Links: 1570048

Description Sweta Anandpara 2017-11-17 12:39:23 UTC
Description of problem:
=======================
While verifying bugs 1504702 and 1507984, we simulated repo unavailability to validate the import failure and the messages that get captured. The steps related to the above bugs were validated successfully.

To proceed, we cleaned up the etcd database and restarted tendrl-node-agent on the storage nodes and the webadmin server (as directed by Nishanth). We then tried to import again, this time in a healthy environment, but it failed again with a traceback mentioning 'Atom execution failed. Error executing post run function: tendrl.objects.Cluster.atoms.ConfigureMonitoring'.

A screenshot of the error messages and /var/log/messages are copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>

There is a possibility that stale entries are causing the failure.

Either way, there is a high probability that a cluster import will fail at a customer site for some config/setup-related reason, or because not all the right steps were followed. The corrective steps would then be taken and the cluster import tried again. The second or nth attempt at cluster import should not fail because of things that went wrong up to the (n-1)th attempt.

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.8.4-52
tendrl-node-agent-1.5.4-2.el7rhgs.noarch
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-2.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-3.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-3.el7rhgs.noarch
tendrl-notifier-1.5.4-2.el7rhgs.noarch


How reproducible:
=================
Seen multiple times on the same setup.
Nishanth did a cleanup another time, and it failed again.

Comment 2 Rohan Kanade 2017-11-20 07:25:37 UTC
Tendrl does not support retrying the import of a cluster whose import has already failed. This will be supported in the next release. Currently, the user needs to clean up the Tendrl central store (etcd) and retry the import.

Steps documented here: https://github.com/Tendrl/documentation/wiki/Tendrl-release-v1.5.4-(install-guide)#uninstall-tendrl
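
For reference, the linked cleanup amounts to stopping the Tendrl services, clearing the etcd store and restarting everything before retrying the import. A minimal sketch, assuming a single-node etcd with the default /var/lib/etcd data directory; the exact set of tendrl-* systemd units to stop depends on the deployment:

# On the webadmin server: stop the Tendrl services and etcd
# (list whatever tendrl-* units are active on your server)
systemctl stop tendrl-monitoring-integration tendrl-node-agent etcd

# Wipe the etcd data directory (assumption: default /var/lib/etcd).
# This removes ALL Tendrl state, not just the failed cluster.
rm -rf /var/lib/etcd/*

# Start etcd and the Tendrl services again
systemctl start etcd tendrl-node-agent tendrl-monitoring-integration

# On each storage node: restart the node agent so it re-registers
systemctl restart tendrl-node-agent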

Comment 3 Rohan Kanade 2018-01-22 13:07:24 UTC
Tendrl will provide flows called "unmanage cluster" and "delete cluster". In the case of a failed import, the user must "delete cluster" and then try to import the cluster again.

Comment 5 Shubhendu Tripathi 2018-03-05 02:37:24 UTC
Now, if an import fails due to issues like wrong repos configured for glusterfs (required for the glusterfs-events dependency), the import action is still allowed after the failure. The user can correct the underlying error and then run un-manage + import from the Tendrl UI.
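
As a quick pre-check before re-importing, it can help to confirm on each storage node that the components the import needs are resolvable from the configured repos. An illustrative check, using glusterfs-events (the dependency named above) as the example package:

# Verify which repos are enabled on the storage node
yum repolist enabled

# Resolve the required component without installing it
# (--assumeno answers "no" at the confirmation prompt, so nothing changes)
yum install --assumeno glusterfs-events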

Comment 6 Filip Balák 2018-06-26 08:11:25 UTC
When the reproducer from BZ 1570048 is applied and the user fixes the problems and tries to import the cluster again, the import succeeds.
The unmanage function (BZ 1526338) seems to work correctly as a way to correct cluster misconfiguration. --> VERIFIED

Tested with:
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-4.el7rhgs.noarch

Comment 8 Shubhendu Tripathi 2018-09-04 02:56:06 UTC
Looks fine

Comment 10 errata-xmlrpc 2018-09-04 06:58:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616

