Description of problem:
=======================
While trying to verify bugs 1504702 and 1507984, the unavailability of repos was simulated to validate the import failure and the messages that are captured. The steps related to the above bugs were validated successfully.

To proceed, we cleaned up the etcd database and restarted tendrl-node-agent on the storage nodes and the webadmin server (as directed by Nishanth). We then tried to import again, this time in a healthy environment, but it failed again with a traceback mentioning 'Atom execution failed. Error executing post run function: tendrl.objects.Cluster.atoms.ConfigureMonitoring'.

Screenshot of the error messages and /var/log/messages is copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>

It is possible that stale entries are causing this failure. Either way, there is a high probability that cluster import will fail at a customer site for some config/setup-related reason, or because not all the right steps were followed. Corrective steps would then be taken, and cluster import would be tried again. The second or nth attempt at cluster import should not fail because of things that went wrong in the previous n-1 attempts.

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.8.4-52
tendrl-node-agent-1.5.4-2.el7rhgs.noarch
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-2.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-3.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-3.el7rhgs.noarch
tendrl-notifier-1.5.4-2.el7rhgs.noarch

How reproducible:
=================
Seen multiple times on the same setup. Nishanth did a cleanup another time, and it failed again.
Tendrl does not support retrying the import of a cluster whose import has already failed. This will be supported in the next release. Currently, the user needs to clean up the Tendrl central store (etcd) and retry the import. The steps are documented here: https://github.com/Tendrl/documentation/wiki/Tendrl-release-v1.5.4-(install-guide)#uninstall-tendrl
Tendrl will provide flows called "unmanage cluster" and "delete cluster". In case of a failed import, the user must "delete cluster" and then try to import the cluster again.
Now, if import fails due to issues like wrong repos being set for glusterfs (required for the glusterfs-events dependency), the un-manage action is still allowed after the failed import. The user can correct the underlying error, and then un-manage followed by import works from the Tendrl UI.
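The recovery sequence described above (failed import, fix the cause, un-manage, import again) can be sketched as follows. This is only an illustration under stated assumptions: `ClusterStore`, `import_cluster`, `unmanage_cluster`, and `import_with_recovery` are hypothetical names backed by an in-memory dict, not Tendrl's API or its etcd layout.

```python
# Hypothetical in-memory stand-in for the central store; not Tendrl's API.
class ClusterStore:
    def __init__(self):
        self.entries = {}  # cluster_id -> state string

    def import_cluster(self, cluster_id, repos_ok):
        # Refuse to import over leftovers from an earlier attempt.
        if cluster_id in self.entries:
            raise RuntimeError("stale entry present; unmanage first")
        if not repos_ok:
            # Assumption: a failed import leaves a record behind.
            self.entries[cluster_id] = "import_failed"
            return False
        self.entries[cluster_id] = "managed"
        return True

    def unmanage_cluster(self, cluster_id):
        # Drop whatever the previous attempt recorded; safe to call repeatedly.
        self.entries.pop(cluster_id, None)


def import_with_recovery(store, cluster_id, repos_ok):
    """Un-manage any leftover state, then attempt the import."""
    store.unmanage_cluster(cluster_id)
    return store.import_cluster(cluster_id, repos_ok)


store = ClusterStore()
first = import_with_recovery(store, "c1", repos_ok=False)   # repos misconfigured
second = import_with_recovery(store, "c1", repos_ok=True)   # repos fixed
```

In this sketch the second attempt does not depend on what the first attempt left behind, which is the behaviour the report asks for.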
When the reproducer from BZ 1570048 is applied and the user fixes the problems and tries to import the cluster again, the import succeeds. The unmanage function (BZ 1526338) seems to work correctly as a way to correct cluster misconfiguration.

--> VERIFIED

Tested with:
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-4.el7rhgs.noarch
Looks fine
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2616