Created attachment 1426621
logs and configuration files

Description of problem:
Sometimes cluster import fails with an error in the `Import existing Gluster Cluster` flow when the cluster is imported without a specified Cluster Name. It happens on a fresh setup where no import has happened yet.

Example of errors:
```
error Failure in Job 2bbae246-35c9-4931-9d16-72de102f2169 Flow tendrl.flows.ImportCluster with error: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 213, in process_job
    the_flow.run()
  File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 123, in run
    raise ex
AtomExecutionFailedError: Atom Execution failed. Error: Error executing post run function: tendrl.objects.Cluster.atoms.SetupClusterAlias
25 Apr 2018 02:41:46

error Failed post-run: tendrl.objects.Cluster.atoms.SetupClusterAlias for flow: Import existing Gluster Cluster
25 Apr 2018 02:41:46

error Setting up cluster aliasnot yet complete. Timing out. (5d8640f5-8d33-42f5-a11e-bd35e2758fa3)
25 Apr 2018 02:41:46
```

Version-Release number of selected component (if applicable):
glusterfs-3.12.2-8.el7rhgs.x86_64
tendrl-ansible-1.6.3-2.el7rhgs.noarch
tendrl-api-1.6.3-1.el7rhgs.noarch
tendrl-api-httpd-1.6.3-1.el7rhgs.noarch
tendrl-commons-1.6.3-2.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-1.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-1.el7rhgs.noarch
tendrl-node-agent-1.6.3-2.el7rhgs.noarch
tendrl-notifier-1.6.3-2.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-1.el7rhgs.noarch

How reproducible:
20%

Steps to Reproduce:
1. Install tendrl.
2. Import cluster, don't set Cluster Name parameter.
3. Open task detail.

Actual results:
In some cases an error appears and the import fails.

Expected results:
Import should succeed.
Additional info:
The attachment contains logs and configuration files from the setup where this happened, collected by https://github.com/usmqe/usmqe-setup/blob/master/qe_evidence.tendrl.yml
Created attachment 1426622
task page
@filip, can you check whether the tendrl-monitoring-integration service is running when this happens?
Yes, tendrl-monitoring-integration is running without any error.
The problem is that the monitoring-integration sync creates an alias using integration_id even when short_name exists, although the import cluster flow has already created an alias using short_name. So two aliases exist for the same cluster. During the un-manage flow, the alias created using short_name is deleted, but the alias created using integration_id is not. When we then try to import the cluster without a short_name, the flow tries to create an alias using integration_id: https://github.com/Tendrl/monitoring-integration/blob/master/tendrl/monitoring_integration/flows/setup_cluster_alias/__init__.py#L27. Since that alias already exists, this raises an exception. The exception is not handled, so the job thread in monitoring-integration fails. As a result the job status is never updated and the job times out.
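To illustrate the failure chain described above, here is a minimal Python sketch. It is not the actual monitoring-integration code: `AliasStore`, `AliasExistsError`, `setup_cluster_alias` and `run_job` are hypothetical names, and the in-memory dictionary only stands in for wherever the aliases are really kept. It shows two ideas under those assumptions: treating an already existing alias for the same cluster as success, and always recording a final job status so the caller does not wait for a timeout.

```python
class AliasExistsError(Exception):
    """Hypothetical error raised when an alias is already present."""


class AliasStore(object):
    """In-memory stand-in for the real alias storage (assumption)."""

    def __init__(self):
        self.aliases = {}

    def create(self, alias, integration_id):
        if alias in self.aliases:
            raise AliasExistsError(alias)
        self.aliases[alias] = integration_id


def setup_cluster_alias(store, integration_id, short_name=None):
    """Create the alias; a stale alias for the same cluster counts as success."""
    # Without a short_name the alias falls back to the integration_id.
    alias = short_name or integration_id
    try:
        store.create(alias, integration_id)
    except AliasExistsError:
        # An alias left behind by un-manage is acceptable only if it
        # still points at the same cluster.
        if store.aliases[alias] != integration_id:
            raise
    return alias


def run_job(job, store):
    """Run the alias setup and always write a final status into the job."""
    try:
        setup_cluster_alias(store, job["integration_id"], job.get("short_name"))
        job["status"] = "finished"
    except Exception as ex:
        # Without this handler the job would stay unfinished and the
        # import flow would only notice through its timeout.
        job["status"] = "failed"
        job["error"] = str(ex)
    return job
```

With a leftover alias keyed by integration_id, `run_job` finishes instead of raising. If the alias belongs to a different cluster, the job is marked as failed right away rather than timing out.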
This issue is fixed.
filip, the fix I made was for an issue similar to this one, but I can't reproduce this issue on a fresh machine. Can you please reproduce it again?
I sent you a PM with access to hosts where the problem is reproduced.
I think this issue is related to fqdn in node_context and integration_id in tendrl_context being None for some time, so everyone created jobs for the others. This issue is fixed upstream. Please verify this once with the next build/release. I am not 100% sure, but this is the conclusion I reached after looking at filip's machine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2616