Description of problem:
Most error log messages in the import flow are very generic. A failed import displays a large traceback with "atom failed" messages, but it does not specify why the atom failed. With this error message, the user is unable to pinpoint the cause of the failure.

## import failed

Failure in Job 0845ffb1-4d53-4a6c-9e18-3ed0a72c1ce5 Flow tendrl.flows.ImportCluster with error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 240, in process_job
    the_flow.run()
  File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 131, in run
    exc_traceback)
FlowExecutionFailedError: ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 98, in run\n super(ImportCluster, self).run()\n', ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/__init__.py", line 227, in run\n "Error executing post run function: %s" % atom_fqn\n', 'AtomExecutionFailedError: Atom Execution failed. Error: Error executing post run function: tendrl.objects.Cluster.atoms.SetupClusterAlias\n'

Failed post-run: tendrl.objects.Cluster.atoms.SetupClusterAlias for flow: Import existing Gluster Cluster

Version-Release number of selected component (if applicable):
tendrl-commons-1.6.3-17.el7rhgs.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a Gluster cluster
2. Install RHGSWA via tendrl-ansible
3. Stop the tendrl-monitoring-integration service on a server
4. Try to import the cluster

Actual results:
Import fails with a huge traceback and no indication of the cause.

Expected results:
A specific log message that shows why the import failed.

Additional info:
Quick list of related bugs (this is not a complete list) based on the sheer title of this bug (Import flow failure related log messages should be more specific about what went wrong):

#1647322 WA should detect and report problems with carbon initialization
#1647909 Import fails when WA is not updated
#1616005 Repeated Import (and Unmanage) fails: Timing out import job, Cluster data still not fully updated
#1612096 Import cluster with bricks down failed
#1602858 Root cause of problem with import cluster job failure needs to be identified
#1599375 Error executing pre run function: tendrl.objects.Cluster.atoms.Check Cluster Nodes Up
#1589820 Non descriptive Import Cluster failure: Atom Execution failed
#1589801 no error reported by WA ui when importing cluster without free disk space on /var/lib/carbon partition
#1583713 No dashboards when cluster is imported on second attempt
#1686888 import cluster fails after timeout without clear indication what went wrong
#1686855 Task messages are not informative
The reproducer in this BZ is the same as in linked BZ 1686888. What is the purpose of this BZ?
Added a pre-run atom to the import and unmanage cluster flows that checks that all required services are running:
https://github.com/Tendrl/commons/pull/1081
https://github.com/Tendrl/commons/pull/1083
https://github.com/Tendrl/monitoring-integration/pull/594

Ownership is now assigned to the carbon user while creating an alias:
PR: https://github.com/Tendrl/monitoring-integration/pull/596
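For readers who want a feel for what such a pre-run check might look like, here is a minimal sketch. It is not the code from the linked PRs: the function names, the `ServiceCheckFailedError` type, and the wording of the message are all hypothetical, and the service list contains only the service named in the reproducer. It assumes a systemd host where `systemctl is-active` is available.

```python
import subprocess

# Illustrative list only; the real set of services checked by the PRs may differ.
REQUIRED_SERVICES = ["tendrl-monitoring-integration"]


class ServiceCheckFailedError(Exception):
    """Hypothetical error type that carries a user-readable cause."""


def systemd_is_active(service):
    """Return True if `systemctl is-active` reports the unit as active."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", service],
        check=False,
    )
    return result.returncode == 0


def check_required_services(services, is_active=systemd_is_active):
    """Raise a specific error naming every required service that is down.

    `is_active` is injectable so the check can be exercised without systemd.
    """
    inactive = [name for name in services if not is_active(name)]
    if inactive:
        raise ServiceCheckFailedError(
            "Cannot import cluster: required service(s) not running: %s"
            % ", ".join(inactive)
        )
```

Calling `check_required_services(REQUIRED_SERVICES)` before the flow starts would fail early with a message that names the stopped service, instead of surfacing later as a generic "Atom Execution failed" traceback.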
Created attachment 1577798 Screenshot with correct message for unmanage cluster
Created attachment 1577799 Screenshot with correct message for import cluster
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3251