Description of problem:
Repeated Import and Unmanage eventually fails during the Import task with the following errors:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
error  Child jobs failed are [u'824ed251-a2c6-4626-b47a-02d38ea5f2c3']  14 Aug 2018 06:07:29
error  Failure in Job 824ed251-a2c6-4626-b47a-02d38ea5f2c3 Flow tendrl.flows.ImportCluster with error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 240, in process_job
    the_flow.run()
  File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 131, in run
    exc_traceback)
FlowExecutionFailedError: ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 98, in run\n    super(ImportCluster, self).run()\n', ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/__init__.py", line 227, in run\n    "Error executing post run function: %s" % atom_fqn\n', 'AtomExecutionFailedError: Atom Execution failed. Error: Error executing post run function: tendrl.objects.Cluster.atoms.CheckSyncDone\n']  14 Aug 2018 06:07:29
error  Failed post-run: tendrl.objects.Cluster.atoms.CheckSyncDone for flow: Import existing Gluster Cluster  14 Aug 2018 06:07:29
error  Timing out import job, Cluster data still not fully updated (node: 5024965a-4068-40c5-9568-4d6c33bdbcb2) (integration_id: ce52a06d-e291-477d-8288-23cf61458e63)  14 Aug 2018 06:07:29
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This issue was found during testing of Bug 1559387.

Version-Release number of selected component (if applicable):

RHGS WA Server:
Red Hat Enterprise Linux Server release 7.5 (Maipo)
tendrl-ansible-1.6.3-6.el7rhgs.noarch
tendrl-api-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-11.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-8.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-8.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-9.el7rhgs.noarch

Gluster Storage Server:
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Red Hat Gluster Storage Server 3.4.0
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.6.3-11.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-9.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch

How reproducible:
100% reproducible, but it takes time: usually 10-20 cycles of import and unmanage.

Steps to Reproduce:
1. Prepare a Gluster storage cluster (in my case: 6 storage nodes, 2 volumes).
2. Install RHGS WA.
3. Import the Gluster cluster into RHGS WA.
4. Wait some time (around 50 minutes in my case).
5. Unmanage the cluster from RHGS WA.
6. Wait 10 minutes.
7. Repeat steps 3-6.

Actual results:
After a number of cycles, the Import cluster job fails with the error:
Timing out import job, Cluster data still not fully updated

Expected results:
Import cluster should pass.
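For context on the failure mode shown in the log above, here is a minimal, self-contained Python sketch (not Tendrl's actual implementation) of a post-run sync check that times out when the cluster data never reaches the synced state. The helper get_cluster_sync_status and the SYNC_TIMEOUT value are hypothetical, introduced only for illustration:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Sketch only: illustrates the kind of post-run check that produces
# "Timing out import job, Cluster data still not fully updated".
import time


class AtomExecutionFailedError(Exception):
    """Raised when a post-run atom such as CheckSyncDone fails."""


SYNC_TIMEOUT = 600  # seconds; hypothetical value for illustration


def get_cluster_sync_status(integration_id):
    # Placeholder: in Tendrl this state would come from the central store;
    # here it always reports "in_progress" to demonstrate the timeout path.
    return "in_progress"


def check_sync_done(integration_id):
    """Poll the cluster sync status until it is done or the timeout expires."""
    deadline = time.time() + SYNC_TIMEOUT
    while time.time() < deadline:
        if get_cluster_sync_status(integration_id) == "done":
            return True
        time.sleep(10)
    raise AtomExecutionFailedError(
        "Timing out import job, Cluster data still not fully updated "
        "(integration_id: %s)" % integration_id
    )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~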
Additional info:

I've used our test_cluster_unmanage_valid[1] test to automate the reproduction, with the following simple bash one-liner driving the test runs:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# sleep 3000; date | tee -a logs/stdout.log; while ( set -o pipefail; python3 -m pytest usmqe_tests/api/gluster/test_gluster_cluster.py -k test_cluster_unmanage_valid 2>&1 | tee -a logs/stdout.log); do sleep 3000; date | tee -a logs/stdout.log; done
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

According to Gowtham, it might be another scenario of Bug 1612096. If that's the case, this bug will mainly serve for QE to test this scenario.

[1] https://github.com/usmqe/usmqe-tests/blob/master/usmqe_tests/api/gluster/test_gluster_cluster.py#L177
A PR is under review: https://github.com/Tendrl/commons/pull/1053