Bug 1616005 - Repeated Import (and Unmanage) fails: Timing out import job, Cluster data still not fully updated
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: web-admin-tendrl-node-agent
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Nishanth Thomas
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-14 17:21 UTC by Daniel Horák
Modified: 2019-08-13 12:49 UTC (History)
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-13 12:49:22 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github Tendrl commons issues 1052 0 None closed Repeated Import (and Unmanage) fails: Timing out import job, Cluster data still not fully updated 2020-06-04 14:58:09 UTC
Red Hat Bugzilla 1559387 0 unspecified CLOSED Back to back import and unmanage cluster multiple time results in a situation where import is complete but not marked cor... 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1643816 0 high CLOSED [GSS]Import and Unmanage fails with error: Timing out import job, Cluster data still not fully updated 2022-03-13 15:53:40 UTC

Internal Links: 1559387 1643816

Description Daniel Horák 2018-08-14 17:21:50 UTC
Description of problem:
  Repeated Import and Unmanage cycles eventually fail during the Import task
  with the following errors:

  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  error
  Child jobs failed are [u'824ed251-a2c6-4626-b47a-02d38ea5f2c3']
  14 Aug 2018 06:07:29

  error
  Failure in Job 824ed251-a2c6-4626-b47a-02d38ea5f2c3 Flow tendrl.flows.ImportCluster with error:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 240, in process_job
        the_flow.run()
      File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 131, in run
        exc_traceback)
    FlowExecutionFailedError: ['Traceback (most recent call last):\n',
      ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 98, in run\n  super(ImportCluster, self).run()\n',
      ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/__init__.py", line 227, in run\n  "Error executing post run function: %s" % atom_fqn\n',
      'AtomExecutionFailedError: Atom Execution failed. Error: Error executing post run function: tendrl.objects.Cluster.atoms.CheckSyncDone\n']
  14 Aug 2018 06:07:29

  error
  Failed post-run: tendrl.objects.Cluster.atoms.CheckSyncDone for flow: Import existing Gluster Cluster
  14 Aug 2018 06:07:29

  error
  Timing out import job, Cluster data still not fully updated (node: 5024965a-4068-40c5-9568-4d6c33bdbcb2) (integration_id: ce52a06d-e291-477d-8288-23cf61458e63)
  14 Aug 2018 06:07:29
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
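  The last message above is the timeout raised by the CheckSyncDone post-run
  atom when the cluster sync flag never becomes "done". As a rough
  illustration only (this is not the actual Tendrl code; `wait_for_sync` and
  its parameters are hypothetical names), the check boils down to a polling
  loop with a hard deadline like this:

  ```python
  import time

  def wait_for_sync(is_synced, timeout=600, interval=10, sleep=time.sleep):
      """Poll is_synced() until it returns True or `timeout` seconds elapse.

      Hypothetical sketch of the kind of check CheckSyncDone performs:
      if the cluster data is still not fully updated when the deadline
      hits, the import job is failed with a timeout error.
      """
      waited = 0
      while waited < timeout:
          if is_synced():
              return True
          sleep(interval)
          waited += interval
      raise TimeoutError(
          "Timing out import job, Cluster data still not fully updated"
      )
  ```

  With such a loop, a repeated Import/Unmanage cycle fails as soon as one
  sync round takes longer than the fixed deadline, which matches the
  behaviour seen in the log.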

  This issue was found during testing of Bug 1559387.

Version-Release number of selected component (if applicable):
  RHGS WA Server:
  Red Hat Enterprise Linux Server release 7.5 (Maipo)
  tendrl-ansible-1.6.3-6.el7rhgs.noarch
  tendrl-api-1.6.3-5.el7rhgs.noarch
  tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
  tendrl-commons-1.6.3-11.el7rhgs.noarch
  tendrl-grafana-plugins-1.6.3-8.el7rhgs.noarch
  tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-monitoring-integration-1.6.3-8.el7rhgs.noarch
  tendrl-node-agent-1.6.3-9.el7rhgs.noarch
  tendrl-notifier-1.6.3-4.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-ui-1.6.3-9.el7rhgs.noarch

  Gluster Storage Server:
  Red Hat Enterprise Linux Server release 7.5 (Maipo)
  Red Hat Gluster Storage Server 3.4.0
  tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-commons-1.6.3-11.el7rhgs.noarch
  tendrl-gluster-integration-1.6.3-9.el7rhgs.noarch
  tendrl-node-agent-1.6.3-9.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch

How reproducible:
  It is 100% reproducible, but it takes time: usually 10-20 cycles of
  Import and Unmanage.

Steps to Reproduce:
1. Prepare a Gluster storage cluster
   (in my case: 6 storage nodes, 2 volumes)
2. Install RHGS WA
3. Import Gluster cluster into RHGS WA
4. Wait some time (around 50 minutes in my case)
5. Unmanage the cluster from RHGS WA
6. Wait 10 minutes.
7. Repeat steps 3 - 6.

Actual results:
  After a number of cycles, the Import cluster task fails with the error:
    Timing out import job, Cluster data still not fully updated 

Expected results:
  Import cluster should pass.

Additional info:
  I used our test_cluster_unmanage_valid[1] test to automate the
  reproduction, with the following simple Bash script driving the test runs:

  sleep 3000; date | tee -a logs/stdout.log
  while (
      set -o pipefail
      python3 -m pytest usmqe_tests/api/gluster/test_gluster_cluster.py \
          -k test_cluster_unmanage_valid 2>&1 | tee -a logs/stdout.log
  ); do
      sleep 3000
      date | tee -a logs/stdout.log
  done

According to Gowtham, it might be another scenario for Bug 1612096.
If that's the case, this bug will mainly serve QE to test this scenario.

[1] https://github.com/usmqe/usmqe-tests/blob/master/usmqe_tests/api/gluster/test_gluster_cluster.py#L177

Comment 3 gowtham 2018-08-28 08:21:09 UTC
PR is under review https://github.com/Tendrl/commons/pull/1053

