Bug 1616005 - Repeated Import (and Unmanage) fails: Timing out import job, Cluster data still not fully updated
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: web-admin-tendrl-node-agent
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Nishanth Thomas
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-14 17:21 UTC by Daniel Horák
Modified: 2019-08-13 12:49 UTC (History)
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-13 12:49:22 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github Tendrl commons issues 1052 0 None closed Repeated Import (and Unmanage) fails: Timing out import job, Cluster data still not fully updated 2020-06-04 14:58:09 UTC
Red Hat Bugzilla 1559387 0 unspecified CLOSED Back to back import and unmanage cluster multiple time results in a situation where import is complete but not marked cor... 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1643816 0 high CLOSED [GSS]Import and Unmanage fails with error: Timing out import job, Cluster data still not fully updated 2022-03-13 15:53:40 UTC

Internal Links: 1559387 1643816

Description Daniel Horák 2018-08-14 17:21:50 UTC
Description of problem:
  Repeated Import and Unmanage cycles eventually fail during the Import task
  with the following errors:

  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  error
  Child jobs failed are [u'824ed251-a2c6-4626-b47a-02d38ea5f2c3']
  14 Aug 2018 06:07:29

  error
  Failure in Job 824ed251-a2c6-4626-b47a-02d38ea5f2c3 Flow tendrl.flows.ImportCluster with error:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 240, in process_job
        the_flow.run()
      File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 131, in run
        exc_traceback)
    FlowExecutionFailedError: ['Traceback (most recent call last):\n',
      ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 98, in run\n  super(ImportCluster, self).run()\n',
      ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/__init__.py", line 227, in run\n  "Error executing post run function: %s" % atom_fqn\n',
      'AtomExecutionFailedError: Atom Execution failed. Error: Error executing post run function: tendrl.objects.Cluster.atoms.CheckSyncDone\n']
  14 Aug 2018 06:07:29

  error
  Failed post-run: tendrl.objects.Cluster.atoms.CheckSyncDone for flow: Import existing Gluster Cluster
  14 Aug 2018 06:07:29

  error
  Timing out import job, Cluster data still not fully updated (node: 5024965a-4068-40c5-9568-4d6c33bdbcb2) (integration_id: ce52a06d-e291-477d-8288-23cf61458e63)
  14 Aug 2018 06:07:29
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
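  The last message above is the timeout raised by the CheckSyncDone post-run
  atom when the cluster sync flag never becomes "done". As a rough
  illustration only (this is not the actual Tendrl code; `wait_for_sync` and
  its parameters are hypothetical names), the check boils down to a polling
  loop with a hard deadline like this:

  ```python
  import time

  def wait_for_sync(is_synced, timeout=600, interval=10, sleep=time.sleep):
      """Poll is_synced() until it returns True or `timeout` seconds elapse.

      Hypothetical sketch of the kind of check CheckSyncDone performs:
      if the cluster data is still not fully updated when the deadline
      hits, the import job is failed with a timeout error.
      """
      waited = 0
      while waited < timeout:
          if is_synced():
              return True
          sleep(interval)
          waited += interval
      raise TimeoutError(
          "Timing out import job, Cluster data still not fully updated"
      )
  ```

  With such a loop, a repeated Import/Unmanage cycle fails as soon as one
  sync round takes longer than the fixed deadline, which matches the
  behaviour seen in the log.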

  This issue was found during testing of Bug 1559387.

Version-Release number of selected component (if applicable):
  RHGS WA Server:
  Red Hat Enterprise Linux Server release 7.5 (Maipo)
  tendrl-ansible-1.6.3-6.el7rhgs.noarch
  tendrl-api-1.6.3-5.el7rhgs.noarch
  tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
  tendrl-commons-1.6.3-11.el7rhgs.noarch
  tendrl-grafana-plugins-1.6.3-8.el7rhgs.noarch
  tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-monitoring-integration-1.6.3-8.el7rhgs.noarch
  tendrl-node-agent-1.6.3-9.el7rhgs.noarch
  tendrl-notifier-1.6.3-4.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-ui-1.6.3-9.el7rhgs.noarch

  Gluster Storage Server:
  Red Hat Enterprise Linux Server release 7.5 (Maipo)
  Red Hat Gluster Storage Server 3.4.0
  tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-commons-1.6.3-11.el7rhgs.noarch
  tendrl-gluster-integration-1.6.3-9.el7rhgs.noarch
  tendrl-node-agent-1.6.3-9.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch

How reproducible:
  It is 100% reproducible, but it takes time: usually 10-20 cycles of
  Import and Unmanage.

Steps to Reproduce:
1. Prepare a Gluster storage cluster
   (in my case: 6 storage nodes, 2 volumes)
2. Install RHGS WA
3. Import Gluster cluster into RHGS WA
4. Wait some time (around 50 minutes in my case)
5. Unmanage the cluster from RHGS WA
6. Wait 10 minutes.
7. Repeat steps 3 - 6.

Actual results:
  After a number of cycles, the Import cluster task fails with the error:
    Timing out import job, Cluster data still not fully updated 

Expected results:
  Import cluster should pass.

Additional info:
  I used our test_cluster_unmanage_valid[1] test to automate the
  reproduction, with the following simple Bash script driving the test runs:

  sleep 3000; date | tee -a logs/stdout.log
  while (
      set -o pipefail
      python3 -m pytest usmqe_tests/api/gluster/test_gluster_cluster.py \
          -k test_cluster_unmanage_valid 2>&1 | tee -a logs/stdout.log
  ); do
      sleep 3000
      date | tee -a logs/stdout.log
  done

According to Gowtham, it might be another scenario for Bug 1612096.
If that's the case, this bug will mainly serve QE to test this scenario.

[1] https://github.com/usmqe/usmqe-tests/blob/master/usmqe_tests/api/gluster/test_gluster_cluster.py#L177

Comment 3 gowtham 2018-08-28 08:21:09 UTC
PR is under review https://github.com/Tendrl/commons/pull/1053

