Bug 1612096

Summary: Import cluster with bricks down failed
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Filip Balák <fbalak>
Component: web-admin-tendrl-node-agentAssignee: gowtham <gshanmug>
Status: CLOSED WONTFIX QA Contact: sds-qe-bugs
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rhgs-3.4CC: gshanmug, nthomas, rhs-bugs, sankarshan
Target Milestone: ---Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-05-08 19:40:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Import job failed none

Description Filip Balák 2018-08-03 13:13:12 UTC
Created attachment 1472982 [details]
Import job failed

Description of problem:
When some volume bricks are killed in cluster that is going to be imported, the import job fails with errors:

Failure in Job e1e03a97-35d9-4091-8453-4220d237cd54 Flow tendrl.flows.ImportCluster with error: Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 240, in process_job the_flow.run() File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 131, in run exc_traceback) FlowExecutionFailedError: ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 98, in run\n super(ImportCluster, self).run()\n', ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/__init__.py", line 186, in run\n (atom_fqn, self._defs[\'help\'])\n', 'AtomExecutionFailedError: Atom Execution failed. Error: Error executing atom: tendrl.objects.Cluster.atoms.ImportCluster on flow: Import existing Gluster Cluster\n']
Failed atom: tendrl.objects.Cluster.atoms.ImportCluster on flow: Import existing Gluster Cluster 
Child jobs failed are [u'30b4a3ff-8e66-4db2-a321-dd9d38cac87a'] 
Failure in Job 30b4a3ff-8e66-4db2-a321-dd9d38cac87a Flow tendrl.flows.ImportCluster with error: Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 240, in process_job the_flow.run() File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 131, in run exc_traceback) FlowExecutionFailedError: ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 98, in run\n super(ImportCluster, self).run()\n', ' File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/__init__.py", line 227, in run\n "Error executing post run function: %s" % atom_fqn\n', 'AtomExecutionFailedError: Atom Execution failed. Error: Error executing post run function: tendrl.objects.Cluster.atoms.CheckSyncDone\n'] 
Failed post-run: tendrl.objects.Cluster.atoms.CheckSyncDone for flow: Import existing Gluster Cluster

It usually takes a few tries importing/unmanaging the cluster before the issue appears.

Version-Release number of selected component (if applicable):
tendrl-ansible-1.6.3-6.el7rhgs.noarch
tendrl-api-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-11.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-8.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-8.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-9.el7rhgs.noarch

How reproducible:
40%

Steps to Reproduce:
1. Prepare 6 nodes cluster with 2 volumes.
2. Kill some bricks in both volumes.
3. Import cluster into WA.
4. If import succeeded, unmanage the cluster and import it again.
5. Repeat step 4 until issue appears.

Actual results:
Import fails executing post run function: tendrl.objects.Cluster.atoms.CheckSyncDone

Expected results:
Import should work and user should know that some bricks are down.

Additional info:

Comment 3 gowtham 2018-08-06 17:44:19 UTC
PR is under review https://github.com/Tendrl/node-agent/pull/846