Bug 1612096 - Import cluster with bricks down failed
Summary: Import cluster with bricks down failed
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: web-admin-tendrl-node-agent
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: gowtham
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-08-03 13:13 UTC by Filip Balák
Modified: 2019-05-08 19:40 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-08 19:40:38 UTC
Embargoed:


Attachments
Import job failed (143.66 KB, image/png)
2018-08-03 13:13 UTC, Filip Balák


Links
System                           ID   Private  Priority  Status  Summary  Last Updated
Github Tendrl node-agent issues  845  0        None      None    None     2018-08-06 17:44:18 UTC

Description Filip Balák 2018-08-03 13:13:12 UTC
Created attachment 1472982
Import job failed

Description of problem:
When some volume bricks are killed in a cluster that is about to be imported, the import job fails with the following errors:

Failure in Job e1e03a97-35d9-4091-8453-4220d237cd54 Flow tendrl.flows.ImportCluster with error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 240, in process_job
    the_flow.run()
  File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 131, in run
    exc_traceback)
FlowExecutionFailedError:
  Traceback (most recent call last):
    File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 98, in run
      super(ImportCluster, self).run()
    File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/__init__.py", line 186, in run
      (atom_fqn, self._defs['help'])
  AtomExecutionFailedError: Atom Execution failed. Error: Error executing atom: tendrl.objects.Cluster.atoms.ImportCluster on flow: Import existing Gluster Cluster
Failed atom: tendrl.objects.Cluster.atoms.ImportCluster on flow: Import existing Gluster Cluster
Child jobs failed are [u'30b4a3ff-8e66-4db2-a321-dd9d38cac87a']

Failure in Job 30b4a3ff-8e66-4db2-a321-dd9d38cac87a Flow tendrl.flows.ImportCluster with error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 240, in process_job
    the_flow.run()
  File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 131, in run
    exc_traceback)
FlowExecutionFailedError:
  Traceback (most recent call last):
    File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py", line 98, in run
      super(ImportCluster, self).run()
    File "/usr/lib/python2.7/site-packages/tendrl/commons/flows/__init__.py", line 227, in run
      "Error executing post run function: %s" % atom_fqn
  AtomExecutionFailedError: Atom Execution failed. Error: Error executing post run function: tendrl.objects.Cluster.atoms.CheckSyncDone
Failed post-run: tendrl.objects.Cluster.atoms.CheckSyncDone for flow: Import existing Gluster Cluster

It usually takes a few import/unmanage cycles before the issue appears.

Version-Release number of selected component (if applicable):
tendrl-ansible-1.6.3-6.el7rhgs.noarch
tendrl-api-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-11.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-8.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-8.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-9.el7rhgs.noarch

How reproducible:
40%
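To put the 40% figure in perspective: if each import attempt is assumed to fail independently with probability 0.4 (an assumption; the report only gives the overall rate), the chance of hitting the bug within n import/unmanage cycles is 1 - 0.6^n. The helper below is a hypothetical sketch of that arithmetic, not part of Tendrl.

```python
def chance_of_failure_within(attempts: int, p_fail: float = 0.4) -> float:
    """Probability of at least one failed import in `attempts` tries.

    Assumes every attempt fails independently with probability `p_fail`
    (0.4 matches the reported 40% reproducibility).
    """
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # P(at least one failure) = 1 - P(every attempt succeeds)
    return 1.0 - (1.0 - p_fail) ** attempts


for n in (1, 3, 5):
    print(f"{n} attempt(s): {chance_of_failure_within(n):.1%}")
```

Under that assumption, 1, 3 and 5 attempts give about 40%, 78% and 92%, which fits the observation that the failure usually shows up after a few cycles.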

Steps to Reproduce:
1. Prepare a 6-node cluster with 2 volumes.
2. Kill some bricks in both volumes.
3. Import the cluster into WA.
4. If the import succeeds, unmanage the cluster and import it again.
5. Repeat step 4 until the issue appears.
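The import/unmanage loop in the reproduction steps can be sketched as a small harness. This is a hypothetical helper: `import_cluster` and `unmanage_cluster` are assumed hooks (for example, wrappers around whatever drives WA in a test setup), not real Tendrl functions, and the cluster is assumed to already have killed bricks (steps 1 and 2).

```python
from typing import Callable, Optional


def attempts_until_failure(
    import_cluster: Callable[[], bool],
    unmanage_cluster: Callable[[], None],
    max_attempts: int = 10,
) -> Optional[int]:
    """Repeat import/unmanage cycles until an import fails.

    `import_cluster` is a hypothetical hook that returns True when the
    import job finishes successfully; `unmanage_cluster` is a hypothetical
    hook that unmanages the cluster afterwards.

    Returns the 1-based number of the first failed import attempt, or
    None if every attempt up to `max_attempts` succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        if not import_cluster():
            # The import failed: this is where the reported failure would surface.
            return attempt
        # The import succeeded: unmanage so the next iteration can re-import.
        unmanage_cluster()
    return None


# Usage with fake hooks: the third import fails.
outcomes = iter([True, True, False])
print(attempts_until_failure(lambda: next(outcomes), lambda: None))
```

Bounding the loop with `max_attempts` keeps an automated run from spinning forever when the race does not trigger.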

Actual results:
Import fails while executing the post-run function tendrl.objects.Cluster.atoms.CheckSyncDone.

Expected results:
The import should succeed, and the user should be informed that some bricks are down.
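One way to picture the expected behaviour is to treat down bricks as warnings rather than import failures. The sketch below is purely illustrative: `ImportResult`, `import_with_brick_warnings` and the `brick_status` mapping (brick path to online flag) are hypothetical names, not Tendrl's data model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImportResult:
    """Hypothetical outcome of an import: success flag plus user-facing warnings."""
    succeeded: bool
    warnings: List[str] = field(default_factory=list)


def import_with_brick_warnings(brick_status: Dict[str, bool]) -> ImportResult:
    """Report down bricks as warnings instead of failing the import.

    `brick_status` is an assumed input mapping a brick path such as
    "host:/path" to True if the brick is online.
    """
    down = sorted(brick for brick, online in brick_status.items() if not online)
    warnings = [f"brick {brick} is down" for brick in down]
    # The import itself still succeeds; down bricks are only reported to the user.
    return ImportResult(succeeded=True, warnings=warnings)


result = import_with_brick_warnings({"n1:/b1": True, "n2:/b2": False})
print(result.succeeded, result.warnings)
```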

Additional info:

Comment 3 gowtham 2018-08-06 17:44:19 UTC
PR is under review https://github.com/Tendrl/node-agent/pull/846


Note You need to log in before you can comment on or make changes to this bug.