1584593 – Import of 3 Node (100 Vols & 300 bricks) fails with Time-Out Error

Summary: Import of 3 Node (100 Vols & 300 bricks) fails with Time-Out Error

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	web-admin-tendrl-gluster-integration
Sub Component:
Version:	rhgs-3.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	gowtham
QA Contact:	sds-qe-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-05-31 09:31 UTC by Shekhar Berry
Modified:	2019-05-08 17:43 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-05-08 15:45:41 UTC
Embargoed:
Dependent Products:

Attachments
Screenshot of import failing due to time out (106.50 KB, image/png) 2018-05-31 09:31 UTC, Shekhar Berry

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1538248	0	unspecified	CLOSED	[RFE] Performance Improvements	2023-09-14 04:15:57 UTC

Internal Links: 1538248

Description Shekhar Berry 2018-05-31 09:31:12 UTC

Created attachment 1446199
Screenshot of import failing due to time out

Description of problem:
-----------------------
In a CNS environment, the scale we recommend is 1000 volumes for a 3-node gluster cluster. The purpose of this test is to find out which resources are consumed on the RHGSWA server as the number of volumes on a 3-node gluster cluster is scaled up.
In the first run, 100 volumes were created on a 3-node gluster cluster and these 3 nodes were then imported.

The import task failed with a time-out error. See the message below and the attached screenshot:

'Import jobs on cluster(c58ca47b-910e-493f-b699-643dc90737fe) not yet complete on all nodes([u'ef059316-ec09-408a-8f5d-d368274804fc', u'e162cc03-e603-4972-bf73-6a8bc6d1c471', u'986070c6-28ba-4c41-a688-da10247b4250']). Timing out.'

On further investigation, we see that each node gets about 6 minutes to complete the import; if the import is not completed within that time, it results in a time-out.

Here is the piece of code that I believe is causing the issue. It is in /usr/lib/python2.7/site-packages/tendrl/commons/objects/cluster/atoms/import_cluster/__init__.py

    loop_count = 0
    # Wait for (no of nodes) * 6 minutes for import to complete
    wait_count = (len(node_list) - 1) * 36
    while True:
        parent_job = NS.tendrl.objects.Job(
            job_id=self.parameters['job_id']
        ).load()
        if loop_count >= wait_count:
            logger.log(
                "info",
                NS.publisher_id,
                {"message": "Import jobs on cluster(%s) not yet "
                 "complete on all nodes(%s). Timing out." %
                 (_cluster.short_name, str(node_list))},
                job_id=self.parameters['job_id'],
                flow_id=self.parameters['flow_id']
            )
            return False
        time.sleep(10)



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
On Tendrl Server
----------------
rpm -qa | grep tendrl
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-2.el7rhgs.noarch
tendrl-node-agent-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-2.el7rhgs.noarch
tendrl-notifier-1.6.3-2.el7rhgs.noarch
tendrl-ansible-1.6.3-2.el7rhgs.noarch
tendrl-commons-1.6.3-4.el7rhgs.noarch
tendrl-ui-1.6.3-1.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch


On Tendrl Nodes
---------------
rpm -qa | grep tendrl
tendrl-gluster-integration-1.6.3-2.el7rhgs.noarch
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-node-agent-1.6.3-4.el7rhgs.noarch


How reproducible:
-----------------
Always


Steps to Reproduce:
-------------------
1. Create 100 replica-3 volumes on a 3-node cluster
2. Try to import the nodes into RHGSWA
3. Observe

Actual results:
---------------
The import fails when we scale to 100 volumes (300 bricks) on a 3-node cluster.

Expected results:
-----------------

Import should have been successful.

Additional info:
----------------
The import started at about 7:08 and failed at about 7:21, so after approximately 13 minutes, which is consistent with 6 minutes per node for two nodes. So it apparently waited on two nodes for 6 minutes each and failed without ever reaching the first node, from which the gluster cluster was detected.
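
The arithmetic behind that estimate can be checked with a short sketch. It assumes the loop quoted in the description polls once per time.sleep(10) and gives up after wait_count iterations. The helper name import_timeout_seconds and the two constants are made up for illustration and are not part of tendrl-commons.

```python
POLL_INTERVAL_SECONDS = 10   # matches time.sleep(10) in the quoted loop
POLLS_PER_NODE = 36          # 36 polls * 10 s = 6 minutes per node


def import_timeout_seconds(node_count):
    """Return the approximate sleep budget of the quoted wait loop.

    Hypothetical helper: mirrors wait_count = (len(node_list) - 1) * 36.
    """
    wait_count = (node_count - 1) * POLLS_PER_NODE
    return wait_count * POLL_INTERVAL_SECONDS


if __name__ == "__main__":
    seconds = import_timeout_seconds(3)
    print("%d s (%.0f min)" % (seconds, seconds / 60.0))
```

For a 3-node cluster this gives 720 seconds, i.e. 12 minutes of sleeping. Each iteration also loads the parent job, so a total of roughly 13 minutes of wall-clock time is plausible.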

Comment 3 gowtham 2018-11-19 05:53:56 UTC

This issue was resolved as part of the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1600092; the waiting time of the import flow is now based on the number of volumes.
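
The comment does not say how the waiting time is derived from the volume count. Purely as an illustration, a polling loop with a volume-scaled budget might look like the sketch below. The per-volume allowance, the minimum, the function names, and the is_finished callable are all assumptions of this sketch and do not reflect the actual change made for bug 1600092.

```python
import time


def wait_count_for(volume_count, polls_per_volume=4, minimum_polls=36):
    """Hypothetical: scale the number of polls with the volume count."""
    return max(minimum_polls, volume_count * polls_per_volume)


def wait_for_import(is_finished, volume_count, poll_interval=10,
                    sleep=time.sleep):
    """Poll is_finished() until it returns True or the budget runs out.

    is_finished is a caller-supplied callable (an assumption of this
    sketch). Returns True on completion, False on time-out.
    """
    wait_count = wait_count_for(volume_count)
    for _ in range(wait_count):
        if is_finished():
            return True
        sleep(poll_interval)
    return False
```

With these made-up defaults, 100 volumes would allow 400 polls instead of the fixed 72 polls that the original loop allows for a 3-node cluster.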

Comment 6 gowtham 2018-12-04 06:32:20 UTC

As part of import flow performance testing, we also tested up to 400 volumes and 1200 bricks.
