Bug 1584593 - Import of 3 Node (100 Vols & 300 bricks) fails with Time-Out Error
Summary: Import of 3 Node (100 Vols & 300 bricks) fails with Time-Out Error
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: web-admin-tendrl-gluster-integration
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: gowtham
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-31 09:31 UTC by Shekhar Berry
Modified: 2019-05-08 17:43 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-08 15:45:41 UTC
Embargoed:


Attachments
Screenshot of import failing due to time out (106.50 KB, image/png)
2018-05-31 09:31 UTC, Shekhar Berry


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1538248 0 unspecified CLOSED [RFE] Performance Improvements 2023-09-14 04:15:57 UTC

Internal Links: 1538248

Description Shekhar Berry 2018-05-31 09:31:12 UTC
Created attachment 1446199
Screenshot of import failing due to time out

Description of problem:
-----------------------
In a CNS environment, the scale we recommend is 1000 volumes for a 3-node gluster cluster. The test being done here is to find out the resources consumed on the RHGSWA server as we scale the number of volumes on a 3-node gluster cluster.
In the first run, 100 volumes were created on a 3-node gluster cluster and these 3 nodes were imported.

The import task failed with Time-Out error. See below and the screenshot attached:

'Import jobs on cluster(c58ca47b-910e-493f-b699-643dc90737fe) not yet complete on all nodes([u'ef059316-ec09-408a-8f5d-d368274804fc', u'e162cc03-e603-4972-bf73-6a8bc6d1c471', u'986070c6-28ba-4c41-a688-da10247b4250']). Timing out.'

On further investigation, we see that each node gets about 6 minutes to complete the import; if the import does not complete within that time, it results in a time-out.

Here is the piece of code that I feel is causing the issue. It's in /usr/lib/python2.7/site-packages/tendrl/commons/objects/cluster/atoms/import_cluster/__init__.py

                loop_count = 0
                # Wait for (no of nodes) * 6 minutes for import to complete
                wait_count = (len(node_list) - 1) * 36
                while True:
                    parent_job = NS.tendrl.objects.Job(
                        job_id=self.parameters['job_id']
                    ).load()
                    if loop_count >= wait_count:
                        logger.log(
                            "info",
                            NS.publisher_id,
                            {"message": "Import jobs on cluster(%s) not yet "
                             "complete on all nodes(%s). Timing out." %
                             (_cluster.short_name, str(node_list))},
                            job_id=self.parameters['job_id'],
                            flow_id=self.parameters['flow_id']
                        )
                        return False
                    time.sleep(10)
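
For readers who want to experiment with the timing, the polling pattern in the excerpt above can be sketched as a self-contained function. This is only an illustration, not the tendrl code: `wait_for_import`, `is_import_done`, and the injectable `sleep` parameter are hypothetical names, and the `loop_count += 1` step is assumed because the excerpt is cut off before it.

```python
import time


def wait_for_import(node_ids, is_import_done, poll_interval=10,
                    iterations_per_node=36, sleep=time.sleep):
    """Poll until is_import_done() returns True or the wait budget runs out.

    Uses the same budget as the excerpt: (number of nodes - 1) * 36
    iterations, with a 10-second sleep between checks.
    """
    wait_count = (len(node_ids) - 1) * iterations_per_node
    loop_count = 0
    while True:
        if is_import_done():
            return True
        if loop_count >= wait_count:
            # Budget exhausted: corresponds to the "Timing out." branch.
            return False
        sleep(poll_interval)
        loop_count += 1  # assumed; not visible in the quoted excerpt
```

Passing a fake `sleep` (for example `list.append`) makes it possible to count the iterations without actually waiting.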



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
On Tendrl Server
----------------
rpm -qa | grep tendrl
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-2.el7rhgs.noarch
tendrl-node-agent-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-2.el7rhgs.noarch
tendrl-notifier-1.6.3-2.el7rhgs.noarch
tendrl-ansible-1.6.3-2.el7rhgs.noarch
tendrl-commons-1.6.3-4.el7rhgs.noarch
tendrl-ui-1.6.3-1.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch


On Tendrl Nodes
---------------
rpm -qa | grep tendrl
tendrl-gluster-integration-1.6.3-2.el7rhgs.noarch
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-node-agent-1.6.3-4.el7rhgs.noarch


How reproducible:
-----------------
Always


Steps to Reproduce:
-------------------
1. Create 100 replica-3 volumes on a 3-node cluster
2. Import the nodes into RHGSWA
3. Observe the import task

Actual results:
---------------
Import fails when scaling to 100 volumes (300 bricks) on a 3-node cluster.

Expected results:
-----------------

Import should have been successful.

Additional info:
----------------
The import started at about 7:08 and failed at about 7:21, so after approximately 13 minutes, which matches 6 minutes per node for two nodes. So it apparently waited on two nodes for 6 minutes each and failed without getting to the first node, from which the gluster cluster was detected.
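
The arithmetic behind that estimate can be checked with a small sketch. It uses only the constants visible in the quoted excerpt (36 iterations per node, a 10-second sleep, and the `len(node_list) - 1` factor); the function name `import_wait_budget_seconds` is hypothetical.

```python
def import_wait_budget_seconds(node_count, iterations_per_node=36,
                               poll_interval=10):
    """Total sleep time before the import loop gives up, in seconds."""
    return (node_count - 1) * iterations_per_node * poll_interval


budget = import_wait_budget_seconds(3)
print("budget: %d seconds (%.0f minutes)" % (budget, budget / 60.0))
```

For 3 nodes this gives 720 seconds, that is 12 minutes of sleeping. The observed 13 minutes or so is in the same range; the remainder is plausibly time spent outside the sleeps, although that is not confirmed in this report.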

Comment 3 gowtham 2018-11-19 05:53:56 UTC
This issue was resolved while fixing https://bugzilla.redhat.com/show_bug.cgi?id=1600092; the import wait time for the flow is now based on the number of volumes.
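
The actual formula used in that fix is not shown in this report. Purely as an illustration of what a volume-based wait budget could look like, here is a sketch in which the function name, `iterations_per_volume`, and `minimum_iterations` are all made-up values rather than anything taken from tendrl:

```python
def volume_based_wait_count(volume_count, iterations_per_volume=2,
                            minimum_iterations=72):
    """Hypothetical wait budget that grows with the number of volumes.

    Both default constants are invented for illustration; the real fix
    may use a different formula.
    """
    return max(minimum_iterations, volume_count * iterations_per_volume)


for volumes in (10, 100, 400):
    print(volumes, volume_based_wait_count(volumes))
```

The point of such a design is that a cluster with many volumes gets proportionally more time, while small clusters keep a fixed floor.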

Comment 6 gowtham 2018-12-04 06:32:20 UTC
As part of import flow performance testing, we also tested up to 400 volumes and 1200 bricks.

