Bug 1600160 - scalability issue with gluster file on OCP
Summary: scalability issue with gluster file on OCP
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.10
Assignee: John Mulligan
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks: 1568862 1581864
Reported: 2018-07-11 14:47 UTC by Hongkai Liu
Modified: 2018-12-11 04:58 UTC
CC List: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, under load from many concurrent requests, Heketi consumed large amounts of memory and network connections and contended with itself for resources, which led to unpredictable behavior when provisioning new volumes. With this fix, Heketi throttles client requests: it tracks the number of in-flight operations and, once that number reaches a threshold, tells clients to retry their requests later. The Heketi command-line client and the OpenShift/Kubernetes provisioner retry such requests automatically.
Clone Of:
Environment:
Last Closed: 2018-09-12 09:23:49 UTC
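
The throttling described in the Doc Text field can be illustrated with a minimal Python sketch. It is not Heketi's implementation (Heketi is written in Go): the class and function names, the in-flight limit, the HTTP 429 status used to mean "retry later", and the client's exponential backoff with jitter are all assumptions chosen for illustration. The server admits a request only while fewer than `limit` operations are in flight; the client resends rejected requests after a growing delay.

```python
import random
import threading
import time


class InFlightThrottle:
    """Admit at most `limit` concurrent operations and reject the rest.

    Hypothetical sketch: names and the limit are illustrative, not Heketi's code.
    """

    def __init__(self, limit):
        self.limit = limit
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        # Atomically check the in-flight count and reserve a slot if one is free.
        with self._lock:
            if self.in_flight >= self.limit:
                return False
            self.in_flight += 1
            return True

    def release(self):
        # Free the slot once the operation has finished (success or failure).
        with self._lock:
            self.in_flight -= 1


def handle_request(throttle, operation):
    """Server side: run `operation` only if a slot is free.

    Assumption: HTTP 429 signals "retry later". This sketch runs the operation
    synchronously; an asynchronous server would release the slot only when the
    background operation completes.
    """
    if not throttle.try_acquire():
        return 429, "retry later"
    try:
        return 200, operation()
    finally:
        throttle.release()


def call_with_retry(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Client side: resend while the server answers 429, backing off with jitter."""
    status, body = send()
    for attempt in range(max_attempts - 1):
        if status != 429:
            break
        delay = base_delay * (2 ** attempt)
        # Random jitter keeps many throttled clients from retrying in lockstep.
        sleep(delay + random.uniform(0, delay))
        status, body = send()
    return status, body
```

Rejecting excess requests up front, rather than queueing them without bound, is what caps memory and connection use; the jitter keeps many throttled clients from all retrying at the same moment.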



Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1555062 | unspecified | CLOSED | glusterfs-storage-block: cannot create 100 PVCs | 2021-02-22 00:41:40 UTC
Red Hat Bugzilla | 1609360 | unspecified | CLOSED | [Tracker-OCP-BZ#1613781] scalability issue at external dynamic prov | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHEA-2018:2686 | None | None | None | 2018-09-12 09:25:05 UTC

Internal Links: 1555062 1609360

Description Hongkai Liu 2018-07-11 14:47:30 UTC
Description of problem:
PVC provisioning is slow, and the time taken to provision each PVC is unpredictable.


Version-Release number of selected component (if applicable):

openshift_storage_glusterfs_image=registry.reg-aws.openshift.com:443/rhgs3/rhgs-server-rhel7:3.3.1-22
openshift_storage_glusterfs_heketi_image=registry.reg-aws.openshift.com:443/rhgs3/rhgs-volmanager-rhel7:3.3.1-19
openshift_storage_glusterfs_block_image=registry.reg-aws.openshift.com:443/rhgs3/rhgs-gluster-block-prov-rhel7:3.3.1-18

How reproducible:
Always

Steps to Reproduce:
1. Create 250 PVCs (either gluster.file or gluster.block).
2. Record the number of bound PVCs every minute.
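
The reporter's PVC creation script was not posted (comment 2 asks for it), so the following is only a hypothetical Python sketch of the two steps above. It assumes the `oc` CLI is logged in to the target project; the storage class name `glusterfs-storage`, the 1Gi size, the ReadWriteOnce access mode, and the `pvc-N` naming are illustrative choices, not values from this report. Bound claims are counted from the `status.phase` field of `oc get pvc -o json` output, and each sample is printed in the same format as the log under "Actual results".

```python
import json
import subprocess
import time
from datetime import datetime


def make_pvc(name, storage_class, size="1Gi"):
    """Build a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def count_bound(pvc_list):
    """Count claims whose status.phase is Bound in `oc get pvc -o json` output."""
    return sum(
        1 for item in pvc_list.get("items", [])
        if item.get("status", {}).get("phase") == "Bound"
    )


def log_line(now, bound):
    # Same format as the /tmp/pvc.log excerpt in this report.
    return f"{now:%Y-%m-%d %H:%M:%S} bound PVC number is {bound}"


def create_and_monitor(count=250, storage_class="glusterfs-storage", minutes=60):
    """Create `count` PVCs, then print the bound count once a minute (needs `oc`)."""
    for i in range(count):
        manifest = json.dumps(make_pvc(f"pvc-{i}", storage_class))
        subprocess.run(["oc", "create", "-f", "-"], input=manifest, text=True, check=True)
    for _ in range(minutes):
        out = subprocess.run(["oc", "get", "pvc", "-o", "json"],
                             capture_output=True, text=True, check=True).stdout
        print(log_line(datetime.now(), count_bound(json.loads(out))), flush=True)
        time.sleep(60)
```

Redirecting the output of `create_and_monitor(250)` to /tmp/pvc.log would yield a log in the format shown below.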

Actual results:
This is the result on GCE for gluster.file:
# tail -f /tmp/pvc.log 
2018-07-11 09:18:01 bound PVC number is 0
2018-07-11 09:19:02 bound PVC number is 1
2018-07-11 09:20:01 bound PVC number is 3
2018-07-11 09:21:01 bound PVC number is 16
2018-07-11 09:22:01 bound PVC number is 30
2018-07-11 09:23:02 bound PVC number is 30
2018-07-11 09:24:01 bound PVC number is 30
2018-07-11 09:25:01 bound PVC number is 71
2018-07-11 09:26:01 bound PVC number is 71
2018-07-11 09:27:02 bound PVC number is 71
2018-07-11 09:28:01 bound PVC number is 71
2018-07-11 09:29:01 bound PVC number is 71
2018-07-11 09:30:02 bound PVC number is 71
2018-07-11 09:31:01 bound PVC number is 71
2018-07-11 09:32:01 bound PVC number is 71
2018-07-11 09:33:01 bound PVC number is 71
2018-07-11 09:34:02 bound PVC number is 71
2018-07-11 09:35:01 bound PVC number is 71
2018-07-11 09:36:01 bound PVC number is 71
2018-07-11 09:37:02 bound PVC number is 71
2018-07-11 09:38:01 bound PVC number is 71
2018-07-11 09:39:01 bound PVC number is 71
2018-07-11 09:40:02 bound PVC number is 71
2018-07-11 09:41:01 bound PVC number is 71
2018-07-11 09:42:01 bound PVC number is 71
2018-07-11 09:43:02 bound PVC number is 71
2018-07-11 09:44:01 bound PVC number is 71
2018-07-11 09:45:01 bound PVC number is 71
2018-07-11 09:46:01 bound PVC number is 71
2018-07-11 09:47:02 bound PVC number is 71
2018-07-11 09:48:01 bound PVC number is 71
2018-07-11 09:49:01 bound PVC number is 71
2018-07-11 09:50:02 bound PVC number is 71
2018-07-11 09:51:01 bound PVC number is 71
2018-07-11 09:52:01 bound PVC number is 71
2018-07-11 09:53:02 bound PVC number is 71
2018-07-11 09:54:01 bound PVC number is 71
2018-07-11 09:55:01 bound PVC number is 71
2018-07-11 09:56:02 bound PVC number is 71
2018-07-11 09:57:01 bound PVC number is 71
2018-07-11 09:58:01 bound PVC number is 71
2018-07-11 09:59:02 bound PVC number is 71
2018-07-11 10:00:01 bound PVC number is 71
2018-07-11 10:01:01 bound PVC number is 71
2018-07-11 10:02:01 bound PVC number is 71
2018-07-11 10:03:02 bound PVC number is 71
2018-07-11 10:04:01 bound PVC number is 71
2018-07-11 10:05:01 bound PVC number is 71
2018-07-11 10:06:02 bound PVC number is 71
2018-07-11 10:07:01 bound PVC number is 71
2018-07-11 10:08:01 bound PVC number is 71
2018-07-11 10:09:02 bound PVC number is 71
2018-07-11 10:10:01 bound PVC number is 71
2018-07-11 10:11:01 bound PVC number is 71
2018-07-11 10:12:02 bound PVC number is 71
2018-07-11 10:13:01 bound PVC number is 71
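
The log shows 71 of the 250 PVCs bound by 09:25:01 and no further progress through 10:13:01, a stall of 48 minutes, against an expectation of at most 10 seconds per PVC. A small Python sketch (hypothetical helper names; it assumes only the line format shown above) can extract those numbers from such a log: the time since the bound count last changed, and the average binding rate over the whole run (71 PVCs in 55 minutes, about 1.3 per minute).

```python
import re
from datetime import datetime

LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) bound PVC number is (\d+)$")


def parse_log(lines):
    """Return (timestamp, bound_count) pairs, skipping lines that do not match."""
    samples = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            samples.append((ts, int(m.group(2))))
    return samples


def stall_duration(samples):
    """Time from the last change in the bound count to the final sample."""
    last_change = samples[0][0]
    for (_, prev), (ts, cur) in zip(samples, samples[1:]):
        if cur != prev:
            last_change = ts
    return samples[-1][0] - last_change


def average_rate(samples):
    """Average number of PVCs bound per minute across the whole log."""
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    return (n1 - n0) / ((t1 - t0).total_seconds() / 60)
```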



Expected results:
Each PVC is provisioned in at most 10 seconds, and at least 1000 PVCs can be provisioned.


Additional info:
On OCP 3.9 with the CNS version at that time, it works for gluster.file.
For gluster.block, the scalability is always an issue.

Comment 2 John Mulligan 2018-07-12 17:24:13 UTC
Hi Hongkai,

Are you creating these PVCs with a script or tool? Could you provide it?


Could you clarify what you mean by:
> On OCP 3.9 with the CNS version at that time, it works for gluster.file.
> For gluster.block, the scalability is always an issue.

Comment 9 Hongkai Liu 2018-07-19 11:53:09 UTC
@Raghavendra,

which tags of CNS images has that fix? Thanks.

Comment 10 Humble Chirammal 2018-07-19 13:20:01 UTC
(In reply to Hongkai Liu from comment #9)
> @Raghavendra,
> 
> which tags of CNS images has that fix? Thanks.

Fixed in version: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-volmanager-rhel7:3.3.1-21

Comment 13 Vikas Laad 2018-07-20 19:42:04 UTC
I was able to verify this BZ with gluster file; I still need to verify gluster block.

I was able to create 750 PVCs across 3 different projects in 1 hour, using the same creation rate as described in the BZ.

Comment 33 Anjana KD 2018-08-31 00:12:22 UTC
Updated doc text in the Doc Text field. Please review for technical accuracy.

Comment 35 Anjana KD 2018-09-05 09:47:03 UTC
Have made the changes based on the feedback given.

Comment 36 John Mulligan 2018-09-05 19:23:08 UTC
(In reply to Anjana from comment #35)
> Have made the changes based on the feedback given.

Looks OK to me.

Comment 38 errata-xmlrpc 2018-09-12 09:23:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686

