Bug 1600160 - scalability issue with gluster file on OCP
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: heketi
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.10
Assigned To: John Mulligan
QA Contact: Neha Berry
Docs Contact:
Depends On:
Blocks: 1568862 1581864
Reported: 2018-07-11 10:47 EDT by Hongkai Liu
Modified: 2018-09-12 05:25 EDT
CC List: 13 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when under load due to many concurrent requests, Heketi consumed high amounts of memory and network connections and contended with itself for resources, leading to unpredictable behavior when provisioning new volumes. With this fix, Heketi now throttles client requests by considering the number of in-flight operations it is working on and telling clients to retry their requests later when that number reaches a threshold. The Heketi command line client and OpenShift/Kubernetes provisioner automatically retry these requests later.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-12 05:23:49 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
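
The throttling described in the Doc Text field can be illustrated with a minimal Python sketch. This is not Heketi's code: the class and function names, the limit value, and the retry policy are all hypothetical, chosen only to show the idea of rejecting new work once the number of in-flight operations reaches a threshold and letting the client retry later.

```python
import threading


class TooBusy(Exception):
    """Raised when the server asks the client to retry later."""


class Throttle:
    """Admit at most `limit` operations at a time (hypothetical limit)."""

    def __init__(self, limit):
        self.limit = limit
        self.in_flight = 0
        self._lock = threading.Lock()

    def start(self):
        # Reject instead of queueing, so overload does not pile up in the server.
        with self._lock:
            if self.in_flight >= self.limit:
                raise TooBusy("retry later")
            self.in_flight += 1

    def finish(self):
        # Release the slot held by a completed operation.
        with self._lock:
            self.in_flight -= 1


def call_with_retry(attempt, max_tries=5, wait=lambda n: None):
    """Run attempt(); on TooBusy, wait and try again, up to max_tries times."""
    for n in range(1, max_tries + 1):
        try:
            return attempt()
        except TooBusy:
            if n == max_tries:
                raise
            wait(n)  # a real client would sleep here, e.g. time.sleep(n)
```

Rejecting at the threshold keeps the server's memory and connection use bounded; the cost of waiting moves to the clients, which is why the fix depends on the client and provisioner retrying automatically.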


External Trackers
Tracker                 ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHEA-2018:2686  None      None    None     2018-09-12 05:25 EDT

Description Hongkai Liu 2018-07-11 10:47:30 EDT
Description of problem:
PVC provisioning is slow, and the time each PVC takes to provision is unpredictable.


Version-Release number of selected component (if applicable):

openshift_storage_glusterfs_image=registry.reg-aws.openshift.com:443/rhgs3/rhgs-server-rhel7:3.3.1-22
openshift_storage_glusterfs_heketi_image=registry.reg-aws.openshift.com:443/rhgs3/rhgs-volmanager-rhel7:3.3.1-19
openshift_storage_glusterfs_block_image=registry.reg-aws.openshift.com:443/rhgs3/rhgs-gluster-block-prov-rhel7:3.3.1-18

How reproducible:
Always

Steps to Reproduce:
1. Create 250 PVCs (either gluster.file or gluster.block).
2. Record the number of bound PVs each minute.
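
The measurement in step 2 could be done along the following lines. This is a sketch, not the reporter's script (which is not included in this report). It assumes the input is the text printed by `oc get pvc --no-headers`, where the second column is the claim's STATUS; the function names are hypothetical.

```python
from datetime import datetime


def count_bound(pvc_listing):
    """Count rows whose STATUS column (second field) is 'Bound'."""
    count = 0
    for line in pvc_listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "Bound":
            count += 1
    return count


def log_line(now, bound):
    """Format one sample in the same shape as the log lines in this report."""
    return f"{now:%Y-%m-%d %H:%M:%S} bound PVC number is {bound}"
```

Calling `log_line(datetime.now(), count_bound(listing))` once a minute and appending the result to a file would produce a log of the same shape as the one under Actual results.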

Actual results:
This is the result on GCE for gluster.file:
# tail -f /tmp/pvc.log 
2018-07-11 09:18:01 bound PVC number is 0
2018-07-11 09:19:02 bound PVC number is 1
2018-07-11 09:20:01 bound PVC number is 3
2018-07-11 09:21:01 bound PVC number is 16
2018-07-11 09:22:01 bound PVC number is 30
2018-07-11 09:23:02 bound PVC number is 30
2018-07-11 09:24:01 bound PVC number is 30
2018-07-11 09:25:01 bound PVC number is 71
2018-07-11 09:26:01 bound PVC number is 71
2018-07-11 09:27:02 bound PVC number is 71
2018-07-11 09:28:01 bound PVC number is 71
2018-07-11 09:29:01 bound PVC number is 71
2018-07-11 09:30:02 bound PVC number is 71
2018-07-11 09:31:01 bound PVC number is 71
2018-07-11 09:32:01 bound PVC number is 71
2018-07-11 09:33:01 bound PVC number is 71
2018-07-11 09:34:02 bound PVC number is 71
2018-07-11 09:35:01 bound PVC number is 71
2018-07-11 09:36:01 bound PVC number is 71
2018-07-11 09:37:02 bound PVC number is 71
2018-07-11 09:38:01 bound PVC number is 71
2018-07-11 09:39:01 bound PVC number is 71
2018-07-11 09:40:02 bound PVC number is 71
2018-07-11 09:41:01 bound PVC number is 71
2018-07-11 09:42:01 bound PVC number is 71
2018-07-11 09:43:02 bound PVC number is 71
2018-07-11 09:44:01 bound PVC number is 71
2018-07-11 09:45:01 bound PVC number is 71
2018-07-11 09:46:01 bound PVC number is 71
2018-07-11 09:47:02 bound PVC number is 71
2018-07-11 09:48:01 bound PVC number is 71
2018-07-11 09:49:01 bound PVC number is 71
2018-07-11 09:50:02 bound PVC number is 71
2018-07-11 09:51:01 bound PVC number is 71
2018-07-11 09:52:01 bound PVC number is 71
2018-07-11 09:53:02 bound PVC number is 71
2018-07-11 09:54:01 bound PVC number is 71
2018-07-11 09:55:01 bound PVC number is 71
2018-07-11 09:56:02 bound PVC number is 71
2018-07-11 09:57:01 bound PVC number is 71
2018-07-11 09:58:01 bound PVC number is 71
2018-07-11 09:59:02 bound PVC number is 71
2018-07-11 10:00:01 bound PVC number is 71
2018-07-11 10:01:01 bound PVC number is 71
2018-07-11 10:02:01 bound PVC number is 71
2018-07-11 10:03:02 bound PVC number is 71
2018-07-11 10:04:01 bound PVC number is 71
2018-07-11 10:05:01 bound PVC number is 71
2018-07-11 10:06:02 bound PVC number is 71
2018-07-11 10:07:01 bound PVC number is 71
2018-07-11 10:08:01 bound PVC number is 71
2018-07-11 10:09:02 bound PVC number is 71
2018-07-11 10:10:01 bound PVC number is 71
2018-07-11 10:11:01 bound PVC number is 71
2018-07-11 10:12:02 bound PVC number is 71
2018-07-11 10:13:01 bound PVC number is 71
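
In the log above, the count reaches 71 at 09:25:01 and is still 71 at 10:13:01, 48 minutes later. A short sketch for extracting that stall from a log of this shape follows; the function names are hypothetical and the parser assumes every sample line matches the format shown above.

```python
import re
from datetime import datetime

# One sample per line: "<date> <time> bound PVC number is <count>"
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) bound PVC number is (\d+)$")


def parse(log_text):
    """Return a list of (timestamp, bound_count) samples, skipping other lines."""
    samples = []
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            samples.append((ts, int(m.group(2))))
    return samples


def last_progress(samples):
    """Return the timestamp of the first sample that shows the final count."""
    if not samples:
        raise ValueError("no samples parsed")
    final = samples[-1][1]
    for ts, count in samples:
        if count == final:
            return ts


def stalled_minutes(samples):
    """Minutes between the last progress and the last sample."""
    return (samples[-1][0] - last_progress(samples)).total_seconds() / 60
```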



Expected results:
Each PVC is provisioned in at most 10 seconds, and at least 1000 PVCs can be provisioned.


Additional info:
On OCP 3.9 with the CNS version at that time, it works for gluster.file.
For gluster.block, the scalability is always an issue.
Comment 2 John Mulligan 2018-07-12 13:24:13 EDT
Hi Hongkai,

Are you creating these PVCs with a script or tool? Could you provide it?


Could you clarify what you mean by:
> On OCP 3.9 with the CNS version at that time, it works for gluster.file.
> For gluster.block, the scalability is always an issue.
Comment 9 Hongkai Liu 2018-07-19 07:53:09 EDT
@Raghavendra,

which tags of CNS images has that fix? Thanks.
Comment 10 Humble Chirammal 2018-07-19 09:20:01 EDT
(In reply to Hongkai Liu from comment #9)
> @Raghavendra,
> 
> which tags of CNS images has that fix? Thanks.

Fixed in version: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-volmanager-rhel7:3.3.1-21
Comment 13 Vikas Laad 2018-07-20 15:42:04 EDT
I was able to verify this bz with gluster file; I still need to verify gluster block.

I was able to create 750 PVCs across 3 different projects in 1 hour, at the same creation rate as mentioned in the bz.
Comment 33 Anjana 2018-08-30 20:12:22 EDT
Updated doc text in the Doc Text field. Please review for technical accuracy.
Comment 35 Anjana 2018-09-05 05:47:03 EDT
Have made the changes based on the feedback given.
Comment 36 John Mulligan 2018-09-05 15:23:08 EDT
(In reply to Anjana from comment #35)
> Have made the changes based on the feedback given.

Looks OK to me.
Comment 38 errata-xmlrpc 2018-09-12 05:23:49 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686
