Bug 1364431 - [platformmanagement_public_713] It takes too much time for counting resources usage by cluster quota
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: David Eads
QA Contact: Qixuan Wang
Duplicates: 1366740
Depends On:
Blocks:
Reported: 2016-08-05 06:43 EDT by Qixuan Wang
Modified: 2016-09-27 05:42 EDT

Last Closed: 2016-09-27 05:42:35 EDT
Type: Bug


Attachments
master config (5.21 KB, text/plain), 2016-08-08 06:58 EDT, Qixuan Wang
controller_1_metrics (90.33 KB, text/plain), 2016-08-11 06:58 EDT, Qixuan Wang
controller_2_metrics (90.36 KB, text/plain), 2016-08-11 06:59 EDT, Qixuan Wang
apiserver_1_metrics (324.87 KB, text/plain), 2016-08-11 06:59 EDT, Qixuan Wang
apiserver_2_metrics (325.10 KB, text/plain), 2016-08-11 07:00 EDT, Qixuan Wang
08-15-master-config.yaml (5.25 KB, text/plain), 2016-08-15 05:26 EDT, Qixuan Wang
08-16-node-config.yaml (1.18 KB, text/plain), 2016-08-15 05:28 EDT, Qixuan Wang
08-15-api-metrics (193.36 KB, text/plain), 2016-08-15 05:36 EDT, Qixuan Wang
08-15-controller-metrics (193.36 KB, text/plain), 2016-08-15 05:36 EDT, Qixuan Wang


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2016:1933
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.3 Release Advisory
Last Updated: 2016-09-27 09:24:36 EDT

Description Qixuan Wang 2016-08-05 06:43:15 EDT
Description of problem:
It takes more than 2 minutes for a resource to be counted by the cluster quota in an OCP environment (this cannot be reproduced in Origin). Can the performance be optimized?

Version-Release number of selected component (if applicable):
openshift v3.3.0.14
kubernetes v1.3.0+57fb9ac
etcd 2.3.0+git

How reproducible:
Always

Steps to Reproduce:
1. Create 1 project
# oc new-project project-a

2. Label the project
# oc label namespace project-a user=dev --config=./admin.kubeconfig   

3. Create a clusterquota with label selector "user=dev"
# oc create clusterresourcequota crq --project-label-selector=user=dev --hard=pods=2 --config=./admin.kubeconfig 

4. Create a pod and check clusterquota
# oc run testpod-1 --image=aosqe/hello-openshift --generator=run-pod/v1
# oc describe clusterresourcequota crq --config=./admin.kubeconfig
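
To put a number on the delay in step 4, a polling loop along the following lines could be used. This is only a sketch: wait_for_output is a hypothetical helper name, and the one-second interval is an arbitrary choice rather than a value from this report.

```shell
#!/bin/sh
# wait_for_output EXPECTED TIMEOUT COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND about once per second until its output
# equals EXPECTED, then prints how many one-second sleeps were needed
# (an approximation of the elapsed seconds). Returns 1 if TIMEOUT is exceeded.
wait_for_output() {
    expected=$1
    timeout=$2
    shift 2
    waited=0
    while [ "$waited" -le "$timeout" ]; do
        if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
            echo "$waited"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Example (not run here). pods_used is a placeholder for any command or shell
# function that prints the current Used count for pods:
#   wait_for_output 1 300 pods_used
```

Running the helper right after creating the pod gives a rough count of seconds until the quota shows the new usage, which makes results from different environments easier to compare.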


Actual results:
4. It takes more than 2 minutes for a running pod or other resource to be counted.
[root@dhcp-141-95 qwang]# oc describe clusterresourcequota crq --config=./admin.kubeconfig
Name:		crq
Namespace:	<none>
Created:	18 minutes ago
Labels:		<none>
Annotations:	<none>
Label Selector: user=dev
AnnotationSelector: map[]
Resource	Used	Hard
--------	----	----
pods		1	2
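
When this check is repeated many times, the Used column can be extracted from the describe output with a small filter. The sketch below assumes the three-column Resource/Used/Hard layout shown above; quota_used is a hypothetical function name.

```shell
#!/bin/sh
# quota_used RESOURCE
# Reads `oc describe clusterresourcequota` output on stdin and prints the
# value in the Used column for RESOURCE (for example, pods).
# Only rows with exactly three fields are considered, which skips the
# "Name:", "Created:" and similar header lines.
quota_used() {
    awk -v res="$1" 'NF == 3 && $1 == res { print $2 }'
}

# Example (not run here):
#   oc describe clusterresourcequota crq --config=./admin.kubeconfig | quota_used pods
```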


Expected results:
4. Resource changes should be reflected in cluster quota usage as soon as possible.

Additional info:
Comment 1 David Eads 2016-08-05 07:51:12 EDT
How big is the cluster and how many clusterresourcequotas are there?  Also, can you provide a master log at loglevel=4?
Comment 2 Qixuan Wang 2016-08-08 06:57:32 EDT
This problem can't be reproduced in a non-HA environment but exists in HA (2master+2infra_node+2node+3etcd). Attached master-config.yaml. BTW, I wasn't able to capture any useful messages on a public HA environment; I'm going to set up a private environment to get more info if needed.
Comment 3 Qixuan Wang 2016-08-08 06:58 EDT
Created attachment 1188619 [details]
master config
Comment 4 David Eads 2016-08-08 16:04:22 EDT
> I'm going to setup a private env to get more info if need

I am going to need to see the controller logs (loglevel=4 please) to really have a reasonable starting point for investigation.
Comment 5 Qixuan Wang 2016-08-09 05:59:31 EDT
Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1364403#c7 and attachments.
Comment 6 David Eads 2016-08-09 14:32:53 EDT
I've created https://github.com/openshift/origin/pull/10307 to gather metrics for the clusterquota controllers.  After it has been running for a while, please collect:

curl -k https://controller-host-X:8444/metrics

curl -k https://each-api-server:8443/metrics

You may have to run `oadm policy add-cluster-role-to-user cluster-admin system:anonymous` or attach cluster-admin certs to the curl requests to get at those endpoints.

I'm working on getting a dev-ami
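
As a sketch of the collection step, the helpers below fetch one endpoint and filter the Prometheus-style text it returns. The host names and ports are the ones given in this comment. The function names and the "quota" filter pattern are assumptions made for illustration, since the metric names added by the pull request are not listed here.

```shell
#!/bin/sh
# fetch_metrics HOST PORT
# Downloads /metrics from HOST:PORT. -k skips TLS verification, as in the
# commands above; -s silences the progress meter.
fetch_metrics() {
    curl -k -s "https://$1:$2/metrics"
}

# filter_metrics PATTERN
# Reads metrics text on stdin, drops the "# HELP" / "# TYPE" comment lines,
# and keeps the sample lines that contain PATTERN (case-insensitive).
filter_metrics() {
    grep -v '^#' | grep -i -- "$1"
}

# Example (not run here; "quota" is an assumed pattern):
#   fetch_metrics controller-host-X 8444 | filter_metrics quota > controller_metrics.txt
```

If anonymous access is not granted, curl's --cert, --key, and --cacert options can supply the cluster-admin credentials.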
Comment 7 David Eads 2016-08-10 15:30:21 EDT
https://github.com/openshift/origin/pull/10307 has merged, but the devami job keeps failing on yum problems.

Once you have a build that contains it, please gather the metrics mentioned in comment-6.
Comment 8 Qixuan Wang 2016-08-11 06:57:41 EDT
Attached the metrics for each apiserver and controller. Are these what you want? Hope these help.
Comment 9 Qixuan Wang 2016-08-11 06:58 EDT
Created attachment 1190012 [details]
controller_1_metrics
Comment 10 Qixuan Wang 2016-08-11 06:59 EDT
Created attachment 1190013 [details]
controller_2_metrics
Comment 11 Qixuan Wang 2016-08-11 06:59 EDT
Created attachment 1190014 [details]
apiserver_1_metrics
Comment 12 Qixuan Wang 2016-08-11 07:00 EDT
Created attachment 1190015 [details]
apiserver_2_metrics
Comment 14 David Eads 2016-08-11 14:51:58 EDT
It's hitting rate limiting.  I'm considering my options.
Comment 15 David Eads 2016-08-11 15:42:19 EDT
Config problem. Opened https://github.com/openshift/openshift-ansible/pull/2287   To get immediate relief, edit master-config.yaml and change "ops:" to "qps:".
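
A minimal sketch of that manual workaround is shown below. It assumes the misspelled key appears at the start of a line after its indentation; fix_qps_key is a hypothetical helper name, and the file path in the example is an assumption about where the installer places the master configuration.

```shell
#!/bin/sh
# fix_qps_key FILE
# Renames every YAML key spelled "ops:" to "qps:" in FILE, preserving the
# indentation. The unmodified file is kept as FILE.bak.
fix_qps_key() {
    sed -i.bak 's/^\([[:space:]]*\)ops:/\1qps:/' "$1"
}

# Example (not run here; the path is an assumption):
#   fix_qps_key /etc/origin/master/master-config.yaml
```

The master services would most likely need to be restarted before the corrected setting takes effect.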
Comment 17 David Eads 2016-08-12 07:41:32 EDT
Installer fix merged.
Comment 18 Jordan Liggitt 2016-08-12 14:24:38 EDT
*** Bug 1366740 has been marked as a duplicate of this bug. ***
Comment 19 Qixuan Wang 2016-08-15 04:58:45 EDT
Tested in HA environment (2master+2node+3etcd+1lbnfs)

Package version:

openshift-ansible-3.3.10-1.git.0.7060379.el7.noarch.rpm
openshift-ansible-docs-3.3.10-1.git.0.7060379.el7.noarch.rpm
openshift-ansible-filter-plugins-3.3.10-1.git.0.7060379.el7.noarch.rpm
openshift-ansible-lookup-plugins-3.3.10-1.git.0.7060379.el7.noarch.rpm
openshift-ansible-playbooks-3.3.10-1.git.0.7060379.el7.noarch.rpm
openshift-ansible-roles-3.3.10-1.git.0.7060379.el7.noarch.rpm

atomic-openshift-3.3.0.19-1.git.0.93380aa.el7.x86_64
atomic-openshift-clients-3.3.0.19-1.git.0.93380aa.el7.x86_64
atomic-openshift-master-3.3.0.19-1.git.0.93380aa.el7.x86_64
tuned-profiles-atomic-openshift-node-3.3.0.19-1.git.0.93380aa.el7.x86_64
atomic-openshift-node-3.3.0.19-1.git.0.93380aa.el7.x86_64
atomic-openshift-sdn-ovs-3.3.0.19-1.git.0.93380aa.el7.x86_64
atomic-openshift-tests-3.3.0.19-1.git.0.93380aa.el7.x86_64

PR https://github.com/openshift/openshift-ansible/pull/2287 is already contained in openshift-ansible-3.3.10-1. However, this problem persists.
Comment 20 Qixuan Wang 2016-08-15 05:26 EDT
Created attachment 1190833 [details]
08-15-master-config.yaml

masterClients:
  externalKubernetesClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    contentType: application/vnd.kubernetes.protobuf
    burst: 400
    qps: 200
  externalKubernetesKubeConfig: ""
  openshiftLoopbackClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    contentType: application/vnd.kubernetes.protobuf
    burst: 600
    qps: 300
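
One way to confirm that a configuration file no longer carries the misspelled key is a quick scan like the one below. This is a sketch only; check_rate_limit_keys is a hypothetical helper name, and it only looks at burst, qps, and ops keys.

```shell
#!/bin/sh
# check_rate_limit_keys FILE
# Prints every burst/qps/ops line in FILE with its line number, and returns
# non-zero if a misspelled "ops:" key is still present.
check_rate_limit_keys() {
    grep -nE '^[[:space:]]*(burst|qps|ops):' "$1" || true
    if grep -qE '^[[:space:]]*ops:' "$1"; then
        echo "found misspelled ops: key in $1" >&2
        return 1
    fi
    return 0
}

# Example (not run here):
#   check_rate_limit_keys 08-15-master-config.yaml
```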
Comment 21 Qixuan Wang 2016-08-15 05:28 EDT
Created attachment 1190834 [details]
08-16-node-config.yaml

masterClientConnectionOverrides:
  acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
  contentType: application/vnd.kubernetes.protobuf
  burst: 200
  qps: 100
Comment 22 Qixuan Wang 2016-08-15 05:36 EDT
Created attachment 1190836 [details]
08-15-api-metrics
Comment 23 Qixuan Wang 2016-08-15 05:36 EDT
Created attachment 1190837 [details]
08-15-controller-metrics
Comment 24 Qixuan Wang 2016-08-15 06:15:10 EDT
I'm very sorry, please ignore comments 19 through 23. I configured master-config.yaml manually with "ClusterResourceQuota" enabled and no longer see this problem. Thanks.

Package version:
openshift-ansible-3.3.10-1.git.0.7060379.el7.noarch.rpm
openshift v3.3.0.19
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

[root@dhcp-141-95 qwang]# oc describe clusterresourcequota crq; date
Name:		crq
Namespace:	<none>
Created:	About an hour ago
Labels:		<none>
Annotations:	<none>
Label Selector: user=dev
AnnotationSelector: map[]
Resource	Used	Hard
--------	----	----
pods		0	2
secrets		9	10
services	0	2


Mon Aug 15 18:08:17 CST 2016
[root@dhcp-141-95 qwang]# oc create -f multi-portsvc.json; date
service "multi-portsvc-2" created
Mon Aug 15 18:08:30 CST 2016
[root@dhcp-141-95 qwang]# oc describe clusterresourcequota crq; date
Name:		crq
Namespace:	<none>
Created:	About an hour ago
Labels:		<none>
Annotations:	<none>
Label Selector: user=dev
AnnotationSelector: map[]
Resource	Used	Hard
--------	----	----
pods		0	2
secrets		9	10
services	1	2


Mon Aug 15 18:08:35 CST 2016
Comment 26 errata-xmlrpc 2016-09-27 05:42:35 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933
