1364431 – [platformmanagement_public_713] It takes too much time for counting resources usage by cluster quota

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Master
Sub Component:
Version:	3.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	David Eads
QA Contact:	Qixuan Wang
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1366740
Depends On:
Blocks:

Reported:	2016-08-05 10:43 UTC by Qixuan Wang
Modified:	2016-09-27 09:42 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-09-27 09:42:35 UTC
Target Upstream Version:
Embargoed:

Attachments
master config (5.21 KB, text/plain) 2016-08-08 10:58 UTC, Qixuan Wang	no flags
controller_1_metrics (90.33 KB, text/plain) 2016-08-11 10:58 UTC, Qixuan Wang	no flags
controller_2_metrics (90.36 KB, text/plain) 2016-08-11 10:59 UTC, Qixuan Wang	no flags
apiserver_1_metrics (324.87 KB, text/plain) 2016-08-11 10:59 UTC, Qixuan Wang	no flags
apiserver_2_metrics (325.10 KB, text/plain) 2016-08-11 11:00 UTC, Qixuan Wang	no flags
08-15-master-config.yaml (5.25 KB, text/plain) 2016-08-15 09:26 UTC, Qixuan Wang	no flags
08-16-node-config.yaml (1.18 KB, text/plain) 2016-08-15 09:28 UTC, Qixuan Wang	no flags
08-15-api-metrics (193.36 KB, text/plain) 2016-08-15 09:36 UTC, Qixuan Wang	no flags
08-15-controller-metrics (193.36 KB, text/plain) 2016-08-15 09:36 UTC, Qixuan Wang	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:1933	0	normal	SHIPPED_LIVE	Red Hat OpenShift Container Platform 3.3 Release Advisory	2016-09-27 13:24:36 UTC

Description Qixuan Wang 2016-08-05 10:43:15 UTC

Description of problem:
It takes more than 2 minutes for a new resource to be counted in the cluster quota usage in an OCP environment (this can't be reproduced in Origin). Can the performance be optimized?

Version-Release number of selected component (if applicable):
openshift v3.3.0.14
kubernetes v1.3.0+57fb9ac
etcd 2.3.0+git

How reproducible:
Always

Steps to Reproduce:
1. Create 1 project
# oc new-project project-a

2. Label the project
# oc label namespace project-a user=dev --config=./admin.kubeconfig   

3. Create a clusterquota with label selector "user=dev"
# oc create clusterresourcequota crq --project-label-selector=user=dev --hard=pods=2 --config=./admin.kubeconfig 

4. Create a pod and check the cluster quota
# oc run testpod-1 --image=aosqe/hello-openshift --generator=run-pod/v1
# oc describe clusterresourcequota crq --config=./admin.kubeconfig
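To time step 4 instead of re-running `oc describe` by hand, the wait can be scripted. A minimal sketch, assuming the aggregated usage is reported under `status.total.used` of the ClusterResourceQuota object and that `./admin.kubeconfig` (as used in the steps above) has cluster-admin rights. The function `wait_for_quota_used` and the `QUOTA_KUBECONFIG` variable are hypothetical helpers, not part of `oc`:

```shell
# Hypothetical helper: wait_for_quota_used QUOTA RESOURCE EXPECTED [TIMEOUT_SECONDS]
# Polls the ClusterResourceQuota once per second until the aggregated "used" value
# for RESOURCE equals EXPECTED, then prints the elapsed seconds and returns 0.
# Returns 1 (with a message on stderr) if TIMEOUT_SECONDS (default 300) passes first.
# Assumption: usage is read from .status.total.used; this works for simple resource
# names such as pods, secrets or services (names containing dots need other quoting).
QUOTA_KUBECONFIG=${QUOTA_KUBECONFIG:-./admin.kubeconfig}

wait_for_quota_used() {
    quota=$1
    resource=$2
    expected=$3
    timeout=${4:-300}
    start=$(date +%s)
    while :; do
        # Errors (e.g. quota not found yet) are hidden and simply trigger another poll.
        used=$(oc get clusterresourcequota "$quota" --config="$QUOTA_KUBECONFIG" \
            -o "jsonpath={.status.total.used.$resource}" 2>/dev/null)
        elapsed=$(( $(date +%s) - $start ))
        if [ "$used" = "$expected" ]; then
            echo "$elapsed"
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out after ${elapsed}s (used=${used:-unset})" >&2
            return 1
        fi
        sleep 1
    done
}
```

For example, run `oc run testpod-1 --image=aosqe/hello-openshift --generator=run-pod/v1` and then `wait_for_quota_used crq pods 1`; the printed number is roughly how many seconds the quota took to count the new pod.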


Actual results:
4. It takes more than 2 minutes for a running pod or any other resource to be counted.
[root@dhcp-141-95 qwang]# oc describe clusterresourcequota crq --config=./admin.kubeconfig
Name:		crq
Namespace:	<none>
Created:	18 minutes ago
Labels:		<none>
Annotations:	<none>
Label Selector: user=dev
AnnotationSelector: map[]
Resource	Used	Hard
--------	----	----
pods		1	2


Expected results:
4. Resource changes should be reflected in the cluster quota usage as soon as possible.

Additional info:

Comment 1 David Eads 2016-08-05 11:51:12 UTC

How big is the cluster and how many clusterresourcequotas are there?  Also, can you provide a master log at loglevel=4?

Comment 2 Qixuan Wang 2016-08-08 10:57:32 UTC

This problem can't be reproduced in a non-HA environment but exists in HA (2master+2infra_node+2node+3etcd). I've attached master-config.yaml. BTW, I wasn't able to capture any useful messages on a public HA environment; I'm going to set up a private environment to get more info if needed.

Comment 3 Qixuan Wang 2016-08-08 10:58:10 UTC

Created attachment 1188619 [details]
master config

Comment 4 David Eads 2016-08-08 20:04:22 UTC

> I'm going to setup a private env to get more info if need

I am going to need to see the controller logs (loglevel=4, please) to have a reasonable starting point for investigation.

Comment 5 Qixuan Wang 2016-08-09 09:59:31 UTC

Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1364403#c7 and attachments.

Comment 6 David Eads 2016-08-09 18:32:53 UTC

I've created https://github.com/openshift/origin/pull/10307 to gather metrics for the clusterquota controllers. After it has been running for a while, please collect:

curl -k https://controller-host-X:8444/metrics

curl -k https://each-api-server:8443/metrics

You may have to run `oadm policy add-cluster-role-to-user cluster-admin system:anonymous` or attach cluster-admin certs to the curl requests to get at those endpoints.
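With several masters, the two curl commands above can be looped. A sketch only: `collect_metrics`, `OUT_DIR` and `CURL_EXTRA_ARGS` are hypothetical names, the host names passed in are placeholders, and `CURL_EXTRA_ARGS` is where cluster-admin client certificates could be supplied (for example with curl's `--cert` and `--key` options) instead of granting cluster-admin to system:anonymous:

```shell
# Hypothetical helper: collect_metrics ROLE PORT HOST...
# Saves https://HOST:PORT/metrics to $OUT_DIR/ROLE-HOST.metrics for each HOST.
# Ports from this comment: 8444 for the controllers, 8443 for the API servers.
# CURL_EXTRA_ARGS (hypothetical) is expanded unquoted on purpose so that several
# curl options, such as a client certificate and key, can be passed at once.
OUT_DIR=${OUT_DIR:-./metrics-$(date +%Y%m%d-%H%M%S)}

collect_metrics() {
    role=$1
    port=$2
    shift 2
    mkdir -p "$OUT_DIR" || return 1
    for host in "$@"; do
        out="$OUT_DIR/$role-$host.metrics"
        # -k skips TLS verification as above, -s hides progress output,
        # -f makes HTTP errors (e.g. 403 without cluster-admin) count as failures.
        if curl -ksf ${CURL_EXTRA_ARGS:-} "https://$host:$port/metrics" > "$out"; then
            echo "saved $out"
        else
            echo "failed to fetch https://$host:$port/metrics" >&2
        fi
    done
}
```

For example, `collect_metrics controller 8444 master1 master2` followed by `collect_metrics apiserver 8443 master1 master2` leaves one file per master and role, ready to attach to the bug.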

I'm working on getting a dev-ami

Comment 7 David Eads 2016-08-10 19:30:21 UTC

https://github.com/openshift/origin/pull/10307 has merged, but the devami job keeps failing on yum problems.

Once you have a build that contains it, please gather the metrics mentioned in comment-6.

Comment 8 Qixuan Wang 2016-08-11 10:57:41 UTC

Attached the metrics from each API server and controller. Are these what you need? Hope they help.

Comment 9 Qixuan Wang 2016-08-11 10:58:40 UTC

Created attachment 1190012 [details]
controller_1_metrics

Comment 10 Qixuan Wang 2016-08-11 10:59:12 UTC

Created attachment 1190013 [details]
controller_2_metrics

Comment 11 Qixuan Wang 2016-08-11 10:59:38 UTC

Created attachment 1190014 [details]
apiserver_1_metrics

Comment 12 Qixuan Wang 2016-08-11 11:00:06 UTC

Created attachment 1190015 [details]
apiserver_2_metrics

Comment 14 David Eads 2016-08-11 18:51:58 UTC

It's hitting rate limiting. I'm considering my options.

Comment 15 David Eads 2016-08-11 19:42:19 UTC

Config problem. Opened https://github.com/openshift/openshift-ansible/pull/2287 to fix the installer. To get immediate relief, change "ops:" to "qps:" in master-config.yaml.
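Because of the misspelled key, the intended client qps setting was presumably never applied, so the controllers ran into a lower rate limit. On hosts that were already installed, the manual fix can be scripted. A minimal sketch, assuming the usual OCP 3.x path /etc/origin/master/master-config.yaml (pass another path as the first argument); `fix_qps_key` is a hypothetical helper:

```shell
# Hypothetical helper: fix_qps_key [CONFIG]
# Renames every key that is exactly "ops:" (after indentation) to "qps:",
# keeps the original as CONFIG.bak, and prints how many qps: keys now exist.
# Writing through a redirect keeps the original file's inode and permissions,
# and avoids the GNU/BSD differences of "sed -i".
fix_qps_key() {
    config=${1:-/etc/origin/master/master-config.yaml}
    cp "$config" "$config.bak" || return 1
    sed 's/^\([[:space:]]*\)ops:/\1qps:/' "$config.bak" > "$config"
    grep -c '^[[:space:]]*qps:' "$config"
}
```

The master services need to be restarted afterwards for the corrected limits to take effect.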

Comment 17 David Eads 2016-08-12 11:41:32 UTC

Installer fix merged.

Comment 18 Jordan Liggitt 2016-08-12 18:24:38 UTC

*** Bug 1366740 has been marked as a duplicate of this bug. ***

Comment 19 Qixuan Wang 2016-08-15 08:58:45 UTC

Tested in an HA environment (2master+2node+3etcd+1lbnfs)

Package version:

openshift-ansible-3.3.10-1.git.0.7060379.el7.noarch.rpm
openshift-ansible-docs-3.3.10-1.git.0.7060379.el7.noarch.rpm
openshift-ansible-filter-plugins-3.3.10-1.git.0.7060379.el7.noarch.rpm
openshift-ansible-lookup-plugins-3.3.10-1.git.0.7060379.el7.noarch.rpm
openshift-ansible-playbooks-3.3.10-1.git.0.7060379.el7.noarch.rpm
openshift-ansible-roles-3.3.10-1.git.0.7060379.el7.noarch.rpm

atomic-openshift-3.3.0.19-1.git.0.93380aa.el7.x86_64
atomic-openshift-clients-3.3.0.19-1.git.0.93380aa.el7.x86_64
atomic-openshift-master-3.3.0.19-1.git.0.93380aa.el7.x86_64
tuned-profiles-atomic-openshift-node-3.3.0.19-1.git.0.93380aa.el7.x86_64
atomic-openshift-node-3.3.0.19-1.git.0.93380aa.el7.x86_64
atomic-openshift-sdn-ovs-3.3.0.19-1.git.0.93380aa.el7.x86_64
atomic-openshift-tests-3.3.0.19-1.git.0.93380aa.el7.x86_64

PR https://github.com/openshift/openshift-ansible/pull/2287 is already contained in openshift-ansible-3.3.10-1. However, this problem persists.

Comment 20 Qixuan Wang 2016-08-15 09:26:34 UTC

Created attachment 1190833 [details]
08-15-master-config.yaml

masterClients:
  externalKubernetesClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    contentType: application/vnd.kubernetes.protobuf
    burst: 400
    qps: 200
  externalKubernetesKubeConfig: ""
  openshiftLoopbackClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    contentType: application/vnd.kubernetes.protobuf
    burst: 600
    qps: 300

Comment 21 Qixuan Wang 2016-08-15 09:28:00 UTC

Created attachment 1190834 [details]
08-16-node-config.yaml

masterClientConnectionOverrides:
  acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
  contentType: application/vnd.kubernetes.protobuf
  burst: 200
  qps: 100

Comment 22 Qixuan Wang 2016-08-15 09:36:11 UTC

Created attachment 1190836 [details]
08-15-api-metrics

Comment 23 Qixuan Wang 2016-08-15 09:36:58 UTC

Created attachment 1190837 [details]
08-15-controller-metrics

Comment 24 Qixuan Wang 2016-08-15 10:15:10 UTC

I'm very sorry, please ignore comments 19-23. I configured master-config.yaml manually with "ClusterResourceQuota" enabled, and the problem did not occur. Thanks.

Package version:
openshift-ansible-3.3.10-1.git.0.7060379.el7.noarch.rpm
openshift v3.3.0.19
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

[root@dhcp-141-95 qwang]# oc describe clusterresourcequota crq; date
Name:		crq
Namespace:	<none>
Created:	About an hour ago
Labels:		<none>
Annotations:	<none>
Label Selector: user=dev
AnnotationSelector: map[]
Resource	Used	Hard
--------	----	----
pods		0	2
secrets		9	10
services	0	2


Mon Aug 15 18:08:17 CST 2016
[root@dhcp-141-95 qwang]# oc create -f multi-portsvc.json; date
service "multi-portsvc-2" created
Mon Aug 15 18:08:30 CST 2016
[root@dhcp-141-95 qwang]# oc describe clusterresourcequota crq; date
Name:		crq
Namespace:	<none>
Created:	About an hour ago
Labels:		<none>
Annotations:	<none>
Label Selector: user=dev
AnnotationSelector: map[]
Resource	Used	Hard
--------	----	----
pods		0	2
secrets		9	10
services	1	2


Mon Aug 15 18:08:35 CST 2016

Comment 26 errata-xmlrpc 2016-09-27 09:42:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933
