Bug 1364431
| Summary: | [platformmanagement_public_713] It takes too much time for counting resources usage by cluster quota | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Qixuan Wang <qixuan.wang> |
| Component: | Master | Assignee: | David Eads <deads> |
| Status: | CLOSED ERRATA | QA Contact: | Qixuan Wang <qixuan.wang> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.3.0 | CC: | aos-bugs, jforrest, jokerman, mifiedle, mmccomas, qixuan.wang, tdawson |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-27 09:42:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Qixuan Wang
2016-08-05 10:43:15 UTC
How big is the cluster and how many clusterresourcequotas are there? Also, can you provide a master log at loglevel=4?

This problem can't be reproduced in a non-HA environment but exists in HA (2master+2infra_node+2node+3etcd). Attached master-config.yaml. BTW, I wasn't able to capture any useful messages on a public HA environment. I'm going to set up a private env to get more info if needed.

Created attachment 1188619 [details]
master config

> I'm going to set up a private env to get more info if needed

I am going to need to see the controller logs (loglevel=4 please) to really have a reasonable starting point for investigation.

Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1364403#c7 and attachments.

I've created https://github.com/openshift/origin/pull/10307 to gather metrics for the clusterquota controllers. After it's taken a while, please collect

    curl -k https://controller-host-X:8444/metrics
    curl -k https://each-api-server:8443/metrics

You may have to run `oadm policy add-cluster-role-to-user cluster-admin system:anonymous` or attach cluster-admin certs to the curl requests to get at those endpoints. I'm working on getting a dev-ami.

https://github.com/openshift/origin/pull/10307 has merged, but the devami job keeps failing on yum problems. Once you have a build that contains it, please gather the metrics mentioned in comment-6.

Attached each apiserver and controller metrics. Are these what you want? Hope these help.

Created attachment 1190012 [details]
controller_1_metrics
Created attachment 1190013 [details]
controller_2_metrics
Created attachment 1190014 [details]
apiserver_1_metrics
Created attachment 1190015 [details]
apiserver_2_metrics
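
The /metrics endpoints requested above return samples in the Prometheus text format. A rough sketch of how a saved dump could be narrowed down to the series of interest follows. The function names are illustrative, and any metric name passed in is an assumption, since the thread does not list the names added by the pull request.

```python
import re

# One sample line of the Prometheus text format looks like:
#   metric_name{label="value"} 12.5
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)')


def parse_metrics(text):
    """Return (name, labels, value) tuples from the text of a /metrics dump."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, labels, raw_value = match.groups()
        try:
            value = float(raw_value)
        except ValueError:
            continue  # not a numeric sample, ignore it
        samples.append((name, labels or "", value))
    return samples


def filter_metrics(samples, needle):
    """Keep only the samples whose metric name contains `needle`."""
    return [sample for sample in samples if needle in sample[0]]
```

With the output of one of the curl commands saved to a file, `filter_metrics(parse_metrics(open(path).read()), "quota")` would list only the series whose name mentions quota; the substring to search for is a guess and may need adjusting.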

It's hitting ratelimiting. I'm considering my options.

Config problem. Opened https://github.com/openshift/openshift-ansible/pull/2287

To get immediate relief, edit master-config.yaml and change "ops:" to "qps:".

Installer fix merged.

*** Bug 1366740 has been marked as a duplicate of this bug. ***

Tested in HA environment (2master+2node+3etcd+1lbnfs)

Package version:
    openshift-ansible-3.3.10-1.git.0.7060379.el7.noarch.rpm
    openshift-ansible-docs-3.3.10-1.git.0.7060379.el7.noarch.rpm
    openshift-ansible-filter-plugins-3.3.10-1.git.0.7060379.el7.noarch.rpm
    openshift-ansible-lookup-plugins-3.3.10-1.git.0.7060379.el7.noarch.rpm
    openshift-ansible-playbooks-3.3.10-1.git.0.7060379.el7.noarch.rpm
    openshift-ansible-roles-3.3.10-1.git.0.7060379.el7.noarch.rpm

    atomic-openshift-3.3.0.19-1.git.0.93380aa.el7.x86_64
    atomic-openshift-clients-3.3.0.19-1.git.0.93380aa.el7.x86_64
    atomic-openshift-master-3.3.0.19-1.git.0.93380aa.el7.x86_64
    tuned-profiles-atomic-openshift-node-3.3.0.19-1.git.0.93380aa.el7.x86_64
    atomic-openshift-node-3.3.0.19-1.git.0.93380aa.el7.x86_64
    atomic-openshift-sdn-ovs-3.3.0.19-1.git.0.93380aa.el7.x86_64
    atomic-openshift-tests-3.3.0.19-1.git.0.93380aa.el7.x86_64

PR https://github.com/openshift/openshift-ansible/pull/2287 is already contained in openshift-ansible-3.3.10-1. However, this problem persists.

Created attachment 1190833 [details]
08-15-master-config.yaml
masterClients:
  externalKubernetesClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    contentType: application/vnd.kubernetes.protobuf
    burst: 400
    qps: 200
  externalKubernetesKubeConfig: ""
  openshiftLoopbackClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    contentType: application/vnd.kubernetes.protobuf
    burst: 600
    qps: 300
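
The workaround discussed in this thread is a one-key rename in master-config.yaml ("ops:" to "qps:"). The sketch below shows one way a stray key like that could be flagged using only the standard library. It assumes the connection override blocks carry only the four keys visible in the excerpt above and that the file keeps its usual YAML indentation. The function name and the key list are illustrative and not part of OpenShift.

```python
# Keys seen in the override blocks of the excerpt; the real schema may allow
# more, so treat this set as an assumption.
KNOWN_KEYS = {"acceptContentTypes", "contentType", "burst", "qps"}


def find_unknown_override_keys(config_text):
    """Return (line_number, key) pairs for unexpected keys found directly
    inside any block whose name ends with 'ConnectionOverrides'."""
    problems = []
    block_indent = None  # indentation of the override block we are inside
    for number, raw in enumerate(config_text.splitlines(), start=1):
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        key = stripped.split(":", 1)[0]
        # A line at the same or a shallower indent ends the current block.
        if block_indent is not None and indent <= block_indent:
            block_indent = None
        if block_indent is not None:
            if key not in KNOWN_KEYS:
                problems.append((number, key))
        elif key.endswith("ConnectionOverrides"):
            block_indent = indent
    return problems
```

Running it over the text of a config file returns an empty list when every override key is recognised, and a list such as `[(5, "ops")]` when a misspelled key is present on line 5.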
Created attachment 1190834 [details]
08-16-node-config.yaml
masterClientConnectionOverrides:
  acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
  contentType: application/vnd.kubernetes.protobuf
  burst: 200
  qps: 100
Created attachment 1190836 [details]
08-15-api-metrics
Created attachment 1190837 [details]
08-15-controller-metrics

I'm very sorry, please ignore Comment 19~23. I configured master-config.yaml manually with "ClusterResourceQuota" enabled and didn't have this problem. Thanks.

Package version:
    openshift-ansible-3.3.10-1.git.0.7060379.el7.noarch.rpm

    openshift v3.3.0.19
    kubernetes v1.3.0+507d3a7
    etcd 2.3.0+git

[root@dhcp-141-95 qwang]# oc describe clusterresourcequota crq; date
Name:               crq
Namespace:          <none>
Created:            About an hour ago
Labels:             <none>
Annotations:        <none>
Label Selector:     user=dev
AnnotationSelector: map[]
Resource  Used  Hard
--------  ----  ----
pods      0     2
secrets   9     10
services  0     2
Mon Aug 15 18:08:17 CST 2016

[root@dhcp-141-95 qwang]# oc create -f multi-portsvc.json; date
service "multi-portsvc-2" created
Mon Aug 15 18:08:30 CST 2016

[root@dhcp-141-95 qwang]# oc describe clusterresourcequota crq; date
Name:               crq
Namespace:          <none>
Created:            About an hour ago
Labels:             <none>
Annotations:        <none>
Label Selector:     user=dev
AnnotationSelector: map[]
Resource  Used  Hard
--------  ----  ----
pods      0     2
secrets   9     10
services  1     2
Mon Aug 15 18:08:35 CST 2016

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933
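
In the verification session above, the services count moves from 0 to 1 within about five seconds of the create command, judging by the `date` output. The same check could be timed automatically along the lines of the sketch below. It assumes the Resource/Used/Hard rows are whitespace-separated as in that output. The helper names, the timeout and the polling interval are illustrative choices.

```python
import time


def parse_used(describe_output, resource):
    """Return the Used count for `resource` from the text printed by
    `oc describe clusterresourcequota`, or None if the row is absent."""
    for line in describe_output.splitlines():
        fields = line.split()
        # Data rows have exactly three fields: resource, used, hard.
        if len(fields) == 3 and fields[0] == resource and fields[1].isdigit():
            return int(fields[1])
    return None


def seconds_until_used(fetch, resource, expected, timeout=60.0, interval=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll `fetch()` until the Used count equals `expected`.

    `fetch` is any callable returning the describe text. Returns the elapsed
    seconds, or None if `timeout` passes first."""
    start = clock()
    while True:
        if parse_used(fetch(), resource) == expected:
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Against a live cluster, `fetch` could wrap `subprocess.run(["oc", "describe", "clusterresourcequota", "crq"], capture_output=True, text=True, check=True).stdout`; keeping it injectable lets the timing logic be exercised without a cluster.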