Bug 1363641
| Summary: | Cannot decrease current CPU rate via HorizontalPodAutoscaler. | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Zhang Cheng <chezhang> |
| Component: | Hawkular | Assignee: | Matt Wringe <mwringe> |
| Status: | CLOSED ERRATA | QA Contact: | chunchen <chunchen> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.3.0 | CC: | aos-bugs, chezhang, dma, mwringe, tdawson, wmeng, wsun |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-27 09:42:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Zhang Cheng, 2016-08-03 09:20:50 UTC)
Can you attach the OpenShift node logs? Is there anything odd in there?

I also collected the master and node logs for your debugging. They cannot be attached here since the size exceeds the limit; I shared them with you via Google Drive, you can find them at https://drive.google.com/drive/shared-with-me

How many nodes were you running? Which node were the pods that had no metrics available running on? Were they all running on the same node, and were the ones that did have metrics available running on a different node? Also, what log level were you logging at?

This bug is easy to recreate by following the steps mentioned at the beginning of this page. You can try to reproduce it in your local environment; please let me know if you need any support.

OK, so, a bit more digging. Can you try raising your collection interval in the Heapster RC? Set `-metrics_resolution` to `20s` instead of `10s`, and then kill the Heapster pod (so it gets restarted with the new collection interval). It's possible that what we're seeing is that, since the collection interval for cAdvisor is 10s and the Heapster collection interval is also 10s, Heapster is managing to collect two samples from the same interval. This would cause the rate calculator not to fire, which would cause the "no metrics" error (which actually means "couldn't find memory/usage or cpu/usage_rate").

Hi Solly, the HPA can fetch metrics successfully after changing `-metrics_resolution` to `20s` and restarting the Heapster pod manually.
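The failure mode described above can be sketched in a few lines: CPU usage is reported as a cumulative counter, so a usage *rate* needs two samples from different points in time. If Heapster scrapes twice within the same cAdvisor sample window, it sees the same sample twice and no rate can be derived. This is only an illustration under those assumptions, not Heapster's actual rate calculator; the function name is hypothetical.

```python
def cpu_usage_rate(samples):
    """Derive a CPU usage rate from (timestamp, cumulative_usage) samples.

    Returns None when the samples span no time, mirroring the
    "no metrics" / missing cpu/usage_rate condition reported by the HPA.
    (Hypothetical sketch, not Heapster's real implementation.)
    """
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    if t1 == t0:  # both scrapes landed on the same cAdvisor sample
        return None
    return (u1 - u0) / (t1 - t0)

# Two scrapes that both read the same 10s cAdvisor sample carry
# identical data, so no rate can be computed:
assert cpu_usage_rate([(100, 5000), (100, 5000)]) is None
# Scrapes from distinct sample windows yield a rate:
assert cpu_usage_rate([(100, 5000), (115, 8000)]) == 200.0
```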
I used all defaults while setting up metrics; below are my steps:

oc project openshift-infra
oc create -f https://raw.githubusercontent.com/openshift/origin-metrics/master/metrics-deployer-setup.yaml
oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster
oc policy add-role-to-user edit system:serviceaccount:openshift-infra:metrics-deployer
oc secrets new metrics-deployer nothing=/dev/null
oc process openshift//metrics-deployer-template -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.0816-lfr.qe.rhcloud.com,IMAGE_PREFIX=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/,IMAGE_VERSION=latest,USE_PERSISTENT_STORAGE=false,MASTER_URL=https://host-8-173-175.host.centralci.eng.rdu2.redhat.com:8443,CASSANDRA_PV_SIZE=10Gi | oc create -f -

In this case, can you change the default value of `-metrics_resolution` in the metrics-deployer-template?

mwringe@: I think we should update the default collection interval to be higher than 10s (not sure it needs to be a full 20s, but I think it at least needs to be more than 10s to make sure we always get a sample from a different cAdvisor sample period). Reassigning to mwringe@ to deal with the origin-metrics template change.

The metric resolution has now been updated to 15s.

Test passed and verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933
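The choice of 15s can be sanity-checked arithmetically: when each scrape reads the latest sample from 10s-wide cAdvisor windows, two scrapes nominally 10s apart can collapse into the same window under small timing jitter, while a scrape interval comfortably above 10s always advances to a new window. A minimal sketch of that arithmetic (the helper name is hypothetical, and this assumes idealized fixed-width sample windows):

```python
def latest_sample_index(t, sample_interval=10):
    """Index of the most recent sample window available at time t,
    assuming idealized fixed-width cAdvisor sample windows."""
    return int(t // sample_interval)

# With a nominal 10s scrape interval, a little jitter lets two
# consecutive scrapes read the same 10s cAdvisor sample:
assert latest_sample_index(10.2) == latest_sample_index(19.8)
# A 15s interval tolerates similar jitter and still reaches a
# fresh sample window on every scrape:
assert latest_sample_index(10.2) != latest_sample_index(10.2 + 14.6)
```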