Bug 1628961
| Summary: | [free-stg] upgrade failed due to '/apis/metrics.k8s.io/v1beta1' -> "Error from server (ServiceUnavailable): the server is currently unable to handle the request" | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Justin Pierce <jupierce> | ||||||
| Component: | Cluster Version Operator | Assignee: | Scott Dodson <sdodson> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Weinan Liu <weinliu> | ||||||
| Severity: | unspecified | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 3.11.0 | CC: | aos-bugs, deads, jokerman, juzhao, maszulik, mfojtik, mmccomas, weinliu, wmeng, wsun, xtian | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | 3.11.0 | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2018-11-26 15:51:58 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
Created attachment 1483342 [details]
master api log excerpt
This is neat. The "listings from apiservice / pods" attachment indicates that the metrics pod is running, but the core-apiserver cannot find a route to the service IP.

Further investigation found that the controller pods were in crash loop backoff:

F0914 14:21:01.166381 1 plugins.go:136] Could not create hostpath recycler pod from file /etc/origin/master/recycler_pod.yaml: failed to read file path /etc/origin/master/recycler_pod.yaml: open /etc/origin/master/recycler_pod.yaml: no such file or directory

This was traced to an openshift-ansible problem. At least in this specific environment (free-stg), the controllers were crashing due to a misconfiguration introduced in last night's build. Deploying the recycler pod definition and restarting the controllers seems to have cleared up the metrics server API failure. I imagine the controllers dying prevented the network from properly converging after the control plane was restarted.

https://github.com/openshift/openshift-ansible/pull/10082 to fix that

I'm taking this bug for the Upgrade component and verifying in my environment.

https://github.com/openshift/openshift-ansible/pull/10089 release-3.11 pick

PR 10089 has been merged into openshift-ansible-3.11.7-1.
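The crash loop reported in the comments above came down to a single missing file on the master. A minimal sketch of a preflight check for that condition follows. The helper name `missing_files` and the list `REQUIRED_MASTER_FILES` are hypothetical and are not part of openshift-ansible. Only the path itself is taken from the fatal log line.

```python
from pathlib import Path

# Path copied from the fatal controller log line; the list name is an
# illustrative assumption, not an openshift-ansible variable.
REQUIRED_MASTER_FILES = ["/etc/origin/master/recycler_pod.yaml"]


def missing_files(paths):
    """Return the paths (as strings) that do not exist as regular files."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]
```

Calling `missing_files(REQUIRED_MASTER_FILES)` on a master before restarting the controllers would return an empty list when the recycler pod definition is in place, and the missing path otherwise.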
Created attachment 1483339 [details]
listings from apiservice / pods

Description of problem:
During an openshift-ansible based upgrade from 3.11.0-0.32.0 -> 3.11.3, the upgrade stopped after trying and failing for several minutes to get metrics:

fatal: [free-stg-master-03fb6]: FAILED! => {"attempts": 30, "changed": true, "cmd": ["oc", "get", "--raw", "/apis/metrics.k8s.io/v1beta1"], "delta": "0:00:00.358927", "end": "2018-09-14 13:26:36.640073", "msg": "non-zero return code", "rc": 1, "start": "2018-09-14 13:26:36.281146", "stderr": "Error from server (ServiceUnavailable): the server is currently unable to handle the request", "stderr_lines": ["Error from server (ServiceUnavailable): the server is currently unable to handle the request"], "stdout": "", "stdout_lines": []}

Version-Release number of selected component (if applicable):
v3.11.3

Steps to Reproduce:
1. Error occurred running openshift-ansible upgrade from 3.11.0-0.32.0 -> 3.11.3

Additional info:
See attachments for detailed listings
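The failing task retried `oc get --raw /apis/metrics.k8s.io/v1beta1` 30 times before giving up. The same check can be sketched as a small retry loop for reproducing the failure by hand. This is an illustration only, not the openshift-ansible task: the function name `wait_for_api`, the `delay` default, and the injectable `run`/`sleep` parameters are assumptions. Only the command and the attempt count come from the failure output.

```python
import subprocess
import time


def wait_for_api(path, attempts=30, delay=10.0,
                 run=subprocess.run, sleep=time.sleep):
    """Poll `oc get --raw <path>` until it succeeds.

    Returns the attempt number that succeeded, or raises RuntimeError
    after `attempts` failures. `delay` (seconds) is an assumed value;
    the bug report does not state the interval between retries.
    """
    cmd = ["oc", "get", "--raw", path]
    last_err = ""
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        last_err = result.stderr.strip()
        if attempt < attempts:
            sleep(delay)  # no sleep after the final failed attempt
    raise RuntimeError(
        f"{' '.join(cmd)} failed after {attempts} attempts: {last_err}"
    )
```

Running `wait_for_api("/apis/metrics.k8s.io/v1beta1")` on a master with a logged-in `oc` client would raise with the ServiceUnavailable message while the aggregated API is unreachable, and return once it responds.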