Bug 1582287
| Summary: | API container responds with HTTP 500 to a liveness check and gets restarted, breaking | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Vadim Rutkovsky <vrutkovs> |
| Component: | kube-apiserver | Assignee: | Stefan Schimanski <sttts> |
| Status: | CLOSED DEFERRED | QA Contact: | Wang Haoran <haowang> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 3.10.0 | CC: | aos-bugs, jokerman, mfojtik, mmccomas |
| Target Milestone: | --- | | |
| Target Release: | 3.10.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-11-20 18:56:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 1441205 [details]
Full API logs
From the attached log, the probes seem to behave normally and do not randomly flip between 200 and 500:
https://gist.githubusercontent.com/mfojtik/9793a3c1312da6cd41fccd0789b531e7/raw/2ef8689b873e7fd149e6bab4e9b9b1ae61866d43/gistfile1.txt

This is the record of the livenessProbe from the log:
https://gist.github.com/mfojtik/bb8707b1498a99be96c98818e7e4d859

That looks normal to me as well. Moving this off the blocker list for now. If you are able to replicate this again or prove the sporadic behavior of the healthz endpoint, please set this BZ back to target release 3.10.0, as it will be a delivery blocker in that case.

OCP 3.6-3.10 is no longer on full support [1]. Marking CLOSED DEFERRED. If you have a customer case with a support exception or have reproduced on 3.11+, please reopen and include those details. When reopening, please set the Target Release to the appropriate version where needed.

[1]: https://access.redhat.com/support/policy/updates/openshift
Created attachment 1441178 [details]
Log excerpt

Description of problem:
The API container's /healthz endpoint randomly returns HTTP 500, causing openshift-ansible to throw errors about API timeouts or missing objects.

Version-Release number of selected component (if applicable):
v1.10.0+b81c8f8

How reproducible:
~50% of installs

Steps to Reproduce:
1. Set up Origin using the latest openshift-ansible
2. Run `/usr/local/bin/master-logs api api`

Actual results:
The API container logs contain a golang traceback on some /healthz requests, see attachment.

Expected results:
No tracebacks, all liveness checks pass.

Additional info:
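
One way to gather evidence for the sporadic behavior requested in the comments is to poll the endpoint repeatedly and count the status codes returned. The following is a minimal sketch, not part of the original report: the function names `fetch_status` and `tally_healthz` are hypothetical, and disabling certificate verification assumes a test cluster with a self-signed certificate.

```python
import collections
import ssl
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for a GET on url, or None if the connection fails."""
    # Assumption: a test cluster with a self-signed certificate, so
    # certificate verification is disabled. Do not do this in production.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses such as HTTP 500 are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure, and similar.
        return None


def tally_healthz(url, attempts=60, interval=1.0, fetch=fetch_status, sleep=time.sleep):
    """Poll url `attempts` times and return a Counter of observed status codes."""
    counts = collections.Counter()
    for i in range(attempts):
        counts[fetch(url)] += 1
        if i + 1 < attempts:
            sleep(interval)
    return counts
```

For example, `tally_healthz("https://localhost:8443/healthz")` would poll a master for about a minute; the host and port here are assumptions and should be replaced with the actual API address. A result containing both 200 and 500 would support the claim that the endpoint flips between states.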