Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1582287

Summary: API container responds with HTTP 500 to a liveness check and gets restarted, breaking
Product: OpenShift Container Platform
Reporter: Vadim Rutkovsky <vrutkovs>
Component: kube-apiserver
Assignee: Stefan Schimanski <sttts>
Status: CLOSED DEFERRED
QA Contact: Wang Haoran <haowang>
Severity: low
Docs Contact:
Priority: low
Version: 3.10.0
CC: aos-bugs, jokerman, mfojtik, mmccomas
Target Milestone: ---
Target Release: 3.10.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-20 18:56:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Log excerpt (flags: none)
Full API logs (flags: none)

Description Vadim Rutkovsky 2018-05-24 18:06:58 UTC
Created attachment 1441178
Log excerpt

Description of problem:
The API container's /healthz endpoint randomly returns HTTP 500, causing openshift-ansible to throw errors about API timeouts or missing objects.

Version-Release number of selected component (if applicable):
v1.10.0+b81c8f8

How reproducible:
~50% of installs

Steps to Reproduce:
1. Set up Origin using the latest openshift-ansible
2. Run `/usr/local/bin/master-logs api api`

Actual results:
The API container logs contain a Go traceback on some /healthz requests; see the attachment.

Expected results:
No tracebacks, all liveness checks pass

Additional info:

Comment 1 Vadim Rutkovsky 2018-05-24 18:41:28 UTC
Created attachment 1441205
Full API logs

Comment 2 Michal Fojtik 2018-05-25 10:52:10 UTC
From the attached log, the probes seem to behave normally, not randomly flipping between 200 and 500:

https://gist.githubusercontent.com/mfojtik/9793a3c1312da6cd41fccd0789b531e7/raw/2ef8689b873e7fd149e6bab4e9b9b1ae61866d43/gistfile1.txt

Comment 3 Michal Fojtik 2018-05-25 11:11:08 UTC
This is the record of the livenessProbe from the log:

https://gist.github.com/mfojtik/bb8707b1498a99be96c98818e7e4d859

Seems normal to me as well. Moving this off the blocker list for now. If you are able to replicate this again or prove the sporadic behavior of the healthz endpoint, please set this BZ back to target release 3.10.0, as it will be a delivery blocker in that case.
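
One possible way to gather evidence of sporadic behavior is to poll /healthz at a fixed interval and tally the status codes. The following is only a rough sketch using the Python standard library; the function names (probe_once, tally_probes, is_sporadic) and the default count and interval are made up for illustration and are not part of OpenShift or openshift-ansible.

```python
import collections
import time
import urllib.error
import urllib.request


def probe_once(url, timeout=5.0, context=None):
    """Return the HTTP status of one GET, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (such as the HTTP 500 reported here) land in this branch.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure: no status code at all.
        return None


def tally_probes(url, count=60, interval=1.0, fetch=probe_once):
    """Probe `url` `count` times and count how often each status was seen."""
    counts = collections.Counter()
    for i in range(count):
        counts[fetch(url)] += 1
        if i < count - 1:
            time.sleep(interval)
    return counts


def is_sporadic(counts):
    """True when both HTTP 200 and at least one other result were observed."""
    ok = counts.get(200, 0)
    return ok > 0 and sum(counts.values()) > ok
```

For example, tally_probes("https://master.example.com:8443/healthz", count=120) would probe a placeholder master URL for about two minutes; the host and port are assumptions and must be replaced with the real API address. If the master uses a self-signed certificate, an ssl.SSLContext can be passed through a custom fetch function. A result containing both 200 and 500 entries, with timestamps from the API log alongside, would support the claim that the endpoint flips between states.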

Comment 4 Stephen Cuppett 2019-11-20 18:56:02 UTC
OCP 3.6-3.10 is no longer on full support [1]. Marking CLOSED DEFERRED. If you have a customer case with a support exception or have reproduced on 3.11+, please reopen and include those details. When reopening, please set the Target Release to the appropriate version where needed.

[1]: https://access.redhat.com/support/policy/updates/openshift