Bug 1939227

Summary: kube-apiserver liveness probe appears to be hitting /healthz, not /livez
Product: OpenShift Container Platform Reporter: Elana Hashman <ehashman>
Component: kube-apiserver Assignee: David Eads <deads>
Status: CLOSED ERRATA QA Contact: Xingxing Xia <xxia>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.8 CC: aos-bugs, deads, kewang, mfojtik, wking, xxia
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-27 22:53:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Elana Hashman 2021-03-15 18:51:58 UTC
Description of problem:

While looking at the logs of an unhealthy API server, I noticed that the kubelet's liveness probe for it appears to be hitting /healthz instead of /livez.

Mar 15 03:01:38.268055 ip-10-0-146-200 hyperkube[1442]: I0315 03:01:38.268073    1442 prober.go:117] Liveness probe for "kube-apiserver-ip-10-0-146-200.us-west-1.compute.internal_openshift-kube-apiserver(22bd459e-677c-40ce-a715-9a263b663b2b):kube-apiserver" failed (failure): Get "https://10.0.146.200:6443/healthz": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
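
To check which endpoint a probe actually hit across many such journal lines, the probe kind and URL path can be pulled out with a regular expression. The following is a minimal sketch, not part of the original report: the function name probe_target and the pattern are illustrative assumptions, and the pattern only covers the prober.go message shape shown in the line above.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern: matches the probe kind and the quoted URL in a kubelet
# prober.go failure line of the shape quoted in this report.
PROBE_FAILURE = re.compile(
    r'(Liveness|Readiness|Startup) probe for "[^"]*" failed \(failure\): Get "([^"]+)"'
)


def probe_target(line):
    """Return (probe_kind, url_path) for a probe-failure line, or None."""
    match = PROBE_FAILURE.search(line)
    if match is None:
        return None
    kind, url = match.groups()
    return kind.lower(), urlsplit(url).path


# The journal line from this report, split across string literals for width.
LINE = (
    'Mar 15 03:01:38.268055 ip-10-0-146-200 hyperkube[1442]: I0315 03:01:38.268073    '
    '1442 prober.go:117] Liveness probe for '
    '"kube-apiserver-ip-10-0-146-200.us-west-1.compute.internal_openshift-kube-apiserver'
    '(22bd459e-677c-40ce-a715-9a263b663b2b):kube-apiserver" failed (failure): '
    'Get "https://10.0.146.200:6443/healthz": net/http: request canceled '
    '(Client.Timeout exceeded while awaiting headers)'
)

print(probe_target(LINE))
```

For the line above this reports a liveness probe against /healthz, which is the mismatch described here.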


See also must-gather here: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.8/1371290082278903808/artifacts/e2e-aws/


Version-Release number of selected component (if applicable): 4.8? possibly more



Comment 2 David Eads 2021-03-16 19:28:20 UTC
livez and healthz are slightly different, but I don't think the distinction is meaningful enough in our configuration to warrant a backport.

Comment 3 David Eads 2021-03-16 19:29:04 UTC
huh, I can't seem to set targets.  fun.

Comment 5 Xingxing Xia 2021-04-07 10:27:11 UTC
Verified in 4.8.0-0.nightly-2021-04-06-162113:
$ oc get po -n openshift-kube-apiserver kube-apiserver-ci-ln-qfw7ms2-002ac-p5jzx-master-0 -o yaml
...
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: livez
...
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: readyz
...

Liveness probe now uses "livez".
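
The same check can be scripted rather than read off the YAML by eye. This is a rough sketch under stated assumptions: it expects the pod as a dictionary parsed from JSON, the helper name probe_paths is made up for illustration, and the sample document is a trimmed, hypothetical stand-in that carries only the fields shown in the output above.

```python
import json


def probe_paths(pod):
    """Map each container name to its liveness and readiness httpGet paths."""
    paths = {}
    for container in pod.get("spec", {}).get("containers", []):
        entry = {}
        for probe in ("livenessProbe", "readinessProbe"):
            http_get = container.get(probe, {}).get("httpGet", {})
            if "path" in http_get:
                entry[probe] = http_get["path"]
        paths[container["name"]] = entry
    return paths


# Hypothetical, heavily trimmed pod document; a real one has many more fields.
sample = json.loads("""
{
  "spec": {
    "containers": [
      {
        "name": "kube-apiserver",
        "livenessProbe": {"failureThreshold": 3, "httpGet": {"path": "livez"}},
        "readinessProbe": {"failureThreshold": 3, "httpGet": {"path": "readyz"}}
      }
    ]
  }
}
""")

print(probe_paths(sample))
```

In practice the dictionary would come from json.load on the output of the same oc get command run with -o json in place of -o yaml.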

Stop the kube-apiserver container on a master node with crictl stop, then:
$ oc describe po -n openshift-kube-apiserver kube-apiserver-ci-ln-qfw7ms2-002ac-p5jzx-master-0
...
  Warning  Unhealthy   6m14s (x2 over 6m24s)  kubelet  Liveness probe failed: Get "https://10.0.0.6:6443/livez": dial tcp 10.0.0.6:6443: connect: connection refused
  Warning  ProbeError  6m14s (x2 over 6m24s)  kubelet  Readiness probe error: Get "https://10.0.0.6:6443/readyz": dial tcp 10.0.0.6:6443: connect: connection refused
...

The events show the same: the liveness probe now uses "livez".

Comment 8 errata-xmlrpc 2021-07-27 22:53:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438