Description of problem:
When looking at logs of an unhealthy API server, I noticed it appears to be hitting /healthz instead of /livez for liveness probes.

Mar 15 03:01:38.268055 ip-10-0-146-200 hyperkube[1442]: I0315 03:01:38.268073 1442 prober.go:117] Liveness probe for "kube-apiserver-ip-10-0-146-200.us-west-1.compute.internal_openshift-kube-apiserver(22bd459e-677c-40ce-a715-9a263b663b2b):kube-apiserver" failed (failure): Get "https://10.0.146.200:6443/healthz": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

See also must-gather here:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.8/1371290082278903808/artifacts/e2e-aws/

Version-Release number of selected component (if applicable):
4.8? possibly more
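To check which endpoint the kubelet is probing across a larger log, the quoted URL can be pulled out of prober failure lines like the one above. This is only a rough sketch: the helper name probe_path and the regular expression are illustrative assumptions based on the message shape seen in this report, not a stable kubelet log format.

```python
import re
from urllib.parse import urlsplit

# Assumed message shape, taken from the lines quoted in this report:
#   <Kind> probe ... Get "<url>": <error>
# The kubelet does not guarantee this format; adjust the pattern as needed.
PROBE_RE = re.compile(
    r'(?P<kind>Liveness|Readiness|Startup) probe .*?Get "(?P<url>https?://[^"]+)"'
)


def probe_path(line):
    """Return (probe kind, URL path) for a probe failure line, or None."""
    match = PROBE_RE.search(line)
    if match is None:
        return None
    # urlsplit separates host:port from the path, e.g. "/healthz".
    return match.group("kind").lower(), urlsplit(match.group("url")).path
```

Calling probe_path on each line of a kubelet journal and counting the results would show whether any liveness probe still targets /healthz.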
livez and healthz are slightly different, but I don't think the distinction is meaningful enough in our configuration to justify a backport.
huh, I can't seem to set targets. fun.
Verified in 4.8.0-0.nightly-2021-04-06-162113:

$ oc get po -n openshift-kube-apiserver kube-apiserver-ci-ln-qfw7ms2-002ac-p5jzx-master-0 -o yaml
...
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: livez
...
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: readyz
...

Liveness probe now uses "livez".

Run crictl stop with the kube-apiserver container on master, then:

$ oc describe po -n openshift-kube-apiserver kube-apiserver-ci-ln-qfw7ms2-002ac-p5jzx-master-0
...
  Warning  Unhealthy   6m14s (x2 over 6m24s)  kubelet  Liveness probe failed: Get "https://10.0.0.6:6443/livez": dial tcp 10.0.0.6:6443: connect: connection refused
  Warning  ProbeError  6m14s (x2 over 6m24s)  kubelet  Readiness probe error: Get "https://10.0.0.6:6443/readyz": dial tcp 10.0.0.6:6443: connect: connection refused
...

Same, liveness probe now uses "livez".
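The manual check of the pod YAML above could also be scripted. The following is a minimal sketch, assuming the pod has been exported with -o json rather than -o yaml and loaded into a dict; the function name probe_paths is hypothetical and not part of any OpenShift tooling.

```python
def probe_paths(pod, container_name):
    """Return {probe field: httpGet path} for one container of a pod object.

    `pod` is the dict obtained by parsing the output of `oc get po ... -o json`.
    Probes that are absent, or that do not use httpGet, are skipped.
    """
    for container in pod.get("spec", {}).get("containers", []):
        if container.get("name") != container_name:
            continue
        paths = {}
        for field in ("livenessProbe", "readinessProbe"):
            path = container.get(field, {}).get("httpGet", {}).get("path")
            if path is not None:
                paths[field] = path
        return paths
    # No container with that name in the pod spec.
    return {}
```

With the pod from this comment parsed via json.load, probe_paths(pod, "kube-apiserver") would be expected to report livez for the liveness probe and readyz for the readiness probe.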
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438