Bug 2066886
| Summary: | openshift-apiserver pods never going NotReady | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Devan Goodwin <dgoodwin> |
| Component: | openshift-apiserver | Assignee: | Abu Kashem <akashem> |
| Status: | CLOSED ERRATA | QA Contact: | Rahul Gangwar <rgangwar> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.11 | CC: | akashem, mfojtik, sanchezl |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2109235 (view as bug list) | Environment: | |
| Last Closed: | 2022-08-10 10:55:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2109235 | | |
Description
Devan Goodwin
2022-03-22 16:54:54 UTC
Readiness probe with default values (from the Pod object):

```yaml
readinessProbe:
  failureThreshold: 10
  httpGet:
    path: healthz
    port: 8443
    scheme: HTTPS
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
```

- The kubelet readiness check should probe the '/readyz' endpoint, not '/healthz': https://github.com/openshift/cluster-openshift-apiserver-operator/blob/master/bindata/v3.11.0/openshift-apiserver/deploy.yaml#L128
- With the defaults 'periodSeconds=10s' and 'failureThreshold=10', it takes 100s for the kubelet to set ready=false. Once the pod is patched with 'ready=false', the endpoints controller rotates the Pod IP out of the Service. If we set 'failureThreshold' to '1', the kubelet takes 10s in the worst case to set ready=false.
- Do we need a startup probe?

We also have these related settings:

```yaml
shutdown-delay-duration:
  - 10s # give SDN some time to converge
shutdown-send-retry-after:
  - "true"
```

https://github.com/openshift/cluster-openshift-apiserver-operator/blob/master/bindata/v3.11.0/config/defaultconfig.yaml#L17-L20

Verification: in the openshift-apiserver namespace, run `oc get -n openshift-apiserver pods -w -ojson`, and in another window delete one of the openshift-apiserver pods. The pod's container status before deletion indicated readiness=false.

```
oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-27-140854   True        False         23m     Cluster version is 4.11.0-0.nightly-2022-03-27-140854

oc get pods -n openshift-apiserver
NAME                         READY   STATUS    RESTARTS   AGE
apiserver-7dd8d5695f-662cj   2/2     Running   0          34m
apiserver-7dd8d5695f-cshxc   2/2     Running   0          36m
apiserver-7dd8d5695f-mgb5f   2/2     Running   0          36m
```

In one terminal we ran a watcher with the command below, and in another we deleted one OAS pod with `oc delete pod/apiserver-7dd8d5695f-cshxc -n openshift-apiserver`. Before the deletion completed, the pod status indicated readiness=false.
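As a sketch of the proposed fix, assuming it is exactly the '/readyz' endpoint plus 'failureThreshold: 1' discussed above (the actual merged manifest may differ), the probe would look like:

```yaml
# Hypothetical corrected probe based on the discussion above;
# not the verbatim merged change.
readinessProbe:
  failureThreshold: 1      # flip ready=false after a single failed probe (~10s worst case)
  httpGet:
    path: /readyz          # readiness endpoint, not /healthz
    port: 8443
    scheme: HTTPS
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
```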
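The latency figures above are just `periodSeconds × failureThreshold`; a small illustrative calculation (a standalone sketch, not part of the operator code):

```python
def worst_case_not_ready_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Approximate worst-case time for the kubelet to set ready=false:
    the container must fail `failure_threshold` consecutive probes,
    spaced `period_seconds` apart."""
    return period_seconds * failure_threshold

# Defaults quoted from the Pod object above: 100s before NotReady.
print(worst_case_not_ready_seconds(10, 10))  # → 100
# Proposed failureThreshold=1: at most one period, 10s.
print(worst_case_not_ready_seconds(10, 1))   # → 10
```

This ignores the probe's own `timeoutSeconds` and scheduling jitter, which can add a few seconds on top.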
```
oc get -n openshift-apiserver pods -w -ojson
```

First watch event, while the replacement pod is still unschedulable:

```json
"status": {
  "conditions": [
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2022-03-29T06:27:52Z",
      "message": "0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) didn't match pod anti-affinity rules.",
      "reason": "Unschedulable",
      "status": "False",
      "type": "PodScheduled"
    }
  ],
  "phase": "Pending",
  "qosClass": "Burstable"
}
```

Once scheduled, the pod reports NotReady:

```json
"status": {
  "conditions": [
    { "lastProbeTime": null, "lastTransitionTime": "2022-03-29T06:44:35Z", "status": "True", "type": "Initialized" },
    { "lastProbeTime": null, "lastTransitionTime": "2022-03-29T06:44:32Z", "message": "containers with unready status: [openshift-apiserver]", "reason": "ContainersNotReady", "status": "False", "type": "Ready" },
    { "lastProbeTime": null, "lastTransitionTime": "2022-03-29T06:44:32Z", "message": "containers with unready status: [openshift-apiserver]", "reason": "ContainersNotReady", "status": "False", "type": "ContainersReady" },
    { "lastProbeTime": null, "lastTransitionTime": "2022-03-29T06:44:32Z", "status": "True", "type": "PodScheduled" }
  ],
  "containerStatuses": [
    {
      "containerID": "cri-o://2c9a6c7a9d97f8b774f62c0763eb92b9bdeda3bf71a26d322b5115706bbb64ab",
      "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9be21be1b69f360f0d79a2c1ec6cac7d3519d80e76f46f7f1c02868bfffcda21",
      "imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9be21be1b69f360f0d79a2c1ec6cac7d3519d80e76f46f7f1c02868bfffcda21",
      "lastState": {},
      "name": "openshift-apiserver",
      "ready": false,
      "restartCount": 0,
      "started": true,
      "state": { "running": { "startedAt": "2022-03-29T06:44:35Z" } }
    },
```

The termination grace period is 90s, as in the PR.
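The watch output above can also be checked mechanically. A small sketch (a standalone illustration, not part of the bug's tooling) that pulls the `Ready` condition and per-container readiness out of a pod `status` object like the one shown:

```python
import json

def readiness_summary(pod_status_json: str):
    """Return (ready_condition_status, {container_name: ready})
    from a pod 'status' object serialized as JSON."""
    status = json.loads(pod_status_json)
    ready = next((c["status"] for c in status.get("conditions", [])
                  if c["type"] == "Ready"), None)
    containers = {c["name"]: c["ready"]
                  for c in status.get("containerStatuses", [])}
    return ready, containers

# Trimmed-down version of the NotReady status captured above.
sample = '''{
  "conditions": [
    {"type": "Ready", "status": "False", "reason": "ContainersNotReady"}
  ],
  "containerStatuses": [
    {"name": "openshift-apiserver", "ready": false}
  ]
}'''
print(readiness_summary(sample))  # → ('False', {'openshift-apiserver': False})
```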
Pod spec fields from the same object, confirming the grace period:

```json
"nodeName": "ip-10-0-205-119.us-east-2.compute.internal",
"nodeSelector": { "node-role.kubernetes.io/master": "" },
"preemptionPolicy": "PreemptLowerPriority",
"priority": 2000001000,
"priorityClassName": "system-node-critical",
"restartPolicy": "Always",
"schedulerName": "default-scheduler",
"securityContext": {},
"serviceAccount": "openshift-apiserver-sa",
"serviceAccountName": "openshift-apiserver-sa",
"terminationGracePeriodSeconds": 90,
"tolerations": [
```

After deletion and restart, the container status shows readiness: true:

```json
"status": {
  "conditions": [
    { "lastProbeTime": null, "lastTransitionTime": "2022-03-29T06:44:35Z", "status": "True", "type": "Initialized" },
    { "lastProbeTime": null, "lastTransitionTime": "2022-03-29T06:44:42Z", "status": "True", "type": "Ready" },
    { "lastProbeTime": null, "lastTransitionTime": "2022-03-29T06:44:42Z", "status": "True", "type": "ContainersReady" },
    { "lastProbeTime": null, "lastTransitionTime": "2022-03-29T06:44:32Z", "status": "True", "type": "PodScheduled" }
  ],
  "containerStatuses": [
    {
      "containerID": "cri-o://2c9a6c7a9d97f8b774f62c0763eb92b9bdeda3bf71a26d322b5115706bbb64ab",
      "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9be21be1b69f360f0d79a2c1ec6cac7d3519d80e76f46f7f1c02868bfffcda21",
      "imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9be21be1b69f360f0d79a2c1ec6cac7d3519d80e76f46f7f1c02868bfffcda21",
      "lastState": {},
      "name": "openshift-apiserver",
      "ready": true,
      "restartCount": 0,
      "started": true,
      "state": { "running": { "startedAt": "2022-03-29T06:44:35Z" } }
    },
    {
      "containerID": "cri-o://006f837b616086325026119c7b48e7ede4761f0d65396c97f1cdd7b67522707b",
      "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4ccd4c61eed4b6dca55f94aa7148766bc2e2ef3682a6607e71ce3ae6331997cb",
      "imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4ccd4c61eed4b6dca55f94aa7148766bc2e2ef3682a6607e71ce3ae6331997cb",
      "lastState": {},
      "name": "openshift-apiserver-check-endpoints",
      "ready": true,
      "restartCount": 0,
      "started": true,
      "state": { "running": { "startedAt": "2022-03-29T06:44:35Z" } }
    }
  ],
  "hostIP": "10.0.205.119",
  "initContainerStatuses": [
    {
      "containerID": "cri-o://04563c455ea5ed13ff43cad0bc345adea4c56206e193099303a723b8604cc107",
      "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9be21be1b69f360f0d79a2c1ec6cac7d3519d80e76f46f7f1c02868bfffcda21",
      "imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9be21be1b69f360f0d79a2c1ec6cac7d3519d80e76f46f7f1c02868bfffcda21",
      "lastState": {},
      "name": "fix-audit-permissions",
      "ready": true,
      "restartCount": 0,
      "state": {
        "terminated": {
          "containerID": "cri-o://04563c455ea5ed13ff43cad0bc345adea4c56206e193099303a723b8604cc107",
          "exitCode": 0,
          "finishedAt": "2022-03-29T06:44:34Z",
          "reason": "Completed",
          "startedAt": "2022-03-29T06:44:34Z"
        }
      }
    }
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069