Bug 2066886
| Summary: | openshift-apiserver pods never going NotReady | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Devan Goodwin <dgoodwin> |
| Component: | openshift-apiserver | Assignee: | Abu Kashem <akashem> |
| Status: | CLOSED ERRATA | QA Contact: | Rahul Gangwar <rgangwar> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.11 | CC: | akashem, mfojtik, sanchezl |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2109235 (view as bug list) | Environment: | |
| Last Closed: | 2022-08-10 10:55:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2109235 | | |
Description
Devan Goodwin
2022-03-22 16:54:54 UTC
Readiness probe with default values (from the Pod object):

> readinessProbe:
>   failureThreshold: 10
>   httpGet:
>     path: healthz
>     port: 8443
>     scheme: HTTPS
>   periodSeconds: 10
>   successThreshold: 1
>   timeoutSeconds: 1

- The kubelet readiness check should probe the '/readyz' endpoint, not '/healthz': https://github.com/openshift/cluster-openshift-apiserver-operator/blob/master/bindata/v3.11.0/openshift-apiserver/deploy.yaml#L128
- With the default 'periodSeconds=10s' and 'failureThreshold=10', it takes '100s' for the kubelet to set ready=false. Only once the pod is patched with 'ready=false' does the endpoints controller rotate the Pod IP out of the Service.
- We can set 'failureThreshold' to '1'; the kubelet will then take '10s' in the worst case to set ready=false.
- Do we need a startup probe?

We also have these related settings:

> shutdown-delay-duration:
> - 10s # give SDN some time to converge
> shutdown-send-retry-after:
> - "true"

https://github.com/openshift/cluster-openshift-apiserver-operator/blob/master/bindata/v3.11.0/config/defaultconfig.yaml#L17-L20

To check the behavior: in the openshift-apiserver namespace, we run `oc get -n openshift-apiserver pods -w -ojson` and then, in another window, delete one of the openshift-apiserver pods. The pod's container status before deletion indicated readiness=false.
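For reference, the probe actually in effect on a running pod, and the worst-case time for the kubelet to mark it unready, can be checked with something like the following sketch (the pod name is illustrative, taken from the pod listing further below):

    # Show the readiness probe currently configured on an openshift-apiserver pod.
    # Pod name is illustrative; pick any pod from 'oc get pods -n openshift-apiserver'.
    oc -n openshift-apiserver get pod apiserver-7dd8d5695f-662cj \
      -o jsonpath='{.spec.containers[?(@.name=="openshift-apiserver")].readinessProbe}'

    # Worst-case time for the kubelet to mark the container unready:
    #   periodSeconds x failureThreshold = 10 x 10 = 100s with the defaults above,
    #   versus 10 x 1 = 10s if failureThreshold were lowered to 1.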
oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-03-27-140854 True False 23m Cluster version is 4.11.0-0.nightly-2022-03-27-140854
oc get pods -n openshift-apiserver
NAME READY STATUS RESTARTS AGE
apiserver-7dd8d5695f-662cj 2/2 Running 0 34m
apiserver-7dd8d5695f-cshxc 2/2 Running 0 36m
apiserver-7dd8d5695f-mgb5f 2/2 Running 0 36m
In one terminal we ran a watch with the command below and, in another, deleted one OAS pod with "oc delete pod/apiserver-7dd8d5695f-cshxc -n openshift-apiserver". Before the pod deletion completed, the pod status indicated readiness=false.
oc get -n openshift-apiserver pods -w -ojson
},
"status": {
"conditions": [
{
"lastProbeTime": null,
"lastTransitionTime": "2022-03-29T06:27:52Z",
"message": "0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) didn't match pod anti-affinity rules.",
"reason": "Unschedulable",
"status": "False",
"type": "PodScheduled"
}
],
"phase": "Pending",
"qosClass": "Burstable"
}
oc get -n openshift-apiserver pods -w -ojson
"status": {
"conditions": [
{
"lastProbeTime": null,
"lastTransitionTime": "2022-03-29T06:44:35Z",
"status": "True",
"type": "Initialized"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2022-03-29T06:44:32Z",
"message": "containers with unready status: [openshift-apiserver]",
"reason": "ContainersNotReady",
"status": "False",
"type": "Ready"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2022-03-29T06:44:32Z",
"message": "containers with unready status: [openshift-apiserver]",
"reason": "ContainersNotReady",
"status": "False",
"type": "ContainersReady"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2022-03-29T06:44:32Z",
"status": "True",
"type": "PodScheduled"
}
],
"containerStatuses": [
{
"containerID": "cri-o://2c9a6c7a9d97f8b774f62c0763eb92b9bdeda3bf71a26d322b5115706bbb64ab",
"image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9be21be1b69f360f0d79a2c1ec6cac7d3519d80e76f46f7f1c02868bfffcda21",
"imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9be21be1b69f360f0d79a2c1ec6cac7d3519d80e76f46f7f1c02868bfffcda21",
"lastState": {},
"name": "openshift-apiserver",
"ready": false,
"restartCount": 0,
"started": true,
"state": {
"running": {
"startedAt": "2022-03-29T06:44:35Z"
}
}
},
Termination grace period is 90s, as set in the PR (see terminationGracePeriodSeconds below).
],
"nodeName": "ip-10-0-205-119.us-east-2.compute.internal",
"nodeSelector": {
"node-role.kubernetes.io/master": ""
},
"preemptionPolicy": "PreemptLowerPriority",
"priority": 2000001000,
"priorityClassName": "system-node-critical",
"restartPolicy": "Always",
"schedulerName": "default-scheduler",
"securityContext": {},
"serviceAccount": "openshift-apiserver-sa",
"serviceAccountName": "openshift-apiserver-sa",
"terminationGracePeriodSeconds": 90,
"tolerations": [
{
After the deletion and restart, the container status shows readiness=true:
"status": {
"conditions": [
{
"lastProbeTime": null,
"lastTransitionTime": "2022-03-29T06:44:35Z",
"status": "True",
"type": "Initialized"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2022-03-29T06:44:42Z",
"status": "True",
"type": "Ready"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2022-03-29T06:44:42Z",
"status": "True",
"type": "ContainersReady"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2022-03-29T06:44:32Z",
"status": "True",
"type": "PodScheduled"
}
],
"containerStatuses": [
{
"containerID": "cri-o://2c9a6c7a9d97f8b774f62c0763eb92b9bdeda3bf71a26d322b5115706bbb64ab",
"image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9be21be1b69f360f0d79a2c1ec6cac7d3519d80e76f46f7f1c02868bfffcda21",
"imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9be21be1b69f360f0d79a2c1ec6cac7d3519d80e76f46f7f1c02868bfffcda21",
"lastState": {},
"name": "openshift-apiserver",
"ready": true,
"restartCount": 0,
"started": true,
"state": {
"running": {
"startedAt": "2022-03-29T06:44:35Z"
}
}
},
{
"containerID": "cri-o://006f837b616086325026119c7b48e7ede4761f0d65396c97f1cdd7b67522707b",
"image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4ccd4c61eed4b6dca55f94aa7148766bc2e2ef3682a6607e71ce3ae6331997cb",
"imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4ccd4c61eed4b6dca55f94aa7148766bc2e2ef3682a6607e71ce3ae6331997cb",
"lastState": {},
"name": "openshift-apiserver-check-endpoints",
"ready": true,
"restartCount": 0,
"started": true,
"state": {
"running": {
"startedAt": "2022-03-29T06:44:35Z"
}
}
}
],
"hostIP": "10.0.205.119",
"initContainerStatuses": [
{
"containerID": "cri-o://04563c455ea5ed13ff43cad0bc345adea4c56206e193099303a723b8604cc107",
"image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9be21be1b69f360f0d79a2c1ec6cac7d3519d80e76f46f7f1c02868bfffcda21",
"imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9be21be1b69f360f0d79a2c1ec6cac7d3519d80e76f46f7f1c02868bfffcda21",
"lastState": {},
"name": "fix-audit-permissions",
"ready": true,
"restartCount": 0,
"state": {
"terminated": {
"containerID": "cri-o://04563c455ea5ed13ff43cad0bc345adea4c56206e193099303a723b8604cc107",
"exitCode": 0,
"finishedAt": "2022-03-29T06:44:34Z",
"reason": "Completed",
"startedAt": "2022-03-29T06:44:34Z"
}
}
}
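When re-running this verification, a compact way to follow the readiness transitions from the watch, instead of scanning the full JSON objects above, is to filter the stream with jq. This is only an illustrative sketch, assuming jq is installed on the workstation:

    # Stream pod name, phase, and per-container readiness as the watch emits updates.
    oc get pods -n openshift-apiserver -w -o json \
      | jq -r '[.metadata.name, .status.phase, ([.status.containerStatuses[]?.ready] | tostring)] | @tsv'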
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069