Bug 2051785
Summary: | startupProbe behavior changed between 4.6.40 and 4.6.53 | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | John McMeeking <jmcmeek> |
Component: | Node | Assignee: | Sai Ramesh Vanka <svanka> |
Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | medium | CC: | harpatil, jkaur, jmclemor, openshift-bugzilla-robot, rholbroo, rkant, skant, sparpate, svanka, vshrivas |
Version: | 4.6 | Flags: | svanka:
needinfo-
svanka: needinfo- |
Target Milestone: | --- | ||
Target Release: | 4.6.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-08-24 01:47:40 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 2062647 | ||
Bug Blocks: |
Description
John McMeeking
2022-02-08 02:54:07 UTC
I believe this is related to Kubernetes issue https://github.com/kubernetes/kubernetes/issues/101064 Fixed in https://github.com/kubernetes/kubernetes/pull/101093 Cherry picked to 1.19 for 1.19.11 https://github.com/kubernetes/kubernetes/blob/release-1.19/pkg/kubelet/prober/worker.go if c.Started != nil && *c.Started { // Stop probing for startup once container has started. * // we keep it running to make sure it will work for restarted container. if w.probeType == startup { * return true } * = changed lines It looks like OpenShift 4.6 has diverged from Kubernetes if c.Started != nil && *c.Started { // Stop probing for startup once container has started. // we keep it running to make sure it will work for restarted container. if w.probeType == startup { * return false } * - The return value appears to be from UPSTREAM: 98571: kubelet: Stop probing a pod during graceful shutdown I don't know why these are different, but the comment in Kubernetes 98571 suggests "return false" is wrong. https://github.com/kubernetes/kubernetes/pull/98571/files#r612993819 It looks like 4.7 and 4.8 follow Kubernetes - keeping "return true". Should 4.6 also have that? I also saw https://github.com/kubernetes/kubernetes/issues/107817. It looks like that Kubernetes issue may be specific to OpenShift. I'll see if I can construct a simple recreate. Recreate info: $ oc version Client Version: 4.7.0-202107292319.p0.git.8b4b094.assembly.stream-8b4b094 Server Version: 4.6.48 Kubernetes Version: v1.19.14+fcff70a ster,worker 173m v1.19.16+845f228 jmcmeek@jazzman-fedora:tmp$ oc get nodes -o wide NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME 10.171.3.85 Ready master,worker 173m v1.19.16+845f228 10.171.3.85 169.47.196.147 Red Hat 3.10.0-1160.53.1.el7.x86_64 cri-o://1.19.4-5.rhaos4.6.git5d5a465.el7 10.73.143.14 Ready master,worker 173m v1.19.16+845f228 10.73.143.14 169.62.204.157 Red Hat 3.10.0-1160.53.1.el7.x86_64 cri-o://1.19.4-5.rhaos4.6.git5d5a465.el7 10.74.76.22 Ready master,worker 173m v1.19.16+845f228 10.74.76.22 169.61.144.210 Red Hat 3.10.0-1160.53.1.el7.x86_64 cri-o://1.19.4-5.rhaos4.6.git5d5a465.el7 startupprobe.yaml --- apiVersion: v1 kind: Pod metadata: labels: test: startupprobe name: startupprobe spec: containers: - name: main image: k8s.gcr.io/busybox args: - /bin/sh - -c - rm -rf /tmp/healthy; sleep 25; touch /tmp/healthy; sleep 120; rm -rf /tmp/healthy; sleep 600 livenessProbe: exec: command: - cat - /tmp/healthy initialDelaySeconds: 5 periodSeconds: 5 failureThreshold: 2 startupProbe: exec: command: - cat - /tmp/healthy failureThreshold: 4 periodSeconds: 10 --- Kubelet.log references to 'startupprobe' - you can see "Startup probe" log entries on initial startup, and only liveness probes after that. Feb 7 23:28:27 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:27.104518 9619 kubelet.go:1861] SyncLoop (ADD, "api"): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)" Feb 7 23:28:27 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:27.202326 9619 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-stmrd" (UniqueName: "kubernetes.io/secret/8655f345-c7fa-47a0-aea6-b5ba03903e96-default-token-stmrd") pod "startupprobe" (UID: "8655f345-c7fa-47a0-aea6-b5ba03903e96") Feb 7 23:28:27 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:27.304254 9619 reconciler.go:269] operationExecutor.MountVolume started for volume "default-token-stmrd" (UniqueName: "kubernetes.io/secret/8655f345-c7fa-47a0-aea6-b5ba03903e96-default-token-stmrd") pod "startupprobe" (UID: "8655f345-c7fa-47a0-aea6-b5ba03903e96") Feb 7 23:28:27 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:27.321331 9619 operation_generator.go:663] MountVolume.SetUp succeeded for volume "default-token-stmrd" (UniqueName: "kubernetes.io/secret/8655f345-c7fa-47a0-aea6-b5ba03903e96-default-token-stmrd") pod "startupprobe" (UID: "8655f345-c7fa-47a0-aea6-b5ba03903e96") Feb 7 23:28:27 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:27.442151 9619 kuberuntime_manager.go:425] No sandbox for pod "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)" can be found. Need to start a new one Feb 7 23:28:28 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:28.228298 9619 kubelet.go:1868] SyncLoop (UPDATE, "api"): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)" Feb 7 23:28:28 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:28.302561 9619 kubelet.go:1868] SyncLoop (UPDATE, "api"): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)" Feb 7 23:28:28 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:28.620272 9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerStarted", Data:"5b1c208a8803fca98c7eb9b2957d0c0ba8df387c26bfa74290a7bf1708aae2a8"} Feb 7 23:28:29 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:29.625065 9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerStarted", Data:"9f15fa1eec62889b007f81b0b34131c215e8ace10996e9d52f93d7b8526af0e7"} Feb 7 23:28:30 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:30.802225 9619 prober.go:117] Startup probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 7 23:28:40 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:40.792235 9619 prober.go:117] Startup probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 7 23:28:50 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:50.783528 9619 prober.go:117] Startup probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 7 23:30:57 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:30:57.907716 9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 7 23:31:02 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:02.908279 9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 7 23:31:02 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:02.908359 9619 kubelet.go:1932] SyncLoop (container unhealthy): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)" Feb 7 23:31:02 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:02.909146 9619 kuberuntime_manager.go:642] Container "main" ({"cri-o" "9f15fa1eec62889b007f81b0b34131c215e8ace10996e9d52f93d7b8526af0e7"}) of pod startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96): Container main failed liveness probe, will be restarted Feb 7 23:31:33 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:33.265429 9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerDied", Data:"9f15fa1eec62889b007f81b0b34131c215e8ace10996e9d52f93d7b8526af0e7"} Feb 7 23:31:34 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:34.269267 9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerStarted", Data:"41780690521ad727f2d93afc27a691f99bf9f4a4f1c5e4f8b1bea99054bda070"} Feb 7 23:31:42 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:42.912166 9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 7 23:31:47 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:47.908706 9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 7 23:31:47 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:47.908862 9619 kubelet.go:1932] SyncLoop (container unhealthy): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)" Feb 7 23:31:47 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:47.909674 9619 kuberuntime_manager.go:642] Container "main" ({"cri-o" "41780690521ad727f2d93afc27a691f99bf9f4a4f1c5e4f8b1bea99054bda070"}) of pod startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96): Container main failed liveness probe, will be restarted Feb 7 23:32:18 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:32:18.445490 9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerDied", Data:"41780690521ad727f2d93afc27a691f99bf9f4a4f1c5e4f8b1bea99054bda070"} Feb 7 23:32:19 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:32:19.450459 9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerStarted", Data:"adef4cacc8256e64d2ef02180b26d94da7cd4cb9addf3718db7beb2ba0152ce4"} Feb 7 23:32:27 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:32:27.907798 9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 7 23:32:28 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: E0207 23:32:28.900479 9619 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/data/criorootstorage/overlay/b4c5dc405087fc7de2568aa35b86db4106c46f9fc479f0dd56809f1e527230df/diff" to get inode usage: stat /var/data/criorootstorage/overlay/b4c5dc405087fc7de2568aa35b86db4106c46f9fc479f0dd56809f1e527230df/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/default_startupprobe_8655f345-c7fa-47a0-aea6-b5ba03903e96/main/0.log" to get inode usage: stat /var/log/pods/default_startupprobe_8655f345-c7fa-47a0-aea6-b5ba03903e96/main/0.log: no such file or directory Feb 7 23:32:32 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:32:32.912175 9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 7 23:32:32 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:32:32.912307 9619 kubelet.go:1932] SyncLoop (container unhealthy): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)" Feb 7 23:32:32 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:32:32.913403 9619 kuberuntime_manager.go:642] Container "main" ({"cri-o" "adef4cacc8256e64d2ef02180b26d94da7cd4cb9addf3718db7beb2ba0152ce4"}) of pod startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96): Container main failed liveness probe, will be restarted Feb 7 23:33:03 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:03.647259 9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerDied", Data:"adef4cacc8256e64d2ef02180b26d94da7cd4cb9addf3718db7beb2ba0152ce4"} Feb 7 23:33:04 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:04.652357 9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerStarted", Data:"31cd0250a7059c564097b158fa1521e3239957a484b0f0e2e0e8b90cd4d2d877"} Feb 7 23:33:12 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:12.910146 9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 7 23:33:17 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:17.904741 9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 7 23:33:17 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:17.904831 9619 kubelet.go:1932] SyncLoop (container unhealthy): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)" Feb 7 23:33:17 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:17.905621 9619 kuberuntime_manager.go:642] Container "main" ({"cri-o" "31cd0250a7059c564097b158fa1521e3239957a484b0f0e2e0e8b90cd4d2d877"}) of pod startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96): Container main failed liveness probe, will be restarted Feb 7 23:33:48 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:48.852535 9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerDied", Data:"31cd0250a7059c564097b158fa1521e3239957a484b0f0e2e0e8b90cd4d2d877"} Feb 7 23:33:48 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:48.852722 9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerStarted", Data:"c749c22a2b4c68d0d0dd26350d7a84ac2415ad64a285c13d4f2e836458569ecf"} Feb 7 23:33:57 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:57.911589 9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 7 23:34:02 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:34:02.913285 9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 7 23:34:02 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:34:02.914044 9619 kubelet.go:1932] SyncLoop (container unhealthy): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)" In contrast, both Kubernetes v1.19.16 and OCP 4.7.41 (kubelet v1.20.11+e880017) run the startup probe on each restart: Feb 8 05:12:47 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:47.564886 6536 kubelet.go:1856] SyncLoop (ADD, "api"): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)" Feb 8 05:12:47 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:47.571423 6536 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-npjfc" (UniqueName: "kubernetes.io/secret/4d3be00b-c847-42d7-8b28-a7f19d44e152-default-token-npjfc") pod "startupprobe" (UID: "4d3be00b-c847-42d7-8b28-a7f19d44e152") Feb 8 05:12:47 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:47.671990 6536 reconciler.go:269] operationExecutor.MountVolume started for volume "default-token-npjfc" (UniqueName: "kubernetes.io/secret/4d3be00b-c847-42d7-8b28-a7f19d44e152-default-token-npjfc") pod "startupprobe" (UID: "4d3be00b-c847-42d7-8b28-a7f19d44e152") Feb 8 05:12:47 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:47.689918 6536 operation_generator.go:663] MountVolume.SetUp succeeded for volume "default-token-npjfc" (UniqueName: "kubernetes.io/secret/4d3be00b-c847-42d7-8b28-a7f19d44e152-default-token-npjfc") pod "startupprobe" (UID: "4d3be00b-c847-42d7-8b28-a7f19d44e152") Feb 8 05:12:47 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:47.880781 6536 kuberuntime_manager.go:425] No sandbox for pod "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)" can be found. Need to start a new one Feb 8 05:12:48 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:48.500357 6536 kubelet.go:1863] SyncLoop (UPDATE, "api"): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)" Feb 8 05:12:49 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:49.400457 6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerStarted", Data:"80200352c3ac03cb503a2ccb462f370812137f9a382fc17ca514c0a37f18fac8"} Feb 8 05:12:49 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:49.400558 6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerStarted", Data:"2e0c73af266a7bfbe56bdf19c5581390eb2bbccfa3dcae6382b08d3d0ecbd31a"} Feb 8 05:12:55 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:55.336806 6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:13:05 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:13:05.332530 6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:15:18 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:18.374024 6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:15:23 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:23.389873 6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:15:23 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:23.390393 6536 kubelet.go:1927] SyncLoop (container unhealthy): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)" Feb 8 05:15:23 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:23.391062 6536 kuberuntime_manager.go:642] Container "main" ({"containerd" "80200352c3ac03cb503a2ccb462f370812137f9a382fc17ca514c0a37f18fac8"}) of pod startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152): Container main failed liveness probe, will be restarted Feb 8 05:15:53 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:53.782526 6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerDied", Data:"80200352c3ac03cb503a2ccb462f370812137f9a382fc17ca514c0a37f18fac8"} Feb 8 05:15:54 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:54.787175 6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerStarted", Data:"648bdf18a2bedee8530a9264639451f9603941c5cb6c3e36872b2bb7c93f3bb2"} Feb 8 05:15:55 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:55.331834 6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:16:05 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:16:05.335045 6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:16:15 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:16:15.328706 6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:18:23 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:18:23.389542 6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:18:28 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:18:28.384427 6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:18:28 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:18:28.384972 6536 kubelet.go:1927] SyncLoop (container unhealthy): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)" Feb 8 05:18:28 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:18:28.385814 6536 kuberuntime_manager.go:642] Container "main" ({"containerd" "648bdf18a2bedee8530a9264639451f9603941c5cb6c3e36872b2bb7c93f3bb2"}) of pod startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152): Container main failed liveness probe, will be restarted Feb 8 05:18:59 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:18:59.172611 6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerDied", Data:"648bdf18a2bedee8530a9264639451f9603941c5cb6c3e36872b2bb7c93f3bb2"} Feb 8 05:18:59 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:18:59.172779 6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerStarted", Data:"d2a3869484b628ad50a3919131c392ce071dcdf7ef990bb7ffe77333377ffefb"} Feb 8 05:19:05 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:19:05.334879 6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:19:15 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:19:15.331573 6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:21:28 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:21:28.390913 6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:21:33 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:21:33.383124 6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:21:33 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:21:33.383234 6536 kubelet.go:1927] SyncLoop (container unhealthy): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)" Feb 8 05:21:33 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:21:33.384030 6536 kuberuntime_manager.go:642] Container "main" ({"containerd" "d2a3869484b628ad50a3919131c392ce071dcdf7ef990bb7ffe77333377ffefb"}) of pod startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152): Container main failed liveness probe, will be restarted Feb 8 05:22:03 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:22:03.575056 6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerDied", Data:"d2a3869484b628ad50a3919131c392ce071dcdf7ef990bb7ffe77333377ffefb"} Feb 8 05:22:04 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:22:04.580617 6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerStarted", Data:"52fda93a610ecac0fdd82a8cbfffd513b091ca7ed716224f28ace1fc4c67d2d8"} Feb 8 05:22:05 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:22:05.327255 6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:22:15 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:22:15.328658 6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:22:25 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:22:25.321413 6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:24:33 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:24:33.379753 6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:24:38 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:24:38.384108 6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:24:38 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:24:38.384351 6536 kubelet.go:1927] SyncLoop (container unhealthy): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)" Feb 8 05:24:38 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:24:38.385398 6536 kuberuntime_manager.go:642] Container "main" ({"containerd" "52fda93a610ecac0fdd82a8cbfffd513b091ca7ed716224f28ace1fc4c67d2d8"}) of pod startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152): Container main failed liveness probe, will be restarted Feb 8 05:25:08 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:25:08.999206 6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerDied", Data:"52fda93a610ecac0fdd82a8cbfffd513b091ca7ed716224f28ace1fc4c67d2d8"} Feb 8 05:25:08 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:25:08.999347 6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerStarted", Data:"f5ec2620cc49e88d2551d732e9ce9acfab96d2c4a7b81dafb57f11f504ded834"} Feb 8 05:25:15 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:25:15.337917 6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Feb 8 05:25:25 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:25:25.321844 6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory Hello team,I'm touching base to see if there have been any updates that can be shared with us. This is a high priority for the customer because of the potential risk it poses to their security. Hello @jmclemor.com , Sorry for the delay as I've been consumed by some priority work. I'm working on it and would update you soon. @svanka Thank you. We'll look forward to hearing from you. Update: Opened a backport PR, waiting for the approval. https://github.com/openshift/kubernetes/pull/1207 Thanks, Ramesh *** Bug 2061652 has been marked as a duplicate of this bug. *** The CU is pressing for an ETA, and I see the backport PR was approved. Do we have an idea when this will roll out yet? Hoping to give the customer an estimate. Waiting for the cherry-pick-approval label. Reference: https://github.com/openshift/kubernetes/pull/1207#issuecomment-1064714952 @svanka Cherry pick approval label seems to be in place now, could you please help with some ETA and next steps. We need to update customer accordingly Hello @vshrivas , The fix has been merged. The PR is merged and is now included in new nightly build 4.6.0-0.nightly-2022-03-11-104423 Created pods in 2 separate projects. I see both pods change back to Running after they go into CrashLoopBackOff during restart. $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.6.0-0.nightly-2022-03-11-104423 True False 3m12s Cluster version is 4.6.0-0.nightly-2022-03-11-104423 $ oc get nodes NAME STATUS ROLES AGE VERSION ip-10-0-128-79.us-east-2.compute.internal Ready worker 19m v1.19.16+d5ccddc ip-10-0-148-199.us-east-2.compute.internal Ready master 24m v1.19.16+d5ccddc ip-10-0-166-218.us-east-2.compute.internal Ready worker 19m v1.19.16+d5ccddc ip-10-0-177-241.us-east-2.compute.internal Ready master 24m v1.19.16+d5ccddc ip-10-0-196-115.us-east-2.compute.internal Ready worker 19m v1.19.16+d5ccddc ip-10-0-213-111.us-east-2.compute.internal Ready master 24m v1.19.16+d5ccddc $ cat startupprobe.yaml apiVersion: v1 kind: Pod metadata: labels: test: startupprobe name: startupprobe spec: containers: - name: main image: k8s.gcr.io/busybox args: - /bin/sh - -c - rm -rf /tmp/healthy; sleep 25; touch /tmp/healthy; sleep 120; rm -rf /tmp/healthy; sleep 600 livenessProbe: exec: command: - cat - /tmp/healthy initialDelaySeconds: 5 periodSeconds: 5 failureThreshold: 2 startupProbe: exec: command: - cat - /tmp/healthy failureThreshold: 4 periodSeconds: 10 $ oc create -f startupprobe.yaml pod/startupprobe created $ oc get pods NAME READY STATUS RESTARTS AGE startupprobe 0/1 ContainerCreating 0 3s $ oc get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES startupprobe 0/1 Running 0 52s 10.129.2.13 ip-10-0-196-115.us-east-2.compute.internal <none> <none> $ oc get pods -o wide -n app NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES startupprobe 1/1 Running 12 31m 10.129.2.13 ip-10-0-196-115.us-east-2.compute.internal <none> <none> $ oc get pods -o wide -n app1 NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES startupprobe 1/1 Running 6 8m33s 10.131.0.25 ip-10-0-166-218.us-east-2.compute.internal <none> <none> I will run same test on another cluster and will add results here. Also checked on other cluster on Azure. $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.6.0-0.nightly-2022-03-11-104423 True False 44m Cluster version is 4.6.0-0.nightly-2022-03-11-104423 $ oc get nodes NAME STATUS ROLES AGE VERSION sunilc46-z-pl7bx-master-0 Ready master 75m v1.19.16+d5ccddc sunilc46-z-pl7bx-master-1 Ready master 75m v1.19.16+d5ccddc sunilc46-z-pl7bx-master-2 Ready master 74m v1.19.16+d5ccddc sunilc46-z-pl7bx-worker-northcentralus-4ljc4 Ready worker 59m v1.19.16+d5ccddc sunilc46-z-pl7bx-worker-northcentralus-dkf6v Ready worker 59m v1.19.16+d5ccddc sunilc46-z-pl7bx-worker-northcentralus-jfcvm Ready worker 59m v1.19.16+d5ccddc $ cat startupprobe.yaml apiVersion: v1 kind: Pod metadata: labels: test: startupprobe name: startupprobe spec: containers: - name: main image: k8s.gcr.io/busybox args: - /bin/sh - -c - rm -rf /tmp/healthy; sleep 25; touch /tmp/healthy; sleep 120; rm -rf /tmp/healthy; sleep 600 livenessProbe: exec: command: - cat - /tmp/healthy initialDelaySeconds: 5 periodSeconds: 5 failureThreshold: 2 startupProbe: exec: command: - cat - /tmp/healthy failureThreshold: 4 periodSeconds: 10 $ oc create -f startupprobe.yaml pod/startupprobe created $ oc get pods NAME READY STATUS RESTARTS AGE startupprobe 0/1 ContainerCreating 0 3s $ oc get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES startupprobe 0/1 Running 0 30s 10.128.2.20 sunilc46-z-pl7bx-worker-northcentralus-dkf6v <none> <none> $ oc get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES startupprobe 1/1 Running 10 24m 10.128.2.20 sunilc46-z-pl7bx-worker-northcentralus-dkf6v <none> <none> I was looking at the test results and realized that while I a gave way to recreate the problem, it might rely more on looking at logs or events to tell what was happening than I realized. I happened to catch the test on my 4.6 cluster looking a lot like your results. I caught it in crashloopbackoff a little later. From a test perspective, the pod events look more reliable: On 4.6 I see this in the "describe pod" Events: Warning BackOff 4m7s (x93 over 27m) kubelet Back-off restarting failed container and I don't see that on my 4.8 cluster. I think the fix is good based on my initial work. But if there is a way I can put this on my 4.6 nodes I'd be happy to add my confirmation as well. This was addressed in 4.6.56 however the errata automation failed to close this bug as it should have. |