Bug 2051785

Summary: startupProbe behavior changed between 4.6.40 and 4.6.53
Product: OpenShift Container Platform Reporter: John McMeeking <jmcmeek>
Component: NodeAssignee: Sai Ramesh Vanka <svanka>
Node sub component: Kubelet QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: medium CC: harpatil, jkaur, jmclemor, openshift-bugzilla-robot, rholbroo, rkant, skant, sparpate, svanka, vshrivas
Version: 4.6Flags: svanka: needinfo-
svanka: needinfo-
Target Milestone: ---   
Target Release: 4.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-24 01:47:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2062647    
Bug Blocks:    

Description John McMeeking 2022-02-08 02:54:07 UTC
Description of problem:

Behavior of pod startup Probe changed between 4.6.40 and 4.6.53. In 4.6.40 the startupProbe applies to each container start. In 4.6.53 the startupProbe appears to be used only during the initial pod startup.

In our Openshift Cluster 4.6 we experienced a CrashLoopBackOff error on a Pod.

Checking the reason we saw that the application was not able to start on time when restarted and the kubelet declered not live the pod starting a loop that was solved only Terminating the pod.

Based on this issue we started and investigation on pod and on configurations.
We have an Openshift Cluster where there are old worker nodes (4.6.40) and new worker nodes (4.6.53), the application that failed was hosted on a new node (4.6.53).

We create a test with the aim to check if the node with different version has a different behavior. We used the same pod with the same YAML configuration with the same Probe Policy and we saw these results:

- Pod on node 4.6.40, after restart was able to start properly within 2 minutes
- Pod on node 4.6.53, after restart was not able to start properly and it failed after 30 seconds and the CrashLoopBackOff issue occur.

Considering the stack of 4.6.40 and 4.6.53 we saw that the 4.6.40 use a Kubelet version 1.19.0 the 4.6.53 use a Kubelet version 1.19.16.

Result:

- Pod on node 4.6.40 consider the StartupProbe configuration on the first start and on restart scenario
- Pod on node 4.6.53 consider the StartupProbe configuration only on first start. During the restart the liveness probe is used instead of Startup one.


Version-Release number of selected component (if applicable):

4.6.53


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 John McMeeking 2022-02-08 03:03:49 UTC
I believe this is related to
Kubernetes issue https://github.com/kubernetes/kubernetes/issues/101064
Fixed in https://github.com/kubernetes/kubernetes/pull/101093
Cherry picked to 1.19 for 1.19.11

https://github.com/kubernetes/kubernetes/blob/release-1.19/pkg/kubelet/prober/worker.go

	if c.Started != nil && *c.Started {
		// Stop probing for startup once container has started.
*		// we keep it running to make sure it will work for restarted container.
		if w.probeType == startup {
*			return true
		}

* = changed lines


It looks like OpenShift 4.6 has diverged from Kubernetes

	if c.Started != nil && *c.Started {
		// Stop probing for startup once container has started.
		// we keep it running to make sure it will work for restarted container.
		if w.probeType == startup {
*			return false
		}

* - The return value appears to be from UPSTREAM: 98571: kubelet: Stop probing a pod during graceful shutdown

I don't know why these are different, but the comment in Kubernetes 98571 suggests "return false" is wrong.
https://github.com/kubernetes/kubernetes/pull/98571/files#r612993819

It looks like 4.7 and 4.8 follow Kubernetes - keeping "return true".  Should 4.6 also have that?

I also saw https://github.com/kubernetes/kubernetes/issues/107817. It looks like that Kubernetes issue may be specific to OpenShift.

I'll see if I can construct a simple recreate.

Comment 2 John McMeeking 2022-02-08 07:02:40 UTC
Recreate info:

$ oc version
Client Version: 4.7.0-202107292319.p0.git.8b4b094.assembly.stream-8b4b094
Server Version: 4.6.48
Kubernetes Version: v1.19.14+fcff70a

ster,worker   173m   v1.19.16+845f228
jmcmeek@jazzman-fedora:tmp$ oc get nodes -o wide
NAME           STATUS   ROLES           AGE    VERSION            INTERNAL-IP    EXTERNAL-IP      OS-IMAGE   KERNEL-VERSION                CONTAINER-RUNTIME
10.171.3.85    Ready    master,worker   173m   v1.19.16+845f228   10.171.3.85    169.47.196.147   Red Hat    3.10.0-1160.53.1.el7.x86_64   cri-o://1.19.4-5.rhaos4.6.git5d5a465.el7
10.73.143.14   Ready    master,worker   173m   v1.19.16+845f228   10.73.143.14   169.62.204.157   Red Hat    3.10.0-1160.53.1.el7.x86_64   cri-o://1.19.4-5.rhaos4.6.git5d5a465.el7
10.74.76.22    Ready    master,worker   173m   v1.19.16+845f228   10.74.76.22    169.61.144.210   Red Hat    3.10.0-1160.53.1.el7.x86_64   cri-o://1.19.4-5.rhaos4.6.git5d5a465.el7


startupprobe.yaml
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: startupprobe
  name: startupprobe
spec:
  containers:
  - name: main
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - rm -rf /tmp/healthy; sleep 25; touch /tmp/healthy; sleep 120; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 2
    startupProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      failureThreshold: 4
      periodSeconds: 10
---


Kubelet.log references to 'startupprobe' - you can see "Startup probe" log entries on initial startup, and only liveness probes after that.

Feb  7 23:28:27 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:27.104518    9619 kubelet.go:1861] SyncLoop (ADD, "api"): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)"
Feb  7 23:28:27 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:27.202326    9619 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-stmrd" (UniqueName: "kubernetes.io/secret/8655f345-c7fa-47a0-aea6-b5ba03903e96-default-token-stmrd") pod "startupprobe" (UID: "8655f345-c7fa-47a0-aea6-b5ba03903e96")
Feb  7 23:28:27 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:27.304254    9619 reconciler.go:269] operationExecutor.MountVolume started for volume "default-token-stmrd" (UniqueName: "kubernetes.io/secret/8655f345-c7fa-47a0-aea6-b5ba03903e96-default-token-stmrd") pod "startupprobe" (UID: "8655f345-c7fa-47a0-aea6-b5ba03903e96")
Feb  7 23:28:27 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:27.321331    9619 operation_generator.go:663] MountVolume.SetUp succeeded for volume "default-token-stmrd" (UniqueName: "kubernetes.io/secret/8655f345-c7fa-47a0-aea6-b5ba03903e96-default-token-stmrd") pod "startupprobe" (UID: "8655f345-c7fa-47a0-aea6-b5ba03903e96")
Feb  7 23:28:27 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:27.442151    9619 kuberuntime_manager.go:425] No sandbox for pod "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)" can be found. Need to start a new one
Feb  7 23:28:28 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:28.228298    9619 kubelet.go:1868] SyncLoop (UPDATE, "api"): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)"
Feb  7 23:28:28 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:28.302561    9619 kubelet.go:1868] SyncLoop (UPDATE, "api"): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)"
Feb  7 23:28:28 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:28.620272    9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerStarted", Data:"5b1c208a8803fca98c7eb9b2957d0c0ba8df387c26bfa74290a7bf1708aae2a8"}
Feb  7 23:28:29 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:29.625065    9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerStarted", Data:"9f15fa1eec62889b007f81b0b34131c215e8ace10996e9d52f93d7b8526af0e7"}
Feb  7 23:28:30 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:30.802225    9619 prober.go:117] Startup probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  7 23:28:40 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:40.792235    9619 prober.go:117] Startup probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  7 23:28:50 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:28:50.783528    9619 prober.go:117] Startup probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  7 23:30:57 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:30:57.907716    9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  7 23:31:02 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:02.908279    9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  7 23:31:02 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:02.908359    9619 kubelet.go:1932] SyncLoop (container unhealthy): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)"
Feb  7 23:31:02 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:02.909146    9619 kuberuntime_manager.go:642] Container "main" ({"cri-o" "9f15fa1eec62889b007f81b0b34131c215e8ace10996e9d52f93d7b8526af0e7"}) of pod startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96): Container main failed liveness probe, will be restarted
Feb  7 23:31:33 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:33.265429    9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerDied", Data:"9f15fa1eec62889b007f81b0b34131c215e8ace10996e9d52f93d7b8526af0e7"}
Feb  7 23:31:34 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:34.269267    9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerStarted", Data:"41780690521ad727f2d93afc27a691f99bf9f4a4f1c5e4f8b1bea99054bda070"}
Feb  7 23:31:42 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:42.912166    9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  7 23:31:47 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:47.908706    9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  7 23:31:47 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:47.908862    9619 kubelet.go:1932] SyncLoop (container unhealthy): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)"
Feb  7 23:31:47 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:31:47.909674    9619 kuberuntime_manager.go:642] Container "main" ({"cri-o" "41780690521ad727f2d93afc27a691f99bf9f4a4f1c5e4f8b1bea99054bda070"}) of pod startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96): Container main failed liveness probe, will be restarted
Feb  7 23:32:18 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:32:18.445490    9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerDied", Data:"41780690521ad727f2d93afc27a691f99bf9f4a4f1c5e4f8b1bea99054bda070"}
Feb  7 23:32:19 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:32:19.450459    9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerStarted", Data:"adef4cacc8256e64d2ef02180b26d94da7cd4cb9addf3718db7beb2ba0152ce4"}
Feb  7 23:32:27 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:32:27.907798    9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  7 23:32:28 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: E0207 23:32:28.900479    9619 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/data/criorootstorage/overlay/b4c5dc405087fc7de2568aa35b86db4106c46f9fc479f0dd56809f1e527230df/diff" to get inode usage: stat /var/data/criorootstorage/overlay/b4c5dc405087fc7de2568aa35b86db4106c46f9fc479f0dd56809f1e527230df/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/default_startupprobe_8655f345-c7fa-47a0-aea6-b5ba03903e96/main/0.log" to get inode usage: stat /var/log/pods/default_startupprobe_8655f345-c7fa-47a0-aea6-b5ba03903e96/main/0.log: no such file or directory
Feb  7 23:32:32 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:32:32.912175    9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  7 23:32:32 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:32:32.912307    9619 kubelet.go:1932] SyncLoop (container unhealthy): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)"
Feb  7 23:32:32 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:32:32.913403    9619 kuberuntime_manager.go:642] Container "main" ({"cri-o" "adef4cacc8256e64d2ef02180b26d94da7cd4cb9addf3718db7beb2ba0152ce4"}) of pod startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96): Container main failed liveness probe, will be restarted
Feb  7 23:33:03 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:03.647259    9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerDied", Data:"adef4cacc8256e64d2ef02180b26d94da7cd4cb9addf3718db7beb2ba0152ce4"}
Feb  7 23:33:04 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:04.652357    9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerStarted", Data:"31cd0250a7059c564097b158fa1521e3239957a484b0f0e2e0e8b90cd4d2d877"}
Feb  7 23:33:12 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:12.910146    9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  7 23:33:17 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:17.904741    9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  7 23:33:17 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:17.904831    9619 kubelet.go:1932] SyncLoop (container unhealthy): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)"
Feb  7 23:33:17 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:17.905621    9619 kuberuntime_manager.go:642] Container "main" ({"cri-o" "31cd0250a7059c564097b158fa1521e3239957a484b0f0e2e0e8b90cd4d2d877"}) of pod startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96): Container main failed liveness probe, will be restarted
Feb  7 23:33:48 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:48.852535    9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerDied", Data:"31cd0250a7059c564097b158fa1521e3239957a484b0f0e2e0e8b90cd4d2d877"}
Feb  7 23:33:48 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:48.852722    9619 kubelet.go:1899] SyncLoop (PLEG): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)", event: &pleg.PodLifecycleEvent{ID:"8655f345-c7fa-47a0-aea6-b5ba03903e96", Type:"ContainerStarted", Data:"c749c22a2b4c68d0d0dd26350d7a84ac2415ad64a285c13d4f2e836458569ecf"}
Feb  7 23:33:57 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:33:57.911589    9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  7 23:34:02 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:34:02.913285    9619 prober.go:117] Liveness probe for "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  7 23:34:02 test-c80u9lk20u38j9jhp8f0-jmcmeek46-default-00000173 kubelet.service: I0207 23:34:02.914044    9619 kubelet.go:1932] SyncLoop (container unhealthy): "startupprobe_default(8655f345-c7fa-47a0-aea6-b5ba03903e96)"


In contrast, both Kubernetes v1.19.16 and OCP 4.7.41 (kubelet v1.20.11+e880017) run the startup probe on each restart:

Feb  8 05:12:47 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:47.564886    6536 kubelet.go:1856] SyncLoop (ADD, "api"): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)"
Feb  8 05:12:47 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:47.571423    6536 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-npjfc" (UniqueName: "kubernetes.io/secret/4d3be00b-c847-42d7-8b28-a7f19d44e152-default-token-npjfc") pod "startupprobe" (UID: "4d3be00b-c847-42d7-8b28-a7f19d44e152")
Feb  8 05:12:47 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:47.671990    6536 reconciler.go:269] operationExecutor.MountVolume started for volume "default-token-npjfc" (UniqueName: "kubernetes.io/secret/4d3be00b-c847-42d7-8b28-a7f19d44e152-default-token-npjfc") pod "startupprobe" (UID: "4d3be00b-c847-42d7-8b28-a7f19d44e152")
Feb  8 05:12:47 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:47.689918    6536 operation_generator.go:663] MountVolume.SetUp succeeded for volume "default-token-npjfc" (UniqueName: "kubernetes.io/secret/4d3be00b-c847-42d7-8b28-a7f19d44e152-default-token-npjfc") pod "startupprobe" (UID: "4d3be00b-c847-42d7-8b28-a7f19d44e152")
Feb  8 05:12:47 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:47.880781    6536 kuberuntime_manager.go:425] No sandbox for pod "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)" can be found. Need to start a new one
Feb  8 05:12:48 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:48.500357    6536 kubelet.go:1863] SyncLoop (UPDATE, "api"): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)"
Feb  8 05:12:49 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:49.400457    6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerStarted", Data:"80200352c3ac03cb503a2ccb462f370812137f9a382fc17ca514c0a37f18fac8"}
Feb  8 05:12:49 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:49.400558    6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerStarted", Data:"2e0c73af266a7bfbe56bdf19c5581390eb2bbccfa3dcae6382b08d3d0ecbd31a"}
Feb  8 05:12:55 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:12:55.336806    6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:13:05 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:13:05.332530    6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:15:18 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:18.374024    6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:15:23 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:23.389873    6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:15:23 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:23.390393    6536 kubelet.go:1927] SyncLoop (container unhealthy): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)"
Feb  8 05:15:23 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:23.391062    6536 kuberuntime_manager.go:642] Container "main" ({"containerd" "80200352c3ac03cb503a2ccb462f370812137f9a382fc17ca514c0a37f18fac8"}) of pod startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152): Container main failed liveness probe, will be restarted
Feb  8 05:15:53 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:53.782526    6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerDied", Data:"80200352c3ac03cb503a2ccb462f370812137f9a382fc17ca514c0a37f18fac8"}
Feb  8 05:15:54 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:54.787175    6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerStarted", Data:"648bdf18a2bedee8530a9264639451f9603941c5cb6c3e36872b2bb7c93f3bb2"}
Feb  8 05:15:55 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:15:55.331834    6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:16:05 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:16:05.335045    6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:16:15 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:16:15.328706    6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:18:23 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:18:23.389542    6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:18:28 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:18:28.384427    6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:18:28 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:18:28.384972    6536 kubelet.go:1927] SyncLoop (container unhealthy): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)"
Feb  8 05:18:28 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:18:28.385814    6536 kuberuntime_manager.go:642] Container "main" ({"containerd" "648bdf18a2bedee8530a9264639451f9603941c5cb6c3e36872b2bb7c93f3bb2"}) of pod startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152): Container main failed liveness probe, will be restarted
Feb  8 05:18:59 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:18:59.172611    6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerDied", Data:"648bdf18a2bedee8530a9264639451f9603941c5cb6c3e36872b2bb7c93f3bb2"}
Feb  8 05:18:59 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:18:59.172779    6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerStarted", Data:"d2a3869484b628ad50a3919131c392ce071dcdf7ef990bb7ffe77333377ffefb"}
Feb  8 05:19:05 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:19:05.334879    6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:19:15 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:19:15.331573    6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:21:28 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:21:28.390913    6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:21:33 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:21:33.383124    6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:21:33 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:21:33.383234    6536 kubelet.go:1927] SyncLoop (container unhealthy): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)"
Feb  8 05:21:33 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:21:33.384030    6536 kuberuntime_manager.go:642] Container "main" ({"containerd" "d2a3869484b628ad50a3919131c392ce071dcdf7ef990bb7ffe77333377ffefb"}) of pod startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152): Container main failed liveness probe, will be restarted
Feb  8 05:22:03 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:22:03.575056    6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerDied", Data:"d2a3869484b628ad50a3919131c392ce071dcdf7ef990bb7ffe77333377ffefb"}
Feb  8 05:22:04 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:22:04.580617    6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerStarted", Data:"52fda93a610ecac0fdd82a8cbfffd513b091ca7ed716224f28ace1fc4c67d2d8"}
Feb  8 05:22:05 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:22:05.327255    6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:22:15 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:22:15.328658    6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:22:25 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:22:25.321413    6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:24:33 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:24:33.379753    6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:24:38 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:24:38.384108    6536 prober.go:117] Liveness probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:24:38 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:24:38.384351    6536 kubelet.go:1927] SyncLoop (container unhealthy): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)"
Feb  8 05:24:38 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:24:38.385398    6536 kuberuntime_manager.go:642] Container "main" ({"containerd" "52fda93a610ecac0fdd82a8cbfffd513b091ca7ed716224f28ace1fc4c67d2d8"}) of pod startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152): Container main failed liveness probe, will be restarted
Feb  8 05:25:08 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:25:08.999206    6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerDied", Data:"52fda93a610ecac0fdd82a8cbfffd513b091ca7ed716224f28ace1fc4c67d2d8"}
Feb  8 05:25:08 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:25:08.999347    6536 kubelet.go:1894] SyncLoop (PLEG): "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152)", event: &pleg.PodLifecycleEvent{ID:"4d3be00b-c847-42d7-8b28-a7f19d44e152", Type:"ContainerStarted", Data:"f5ec2620cc49e88d2551d732e9ce9acfab96d2c4a7b81dafb57f11f504ded834"}
Feb  8 05:25:15 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:25:15.337917    6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory
Feb  8 05:25:25 test-c80uajm20hquk538qjtg-jmcmeek119-default-00000111 kubelet.service[6536]: I0208 05:25:25.321844    6536 prober.go:117] Startup probe for "startupprobe_default(4d3be00b-c847-42d7-8b28-a7f19d44e152):main" failed (failure): cat: can't open '/tmp/healthy': No such file or directory

Comment 3 Joe M 2022-03-01 18:09:10 UTC
Hello team,I'm touching base to see if there have been any updates that can be shared with us. This is a high priority for the customer because of the potential risk it poses to their security.

Comment 4 Sai Ramesh Vanka 2022-03-07 13:35:20 UTC
Hello @jmclemor.com , Sorry for the delay as I've been consumed by some priority work. I'm working on it and would update you soon.

Comment 5 John McMeeking 2022-03-07 23:21:52 UTC
@svanka Thank you. We'll look forward to hearing from you.

Comment 7 Sai Ramesh Vanka 2022-03-08 08:23:19 UTC
Update:

Opened a backport PR, waiting for the approval.
https://github.com/openshift/kubernetes/pull/1207

Thanks,
Ramesh

Comment 10 Harshal Patil 2022-03-10 10:19:15 UTC
*** Bug 2061652 has been marked as a duplicate of this bug. ***

Comment 12 Ryan Holbrook 2022-03-11 01:03:01 UTC
The CU is pressing for an ETA, and I see the backport PR was approved.  Do we have an idea when this will roll out yet?  Hoping to give the customer an estimate.

Comment 14 Sai Ramesh Vanka 2022-03-11 02:53:06 UTC
Waiting for the cherry-pick-approval label.
Reference: https://github.com/openshift/kubernetes/pull/1207#issuecomment-1064714952

Comment 15 Varsha Shrivastava 2022-03-11 04:15:06 UTC
@svanka Cherry pick approval label seems to be in place now, could you please help with some ETA and next steps. We need to update customer accordingly

Comment 16 Sai Ramesh Vanka 2022-03-11 09:45:01 UTC
Hello @vshrivas , The fix has been merged.

Comment 18 Sunil Choudhary 2022-03-11 12:28:29 UTC
The PR is merged and is now included in new nightly build  4.6.0-0.nightly-2022-03-11-104423

Created pods in 2 separate projects. I see both pods change back to Running after they go into CrashLoopBackOff during restart.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2022-03-11-104423   True        False         3m12s   Cluster version is 4.6.0-0.nightly-2022-03-11-104423


$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-128-79.us-east-2.compute.internal    Ready    worker   19m   v1.19.16+d5ccddc
ip-10-0-148-199.us-east-2.compute.internal   Ready    master   24m   v1.19.16+d5ccddc
ip-10-0-166-218.us-east-2.compute.internal   Ready    worker   19m   v1.19.16+d5ccddc
ip-10-0-177-241.us-east-2.compute.internal   Ready    master   24m   v1.19.16+d5ccddc
ip-10-0-196-115.us-east-2.compute.internal   Ready    worker   19m   v1.19.16+d5ccddc
ip-10-0-213-111.us-east-2.compute.internal   Ready    master   24m   v1.19.16+d5ccddc

$ cat startupprobe.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: startupprobe
  name: startupprobe
spec:
  containers:
  - name: main
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - rm -rf /tmp/healthy; sleep 25; touch /tmp/healthy; sleep 120; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 2
    startupProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      failureThreshold: 4
      periodSeconds: 10

$ oc create -f startupprobe.yaml 
pod/startupprobe created


$ oc get pods
NAME           READY   STATUS              RESTARTS   AGE
startupprobe   0/1     ContainerCreating   0          3s


$ oc get pods -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
startupprobe   0/1     Running   0          52s   10.129.2.13   ip-10-0-196-115.us-east-2.compute.internal   <none>           <none>


$ oc get pods -o wide -n app
NAME           READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
startupprobe   1/1     Running   12         31m   10.129.2.13   ip-10-0-196-115.us-east-2.compute.internal   <none>           <none>


$ oc get pods -o wide -n app1
NAME           READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
startupprobe   1/1     Running   6          8m33s   10.131.0.25   ip-10-0-166-218.us-east-2.compute.internal   <none>           <none>


I will run same test on another cluster and will add results here.

Comment 20 Sunil Choudhary 2022-03-11 14:19:46 UTC
Also checked on other cluster on Azure.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2022-03-11-104423   True        False         44m     Cluster version is 4.6.0-0.nightly-2022-03-11-104423

$ oc get nodes
NAME                                           STATUS   ROLES    AGE   VERSION
sunilc46-z-pl7bx-master-0                      Ready    master   75m   v1.19.16+d5ccddc
sunilc46-z-pl7bx-master-1                      Ready    master   75m   v1.19.16+d5ccddc
sunilc46-z-pl7bx-master-2                      Ready    master   74m   v1.19.16+d5ccddc
sunilc46-z-pl7bx-worker-northcentralus-4ljc4   Ready    worker   59m   v1.19.16+d5ccddc
sunilc46-z-pl7bx-worker-northcentralus-dkf6v   Ready    worker   59m   v1.19.16+d5ccddc
sunilc46-z-pl7bx-worker-northcentralus-jfcvm   Ready    worker   59m   v1.19.16+d5ccddc

$ cat startupprobe.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: startupprobe
  name: startupprobe
spec:
  containers:
  - name: main
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - rm -rf /tmp/healthy; sleep 25; touch /tmp/healthy; sleep 120; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 2
    startupProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      failureThreshold: 4
      periodSeconds: 10


$ oc create -f startupprobe.yaml 
pod/startupprobe created

$ oc get pods
NAME           READY   STATUS              RESTARTS   AGE
startupprobe   0/1     ContainerCreating   0          3s


$ oc get pods -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP            NODE                                           NOMINATED NODE   READINESS GATES
startupprobe   0/1     Running   0          30s   10.128.2.20   sunilc46-z-pl7bx-worker-northcentralus-dkf6v   <none>           <none>


$ oc get pods -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP            NODE                                           NOMINATED NODE   READINESS GATES
startupprobe   1/1     Running   10         24m   10.128.2.20   sunilc46-z-pl7bx-worker-northcentralus-dkf6v   <none>           <none>

Comment 21 John McMeeking 2022-03-11 18:01:22 UTC
I was looking at the test results and realized that while I a gave way to recreate the problem, it might rely more on looking at logs or events to tell what was happening than I realized.  I happened to catch the test on my 4.6 cluster looking a lot like your results.  I caught it in crashloopbackoff a little later.

From a test perspective, the pod events look more reliable:

On 4.6 I see this in the "describe pod" Events:
  Warning  BackOff         4m7s (x93 over 27m)  kubelet            Back-off restarting failed container

and I don't see that on my 4.8 cluster.

I think the fix is good based on my initial work.  But if there is a way I can put this on my 4.6 nodes I'd be happy to add my confirmation as well.

Comment 26 Scott Dodson 2022-08-24 01:47:40 UTC
This was addressed in 4.6.56 however the errata automation failed to close this bug as it should have.