Bug 1770017

Summary:	Init containers restart when the exited container is removed from node.
Product:	OpenShift Container Platform	Reporter:	Ryan Howe <rhowe>
Component:	Node	Assignee:	Joel Smith <joelsmith>
Node sub component:	Kubelet	QA Contact:	MinLi <minmli>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	high	CC:	antgarci, aos-bugs, bjarvis, dahernan, dcbw, erich, joelsmith, jokerman, mfojtik, minmli, rphillips, ruchi.sharma6, sjenning, sreber, stwalter, tsweeney
Version:	3.11.0
Target Milestone:	---
Target Release:	4.7.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-02-24 15:10:48 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1907461

Description Ryan Howe 2019-11-07 22:18:37 UTC

Description of problem:
Init containers restart when the exited init container is removed from the node by the kubelet garbage collector.

Version-Release number of selected component (if applicable):
3.11


How reproducible:
100%

Steps to Reproduce:
1. Create pod

   # oc create -f [1]
   pod/init-test created

   # oc exec init-test cat /mnt/data/count
   1

   

2. Lower the node's garbage collection limits so that cleanup triggers sooner

 ```
 kubeletArguments:
  minimum-container-ttl-duration:
  - "10s"
  maximum-dead-containers-per-container:
  - "0"
  maximum-dead-containers:
  - "0"
 ```
        
or remove the exited init container manually with `docker rm`
     
    # docker ps -a | grep init
e4978a8bb28a        registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03                                   "/bin/sh -ec 'cat ..."   2 minutes ago       Up 2 minutes                                   k8s_my-container_init-test_openshift-sdn_3deb5d3e-01ab-11ea-ae27-fa163e05678f_0
94fb06cc88b0        registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03                                   "/bin/bash -c '#!/..."   2 minutes ago       Exited (0) 2 minutes ago                       k8s_inittest_init-test_openshift-sdn_3deb5d3e-01ab-11ea-ae27-fa163e05678f_0
d5f1053ffc23        registry.redhat.io/openshift3/ose-pod:v3.11.153                                                                                         "/usr/bin/pod"           2 minutes ago       Up 2 minutes     

 # docker rm 94fb06cc88b0
 

Actual results:
- The init container keeps restarting because its exited container is cleaned up by the kubelet.

# oc exec init-test cat /mnt/data/count
2
# oc exec init-test cat /mnt/data/count
3
...
...

- Main container never gets restarted:

# docker ps -a | grep init
9d4605b3d6cf        registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03                                   "/bin/bash -c '#!/..."   3 minutes ago       Exited (0) 3 minutes ago                       k8s_inittest_init-test_openshift-sdn_3deb5d3e-01ab-11ea-ae27-fa163e05678f_0
e4978a8bb28a        registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03                                   "/bin/sh -ec 'cat ..."   6 minutes ago       Up 6 minutes                                   k8s_my-container_init-test_openshift-sdn_3deb5d3e-01ab-11ea-ae27-fa163e05678f_0
d5f1053ffc23        registry.redhat.io/openshift3/ose-pod:v3.11.153                                                                                         "/usr/bin/pod"           6 minutes ago       Up 6 minutes                                   k8s_POD_init-test_openshift-sdn_3deb5d3e-01ab-11ea-ae27-fa163e05678f_0


Expected results:

The init container should run only when the pod is restarted or first scheduled to a node.


Additional info:

[1] 

oc create -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  initContainers:
  - name: inittest 
    image: "registry.redhat.io/rhel7/rhel"
    command: ["bin/sh", "-ec", "echo running >> /mnt/data/test"]
    volumeMounts:
    - name: my-volume
      mountPath: /mnt/data
  containers:
  - name: my-container
    image: "registry.redhat.io/rhel7/rhel"
    command: ["/bin/sh", "-ec", "ls /mnt/data; sleep 999999"]
    volumeMounts:
    - mountPath: /mnt/data
      name: my-volume
  volumes:
  - name: my-volume
    emptyDir: {}
EOF

Comment 1 Ryan Howe 2019-11-07 22:43:08 UTC

Further, if the init container fails when it is rerun because it has already executed once, the pod can appear to be in a bad status, since the init container restarts in a loop.


1. Create pod

apiVersion: v1
kind: Pod
metadata:
  name: init-fail-test
spec:
  initContainers:
  - name: inittest
    image: "registry.redhat.io/rhel7/rhel"
    command:
    - /bin/bash
    - -c
    - |
      #!/bin/bash
      set -euo pipefail
      file=/mnt/data/count
      if [[ -f "${file}" ]]; then
        count=$(<"${file}")
        expr $count + 1 > $file
        echo "Init Container has run ${count} times"
        exit 1
      fi
      echo 1 > ${file}
    volumeMounts:
    - name: my-volume
      mountPath: /mnt/data
  containers:
  - name: my-container
    image: "registry.redhat.io/rhel7/rhel"
    command: ["/bin/sh", "-ec", "while true; do sleep 30; cat /mnt/data/count; done;"]
    volumeMounts:
    - mountPath: /mnt/data
      name: my-volume
  volumes:
  - name: my-volume
    emptyDir: {}


2. Wait for the kubelet to clean up the init container.



$ oc get pods
NAME             READY     STATUS                  RESTARTS   AGE
init-fail-test   0/1       Init:CrashLoopBackOff   4          6m


$ oc logs init-fail-test
1
1
1
1
2
4
5
5
6
6
6
7


3. We see that the pod and main container never get restarted, but the init container is looping and increasing the restart count on the pod. 

# docker ps -a | grep init
cdaeb00eef23        registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03                                           "/bin/bash -c '#!/..."   12 seconds ago      Exited (1) 4 seconds ago                        k8s_inittest_init-fail-test_openshift-sdn_57eee915-01ad-11ea-ae27-fa163e05678f_5
918fabe7c867        registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03                                           "/bin/sh -ec 'whil..."   7 minutes ago      Up 7 minutes                                   k8s_my-container_init-fail-test_openshift-sdn_57eee915-01ad-11ea-ae27-fa163e05678f_0
b768406fb266        registry.redhat.io/openshift3/ose-pod:v3.11.153                                                                                                 "/usr/bin/pod"           7 minutes ago      Up 7 minutes                                   k8s_POD_init-fail-test_openshift-sdn_57eee915-01ad-11ea-ae27-fa163e05678f_0

Comment 13 Joel Smith 2020-06-21 03:05:37 UTC

Hi Tony,
Thanks for checking in on this. We would like to fix it but the fix may be complicated and difficult to backport to 3.11. I'm planning to spend some time working on a fix for 4.6, then assessing feasibility of a backport to 3.11.z.

Comment 16 Ryan Howe 2020-08-06 21:28:42 UTC

This issue is hit with Kubelet GC, but can be reproduced with any dead container cleanup.
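
For the manual cleanup path, a minimal sketch of picking out the exited init container from `docker ps -a` output. `exited_ids` is a hypothetical helper (not a docker or oc command), and it assumes the `k8s_<container>_<pod>_...` container naming seen in the output earlier in this bug:

```shell
# exited_ids: hypothetical helper for illustration only.
# Reads `docker ps -a` style output on stdin and prints the IDs of exited
# containers whose name (last column) starts with k8s_<container>_<pod>_.
#   $1 = container name (e.g. inittest)
#   $2 = pod name       (e.g. init-test)
exited_ids() {
  awk -v prefix="k8s_${1}_${2}_" \
    '/Exited [(]/ && index($NF, prefix) == 1 { print $1 }'
}

# Usage on the node (assumes GNU xargs for -r):
#   docker ps -a | exited_ids inittest init-test | xargs -r docker rm
```

Removing the container this way should trigger the same init container rerun as the kubelet GC settings from the description.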

Comment 17 Ruchi 2020-08-07 11:33:36 UTC

Can you please let me know if this bug is fixed? If yes, in which OC release?

Comment 18 Ryan Howe 2020-08-11 17:07:33 UTC

An interesting side effect of this is that kubelet-controlled volume mounts like /etc/hosts get remounted, so any changes to this file are reverted. Not sure what the effect of this is on other volumes.

==================================
# oc create -f - << EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-pod
    spec:
      initContainers:
      - name: inittest
        image: "registry.redhat.io/rhel7/rhel"
        command: ["bin/sh", "-ec", "echo running >> /mnt/data/test"]
        volumeMounts:
        - name: my-volume
          mountPath: /mnt/data
      containers:
      - name: my-container
        image: "registry.redhat.io/rhel7/rhel"
        command: ["/bin/sh", "-ec", "ls /mnt/data; sleep 999999"]
        volumeMounts:
        - mountPath: /mnt/data
          name: my-volume
      volumes:
      - name: my-volume
        emptyDir: {}
    EOF
     
    [root@node-2 quicklab]# docker ps -a | grep my-pod
    faa9efd5dae8        registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a                            "/bin/sh -ec 'ls /..."   46 seconds ago       Up 45 seconds                                   k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
    8430f48640f9        registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a                            "bin/sh -ec 'echo ..."   49 seconds ago       Exited (0) 48 seconds ago                       k8s_inittest_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
    c6a2f318e010        registry.redhat.io/openshift3/ose-pod:v3.11.248                                                                                  "/usr/bin/pod"           About a minute ago   Up About a minute                               k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
     
    [root@node-2 quicklab]# docker exec -u 0 -it faa9efd5dae8 /bin/sh
    sh-4.2# echo "1.1.1.1 test" >> /etc/hosts
    sh-4.2# exit
     
    [root@node-2 quicklab]# docker exec -u 0 -it faa9efd5dae8 /bin/cat /etc/hosts
    # Kubernetes-managed hosts file.
    127.0.0.1       localhost
    ::1     localhost ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    fe00::0 ip6-mcastprefix
    fe00::1 ip6-allnodes
    fe00::2 ip6-allrouters
    10.129.2.4      my-pod
    1.1.1.1 test
     
    [root@node-2 quicklab]# docker rm 8430f48640f9
    8430f48640f9
     
    [root@node-2 quicklab]# docker ps -a | grep my-pod
    faa9efd5dae8        registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a                            "/bin/sh -ec 'ls /..."   3 minutes ago       Up 3 minutes                            k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
    c6a2f318e010        registry.redhat.io/openshift3/ose-pod:v3.11.248                                                                                  "/usr/bin/pod"           3 minutes ago       Up 3 minutes                            k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
     
    [root@node-2 quicklab]# docker ps -a | grep my-pod
    faa9efd5dae8        registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a                            "/bin/sh -ec 'ls /..."   3 minutes ago       Up 3 minutes                            k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
    c6a2f318e010        registry.redhat.io/openshift3/ose-pod:v3.11.248                                                                                  "/usr/bin/pod"           4 minutes ago       Up 4 minutes                            k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
     
    [root@node-2 quicklab]# docker ps -a | grep my-pod
    f7e5a172329d        registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a                            "bin/sh -ec 'echo ..."   2 seconds ago       Exited (0) Less than a second ago                       k8s_inittest_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
    faa9efd5dae8        registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a                            "/bin/sh -ec 'ls /..."   3 minutes ago       Up 3 minutes                                            k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
    c6a2f318e010        registry.redhat.io/openshift3/ose-pod:v3.11.248                                                                                  "/usr/bin/pod"           4 minutes ago       Up 4 minutes                                            k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
    [root@node-2 quicklab]# docker ps -a | grep my-pod
    f7e5a172329d        registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a                            "bin/sh -ec 'echo ..."   7 seconds ago       Exited (0) 6 seconds ago                       k8s_inittest_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
    faa9efd5dae8        registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a                            "/bin/sh -ec 'ls /..."   3 minutes ago       Up 3 minutes                                   k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
    c6a2f318e010        registry.redhat.io/openshift3/ose-pod:v3.11.248                                                                                  "/usr/bin/pod"           4 minutes ago       Up 4 minutes                                   k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
     
    [root@node-2 quicklab]# docker exec -u 0 -it faa9efd5dae8 /bin/cat /etc/hosts
    # Kubernetes-managed hosts file.
    127.0.0.1       localhost
    ::1     localhost ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    fe00::0 ip6-mcastprefix
    fe00::1 ip6-allnodes
    fe00::2 ip6-allrouters
    10.129.2.4      my-pod


=============================

Comment 19 Joel Smith 2020-08-11 23:09:02 UTC

To Ruchi, I'm sorry, this bug is not fixed yet.

To Ryan,

/etc/hosts is managed by the Kubelet and it is rewritten every time a container in the pod is started. That's what this comment in the file means:

# Kubernetes-managed hosts file.

A pod shouldn't have any expectation that any changes to that file will remain. If a pod needs to add entries to that file, it should use hostAliases.

More info about that in this comment: https://bugzilla.redhat.com/show_bug.cgi?id=1860201#c12
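
As a rough illustration of the hostAliases approach, here is a sketch of a pod that declares the same "1.1.1.1 test" entry used in the test above. The pod name `hostaliases-test` and the file name are made up for this example; `hostAliases` itself is a standard pod spec field.

```shell
# Sketch only: pod name and file name are illustrative.
# Write the manifest to a file, then apply it with:
#   oc create -f hostaliases-pod.yaml
cat > hostaliases-pod.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hostaliases-test
spec:
  hostAliases:            # entries the kubelet writes into the managed /etc/hosts
  - ip: "1.1.1.1"
    hostnames:
    - "test"
  containers:
  - name: my-container
    image: "registry.redhat.io/rhel7/rhel"
    command: ["/bin/sh", "-ec", "cat /etc/hosts; sleep 999999"]
EOF
```

Because the entry is part of the pod spec, it should be written back each time the kubelet regenerates /etc/hosts, unlike a manual edit made inside the container.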

Comment 39 MinLi 2021-01-06 10:14:49 UTC

Verified on version: 4.7.0-0.nightly-2021-01-06-055910

Removed the exited init container, and the init container did not restart again.

Comment 44 errata-xmlrpc 2021-02-24 15:10:48 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 45 Red Hat Bugzilla 2023-09-15 00:19:32 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days