Bug 1770017
Summary: | Init containers restart when the exited container is removed from node. | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Ryan Howe <rhowe> |
Component: | Node | Assignee: | Joel Smith <joelsmith> |
Node sub component: | Kubelet | QA Contact: | MinLi <minmli> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | antgarci, aos-bugs, bjarvis, dahernan, dcbw, erich, joelsmith, jokerman, mfojtik, minmli, rphillips, ruchi.sharma6, sjenning, sreber, stwalter, tsweeney |
Version: | 3.11.0 | ||
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-02-24 15:10:48 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1907461 |
Description
Ryan Howe
2019-11-07 22:18:37 UTC
Further, if the init container fails when rerun because it has already executed once, the pod can appear to be in a bad status while the init container restarts in a loop.

1. Create pod

apiVersion: v1
kind: Pod
metadata:
  name: init-fail-test
spec:
  initContainers:
  - name: inittest
    image: "registry.redhat.io/rhel7/rhel"
    command:
    - /bin/bash
    - -c
    - |
      #!/bin/bash
      set -euo pipefail
      file=/mnt/data/count
      if [[ -f "${file}" ]]; then
        count=$(<"${file}")
        expr $count + 1 > $file
        echo "Init Container has run ${count} times"
        exit 1
      fi
      echo 1 > ${file}
    volumeMounts:
    - name: my-volume
      mountPath: /mnt/data
  containers:
  - name: my-container
    image: "registry.redhat.io/rhel7/rhel"
    command: ["/bin/sh", "-ec", "while true; do sleep 30; cat /mnt/data/count; done;"]
    volumeMounts:
    - mountPath: /mnt/data
      name: my-volume
  volumes:
  - name: my-volume
    emptyDir: {}

2. Wait for kubelet to clean up init container.

$ oc get pods
NAME             READY   STATUS                  RESTARTS   AGE
init-fail-test   0/1     Init:CrashLoopBackOff   4          6m

$ oc logs init-fail-test
1
1
1
1
2
4
5
5
6
6
6
7

3. We see that the pod and main container never get restarted, but the init container is looping and increasing the restart count on the pod.

# docker ps -a | grep init
cdaeb00eef23 registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03 "/bin/bash -c '#!/..." 12 seconds ago Exited (1) 4 seconds ago k8s_inittest_init-fail-test_openshift-sdn_57eee915-01ad-11ea-ae27-fa163e05678f_5
918fabe7c867 registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03 "/bin/sh -ec 'whil..." 7 minutes ago Up 7 minutes k8s_my-container_init-fail-test_openshift-sdn_57eee915-01ad-11ea-ae27-fa163e05678f_0
b768406fb266 registry.redhat.io/openshift3/ose-pod:v3.11.153 "/usr/bin/pod" 7 minutes ago Up 7 minutes k8s_POD_init-fail-test_openshift-sdn_57eee915-01ad-11ea-ae27-fa163e05678f_0

Hi Tony,

Thanks for checking in on this.
We would like to fix it, but the fix may be complicated and difficult to backport to 3.11. I'm planning to spend some time working on a fix for 4.6, then assessing the feasibility of a backport to 3.11.z.

This issue is hit with Kubelet GC, but can be reproduced with any dead container cleanup.

Can you please let me know if this bug is fixed? If yes, in which OC release?

An interesting side effect of this is that kubelet-controlled volume mounts like /etc/hosts get remounted, so any changes to this file get reverted. Not sure what the effect of this is with other volumes.

==================================

# oc create -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  initContainers:
  - name: inittest
    image: "registry.redhat.io/rhel7/rhel"
    command: ["bin/sh", "-ec", "echo running >> /mnt/data/test"]
    volumeMounts:
    - name: my-volume
      mountPath: /mnt/data
  containers:
  - name: my-container
    image: "registry.redhat.io/rhel7/rhel"
    command: ["/bin/sh", "-ec", "ls /mnt/data; sleep 999999"]
    volumeMounts:
    - mountPath: /mnt/data
      name: my-volume
  volumes:
  - name: my-volume
    emptyDir: {}
EOF

[root@node-2 quicklab]# docker ps -a | grep my-pod
faa9efd5dae8 registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "/bin/sh -ec 'ls /..." 46 seconds ago Up 45 seconds k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
8430f48640f9 registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "bin/sh -ec 'echo ..." 49 seconds ago Exited (0) 48 seconds ago k8s_inittest_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
c6a2f318e010 registry.redhat.io/openshift3/ose-pod:v3.11.248 "/usr/bin/pod" About a minute ago Up About a minute k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0

[root@node-2 quicklab]# docker exec -u 0 -it faa9efd5dae8 /bin/sh
sh-4.2# echo "1.1.1.1 test" >> /etc/hosts
sh-4.2# exit

[root@node-2 quicklab]# docker exec -u 0 -it faa9efd5dae8 /bin/cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.129.2.4 my-pod
1.1.1.1 test

[root@node-2 quicklab]# docker rm 8430f48640f9
8430f48640f9

[root@node-2 quicklab]# docker ps -a | grep my-pod
faa9efd5dae8 registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "/bin/sh -ec 'ls /..." 3 minutes ago Up 3 minutes k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
c6a2f318e010 registry.redhat.io/openshift3/ose-pod:v3.11.248 "/usr/bin/pod" 3 minutes ago Up 3 minutes k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0

[root@node-2 quicklab]# docker ps -a | grep my-pod
faa9efd5dae8 registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "/bin/sh -ec 'ls /..." 3 minutes ago Up 3 minutes k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
c6a2f318e010 registry.redhat.io/openshift3/ose-pod:v3.11.248 "/usr/bin/pod" 4 minutes ago Up 4 minutes k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0

[root@node-2 quicklab]# docker ps -a | grep my-pod
f7e5a172329d registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "bin/sh -ec 'echo ..." 2 seconds ago Exited (0) Less than a second ago k8s_inittest_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
faa9efd5dae8 registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "/bin/sh -ec 'ls /..." 3 minutes ago Up 3 minutes k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
c6a2f318e010 registry.redhat.io/openshift3/ose-pod:v3.11.248 "/usr/bin/pod" 4 minutes ago Up 4 minutes k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0

[root@node-2 quicklab]# docker ps -a | grep my-pod
f7e5a172329d registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "bin/sh -ec 'echo ..." 7 seconds ago Exited (0) 6 seconds ago k8s_inittest_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
faa9efd5dae8 registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "/bin/sh -ec 'ls /..." 3 minutes ago Up 3 minutes k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
c6a2f318e010 registry.redhat.io/openshift3/ose-pod:v3.11.248 "/usr/bin/pod" 4 minutes ago Up 4 minutes k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0

[root@node-2 quicklab]# docker exec -u 0 -it faa9efd5dae8 /bin/cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.129.2.4 my-pod

=============================

To Ruchi,

I'm sorry, this bug is not fixed yet.

To Ryan,

/etc/hosts is managed by the Kubelet and it is rewritten every time a container in the pod is started. That's what this comment in the file means:

# Kubernetes-managed hosts file.

A pod shouldn't have any expectation that any changes to that file will remain. If a pod needs to add entries to that file, it should use hostAliases.
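For reference, a minimal sketch of the hostAliases approach mentioned above, under stated assumptions. It declares the same "1.1.1.1 test" entry from the transcript in the pod spec, so the Kubelet writes it into /etc/hosts each time it regenerates the file. The manifest path is a hypothetical name chosen for illustration, and the pod is trimmed down to a single container; the oc commands are left as comments because they need a live cluster.

```shell
# Sketch only: write a pod manifest that declares the extra hosts entry
# via hostAliases instead of editing /etc/hosts inside the container.
manifest="${TMPDIR:-/tmp}/my-pod-hostaliases.yaml"   # hypothetical file name

cat > "${manifest}" << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  hostAliases:
  - ip: "1.1.1.1"
    hostnames:
    - "test"
  containers:
  - name: my-container
    image: "registry.redhat.io/rhel7/rhel"
    command: ["/bin/sh", "-ec", "cat /etc/hosts; sleep 999999"]
EOF

# To try it on a cluster (not run here):
#   oc create -f "${manifest}"
#   oc exec my-pod -- cat /etc/hosts
```

Because the entry lives in the pod spec, it should not depend on whether any container in the pod is started again.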
More info about that in this comment: https://bugzilla.redhat.com/show_bug.cgi?id=1860201#c12

Verified on version 4.7.0-0.nightly-2021-01-06-055910: removed the exited init container, and the init container didn't restart any more.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
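A possible mitigation on versions that still show this behavior, sketched under assumptions rather than taken from the fix itself: make the init container's work idempotent, so that a rerun after the exited container is removed exits 0 instead of failing. The helper name run_init_once and the marker file .init-done are hypothetical. The "echo running" step mirrors the my-pod reproducer, and a temporary directory stands in for the /mnt/data volume so the sketch can run outside a pod.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: perform the one-time setup only if it has not been
# recorded as done. The marker file lives on the shared volume, so it
# survives removal of the exited init container.
run_init_once() {
  init_dir="$1"
  marker="${init_dir}/.init-done"   # hypothetical marker name
  if [ -f "${marker}" ]; then
    echo "init already completed, skipping"
    return 0
  fi
  echo running >> "${init_dir}/test"   # the one-time work from the reproducer
  touch "${marker}"
  echo "init completed"
}

# In an init container this would be the mounted volume, e.g. /mnt/data.
data_dir="$(mktemp -d)"

run_init_once "${data_dir}"   # first run does the work
run_init_once "${data_dir}"   # simulated rerun is a no-op and still exits 0
```

The marker is written only after the work succeeds, so a rerun following a failed first attempt would retry the setup rather than skip it.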