Bug 1770017
| Summary: | Init containers restart when the exited container is removed from node. | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ryan Howe <rhowe> |
| Component: | Node | Assignee: | Joel Smith <joelsmith> |
| Node sub component: | Kubelet | QA Contact: | MinLi <minmli> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | antgarci, aos-bugs, bjarvis, dahernan, dcbw, erich, joelsmith, jokerman, mfojtik, minmli, rphillips, ruchi.sharma6, sjenning, sreber, stwalter, tsweeney |
| Version: | 3.11.0 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-24 15:10:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1907461 | ||
Further, if the init container fails when it is rerun because it has already executed once for this pod, the pod can appear to be in a bad status, since the init container keeps restarting in a loop.
1. Create pod
apiVersion: v1
kind: Pod
metadata:
  name: init-fail-test
spec:
  initContainers:
  - name: inittest
    image: "registry.redhat.io/rhel7/rhel"
    command:
    - /bin/bash
    - -c
    - |
      #!/bin/bash
      set -euo pipefail
      file=/mnt/data/count
      if [[ -f "${file}" ]]; then
        count=$(<"${file}")
        expr $count + 1 > $file
        echo "Init Container has run ${count} times"
        exit 1
      fi
      echo 1 > ${file}
    volumeMounts:
    - name: my-volume
      mountPath: /mnt/data
  containers:
  - name: my-container
    image: "registry.redhat.io/rhel7/rhel"
    command: ["/bin/sh", "-ec", "while true; do sleep 30; cat /mnt/data/count; done;"]
    volumeMounts:
    - mountPath: /mnt/data
      name: my-volume
  volumes:
  - name: my-volume
    emptyDir: {}
2. Wait for kubelet to clean up init container.
$ oc get pods
NAME READY STATUS RESTARTS AGE
init-fail-test 0/1 Init:CrashLoopBackOff 4 6m
$ oc logs init-fail-test
1
1
1
1
2
4
5
5
6
6
6
7
3. We see that the pod and main container never get restarted, but the init container is looping and increasing the restart count on the pod.
# docker ps -a | grep init
cdaeb00eef23 registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03 "/bin/bash -c '#!/..." 12 seconds ago Exited (1) 4 seconds ago k8s_inittest_init-fail-test_openshift-sdn_57eee915-01ad-11ea-ae27-fa163e05678f_5
918fabe7c867 registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03 "/bin/sh -ec 'whil..." 7 minutes ago Up 7 minutes k8s_my-container_init-fail-test_openshift-sdn_57eee915-01ad-11ea-ae27-fa163e05678f_0
b768406fb266 registry.redhat.io/openshift3/ose-pod:v3.11.153 "/usr/bin/pod" 7 minutes ago Up 7 minutes k8s_POD_init-fail-test_openshift-sdn_57eee915-01ad-11ea-ae27-fa163e05678f_0
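The loop above happens because the reproducer's init script deliberately fails when it finds state left by an earlier run. Until the kubelet behavior is fixed, one possible mitigation is to make the init step safe to rerun. The following is only a minimal sketch of that idea, not part of the fix: the `DATA_DIR` variable, the `.init-done` marker file, and the `run_init` function are hypothetical names, and `DATA_DIR` stands in for the `/mnt/data` emptyDir mount.

```shell
#!/bin/sh
# Sketch of an init step that is safe to rerun.
# DATA_DIR, .init-done and run_init are hypothetical names for illustration.
set -eu

# DATA_DIR stands in for the emptyDir mount (/mnt/data in the pod above).
# It defaults to a temporary directory so the sketch can run outside a pod.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
marker="${DATA_DIR}/.init-done"

run_init() {
  if [ -f "${marker}" ]; then
    # A previous run already completed: do nothing and report success.
    echo "init already completed, skipping"
    return 0
  fi
  # One-time work goes here; this sketch only records a value.
  echo 1 > "${DATA_DIR}/count"
  # Write the marker last so that an interrupted run is retried.
  touch "${marker}"
  echo "init completed"
}

# Simulate the first run and a rerun after the exited container was removed.
first=$(run_init)
second=$(run_init)
echo "${first}"
echo "${second}"
```

With a guard like this, a rerun triggered by dead container cleanup exits 0 instead of pushing the pod into Init:CrashLoopBackOff. It does not stop the init container from being started again.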
Hi Tony, Thanks for checking in on this. We would like to fix it, but the fix may be complicated and difficult to backport to 3.11. I'm planning to spend some time working on a fix for 4.6, then assessing the feasibility of a backport to 3.11.z.

This issue is hit with Kubelet GC, but can be reproduced with any dead container cleanup.

Can you please let me know if this bug is fixed? If yes, in which OC release?

An interesting side effect of this is that kubelet-controlled volume mounts like /etc/hosts get remounted, so any changes to this file get reverted. Not sure what the effect of this is with other volumes.
==================================
# oc create -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  initContainers:
  - name: inittest
    image: "registry.redhat.io/rhel7/rhel"
    command: ["bin/sh", "-ec", "echo running >> /mnt/data/test"]
    volumeMounts:
    - name: my-volume
      mountPath: /mnt/data
  containers:
  - name: my-container
    image: "registry.redhat.io/rhel7/rhel"
    command: ["/bin/sh", "-ec", "ls /mnt/data; sleep 999999"]
    volumeMounts:
    - mountPath: /mnt/data
      name: my-volume
  volumes:
  - name: my-volume
    emptyDir: {}
EOF
[root@node-2 quicklab]# docker ps -a | grep my-pod
faa9efd5dae8 registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "/bin/sh -ec 'ls /..." 46 seconds ago Up 45 seconds k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
8430f48640f9 registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "bin/sh -ec 'echo ..." 49 seconds ago Exited (0) 48 seconds ago k8s_inittest_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
c6a2f318e010 registry.redhat.io/openshift3/ose-pod:v3.11.248 "/usr/bin/pod" About a minute ago Up About a minute k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
[root@node-2 quicklab]# docker exec -u 0 -it faa9efd5dae8 /bin/sh
sh-4.2# echo "1.1.1.1 test" >> /etc/hosts
sh-4.2# exit
[root@node-2 quicklab]# docker exec -u 0 -it faa9efd5dae8 /bin/cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.129.2.4 my-pod
1.1.1.1 test
[root@node-2 quicklab]# docker rm 8430f48640f9
8430f48640f9
[root@node-2 quicklab]# docker ps -a | grep my-pod
faa9efd5dae8 registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "/bin/sh -ec 'ls /..." 3 minutes ago Up 3 minutes k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
c6a2f318e010 registry.redhat.io/openshift3/ose-pod:v3.11.248 "/usr/bin/pod" 3 minutes ago Up 3 minutes k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
[root@node-2 quicklab]# docker ps -a | grep my-pod
faa9efd5dae8 registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "/bin/sh -ec 'ls /..." 3 minutes ago Up 3 minutes k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
c6a2f318e010 registry.redhat.io/openshift3/ose-pod:v3.11.248 "/usr/bin/pod" 4 minutes ago Up 4 minutes k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
[root@node-2 quicklab]# docker ps -a | grep my-pod
f7e5a172329d registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "bin/sh -ec 'echo ..." 2 seconds ago Exited (0) Less than a second ago k8s_inittest_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
faa9efd5dae8 registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "/bin/sh -ec 'ls /..." 3 minutes ago Up 3 minutes k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
c6a2f318e010 registry.redhat.io/openshift3/ose-pod:v3.11.248 "/usr/bin/pod" 4 minutes ago Up 4 minutes k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
[root@node-2 quicklab]# docker ps -a | grep my-pod
f7e5a172329d registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "bin/sh -ec 'echo ..." 7 seconds ago Exited (0) 6 seconds ago k8s_inittest_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
faa9efd5dae8 registry.redhat.io/rhel7/rhel@sha256:6dabf4a152c6f209b1fbbfc59dc684fb327e3df7d24410c696ec6c5388f4f33a "/bin/sh -ec 'ls /..." 3 minutes ago Up 3 minutes k8s_my-container_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
c6a2f318e010 registry.redhat.io/openshift3/ose-pod:v3.11.248 "/usr/bin/pod" 4 minutes ago Up 4 minutes k8s_POD_my-pod_default_e0e2a1e6-d82e-11ea-8c23-fa163e318e7f_0
[root@node-2 quicklab]# docker exec -u 0 -it faa9efd5dae8 /bin/cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.129.2.4 my-pod
=============================
To Ruchi, I'm sorry, this bug is not fixed yet.

To Ryan, /etc/hosts is managed by the Kubelet and it is rewritten every time a container in the pod is started. That's what this comment in the file means: `# Kubernetes-managed hosts file.` A pod shouldn't have any expectation that any changes to that file will remain. If a pod needs to add entries to that file, it should use hostAliases. More info about that in this comment: https://bugzilla.redhat.com/show_bug.cgi?id=1860201#c12

Verified on version: 4.7.0-0.nightly-2021-01-06-055910
Removed the exited init container, and the init container didn't restart any more.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5633

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
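To illustrate the hostAliases suggestion in the reply to Ryan above, here is a rough sketch of a manifest that declares the `1.1.1.1 test` entry in the pod spec instead of appending it to /etc/hosts by hand. The file name `hostaliases-pod.yaml` is an assumption made for this example, and the pod is a trimmed variant of the `my-pod` reproducer, not a manifest taken from this bug.

```shell
#!/bin/sh
# Sketch: write a pod manifest that uses hostAliases for the extra hosts entry.
# The manifest file name is a hypothetical choice for this example.
set -eu

manifest="${MANIFEST:-hostaliases-pod.yaml}"

cat > "${manifest}" << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  hostAliases:
  - ip: "1.1.1.1"
    hostnames:
    - "test"
  containers:
  - name: my-container
    image: "registry.redhat.io/rhel7/rhel"
    command: ["/bin/sh", "-ec", "cat /etc/hosts; sleep 999999"]
EOF

echo "wrote ${manifest}; create the pod with: oc create -f ${manifest}"
```

Because the kubelet generates /etc/hosts from the pod spec, an entry declared this way should be written again whenever the file is regenerated, unlike the manual `echo` in the transcript above.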
Description of problem:
Init containers restart when the exited init container is removed from node by kubelet garbage collector.

Version-Release number of selected component (if applicable):
3.11

How reproducible:
100%

Steps to Reproduce:
1. Create pod
# oc create -f [1]
pod/init-test created
# oc exec init-test cat /mnt/data/count
1
2. Set node garbage collection low to trigger cleanup faster
```
kubeletArguments:
  minimum-container-ttl-duration:
  - "10s"
  maximum-dead-containers-per-container:
  - "0"
  maximum-dead-containers:
  - "0"
```
or remove the exited init container manually with `docker rm`
# docker ps -a | grep init
e4978a8bb28a registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03 "/bin/sh -ec 'cat ..." 2 minutes ago Up 2 minutes k8s_my-container_init-test_openshift-sdn_3deb5d3e-01ab-11ea-ae27-fa163e05678f_0
94fb06cc88b0 registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03 "/bin/bash -c '#!/..." 2 minutes ago Exited (0) 2 minutes ago k8s_inittest_init-test_openshift-sdn_3deb5d3e-01ab-11ea-ae27-fa163e05678f_0
d5f1053ffc23 registry.redhat.io/openshift3/ose-pod:v3.11.153 "/usr/bin/pod" 2 minutes ago Up 2 minutes
# docker rm 94fb06cc88b0

Actual results:
- init container continues to restart due to being cleaned up by kubelet.
# oc exec init-test cat /mnt/data/count
2
# oc exec init-test cat /mnt/data/count
3
...
...
- Main container never gets restarted:
# docker ps -a | grep init
9d4605b3d6cf registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03 "/bin/bash -c '#!/..." 3 minutes ago Exited (0) 3 minutes ago k8s_inittest_init-test_openshift-sdn_3deb5d3e-01ab-11ea-ae27-fa163e05678f_0
e4978a8bb28a registry.redhat.io/rhel7/rhel@sha256:7bca0d15377805f6b3bbb79d28325da8dd90f4eeef6fb463c0973d05257e5d03 "/bin/sh -ec 'cat ..." 6 minutes ago Up 6 minutes k8s_my-container_init-test_openshift-sdn_3deb5d3e-01ab-11ea-ae27-fa163e05678f_0
d5f1053ffc23 registry.redhat.io/openshift3/ose-pod:v3.11.153 "/usr/bin/pod" 6 minutes ago Up 6 minutes k8s_POD_init-test_openshift-sdn_3deb5d3e-01ab-11ea-ae27-fa163e05678f_0

Expected results:
init container to only run when pod is restarted or first scheduled to node.

Additional info:
[1]
oc create -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  initContainers:
  - name: inittest
    image: "registry.redhat.io/rhel7/rhel"
    command: ["bin/sh", "-ec", "echo running >> /mnt/data/test"]
    volumeMounts:
    - name: my-volume
      mountPath: /mnt/data
  containers:
  - name: my-container
    image: "registry.redhat.io/rhel7/rhel"
    command: ["/bin/sh", "-ec", "ls /mnt/data; sleep 999999"]
    volumeMounts:
    - mountPath: /mnt/data
      name: my-volume
  volumes:
  - name: my-volume
    emptyDir: {}
EOF