Bug 1897732

Summary: Race condition in kubelet cgroup destroy process [docker-runc]
Product: Red Hat Enterprise Linux 7
Reporter: Derrick Ornelas <dornelas>
Component: docker
Assignee: Jindrich Novy <jnovy>
Status: CLOSED ERRATA
QA Contact: atomic-bugs <atomic-bugs>
Severity: high
Priority: high
Version: 7.7
CC: agk, ajia, akrzos, amurdaca, aos-bugs, atomic-bugs, dornelas, dwalsh, fshaikh, jnovy, jokerman, lsm5, mpatel, mrobson, msekleta, nchoudhu, pasik, rmetrich, rphillips, schoudha, tsweeney
Target Milestone: rc
Keywords: Extras, Reopened
Target Release: 7.9
Hardware: All
OS: Linux
Fixed In Version: docker-1.13.1-204.git0be3e21.el7_9 or newer
Clone Of: 1766665
Last Closed: 2021-03-16 14:42:07 UTC
Bug Depends On: 1766665, 1782918
Bug Blocks: 1186913, 1787148, 1857174, 1858174, 1913377

Comment 4 Pasi Karkkainen 2021-01-06 17:14:49 UTC
I'm seeing the docker daemon crash when starting the kubelet container, and I wonder whether it is related to this bug. The problem seems to happen with docker-1.13.1-203.git0be3e21. Based on the comments on bz #1766665, this bug might be related, e.g. there may be a patch missing from -203?

With the older version docker-1.13.1-162.git64e9980, I'm not seeing the docker daemon crash with kubelet.
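
To check which side of the fix a given build falls on, a small sketch like the following may help. It takes the "Fixed In Version" field of this bug at face value (release 204 or newer) and only compares the release number, so it is meaningful only within the docker-1.13.1 series. The function names `docker_release` and `has_fix` are made up for this illustration and are not part of docker or rpm.

```shell
# Extract the numeric release from a docker package name such as
# docker-1.13.1-204.git0be3e21.el7_9.x86_64.
# Prints nothing if the string does not look like a docker NVR.
docker_release() {
    printf '%s\n' "$1" | sed -n 's/^docker-[0-9.]*-\([0-9][0-9]*\)\..*$/\1/p'
}

# Succeed if the release is at or past 204, the release named in the
# "Fixed In Version" field (assumption: same upstream version, 1.13.1).
has_fix() {
    rel=$(docker_release "$1")
    [ -n "$rel" ] && [ "$rel" -ge 204 ]
}
```

On a live system this could be called as `has_fix "$(rpm -q docker)" && echo "fix present"`.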

Comment 8 Alex Jia 2021-02-24 04:48:13 UTC
Moving this bug to VERIFIED state according to the following test steps.

[root@amd-dinar-04 ~]# rpm -q systemd docker kernel
systemd-219-78.el7_9.3.x86_64
docker-1.13.1-204.git0be3e21.el7_9.x86_64
kernel-3.10.0-1160.15.2.el7.x86_64
[root@amd-dinar-04 ~]# docker create --name=sf02456020 --cgroup-parent=sf02456020.slice registry.access.redhat.com/rhel7 sh -c "trap \"\" TERM ; sleep 365d"
Unable to find image 'registry.access.redhat.com/rhel7:latest' locally
Trying to pull repository registry.access.redhat.com/rhel7 ... 
latest: Pulling from registry.access.redhat.com/rhel7
96bd051f1942: Pull complete 
159a7b5d1b30: Pull complete 
Digest: sha256:7cec88dcd1f3f11d5321438da25e1d851899cf78fef7972c70f8a96dcf633c71
Status: Downloaded newer image for registry.access.redhat.com/rhel7:latest
5b23adf531a177134de7d349b5861a74177c690603fd2035e3f2e454c8e89a30
[root@amd-dinar-04 ~]# docker ps -a
CONTAINER ID        IMAGE                              COMMAND                    CREATED             STATUS              PORTS               NAMES
5b23adf531a1        registry.access.redhat.com/rhel7   "sh -c 'trap \"\" TE..."   9 seconds ago       Created                                 sf02456020
[root@amd-dinar-04 ~]# docker start sf02456020
sf02456020
[root@amd-dinar-04 ~]# docker ps
CONTAINER ID        IMAGE                              COMMAND                    CREATED             STATUS              PORTS               NAMES
5b23adf531a1        registry.access.redhat.com/rhel7   "sh -c 'trap \"\" TE..."   20 seconds ago      Up 4 seconds                            sf02456020
[root@amd-dinar-04 ~]# ls -ld /sys/fs/cgroup/systemd/sf02456020.slice/docker-5b23adf531a177134de7d349b5861a74177c690603fd2035e3f2e454c8e89a30.scope
drwxr-xr-x. 2 root root 0 Feb 23 23:43 /sys/fs/cgroup/systemd/sf02456020.slice/docker-5b23adf531a177134de7d349b5861a74177c690603fd2035e3f2e454c8e89a30.scope
[root@amd-dinar-04 ~]# systemctl list-units | grep -e sf02456020.slice -e docker-5b23adf531a177134de7d349b5861a74177c690603fd2035e3f2e454c8e89a30.scope
docker-5b23adf531a177134de7d349b5861a74177c690603fd2035e3f2e454c8e89a30.scope                         loaded active running   libcontainer container 5b23adf531a177134de7d349b5861a74177c690603fd2035e3f2e454c8e89a30
sf02456020.slice                                                                                      loaded active active    sf02456020.slice
[root@amd-dinar-04 ~]# time systemctl stop sf02456020.slice

real	1m30.147s
user	0m0.010s
sys	0m0.010s
[root@amd-dinar-04 ~]# systemctl list-units | grep -e sf02456020.slice -e docker-5b23adf531a177134de7d349b5861a74177c690603fd2035e3f2e454c8e89a30.scope
[root@amd-dinar-04 ~]# ls -ld /sys/fs/cgroup/systemd/sf02456020.slice/docker-5b23adf531a177134de7d349b5861a74177c690603fd2035e3f2e454c8e89a30.scope
ls: cannot access /sys/fs/cgroup/systemd/sf02456020.slice/docker-5b23adf531a177134de7d349b5861a74177c690603fd2035e3f2e454c8e89a30.scope: No such file or directory
[root@amd-dinar-04 ~]# docker ps -a
CONTAINER ID        IMAGE                              COMMAND                    CREATED             STATUS                            PORTS               NAMES
5b23adf531a1        registry.access.redhat.com/rhel7   "sh -c 'trap \"\" TE..."   3 minutes ago       Exited (137) About a minute ago                       sf02456020
[root@amd-dinar-04 ~]# journalctl | grep "scope stopping timed out. Killing."
Feb 23 23:45:51 amd-dinar-04.khw1.lab.eng.bos.redhat.com systemd[1]: docker-5b23adf531a177134de7d349b5861a74177c690603fd2035e3f2e454c8e89a30.scope stopping timed out. Killing.
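
For anyone repeating this check, the steps above can be sketched as a small script. This is only an outline under a few assumptions: the helper names `scope_path` and `verify_scope_cleanup` are invented here, `docker create` is assumed to print only the container ID on stdout, and the systemd cgroup hierarchy is assumed to live under /sys/fs/cgroup/systemd as in the transcript. It should be run as root on a test host only.

```shell
# Build the systemd cgroup path of a container scope under a parent slice.
# $1 = parent slice (e.g. sf02456020.slice), $2 = full container ID
scope_path() {
    printf '/sys/fs/cgroup/systemd/%s/docker-%s.scope\n' "$1" "$2"
}

# Create and start a container that ignores SIGTERM, stop its parent
# slice, then confirm that the scope cgroup directory has been removed.
# $1 = container name, also used as the slice name prefix
verify_scope_cleanup() {
    name=$1
    slice="$name.slice"
    id=$(docker create --name="$name" --cgroup-parent="$slice" \
        registry.access.redhat.com/rhel7 \
        sh -c 'trap "" TERM ; sleep 365d') || return 1
    docker start "$name" >/dev/null || return 1
    path=$(scope_path "$slice" "$id")
    if [ ! -d "$path" ]; then
        echo "scope cgroup not found before stop: $path" >&2
        return 1
    fi
    # The container ignores SIGTERM, so this waits for the stop timeout
    # (about 90 seconds in the run above) before systemd kills the scope.
    systemctl stop "$slice"
    if [ -e "$path" ]; then
        echo "FAIL: $path still exists" >&2
        return 1
    fi
    echo "PASS: $path removed"
}
```

The block only defines the functions; a run equivalent to the transcript would be `verify_scope_cleanup sf02456020`.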

Comment 10 errata-xmlrpc 2021-03-16 14:42:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (docker bug fix and enhancement update) and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0888