Bug 1357121
| Summary: | [extras-rhel-7.3.0] Docker ps -a shows dead pods that can't be removed | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Eric Jones <erjones> |
| Component: | docker | Assignee: | Lokesh Mandvekar <lsm5> |
| Status: | CLOSED ERRATA | QA Contact: | atomic-bugs <atomic-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.3 | CC: | agrimm, aos-bugs, dwalsh, gouyang, imcleod, jhonce, jkaur, jokerman, lsm5, mmccomas, mpatel, qcai, vgoyal, wmeng |
| Target Milestone: | rc | Keywords: | Extras |
| Target Release: | 7.3 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | Docker 1.9.1 |
| Last Closed: | 2016-11-04 09:08:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1303130 | | |
Description — Eric Jones, 2016-07-15 19:47:05 UTC
Need some more information about the setup:

- Are you running the docker daemon in the host mount namespace or in a separate mount namespace with a slave relationship? If you are using systemd, you can check the docker.service unit file to see whether "MountFlags=slave" is specified.

Mrunal, I have a question about shm. In this case the directory /var/lib/.../shm/ could not be removed, and I think the reason is that it is mounted in some other mount namespace. There are two possibilities:

- This mount point has leaked into some other namespace.
- Or we have intentionally leaked this mount point into some other namespace so that two containers can share it.

I remember that you had done some work in this area. Can you shed some light on this?

I think this issue depends on kernel issue 1347821. A directory removal has failed because this directory is mounted in some other mount namespace.

*** Bug 1356993 has been marked as a duplicate of this bug. ***

I agree that this is tied to the kernel mount namespace bug. I think getting the mounts in the container, host, and docker mount namespaces will help.

(In reply to Mrunal Patel from comment #5)
> I agree that this is tied to the kernel mount namespace bug. I think getting
> the mounts in the container, host and docker mount namespaces will help.

Mrunal, do we expect that any of the recent changes to the mount namespace(s) in our docker 1.10 packages will fix this? If so, can we point people to a particular build and/or commit?

Ian, I just talked with Vivek and he agrees that adding MountFlags=slave should help with this. He is also going to work on a workaround garbage collection to remove these, as kernel backfixes aren't going to happen.

Docker containers can't be removed right now most likely because their mount points have leaked into another application's mount namespace, such as systemd-machined's. There is a high chance that these containers can be cleaned up a little later. If we provide a cron job which periodically cleans up dead containers, will that help? Also, on Fedora, systemd-machined runs in the host mount namespace, so the mount-point leak situation does not arise, while on RHEL systemd-machined seems to be running in its own mount namespace. If we modify the RHEL 7 systemd-machined to run in the host mount namespace, it will reduce the chance of facing dead containers.

The following entry in "crontab -e" worked for me. Thanks to Dan Walsh for the docker command.

```
0,10,20,30,40,50 * * * * docker rm $(docker ps -aq -f status=dead)
```

Or rather, the following tries to clean dead containers every 10 minutes:

```
*/10 * * * * docker rm $(docker ps -aq -f status=dead)
```

I think we can drop an hourly cron job script in /etc/cron.hourly/ to take care of cleaning dead containers. We probably require two scripts, one for docker and the other for docker-latest. For now I am testing docker-hourly.cron and docker-latest-hourly.cron on my system.

docker-hourly.cron:

```bash
#!/bin/bash

# Do nothing if the docker daemon is not running
if ! systemctl is-active --quiet docker; then
    exit 0
fi

# Try to clean up dead containers
docker rm $(docker ps -aq -f status=dead)
```

docker-latest-hourly.cron:

```bash
#!/bin/bash

# Do nothing if the docker-latest service is not active
if ! systemctl is-active --quiet docker-latest; then
    exit 0
fi

# Try to clean up dead containers
docker rm $(docker ps -aq -f status=dead)
```

We can package it up in docker-common, and then we only need one script; /usr/bin/docker will work for either docker or docker-latest.

That's fine too.
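[Editor's note: for readers who want to try the MountFlags=slave suggestion from earlier in this thread, here is a minimal sketch of applying it through a systemd drop-in on a systemd-managed docker.service. This is not from the bug itself; the drop-in file name "mountflags.conf" is illustrative.]

```bash
# Check whether docker.service already runs with slave mount propagation
systemctl show docker --property=MountFlags

# Add the flag via a drop-in rather than editing the packaged unit file
# (the drop-in file name is arbitrary; systemd reads any *.conf here)
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/mountflags.conf <<'EOF'
[Service]
MountFlags=slave
EOF

# Reload systemd and restart the daemon so the new propagation takes effect
systemctl daemon-reload
systemctl restart docker
```

A drop-in survives package updates, which is why it is generally preferable to editing /usr/lib/systemd/system/docker.service directly.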
I will modify the initial check to test that either docker or docker-latest is active, and otherwise do nothing. The following works for me. Lokesh, will you be able to change the docker-common package to include a script named docker-hourly.cron, installed in the /etc/cron.hourly/ dir?

```bash
#!/bin/bash

# Do nothing if neither the docker nor the docker-latest service is running
#if ! systemctl is-active --quiet docker-latest docker; then
if ! systemctl --quiet is-active docker-latest && ! systemctl --quiet is-active docker; then
    exit 0
fi

# Try to clean up dead containers
docker rm $(docker ps -aq -f status=dead)
```

> #if ! systemctl is-active --quiet docker-latest docker; then
> if ! systemctl --quiet is-active docker-latest && ! systemctl --quiet is-active docker; then

The first line should work. Need to pick one.

I tried "if ! systemctl is-active --quiet docker-latest docker;" with docker-latest running and docker not installed. It did not work as written in the man pages: I was getting a non-zero exit status, whereas per the man page I should get a zero exit status as long as at least one of the listed services is active. Looks like there is some bug, so I switched to the current syntax.

That is fine, just remove the initial comment.

Here is the updated script.

```bash
#!/bin/bash

# Do nothing if neither the docker nor the docker-latest service is running
if ! systemctl --quiet is-active docker-latest && ! systemctl --quiet is-active docker; then
    exit 0
fi

# Try to clean up dead containers
docker rm $(docker ps -aq -f status=dead)
```

LGTM

Hi, a quick follow-up question: is this a script that needs to be implemented somewhere in code, or is it something that users experiencing this issue can use to correct the problem?

The script can be shipped as part of the docker package, but it is just a simple bash script. Nothing has to change in code until we fix the kernel issue that is causing this problem.

So if a customer is currently experiencing this issue, they should be able to run `docker rm $(docker ps -aq -f status=dead)` to remove the stuck, dead containers? Or do they need to run anything first?

Customers can try running this. There is no guarantee that dead containers will go away immediately, but one can keep trying, and there is hope that over a period of time the dead containers will go away.

Okay, thank you @Vivek.

Lokesh, can we get this into the 7.3 release?

Use the following version of the script. I updated it so that it exits with status 0 if there are no dead containers.

```bash
#!/bin/bash

# Do nothing if neither the docker nor the docker-latest service is running
if ! systemctl --quiet is-active docker-latest && ! systemctl --quiet is-active docker; then
    exit 0
fi

# If there are no dead containers, exit.
DEAD_CONTAINERS=`docker ps -aq -f status=dead`
[ -z "$DEAD_CONTAINERS" ] && exit 0

# Try to clean up dead containers
docker rm $DEAD_CONTAINERS
```

Checked that the script is included in docker-common-1.10.3-55.el7.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2634.html
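[Editor's note: for anyone verifying the shipped fix, a minimal sketch follows, assuming the package and script names mentioned in this thread; the exact installed path may differ by build.]

```bash
# Confirm the docker-common build that is supposed to carry the cleanup script
rpm -q docker-common

# List any cron-related files the package installs
rpm -ql docker-common | grep -i cron

# Run the hourly script once by hand to trigger an immediate cleanup
# (path taken from this thread; adjust if the packaged location differs)
/etc/cron.hourly/docker-hourly.cron
```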