Description of problem:

So, I have this single (zombie?) container that just refuses to die.

lsm5 @ nagato : ~(master) $ podman ps
CONTAINER ID  IMAGE                     COMMAND    CREATED       STATUS           PORTS  NAMES
7fe220fd9fba  nanadai.me/bionic:latest  /bin/bash  25 hours ago  Up 25 hours ago         frosty_mayer

[Skipped "Error: can only kill running containers.." from this one]

lsm5 @ nagato : ~(master) $ podman kill --all
2019-10-04T17:25:40.000761384Z: error opening file '/run/user/0/crun/7fe220fd9fba37eb2d1e4fc9565930168b1c8b1cd37b646ca12d7060716c6339/status': No such file or directory
Error: error sending signal to container 7fe220fd9fba37eb2d1e4fc9565930168b1c8b1cd37b646ca12d7060716c6339: `/usr/bin/crun kill 7fe220fd9fba37eb2d1e4fc9565930168b1c8b1cd37b646ca12d7060716c6339 9` failed: exit status 1

lsm5 @ nagato : ~(master) $ podman inspect 7fe220fd9fba | grep Pid
            "Pid": 16443,
            "ConmonPid": 16436,
            "ConmonPidFile": "/var/run/containers/storage/overlay-containers/7fe220fd9fba37eb2d1e4fc9565930168b1c8b1cd37b646ca12d7060716c6339/userdata/conmon.pid",
            "PidMode": "",
            "PidsLimit": 0,

lsm5 @ nagato : ~(master) $ sudo kill -9 16443
kill: sending signal to 16443 failed: No such process

lsm5 @ nagato : ~(master) $ podman ps -a --sync
ERRO[0000] unable to get container info: "error decoding container status for container 7fe220fd9fba37eb2d1e4fc9565930168b1c8b1cd37b646ca12d7060716c6339: EOF"

Version-Release number of selected component (if applicable):
podman-1.6.1-2.fc31.x86_64

How reproducible:
Observed consistently at the moment, though I'm not sure how exactly to reproduce it from a clean install. I wonder if this'll work the same across a reboot.
This doesn't happen with the other containers I have btw.
So, I was able to reproduce this just once in the very next reboot for the same container image, but not in subsequent reboots.

Also, I consistently see a lot of the following messages when running the very first container:

ERRO[0002] Error deleting network: missing prevResult from earlier plugin
ERRO[0002] Error while removing pod from CNI network "podman": missing prevResult from earlier plugin
ERRO[0002] Error refreshing container 02edd56bf00b46851117a988ae23a370890029fb9e9fe9501a9e7f067fb89c07: missing prevResult from earlier plugin
ERRO[0002] Error deleting network: missing prevResult from earlier plugin
ERRO[0002] Error while removing pod from CNI network "podman": missing prevResult from earlier plugin
ERRO[0002] Error refreshing container 0b1b7a8c2c9d3076427227facac71278815ad7fbeea1427808b7f3e70cbb4fbd: missing prevResult from earlier plugin
ERRO[0002] Error deleting network: missing prevResult from earlier plugin
ERRO[0002] Error while removing pod from CNI network "podman": missing prevResult from earlier plugin
ERRO[0002] Error refreshing container 0eadba4583bff0ec18ffd0fcee080a129f3320c01183fe4bfc585127ecb89327: missing prevResult from earlier plugin
ERRO[0002] Error deleting network: missing prevResult from earlier plugin
ERRO[0002] Error while removing pod from CNI network "podman": missing prevResult from earlier plugin
ERRO[0002] Error refreshing container 127177d12fc7ce80a18ea47b610352400829458f6e75c3947f5418e89be64bac: missing prevResult from earlier plugin
ERRO[0002] Error deleting network: missing prevResult from earlier plugin
ERRO[0002] Error while removing pod from CNI network "podman": missing prevResult from earlier plugin
ERRO[0002] Error refreshing container 141f0fc567dfb7e62b845961b64d4089841e66b90eda6bc62c1df8e7f3a9f1fb: missing prevResult from earlier plugin
ERRO[0002] Error deleting network: missing prevResult from earlier plugin
ERRO[0002] Error while removing pod from CNI network "podman": missing prevResult from earlier plugin
ERRO[0002] Error refreshing container 144c0836e377fd8b2bd59b835982e90dfab8db602090e55f2bb5cd5c1550c4fa: missing prevResult from earlier plugin
ERRO[0002] Error deleting network: missing prevResult from earlier plugin
ERRO[0002] Error while removing pod from CNI network "podman": missing prevResult from earlier plugin
ERRO[0002] Error refreshing container 15fbd06ce5b1e6ac71dcdafba32fa8ef30c8773e499772d86d495e4e6dcc3789: missing prevResult from earlier plugin
ERRO[0002] Error deleting network: missing prevResult from earlier plugin
ERRO[0002] Error while removing pod from CNI network "podman": missing prevResult from earlier plugin
ERRO[0002] Error refreshing container 1cee9987f12dd0d1b025a629010ac0eacee9dea05c9976a65fd98fdb914cc8f2: missing prevResult from earlier plugin
(In reply to Lokesh Mandvekar from comment #2)
> Also, I consistently see a lot of the following messages when running the
> very first container:
>
> ERRO[0002] Error deleting network: missing prevResult from earlier plugin
> [... repeated "missing prevResult from earlier plugin" errors snipped ...]

The container does run after this though. Not a fatal error.
I think we may have two issues here: one that caused the container to die without that being recorded by Conmon, and another where trying to do a `crun status` query on the container gets us an EOF. The second one is pretty concerning - it prevents `ps --sync` from accurately determining the status of a container when `crun` is in play.
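For the EOF part, here is a rough, hedged sketch of how an empty state response turns into the "error decoding container status ...: EOF" message seen above. This is a standalone Go illustration of the failure mode, not libpod's actual code; `crun state` is the standard OCI runtime state query, and the container ID is just the one from the report.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os/exec"
)

// ociState mirrors a few fields of the OCI runtime state JSON.
type ociState struct {
	ID     string `json:"id"`
	Status string `json:"status"`
	Pid    int    `json:"pid"`
}

func main() {
	// Container ID from the report above; substitute your own.
	id := "7fe220fd9fba37eb2d1e4fc9565930168b1c8b1cd37b646ca12d7060716c6339"

	// Ask the OCI runtime for the container state. If crun's status file
	// under its run directory has been removed, the command fails and
	// produces no usable JSON on stdout.
	var out bytes.Buffer
	cmd := exec.Command("crun", "state", id)
	cmd.Stdout = &out
	_ = cmd.Run() // ignore the exit error here; only stdout matters for this sketch

	// Decoding empty output returns io.EOF, which is the "EOF" that
	// `podman ps -a --sync` surfaces in its error message.
	var st ociState
	if err := json.NewDecoder(&out).Decode(&st); err != nil {
		fmt.Println("error decoding container status:", err)
		return
	}
	fmt.Printf("container %s is %s (pid %d)\n", st.ID, st.Status, st.Pid)
}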
Update: I let my system sit for a while with two running containers; when I came back and tried to exec into them, I could see the error for both of them.
I'm also seeing this on F31 with a number of my containers. The containers are running fine, but I'm unable to exec into them, and I get an error when I try to stop them:

# podman exec -it mariadb /bin/bash
Error: error opening file '/run/user/0/crun/d2f2fbb1e6db942def1f1430342e8c8c6919f9696604c65780626e20569126c7/status': No such file or directory: OCI runtime command not found error

# podman exec -it nextcloud /bin/bash
Error: error opening file '/run/user/0/crun/97a9d1f70771ff8006c45c452ce9abd0af57cf7586c37ca038dfb89dea477d9b/status': No such file or directory: OCI runtime command not found error

# ls -l /run/user/0/crun/
total 0
FWIW, I checked the modified time for the directory, and it looks like it was last changed at 4:00 AM. That makes it look like this might be some kind of cron job being overly aggressive while cleaning up:

# ls -ld /run/user/0/crun
drwx------. 2 root root 40 Dec  3 04:00 /run/user/0/crun
It looks like the data was removed for the running container.
Can you stop and restart the container?
Is the container under `/run/crun`?
(In reply to Daniel Walsh from comment #9)
> Can you stop and restart the container?

The only way to stop the container is to manually kill conmon. Once I've done that, I can restart the container.
(In reply to Giuseppe Scrivano from comment #10)
> Is the container under `/run/crun`?

I'm not sure what you mean. The container filesystem is in `/var/lib/containers/storage` (which I think is the default). There is no `/run/crun` directory on my system.
Debarshi, have you seen this issue before?
OK, I think I have some idea of what's happening, but not necessarily why.

I've set up libpod.conf to use crun as my runtime. I have a systemd service that starts the container, and it uses --uidmap and --gidmap to map the users in the container to unprivileged users on the host.

According to the `crun` man page, crun uses $XDG_RUNTIME_DIR/crun when run as a non-privileged user, which in this case seems to be `/run/user/0/crun` (even though crun itself is running as root). Whenever I log into my server as root and then log out again, /run/user/0 is wiped as part of the session cleanup. It seems that crun should be using `/run/crun` here, and I'm not sure why it's not (or if I've missed something).

I'm including the systemd service I'm using to start the container. I realize there are some non-best-practices there, but I don't think anything in it should be causing this particular bug.

$ rpm -q crun podman
crun-0.10.6-1.fc31.x86_64
podman-1.6.2-2.fc31.x86_64

$ cat /etc/systemd/system/nextcloud.service
[Unit]
Description=Nextcloud
Wants=mariadb.service network-online.target
After=mariadb.service network-online.target

[Service]
Restart=on-failure
ExecStartPre=-/usr/bin/podman pull docker.io/library/nextcloud:stable
ExecStartPre=-/usr/bin/podman stop nextcloud
ExecStartPre=-/usr/bin/podman rm nextcloud
ExecStart=/usr/bin/podman run \
    --name nextcloud \
    --uidmap 0:110000:4999 \
    --gidmap 0:110000:4999 \
    --uidmap 65534:114999:1 \
    --gidmap 65534:114999:1 \
    --ip 10.88.1.3 \
    --add-host mariadb:10.88.1.2 \
    --add-host nextcloud:10.88.1.3 \
    --hostname nextcloud \
    --conmon-pidfile=/run/nextcloud.pid \
    --tty \
    -p 127.0.0.1:8888:80 \
    -v /fast/nextcloud/data:/var/www/html:Z \
    -v /home/nextcloud/users:/var/www/html/users:Z \
    docker.io/library/nextcloud:stable
ExecStop=/usr/bin/podman stop nextcloud
PIDFile=/run/nextcloud.pid

[Install]
WantedBy=multi-user.target
A bit more (hopefully welcome) analysis:

In https://github.com/containers/libpod/blob/309452dcfc4bea9cbeff03c8b7c9a0811521e524/libpod/runtime.go#L77, there's a function to set XDG_RUNTIME_DIR if podman is run rootless. In https://github.com/containers/crun/blob/91db0f4a76db33d6221e18ae0560cd715b2e3273/src/libcrun/status.c#L33, crun will use $XDG_RUNTIME_DIR/crun if the variable is set, whether the current UID is root or not.

I think what's happening is that the --uidmap flag is turning rootless mode on, even though the container is started by the root user. Because rootless mode is on, podman sets XDG_RUNTIME_DIR, which causes crun to use $XDG_RUNTIME_DIR/crun as its working path, which in turn causes problems if the root user ever logs out of the system.

I think the best solution would be to have crun check whether it's running as root before using $XDG_RUNTIME_DIR/crun.
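To make the crun side of that chain concrete, here is a small, hedged Go sketch of the path-selection behavior described above. It's an illustrative translation only - crun itself is written in C, and this is not its actual code - but it shows how, with XDG_RUNTIME_DIR exported by the caller, the state directory ends up under /run/user/0 rather than /run/crun.

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// crunStateDir mirrors the behavior described for crun's status.c:
// if XDG_RUNTIME_DIR is set, use $XDG_RUNTIME_DIR/crun, otherwise fall
// back to /run/crun - with no check of whether the process is root.
func crunStateDir() string {
	if dir := os.Getenv("XDG_RUNTIME_DIR"); dir != "" {
		return filepath.Join(dir, "crun")
	}
	return "/run/crun"
}

func main() {
	// Simulate the environment podman exports when it treats the run
	// as rootless. The resulting /run/user/0/crun directory is wiped by
	// session cleanup when root's last login session ends - taking the
	// container's status file with it.
	os.Setenv("XDG_RUNTIME_DIR", "/run/user/0")
	fmt.Println(crunStateDir()) // prints /run/user/0/crun
}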
I've spent more time on this today and figured out what's going wrong and how to fix it.

My previous analysis was incorrect in that IsRootless() does return false when run as root, even if --uidmap is passed. It also turns out that, in rootless mode, crun still thinks it's root, so my original plan for a fix doesn't work either.

The real problem is that GetRuntimeDir() in pkg/util/utils_supported.go will *always* return /run/user/<uid>, whether the container is started by root or not. This also applies to containers started by root without the --uidmap flag.

I think the best fix is to update GetRuntimeDir() to return an empty path if IsRootless() is false. The runtime should then use its default path for storing state, which is accessible by root. For crun, the default path is /run/crun.

I've put up a PR to implement this fix: https://github.com/containers/libpod/pull/4657
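A minimal Go sketch of the shape of that fix, for illustration only: it is not the code in the PR, and the names and signatures are simplified stand-ins for libpod's real GetRuntimeDir() and IsRootless().

package main

import (
	"fmt"
	"os"
)

// isRootless stands in for libpod's rootless check (simplified for this sketch).
func isRootless() bool {
	return os.Geteuid() != 0
}

// getRuntimeDir sketches the fix described above: when running as root,
// return an empty path so the OCI runtime falls back to its own default
// state directory (/run/crun for crun) instead of /run/user/<uid>, which
// session cleanup removes when root logs out.
func getRuntimeDir() (string, error) {
	if !isRootless() {
		return "", nil
	}
	if dir := os.Getenv("XDG_RUNTIME_DIR"); dir != "" {
		return dir, nil
	}
	return fmt.Sprintf("/run/user/%d", os.Geteuid()), nil
}

func main() {
	dir, _ := getRuntimeDir()
	if dir == "" {
		fmt.Println("running as root: let the runtime use its default state dir (/run/crun for crun)")
		return
	}
	fmt.Println("rootless runtime dir:", dir)
}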
This ought to be fixed now, as Podman 1.7.0 was released with the patch included.
(In reply to Giuseppe Scrivano from comment #13)
> Debarshi, have you seen this issue before?

No, I don't recall ever seeing this.
This has been fixed since Podman 1.7.0 was released.