Bug 1945929
Summary: Every podman run invocation generates two "Couldn't stat device /dev/char/10:200: No such file or directory" lines in the journal

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat Enterprise Linux 8 | Reporter | Michele Baldessari <michele> |
| Component | runc | Assignee | Jindrich Novy <jnovy> |
| Status | CLOSED CURRENTRELEASE | QA Contact | Alex Jia <ajia> |
| Severity | low | Docs Contact | |
| Priority | urgent | | |
| Version | 8.4 | CC | augol, bbaude, dahernan, dornelas, dwalsh, eglottma, ggiguash, gscrivan, jligon, jnovy, kir, lsm5, mheon, nrevo, pthomas, rsandu, rwright, skrenger, sscheink, tsweeney, umohnani, xiliang, ypu |
| Target Milestone | beta | Keywords | Triaged |
| Target Release | --- | | |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | | | |
| Fixed In Version | runc-1.0.3-6.el8 | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | | Environment | |
| Last Closed | 2022-06-03 21:04:16 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | | | |
| Bug Blocks | 1186913, 2074322 | | |
| Attachments | | | |

Description
Michele Baldessari
2021-04-02 17:52:02 UTC
Is there a device /dev/char/10:200 on your system?

The device in question is associated with /dev/net/tun. When the tun module is loaded, a symlink /dev/char/10:200 -> ../net/tun is created. The real problem, to be determined, is in runc. This can be worked around in one of two ways: modprobe tun or install crun.

And I double-checked rc93 in brew; the same issue exists.

Kir, this seems like a runc issue.

It seems that the message comes from systemd. What runc does is create the DeviceAllow systemd property based on the OCI runtime config (aka config.json, section linux.resources.devices).
I guess there is an entry for /dev/char/10:200 (which is a symlink to /dev/net/tun) in OCI runtime config, so it is added to DeviceAllow.
So, the problem is in OCI runtime config, which is generated by podman.
> The real problem, to be determined, is in runc.
I believe it is not in runc. It doesn't know anything about /dev/net/tun or /dev/char/10:200, it merely relays the config provided to systemd.
This can't be reproduced with crun, because it does not (yet?) relay device rules to the DeviceAllow systemd property.
Now, I was not able to figure out where in podman this device is added, but I am not very familiar with its networking aspects.
Setting back to Brent.

Created attachment 1872044 [details]
journalctl output showing tun module load during boot

I found this bug while looking for this message:

    [ 99.936178] magura01 systemd[1]: Couldn't stat device /dev/char/10:200: No such file or directory

Yesterday I reported bug 2074320 on RHOCP 4.10, and I noticed that the journalctl output attached to that bug contains thousands of instances of this message. I don't believe the messages relate at all to the bug I reported, but having noticed them, I wanted to note them here in the interest of cleaning up this spurious message.

Seeing remarks in this bug relating the message to podman, I looked at a different system in my lab that is running RHEL 8.5 and on which podman is in use. The message is not seen on that RHEL 8.5 system, but drawing on the clues in comment 11 here, I found that the tun module has been loaded at boot:

    [root@revel01 ~]# ls -ld /dev/net/tun /dev/char/10:200
    lrwxrwxrwx. 1 root root 10 Mar 31 16:26 /dev/char/10:200 -> ../net/tun
    crw-rw-rw-. 1 root root 10, 200 Mar 31 16:26 /dev/net/tun
    [root@revel01 ~]# journalctl -a|grep tun:
    Mar 31 16:26:37 revel01 kernel: tun: Universal TUN/TAP device driver, 1.6
    [root@revel01 ~]# lsmod|grep tun
    tun 49152 1

The attached journalctl output illustrates startup on the RHEL 8.5 system with podman and the fact that the tun module was loaded early. The journalctl output I provided in bug 2074320 confirms the tun module did not load.

I see also the comment 5 remark: "... This can be worked around with one of two ways: modprobe tun or install crun." In response to which I'd like to ask: is it actually necessary to load the tun module if the only purpose is to suppress this error message? If comment 11 is correct in tying this to a systemd DeviceAllow property, could that be qualified with a ConditionPathExists on /dev/char/10:200, or should systemd infer such a condition on DeviceAllow properties?
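The ls check in the comment above can be wrapped in a small shell function, for example to decide whether the diagnostic is to be expected on a given host. This is only a sketch: the function name `tun_node_state` and its optional directory argument are made up for illustration (the argument exists so the check can be tried against a scratch directory instead of /dev), and nothing here is part of runc, podman, or systemd.

```shell
#!/bin/sh
# Hypothetical helper, not part of runc, podman, or systemd.
# Prints "present" if <dev_root>/char/10:200 resolves to an existing node,
# "missing" otherwise (a dangling symlink also counts as missing).
# dev_root defaults to /dev when no argument is given.
tun_node_state() {
    dev_root="${1:-/dev}"
    if [ -e "$dev_root/char/10:200" ]; then
        echo "present"
    else
        echo "missing"
    fi
}
```

Calling `tun_node_state` with no argument inspects /dev on the running system; on a host where the tun module has not been loaded it would be expected to print "missing".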
After further debugging, I remain convinced that this is a runc problem, as Podman never requests any action be taken on `/dev/char/10:200` or any path in `/dev/char` from the OCI runtime. We will continue to debug further and isolate exactly why that is.

Created attachment 1874975 [details]
journalctl shows /dev/char/10:200 diagnostic occurs frequently, then ceases

I noticed today that the /dev/char/10:200 diagnostic had ceased on the system that I mentioned in comment 15. It was emitted every few minutes on average from system boot until around 16:55 on Friday:

    [root@magura01 ~]# journalctl -ab > journalctl-20220425.txt
    [root@magura01 ~]# grep -c /dev/char/10:200 journalctl-20220425.txt
    5721
    [root@magura01 ~]# grep /dev/char/10:200 journalctl-20220425.txt | tail
    ...
    Apr 22 16:50:27 magura01 systemd[1]: Couldn't stat device /dev/char/10:200: No such file or directory
    Apr 22 16:55:45 magura01 systemd[1]: Couldn't stat device /dev/char/10:200: No such file or directory

I correlated the cessation of the message with running a particular newly constructed container last Friday. Today I can replicate this by running "modprobe -r tun" to unload the tun module, then running the container, then looking again and seeing that tun is loaded again:

    [core@magura01 ~]$ lsmod|grep tun
    ip6_udp_tunnel 16384 1 vxlan
    udp_tunnel 20480 1 vxlan
    [core@magura01 ~]$ podman run --rm -dt -p8080:8080 localhost/httpd-repo-server
    0f1801c508dee8244100ea76ed1686cb68a7ec0312ccd9b0770716a2cf7e2e45
    [core@magura01 ~]$ lsmod|grep tun
    tun 53248 2
    ip6_udp_tunnel 16384 1 vxlan
    udp_tunnel 20480 1 vxlan

I suppose this observation may be intuitively obvious to experts, but for a relative container newbie like myself, the fact that running a container as an ordinary non-root user can trigger the module load, and thereby turn off the diagnostic, helps explain why the message could occur for a possibly lengthy period and then disappear.
Whereas another system running a different mixture of containers might log this diagnostic only a few times, if at all. The container provides http/httpd access to an rpm repo that is constructed inside that container; it is based on registry.access.redhat.com/ubi8/httpd-24.

@kir thoughts about Matt's comments in comment 17? https://bugzilla.redhat.com/show_bug.cgi?id=1945929#c17

I can reproduce the issue just using `runc run` with the config.json produced by `runc spec`:

    # mkdir rootfs
    # runc spec
    # runc --systemd-cgroup run foo
    ERRO[0000] runc run failed: unable to start container process: exec: "sh": executable file not found in $PATH

and in the journal I see:

    Apr 26 03:46:56 localhost.localdomain systemd[1]: Couldn't stat device /dev/char/10:200: No such file or directory
    -- The unit runc-foo.scope has successfully entered the 'dead' state.

so I am moving it back to runc, since it doesn't depend on the configuration created by Podman. To work around the issue it is sufficient to load the tun module: `sudo modprobe tun`.

Reproduced on CentOS 8 using runc from latest git. Cannot reproduce on Fedora 35. Most probably a systemd issue; looking.

Can't yet figure out why systemd is interested in /dev/net/tun in particular when creating a container scope, but it looks like backporting https://github.com/systemd/systemd/pull/10996 (in particular, commits https://github.com/systemd/systemd/pull/10996/commits/d5aecba6e0b7c73657c4cf544ce57289115098e7 and https://github.com/systemd/systemd/pull/10996/commits/d5aecba6e0b7c73657c4cf544ce57289115098e7, maybe others, too) should help to eliminate the message. I am inclined to say this is a systemd issue, as runc does not set anything specific to /dev/net/tun, but I will take a deeper look tomorrow.

Sorry, I was wrong: it is indeed runc that adds the rule (not sure how I missed it before).

Proposed upstream fix: https://github.com/opencontainers/runc/pull/3468

The upstream fix was merged.
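To see whether the diagnostic is still being logged, for instance before and after loading tun or updating runc, a saved journal can be summarised per day. The following is a minimal sketch, assuming the default short journalctl format seen in this bug (month and day in the first two fields); `stat_errors_per_day` is a hypothetical name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: reads journal text on stdin and prints
# "<month> <day> <count>" for each day on which the diagnostic appears.
# "Couldn.t" matches the apostrophe without breaking the shell quoting.
# The final sort is lexical, so days are not ordered chronologically
# across different months.
stat_errors_per_day() {
    awk '/Couldn.t stat device \/dev\/char\/10:200/ { n[$1 " " $2]++ }
         END { for (d in n) print d, n[d] }' | sort
}
```

It could be fed from a live journal, for example `journalctl -ab | stat_errors_per_day`, or from a saved file such as the journalctl-20220425.txt dump mentioned earlier in this bug.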
Not really related to this bug, but to get the full picture, this change motivated similar changes in related projects:

* containerd: https://github.com/containerd/containerd/pull/6923
* crun: https://github.com/containers/crun/pull/916

Also, for the sake of completeness, I looked into podman and cri-o, and haven't found any traces of /dev/net/tun aka 10:200 being enabled by default.

Tested with runc-1.0.3-6.module+el8.7.0+15223+3987d347, it works as expected.

    [root@ibm-x3650m4-01-vm-06 ~]# lsmod|grep tun
    [root@ibm-x3650m4-01-vm-06 ~]# mkdir rootfs
    [root@ibm-x3650m4-01-vm-06 ~]# mkdir mycnt/rootfs -p
    [root@ibm-x3650m4-01-vm-06 ~]# rm -rf rootfs/
    [root@ibm-x3650m4-01-vm-06 ~]# cd mycnt/
    [root@ibm-x3650m4-01-vm-06 mycnt]# podman export $(podman create quay.io/libpod/alpine) | tar -C rootfs -xvf -
    ...ignore...
    [root@ibm-x3650m4-01-vm-06 mycnt]# runc spec
    [root@ibm-x3650m4-01-vm-06 mycnt]# runc --systemd-cgroup run foo
    / # ls
    bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
    / # exit
    [root@ibm-x3650m4-01-vm-06 ~]# rpm -q podman runc conmon
    podman-4.1.0-2.module+el8.7.0+15223+3987d347.x86_64
    runc-1.0.3-6.module+el8.7.0+15223+3987d347.x86_64
    conmon-2.1.0-3.module+el8.7.0+15223+3987d347.x86_64

Close this bug per Comment 28.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days