Description of problem:

This problem was previously reported and fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1869030, but the patch for that bug was then backed out because it was thought to be relevant only in the context of systemd-nspawn. In practice it appears to hit much more widely.

This glibc change has broken Fedora rawhide when running inside docker containers. This has broken libvirt CI running on GitLab CI, which uses docker.

Downgrading from glibc-2.32.9000-16.fc34.x86_64.rpm back to glibc-2.32.9000-15.fc34.x86_64.rpm, which has the workaround, fixes execution inside docker, but that's not viable for users to do manually every time.

A patch has been sent to Docker to allow faccessat2, but it is pretty recent and so is not widely deployed at this time: https://github.com/moby/moby/pull/41353/files

I think that glibc needs to keep the workaround from bug 1869030 a good while longer yet, to allow time for a fixed docker to become widely available.

I suspect the more general root cause lies in 'runc', which returns EPERM instead of ENOSYS when filtering syscalls: https://github.com/opencontainers/runc/issues/2151

Version-Release number of selected component (if applicable):
glibc-2.32.9000-16.fc34.x86_64.rpm

How reproducible:
I've seen it in docker under GitLab CI, but not in podman running locally; I suspect that's because my podman setup isn't doing syscall filtering.
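To illustrate why the errno value matters, here is a minimal Python model of the fallback pattern described above. It is only a sketch, not glibc's actual code: the function names and the stand-in callables are hypothetical, and each callable returns 0 on success or a positive errno value on failure.

```python
import errno


def faccessat_with_fallback(new_syscall, legacy_check):
    """Model of the fallback: use the legacy path only when the new syscall is missing.

    Both arguments are hypothetical stand-ins that return 0 on success
    or a positive errno value on failure.
    """
    err = new_syscall()
    if err == errno.ENOSYS:
        # The kernel (or filter) says the syscall does not exist: fall back.
        return legacy_check()
    # Any other result, including EPERM, is reported to the caller as-is.
    return err


def kernel_without_faccessat2():
    # An older kernel, or a filter that correctly reports "no such syscall".
    return errno.ENOSYS


def filter_returning_eperm():
    # A seccomp filter that rejects unknown syscalls with EPERM.
    return errno.EPERM


def legacy_ok():
    # The legacy access check succeeds for this file.
    return 0


if __name__ == "__main__":
    scenarios = [
        ("ENOSYS from kernel", kernel_without_faccessat2),
        ("EPERM from filter", filter_returning_eperm),
    ]
    for label, new_syscall in scenarios:
        result = faccessat_with_fallback(new_syscall, legacy_ok)
        print(label, "->", "ok" if result == 0 else errno.errorcode[result])
```

In this model the EPERM case never reaches the legacy path, so a file that is actually accessible is reported as inaccessible.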
*** Bug 1899913 has been marked as a duplicate of this bug. ***
The issue can be demonstrated on Fedora 33 host with docker from moby-engine packaged in Fedora 33, see bug 1897493.
This is bad. I think I'm hitting the same issue. R (in a rawhide docker image) stopped working after updating glibc to glibc-2.32.9000-16.fc34. I see:
$ R
ERROR: R_HOME ('/usr/lib64/R') not found
glibc is basically sitting between the kernel and the cloud. I've brought the discussion to what I think are the appropriate forums: https://lore.kernel.org/linux-api/87lfer2c0b.fsf@oldenburg2.str.redhat.com/ https://groups.google.com/a/opencontainers.org/g/dev/c/8Phfq3VBxtw I've also posted a glibc upstream patch to show what it would look like: https://sourceware.org/pipermail/libc-alpha/2020-November/119955.html Personally, I find it difficult to support such an approach technically, and I would like to see some reassurance from kernel developers that this is okay.
Can we get the previous workaround re-applied to rawhide as a stopgap until the upstream discussions reach some conclusion about the right long-term fix? This broken faccessat() is quite disruptive to people using rawhide in containers.
The workaround has been categorically rejected by kernel developers and glibc developers alike. Work is under way to address this in runc and potentially libseccomp.
Here is another case that fails for the same reason:
FROM registry.fedoraproject.org/fedora:rawhide
RUN echo "echo test" > test.sh
RUN chmod +x test.sh
RUN ls -l test.sh
RUN test -x test.sh
I really appreciate that the fix is on the way. I just want to point out again that runc and libseccomp are components that are often in the scope of infrastructure operators, so it will take a (long) time until they get updated. There are so many people involved that it simply will take a lot of time... The developer providing the fix, upstream review and hopefully merge, backports to the stable versions used by all the major distributions out there, package distribution, and finally the operator who accepts the new version and deploys it... Some projects already had to disable their CI jobs for building and testing on current Fedora releases because of this issue. Any chance to get something like a "special" package for use in containers, or to provide "working" Fedora 34 container images?
(In reply to Florian Bezdeka from comment #15)
> Some projects already had to disabled their CI jobs for building and testing
> on current Fedora releases because of this issue.

Only Fedora Rawhide is impacted, and the goals of Rawhide are different from the goals of a stable release. Please review the Fedora Rawhide goals here: https://fedoraproject.org/wiki/Releases/Rawhide#Goals

"To identify and fix issues with packages before they reach a stable release of Fedora."

> Any chance to get something like a "special" package for the use in
> containers or provide "working" Fedora 34 container images?

Fedora 34 does not release until May 20th 2021: https://fedorapeople.org/groups/schedule/f-34/f-34-key-tasks.html

My opinion is that we have some time to work on a solution that integrates the best possible fixes from upstream.

Thank you for your comments.
(In reply to Carlos O'Donell from comment #16)
> Only Fedora Rawhide is impacted, and the goals of Rawhide are different from
> the goals of a stable release.
>
> Please review the Fedora Rawhide goals here:
> https://fedoraproject.org/wiki/Releases/Rawhide#Goals
>
> "To identify and fix issues with packages before they reach a stable release
> of Fedora."

I understand that point of view. I will now give the point of view of one of those open source projects running Fedora-rawhide images in the CI: in the CGAL project (https://www.cgal.org/ and https://github.com/CGAL/cgal), we want to identify and fix issues when our software library is compiled with the compilers and system libraries of Fedora Rawhide, so that our software is always ready to run under Fedora XY as soon as it is released. When the `glibc` or the kernel of Rawhide have an issue with `runc`, we can no longer test our software with it. That is probably what Florian wanted to point out in comment #15.
I totally agree, we run CI against fedora:rawhide to catch compiler and library problems early and currently this isn't possible anymore.
Same here. With Travis and GitHub Actions, a workaround is to not restrict the container [1]:
sudo docker create --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    --name mobydick registry.fedoraproject.org/fedora:rawhide \
    /tmp/BOUT-dev/.travis_fedora.sh mpich
[1] https://github.com/boutproject/BOUT-dev/blob/next/.travis_fedora.sh#L24
(In reply to david08741 from comment #21)
> Same here. With travis and github actions a work around is to not restrict
> the container [1]
>
> sudo docker create --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
> --name mobydick registry.fedoraproject.org/fedora:rawhide \
> /tmp/BOUT-dev/.travis_fedora.sh mpich
>
> [1] https://github.com/boutproject/BOUT-dev/blob/next/.travis_fedora.sh#L24

Or alternatively, pass an updated seccomp profile that includes faccessat2? I see that Moby updated their default profile about 4 months ago to set faccessat2 to SCMP_ACT_ALLOW: https://github.com/moby/moby/blob/master/profiles/seccomp/default.json#L97
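A custom profile along these lines could be produced by patching an existing one. The following is a rough sketch only: it assumes the profile uses the JSON layout of Moby's default profile (a top-level "syscalls" list of rules, each with "names" and "action" keys), and the helper name `allow_syscall` is made up for illustration.

```python
import json


def allow_syscall(profile, name):
    """Return a copy of a seccomp profile dict with `name` explicitly allowed.

    Assumes the Moby-style layout: profile["syscalls"] is a list of rules,
    each carrying "names" and "action", plus optional "args", "includes"
    and "excludes" conditions.
    """
    patched = json.loads(json.dumps(profile))  # deep copy via JSON round-trip
    rules = patched.setdefault("syscalls", [])
    for rule in rules:
        unconditional = not (
            rule.get("args") or rule.get("includes") or rule.get("excludes")
        )
        if (
            rule.get("action") == "SCMP_ACT_ALLOW"
            and unconditional
            and "names" in rule
        ):
            if name not in rule["names"]:
                rule["names"].append(name)
            return patched
    # No suitable rule found: add a dedicated allow rule for this syscall.
    rules.append({"names": [name], "action": "SCMP_ACT_ALLOW"})
    return patched


if __name__ == "__main__":
    # A tiny stand-in profile; a real one would be loaded from a JSON file.
    profile = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": ["faccessat"], "action": "SCMP_ACT_ALLOW"}],
    }
    print(json.dumps(allow_syscall(profile, "faccessat2"), indent=2))
```

The resulting JSON file would then be passed with `--security-opt seccomp=<path to the patched file>`. Whether the runtime's own seccomp library recognizes the faccessat2 name is a separate question that this sketch does not address.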
I am using the "container:" keyword in GitHub Actions to run on rawhide; is there a workaround for that, too?
Another alternative is to just explicitly downgrade glibc in your rawhide containers. This is viable as a short-term hack, as long as the new glibc doesn't introduce a new symbol that apps pick up a dependency on, which hasn't been a problem in this rawhide cycle so far. This is how we've temporarily worked around this problem in libvirt, for example: https://gitlab.com/libvirt/libvirt-appdev-guide-python/-/commit/93837ef20164a46469e495cfe7bd887e59828bdb
(In reply to Christoph Junghans from comment #20)
> I totally agree, we run CI against fedora:rawhide to catch compiler and
> library problems early and currently this isn't possible anymore.

Those are great reasons to use fedora:rawhide. Thank you for using it! Unfortunately, your infrastructure providers have limited your access to kernel functionality and you can no longer run fedora:rawhide. We will continue to track this situation and raise the issue with affected upstreams. We will track this closely as Fedora Rawhide approaches release as Fedora 34.

(In reply to Laurent Rineau from comment #17)
> I understand that point of view. I will now give the point of view of one of
> those open source projects running Fedora-rawhide images in the CI: in the
> CGAL project (https://www.cgal.org/ and https://github.com/CGAL/cgal), we
> want to identify and fix issues when our software library is compiled with
> the compilers and system libraries of Fedora Rawhide, so that our software
> is always ready to run under Fedora XY as soon as it is released. When the
> `glibc` or the kernel of Rawhide have an issue with `runc`, we can no longer
> test our software with it. That is probably what Florian wanted to point out
> in comment #15.

Please reach out to your infrastructure providers and ask them to update their seccomp filters. This has been done already by systemd for systemd-nspawn to support Fedora and Fedora COPR builders. Upstream for moby looks updated with faccessat2. Upstream updates for runc, docker, and others are still in progress (last I checked) to fix this "once and for all" so the problem doesn't keep happening. Otherwise this will happen again and again until the infrastructure is updated to correctly manage and mediate access to new kernel functionality.
The main place it needs changing is in libseccomp, and the fix is part of the 2.4.4 release[1] onwards. No distribution traditionally used for CI workers ships it. The closest is Ubuntu 20.04 at 2.3.3, but that still means manual poking just to get the fedora:rawhide image working correctly. It's hard to swallow that you really expect every "infrastructure provider" to happily jump in to backport a newer libseccomp to every server used for CI. [1] https://github.com/seccomp/libseccomp/commit/b3206ad5645dceda89538ea8acc984078ab697ab
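For anyone auditing CI workers, the version threshold mentioned above could be checked with a small helper like the one below. This is a sketch under assumptions: the function names are hypothetical, only plain dotted numeric versions are handled, and a new enough libseccomp says nothing about whether a given container runtime actually uses it.

```python
# First libseccomp release containing the fix, per the comment above.
FIRST_FIXED = (2, 4, 4)


def parse_version(text):
    """Turn a plain dotted version such as '2.4.4' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def libseccomp_has_fix(version):
    """Report whether a libseccomp version is at or beyond the first fixed release."""
    return parse_version(version) >= FIRST_FIXED


if __name__ == "__main__":
    for candidate in ("2.3.3", "2.4.4", "2.5.0"):
        print(candidate, libseccomp_has_fix(candidate))
```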
*** Bug 1906575 has been marked as a duplicate of this bug. ***
*** Bug 1910208 has been marked as a duplicate of this bug. ***
*** Bug 1914984 has been marked as a duplicate of this bug. ***
This issue is seriously blocking testing of Fedora rawhide and ELN kernels, as containers are heavily utilized in the CKI process.
(In reply to Veronika Kabatova from comment #30)
> This issue is seriously blocking testing of Fedora rawhide and ELN kernels,
> as containers are heavily utilized in the CKI process.

Please talk to your container runtime vendor to fix this. Depending on what you use, bug 1908281 may be what you are after. Unfortunately, there has not been any feedback on that bug.
Fedora as container runtime vendor shipping moby-engine-19.03.13-1.ce.git4484c46.fc33.x86_64 (with libseccomp-2.5.0-3.fc33.x86_64) in Fedora 33 manifests the problem.
RHEL 8 currently ships libseccomp < 2.4.4. So iiuc, won't any container runtime running on RHEL8 that uses the system libseccomp show this behavior?
(In reply to Michael Hofmann from comment #33)
> RHEL 8 currently ships libseccomp < 2.4.4. So iiuc, won't any container
> runtime running on RHEL8 that uses the system libseccomp show this behavior?

I do not know. There is no technical requirement for a container runtime to use libseccomp, or the system version of that library. I filed bug 1908281 after verifying that a libseccomp update fixed the issue for a particular container runtime. Each runtime is probably different and likely needs a different investigation.
(In reply to Jan Pazdziora from comment #32)
> Fedora as container runtime vendor shipping
> moby-engine-19.03.13-1.ce.git4484c46.fc33.x86_64 (with
> libseccomp-2.5.0-3.fc33.x86_64) in Fedora 33 manifests the problem.

Would you please file a bug against moby-engine? Thanks.
Actually, the same version of Docker (docker-ce-20.10.2-3) works on Fedora 33, but not in RHEL 8.2. I mean, with the same container engine version, running Fedora Rawhide shows this issue on RHEL 8.2 but not on Fedora 33.
(In reply to Christoph Junghans from comment #23)
> I am using the "container:" keyword in GitHub action to run on rawhide, is
> there are workaround for that, too?

Yes, you can supply arbitrary docker options: https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idcontaineroptions

container:
  image: docker.io/foo:bar
  options: --security-opt=...

(you can also use --privileged)
(In reply to Martin Pitt from comment #37)
> (In reply to Christoph Junghans from comment #23)
> > I am using the "container:" keyword in GitHub action to run on rawhide, is
> > there are workaround for that, too?
>
> Yes, you can supply arbitrary docker options:
> https://docs.github.com/en/actions/reference/workflow-syntax-for-github-
> actions#jobsjob_idcontaineroptions
>
> container:
> image: docker.io/foo:bar
> options: --security-opt=...
>
> (you can also use --privileged)

I figured that out yesterday, too, but thanks for mentioning it here!
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle. Changing version to 34.
FYI, I've done more investigation into the situation with GitLab CI.

The normal GitLab CI job environment is *fine* with faccessat2(): we can see that it correctly returns ENOSYS and glibc does the fallback.

Only when using docker:dind (docker-in-docker) do we see faccessat2() returning EPERM. This is not GitLab's fault; rather, the problem is in the current "docker:dind" image, which has a version of "runc" that lacks the fix in https://github.com/opencontainers/runc/pull/2750.

It appears that docker:dind is updated reasonably frequently with newer runc, so hopefully this should resolve itself in the not too distant future, at which point I think common uses of GitLab CI will be unaffected by this problem.

FYI, my repo pipeline showing the different scenarios, with only dind failing, is https://gitlab.com/berrange/scratch/-/pipelines/253835120
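A quick way to see which of these cases a given environment falls into is to issue the raw syscall and look at errno. The sketch below does that through ctypes; it is not the method used for the pipeline above. It assumes Linux, assumes that faccessat2 has syscall number 439 on the architecture in use (true for x86_64), and the names `probe` and `classify` are made up for illustration.

```python
import ctypes
import errno
import os

SYS_FACCESSAT2 = 439  # assumption: syscall number on x86_64 Linux
AT_FDCWD = -100       # "relative to the current directory" marker
AT_EACCESS = 0x200    # check using effective IDs


def classify(err):
    """Map the errno from a faccessat2() probe to a short diagnosis."""
    if err == 0:
        return "supported"
    if err == errno.ENOSYS:
        return "ENOSYS: glibc can fall back"
    if err == errno.EPERM:
        return "EPERM: likely blocked by a syscall filter, no fallback"
    return "other: " + errno.errorcode.get(err, str(err))


def probe(path="/"):
    """Call faccessat2() directly; return 0 on success or the errno value."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ret = libc.syscall(
        ctypes.c_long(SYS_FACCESSAT2),
        ctypes.c_int(AT_FDCWD),
        ctypes.c_char_p(path.encode()),
        ctypes.c_int(os.F_OK),
        ctypes.c_int(AT_EACCESS),
    )
    return 0 if ret == 0 else ctypes.get_errno()


if __name__ == "__main__":
    print(classify(probe()))
```

Probing "/" with F_OK is a deliberate choice here: an existence check on the root directory should not normally fail with EPERM on its own, so that result points at the filter rather than at file permissions.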
*** Bug 1931616 has been marked as a duplicate of this bug. ***
(In reply to david08741 from comment #21)
> Same here. With travis and github actions a work around is to not restrict
> the container [1]
>
> sudo docker create --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
> --name mobydick registry.fedoraproject.org/fedora:rawhide \
> /tmp/BOUT-dev/.travis_fedora.sh mpich
>
> [1] https://github.com/boutproject/BOUT-dev/blob/next/.travis_fedora.sh#L24

I have a question about the temporary workaround. Which of the following two sets of options is better to add to `docker run`?

* `--cap-add=SYS_PTRACE --security-opt seccomp=unconfined`
* `--security-opt seccomp=unconfined`

I tested this issue on my repository with a small reproducer. Both sets of options work.

https://bugzilla.redhat.com/show_bug.cgi?id=1931616
https://github.com/junaruga/fedora-test-command-test