Bug 1900021 - glibc: faccessat() is broken when running under containers with restrictive seccomp filters.
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 34
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Florian Weimer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1899913 1906575 1910208 1914984 1931616
Depends On:
Blocks:
Reported: 2020-11-20 15:45 UTC by Daniel Berrangé
Modified: 2021-02-23 17:03 UTC
CC List: 31 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug



Description Daniel Berrangé 2020-11-20 15:45:09 UTC
Description of problem:

This problem was previously reported and fixed in:

  https://bugzilla.redhat.com/show_bug.cgi?id=1869030

but the patch for that bug was then backed out, because it was thought to be relevant only in the context of systemd-nspawn.

In practice this appears to hit much more widely. 

This glibc change has broken Fedora Rawhide when running inside Docker containers. In particular, it has broken libvirt CI running on GitLab CI, which uses Docker.

Downgrading from glibc-2.32.9000-16.fc34.x86_64.rpm back to glibc-2.32.9000-15.fc34.x86_64.rpm, which has the workaround, fixes execution inside Docker, but that's not viable for users to do manually every time.

A patch has been sent to Docker to allow faccessat2, but that is pretty recent and so it is not widely deployed at this time.

   https://github.com/moby/moby/pull/41353/files

I think that glibc needs to keep the workaround from bug 1869030 a good while longer yet, to allow time for fixed docker to become available widely.


I suspect the more general root cause lies in 'runc', which returns EPERM instead of ENOSYS when filtering syscalls:

https://github.com/opencontainers/runc/issues/2151
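To make the EPERM-versus-ENOSYS distinction concrete, here is a minimal C sketch of the kind of fallback a C library can perform: try faccessat2 first, and use the older faccessat syscall only when the new one reports ENOSYS. This is not glibc's actual code. The function-pointer parameter and the two stub "filters" (`stub_enosys`, `stub_eperm`) are hypothetical, so that the fallback decision can be exercised without a real seccomp filter, and flags are fixed at 0 for simplicity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <stddef.h>       /* NULL */
#include <sys/syscall.h>  /* SYS_faccessat */
#include <unistd.h>       /* syscall(), F_OK */

/* Hypothetical stand-in for the raw faccessat2 system call, passed as a
   function pointer so that different filter behaviours can be simulated. */
typedef int (*faccessat2_fn)(int dirfd, const char *path, int mode, int flags);

/* Simulated filter that reports the syscall as unimplemented. */
static int stub_enosys(int dirfd, const char *path, int mode, int flags)
{
    (void)dirfd; (void)path; (void)mode; (void)flags;
    errno = ENOSYS;
    return -1;
}

/* Simulated filter that rejects the syscall with EPERM, as described above. */
static int stub_eperm(int dirfd, const char *path, int mode, int flags)
{
    (void)dirfd; (void)path; (void)mode; (void)flags;
    errno = EPERM;
    return -1;
}

/* Simplified fallback (flags fixed at 0): only ENOSYS triggers a retry with
   the old faccessat syscall; any other error is returned unchanged. */
static int access_with_fallback(faccessat2_fn try_faccessat2,
                                int dirfd, const char *path, int mode)
{
    assert(try_faccessat2 != NULL);
    int ret = try_faccessat2(dirfd, path, mode, 0);
    if (ret == 0 || errno != ENOSYS)
        return ret;  /* success, or e.g. EPERM passed straight to the caller */
    /* "Not implemented": retry with the original syscall (no flags argument). */
    return (int)syscall(SYS_faccessat, dirfd, path, mode);
}
```

With `stub_enosys`, `access_with_fallback(stub_enosys, AT_FDCWD, "/", F_OK)` reaches the old syscall and succeeds. With `stub_eperm` it returns -1 with errno set to EPERM. That matches the failure mode reported in this bug: the caller sees a permission error instead of a working fallback.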

Version-Release number of selected component (if applicable):
glibc-2.32.9000-16.fc34.x86_64.rpm


How reproducible:
I've seen it in Docker under GitLab CI, but not in Podman running locally. I suspect that's because my Podman setup isn't doing syscall filtering.

Comment 1 Jan Pazdziora 2020-11-20 15:56:57 UTC
*** Bug 1899913 has been marked as a duplicate of this bug. ***

Comment 2 Jan Pazdziora 2020-11-21 09:58:12 UTC
The issue can be demonstrated on Fedora 33 host with docker from moby-engine packaged in Fedora 33, see bug 1897493.

Comment 3 Iñaki Ucar 2020-11-23 13:36:57 UTC
This is bad. I think I'm hitting the same issue. R (in a rawhide docker image) stopped working after updating glibc to glibc-2.32.9000-16.fc34. I see:

$ R
ERROR: R_HOME ('/usr/lib64/R') not found

Comment 4 Florian Weimer 2020-11-24 13:41:58 UTC
glibc is basically sitting between the kernel and the cloud. I've brought the discussion to what I think are the appropriate forums:

https://lore.kernel.org/linux-api/87lfer2c0b.fsf@oldenburg2.str.redhat.com/
https://groups.google.com/a/opencontainers.org/g/dev/c/8Phfq3VBxtw

I've also posted a glibc upstream patch to show what it would look like:

https://sourceware.org/pipermail/libc-alpha/2020-November/119955.html

Personally, I find it difficult to support such an approach technically, and I would like to see some reassurance from kernel developers that this is okay.

Comment 5 Daniel Berrangé 2020-11-24 14:25:56 UTC
Can we get the previous workaround re-applied to Rawhide as a stopgap until the upstream discussions reach some conclusion about the right long-term fix? This broken faccessat() is quite disruptive to people using Rawhide in containers.

Comment 6 Florian Weimer 2020-12-01 14:17:37 UTC
The workaround has been categorically rejected by kernel developers and glibc developers alike. Work is under way to address this in runc and potentially libseccomp.

Comment 7 Christoph Junghans 2020-12-01 15:52:19 UTC
Here is another case that fails for the same reason:

FROM registry.fedoraproject.org/fedora:rawhide
RUN echo "echo test" > test.sh
RUN chmod +x test.sh
RUN ls -l test.sh
RUN test -x test.sh
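The `test -x` step is a permission check against the effective user and group IDs. In C that would be a faccessat() call with AT_EACCESS. That this is how the shell implements `test -x` is our assumption, not something stated in this report.

The following minimal sketch also reports errno, so an EPERM caused by a seccomp filter can be told apart from an ordinary EACCES or ENOENT. The helper name `is_executable` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>   /* AT_FDCWD, AT_EACCESS */
#include <stddef.h>  /* NULL */
#include <unistd.h>  /* faccessat(), X_OK */

/* Returns 1 if `path` is executable (searchable, for a directory) by the
   effective user and group, 0 otherwise. On failure the errno value is
   stored in *err; on success *err is set to 0. */
static int is_executable(const char *path, int *err)
{
    assert(path != NULL && err != NULL);
    if (faccessat(AT_FDCWD, path, X_OK, AT_EACCESS) == 0) {
        *err = 0;
        return 1;
    }
    *err = errno;  /* EACCES or ENOENT normally; EPERM hints at a filter */
    return 0;
}
```

On an affected combination of glibc and container runtime, `is_executable("test.sh", &err)` would be expected to return 0 with `err == EPERM`, even though `chmod +x` has set the mode correctly.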

Comment 15 Florian Bezdeka 2020-12-08 18:44:49 UTC
I really appreciate that the fix is on the way.

I just want to point out again that runc and libseccomp are components that are often in the scope of infrastructure operators.
So it will take a (long) time until they get updated. There are so many parties involved that it simply takes a lot of time:

A developer provides the fix, upstream reviews and hopefully merges it, the fix is backported to the stable versions used by all the major distributions out there, the packages are distributed, and finally the operator accepts the new version and deploys it.

Some projects already had to disable their CI jobs for building and testing on current Fedora releases because of this issue.

Any chance to get something like a "special" package for the use in containers or provide "working" Fedora 34 container images?

Comment 16 Carlos O'Donell 2020-12-08 19:38:47 UTC
(In reply to Florian Bezdeka from comment #15)
> Some projects already had to disable their CI jobs for building and testing
> on current Fedora releases because of this issue.

Only Fedora Rawhide is impacted, and the goals of Rawhide are different from the goals of a stable release.

Please review the Fedora Rawhide goals here:
https://fedoraproject.org/wiki/Releases/Rawhide#Goals

"To identify and fix issues with packages before they reach a stable release of Fedora."

> Any chance to get something like a "special" package for the use in
> containers or provide "working" Fedora 34 container images?

Fedora 34 does not release until May 20th 2021:
https://fedorapeople.org/groups/schedule/f-34/f-34-key-tasks.html

My opinion is that we have some time to work on a solution that integrates the best possible fixes from upstream.

Thank you for your comments.

Comment 17 Laurent Rineau 2020-12-09 09:05:55 UTC
(In reply to Carlos O'Donell from comment #16)
> Only Fedora Rawhide is impacted, and the goals of Rawhide are different from
> the goals of a stable release.
> 
> Please review the Fedora Rawhide goals here:
> https://fedoraproject.org/wiki/Releases/Rawhide#Goals
> 
> "To identify and fix issues with packages before they reach a stable release
> of Fedora."

I understand that point of view. I will now give the point of view of one of those open source projects running Fedora-rawhide images in the CI: in the CGAL project (https://www.cgal.org/ and https://github.com/CGAL/cgal), we want to identify and fix issues when our software library is compiled with the compilers and system libraries of Fedora Rawhide, so that our software is always ready to run under Fedora XY as soon as it is released. When the `glibc` or the kernel of Rawhide has an issue with `runc`, we can no longer test our software with it. That is probably what Florian wanted to point out in comment #15.

Comment 20 Christoph Junghans 2020-12-09 18:38:04 UTC
I totally agree, we run CI against fedora:rawhide to catch compiler and library problems early and currently this isn't possible anymore.

Comment 21 david08741 2020-12-09 21:07:39 UTC
Same here. With Travis and GitHub Actions, a workaround is to not restrict the container [1]:

sudo docker create --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
         --name mobydick registry.fedoraproject.org/fedora:rawhide \
	     /tmp/BOUT-dev/.travis_fedora.sh mpich

[1] https://github.com/boutproject/BOUT-dev/blob/next/.travis_fedora.sh#L24

Comment 22 Carlos O'Donell 2020-12-09 21:49:41 UTC
(In reply to david08741 from comment #21)
> Same here. With Travis and GitHub Actions, a workaround is to not restrict
> the container [1]:
> 
> sudo docker create --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
>          --name mobydick registry.fedoraproject.org/fedora:rawhide \
> 	     /tmp/BOUT-dev/.travis_fedora.sh mpich
> 
> [1] https://github.com/boutproject/BOUT-dev/blob/next/.travis_fedora.sh#L24

Alternatively, pass an updated default seccomp profile that includes faccessat2?

I see that Moby updated their default profile about 4 months ago to include faccessat2 with SCMP_ACT_ALLOW:
https://github.com/moby/moby/blob/master/profiles/seccomp/default.json#L97

Comment 23 Christoph Junghans 2020-12-09 22:10:51 UTC
I am using the "container:" keyword in GitHub action to run on rawhide, is there a workaround for that, too?

Comment 24 Daniel Berrangé 2020-12-10 09:22:23 UTC
Another alternative is to explicitly downgrade glibc in your Rawhide containers. This is viable as a short-term hack as long as the new glibc doesn't introduce a new symbol that apps pick up a dependency on, which hasn't been a problem in this Rawhide cycle so far. This is how we've temporarily worked around this problem in libvirt, for example:

https://gitlab.com/libvirt/libvirt-appdev-guide-python/-/commit/93837ef20164a46469e495cfe7bd887e59828bdb

Comment 25 Carlos O'Donell 2020-12-11 05:31:44 UTC
(In reply to Christoph Junghans from comment #20)
> I totally agree, we run CI against fedora:rawhide to catch compiler and
> library problems early and currently this isn't possible anymore.

Those are great reasons to use fedora:rawhide. Thank you for using it!

Unfortunately your infrastructure providers have limited your access to kernel functionality and you can no longer run fedora:rawhide.

We will continue to track this situation and raise the issue with affected upstreams.

We will track this closely as Fedora Rawhide approaches release as Fedora 34.

(In reply to Laurent Rineau from comment #17)
> I understand that point of view. I will now give the point of view of one of
> those open source projects running Fedora-rawhide images in the CI: in the
> CGAL project (https://www.cgal.org/ and https://github.com/CGAL/cgal), we
> want to identify and fix issues when our software library is compiled with
> the compilers and system libraries of Fedora Rawhide, so that our software
> is always ready to run under Fedora XY as soon as it is released.  When the
> `glibc` or the kernel of Rawhide has an issue with `runc`, we can no longer
> test our software with it. That is probably what Florian wanted to point out
> in comment #15.

Please reach out to your infrastructure providers and ask them to update their seccomp filters?

This has been done already by systemd for systemd-nspawn to support Fedora and Fedora COPR builders.

Upstream Moby appears to have been updated for faccessat2.

Upstream updates for runc, Docker, and others are still in progress (last I checked) to fix this "once and for all" so the problem doesn't keep happening.

Otherwise this will happen again and again until the infrastructure is updated to correctly manage and mediate access to new kernel functionality.
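The underlying design question is what a filter should answer for a syscall it has never heard of. One possible policy keeps EPERM for known-but-denied syscalls and answers ENOSYS for anything newer than the newest syscall the profile knows about. That lets a C library treat the new syscall as "not implemented" and fall back.

The sketch below is a hypothetical, heavily simplified model of that policy. It is plain C that picks a return value; it is not BPF, and not runc's or libseccomp's actual code. The `profile` and `verdict` types are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a filter verdict: allow, or fail with an errno. */
struct verdict {
    bool allow;
    int err;  /* errno to return when !allow */
};

/* Hypothetical profile: an allow-list plus the highest syscall number the
   profile's author knew about when it was written. */
struct profile {
    const long *allowed;
    size_t n_allowed;
    long highest_known;
};

static bool in_allow_list(const struct profile *p, long nr)
{
    for (size_t i = 0; i < p->n_allowed; i++)
        if (p->allowed[i] == nr)
            return true;
    return false;
}

/* Allowed syscalls pass. Syscalls newer than anything the profile knows
   get ENOSYS, so libc may fall back. Every other denied syscall keeps the
   EPERM default. */
static struct verdict decide(const struct profile *p, long nr)
{
    assert(p != NULL);
    if (in_allow_list(p, nr))
        return (struct verdict){ true, 0 };
    if (nr > p->highest_known)
        return (struct verdict){ false, ENOSYS };
    return (struct verdict){ false, EPERM };
}
```

Take an allow-list of {10, 20} and a `highest_known` of 30 (arbitrary numbers). Syscall 20 is allowed, 25 gets EPERM, and 439 gets ENOSYS. A filter that returned EPERM for the unknown syscall would instead make it look like a genuine permission failure.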

Comment 26 Bartłomiej Piotrowski 2020-12-11 12:40:12 UTC
The main place it needs changing is in libseccomp, and the fix is part of the 2.4.4 release[1] onwards.

No distribution traditionally used for CI workers ships it. The closest is Ubuntu 20.04 at 2.3.3 but it still means manual poking just to get the fedora:rawhide image working correctly.

It's hard to believe that you really expect every "infrastructure provider" to happily jump in and backport a newer libseccomp to every server used for CI.

[1] https://github.com/seccomp/libseccomp/commit/b3206ad5645dceda89538ea8acc984078ab697ab

Comment 27 Carlos O'Donell 2020-12-15 14:27:41 UTC
*** Bug 1906575 has been marked as a duplicate of this bug. ***

Comment 28 Kamil Dudka 2020-12-25 10:01:10 UTC
*** Bug 1910208 has been marked as a duplicate of this bug. ***

Comment 29 Kamil Dudka 2021-01-11 16:54:47 UTC
*** Bug 1914984 has been marked as a duplicate of this bug. ***

Comment 30 Veronika Kabatova 2021-01-15 11:37:44 UTC
This issue is seriously blocking testing of Fedora rawhide and ELN kernels, as containers are heavily utilized in the CKI process.

Comment 31 Florian Weimer 2021-01-15 11:47:22 UTC
(In reply to Veronika Kabatova from comment #30)
> This issue is seriously blocking testing of Fedora rawhide and ELN kernels,
> as containers are heavily utilized in the CKI process.

Please talk to your container runtime vendor to fix this.

Depending on what you use, bug 1908281 may be what you are after. Unfortunately, there has not been any feedback on that bug.

Comment 32 Jan Pazdziora 2021-01-15 12:00:27 UTC
Fedora as container runtime vendor shipping moby-engine-19.03.13-1.ce.git4484c46.fc33.x86_64 (with libseccomp-2.5.0-3.fc33.x86_64) in Fedora 33 manifests the problem.

Comment 33 Michael Hofmann 2021-01-15 12:13:47 UTC
RHEL 8 currently ships libseccomp < 2.4.4. So iiuc, won't any container runtime running on RHEL8 that uses the system libseccomp show this behavior?

Comment 34 Florian Weimer 2021-01-15 12:21:56 UTC
(In reply to Michael Hofmann from comment #33)
> RHEL 8 currently ships libseccomp < 2.4.4. So iiuc, won't any container
> runtime running on RHEL8 that uses the system libseccomp show this behavior?

I do not know. There is no technical requirement for a container runtime to use libseccomp, or the system version of that library. I filed bug 1908281 after verifying that a libseccomp update fixed the issue for a particular container runtime. Each runtime is probably different and likely needs a different investigation.

Comment 35 Florian Weimer 2021-01-15 12:21:56 UTC
(In reply to Jan Pazdziora from comment #32)
> Fedora as container runtime vendor shipping
> moby-engine-19.03.13-1.ce.git4484c46.fc33.x86_64 (with
> libseccomp-2.5.0-3.fc33.x86_64) in Fedora 33 manifests the problem.

Would you please file a bug against moby-engine? Thanks.

Comment 36 Juanje Ojeda 2021-01-15 12:38:26 UTC
Actually, the same version of Docker (docker-ce-20.10.2-3) works on Fedora 33, but not in RHEL 8.2.
I mean, with the same container engine version, running Fedora Rawhide shows this issue on RHEL 8.2 but not on Fedora 33.

Comment 37 Martin Pitt 2021-01-27 18:42:08 UTC
(In reply to Christoph Junghans from comment #23)
> I am using the "container:" keyword in GitHub action to run on rawhide, is
> there a workaround for that, too?

Yes, you can supply arbitrary docker options: https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idcontaineroptions

container:
  image: docker.io/foo:bar
  options: --security-opt=...

(you can also use --privileged)

Comment 38 Christoph Junghans 2021-01-27 21:23:52 UTC
(In reply to Martin Pitt from comment #37)
> (In reply to Christoph Junghans from comment #23)
> > I am using the "container:" keyword in GitHub action to run on rawhide, is
> > there a workaround for that, too?
> 
> Yes, you can supply arbitrary docker options:
> https://docs.github.com/en/actions/reference/workflow-syntax-for-github-
> actions#jobsjob_idcontaineroptions
> 
> container:
>   image: docker.io/foo:bar
>   options: --security-opt=...
> 
> (you can also use --privileged)

I figured that out yesterday, too, but thanks for mentioning it here!

Comment 39 Ben Cotton 2021-02-09 16:12:16 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

Comment 40 Daniel Berrangé 2021-02-09 17:55:00 UTC
FYI I've done more investigation into the situation with GitLab CI

The normal GitLab CI job environment is *fine* with faccessat2(): we can see that it correctly returns ENOSYS and glibc does the fallback.

Only when using docker:dind (Docker-in-Docker) do we see faccessat2() returning EPERM. This is not GitLab's fault. Rather, the problem is in the current "docker:dind" image, which has a version of "runc" that lacks the fix in https://github.com/opencontainers/runc/pull/2750.

It appears that the docker:dind image is updated reasonably frequently with newer runc, so hopefully this will resolve itself in the not too distant future. At that point I think common uses of GitLab CI will be unaffected by this problem.

FYI, the pipeline in my repo showing the different scenarios, with only dind failing, is:

https://gitlab.com/berrange/scratch/-/pipelines/253835120
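The distinction between jobs can be approximated by calling faccessat2 directly through syscall(2) and classifying the errno. A direct call bypasses glibc's wrapper and its fallback.

This is a minimal sketch. The enum and function names are hypothetical. The value 439 for SYS_faccessat2 is used only when the installed headers lack the constant; that it is correct on x86_64 and aarch64, but not on every architecture, is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <stdio.h>
#include <sys/syscall.h>  /* SYS_faccessat2, if the headers know it */
#include <unistd.h>       /* syscall(), F_OK */

#ifndef SYS_faccessat2
#define SYS_faccessat2 439  /* assumption: x86_64 / aarch64 syscall number */
#endif

enum probe_result {
    PROBE_AVAILABLE,  /* syscall works */
    PROBE_ENOSYS,     /* old kernel or ENOSYS-returning filter: fallback works */
    PROBE_EPERM,      /* filtered with EPERM: glibc does not fall back */
    PROBE_OTHER       /* anything else */
};

/* Maps the errno from a raw faccessat2 call (0 = success) to a category. */
static enum probe_result classify_faccessat2(int err)
{
    switch (err) {
    case 0:      return PROBE_AVAILABLE;
    case ENOSYS: return PROBE_ENOSYS;
    case EPERM:  return PROBE_EPERM;
    default:     return PROBE_OTHER;
    }
}

/* Calls faccessat2 on "/" directly and returns 0 or the resulting errno. */
static int probe_faccessat2(void)
{
    long ret = syscall(SYS_faccessat2, AT_FDCWD, "/", F_OK, 0);
    return ret == 0 ? 0 : errno;
}
```

For example, running `printf("%d\n", classify_faccessat2(probe_faccessat2()));` in a normal job and in a docker:dind job should show the difference described above. The affected case would be expected to print 2 (PROBE_EPERM).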

Comment 41 Kamil Dudka 2021-02-23 08:36:38 UTC
*** Bug 1931616 has been marked as a duplicate of this bug. ***

Comment 42 Jun Aruga 2021-02-23 17:03:11 UTC
(In reply to david08741 from comment #21)
> Same here. With Travis and GitHub Actions, a workaround is to not restrict
> the container [1]:
> 
> sudo docker create --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
>          --name mobydick registry.fedoraproject.org/fedora:rawhide \
> 	     /tmp/BOUT-dev/.travis_fedora.sh mpich
> 
> [1] https://github.com/boutproject/BOUT-dev/blob/next/.travis_fedora.sh#L24

I have a question about the temporary work around.
Which of the following two option sets is better to add to `docker run`?

* `--cap-add=SYS_PTRACE --security-opt seccomp=unconfined`
* `--security-opt seccomp=unconfined`

I tested this issue in my repository with a small reproducer. Both option sets work.
https://bugzilla.redhat.com/show_bug.cgi?id=1931616
https://github.com/junaruga/fedora-test-command-test

