Bug 1869030 - glibc: Back out glibc-rhbz1869030-faccessat2-eperm.patch workaround for systemd UAPI breakage
Summary: glibc: Back out glibc-rhbz1869030-faccessat2-eperm.patch workaround for syste...
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Carlos O'Donell
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1869597 1869624
Depends On:
Blocks: 1862977
 
Reported: 2020-08-15 15:50 UTC by Fabio Valentini
Modified: 2020-12-03 11:23 UTC (History)
41 users (show)

Fixed In Version: systemd-246.2-1.fc33, glibc-2.32.9000-16.fc34
Doc Type: Release Note
Doc Text:
Older versions of systemd-nspawn do not handle the faccessat2 system call correctly. Hosts that use systemd-nspawn to launch containers may have to update the systemd package on the host before they can launch containers that contain an updated glibc version.
Clone Of:
Environment:
Last Closed: 2020-11-10 16:50:19 UTC
Type: Bug
Embargoed:
kdudka: needinfo-



Description Fabio Valentini 2020-08-15 15:50:34 UTC
In chroots based on systemd-nspawn (e.g. local mock builds and COPR builds), the new rawhide glibc leads to a very broken system, primarily caused by permission errors, including "which" being unable to find binaries, RPM scriptlets failing to execute bash scripts, failure to extract debuginfo from binaries, ln -s being unable to create symlinks, etc.

I suspect that this change is to blame:
- Linux: Use faccessat2 to implement faccessat (bug 18683)

because systemd-nspawn filters syscalls. Running mock builds with --isolation=simple works around the issue.

Comment 1 Michael S. 2020-08-15 16:08:06 UTC
I can confirm the problem. I can reproduce with mock on F32. 

If I set up a chroot with mock --shell --isolation=nspawn and try to build protobuf with fedpkg local, it works.

As soon as I upgraded glibc from glibc-2.32-1.fc33 (due to mirror lag) to glibc-2.32.9000-1.fc34, it failed like this:

+ mkdir -p third_party/googletest/m4
+ autoreconf -f -i -Wall,no-obsolete
/usr/bin/autoconf: This script requires a shell more modern than all
/usr/bin/autoconf: the shells that I found on your system.
/usr/bin/autoconf: Please tell bug-autoconf about your system,
/usr/bin/autoconf: including any error possibly output before this
/usr/bin/autoconf: message. Then install a modern shell, or manually run
/usr/bin/autoconf: the script under such a shell if you do have one.
autoreconf: /usr/bin/autoconf failed with exit status: 1
error: Bad exit status from /var/tmp/rpm-tmp.UY2ZwJ (%build)



Looking at the strace log, and as discussed on IRC with Neal and Fabio, it seems that the syscall is indeed being filtered:

faccessat2(AT_FDCWD, "/", X_OK, AT_EACCESS) = -1 EPERM (Operation not permitted)

Comment 2 Michael S. 2020-08-15 16:13:45 UTC
I submitted https://github.com/systemd/systemd/pull/16739 on systemd.

Comment 3 Fabio Valentini 2020-08-15 20:02:23 UTC
Looking at the upstream commit
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=3d3ab573a5f3071992cbc4f57d50d1d29d55bde2

I'm also wondering why the usage of the new syscall is *outside* of the #if condition for the conditional compilation.
Is this *intended* to just fail if glibc was built against kernel 5.8 but is running on < 5.8?

Comment 4 Florian Weimer 2020-08-16 18:37:25 UTC
(In reply to Fabio Valentini from comment #3)
> Looking at the upstream commit
> https://sourceware.org/git/?p=glibc.git;a=commitdiff;
> h=3d3ab573a5f3071992cbc4f57d50d1d29d55bde2
> 
> I'm also wondering why the usage of the new syscall is *outside* of the #if
> condition for the conditional compilation.
> Is this *intended* to just fail if glibc was built against kernel 5.8 but is
> running on < 5.8?

The glibc system call footprint no longer depends on the Linux kernel header version used during the build. There are built-in system call tables for every architecture, verified against the installed kernel headers during the build.

This was more or less a requirement for Y2038 support because otherwise, glibc behavior at run time would have depended too much on the version of the kernel headers at build time.

Comment 5 Zbigniew Jędrzejewski-Szmek 2020-08-17 17:41:28 UTC
Fixed in rawhide.

Comment 6 Fabio Valentini 2020-08-17 17:46:59 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #5)
> Fixed in rawhide.

Great! Now can we get it fixed on stable Fedora as well please? And preferably, on COPR hosts? :)
Because that's where most of the problems are: local mock builds.
And for rawhide COPR builds, the "mock --isolation=simple" workaround obviously doesn't help at all :(

Comment 7 Carlos O'Donell 2020-08-17 18:45:37 UTC
I'm marking this CLOSED/NOTABUG since this isn't an issue in glibc.

Comment 8 Paul Howarth 2020-08-17 18:54:56 UTC
It's a systemd issue affecting ability to use mock on stable Fedora releases, so I re-opened the bug and changed component to systemd.

Comment 9 Zbigniew Jędrzejewski-Szmek 2020-08-17 20:16:22 UTC
Right. So we updated the filter in systemd, which fixes the issue when the newest systemd is used. But as Fabio correctly pointed out, this does not really fix the issue, because we use older systemd in various contexts. I'll also update the systemd in older releases with the same change, but it will be weeks before the change percolates to all releases, and even that is not enough to guarantee that everybody has updated. 

A matching fix needs to be done in glibc as well, i.e. glibc should be updated to fall back to the old code path also when EPERM is encountered. This only needs to be done in rawhide and can be done immediately, and it'll resolve the issue for the contexts where newest glibc is used with older nspawn. (An alternative would be to revert the glibc changes and block them until the updated systemd is everywhere, but I don't think this is an attractive solution at all.)

Comment 10 Carlos O'Donell 2020-08-20 22:19:40 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #9)
> A matching fix needs to be done in glibc as well, i.e. glibc should be
> updated to fall back to the old code path also when EPERM is encountered.
> This only needs to be done in rawhide and can be done immediately, and it'll
> resolve the issue for the contexts where newest glibc is used with older
> nspawn. (An alternative would be to revert the glibc changes and block them
> until the updated systemd is everywhere, but I don't think this is an
> attractive solution at all.)

Agreed. I'm testing a Fedora Rawhide-only fix for this. Upstream won't accept this because the expectation is ENOSYS.

Comment 11 Carlos O'Donell 2020-08-21 04:09:58 UTC
Confirmed new glibc fixes the issue by retesting in --isolation=nspawn on F32 with new glibc and running io/tst-faccessat.

[pid    30] 1597981801.808172 syscall_0x1b7(0x4, 0x55d7ff4ae07f, 0, 0x200, 0xd, 0x7fbc36964a60) = -1 EPERM (Operation not permitted)
[pid    30] 1597981801.808211 faccessat(4, "should-not-work", F_OK) = -1 ENOTDIR (Not a directory)
[pid    30] 1597981801.808245 close(4)  = 0

This is the ENOTDIR test.

[pid    30] 1597981801.808271 syscall_0x1b7(0x3, 0x55d7ff4ae031, 0, 0x200, 0xffffffff, 0x7fbc36964a60) = -1 EPERM (Operation not permitted)
[pid    30] 1597981801.808297 faccessat(3, "some-file", F_OK) = 0

This is the F_OK test.

[pid    30] 1597981801.808326 syscall_0x1b7(0x3, 0x55d7ff4ae031, 0x2, 0x200, 0, 0x7fbc36964a60) = -1 EPERM (Operation not permitted)
[pid    30] 1597981801.808351 faccessat(3, "some-file", W_OK) = 0

This is the W_OK test.

[pid    30] 1597981801.808379 syscall_0x1b7(0x3, 0x55d7ff4ae031, 0x1, 0x200, 0, 0x7fbc36964a60) = -1 EPERM (Operation not permitted)
[pid    30] 1597981801.808404 faccessat(3, "some-file", X_OK) = -1 EACCES (Permission denied)

This is the faccessat X_OK test on a non-executable file.

[pid    30] 1597981801.808433 fchmodat(3, "some-file", 0400) = 0
[pid    30] 1597981801.808466 syscall_0x1b7(0x3, 0x55d7ff4ae031, 0x4, 0x200, 0xffffffff, 0x7fbc36964a60) = -1 EPERM (Operation not permitted)
[pid    30] 1597981801.808491 faccessat(3, "some-file", R_OK) = 0

This is the R_OK test.

[pid    30] 1597981801.808521 syscall_0x1b7(0x3, 0x55d7ff4ae031, 0x2, 0x200, 0, 0x7fbc36964a60) = -1 EPERM (Operation not permitted)
[pid    30] 1597981801.808546 faccessat(3, "some-file", W_OK) = 0

This is the W_OK test.

[pid    30] 1597981801.808574 geteuid() = 0
[pid    30] 1597981801.808605 dup(3)    = 4
[pid    30] 1597981801.808629 close(4)  = 0
[pid    30] 1597981801.808653 syscall_0x1b7(0x4, 0x55d7ff4ae031, 0, 0x200, 0, 0x7fbc36964a60) = -1 EPERM (Operation not permitted)
[pid    30] 1597981801.808678 faccessat(4, "some-file", F_OK) = -1 EBADF (Bad file descriptor)

This is the test for bad descriptor behaviour for existing file.

[pid    30] 1597981801.808709 syscall_0x1b7(0x4, 0x55d7ff4ae0e0, 0, 0x200, 0xffffffff, 0x7fbc36964a60) = -1 EPERM (Operation not permitted)
[pid    30] 1597981801.808735 faccessat(4, "non-existing-file", F_OK) = -1 EBADF (Bad file descriptor)

This is the test for bad descriptor behaviour for non-existing file.

[pid    30] 1597981801.808765 unlinkat(3, "some-file", 0) = 0
[pid    30] 1597981801.808808 close(3)  = 0
[pid    30] 1597981801.808832 syscall_0x1b7(0xffffffff, 0x55d7ff4ae031, 0, 0x200, 0xffffffff, 0x7fbc36964a60) = -1 EPERM (Operation not permitted)
[pid    30] 1597981801.808857 faccessat(-1, "some-file", F_OK) = -1 EBADF (Bad file descriptor)

Test for fd -1 fails with EBADF.

[pid    30] 1597981801.808889 getpid()  = 30
[pid    30] 1597981801.808929 exit_group(0) = ?
[pid    30] 1597981801.809007 +++ exited with 0 +++

Overall the behaviour is correct and functional and with the test working I'm confident it should work.

If faccessat2 fails with EPERM because of a genuine permission problem, then the fallback just repeats the check through the old code path and fails with EPERM again.

I wasn't able to test on a real 5.8 kernel because --isolation=nspawn doesn't work for me in my Fedora Rawhide VM.

Either way the results look good.

I've pushed this into Fedora Rawhide to fix the builders using the older systemd-nspawn.

Comment 12 Kamil Dudka 2020-08-24 08:44:21 UTC
*** Bug 1869624 has been marked as a duplicate of this bug. ***

Comment 13 Kamil Dudka 2020-08-24 08:45:49 UTC
*** Bug 1869597 has been marked as a duplicate of this bug. ***

Comment 14 Florian Weimer 2020-10-07 06:47:59 UTC
Carlos, we need to back out this downstream-only patch, so that it does not land in Red Hat Enterprise Linux 9 Alpha. I do not see any discussion in this bug or in glibc-rhbz1869030-faccessat2-eperm.patch itself indicating when it is safe to remove the patch.

We should not add long-term downstream-only patches without a plan to back them out again.

Comment 15 Carlos O'Donell 2020-10-16 03:05:41 UTC
(In reply to Florian Weimer from comment #14)
> Carlos, we need to back out this downstream-only patch, so that it does not
> land in Red Hat Enterprise Linux 9 Alpha. I do not see any discussion in
> this bug or in glibc-rhbz1869030-faccessat2-eperm.patch itself indicating
> when it is safe to remove the patch.

Absolutely a good point, the patch should have had a comment about when it
is safe to remove. It can't be removed until we sync with the COPR admins
and determine that the systemd-nspawn version they use is fixed. We need to
allow enough time for the systemd fix to propagate to those systems and for
them to be updated. We may need to push on that issue.

> We should not add long-term downstream-only patches without a plan to back
> them out again.

I agree completely.

We've had the patch in for ~2 months now, and that might be enough time
for COPR builders to be fixed, but we should check with them.

Regarding RHEL9 my plan here was to come back and review this at a later date
when reviewing the set of existing patches we had between RHEL 8 and RHEL 9 to
see if there was anything that was still missing upstream that we need to add
back to RHEL 9. This is the usual RHEL X to RHEL X+1 review. Which we did for
RHEL 8, but didn't get through all the patches. We should do this again for
RHEL 9, but follow through fully on that review... we only have 377 patches
to review (thankfully we can quickly move through them given the rigour we
have with commit messages).

In summary:
- We need to check with COPR admins to see if their systemd-nspawn is new enough to drop the patch.
- We need to do a RHEL8->RHEL9 patch review (which identifies this patch for dropping).

Comment 17 Florian Weimer 2020-10-20 14:08:26 UTC
I've contacted the COPR folks about their status: <https://lists.fedoraproject.org/archives/list/copr-devel@lists.fedorahosted.org/message/2PUEHENZKTNOGYBESR7PNUXQTDTKIBZP/>

Comment 18 Ben Cotton 2020-10-21 15:17:18 UTC
Clearing the fedora_prioritized_bug flag as the behavior that prompted nomination has been fixed and the purpose of the bug has changed.

Comment 22 Carlos O'Donell 2020-11-10 16:50:19 UTC
Fedora 33 has been released and the COPR builders are updated to have the latest systemd-nspawn.

The workaround for faccessat is no longer needed. With glibc build glibc-2.32.9000-16.fc34 we drop the patch.

This is now fixed (patch reverted and back to original upstream behaviour).

Fedora 33 never had the faccessat patch so it does not need to be reverted there (thanks for the doc text Florian).

No further reverts required.

Comment 24 Jan Pazdziora 2020-11-13 17:32:24 UTC
Hello, I've recently started to see failures in Docker containers running registry.fedoraproject.org/fedora:rawhide images -- it manifests itself as

   /usr/lib64/apr-1/build/libtool --silent --mode=link gcc -Wl,-z,relro,-z,now   -o mod_authnz_pam.la  -lpam -rpath /usr/lib64/httpd/modules -module -avoid-version    mod_authnz_pam.lo
   libtool:   error: 'mod_authnz_pam.lo' is not a valid libtool object

on Travis CI's Ubuntu

   https://travis-ci.com/github/adelton/mod_authnz_pam/jobs/435689235

and I also see the same failure on my Fedora 33 host with moby-engine-19.03.13-1.ce.git4484c46.fc33.x86_64. Stracing the process shows

   154568 stat("mod_authnz_pam.lo", {st_mode=S_IFREG|0644, st_size=280, ...}) = 0
   154568 faccessat2(AT_FDCWD, "mod_authnz_pam.lo", R_OK, AT_EACCESS) = -1 EPERM (Operation not permitted)

which brought me to this bugzilla.

The problem is not present when running the same job on Fedora 33 host with podman-2.1.1-12.fc33.x86_64.

Does the change in rawhide's glibc mean that rawhide will likely no longer work under Docker? Or is this some sort of seccomp incompatibility that should be relatively easy to work around?

Comment 25 Florian Bezdeka 2020-11-17 10:01:10 UTC
Hi, I'm coming from https://bugzilla.redhat.com/show_bug.cgi?id=1869624 which 
was marked as a duplicate of this bug.

The problem had been solved for several weeks, but it has now come back.

My understanding is that according to the info provided by Florian Weimer the 
problem is somewhere on the host. That would mean that the Fedora 34 container 
is not usable on such systems until the host environment gets fixed. Right?

In my case the docker environment is controlled by external parties and I guess 
they are already using the newest systemd package that is available for their 
distribution. So we would no longer be able to test on / for Fedora for a very 
long time.

Comment 26 Fabio Valentini 2020-11-17 10:11:04 UTC
Welp, fedora 34 container images not working on cloud providers would be bad

Comment 27 Kamil Dudka 2020-11-20 14:27:59 UTC
*** Bug 1899913 has been marked as a duplicate of this bug. ***

Comment 28 Jan Pazdziora 2020-11-20 14:44:47 UTC
Is this bugzilla the correct one to use as the dupe target for bug reports that say how this bugzilla broke stuff for other use cases? It is currently CLOSED RAWHIDE -- should it be reopened, or should we use some other bugzilla to track the problems that the change caused?

Comment 29 Daniel Berrangé 2020-11-20 15:46:27 UTC
Commenting on a closed bug generally isn't too productive, so I've filed a new issue to request the workaround be reintroduced https://bugzilla.redhat.com/show_bug.cgi?id=1900021

While docker has been fixed to allow faccessat2, it will be a long while before we can rely on that fix making it into common public cloud deployments

