Bug 1652867

Summary: glibc: Avoid the need for manually running ldconfig after downgrade
Product: [Fedora] Fedora
Reporter: Vít Ondruch <vondruch>
Component: glibc
Assignee: Florian Weimer <fweimer>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: arjun.is, asosedki, codonell, dj, dominik, fweimer, igor.raits, jonemilj, law, mfabian, nicolas.mailhot, pfrankli, rth, siddhesh, sipoyare, udovdh
Target Milestone: ---
Keywords: FutureFeature, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glibc-2.33.9000-15.fc35
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-05 10:41:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Vít Ondruch 2018-11-23 10:53:48 UTC
Description of problem:
Trying to downgrade glibc in mock, the triggers fail:

~~~
Downgraded: glibc-common-2.28.9000-19.fc30.x86_64
  Running scriptlet: glibc-all-langpacks-2.28-5.fc30.x86_64    10/10
  Running scriptlet: glibc-common-2.28.9000-19.fc30.x86_64     10/10
error: failed to exec scriptlet interpreter /bin/sh: No such file or directory
warning: %triggerin(info-6.5-11.fc30.x86_64) scriptlet failed, exit status 127

Error in <unknown> scriptlet in rpm package glibc-common
error: failed to exec scriptlet interpreter /bin/sh: No such file or directory
warning: %triggerpostun(glibc-common-2.28-5.fc30.x86_64) scriptlet failed, exit status 127

  Running scriptlet: glibc-common-2.28-5.fc30.x86_64           10/10
Error in <unknown> scriptlet in rpm package glibc-common
error: failed to exec scriptlet interpreter /bin/sh: No such file or directory
warning: %triggerin(glibc-common-2.28-5.fc30.x86_64) scriptlet failed, exit status 127

Error in <unknown> scriptlet in rpm package glibc-common
~~~


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. $ mock -r fedora-rawhide-x86_64 --scrub=all
2. $ mock -r fedora-rawhide-x86_64 -i https://kojipkgs.fedoraproject.org//packages/glibc/2.28/5.fc30/x86_64/glibc{,-{common,all-langpacks,devel,headers}}-2.28-5.fc30.x86_64.rpm

Actual results:
The triggers fail during the downgrade, rendering the root unusable.


Expected results:
The downgrade should be possible.


Additional info:
It seems that Lua used to be used for the triggers [1]. Not sure if there were some different issues triggering this change, but in the context of this issue, it was the better option and the commit should be reverted.


[1] https://src.fedoraproject.org/rpms/glibc/c/28e47feb91230b98bc4f57b4906e47c93a2e2dd7

Comment 1 Vít Ondruch 2018-11-23 11:09:20 UTC
This is the PR which introduced these triggers:

https://src.fedoraproject.org/rpms/glibc/pull-request/8

Comment 2 Vít Ondruch 2018-11-23 11:21:15 UTC
Interesting. It seems that the failure happens between glibc-2.28.9000-* and glibc-2.28-5.fc30, which was the latest stable 2.28 release. So it might be something different than Lua vs. shell.

IOW this is fine:

glibc-2.28.9000-15.fc30.x86_64 => glibc-2.28.9000-1.fc30.x86_64

While this fails:

glibc-2.28.9000-15.fc30.x86_64 => glibc-2.28-5.fc30.x86_64
glibc-2.28.9000-1.fc30.x86_64 => glibc-2.28-5.fc30.x86_64

Comment 3 Florian Weimer 2018-11-23 11:32:01 UTC
(In reply to Vít Ondruch from comment #0)
> Additional info:
> It seems that Lua used to be used for the triggers [1]. Not sure if there
> were some different issues triggering this change, but in the context of
> this issue, it was the better option and the commit should be reverted.

I don't think this will completely solve the issue.  We need to figure out in which order RPM removes files and updates symbolic links and find a way to work around breakage that results from this.

Grepping for ld-linux|ld-2.28|execve.*/bin/sh during the downgrade shows this:

93186:25    openat(AT_FDCWD, "/lib64/ld-2.28.so;5bf7e290", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0666) = 44
93228:25    chown("/lib64/ld-2.28.so;5bf7e290", 0, 0) = 0
93229:25    chmod("/lib64/ld-2.28.so;5bf7e290", 0755) = 0
93231:25    utimensat(AT_FDCWD, "/lib64/ld-2.28.so;5bf7e290", [{tv_sec=1534319990, tv_nsec=0} /* 2018-08-15T09:59:50+0200 */, {tv_sec=1534319990, tv_nsec=0} /* 2018-08-15T09:59:50+0200 */], AT_SYMLINK_NOFOLLOW) = 0
93232:25    lstat("/lib64/ld-2.28.so", 0x7ffdb3b42440) = -1 ENOENT (No such file or directory)
93233:25    rename("/lib64/ld-2.28.so;5bf7e290", "/lib64/ld-2.28.so") = 0
93234:25    symlink("ld-2.28.so", "/lib64/ld-linux-x86-64.so.2;5bf7e290") = 0
93236:25    lchown("/lib64/ld-linux-x86-64.so.2;5bf7e290", 0, 0) = 0
93237:25    utimensat(AT_FDCWD, "/lib64/ld-linux-x86-64.so.2;5bf7e290", [{tv_sec=1534319883, tv_nsec=0} /* 2018-08-15T09:58:03+0200 */, {tv_sec=1534319883, tv_nsec=0} /* 2018-08-15T09:58:03+0200 */], AT_SYMLINK_NOFOLLOW) = 0
93242:25    lstat("/lib64/ld-linux-x86-64.so.2", {st_mode=S_IFLNK|0777, st_size=15, ...}) = 0
93243:25    rename("/lib64/ld-linux-x86-64.so.2;5bf7e290", "/lib64/ld-linux-x86-64.so.2") = 0
98327:25    symlink("../../../../lib64/ld-2.28.so", "/usr/lib/.build-id/d2/8755c775e07f0160005830f64d32bb93ea1ff0;5bf7e290") = 0
105825:28    lstat("/lib64/ld-2.28.9000.so", {st_mode=S_IFREG|0755, st_size=247536, ...}) = 0
105828:28    lstat("/lib64/ld-linux-x86-64.so.2", {st_mode=S_IFLNK|0777, st_size=10, ...}) = 0
105829:28    stat("/lib64/ld-linux-x86-64.so.2", {st_mode=S_IFREG|0755, st_size=228072, ...}) = 0
105830:28    openat(AT_FDCWD, "/lib64/ld-linux-x86-64.so.2", O_RDONLY) = 4
106309:28    lstat("/lib64/ld-2.28.so", {st_mode=S_IFREG|0755, st_size=228072, ...}) = 0
106669:28    stat("/lib64/ld-linux-x86-64.so.2", {st_mode=S_IFREG|0755, st_size=228072, ...}) = 0
106670:28    stat("/lib64/ld-2.28.9000.so", {st_mode=S_IFREG|0755, st_size=247536, ...}) = 0
106671:28    lstat("/lib64/ld-linux-x86-64.so.2", {st_mode=S_IFLNK|0777, st_size=10, ...}) = 0
106672:28    unlink("/lib64/ld-linux-x86-64.so.2") = 0
106673:28    symlink("ld-2.28.9000.so", "/lib64/ld-linux-x86-64.so.2") = 0
123359:30    execve("/bin/sh", ["/bin/sh", "/var/tmp/rpm-tmp.0fnUhD", "1"], 0x7ffdb3b44540 /* 21 vars */) = 0
133119:31    execve("/bin/sh", ["/bin/sh", "/var/tmp/rpm-tmp.XzSbz3", "1"], 0x7ffdb3b44540 /* 21 vars */) = 0
138675:25    lstat("/lib64/ld-linux-x86-64.so.2", {st_mode=S_IFLNK|0777, st_size=15, ...}) = 0
138680:25    lstat("/lib64/ld-2.28.9000.so", {st_mode=S_IFREG|0755, st_size=247536, ...}) = 0
138681:25    lstat("/lib64/ld-2.28.9000.so", {st_mode=S_IFREG|0755, st_size=247536, ...}) = 0
138682:25    lstat("/lib64/ld-2.28.9000.so", {st_mode=S_IFREG|0755, st_size=247536, ...}) = 0
138683:25    removexattr("/lib64/ld-2.28.9000.so", "security.capability") = -1 ENODATA (No data available)
138684:25    unlink("/lib64/ld-2.28.9000.so")  = 0
146272:34    execve("/bin/sh", ["/bin/sh", "/var/tmp/rpm-tmp.7iz4cE", "0", "0"], 0x7ffdb3b44540 /* 21 vars */) = -1 ENOENT (No such file or directory)
146393:35    execve("/bin/sh", ["/bin/sh", "/var/tmp/rpm-tmp.aNyBSe", "0", "0"], 0x7ffdb3b44540 /* 21 vars */) = -1 ENOENT (No such file or directory)
146631:36    execve("/bin/sh", ["/bin/sh", "/var/tmp/rpm-tmp.0uETzP", "0", "0"], 0x7ffdb3b44540 /* 21 vars */ <unfinished ...>

What RPM is doing here isn't helpful at all.  On line 106673, it restores the ld.so symbolic link to the old (pre-downgrade) version, and then proceeds to delete that ld.so version on line 138684.  (This is with rpm-4.14.2.1-3.fc30.x86_64 in the chroot.)

Solving this completely is not easy because RPM performs some file system updates very late in the transaction.  Files deleted during the transaction stay around for a long time, and if ldconfig is executed by a scriptlet, it will break things during glibc downgrades.

Comment 4 Ben Cotton 2019-08-13 16:51:01 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 5 Ben Cotton 2019-08-13 19:13:47 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to 31.

Comment 6 Florian Weimer 2019-10-01 13:53:53 UTC
The analysis in comment 3 is slightly off.  It's ldconfig that restores the symbolic links because the old files (from the to-be-erased package) are still around when it runs.

In order to fix this, we would have to get rid of symbolic links and use paths like /lib64/libc.so.6 directly.  This is something that requires a few upstream build system changes, but is probably generally useful.
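
For illustration only, the sequence from the trace in comment 3, as reinterpreted here, can be replayed in a scratch directory. This is a minimal sketch, not what RPM or ldconfig actually execute: the function name `simulate_downgrade` and the empty placeholder files are assumptions, and only the order of the link and unlink operations is taken from the trace.

```python
import os
import tempfile


def simulate_downgrade(lib_dir):
    """Replay the order of file operations from the trace in a scratch directory."""
    link = os.path.join(lib_dir, "ld-linux-x86-64.so.2")
    old_loader = os.path.join(lib_dir, "ld-2.28.9000.so")  # from the to-be-erased package
    new_loader = os.path.join(lib_dir, "ld-2.28.so")       # from the downgrade target

    # Starting state: the pre-downgrade loader is installed and linked.
    open(old_loader, "w").close()
    os.symlink("ld-2.28.9000.so", link)

    # RPM unpacks the downgraded files and renames a new link into place.
    open(new_loader, "w").close()
    staged = link + ";staged"
    os.symlink("ld-2.28.so", staged)
    os.rename(staged, link)

    # ldconfig runs while the old file is still present and re-links to it.
    os.unlink(link)
    os.symlink("ld-2.28.9000.so", link)

    # RPM erases the old package's files late in the transaction.
    os.unlink(old_loader)

    # os.path.exists() follows the link, so it reports whether the link resolves.
    return os.readlink(link), os.path.exists(link)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        target, resolves = simulate_downgrade(scratch)
        print(f"link -> {target}, resolves: {resolves}")
```

In this replay the link ends up pointing at a file that no longer exists, which matches the ENOENT results for the later execve calls: the kernel cannot find the program interpreter for /bin/sh.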

Comment 7 Carlos O'Donell 2019-10-08 13:41:18 UTC
*** Bug 1636593 has been marked as a duplicate of this bug. ***

Comment 8 Florian Weimer 2019-11-28 13:02:29 UTC
Patches posted upstream: https://sourceware.org/ml/libc-alpha/2019-11/msg00971.html

Comment 9 Carlos O'Donell 2019-12-03 14:35:51 UTC
Note that ppc64le and s390x have ld.so.1 as a name for the dynamic interpreter and without symlinks this doesn't match any of the searches used by ldconfig to find ld.so as a shared library that you can link against. This will need some additional code upstream to handle this.

Comment 10 Ben Cotton 2020-11-03 15:05:26 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 31 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 11 Ben Cotton 2020-11-24 20:05:45 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 12 Alexander Sosedkin 2021-05-21 19:05:54 UTC
Still totally a thing, just lost a system to this bug going down from glibc-2.33-8.fc34 to glibc-2.32-4.fc33.

Comment 13 Florian Weimer 2021-05-27 12:00:03 UTC
*** Bug 1965289 has been marked as a duplicate of this bug. ***

Comment 14 udo 2021-05-27 12:06:56 UTC
I.e., this also happens on Fedora 34 when going back from the Fedora 35 glibc to the Fedora 34 glibc.
Is there a (manual) workaround to make the downgraded system work?
(E.g., boot a Fedora Live DVD and fix whatever is broken.)

Comment 15 Florian Weimer 2021-05-27 12:12:27 UTC
Just running ldconfig should fix things. You need to follow the usual steps for system recovery (get a writable root file system beforehand, and trigger an SELinux relabel afterwards).
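
Before running ldconfig from a rescue environment, it may help to confirm that the breakage really is a dangling symbolic link. The following is a hedged diagnostic sketch, not part of any glibc or RPM tooling; the helper name `dangling_symlinks` and the default mount point `/mnt/sysroot` are assumptions made for the example.

```python
import os
import sys


def dangling_symlinks(directory):
    """Return (name, target) pairs for symlinks whose target is missing."""
    broken = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        # islink() inspects the link itself; exists() follows it.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append((entry, os.readlink(path)))
    return broken


if __name__ == "__main__":
    # Hypothetical default: the damaged root mounted at /mnt/sysroot.
    lib_dir = sys.argv[1] if len(sys.argv) > 1 else "/mnt/sysroot/lib64"
    if os.path.isdir(lib_dir):
        for name, target in dangling_symlinks(lib_dir):
            print(f"{name} -> {target} (missing)")
    else:
        print(f"{lib_dir} is not a directory")
```

Relative link targets such as `ld-2.28.9000.so` are resolved inside the inspected directory, so the check works on a mounted root. Absolute targets would be resolved against the rescue system's own root and can give misleading results.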

Comment 16 udo 2021-05-27 12:24:30 UTC
Thanks!
yes, `rpm -Uvh --force --nodeps  glibc* libnsl-2.33-8.fc34.x86_64.rpm  nscd-2.33-8.fc34.x86_64.rpm ; ldconfig -v` worked.

Comment 17 Florian Weimer 2021-06-15 12:28:58 UTC
Changes are in dist-git, incorporating a proposed upstream patch before upstream review.

Comment 18 Florian Weimer 2021-06-15 13:49:47 UTC
There is a new warning during updates:

/usr/sbin/ldconfig: /lib64/ld-linux-x86-64.so.2 is not a symbolic link

I think it's harmless. I think it stems from processing a leftover ld-2.33.so file which has not been removed by RPM yet. Its soname is ld-linux-x86-64.so.2, so ldconfig schedules the creation of a symbolic link, but realizes later that this file already exists, and does nothing instead (except printing this warning). It's *so* close to not working, but this time, it looks like we are lucky and it actually works.

Comment 19 Fedora Update System 2021-06-16 08:27:44 UTC
FEDORA-2021-9ce0f65a09 has been pushed to the Fedora 35 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 20 Dominik 'Rathann' Mierzejewski 2021-07-03 22:22:26 UTC
This has caused ld.so to move from /usr/lib64 to /usr/lib on aarch64 and s390x and broken lddtree.py test of pax-utils (https://koschei.fedoraproject.org/build/10505661). Was that intentional?

$ rpmdiff glibc-2.33.9000-14.fc35.aarch64.rpm glibc-2.33.9000-15.fc35.aarch64.rpm
...
SM5.......T /lib/ld-linux-aarch64.so.1
removed     /lib64/ld-2.33.9000.so
removed     /lib64/ld-linux-aarch64.so.1
$ rpm -q glibc
glibc-2.33.9000-15.fc35.aarch64
$ ls -l /lib64/ld*
ls: cannot access '/lib64/ld*': No such file or directory
$ ls -l /lib/ld*
-rwxr-xr-x. 1 root root 814848 Jun 15 14:50 /lib/ld-linux-aarch64.so.1

It seems counter-intuitive to have ld.so in /lib while the rest of the libraries are in /lib64 on a 64-bit arch.

Comment 21 Florian Weimer 2021-07-04 06:18:42 UTC
(In reply to Dominik 'Rathann' Mierzejewski from comment #20)
> This has caused ld.so to move from /usr/lib64 to /usr/lib on aarch64 and
> s390x and broken lddtree.py test of pax-utils
> (https://koschei.fedoraproject.org/build/10505661). Was that intentional?

Yes, it's required to work around the RPM issue.

> It seems counter-intuitive to have ld.so in /lib while the rest of the
> libraries are in /lib64 on a 64-bit arch.

The path to the dynamic loader is mandated by the psABI supplement. (It is hard-coded into main programs.) We cannot change it.

Test expectations will have to be adjusted accordingly.

Comment 22 Dominik 'Rathann' Mierzejewski 2021-07-05 07:36:23 UTC
(In reply to Florian Weimer from comment #21)
> (In reply to Dominik 'Rathann' Mierzejewski from comment #20)
[...]
> > It seems counter-intuitive to have ld.so in /lib while the rest of the
> > libraries are in /lib64 on a 64-bit arch.
> 
> The path to the dynamic loader is mandated by the psABI supplement. (It is
> hard-coded into main programs.) We cannot change it.

Could you point to the specific document saying that ld.so must be in /lib on aarch64?
I found the aarch64 ABI documentation (https://github.com/ARM-software/abi-aa), but
I'm unable to find this specific requirement.

> Test expectations will have to be adjusted accordingly.

Obviously.

Comment 23 Florian Weimer 2021-07-05 08:10:53 UTC
(In reply to Dominik 'Rathann' Mierzejewski from comment #22)
> Could you point to the specific document saying that ld.so must be in /lib
> on aarch64?
> I found the aarch64 ABI documentation
> (https://github.com/ARM-software/abi-aa), but
> I'm unable to find this specific requirement.

The PT_INTERP value must be consistent across distributions. Not all of them use /lib64 paths, which makes /lib a more natural choice in some ways.

I believe the System V psABI supplement for AArch64 has not yet been published. There are various ELF-related AArch64 specifications, but they equally apply to embedded scenarios. They do not specify Linux-specific aspects such as the ELF interpreter name or the minimum and maximum page size.
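
To see which interpreter path a given main program actually carries, the PT_INTERP program header can be read directly from the file. The sketch below is a minimal illustration that assumes a 64-bit ELF image and uses only the standard ELF64 header and program header layouts; the function name `read_pt_interp` is made up for this example, and tools such as `readelf -l` report the same information.

```python
import struct
import sys

PT_INTERP = 3  # program header type that holds the interpreter path


def read_pt_interp(data):
    """Return the PT_INTERP path from a 64-bit ELF image, or None if absent."""
    if data[:4] != b"\x7fELF" or data[4] != 2:  # EI_CLASS 2 = ELFCLASS64
        raise ValueError("not a 64-bit ELF file")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA 1 = little-endian

    # Elf64_Ehdr: e_phoff at offset 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)

    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        # Elf64_Phdr starts with p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz.
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            endian + "IIQQQQ", data, base
        )
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path of a dynamically linked program as the first argument.
    with open(sys.argv[1], "rb") as handle:
        print(read_pt_interp(handle.read()))
```

Because this string is embedded in every dynamically linked main program, moving the loader to a different path would break binaries built elsewhere, which is the consistency constraint described above.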

Comment 24 Dominik 'Rathann' Mierzejewski 2021-07-05 09:42:33 UTC
I'm only asking because you changed the location from /lib64 to /lib on aarch64 and s390x and Carlos only tested on i686 and x86_64 where you actually have both /lib and /lib64 due to multilib. This move was not explicitly mentioned, either. There may be other software out there that's expecting ld.so to be in /lib64 on 64-bit arches.

Comment 25 Florian Weimer 2021-07-05 09:57:25 UTC
The official name of the program interpreter was always located under /lib, and that didn't change. Software which expects the program interpreter to exist in /lib64 is already non-portable. For example, Debian does not have a /lib64 directory at all: https://packages.debian.org/buster/arm64/libc6/filelist

Comment 26 Dominik 'Rathann' Mierzejewski 2021-07-05 10:26:00 UTC
I get the point. Thanks for the explanation. Shouldn't this bug be closed, by the way?

Comment 27 Florian Weimer 2021-07-05 10:41:40 UTC
(In reply to Dominik 'Rathann' Mierzejewski from comment #26)
> I get the point. Thanks for the explanation. Shouldn't this bug be closed,
> by the way?

Indeed, closing.