Bug 990706

Summary: glibc: upgrade creates bogus /lib64/ld-linux-x86-64.so.2 link
Product: Red Hat Enterprise Linux 8
Reporter: Eric Blake <eblake>
Component: glibc
Assignee: glibc team <glibc-bugzilla>
Status: CLOSED UPSTREAM
QA Contact: qe-baseos-tools-bugs
Severity: unspecified
Priority: unspecified
Version: 8.2
CC: ashankar, codonell, dj, eblake, fweimer, jorton, mnewsome, pfrankli
Target Milestone: rc
Keywords: Triaged
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-06-09 16:36:10 UTC
Type: Bug

Description Eric Blake 2013-07-31 19:46:15 UTC
Description of problem:
During the process of upgrading glibc, something in the upgrade updates the /lib64/ld-linux-x86-64.so.2 symlink to point to the latest ld-*.so.  Presumably, this is done so that applications can link against a fixed name (the symlink) and automatically use the latest and greatest ld, rather than having to be recompiled when ld bumps version numbers with back-compatible content.  However, the algorithm used in determining the latest .so is rather puny, in that it will happily pick up the wrong file, even a non-executable; once this happens, any attempt to exec an application that was linked against the ld-linux-x86-64.so.2 name will fail.

Below, I'll demonstrate the problem in a safe manner, but I first encountered it in a situation where my system could not boot on its own (the kernel panicked when it could not exec the init process).  In my case, I had done a 'cp ld-2.12.so{,.bak}' prior to a 'yum update', in order to be able to toggle more quickly between two versions of glibc when testing whether a particular problem had been fixed.  That testing was thwarted when, post-upgrade, processes wouldn't run because the symlink had been redirected to the non-executable copy.  I was able to fix the problem by booting a rescue CD, mounting the right partition, recreating a proper symlink, and deleting the problematic file.

Version-Release number of selected component (if applicable):
glibc-2.12-1.127.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. all steps as root: cd /lib64
2. ls -l ld-*
3. ln ld-2.12.so{,.bak}
4. yum reinstall glibc -y
5. ls -l ld-*
[to undo the damage]
# ln -s ld-2.12.so ld-linux-x86-64.so.2a
# chcon --ref ld-linux-x86-64.so.2{,a}
# mv ld-linux-x86-64.so.2{a,}
mv: overwrite `ld-linux-x86-64.so.2'? y
# rm ld-2.12.so.bak 
rm: remove regular file `ld-2.12.so.bak'? y


Actual results:
2.
# ls -l ld-*
-rwxr-xr-x. 1 root root 154520 Jul 26 21:24 ld-2.12.so
lrwxrwxrwx. 1 root root     10 Jul 31 13:09 ld-linux-x86-64.so.2 -> ld-2.12.so
lrwxrwxrwx. 1 root root     20 Oct 19  2012 ld-lsb-x86-64.so -> ld-linux-x86-64.so.2
lrwxrwxrwx. 1 root root     20 Dec  7  2012 ld-lsb-x86-64.so.3 -> ld-linux-x86-64.so.2

5.
# ls -l ld-*
-rwxr-xr-x. 1 root root 154520 Jul 26 21:24 ld-2.12.so
-rwxr-xr-x. 1 root root 154520 Jul 26 21:24 ld-2.12.so.bak
lrwxrwxrwx. 1 root root     14 Jul 31 13:33 ld-linux-x86-64.so.2 -> ld-2.12.so.bak
lrwxrwxrwx. 1 root root     20 Oct 19  2012 ld-lsb-x86-64.so -> ld-linux-x86-64.so.2
lrwxrwxrwx. 1 root root     20 Dec  7  2012 ld-lsb-x86-64.so.3 -> ld-linux-x86-64.so.2

Oops - the symlink has been redirected to ld-2.12.so.bak, merely because that filename matched the ld-*so* glob and comes later than ld-2.12.so; there was no check for whether the redirected symlink points to an executable file, or even if the file is a version of ld.so by contents.
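The glob theory can be checked without touching /lib64. The following is a minimal sketch, assuming the ld-*so* glob named above is a fair description of the matching; the helper name matches_ld_glob is made up for illustration and is not part of glibc or ldconfig.

```shell
# Hypothetical helper: does a file name match the ld-*so* glob?
matches_ld_glob() {
    case $1 in
        ld-*so*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Classify the names seen in the listing above, plus one unrelated library.
for name in ld-2.12.so ld-2.12.so.bak ld-linux-x86-64.so.2 libc-2.12.so; do
    if matches_ld_glob "$name"; then
        echo "$name: matches"
    else
        echo "$name: no match"
    fi
done
```

The backup name matches just as readily as the real loader does, so a name-only glob of this shape cannot tell the two apart.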

Expected results:
In step 5, the ld-linux-x86-64.so.2 symlink should not be redirected.  I don't care if its timestamp got updated because the link was removed and recreated, but I do care that it still points to the version of ld-*.so shipped by the rpm, and not some random ld-*.so.* that I left as pollution in the directory.

Additional info:
To make your system unusable and unbootable, replace step 3 with 'cat ld-2.12.so > ld-2.12.so.bak', and watch as step 4 hits some spectacular failures when it can no longer exec postinstall scriptlets, with pretty much everything else on the system hosed as soon as it tries to exec a new process.  If you do this, I hope you're comfortable with using a rescue CD to repair the damage.
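The difference between the 'ln' and 'cat >' variants comes down to file modes: a hard link shares the original's inode and therefore its execute bits, while shell redirection creates a fresh file that has none. A minimal sketch in a scratch directory, using a made-up stand-in file name (ld-demo.so) so that nothing under /lib64 is touched:

```shell
# Work in a throwaway directory; nothing under /lib64 is touched.
scratch=$(mktemp -d)
loader=$scratch/ld-demo.so

# Stand-in for the real loader: any file with the execute bits set.
printf 'placeholder\n' > "$loader"
chmod 755 "$loader"

ln "$loader" "$loader.hardlink"       # same inode, same mode
cat "$loader" > "$loader.catcopy"     # new file, created without execute bits

for f in "$loader" "$loader.hardlink" "$loader.catcopy"; do
    if [ -x "$f" ]; then
        echo "${f##*/}: executable"
    else
        echo "${f##*/}: NOT executable"
    fi
done
```

Remove the scratch directory afterwards with rm -rf "$scratch".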

Comment 4 Ľuboš Kardoš 2014-11-25 14:46:06 UTC
This magic with symlinks is caused by the program /usr/sbin/glibc_post_upgrade.x86_64, which is executed as the rpm postinstall program. So I am reassigning this bug back to glibc.

# rpm -qp --scripts /root/glibc-2.12-1.132.el6.x86_64.rpm 
warning: /root/glibc-2.12-1.132.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
postinstall program: /usr/sbin/glibc_post_upgrade.x86_64
postuninstall program: /sbin/ldconfig
 
# ls -l ld-*
-rwxr-xr-x. 1 root root 154520 Nov  5  2013 ld-2.12.so
-rwxr-xr-x. 1 root root 154520 Nov  5  2013 ld-2.12.so.bak
lrwxrwxrwx. 1 root root     10 Nov 25 09:37 ld-linux-x86-64.so.2 -> ld-2.12.so

# /usr/sbin/glibc_post_upgrade.x86_64
# ls -l ld-*
-rwxr-xr-x. 1 root root 154520 Nov  5  2013 ld-2.12.so
-rwxr-xr-x. 1 root root 154520 Nov  5  2013 ld-2.12.so.bak
lrwxrwxrwx. 1 root root     14 Nov 25 09:39 ld-linux-x86-64.so.2 -> ld-2.12.so.bak

Comment 5 Carlos O'Donell 2014-11-28 01:27:36 UTC
(In reply to Ľuboš Kardoš from comment #4)
> postuninstall program: /sbin/ldconfig

Lubos,

Thanks. I completely forgot we ran ldconfig from the post upgrade. That's lame of me and I should have noticed that. Thanks for passing this back to me.

Eric,

Please be aware that changing the dynamic loader from new to old is only part of the solution to having two parallel runtimes. The entire implementation needs to be changed atomically in order to have the safest transition, but we can't do that, so we aim to change all libraries as quickly as possible from a single static upgrade execution image. I expect you were copying *every* library and file from the original to *.bak and back for testing?

The reason I had déjà vu is that this is something I had to document for another ticket, where a library package was breaking upon upgrade.

You start with this:
ld-lsb-x86-64.so.3 (lsb unique) -> ld-linux-x86-64.so.2 (SONAME)
ld-linux-x86-64.so.2 (SONAME) -> ld-2.12.so (package name)

You change it to this:
ld-lsb-x86-64.so.3 (lsb unique) -> ld-linux-x86-64.so.2 (SONAME)
ld-linux-x86-64.so.2 (SONAME) -> ld-2.12.so (package name)
ld-2.12.so.bak (user created file) <-> ld-2.12.so (hard link)

Then ldconfig will produce this:
ld-lsb-x86-64.so.3 (lsb unique) -> ld-linux-x86-64.so.2 (SONAME)
ld-lsb-x86-64.so (ldconfig created link for linking against lsb unique) -> ld-linux-x86-64.so.2 (SONAME)
ld-linux-x86-64.so.2 (SONAME) -> ld-2.12.so.bak (user created file)
ld-2.12.so <-> ld-2.12.so.bak (hard link)

The *.bak name is considered, by ldconfig, to be another implementation of the DSO. Why? ld-2.12.so and ld-2.12.so.bak have the same SONAME, but the ld-2.12.so.bak file was ahead of ld-2.12.so in the directory search.

When updating symlinks ldconfig looks at the soname of all *.so.* files and that will include the *.so.bak which is simply considered another library (a package specific name).

I could add a check to see if the file is executable and refuse to use it in the process, but keep in mind that such a check will pass even during a copy and the symlinks won't be what you expect.

Thoughts?

Comment 6 Eric Blake 2014-12-01 18:42:18 UTC
(In reply to Carlos O'Donell from comment #5)

> Please be aware that changing the dynamic loader from new to old is only
> part of the solution to having two parallel runtimes. The entire
> implementation needs to be changed atomically in order to have the safest
> transition, but we can't do that, so we aim to change all libraries as
> quickly as possible from a single static upgrade execution image. I expect
> you were copying *every* library and file from the original to *.bak and
> back for testing?

No, I was not aware that I needed to do that.  But for the test I was performing at the time I created this bug, I got lucky and things mostly worked with just changing the one library, all until the postinstall script screwed up the symlink to a bogus value.


> 
> When updating symlinks ldconfig looks at the soname of all *.so.* files and
> that will include the *.so.bak which is simply considered another library (a
> package specific name).

Why are you even looking at files not ending in .so or .so.NUMBERS?  Why not do some filename filtering to find only files that could match a valid installed DSO naming scheme (that is, avoid even attempting to determine whether .so.bak is a DSO, because it doesn't match .so.NUMBERS).  Same thing for files ending in ~, or pretty much any other backup scheme - if the file name isn't something with a valid versioning suffix, it shouldn't be considered.
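As a rough illustration of the kind of filename filtering meant here, the sketch below accepts only names ending in .so or .so.NUMBERS. The function name has_valid_dso_suffix is hypothetical, and this is not how ldconfig behaves today; it only shows that such a filter is cheap to express.

```shell
# Hypothetical filter: accept only names ending in ".so" or ".so.NUMBERS".
has_valid_dso_suffix() {
    case $1 in
        *.so)   return 0 ;;        # unversioned name, e.g. ld-2.12.so
        *.so.*) ;;                 # inspect the suffix below
        *)      return 1 ;;        # e.g. backup names ending in "~"
    esac
    suffix=${1##*.so.}             # text after the last ".so."
    case $suffix in
        ''|.*|*.|*..*) return 1 ;; # empty or malformed version string
        *[!0-9.]*)     return 1 ;; # anything but digits and dots, e.g. "bak"
        *)             return 0 ;;
    esac
}

for name in ld-2.12.so ld-linux-x86-64.so.2 ld-2.12.so.bak ld-2.12.so~; do
    if has_valid_dso_suffix "$name"; then
        echo "$name: considered"
    else
        echo "$name: skipped"
    fi
done
```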

> 
> I could add a check to see if the file is executable and refuse to use it in
> the process, but keep in mind that such a check will pass even during a copy
> and the symlinks won't be what you expect.

Checking for non-executability may help, but won't necessarily catch everything (that is, it will depend on whether a user used 'cp' or 'cp -p' on whether the destination of the copy will still have executable permission).  That said, I'm all for anything you can do to make the process more robust and less likely to create a dangling symlink.

Comment 7 Carlos O'Donell 2014-12-01 19:32:20 UTC
(In reply to Eric Blake from comment #6)
> (In reply to Carlos O'Donell from comment #5)
> 
> > Please be aware that changing the dynamic loader from new to old is only
> > part of the solution to having two parallel runtimes. The entire
> > implementation needs to be changed atomically in order to have the safest
> > transition, but we can't do that, so we aim to change all libraries as
> > quickly as possible from a single static upgrade execution image. I expect
> > you were copying *every* library and file from the original to *.bak and
> > back for testing?
> 
> No, I was not aware that I needed to do that.  But for the test I was
> performing at the time I created this bug, I got lucky and things mostly
> worked with just changing the one library, all until the postinstall script
> screwed up the symlink to a bogus value.

Right. The rate of change in glibc is such that it should work to change just ld.so, but that might fail in weird ways, so please be aware of the problem.

In fact we are looking at being able to provide multiple versions of glibc installed in parallel, with the ability to switch between them at boot. This would allow you to recover your system, and put back the symlinks and files that were used by an older glibc (similar to the way you can have multiple kernels installed).
 
> > When updating symlinks ldconfig looks at the soname of all *.so.* files and
> > that will include the *.so.bak which is simply considered another library (a
> > package specific name).
> 
> Why are you even looking at files not ending in .so or .so.NUMBERS?  Why not
> do some filename filtering to find only files that could match a valid
> installed DSO naming scheme (that is, avoid even attempting to determine
> whether .so.bak is a DSO, because it doesn't match .so.NUMBERS).  Same thing
> for files ending in ~, or pretty much any other backup scheme - if the file
> name isn't something with a valid versioning suffix, it shouldn't be
> considered.

I don't know that there was any coherent design decision made to support arbitrary endings for DSO files. However, it is what we support today. Changing it would limit what can be used with ldconfig and represents a backwards incompatible change. At present the use case you describe is not sufficient justification to make this change.

> > I could add a check to see if the file is executable and refuse to use it in
> > the process, but keep in mind that such a check will pass even during a copy
> > and the symlinks won't be what you expect.
> 
> Checking for non-executability may help, but won't necessarily catch
> everything (that is, it will depend on whether a user used 'cp' or 'cp -p'
> on whether the destination of the copy will still have executable
> permission).  That said, I'm all for anything you can do to make the process
> more robust and less likely to create a dangling symlink.

The other problem here is that libraries need not have any executable bits set in their permission to be loaded by the dynamic loader. It is only the dynamic loader itself that needs the execute permission. So while this might work to special case the detection of `ld.*\.so` it can't be used for other libraries.

In summary:
- Detect when `ld.*\.so.*` is being scanned.
- Make sure the target of the `ld.*\.so.*` symlink has at least one executable bit set; otherwise ignore it.
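The two points above could look roughly like the following outside of ldconfig. This is only a sketch under stated assumptions: the function name loader_target_usable and the ld-demo/libdemo file names are invented for illustration, the shell glob ld*.so* merely approximates the `ld.*\.so.*` pattern, and the mode is read from 'ls -ldL' output rather than from stat() as C code would do.

```shell
# Hypothetical check: a loader-like name must resolve to a file with at
# least one execute bit; other libraries are accepted unconditionally.
loader_target_usable() {
    case ${1##*/} in
        ld*.so*) ;;                # dynamic loader candidate: check the mode
        *) return 0 ;;             # ordinary library: no execute bit needed
    esac
    [ -e "$1" ] || return 1        # missing file or dangling symlink
    mode=$(ls -ldL -- "$1") || return 1
    case $mode in
        # execute bit for user, group, or other (4th, 7th, 10th character)
        ???[xs]*|??????[xs]*|?????????[xt]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Try it on stand-in files in a scratch directory.
scratch=$(mktemp -d)
printf 'placeholder\n' > "$scratch/ld-demo.so"
chmod 755 "$scratch/ld-demo.so"
cat "$scratch/ld-demo.so" > "$scratch/ld-demo.so.bak"  # loses the execute bits
printf 'placeholder\n' > "$scratch/libdemo.so.1"       # plain library, no execute bits

for f in "$scratch"/ld-demo.so "$scratch"/ld-demo.so.bak "$scratch"/libdemo.so.1; do
    if loader_target_usable "$f"; then
        echo "${f##*/}: usable"
    else
        echo "${f##*/}: ignored"
    fi
done
```

As noted earlier in this comment thread, a copy that keeps its execute bits would still pass this check, so it narrows the failure mode rather than eliminating it.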

Comment 12 Carlos O'Donell 2015-10-19 20:02:06 UTC
We won't be fixing this in RHEL 6.x, but we will look at this in RHEL 7.x, thus moving to RHEL 7.x.

Comment 17 Carlos O'Donell 2020-06-09 16:36:10 UTC
We are going to track this issue upstream here:
https://sourceware.org/bugzilla/show_bug.cgi?id=26101

I'm writing a patch to test _dl_cache_libcmp()'s behaviour, which controls how libraries are sorted and chosen by ldconfig for symlinking.

Likewise I'm going to update the docs to try to document this.

I'm marking this CLOSED/UPSTREAM. When upstream is fixed we can decide to backport or let it get pulled into the next major release.

Comment 18 Florian Weimer 2021-06-15 13:52:05 UTC
The behavior described in bug 1652867 comment 18 suggests that this bug also addresses the issue: .bak files are effectively ignored for linking purposes once the soname exists as a real file.