Bug 2296951 - glibc: ld.so asserts in some narrow circumstances when an auditor and the application try to load the same library.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 40
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Carlos O'Donell
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-07-09 22:37 UTC by Ben Woodard
Modified: 2025-05-20 09:42 UTC (History)
CC List: 14 users

Fixed In Version: glibc-2.40.9000-9.fc42
Clone Of:
Environment:
Last Closed: 2025-05-20 09:33:38 UTC
Type: ---
Embargoed:


Attachments
reproducer (2.88 KB, application/gzip)
2024-07-09 22:45 UTC, Ben Woodard


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHEL-47403 0 None None None 2024-07-16 06:43:56 UTC
Sourceware 31986 0 P2 NEW Loading the same library within an audit library and within an application can cause ld.so to crash with an assert. 2024-07-17 19:01:54 UTC

Description Ben Woodard 2024-07-09 22:37:57 UTC
This is a particularly serious issue for auditor-based tools that need to interface with binaries within the application namespace. Tools often need to call into a library immediately after it is loaded, before application code starts to use it. It is not safe to call into the library before its init constructors have run, and the auditor interface does not provide a callback after the init constructors have run. The only alternative is therefore to "promote" the init constructors through a recursive call to dl*open during la_activity(CONSISTENT).

This bug makes this approach unstable in practice. For instance, as part of its initialization, HPCToolkit calls into libcuda.so to set up callbacks for monitoring CUDA operations. We use dlopen/dlsym to access the libcuda.so API without creating a direct dependency (to avoid loading libcuda.so for non-CUDA applications). Some application frameworks initiate CUDA operations during their init constructors. To capture these operations, we initialize when libcuda.so is loaded (if we have not already done so to capture other operations of interest, such as thread creation). If the first action of an application framework's init constructor is a dlopen(libcuda.so) (seen in IBM’s XL OpenMP runtime when used by Clang at LLNL for OpenMP offloading), we initialize during this call, recursively dlopen(libcuda.so), and subsequently crash due to this bug.
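
The pattern described above can be sketched as a minimal auditor. This is an illustration only, not the attached reproducer's code: the library name libinit is borrowed from the log output in this report, the helper name_matches and the two state flags are hypothetical, and the use of dlmopen(LM_ID_BASE, ...) to target the application namespace is an assumption (the description only says "dl*open").

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical target name, borrowed from the log output in this report. */
static const char *const target_name = "libinit";

static int target_seen;     /* set once the target shows up in la_objopen */
static int target_promoted; /* set once the inner open has been attempted */

/* Return nonzero if the file-name part of path starts with target. */
static int name_matches(const char *path, const char *target)
{
    if (path == NULL)
        return 0;
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;
    return strncmp(base, target, strlen(target)) == 0;
}

/* Required entry point: tell ld.so which audit interface version we use. */
unsigned int la_version(unsigned int version)
{
    (void)version;
    return LAV_CURRENT;
}

/* Called when an object is mapped, before its init constructors run. */
unsigned int la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
    (void)lmid;
    (void)cookie;
    if (name_matches(map->l_name, target_name)) {
        target_seen = 1;
        fprintf(stderr, "[audit] %s has been loaded (but not initialized)\n",
                target_name);
    }
    return 0; /* no symbol-binding auditing for this object */
}

/* Called on link-map activity; CONSISTENT means the change is complete. */
void la_activity(uintptr_t *cookie, unsigned int flag)
{
    (void)cookie;
    if (flag != LA_ACT_CONSISTENT || !target_seen || target_promoted)
        return;
    target_promoted = 1; /* guard against re-entry from the inner open */
    fprintf(stderr, "[audit] First CONSISTENT with %s, opening it again...\n",
            target_name);
    /* Assumption: open in the application namespace to "promote" the
       init constructors. When this runs inside the application's own
       dlopen of the same library, affected glibc versions hit the
       assertion shown in this report. */
    void *handle = dlmopen(LM_ID_BASE, "libinit.so", RTLD_NOW);
    if (handle == NULL)
        fprintf(stderr, "[audit] inner open failed: %s\n", dlerror());
}
```

Assuming the sketch is saved as auditor_sketch.c (a hypothetical file name), it would be built as a shared object, for example with `gcc -shared -fPIC auditor_sketch.c -o auditor_sketch.so`, and activated with `LD_AUDIT=./auditor_sketch.so` in front of an application that itself calls dlopen on the same library.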


Reproducible: Always

Steps to Reproduce:
1. tar xvzf recursive-dlopen-crashes.tar.gz
2. cd recursive-dlopen-crashes
3. make
Actual Results:  
<snip> 
Outer dlopen(libinit), inner dlopen(libinit):
LD_AUDIT=./auditor.so ./main
[main] Dlopening libinit...
[audit] libinit has been loaded (but not initialized)
[audit] First CONSISTENT with libinit, dlopening libinit...
Inconsistency detected by ld.so: dl-open.c: 627: dl_open_worker_begin: Assertion `r_state == RT_CONSISTENT' failed!
make: [Makefile:19: test] Error 127 (ignored)

Outer dlopen(libwrap), inner dlopen(libwrap):
LD_AUDIT=./auditor-wrap.so ./main-wrap
[main] Dlopening libwrap...
[audit] libinit has been loaded (but not initialized)
[audit] First CONSISTENT with libinit, dlopening libwrap...
Inconsistency detected by ld.so: dl-open.c: 627: dl_open_worker_begin: Assertion `r_state == RT_CONSISTENT' failed!
make: [Makefile:22: test] Error 127 (ignored)


Expected Results:  
no asserts

The problem also affects RHEL9. It was reported a long time ago along with other auditor problems that were fixed in 2.35, but this particular issue was never addressed. Although I didn't specifically test upstream glibc, it is fairly likely that it is also affected.

Comment 1 Ben Woodard 2024-07-09 22:45:07 UTC
The affected package was glibc-2.39-17.fc40.x86_64 on Fedora. I think it is likely that the rawhide package also has this problem, but I haven't tested it yet.

Comment 2 Ben Woodard 2024-07-09 22:45:57 UTC
Created attachment 2039355
reproducer

Comment 3 Carlos O'Donell 2024-09-06 13:56:01 UTC
v2 upstream patch for this under review:
https://patchwork.sourceware.org/project/glibc/list/?series=37208

Comment 4 Florian Weimer 2024-11-29 14:41:50 UTC
This needs to be backported.

Comment 5 Aoife Moloney 2025-04-28 13:29:13 UTC
This message is a reminder that Fedora Linux 40 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 40 on 2025-05-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '40'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 40 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 6 Aoife Moloney 2025-05-20 09:33:38 UTC
Fedora Linux 40 entered end-of-life (EOL) status on 2025-05-13.

Fedora Linux 40 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 7 Florian Weimer 2025-05-20 09:42:20 UTC
I don't think we need to backport this fix to Fedora 41. It's in Fedora 42.

