This is a particularly serious issue for auditor-based tools that need to interface with binaries within the application namespace. Tools often need to call into a library immediately after it is loaded, before application code starts to use it. It is not safe to call into the library before its init constructors have run, and the auditor interface does not provide a callback after init constructors have run. The only alternative is therefore to "promote" the init constructors through a recursive call to dl*open during la_activity(CONSISTENT). This bug makes that approach unstable in practice.

For instance, HPCToolkit calls into libcuda.so as part of its initialization to set up callbacks for monitoring CUDA operations. We use dlopen/dlsym to access the libcuda.so API without creating a direct dependency (to avoid loading libcuda.so for non-CUDA applications). Some application frameworks initiate CUDA operations during their init constructors. To capture these operations, we initialize when libcuda.so is loaded (if we have not already done so to capture other operations of interest, such as thread creation). If the first action of an application framework's init constructor is a dlopen(libcuda.so) (seen in IBM’s XL OpenMP runtime when used by Clang at LLNL for OpenMP offloading), we initialize during this call, recursively dlopen(libcuda.so), and subsequently crash due to this bug.

Reproducible: Always

Steps to Reproduce:
1. tar xvzf recursive-dlopen-crashes.tar.gz
2. cd recursive-dlopen-crashes
3. make

Actual Results:
<snip>
Outer dlopen(libinit), inner dlopen(libinit):
LD_AUDIT=./auditor.so ./main
[main] Dlopening libinit...
[audit] libinit has been loaded (but not initialized)
[audit] First CONSISTENT with libinit, dlopening libinit...
Inconsistency detected by ld.so: dl-open.c: 627: dl_open_worker_begin: Assertion `r_state == RT_CONSISTENT' failed!
make: [Makefile:19: test] Error 127 (ignored)
Outer dlopen(libwrap), inner dlopen(libwrap):
LD_AUDIT=./auditor-wrap.so ./main-wrap
[main] Dlopening libwrap...
[audit] libinit has been loaded (but not initialized)
[audit] First CONSISTENT with libinit, dlopening libwrap...
Inconsistency detected by ld.so: dl-open.c: 627: dl_open_worker_begin: Assertion `r_state == RT_CONSISTENT' failed!
make: [Makefile:22: test] Error 127 (ignored)

Expected Results:
no asserts

The problem also affects RHEL9. It was reported a long time ago along with other auditor problems that were fixed in 2.35, but this particular issue was never addressed. Although I didn't specifically test it there, it is fairly likely that upstream glibc is also affected.
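For readers without the attachment, here is a minimal sketch of the auditor pattern described above. It is not the attached reproducer: the variable names (target_name, target_path, target_seen, promoted) are made up for illustration, the target name "libinit" is simply borrowed from the log output, and the use of dlmopen(LM_ID_BASE, ...) to reach the application namespace is my assumption about how the inner dl*open would be issued. Per the report, on affected glibc builds the inner call is where the assertion fires.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for illustration; "libinit" is taken from the log. */
static const char *target_name = "libinit";
static char target_path[4096];
static int target_seen = 0;  /* set once la_objopen sees the target */
static int promoted = 0;     /* ensures the inner dlmopen is issued only once */

unsigned int la_version(unsigned int version) {
    (void)version;
    return LAV_CURRENT;  /* tell ld.so which audit interface we implement */
}

unsigned int la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie) {
    (void)cookie;
    /* The object is mapped here, but its init constructors have not run. */
    if (lmid == LM_ID_BASE && map->l_name != NULL
        && strstr(map->l_name, target_name) != NULL) {
        snprintf(target_path, sizeof target_path, "%s", map->l_name);
        target_seen = 1;
    }
    return 0;  /* no symbol-binding auditing for this object */
}

void la_activity(uintptr_t *cookie, unsigned int flag) {
    (void)cookie;
    if (flag != LA_ACT_CONSISTENT || !target_seen || promoted)
        return;
    promoted = 1;
    /* Recursive open to "promote" the init constructors (assumed call). */
    void *handle = dlmopen(LM_ID_BASE, target_path, RTLD_NOW);
    if (handle == NULL)
        fprintf(stderr, "[audit] inner dlmopen failed: %s\n", dlerror());
    /* After a successful return it would be safe to call into the library. */
}
```

Built as a shared object (for example with gcc -shared -fPIC) and loaded through LD_AUDIT as in the log above, an auditor of this shape would issue the inner open from within the application's outer dlopen.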
The affected package was glibc-2.39-17.fc40.x86_64 on Fedora. I think it is likely that the rawhide package also has this problem but I haven't tested it yet.
Created attachment 2039355: reproducer
v2 upstream patch for this under review: https://patchwork.sourceware.org/project/glibc/list/?series=37208
This needs to be backported.
This message is a reminder that Fedora Linux 40 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 40 on 2025-05-13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '40'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 40 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
Fedora Linux 40 entered end-of-life (EOL) status on 2025-05-13.

Fedora Linux 40 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field.

If you are unable to reopen this bug, please file a new report against an active release.

Thank you for reporting this bug and we are sorry it could not be fixed.
I don't think we need to backport this fix into Fedora 41. It's in Fedora 42.