Bug 2279885

Summary: TLS for a library gets inappropriately marked unallocated when a library is loaded in two contexts
Product: [Fedora] Fedora
Reporter: Ben Woodard <woodard>
Component: glibc
Assignee: Florian Weimer <fweimer>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 40
CC: arjun, codonell, dj, fberat, fweimer, jlaw, josmyers, mcermak, mcoufal, mfabian, pfrankli, sipoyare, skolosov
Hardware: All
OS: Linux
Fixed In Version: glibc-2.40.9000-1.fc42 glibc-2.40-11.fc41 glibc-2.39-30.fc40
Last Closed: 2025-04-25 10:45:35 UTC

Description Ben Woodard 2024-05-09 14:41:47 UTC
When a library that is loaded and relocated as part of an application updates its TLS memory and is later loaded again dynamically with dlopen(), the second load can change the generation counter and leave the library's TLS area marked as unallocated, so that it is reallocated the next time it is used. As a result, any information stored in the TLS before the dlopen() of the library is lost.

This problem was reported by a customer using libomp and librocprofiler; in that case, libomp loses the mappings to its threads.

This problem seems to have existed for quite some time. I have verified that it exists as far back as glibc-2.28 in RHEL8 and that it still exists in the latest glibc-2.39 found in Rawhide. In other words, practically all versions of glibc appear to be affected.

The sequence of operations is as follows:

    Libraries A and B are loaded and relocated
    A's init constructor is called:
        Inside, A calls a function that resolves to B
        B accesses and alters its TLS
    B is then dlopen()'d by "C" (which may be A or B or neither)
        Inside, glibc advances the generation counter and marks B's TLS as "unallocated"
    B accesses its TLS again; the changes made before are lost

In the failing case (audit + rocprof), B is libhpcrun.so, A is libomp.so and "C" is librocprofiler.so. In the suspicious case (no audit + rocprof), B is libomp.so, and A and "C" are libomptarget.so.
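
For readers who want to see the effect in isolation, the sequence can be modeled in a few lines of C. This is only a toy model under stated assumptions: the `toy_*` names, the slot structure and the allocation scheme are hypothetical and do not mirror glibc's internal data structures. It only illustrates what happens when a slot that already holds data is marked "unallocated" and then accessed again.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: hypothetical types and names, not glibc internals. */
struct toy_tls_slot {
    int   *block;       /* B's TLS block for the current thread */
    int    allocated;   /* 0 means "unallocated": next access allocates anew */
    size_t generation;  /* generation at which the block was allocated */
};

static size_t toy_generation = 1;

/* Model of a TLS access by B: allocate a zeroed block on first use. */
static int *toy_tls_access(struct toy_tls_slot *slot)
{
    if (!slot->allocated) {
        slot->block = calloc(1, sizeof *slot->block);
        if (slot->block == NULL)
            abort();
        slot->allocated = 1;
        slot->generation = toy_generation;
    }
    return slot->block;
}

/* Model of the second load of B via dlopen(): the generation counter
   advances and the slot is marked unallocated although it holds data. */
static void toy_second_load(struct toy_tls_slot *slot)
{
    toy_generation++;
    slot->allocated = 0;
}

/* Runs the sequence from the description and returns what B reads back. */
int toy_demo(void)
{
    struct toy_tls_slot slot = {0};

    int *first = toy_tls_access(&slot);   /* B accesses and alters its TLS */
    *first = 42;

    toy_second_load(&slot);               /* B is dlopen()'d by "C" */

    int *second = toy_tls_access(&slot);  /* B accesses its TLS again */
    int seen = *second;                   /* fresh zeroed block, 42 is gone */

    free(second);
    free(first);                          /* the orphaned first block */
    return seen;
}
```

In this model `toy_demo()` returns 0 rather than 42, and the first block is orphaned unless the caller still holds a pointer to it, which corresponds to the loss of allocated memory mentioned in the description.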

Sometimes this bug is masked because B's TLS is a static block: even though its TLS gets "reallocated" in the middle, it gets the same memory back and the memory isn't reinitialized in between, so it looks like nothing happened. In other cases, the library whose TLS gets reallocated is written robustly enough that it simply reinitializes its TLS data, and the only apparent effect is a loss of allocated memory.
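
As an illustration of the "written robustly enough" case, a library can guard its TLS data with an initialized flag and rebuild the data whenever the flag is found cleared. The sketch below is an assumption about what such a library might look like, not code from libomp or any other library named here; `lib_state`, `lib_state_get` and `lib_demo` are hypothetical names, and the `memset` merely stands in for the TLS block being replaced by a fresh one.

```c
#include <string.h>

/* Hypothetical per-thread library state; not taken from any real library. */
struct lib_state {
    int initialized;   /* 0 in a fresh TLS block */
    int thread_id;     /* example of data the library stores per thread */
};

static __thread struct lib_state state;   /* GCC/Clang TLS qualifier */
static int init_count;                    /* demo counter, not thread-safe */

/* Every access goes through this getter, so a fresh block is
   reinitialized before use instead of being used as-is. */
static struct lib_state *lib_state_get(void)
{
    if (!state.initialized) {
        state.thread_id = -1;     /* "no mapping yet" */
        state.initialized = 1;
        init_count++;
    }
    return &state;
}

/* Stores a value, simulates the block being replaced, reads it back. */
int lib_demo(void)
{
    lib_state_get()->thread_id = 7;

    memset(&state, 0, sizeof state);      /* stand-in for reallocation */

    return lib_state_get()->thread_id;    /* -1: valid state, value 7 lost */
}
```

The library keeps working after the reset, but the value stored earlier is still gone, so this pattern hides the symptom rather than fixing the underlying problem.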

Reproducible: Always

Steps to Reproduce:
Follow the sequence of operations listed in the description above.

Comment 1 Ben Woodard 2024-05-09 14:42:28 UTC
Created attachment 2032307: reproducer

Comment 4 Carlos O'Donell 2024-05-31 13:21:46 UTC
This is currently blocked on upstream review:
https://patchwork.sourceware.org/project/glibc/patch/87a5kpolfw.fsf@oldenburg.str.redhat.com/

Once this is fixed upstream we'll have something we can integrate further downstream for testing.

Comment 5 Carlos O'Donell 2024-06-28 13:21:34 UTC
v5 in upstream review:
https://patchwork.sourceware.org/project/glibc/list/?series=35599

Comment 6 Carlos O'Donell 2024-07-19 13:31:32 UTC
Still blocked on upstream review.

Comment 7 Carlos O'Donell 2024-09-06 14:00:31 UTC
This is fixed upstream, and now fixed in Fedora Rawhide.

We still should fix this in:

upstream: release/2.39/master, release/2.40/master
fedora: f40, f41

Keeping this open to track backports.

Comment 8 Fedora Update System 2024-11-07 08:20:49 UTC
FEDORA-2024-bd3757cab1 (glibc-2.40-11.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-bd3757cab1

Comment 9 Fedora Update System 2024-11-08 02:11:45 UTC
FEDORA-2024-bd3757cab1 has been pushed to the Fedora 41 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-bd3757cab1`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-bd3757cab1

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 10 Fedora Update System 2024-11-14 03:01:56 UTC
FEDORA-2024-bd3757cab1 (glibc-2.40-11.fc41) has been pushed to the Fedora 41 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 11 Fedora Update System 2024-12-12 10:41:28 UTC
FEDORA-2024-d135dd8f39 (glibc-2.39-30.fc40) has been submitted as an update to Fedora 40.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-d135dd8f39

Comment 12 Fedora Update System 2024-12-13 02:38:59 UTC
FEDORA-2024-d135dd8f39 has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-d135dd8f39`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-d135dd8f39

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 13 Fedora Update System 2024-12-15 02:40:27 UTC
FEDORA-2024-d135dd8f39 (glibc-2.39-30.fc40) has been pushed to the Fedora 40 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 14 Aoife Moloney 2025-04-25 10:40:02 UTC
This message is a reminder that Fedora Linux 40 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 40 on 2025-05-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '40'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 40 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.