Suppose a library with TLS is loaded and relocated as part of an application, updates its TLS, and is then loaded again with dlopen(). The second load can advance the generation counter and leave the library's TLS marked as unallocated. The TLS is then reallocated the next time it is used, so any information stored in it before the dlopen() is lost.

This problem was reported by a customer using libomp and librocprofiler. In that case, libomp loses the mappings to its threads.

The problem seems to have existed for quite some time. I have verified that it exists as far back as glibc-2.28 in RHEL 8, and it still exists in the latest glibc-2.39 in Rawhide. In other words, practically all versions of glibc appear to be affected.

The sequence of operations is as follows:

1. Libraries A and B are loaded and relocated.
2. A's init constructor is called. Inside it, A calls a function that resolves to B.
3. B accesses and alters its TLS.
4. B is then dlopen()'d by "C" (which may be A, B, or neither).
5. Inside dlopen(), glibc advances the generation counter and marks B's TLS as "unallocated".
6. B accesses its TLS again, and the changes made before are lost.

In the failing case (audit + rocprof), B is libhpcrun.so, A is libomp.so, and "C" is librocprofiler.so. In the suspicious case (no audit + rocprof), B is libomp.so, and A and "C" are libomptarget.so.

Sometimes this bug is masked because B's TLS is a static block. Even though its TLS gets "reallocated" in the middle, it gets the same memory back and is not reinitialized in between, so it looks like nothing happened. In other cases, the library whose TLS gets reallocated is written robustly enough that it simply reinitializes its TLS data, and the only apparent effect is a loss of allocated memory.
Reproducible: Always

Steps to Reproduce: the sequence of operations listed in the description above.
Created attachment 2032307: reproducer
This is currently blocked on upstream review:
https://patchwork.sourceware.org/project/glibc/patch/87a5kpolfw.fsf@oldenburg.str.redhat.com/

Once this is fixed upstream, we'll have something we can integrate further downstream for testing.
v5 in upstream review: https://patchwork.sourceware.org/project/glibc/list/?series=35599
Still blocked on upstream review.
This is fixed upstream, and now fixed in Fedora Rawhide. We still need to fix this in:

- upstream: release/2.39/master, release/2.40/master
- fedora: f40, f41

Keeping this open to track backports.
FEDORA-2024-bd3757cab1 (glibc-2.40-11.fc41) has been submitted as an update to Fedora 41. https://bodhi.fedoraproject.org/updates/FEDORA-2024-bd3757cab1