Created attachment 1844409
TLS test case

Description of problem:
When two separate threads load TLS library functions sequentially, one thread will be very slow due to a generation counter mismatch (and thus glibc thinking it needs to reallocate memory for it).

Version-Release number of selected component (if applicable):
2.28-164.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. yum install gcc gcc-c++ make glibc-devel openssl-devel
2. Unzip shell archive with test case.
3. Run "make".
4. Execute program "tls-test".

Actual results:
One thread is slower than the other to access TLS variables:

none loaded
  main normal variable       :  554.770ms
  main thread-local variable :  578.829ms
lib1 loaded
  main normal variable       :  536.941ms
  main thread-local variable :  504.300ms
  lib1 variable              : 2079.362ms
lib2 loaded
  main normal variable       :  451.575ms
  main thread-local variable :  434.603ms
  lib1 variable              : 5567.543ms
lib2 accessed
  main normal variable       :  424.644ms
  main thread-local variable :  429.140ms
  lib1 variable              : 1911.933ms

Expected results:
lib1 variable access time is consistent.

Additional info:
- Issue was first noted in 2016 (https://patchwork.ozlabs.org/project/glibc/patch/1465309688.1188.19.camel@mailbox.tu-dresden.de/), and a patch was proposed.
Under bug 1991001, we are considering backporting changes to the DTV TLS management, so that it aligns with upstream. This is a prerequisite for backporting an eventual upstream fix for this bug. No such upstream fix exists at this time.

We backported the upstream glibc.rtld.optional_static_tls tunable as part of bug 1817513. With the tunable, it is possible to get initial-exec TLS working in dlopen'ed shared objects in more cases. Initial-exec TLS does not suffer from the performance degradation, so it might be an alternative approach. For instance, glibc malloc uses initial-exec TLS to access its thread-local data, so it is not affected by this.
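
To illustrate the initial-exec approach, here is a minimal sketch of how a dlopen'ed library could request that TLS model for a single variable. It is not taken from the attached test case: the file name lib1.c and the identifiers lib1_counter, lib1_bump and lib1_reset are hypothetical. It assumes GCC or Clang, which support the tls_model variable attribute.

```c
/* lib1.c: hypothetical stand-in for a dlopen'ed library.
   All identifiers here are illustrative, not from the attached test case. */

/* Request the initial-exec TLS model for this one variable (GCC/Clang
   attribute). In a shared object, accesses then use a fixed offset from
   the thread pointer instead of going through __tls_get_addr and the DTV. */
static __thread unsigned long lib1_counter
    __attribute__((tls_model("initial-exec")));

/* Reset the calling thread's counter to zero. */
void lib1_reset(void)
{
    lib1_counter = 0;
}

/* Increment the calling thread's counter n times and return its value. */
unsigned long lib1_bump(unsigned long n)
{
    for (unsigned long i = 0; i < n; i++)
        lib1_counter++;
    return lib1_counter;
}
```

A library like this could be built with "gcc -O2 -fPIC -shared -o lib1.so lib1.c". Because initial-exec TLS in a dlopen'ed object consumes surplus static TLS space, loading it may require raising the tunable, for example by running the program with GLIBC_TUNABLES=glibc.rtld.optional_static_tls=<bytes>, where the byte count depends on how much TLS the loaded libraries need. The same model can be applied to a whole translation unit with -ftls-model=initial-exec instead of the per-variable attribute.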