This bug has been migrated to another issue-tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at the Red Hat Issue Tracker.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you are a Red Hat customer, please continue to file support cases via the Red Hat customer portal. Otherwise, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user-management inquiry. The e-mail creates a ServiceNow ticket with Red Hat.

Migrated Bugzilla bugs will be moved to status "CLOSED", resolution "MIGRATED", and have "MigratedToJIRA" set in "Keywords". The link to the successor Jira issue appears under "Links" with a small "two-footprint" icon next to it and leads to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link appears in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 2028296 - glibc: TLS performance degradation when loading two or more threads
Summary: glibc: TLS performance degradation when loading two or more threads
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: glibc
Version: 8.5
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: glibc team
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-01 22:58 UTC by Andrew Mike
Modified: 2023-09-04 12:22 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-04 12:22:15 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
TLS test case (5.14 KB, application/x-shellscript), attached 2021-12-01 22:58 UTC by Andrew Mike


Links
- Red Hat Issue Tracker RHEL-2122, status Migrated, last updated 2023-09-04 12:17:46 UTC
- Red Hat Issue Tracker RHELPLAN-104505, last updated 2021-12-01 23:01:27 UTC
- Sourceware bug 19924, priority P2, status NEW: "TLS performance degradation after dlopen", last updated 2023-09-04 12:12:59 UTC

Description Andrew Mike 2021-12-01 22:58:24 UTC
Created attachment 1844409: TLS test case

Description of problem: When two separate threads sequentially access TLS variables from dlopen'ed libraries, one thread's accesses become very slow due to a generation counter mismatch: glibc keeps concluding that it may need to reallocate TLS storage for that thread, so every access takes the slow path.

Version-Release number of selected component (if applicable):
2.28-164.el8.x86_64

How reproducible: 100%

Steps to Reproduce:
1. yum install gcc gcc-c++ make glibc-devel openssl-devel
2. Unzip shell archive with test case.
3. Run "make".
4. Execute program "tls-test".

Actual results:
One thread is slower than the other to access TLS variables:

none loaded
  main normal variable          : 554.770ms
  main thread-local variable    : 578.829ms
lib1 loaded
  main normal variable          : 536.941ms
  main thread-local variable    : 504.300ms
  lib1 variable                 : 2079.362ms
lib2 loaded
  main normal variable          : 451.575ms
  main thread-local variable    : 434.603ms
  lib1 variable                 : 5567.543ms
lib2 accessed
  main normal variable          : 424.644ms
  main thread-local variable    : 429.140ms
  lib1 variable                 : 1911.933ms

Expected results: lib1 variable access time is consistent.

Additional info:

- Issue was first noted in 2016 (https://patchwork.ozlabs.org/project/glibc/patch/1465309688.1188.19.camel@mailbox.tu-dresden.de/), and a patch was proposed.

Comment 2 Florian Weimer 2021-12-02 07:32:36 UTC
Under bug 1991001, we are considering backporting changes to the DTV TLS management so that it aligns with upstream. This is a prerequisite for backporting an eventual upstream fix for this bug, which does not exist at this time.

We backported the glibc.rtld.optional_static_tls upstream tunable as part of bug 1817513. With the tunable, it is possible to get initial-exec TLS in dlopen'ed shared objects working in more cases. Initial-exec TLS does not suffer from the performance degradation, so it may be an alternative approach. For instance, glibc malloc uses initial-exec TLS to access its thread-local data and is therefore not affected by this issue.

Comment 4 Florian Weimer 2023-09-04 12:14:30 UTC
Upstream commit:

commit d2123d68275acc0f061e73d5f86ca504e0d5a344
Author: Szabolcs Nagy <szabolcs.nagy>
Date:   Tue Feb 16 12:55:13 2021 +0000

    elf: Fix slow tls access after dlopen [BZ #19924]
    
    In short: __tls_get_addr checks the global generation counter and if
    the current dtv is older, then _dl_update_slotinfo updates the dtv up to the
    generation of the accessed module. So if the global generation is newer
    than generation of the module then __tls_get_addr keeps hitting the
    slow dtv update path. The dtv update path includes a number of checks
    to see if any update is needed and this already causes measurable tls
    access slow down after dlopen.
    
    It may be possible to detect up-to-date dtv faster.  But if there are
    many modules loaded (> TLS_SLOTINFO_SURPLUS) then this requires at
    least walking the slotinfo list.
    
    This patch tries to update the dtv to the global generation instead, so
    after a dlopen the tls access slow path is only hit once.  The modules
    with larger generation than the accessed one were not necessarily
    synchronized before, so additional synchronization is needed.
    
    This patch uses acquire/release synchronization when accessing the
    generation counter.
    
    Note: in the x86_64 version of dl-tls.c the generation is only loaded
    once, since relaxed mo is not faster than acquire mo load.
    
    I have not benchmarked this. Tested by Adhemerval Zanella on aarch64,
    powerpc, sparc, x86 who reported that it fixes the performance issue
    of bug 19924.
    
    Reviewed-by: Adhemerval Zanella  <adhemerval.zanella>

Comment 5 RHEL Program Management 2023-09-04 12:17:11 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 6 RHEL Program Management 2023-09-04 12:22:15 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues@redhat.com.

