Bug 1871396
| Field | Value |
|---|---|
| Summary | glibc: Improve use of static TLS surplus for optimizations. |
| Product | Red Hat Enterprise Linux 8 |
| Component | glibc |
| Version | 8.4 |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | unspecified |
| Reporter | Carlos O'Donell <codonell> |
| Assignee | Florian Weimer <fweimer> |
| QA Contact | Sergey Kolosov <skolosov> |
| Docs Contact | Zuzana Zoubkova <zzoubkov> |
| CC | ashankar, bugproxy, codonell, dhorak, dikonoor, dj, fweimer, hannsj_uhl, jomiller, mnewsome, myselfsravank, pfrankli, sipoyare, skolosov, tulioqm |
| Keywords | Triaged |
| Target Milestone | rc |
| Target Release | 8.4 |
| Hardware | All |
| OS | Linux |
| Fixed In Version | glibc-2.28-145.el8 |
| Doc Type | Bug Fix |
| Type | Bug |
| Bug Blocks | 1796871 |
| Last Closed | 2021-05-18 14:36:39 UTC |

Doc Text:

.The `glibc` dynamic linker now restricts part of the static thread-local storage space to static TLS allocations

Previously, the `glibc` dynamic linker used all available static thread-local storage (TLS) space for dynamic TLS, on a first come, first served basis. Consequently, loading additional shared objects at run time using the `dlopen` function sometimes failed, because dynamic TLS allocations had already consumed all available static TLS space. This problem occurred particularly on the 64-bit ARM architecture and IBM Power Systems.

Now, the dynamic linker restricts part of the static TLS area to static TLS allocations and does not use this space for dynamic TLS optimizations. As a result, `dlopen` calls succeed in more cases with the default setting. Applications that require more allocated static TLS than the default setting allows can use a new `glibc.rtld.optional_static_tls` tunable.
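The tunable mentioned in the doc text is set through the standard `GLIBC_TUNABLES` environment variable, for example `GLIBC_TUNABLES=glibc.rtld.optional_static_tls=2048` (the value shown here is only an example; the appropriate size is workload-specific). As a minimal sketch of the failure mode described above (illustrative only, not taken from this bug, and using a placeholder library name):

```c
/* Minimal sketch (not from the bug): dlopen a shared object and surface the
   "cannot allocate memory in static TLS block" failure mode described above.
   The library path is a placeholder; build with: gcc tls-probe.c -ldl */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const char *lib = argc > 1 ? argv[1] : "libexample.so"; /* hypothetical name */

    /* If the object (or one of its dependencies) requires static TLS and the
       static TLS surplus is already exhausted, dlopen fails here. */
    void *handle = dlopen(lib, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", lib, dlerror());
        return EXIT_FAILURE;
    }
    puts("dlopen succeeded");
    dlclose(handle);
    return EXIT_SUCCESS;
}
```

Running such a probe with and without `glibc.rtld.optional_static_tls` set in `GLIBC_TUNABLES` is one way to check whether a given `dlopen` failure is related to the static TLS surplus rather than to the object itself.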
Description
Carlos O'Donell
2020-08-23 02:49:33 UTC
Tulio, would you be able to check a current 8.4 build (glibc-2.28-145.el8 or later) to see if it addresses your needs? I had forgotten about this bug, but the ld.so updates should bring in the relevant changes.

I've just been contacted about an issue that I suspect is related to this. I'm still collecting information in order to reproduce the issue. Assuming that I can reproduce it locally, I'll be glad to test it.

------- Comment From tulioqm.com 2021-02-02 10:17 EDT-------
Hi Florian, Red Hat. Vijay (IBM PowerVC) managed to create an environment that reproduces the customer issue. Together we executed this test and confirmed that glibc-2.28-145.el8 and related packages did fix the issue we were seeing. With that said, I'm marking this bug as verified. Thank you!

We see that in the same environment, this problem is sometimes reproducible and sometimes not. Can anyone list the steps for how this can be reproduced?

(In reply to Divya from comment #9)
> We see that in the same environment, this problem is sometimes reproducible
> and sometimes not. Can anyone list the steps for how this can be reproduced?

Is this a question for Red Hat or IBM?

We have seen a complex reproducer involving the Fedora installer (anaconda). The backported patch also comes with a test case, but making this test architecture-independent is a bit tricky. You may have to fudge the TLS sizes in the test a bit in order to reproduce the issue without the patch.

------- Comment From dikonoor.com 2021-02-03 06:55 EDT-------
PowerVC 2.0 GA happened last December and we have many environments (internal and external) that are hitting this issue. There is a possibility that any customer trying to install PowerVC on RHEL 8.3 will hit this issue. If a fix is already available for RHEL 8.4 and it works for RHEL 8.3, we really need this fix to be made available to customers using RHEL 8.3 (as part of the next RHEL 8.3 rolling update). Otherwise, this bug will severely impact installation and adoption of PowerVC 2.0. Apart from our on-prem customers, IBM Cloud will also move to this release soon and they will also be impacted. A customer who wants to install PowerVC today cannot wait for RHEL 8.4 to GA in May 2021. Please treat this as a high-priority bug and make the fix available in 8.3 at the earliest. Reopening the bug for the same reason.

(In reply to IBM Bug Proxy from comment #11)
> PowerVC 2.0 GA happened last December and we have many environments
> (internal and external) that are hitting this issue. [...]

Are the problems you see with PowerVC a regression compared to Red Hat Enterprise Linux 7 or 8.2?
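For readers without access to the backported test case mentioned above, the shape of such a reproducer can be sketched as follows. This is only an illustration under assumed conditions, not the actual glibc test: the `libfillerN.so` and `libneedstatic.so` names are hypothetical.

```c
/* Rough sketch of a reproducer of the kind described above (not the actual
   glibc test case, which is architecture-dependent).  Hypothetical libraries:
     - libfillerN.so: ordinary __thread data (global-dynamic TLS), which the
       old dynamic linker could opportunistically place into the static TLS
       surplus on a first-come, first-served basis;
     - libneedstatic.so: built with -ftls-model=initial-exec, so it must be
       placed in static TLS when dlopen'ed.
   Build with: gcc reproducer.c -ldl */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    char name[64];

    /* Step 1: let ordinary dynamic TLS eat into the static TLS surplus. */
    for (int i = 0; i < 32; ++i) {
        snprintf(name, sizeof(name), "./libfiller%d.so", i);
        (void) dlopen(name, RTLD_NOW);   /* these do not fail; they just consume surplus */
    }

    /* Step 2: an object that genuinely requires static TLS.  Before the fix,
       this is where "cannot allocate memory in static TLS block" appeared. */
    if (dlopen("./libneedstatic.so", RTLD_NOW) == NULL)
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
    else
        puts("static TLS allocation succeeded");
    return 0;
}
```

The sizes of the TLS segments in the filler objects are what has to be "fudged" per architecture, since the static TLS surplus differs between 64-bit ARM, POWER, and x86_64.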
I do not think this bug is something that we can successfully backport into a z-stream release, sorry. It is only scheduled for inclusion in 8.4.0 because we deemed it too intertwined with other changes that we were backporting. There is also some lead time for 8.3.z updates, effectively narrowing the gap between the theoretical availability of an 8.3.z update and 8.4.0 GA to a few weeks.

It is very likely that it is possible to address the issue in PowerVC itself, with few (if any) code changes. We would have to look at what precisely triggers the dlopen failure and find the best way to mitigate that. We should probably move the details to an off-bug discussion.

------- Comment From tulioqm.com 2021-02-25 08:54 EDT-------
> Are the problems you see with PowerVC a regression compared to Red Hat
> Enterprise Linux 7 or 8.2?

The bug is being closed, but I think it's worth documenting the following for the future: we've just found out that libmysqlclient has a new patch [1] forcing the use of static TLS as of mysql-libs-8.0.21-1.module+el8.2.0+7855+47abd494. Package mysql-libs-8.0.17-3.module+el8.0.0+3898+e09bb8de is the last build without this patch. There is a suspicion that more libraries have started to use static TLS recently, but we haven't identified them yet.

[1] https://github.com/mysql/mysql-server/commit/735bd2a53834266c7256830c8d34672ea55fe17b

After much experimentation, we found that we can consistently reproduce this problem in RHEL 8.3 environments (we haven't tried 8.2) where the OS installation was performed (Server or Server with GUI) with additional packages and our OpenStack-based product is installed on top of it. All other things kept the same, when the OS installation is performed with no additional packages, we never run into this.
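For context on how a library ends up requiring static TLS at dlopen time, as the libmysqlclient change above reportedly does: the usual trigger is the initial-exec TLS model, selected either per variable or for the whole object at build time. The snippet below is a generic, hypothetical illustration of that mechanism, not the actual libmysqlclient patch.

```c
/* Hypothetical illustration only -- not the actual libmysqlclient change.
   Either of these forces the initial-exec TLS model, which is faster but
   requires the variable to live in the static TLS block even when the
   object is loaded via dlopen().

   1. Per variable: */
__attribute__((tls_model("initial-exec")))
static __thread int cached_state;

/* 2. For the whole object, at build time:
        gcc -shared -fPIC -ftls-model=initial-exec -o libfoo.so foo.c
   Objects built this way carry the DF_STATIC_TLS flag, visible with:
        readelf -d libfoo.so | grep STATIC_TLS */

int get_cached_state(void)
{
    return cached_state;
}
```

Checking suspect libraries with `readelf -d` for the STATIC_TLS flag is a quick way to confirm which packages introduced new static TLS requirements.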
When additional packages are installed, we run into this problem when the OpenStack nova DB sync command is run:

[-] Traceback (most recent call last):
[-]   File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/migration.py", line 93, in db_version
[-]     return _db_version(repository, database, context)
[-]   File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/migration.py", line 100, in _db_version
[-]     return versioning_api.db_version(get_engine(database, context=context),
[-]   File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/migration.py", line 41, in get_engine
[-]     return db_session.get_engine(context=context)
[-]   File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/api.py", line 148, in get_engine
[-]     return ctxt_mgr.writer.get_engine()
[-]   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 832, in get_engine
[-]     return self._factory.get_writer_engine()
[-]   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 372, in get_writer_engine
[-]     self._start()
[-]   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 510, in _start
[-]     engine_args, maker_args)
[-]   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 534, in _setup_for_connection
[-]     sql_connection=sql_connection, **engine_kwargs)
[-]   File "/usr/lib/python3.6/site-packages/debtcollector/renames.py", line 43, in decorator
[-]     return wrapped(*args, **kwargs)
[-]   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/engines.py", line 177, in create_engine
[-]     engine = sqlalchemy.create_engine(url, **engine_args)
[-]   File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/__init__.py", line 479, in create_engine
[-]     return strategy.create(*args, **kwargs)
[-]   File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 87, in create
[-]     dbapi = dialect_cls.dbapi(**dbapi_args)
[-]   File "/usr/lib64/python3.6/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 118, in dbapi
[-]     return __import__("MySQLdb")
[-]   File "/usr/lib64/python3.6/site-packages/MySQLdb/__init__.py", line 18, in <module>
[-]     from . import _mysql
[-] ImportError: /lib64/libstdc++.so.6: cannot allocate memory in static TLS block

We tried some workarounds like `export LD_PRELOAD=/usr/lib64/mysql/libmysqlclient.so.21`, which did help with proceeding with the installation (i.e. the nova DB sync works without errors), but we then start seeing TLS errors when the OpenStack nova service tries to start. So this workaround was not very helpful.

Divya, I would appreciate it if you could attach LD_DEBUG=all output from the failing process to this bug. Thanks.

Is abrt installed on the system by chance? I think it loads the Python rpm module, which brings in a lot of additional dependencies.

# rpm -qa | grep abrt
abrt-addon-coredump-helper-2.10.9-20.el8.ppc64le
abrt-addon-pstoreoops-2.10.9-20.el8.ppc64le
python3-abrt-2.10.9-20.el8.ppc64le
abrt-addon-ccpp-2.10.9-20.el8.ppc64le
abrt-cli-2.10.9-20.el8.ppc64le
abrt-libs-2.10.9-20.el8.ppc64le
abrt-dbus-2.10.9-20.el8.ppc64le
abrt-addon-vmcore-2.10.9-20.el8.ppc64le
abrt-addon-kerneloops-2.10.9-20.el8.ppc64le
abrt-addon-xorg-2.10.9-20.el8.ppc64le
abrt-tui-2.10.9-20.el8.ppc64le
abrt-2.10.9-20.el8.ppc64le
python3-abrt-addon-2.10.9-20.el8.ppc64le

We will try and make the LD_DEBUG output available.

Created attachment 1763336 [details]
ld.so_output
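As an aside on identifying which of the loaded libraries have TLS segments and which of them actually require static TLS (the open question in the comments above, which LD_DEBUG output did not end up answering), a small diagnostic along the following lines can be run inside a suspect process. This is a sketch added for illustration, not an attachment from this bug.

```c
/* Sketch of a diagnostic (not from this bug's attachments): list every loaded
   object's TLS segment size and whether it carries the DF_STATIC_TLS flag,
   i.e. whether it must be satisfied from the static TLS block.
   Build with: gcc -o tls-report tls-report.c */
#define _GNU_SOURCE
#include <link.h>
#include <elf.h>
#include <stdio.h>

static int report(struct dl_phdr_info *info, size_t size, void *data)
{
    (void) size;
    (void) data;
    size_t tls_bytes = 0;
    int needs_static_tls = 0;

    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_TLS)
            tls_bytes = ph->p_memsz;
        else if (ph->p_type == PT_DYNAMIC) {
            /* Scan the dynamic section for DT_FLAGS / DF_STATIC_TLS. */
            for (const ElfW(Dyn) *d = (const ElfW(Dyn) *)(info->dlpi_addr + ph->p_vaddr);
                 d->d_tag != DT_NULL; ++d)
                if (d->d_tag == DT_FLAGS && (d->d_un.d_val & DF_STATIC_TLS))
                    needs_static_tls = 1;
        }
    }
    if (tls_bytes != 0 || needs_static_tls)
        printf("%-60s TLS %6zu bytes%s\n",
               info->dlpi_name[0] ? info->dlpi_name : "(main program)",
               tls_bytes, needs_static_tls ? "  [requires static TLS]" : "");
    return 0;  /* keep iterating over loaded objects */
}

int main(void)
{
    dl_iterate_phdr(report, NULL);
    return 0;
}
```

Only objects already loaded at the time of the call are listed; to inspect a library that fails to load, it would have to be dlopen'ed (or preloaded) before running the report.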
Hi Florian, I have zipped the ld.so output and uploaded it as an attachment. Please check and let me know if you need any other information. Thanks.

Thanks, but the ZIP file appears to be empty:

Archive:  /tmp/bugzilla-1871396.zip
  Length      Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       0  Stored        0   0% 03-11-2021 09:13 00000000  bugzilla-1871396/
--------          -------  ---                            -------
       0                0   0%                            1 file

Created attachment 1763343 [details]
LD_DEBUG output
There was some issue with the earlier one; please refer to the new one. Thanks.

*sigh* I forgot how useless the LD_DEBUG output is for this purpose. So I don't think this sheds much light on what is going on, sorry.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: glibc security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1585