Backport the following commits to improve the use of the static TLS surplus:

0c7b002fac12dcb2f53ba83ee56bb3b5d2439447 rtld: Add rtld.nns tunable for the number of supported namespaces
17796419b5fd694348cceb65c3f77601faae082c rtld: Account static TLS surplus for audit modules
ffb17e7ba3a5ba9632cee97330b325072fbe41dd rtld: Avoid using up static TLS surplus for optimizations [BZ #25051]

The important piece is the static TLS surplus tunable, so that users can tune how much surplus their workloads need.
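For reference, a hedged sketch of how a workload could raise the surplus once this lands, assuming the backport exposes the upstream glibc.rtld.optional_static_tls and glibc.rtld.nns tunables (the byte value and ./myapp are placeholders, not recommendations):

    # Reserve extra static TLS surplus and allow more dlmopen namespaces
    # before starting the affected application (values are illustrative).
    GLIBC_TUNABLES=glibc.rtld.optional_static_tls=4096:glibc.rtld.nns=8 ./myapp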
Tulio, would you be able to check a current 8.4 build (glibc-2.28-145.el8 or later) to see if it addresses your needs? I had forgotten about this bug, but the ld.so updates should bring in the relevant changes.
I've just been contacted about an issue that I suspect is related to this. I'm still collecting information in order to reproduce the issue. Assuming that I can reproduce it locally, I'll be glad to test it.
------- Comment From tulioqm.com 2021-02-02 10:17 EDT-------
Hi Florian, Red Hat,

Vijay (IBM PowerVC) managed to create an environment that reproduces the customer issue. Together we ran this test and confirmed that glibc-2.28-145.el8 and the related packages do fix the issue we were seeing. With that said, I'm marking this bug as verified. Thank you!
We see that in the same environment, this problem is sometimes reproducible and sometimes not. Can anyone list the steps to reproduce it?
(In reply to Divya from comment #9)
> We see that in the same environment, this problem is sometimes reproducible
> and sometimes not. Can anyone list the steps to reproduce it?

Is this a question for Red Hat or IBM? We have seen a complex reproducer involving the Fedora installer (anaconda). The backported patch also comes with a test case, but making that test architecture-independent is a bit tricky. You may have to fudge the TLS sizes in the test a bit in order to reproduce the issue without the patch; a rough sketch of the failure mode follows below.
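Roughly, the failure mode looks like this (a hypothetical sketch, not the backported glibc test case; the 64 KiB size and file names are arbitrary): a dlopen'ed object whose TLS uses the initial-exec model must fit into the static TLS block, and dlopen fails once that block is exhausted.

    # Hypothetical sketch: build a shared object with a large initial-exec
    # TLS block and dlopen it; the TLS cannot fit into the static TLS surplus.
    cat > bigtls.c <<'EOF'
    /* 64 KiB of initial-exec TLS, far more than the default surplus. */
    static __thread char buf[64 * 1024]
        __attribute__((tls_model("initial-exec")));
    char *get_buf(void) { return buf; }
    EOF
    gcc -shared -fPIC -o bigtls.so bigtls.c
    # Expected to fail with "cannot allocate memory in static TLS block":
    python3 -c 'import ctypes; ctypes.CDLL("./bigtls.so")'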
------- Comment From dikonoor.com 2021-02-03 06:55 EDT-------
PowerVC 2.0 GA happened last Dec, and we have many environments (internal and external) that are hitting this issue. There is a possibility that any customer trying to install PowerVC on RHEL 8.3 will hit this issue. If a fix is already available for RHEL 8.4 and it works on RHEL 8.3, we really need this fix to be made available to customers using RHEL 8.3 (as part of the next RHEL 8.3 rolling update). Otherwise, this bug will severely impact installation and adoption of PowerVC 2.0. Apart from our on-prem customers, IBM Cloud will also move to this release soon and will also be impacted. A customer who wants to install PowerVC today cannot wait for RHEL 8.4 to GA in May 2021. Please treat this as a high-priority bug and make the fix available in 8.3 at the earliest. Reopening the bug for the same reason.
(In reply to IBM Bug Proxy from comment #11)
> ------- Comment From dikonoor.com 2021-02-03 06:55 EDT-------
> PowerVC 2.0 GA happened last Dec, and we have many environments (internal
> and external) that are hitting this issue. There is a possibility that any
> customer trying to install PowerVC on RHEL 8.3 will hit this issue. If a fix
> is already available for RHEL 8.4 and it works on RHEL 8.3, we really need
> this fix to be made available to customers using RHEL 8.3 (as part of the
> next RHEL 8.3 rolling update). Otherwise, this bug will severely impact
> installation and adoption of PowerVC 2.0. Apart from our on-prem customers,
> IBM Cloud will also move to this release soon and will also be impacted. A
> customer who wants to install PowerVC today cannot wait for RHEL 8.4 to GA
> in May 2021. Please treat this as a high-priority bug and make the fix
> available in 8.3 at the earliest. Reopening the bug for the same reason.

Are the problems you see with PowerVC a regression compared to Red Hat Enterprise Linux 7 or 8.2? I do not think this bug is something that we can successfully backport into a z-stream release, sorry. It is only scheduled for inclusion in 8.4.0 because we deemed it too intertwined with other changes that we were backporting. There is also some lead time for 8.3.z updates, which effectively narrows the gap between the theoretical availability of an 8.3.z update and 8.4.0 GA to a few weeks. It is very likely that the issue can be addressed in PowerVC itself, with few (if any) code changes. We would have to look at what precisely triggers the dlopen failure and find the best way to mitigate it. We should probably move the details to an off-bug discussion.
------- Comment From tulioqm.com 2021-02-25 08:54 EDT-------
> Are the problems you see with PowerVC a regression compared to Red Hat
> Enterprise Linux 7 or 8.2?

The bug is being closed, but I think it's worth documenting the following for the future: we've just found out that libmysqlclient has a new patch [1] forcing the usage of static TLS in mysql-libs-8.0.21-1.module+el8.2.0+7855+47abd494. Package mysql-libs-8.0.17-3.module+el8.0.0+3898+e09bb8de is the last build without this patch. There is a suspicion that more libraries have started to use static TLS recently, but we haven't identified them yet.

[1] https://github.com/mysql/mysql-server/commit/735bd2a53834266c7256830c8d34672ea55fe17b
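For illustration, this is roughly what such a change amounts to (a hypothetical sketch, not the actual mysql-libs build change; example.c is a placeholder that would define and use __thread variables): compiling a shared library's TLS accesses with the initial-exec model marks the object with the STATIC_TLS dynamic flag, so every dlopen of it draws from the fixed static TLS block.

    # Hypothetical illustration of forcing static TLS at build time.
    gcc -shared -fPIC -ftls-model=initial-exec -o libexample.so example.c
    # The FLAGS entry gains STATIC_TLS when initial-exec TLS is used:
    readelf -d libexample.so | grep FLAGS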
After many experiments, we found that we can consistently reproduce this problem in RHEL 8.3 environments (we haven't tried 8.2) where the OS installation was performed (Server or Server with GUI) with additional packages, and our OpenStack-based product is installed on top of it. With all other things kept the same, when the OS installation is performed with no additional packages we never run into this. When additional packages are installed, we hit the problem when the OpenStack nova DB sync command is run:

[-] Traceback (most recent call last):
[-]   File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/migration.py", line 93, in db_version
[-]     return _db_version(repository, database, context)
[-]   File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/migration.py", line 100, in _db_version
[-]     return versioning_api.db_version(get_engine(database, context=context),
[-]   File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/migration.py", line 41, in get_engine
[-]     return db_session.get_engine(context=context)
[-]   File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/api.py", line 148, in get_engine
[-]     return ctxt_mgr.writer.get_engine()
[-]   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 832, in get_engine
[-]     return self._factory.get_writer_engine()
[-]   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 372, in get_writer_engine
[-]     self._start()
[-]   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 510, in _start
[-]     engine_args, maker_args)
[-]   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 534, in _setup_for_connection
[-]     sql_connection=sql_connection, **engine_kwargs)
[-]   File "/usr/lib/python3.6/site-packages/debtcollector/renames.py", line 43, in decorator
[-]     return wrapped(*args, **kwargs)
[-]   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/engines.py", line 177, in create_engine
[-]     engine = sqlalchemy.create_engine(url, **engine_args)
[-]   File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/__init__.py", line 479, in create_engine
[-]     return strategy.create(*args, **kwargs)
[-]   File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 87, in create
[-]     dbapi = dialect_cls.dbapi(**dbapi_args)
[-]   File "/usr/lib64/python3.6/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 118, in dbapi
[-]     return __import__("MySQLdb")
[-]   File "/usr/lib64/python3.6/site-packages/MySQLdb/__init__.py", line 18, in <module>
[-]     from . import _mysql
[-] ImportError: /lib64/libstdc++.so.6: cannot allocate memory in static TLS block

We tried workarounds like export LD_PRELOAD=/usr/lib64/mysql/libmysqlclient.so.21, which did help with proceeding with the installation (i.e. nova DB sync works without errors), but we then start seeing TLS errors when the OpenStack nova service tries to start. So this workaround was not very helpful.
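One way to narrow this down (a hedged sketch; the paths are simply the libraries appearing in the traceback and workaround above) is to check which of the involved libraries carry the STATIC_TLS flag and how large their TLS segments are, since those are the ones competing for the static TLS surplus:

    # Report libraries that request static TLS and their TLS segment sizes.
    for lib in /lib64/libstdc++.so.6 /usr/lib64/mysql/libmysqlclient.so.21; do
        if readelf -d "$lib" 2>/dev/null | grep -q STATIC_TLS; then
            size=$(readelf -lW "$lib" | awk '/^ *TLS/ {print $6}')
            echo "$lib requests static TLS (TLS segment MemSiz $size)"
        fi
    done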
Divya, I would appreciate it if you could attach LD_DEBUG=all output from the failing process to this bug. Thanks. Is abrt installed on the system, by chance? I think it loads the Python rpm module, which brings in a lot of additional dependencies.
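For example, something along these lines should capture it (a sketch; the module name is taken from the traceback above, and the output path is arbitrary):

    # Write per-process ld.so debug logs to /tmp/ld-debug.<pid> while
    # reproducing the failing import.
    LD_DEBUG=all LD_DEBUG_OUTPUT=/tmp/ld-debug \
        python3 -c 'import MySQLdb'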
# rpm -qa | grep abrt
abrt-addon-coredump-helper-2.10.9-20.el8.ppc64le
abrt-addon-pstoreoops-2.10.9-20.el8.ppc64le
python3-abrt-2.10.9-20.el8.ppc64le
abrt-addon-ccpp-2.10.9-20.el8.ppc64le
abrt-cli-2.10.9-20.el8.ppc64le
abrt-libs-2.10.9-20.el8.ppc64le
abrt-dbus-2.10.9-20.el8.ppc64le
abrt-addon-vmcore-2.10.9-20.el8.ppc64le
abrt-addon-kerneloops-2.10.9-20.el8.ppc64le
abrt-addon-xorg-2.10.9-20.el8.ppc64le
abrt-tui-2.10.9-20.el8.ppc64le
abrt-2.10.9-20.el8.ppc64le
python3-abrt-addon-2.10.9-20.el8.ppc64le

We will try to make the LD_DEBUG output available.
Created attachment 1763336 [details] ld.so_output
Hi Florian, I have zipped the ld.so output and uploaded it as an attachment. Please check and let me know if you need any other information. Thanks.
Thanks, but the ZIP file appears to be empty:

Archive:  /tmp/bugzilla-1871396.zip
  Length  Method    Size  Cmpr       Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       0  Stored        0   0% 03-11-2021 09:13 00000000  bugzilla-1871396/
--------          -------  ---                            -------
       0                0   0%                            1 file
Created attachment 1763343 [details] LD_DEBUG output
There was some issue with the earlier one; please refer to the new one. Thanks.
*sigh* I forgot how useless the LD_DEBUG output is for this purpose. So I don't think this sheds much light on what is going on, sorry.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: glibc security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:1585