Yum no longer crashes in a certain _nss_ and _nspr_ update scenario
Previously, when the *yum* installer updated a certain combination of _nss_ and _nspr_ package versions, the transaction sometimes terminated prematurely due to the following symbol lookup error:
/lib64/libnsssysinit.so: undefined symbol: PR_GetEnvSecure
This then caused stale rpm locks. *Yum* has been updated to correctly deal with this particular _nss_ and _nspr_ update scenario. As a result, *yum* no longer terminates prematurely in the described scenario.
(This might be a yum/rhsm/nss issue; I'm filing it under yum as it seems the most likely candidate.)
Description of problem:
During a ZStream/EUS yum update from GA (RHEL-7.3, RHEL-7.2, maybe others), the following happens:
Cleanup : glibc-common-2.17-157.el7.x86_64 249/251
Cleanup : glibc-2.17-157.el7.x86_64 250/251
Cleanup : tzdata-2016g-2.el7.noarch 251/251
/usr/bin/python: symbol lookup error: /lib64/libnsssysinit.so: undefined symbol: PR_GetEnvSecure
Afterwards, some yum database seems to be in an inconsistent state, but yum appears to recover on its next invocation:
BDB2053 Freeing read locks for locker 0x129: 1364/140363639228224
(repeats ~30 times).
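To make the two symptoms above easier to spot in a captured yum log, here is a minimal Python sketch that scans text for the symbol lookup error and counts the stale-lock messages. The function name `scan_yum_output` and the returned dictionary layout are illustrative assumptions, not part of yum; the patterns are taken only from the messages quoted in this report.

```python
import re

# Patterns derived from the two messages quoted in this report.
SYMBOL_ERROR = re.compile(
    r"symbol lookup error: (?P<library>\S+): undefined symbol: (?P<symbol>\S+)"
)
STALE_LOCK = re.compile(r"BDB2053 Freeing read locks for locker")


def scan_yum_output(text):
    """Return the (library, symbol) pairs and the stale-lock message count.

    Hypothetical helper for illustration only; it inspects text that was
    already captured and does not call yum itself.
    """
    missing = [
        (m.group("library"), m.group("symbol"))
        for m in SYMBOL_ERROR.finditer(text)
    ]
    stale_locks = len(STALE_LOCK.findall(text))
    return {"missing_symbols": missing, "stale_locks": stale_locks}
```

Feeding it the saved output of the failing run and of the next yum invocation should report the `PR_GetEnvSecure` lookup failure and the number of lock messages, respectively.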
Steps to Reproduce:
1. install a minimal RHEL-7.3 GA system from ISO (or other source, but use kickstart which doesn't use 'yum' in %post), make sure to include subscription-manager in the base installation (have it in %packages)
2. register the system using subscription-manager, enable rhel-7-server-rpms, rhel-7-server-optional-rpms - optionally, enable the EUS repos as well
3. do 'yum update' for a full system update, or just 'yum update nss'
4. when asked [y/n], answer 'n'
5. repeat step 3 again, answer 'y'
6. answer 'y' to gpgkey imports
7. observe the error from above
Alternative Steps to Reproduce:
1. have a system with 7.3 GA nss/nspr RPMs
2. register it via subscription-manager, like step 2 before
3. 'rm -rf /var/cache/yum'
4. 'yum makecache fast'
5. 'yum update nss'
6. see the error
7. (to reproduce again, downgrade to the RPM versions from step 1 and go to step 3)
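Steps 3 to 5 of the alternative reproducer can be sketched as a small Python driver. This is only a sketch under stated assumptions: the names `REPRO_COMMANDS` and `run_repro` and the `dry_run` switch are hypothetical, and it assumes a disposable, already registered system with the 7.3 GA nss/nspr RPMs, run as root. The commands themselves are copied from the steps above; `yum update nss` still prompts for confirmation on the terminal.

```python
import subprocess

# Commands copied verbatim from alternative steps 3-5 in this report.
REPRO_COMMANDS = [
    ["rm", "-rf", "/var/cache/yum"],
    ["yum", "makecache", "fast"],
    ["yum", "update", "nss"],
]


def run_repro(commands=REPRO_COMMANDS, dry_run=True):
    """Print (dry run) or execute the reproducer commands in order.

    Returns a list of (command, return code) pairs; the return code is
    None for commands that were only printed.
    """
    results = []
    for cmd in commands:
        if dry_run:
            # Only show what would be executed; nothing is changed.
            print(" ".join(cmd))
            results.append((cmd, None))
            continue
        # stdin/stdout are inherited, so yum can ask its [y/n] question.
        completed = subprocess.run(cmd)
        results.append((cmd, completed.returncode))
    return results
```

Calling `run_repro()` only prints the commands; `run_repro(dry_run=False)` actually removes the yum cache and starts the update, so it should be used only on a test machine.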
OLD (7.3 GA) RPM versions:
NEW (7.3 ZStream) RPM versions:
I would like to emphasize that to reproduce this, you need to follow the exact steps provided as this issue CANNOT be reproduced:
- if you use just 'yum clean all' instead of 'rm -rf' on the yum cache
- if yum downloads any repo metadata during the upgrade process, hence the 'yum makecache fast' or 'n' answer beforehand
- if you upgrade from locally available RPMs instead of RHSM
- if you do any yum operation prior to the update (i.e. installing an unrelated RPM package)
Based on the above, I have a possible theory of what could be causing the issue.
Yum seems to download
rhel-7-server-eus-optional-rpms/x86_64/productid | 2.1 kB 00:00:00
rhel-7-server-eus-rpms/x86_64/productid | 2.1 kB 00:00:00
rhel-7-server-optional-rpms/x86_64/productid | 2.1 kB 00:00:00
rhel-7-server-rpms/x86_64/productid | 2.1 kB 00:00:00
right after Cleanup phases of installed RPMs, but *only* on the second yum invocation; that is one yum invocation *after* repo metadata are downloaded. If nss/nspr is updated during the same yum invocation, the error appears. Hence the magic around 'yum makecache', because if nss/nspr was updated during the same invocation as repo update, the productid bits wouldn't be downloaded (that would happen on next yum run) and nss/nspr would update successfully.
It is possible that the real cause is different, but I'm fairly certain this additional download plays some role in it.
Version-Release number of selected component (if applicable):
Do you have an update on this one? It's starting to appear in a lot of RHOS upgrades; OSP8->9 and OSP9->10 are currently known to be affected.
I do not have an update, I can still see the issue.
We have some more logs for it at https://bugzilla.redhat.com/show_bug.cgi?id=1451275#c10. Do you need any more info on this one?
Created attachment 1294350
NSS workaround, changing nss-sysinit to avoid dependency on PR_GetEnvSecure
For reference, I'm adding the current upstream review https://review.openstack.org/#/c/483074/, which works around the problem during RHOS update and covers most cases during RHOS upgrade. (A RHOS update should bring in the RHEL upgrade, so we are covered, but we are adding a safety net for people "forgetting" to do a RHOS update before a RHOS upgrade.)
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.