Cause:
When upgrading dbm databases that hold many certificates with private keys, the resulting sqlite database becomes extremely slow to access. This is because the sqlite db will contain extra Trust objects for these certs that are unnecessary.
Consequence:
Accessing the resulting sqlite database becomes extremely slow
Fix:
1) This patch speeds up access to trust objects that don't affect the actual trust values.
2) It fixes dbm so that it no longer creates the extra trust objects for certs that have private keys.
Result:
Access to these sqlite databases is now faster. Customers can get even faster results by re-upgrading the databases from the original dbm after the patch has been applied.
Description of problem:
Background: The nss deployed by rhel8 defaults to the sqlite backend for NSS databases, as opposed to the dbm backend for NSS databases on rhel7. This change affects certmonger, which will consider cert storage of type nssdb to be in sqlite storage format. If it encounters an NSS database for cert/key storage that is in (legacy) dbm format, it performs an automatic migration and uses sqlite storage from then on.
The challenge: We use certmonger with a remote SCEP CA. To migrate our productive certificate management from rhel7 to rhel8.5, we copied
* the directory /var/lib/certmonger
* the NSS database containing approx. 100 keys/certs
from old to new machine. All files are located on virtualized storage (same as in the rhel7 installation), on xfs (also same).
The problem: When starting certmonger.service, service startup time exceeded the system default timeout (120 seconds), and we had to increase it to >1000 seconds to be able to start the service at all. A startup time in the minutes is not acceptable for our certificate management.
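For readers who need the same stopgap, raising the start timeout can be sketched as a systemd drop-in. This is only an illustration: the helper name write_timeout_dropin, the file name timeout.conf and the value 1200 are assumptions, not taken from this report (which only states that more than 1000 seconds were needed).

```shell
#!/bin/sh
# Write a systemd drop-in that raises the start timeout of a service.
# $1: drop-in directory (e.g. /etc/systemd/system/certmonger.service.d)
# $2: timeout in seconds
# The file name "timeout.conf" is an arbitrary choice for this sketch.
write_timeout_dropin() {
    dropin_dir=$1
    seconds=$2
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nTimeoutStartSec=%s\n' "$seconds" > "$dropin_dir/timeout.conf"
}

# Example (as root), followed by "systemctl daemon-reload":
# write_timeout_dropin /etc/systemd/system/certmonger.service.d 1200
```

This only hides the symptom; the slow lookups themselves remain.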
The workaround: Analyzing the cause of the performance regression (with some help from certmonger devs! thank you!), we found that forcing certmonger to switch to (legacy) dbm storage increased performance manyfold, to levels comparable with the rhel7 installation. We accomplished this by
1. sed -E -i 's/(cert|key)_storage_location=/&dbm:/' /var/lib/certmonger/requests/*
2. Prepending the nss directory location with "dbm:" when calling "getcert":
getcert ... -d dbm:<nss directory>
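The request-file edit in step 1 can also be written so that it is safe to run more than once. The following is a minimal sketch, assuming the request files are plain "name=value" lines with the storage location fields at the start of a line; the helper name prefix_dbm is hypothetical. It would presumably be run with certmonger stopped and a backup of the requests directory at hand.

```shell
#!/bin/sh
# Add a "dbm:" prefix to cert/key storage locations in certmonger request files.
# Lines that already carry a "dbm:" or "sql:" prefix are left untouched,
# so running the helper twice does not produce "dbm:dbm:".
# $1: directory holding the request files
prefix_dbm() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        tmp=$(mktemp) || return 1
        # The address selects already-prefixed lines; "!" applies the
        # substitution only to the lines that are NOT selected.
        sed -E '/^(cert|key)_storage_location=(dbm|sql):/!s/^(cert|key)_storage_location=/&dbm:/' "$f" > "$tmp" &&
            cat "$tmp" > "$f"   # overwrite in place, keeping owner and mode
        rm -f "$tmp"
    done
}

# Example:
# prefix_dbm /var/lib/certmonger/requests
```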
Version-Release number of selected component (if applicable):
* certmonger-0.79.13-3.el8.x86_64
* nss-3.67.0-7.el8_5.x86_64
* nss-tools-3.67.0-7.el8_5.x86_64
How reproducible:
Always.
Steps to Reproduce:
1. Use certmonger on rhel7 to create an nss database directory containing multiple entries (the startup time grows linearly with the number of requests), like "getcert add-ca ..." and "getcert -c <my-ca> -d <my-nss-dir> -P <my-nss-dir-pin> -I <task-nickname> -n <cert-nickname> -N <subject>"
2. Install certmonger on rhel8, copy /var/lib/certmonger and the nss dir from rhel7.
3. Set OPTS="-d 4" in /etc/sysconfig/certmonger.
4. Start certmonger.service
Actual results:
Starting certmonger.service will take several seconds per request to look up key and certificate, possibly exceeding systemd's service startup timeout. Initial start is even longer since certmonger will perform an NSS database migration from dbm to sqlite.
Expected results:
For ~ 100 managed requests, certmonger.service startup time should be in the seconds, not in the minutes.
Additional info:
The underlying problem seems to be with the change of the default database backend in NSS as designed here: https://fedoraproject.org/wiki/Changes/NSSDefaultFileFormatSql where such performance impact was apparently not considered/foreseen.
The issue seems to be with a database that is migrated from dbm to sqlite.
The attached script generates a self-signed CA and 100 server certificates.
To run it:
$ mkdir /tmp/nssdb
$ bash gencert dbm:/tmp/nssdb
$ echo httptest > /tmp/nssdb/passwd
Listing all the keys takes less than a second:
$ time certutil -K -d dbm:/tmp/nssdb -f /tmp/nssdb/passwd
real 0m0.559s
user 0m0.444s
sys 0m0.086s
Upgrade it to sqlite:
$ certutil -d sql:/tmp/nssdb/ -N -f /tmp/nssdb/passwd -@ /tmp/nssdb/passwd
Same listing of keys:
$ time certutil -K -d dbm:/tmp/nssdb -f /tmp/nssdb/passwd
real 0m46.905s
user 0m45.400s
sys 0m0.177s
Now if we create the database directly as sqlite the timing is more in line with dbm:
$ mkdir /tmp/nssdb2
$ bash gencert sql:/tmp/nssdb2
$ echo httptest > /tmp/nssdb2/passwd
And list the keys:
$ time certutil -K -d sql:/tmp/nssdb2 -f /tmp/nssdb2/passwd
real 0m0.742s
user 0m0.581s
sys 0m0.032s
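To repeat these measurements across several databases, the listing can be wrapped in a small helper. This is a rough sketch: the name time_key_listing and the CERTUTIL override variable are assumptions made for illustration, and the helper only reports whole seconds, which is enough to tell a sub-second listing from a 45-second one.

```shell
#!/bin/sh
# Print the wall-clock seconds needed to list the private keys of an NSS database.
# $1: backend prefix ("dbm" or "sql"), $2: database directory.
# The password file is expected at <dir>/passwd, as in the commands above.
# CERTUTIL is a hypothetical override hook; it defaults to the real certutil.
time_key_listing() {
    backend=$1
    dir=$2
    start=$(date +%s)
    "${CERTUTIL:-certutil}" -K -d "$backend:$dir" -f "$dir/passwd" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$backend:$dir $((end - start))s (exit $status)"
    return "$status"
}

# Example:
# time_key_listing sql /tmp/nssdb    # upgraded from dbm
# time_key_listing sql /tmp/nssdb2   # created directly as sqlite
```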
Also worth mentioning that generating the sqlite database using gencert takes significantly longer than the dbm database. It's plausible that entropy on this VM is simply exhausted.
Reproduced with nss-3.67.0-6.el8_4.x86_64
This is almost certainly caused by the cache thrashing bug introduced when we added integrity to AES. The issue is that the key for the decrypt and the key for the integrity check are different, and they would throw each other out of the cache, so you ended up doing the PBE for every key. (The issue is seen with databases with large numbers of private keys.) Does this happen on RHEL-9? If not, it should be fixed in the next NSS rebase next month.
dbm hasn't been allowed in Fedora since, I think, Fedora 32 or 33.
$ certutil -N -d dbm:/tmp/nssdb
certutil: function failed: SEC_ERROR_LEGACY_DATABASE: The certificate/key database is in an old, unsupported format.
Oh, I thought that it was just that the database on sqlite was being slow. Hmmm. If you copy the dbm-upgraded database to rhel-9 or fedora, is it still slow?
There is an upstream bug that was fixed where if you have 100 or so keys, sqlite was really slow listing them. The fix for this is not in RHEL-8. I wonder why we aren't tripping over this when you create the database in sqlite?
bob
It takes about 4s to list the 100 keys from the same database using nss-3.71.0-1.fc33.x86_64
$ time certutil -K -d sql:/tmp/nssdb/ -f /tmp/nssdb/passwd
real 0m4.155s
user 0m4.102s
sys 0m0.031s
Slight error in comment 1:
> Upgrade it to sqlite:
>
> $ certutil -d sql:/tmp/nssdb/ -N -f /tmp/nssdb/passwd -@ /tmp/nssdb/passwd
>
> Same listing of keys:
>
> $ time certutil -K -d dbm:/tmp/nssdb -f /tmp/nssdb/passwd
This last line should be:
$ time certutil -K -d sql:/tmp/nssdb -f /tmp/nssdb/passwd
> Also worth mentioning that generating the sqlite database using gencert takes significantly longer than the dbm database.
> It's plausible that entropy on this VM is simply exhausted.
No, keygen against the sql database definitely takes longer; I can see that in both the rhel-8 certutil and my current upstream certutil.
So the issue is that the CERTDB_USER bit in the trust fools the legacydb (dbm) into presenting trust objects that are actually empty trust objects. Since NSS checks the integrity of trust objects if you've logged in (which you have to do to display the keys), it takes quite some time to display each cert.
There are two fixes: 1) We can skip the integrity check if the value we are checking is the value we would default to if there wasn't any trust value (which is what you get when the integrity check fails). This speeds up the listing of the databases with these dead trust values by about 10x. 2) Fix dbm to correctly skip cert trust objects with the CERTDB_USER bit and nothing else. This will fix the case that created the bad databases, but won't fix the displaying of the bad databases.
NSS 3.79 shipped today, so it won't be upstreamed in time to patch this there. We'll carry the patch until the next release of NSS.