Bug 1959057

Summary: An error has ocorred (IPA Error 4301:CertificateOperationError)
Product: Red Hat Enterprise Linux 8 Reporter: cilmar <cilmar>
Component: pki-core Assignee: Chris Kelley <ckelley>
Status: CLOSED ERRATA QA Contact: idm-cs-qe-bugs
Severity: medium Docs Contact:
Priority: urgent    
Version: 8.3 CC: ademir.ladeira, amayberr, arajendr, ckelley, frenaud, msauton, negativo17, pcech, rcritten, skhandel, tscherf
Target Milestone: beta Keywords: Triaged
Target Release: ---   
Hardware: All   
OS: Unspecified   
Whiteboard:
Fixed In Version: pki-core-10.6-8080020230203154518.c5b4fe3c Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2164347 2164348 2164349 Environment:
Last Closed: 2023-05-16 08:36:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2164347, 2164348, 2164349, 2192625, 2192969    

Description cilmar@redhat.com 2021-05-10 16:20:53 UTC
Description of problem:
-- For RHEL8 IdM CA --
 It is observed that the Web UI is (by default) attempting to load all stored certificates when opening the certificate search page.  Initially the certificate query would crash after 1000 entries, entry 1001 would receive the error.  When adjusting `nsSizeLimit: 10000` the certificate query would crash on entry 10001.   With ~25,000 certificates, we elected to raise `nsSizeLimit: 100000` and saw no further error.

SEVERE: Unable to search for certificates: java.lang.ClassCastException: netscape.ldap.LDAPException cannot be cast to netscape.ldap.LDAPEntry
java.lang.RuntimeException: java.lang.ClassCastException: netscape.ldap.LDAPException cannot be cast to netscape.ldap.LDAPEntr

 -  The Web UI search page should not attempt to pull all certificates by default (or it should have a configuration toggle, or some sane default, such as only pulling certs for the IdM replicas); this page takes a significant time to load.

 -  Even if the results per page (ca.crl.pageSize in /etc/pki/pki-tomcat/ca/CS.cfg) is raised from the default (100) to a larger number (1000), the Web UI does not have a navigation link to page through the results, so any result outside of this range isn't presented in the Web UI anyway.  The UI should either give page navigation, or limit the query to the page result limit so it is not wasting resources querying for results that it can't show.

Version-Release number of selected component (if applicable):
   OS Version: 8.3 (Ootpa)
   389-ds-base-1.4.3
   ipa-server-4.8.7

How reproducible:
  After integrating with a RHEL 7 cluster (as a new replica) in order to migrate the environment.

Steps to Reproduce:
1. Join a new RHEL 8 host as a replica to a RHEL 7 cluster (for migration purposes)
2. Try to access the GUI via the menu: Authentication > Certificates > Certificates
3. You will see an error in the GUI:
   An error has occurred (IPA Error 4301: CertificateOperationError)

Actual results:
   [Fri May 07 10:47:37.625637 2021] [wsgi:error] [pid 92846:tid 140680679278336] [remote 172.18.127.128:51191] ipa: INFO: [jsonserver_session] user.COM: cert_find(None, version='2.239'): CertificateOperationError

Expected results:
   We expect to see the certificates after the install process.


Additional info:
   We figured out a workaround:

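# Raise nsSizeLimit for the PKI database user to a high number.
# The annotation is kept out of the LDIF body, because anything after the
# value on the nsSizeLimit line would become part of the value.
cat << EOF > /var/tmp/add.nssizelimit.to.uid.pkidbuser.ldif
dn: uid=pkidbuser,ou=people,o=ipaca
changetype: modify
add: nsSizeLimit
nsSizeLimit: 100000
EOF
# Strip trailing whitespace from every line of the LDIF
sed -i 's/[[:space:]]*$//' /var/tmp/add.nssizelimit.to.uid.pkidbuser.ldif
# Apply the change as Directory Manager (prompts for the password)
ldapmodify -xD "cn=directory manager" -W -f /var/tmp/add.nssizelimit.to.uid.pkidbuser.ldif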

See attachment as well
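
A minimal sketch of a sanity check that could be run on the LDIF before passing it to ldapmodify. LDIF values run to the end of the line, so a stray annotation after the number would end up inside the value. The helper name check_sizelimit_ldif and the scratch file are assumptions for illustration, not part of IdM or 389-ds.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the LDIF file contains at least one
# "nsSizeLimit: <value>" line and every such value is a bare integer.
check_sizelimit_ldif() {
    ldif_file=$1
    # There must be at least one value line ...
    grep -q '^nsSizeLimit: ' "$ldif_file" || return 1
    # ... and none of them may carry anything but digits after the colon.
    if grep '^nsSizeLimit: ' "$ldif_file" | grep -qv '^nsSizeLimit: [0-9][0-9]*$'; then
        return 1
    fi
    return 0
}

# Example: build the LDIF in a scratch file and check it before use.
tmp_ldif=$(mktemp)
cat > "$tmp_ldif" << 'EOF'
dn: uid=pkidbuser,ou=people,o=ipaca
changetype: modify
add: nsSizeLimit
nsSizeLimit: 100000
EOF

if check_sizelimit_ldif "$tmp_ldif"; then
    echo "LDIF looks sane"
else
    echo "LDIF has a malformed nsSizeLimit line" >&2
fi
rm -f "$tmp_ldif"
```

The check only looks at the value lines; the "add: nsSizeLimit" directive does not match the pattern and is ignored.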

Comment 2 Florence Blanc-Renaud 2021-05-18 11:41:48 UTC
Since the error happens in pki, moving to pki-core component (IPA webUI only shows the error it is receiving from a call to PKI)

Comment 3 Marc Sauton 2021-05-18 17:07:04 UTC
we should have the LDAP access and errors log events that match the Java exception:

2021-05-07 13:58:32 [ajp-nio-127.0.0.1-8009-exec-2] SEVERE: Unable to search for certificates: java.lang.ClassCastException: netscape.ldap.LDAPException cannot be cast to netscape.ldap.LDAPEntry
java.lang.RuntimeException: java.lang.ClassCastException: netscape.ldap.LDAPException cannot be cast to netscape.ldap.LDAPEntry
        at com.netscape.cmscore.dbs.DBVirtualList.getEntries(DBVirtualList.java:523)

from the exception, it may be that the LDAP VLV indexes are corrupted.

it seems to be a different problem than what is listed in the description, which mixes several problems and different actions, so it is not clear what this bug report is about; we may want to focus on one problem.
I could be wrong, but I think we need to test this scenario more.
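
A minimal sketch of how the matching access log events could be pulled out, assuming the usual 389-ds access log layout in which each operation ends with a "RESULT err=<code>" line. The helper name limit_results is an assumption for illustration, and the instance name in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: keep only RESULT lines whose LDAP result code is
# 4 (sizeLimitExceeded) or 11 (adminLimitExceeded), as named in RFC 4511.
# Reads access-log text on stdin and writes matching lines to stdout.
limit_results() {
    grep -E ' RESULT err=(4|11) '
}

# Usage (INSTANCE is a placeholder for the directory server instance name):
#   limit_results < /var/log/dirsrv/slapd-INSTANCE/access
```

The conn= and op= values on the matching lines can then be used to find the corresponding SRCH line and compare its timestamp with the Java exception.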


1-
"It is observed that the Web UI is (by default) attempting to load all stored certificates when opening the certificate search page."
->
there is a list and a search PKI interface,
the list uses /ca/ee/ca/listCerts and does paging with LDAP VLV browsing
the search uses /ca/ee/ca/srchCerts with direct LDAP searches using filters depending on the inputs of the search, for example with a serial range search, in the form of "(&(serialno>=011)(serialno<=016))"
IPA may not be using the correct PKI feature, that would be a separate issue.


2-
"Initially the certificate query would crash after 1000 entries, entry 1001 would receive the error."
->
which error?
that would be an LDAP error 11, relayed as some other error in the IdM web UI.
but this is not a crash; it is more a normal return code for a search that returned more entries than allowed by the current configuration.


3-
"
When adjusting `nsSizeLimit: 10000` the certificate query would crash on entry 10001.
"
->
this is not a crash, or we may want to indicate what was crashing, and add the corresponding PKI debug trace snippet with the LDAP access log events


4-
"
With ~25,000 certificates, we elected to raise `nsSizeLimit: 100000` and saw no further error.
"
I suggested this LDAP configuration workaround in the case, but this is not ideal and will create problems later.
no UI should get hundreds of records immediately (most likely very inefficient); a paged search should be done instead.


5-
"
Even if the results per page (ca.crl.pageSize in /etc/pki/pki-tomcat/ca/CS.cfg) is raised from the default (100) to a larger number (1000), the Web UI does not have a navigation link to page through the results, so any result outside of this range isn't presented in the Web UI anyway.  The UI should either give page navigation, or limit the query to the page result limit so it is not wasting resources querying for results that it can't show.
"
->
the PKI parameter ca.crl.pageSize has nothing to do with searching or listing certificates, it is a parameter related to CRL updates.
so I am not sure why this is mentioned in this bug report, it must be for a different issue.
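
For reference, the serial range filter form quoted in point 1 can be sketched as below. The helper name serial_range_filter is an assumption for illustration; it only builds the filter string and does not reproduce how PKI constructs it internally.

```shell
#!/bin/sh
# Hypothetical helper: print an LDAP filter matching serial numbers in the
# inclusive range [low, high], in the form quoted in point 1.
serial_range_filter() {
    low=$1
    high=$2
    printf '(&(serialno>=%s)(serialno<=%s))\n' "$low" "$high"
}

serial_range_filter 011 016
# prints: (&(serialno>=011)(serialno<=016))
```

A bounded filter like this keeps the number of returned entries small, in contrast to a search that matches every stored certificate.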

Comment 4 Alex Mayberry 2021-05-19 15:12:19 UTC
If "crash" is an inappropriate term, then we can say,  "The page attempts to load, then ceases its attempt at loading, and presents an error message"  instead of "crash".
I'll attach a screenshot of this error, it's also in the associated ticket, and I'll put the text of that error message here:
  --
      An error has occurred (IPA Error 4301: CertificateOperationError)
      Certificate operation cannot be completed: Unable to communicate with CMS (500)
      Please try the following options:
         * Refresh the page.
         * Return to the main page and retry the operation
         * Reload the browser.
      If the problem persists please contact the system administrator. 
  -- 

When viewed from the system logs we see this associated error message (until the nsSizeLimit has been extended beyond the number of certificates):
      `SEVERE: Unable to search for certificates: java.lang.ClassCastException: netscape.ldap.LDAPException cannot be cast to netscape.ldap.LDAPEntry java.lang.RuntimeException: java.lang.ClassCastException: netscape.ldap.LDAPException cannot be cast to netscape.ldap.LDAPEntr`


Would you recommend multiple bug reports?  How should they be distributed?

    -  Web UI automatically attempts to query every cert when browsing to the certificate tab 
       + suggest that no search be triggered automatically, 
       + or default search is limited to only show certs for the replicas so that the search is constrained to a safe number of results

    -  Web UI Fails to show navigation links on the page to look through each page of results (only shows the first 100 entries as a single page, no "next page" button)
       + suggest paginated query results be navigable via UI buttons
 
    -  Web UI presents an error message and fails to load the certificate page when the search query returns results greater than the nsSizeLimit
       - `SEVERE: Unable to search for certificates: java.lang.ClassCastException: netscape.ldap.LDAPException cannot be cast to netscape.ldap.LDAPEntry
java.lang.RuntimeException: java.lang.ClassCastException: netscape.ldap.LDAPException cannot be cast to netscape.ldap.LDAPEntr`
       +  If this were ldap, I would say do a paginated query to avoid result sizes that break nsSizeLimit, but maybe this isn't appropriate for the PKI? Or maybe there's a better way?

    -  RFE: Mechanism to clean up these certificates so that the `nsSizeLimit` value does not need to be constantly increased to keep pace with growth.
    -       Or a dynamic nsSizeLimit based on the number of records, perhaps?  Maybe as some function of allocated system memory, or some other sizing metric to key off of for a rational limit?

Comment 8 Marc Sauton 2021-10-11 21:47:23 UTC
so the workaround is to add an LDAP user-defined size limit value to the pkidbuser entry, with a static nsSizeLimit value chosen high enough, for example 20K:

dn: uid=pkidbuser,ou=people,o=ipaca
changetype: modify
add: nsSizeLimit
nsSizeLimit: 20000
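
A minimal sketch of generating this LDIF for a chosen limit, so that the value can be adjusted without hand-editing the file. The helper name sizelimit_ldif and its input validation are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the modify LDIF for a given size limit.
# Refuses anything that is not a positive integer without leading zeros.
sizelimit_ldif() {
    limit=$1
    case $limit in
        ''|*[!0-9]*|0*) echo "invalid limit: $limit" >&2; return 1 ;;
    esac
    cat << EOF
dn: uid=pkidbuser,ou=people,o=ipaca
changetype: modify
add: nsSizeLimit
nsSizeLimit: $limit
EOF
}

sizelimit_ldif 20000
```

The output can be redirected to a file and applied with ldapmodify in the same way as the workaround in the description.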

Comment 20 errata-xmlrpc 2023-05-16 08:36:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pki-core:10.6 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2826