Bug 1927532 - Large CRL file operation causes OOM error in Candlepin
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Candlepin
Version: 6.8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: 6.11.0
Assignee: satellite6-bugs
QA Contact: Danny Synk
URL:
Whiteboard:
Duplicates: 1999089
Depends On: 1958127 1958128
Blocks:
 
Reported: 2021-02-10 22:21 UTC by Mike McCune
Modified: 2024-03-25 18:09 UTC (History)
CC List: 9 users

Fixed In Version: candlepin-3.1.31-1,candlepin-4.0.8-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1928161 1958127 1958128 2027358
Environment:
Last Closed: 2022-07-05 14:28:45 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 6177582 | 0 | None | None | None | 2022-03-25 14:53:09 UTC
Red Hat Product Errata | RHSA-2022:5498 | 0 | None | None | None | 2022-07-05 14:29:24 UTC

Description Mike McCune 2021-02-10 22:21:41 UTC
Large customer environments occasionally generate a very large CRL file after a big subscription update or refresh. When the CertificateRevocationListTask job then tries to process the file, it fails with:

2021-02-08 12:01:46,929 [thread=QuartzScheduler_Worker-6] [job=CertificateRevocationListTask-911d9c76-5768-4b10-b827-16dfac3c8b78, org=, csid=] ERROR org.quartz.core.JobRunShell - Job cron group.CertificateRevocationListTask-911d9c76-5768-4b10-b827-16dfac3c8b78 threw an unhandled Exception:
java.lang.OutOfMemoryError: input is too large to fit in a byte array
        at com.google.common.io.ByteStreams.toByteArrayInternal(ByteStreams.java:195)

The customer in question had a 2.8 GB CRL at:

/var/lib/candlepin/candlepin-crl.crl

This exceeded the 1.99 GB limit for processing the file (a single Java byte array cannot hold more than 2^31 - 1 bytes), so the customer is forced to take manual steps to get past the collection process.
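
Whether an installation is at risk can be checked by comparing the size of the CRL file with the maximum length of a Java array (2^31 - 1 elements, just under 2 GiB for a byte array). The following is a minimal sketch, not something shipped with Satellite or Candlepin: the function name `crl_exceeds_limit` and its optional limit argument are made up for illustration, and `stat -c %s` assumes GNU coreutils as found on RHEL.

```shell
#!/bin/bash
# Hypothetical helper, not part of Satellite or Candlepin.
# Exit status 0: file is larger than the limit (in bytes).
# Exit status 1: file fits within the limit.
# Exit status 2: file could not be read (for example, it does not exist).
# The default limit is 2^31 - 1, the largest possible Java array length.
crl_exceeds_limit() {
    local file="$1"
    local limit="${2:-2147483647}"
    local size
    size=$(stat -c %s "$file" 2>/dev/null) || return 2
    [ "$size" -gt "$limit" ]
}

if crl_exceeds_limit /var/lib/candlepin/candlepin-crl.crl; then
    echo "CRL is too large to be read into a single byte array"
fi
```

On a system where the CRL file is absent or small, the check prints nothing.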

WORKAROUND:

1) Stop all Satellite services:

# satellite-maintain service stop

2) Start only PostgreSQL:

# systemctl start postgresql

3) Move the CRL out of the way:

# mv /var/lib/candlepin/candlepin-crl.crl /var/lib/candlepin/candlepin-crl.BAK

4) Mark all revoked certificate serials as collected in the database:

# echo "UPDATE cp_cert_serial SET collected=true WHERE revoked=true;" | sudo -u postgres psql -d candlepin
UPDATE 134330

5) Start all services and resume operations:

# satellite-maintain service start
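
Before running the UPDATE in step 4, it can be useful to see how many revoked serials are still waiting to be collected. The sketch below uses only the table and columns that appear in that UPDATE; it assumes uncollected rows store `collected=false` rather than NULL, and the function name `count_pending_serials` is hypothetical. The number it reports can be lower than the row count printed by UPDATE, because the UPDATE also matches revoked rows that are already marked as collected.

```shell
#!/bin/bash
# Sketch only: counts revoked serials that are not yet marked as collected.
# Assumption: "collected" is false (not NULL) for uncollected rows.
COUNT_SQL="SELECT count(*) FROM cp_cert_serial WHERE revoked=true AND collected=false;"

# Not invoked here; call it on the Satellite server while PostgreSQL is running.
count_pending_serials() {
    # -A: unaligned output, -t: print rows only, -c: run a single command
    sudo -u postgres psql -d candlepin -At -c "$COUNT_SQL"
}
```

Calling `count_pending_serials` between steps 2 and 4 prints a single number.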

Comment 2 Nikos Moumoulidis 2021-09-02 10:32:07 UTC
An update on how this issue will be resolved: we have a solution currently under review that removes the CertificateRevocationListTask job entirely and replaces it with a new job, CertificateCleanupJob, which runs periodically and:
- No longer generates a CRL file.
- Revokes all expired (but not yet revoked) identity and SCA certificates. These can pile up when hosts register and never unregister: their certificates are never revoked, but over time they expire and are therefore invalid.
- Deletes all certificate serials that are both expired and revoked. This covers the serials of all three certificate types: identity, SCA, and entitlement.

Comment 4 Danny Synk 2022-01-12 15:15:14 UTC
Verified on Satellite 7.0.0, snap 4 running on RHEL 7 and RHEL 8.

Steps to Test:

1. Add the following line to /etc/candlepin/candlepin.conf to set the CRLUpdateJob to run every 3 minutes:

candlepin.async.jobs.CRLUpdateJob.schedule=0 0/3 * * * ?

2. Restart the tomcat service:

# systemctl restart tomcat

3. Register a host to Satellite.

4. Verify that /var/lib/candlepin/candlepin-crl.crl is not present.

5. Follow the Candlepin log and wait for the CRLUpdateJob to attempt to run.

6. Between two runs of the job, attach a subscription to the host registered in step 3, then immediately remove that subscription.

7. After the job runs again, verify that the CRL file is still not present.
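
The schedule value in step 1 is a Quartz cron expression, which has a leading seconds field ahead of the usual minute, hour, day-of-month, month, and day-of-week fields. A small sketch that splits the expression into named fields makes the "every 3 minutes" reading easier to check; the function name `describe_quartz_cron` is made up for illustration, and the here-string requires bash.

```shell
#!/bin/bash
# Split a six-field Quartz cron expression into named fields.
# Quartz order: seconds, minutes, hours, day-of-month, month, day-of-week.
describe_quartz_cron() {
    local sec min hour dom mon dow
    read -r sec min hour dom mon dow <<< "$1"
    printf 'seconds=%s minutes=%s hours=%s day-of-month=%s month=%s day-of-week=%s\n' \
        "$sec" "$min" "$hour" "$dom" "$mon" "$dow"
}

describe_quartz_cron '0 0/3 * * * ?'
# prints: seconds=0 minutes=0/3 hours=* day-of-month=* month=* day-of-week=?
```

Here `0/3` in the minutes field means "starting at minute 0, every 3 minutes", and `?` means no specific day of the week.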

Expected Results:

The CRL file is not present when Satellite is installed, and the file is not created when the CRLUpdateJob is triggered.

Actual Results:

The CRL file is not present when Satellite is installed, and the file is not created when the CRLUpdateJob is triggered.

The attempted job run results in an error in /var/log/candlepin/candlepin.log:

```
2022-01-11 14:57:00,006 [thread=QuartzScheduler_Worker-14] [=, org=, csid=] INFO  org.candlepin.async.JobManager - Job queued: AsyncJobStatus [id: 8a81828b7e4a0179017e4ab716e4096f, name: CRLUpdateJob, key: CRLUpdateJob, state: QUEUED]
2022-01-11 14:57:00,048 [thread=Thread-151 (ActiveMQ-client-global-threads)] [job=8a81828b7e4a0179017e4ab716e4096f, job_key=CRLUpdateJob, org=, csid=] ERROR org.candlepin.async.JobManager - No registered job class for job: CRLUpdateJob
2022-01-11 14:57:00,048 [thread=Thread-151 (ActiveMQ-client-global-threads)] [=, org=, csid=] ERROR org.candlepin.async.JobMessageReceiver - Job processing failed terminally; committing job message as acknowledged: Message [id: 1292, address: job, body: {"jobId":"8a81828b7e4a0179017e4ab716e4096f","jobKey":"CRLUpdateJob"}]
org.candlepin.async.JobInitializationException: No registered job class for job: CRLUpdateJob
```

This error reflects the fact that the CRLUpdateJob was removed from Candlepin and replaced with the CertificateCleanupJob in https://github.com/candlepin/candlepin/pull/3078.
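
The two observations above (no CRL file, and the job rejected as unregistered) can be checked together. This is a rough sketch rather than an official test: the function name `verify_crl_job_removed` is hypothetical, and both paths are passed as arguments so that it can be tried against copies of the files.

```shell
#!/bin/bash
# Hypothetical verification helper, not part of Satellite or Candlepin.
# Succeeds only if the CRL file is absent AND the log shows that the
# CRLUpdateJob was rejected because no job class is registered for it.
verify_crl_job_removed() {
    local crl_file="$1"
    local log_file="$2"
    if [ -e "$crl_file" ]; then
        echo "FAIL: $crl_file exists"
        return 1
    fi
    if ! grep -qF 'No registered job class for job: CRLUpdateJob' "$log_file" 2>/dev/null; then
        echo "FAIL: expected error not found in $log_file"
        return 1
    fi
    echo "PASS"
}
```

On the Satellite server it would be called as `verify_crl_job_removed /var/lib/candlepin/candlepin-crl.crl /var/log/candlepin/candlepin.log`.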

Comment 6 Nikos Moumoulidis 2022-03-25 14:53:10 UTC
*** Bug 1999089 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2022-07-05 14:28:45 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498

