Bug 1928161

Summary: Large CRL file operation causes OOM error in Candlepin
Product: [Community] Candlepin
Reporter: Rehana <redakkan>
Component: candlepin
Assignee: candlepin-bugs
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.1
CC: bcourt, ltran, mmccune, nmoumoul, ojanus, redakkan, swick
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: candlepin-4.1.6-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1927532
Environment:
Last Closed: 2021-09-24 11:39:26 UTC
Type: ---

Description Rehana 2021-02-12 14:16:53 UTC
+++ This bug was initially created as a clone of Bug #1927532 +++

Large customer environments occasionally generate a large CRL file after a big subscription update or refresh. Error:

2021-02-08 12:01:46,929 [thread=QuartzScheduler_Worker-6] [job=CertificateRevocationListTask-911d9c76-5768-4b10-b827-16dfac3c8b78, org=, csid=] ERROR org.quartz.core.JobRunShell - Job cron group.CertificateRevocationListTask-911d9c76-5768-4b10-b827-16dfac3c8b78 threw an unhandled Exception:
java.lang.OutOfMemoryError: input is too large to fit in a byte array
        at com.google.common.io.ByteStreams.toByteArrayInternal(ByteStreams.java:195)

The customer in question had a 2.8 GB CRL in:

/var/lib/candlepin/candlepin-crl.crl

This blew past the ~1.99 GB limit for reading the file into a single byte array, so the customer is forced to take manual steps to get past the collection process.
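
For context, the ceiling here is Java's array size limit: a byte[] can hold just under 2 GiB, so any code path that reads the whole CRL into one array (as ByteStreams.toByteArrayInternal in the trace does) has to fail on a 2.8 GB file. Below is a minimal standalone sketch of that limit, assuming Guava on the classpath; the class name and guard constant are illustrative only and are not Candlepin code:

import com.google.common.io.ByteStreams;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CrlSizeCheck {
    // Java arrays are indexed by int, so a byte[] tops out just under 2 GiB.
    private static final long MAX_BYTE_ARRAY = Integer.MAX_VALUE - 8;

    public static void main(String[] args) throws IOException {
        Path crl = Paths.get("/var/lib/candlepin/candlepin-crl.crl");
        long size = Files.size(crl);

        if (size > MAX_BYTE_ARRAY) {
            // A 2.8 GB CRL lands here: reading it into one byte[] (what
            // ByteStreams.toByteArrayInternal attempts) fails with
            // "input is too large to fit in a byte array".
            System.err.printf("CRL is %d bytes; too large for a single byte array%n", size);
            return;
        }

        try (InputStream in = Files.newInputStream(crl)) {
            byte[] der = ByteStreams.toByteArray(in); // only safe below the ceiling
            System.out.printf("Read %d bytes of CRL data%n", der.length);
        }
    }
}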

WORKAROUND:

1) Stop services:

# satellite-maintain service stop

2) Start PostgreSQL:

# systemctl start postgresql

3) Move the CRL file out of the way:

# mv /var/lib/candlepin/candlepin-crl.crl /var/lib/candlepin/candlepin-crl.BAK

4) Update the database to mark revoked serials as collected:

# echo "UPDATE cp_cert_serial SET collected=true WHERE revoked=true;" | sudo -u postgres psql -d candlepin
UPDATE 134330

5) Start services and resume operations:

# satellite-maintain service start
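
For reference, one way to sidestep the single-byte-array ceiling is to hand the CRL to the parser as a stream rather than pre-reading it into a byte[]. The following is only a sketch of that idea, not necessarily how candlepin-4.1.6 addresses the bug, and the JCA provider may still buffer the full encoding internally, so heap sizing still matters for a multi-gigabyte CRL:

import java.io.BufferedInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.cert.CertificateFactory;
import java.security.cert.X509CRL;

public class CrlStreamParse {
    public static void main(String[] args) throws Exception {
        CertificateFactory cf = CertificateFactory.getInstance("X.509");

        // Parse straight from the stream; the caller never allocates one
        // giant byte[] for the whole file.
        try (InputStream in = new BufferedInputStream(
                Files.newInputStream(Paths.get("/var/lib/candlepin/candlepin-crl.crl")))) {
            X509CRL crl = (X509CRL) cf.generateCRL(in);
            int revoked = crl.getRevokedCertificates() == null
                ? 0 : crl.getRevokedCertificates().size();
            System.out.println("Parsed CRL with " + revoked + " revoked entries");
        }
    }
}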

Comment 1 Nikos Moumoulidis 2021-05-07 09:16:04 UTC
*** Bug 1806626 has been marked as a duplicate of this bug. ***

Comment 3 Samson Wick 2021-08-31 13:50:01 UTC
In case it assists in getting this bug assigned to a release, I'm working with a very large RH customer that has encountered this issue as well. In their case it seems to be caused by the way they're doing content management: regularly reassigning large numbers of hosts to different lifecycle environments (LCEs).