Bug 1928161

Summary: Large CRL file operation causes OOM error in Candlepin
Product: [Community] Candlepin
Reporter: Rehana <redakkan>
Component: candlepin
Assignee: candlepin-bugs
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.1
CC: bcourt, ltran, mmccune, nmoumoul, ojanus, redakkan, swick
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: candlepin-4.1.6-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1927532
Environment:
Last Closed: 2021-09-24 11:39:26 UTC
Type: ---

Description Rehana 2021-02-12 14:16:53 UTC
+++ This bug was initially created as a clone of Bug #1927532 +++

Large customer environments occasionally generate a large CRL file after a big subscription update or refresh. Error:

2021-02-08 12:01:46,929 [thread=QuartzScheduler_Worker-6] [job=CertificateRevocationListTask-911d9c76-5768-4b10-b827-16dfac3c8b78, org=, csid=] ERROR org.quartz.core.JobRunShell - Job cron group.CertificateRevocationListTask-911d9c76-5768-4b10-b827-16dfac3c8b78 threw an unhandled Exception:
java.lang.OutOfMemoryError: input is too large to fit in a byte array
        at com.google.common.io.ByteStreams.toByteArrayInternal(ByteStreams.java:195)

The customer in question had a 2.8 GB CRL in:

/var/lib/candlepin/candlepin-crl.crl

This blew past the ~1.99 GB limit for reading the file into a single byte array, so the customer is forced to take manual steps to get past the collection process.
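
For context, the ceiling here is Java's array size limit: a byte[] can hold just under 2 GiB, so any code path that reads the whole CRL into one array (as ByteStreams.toByteArrayInternal in the trace does) has to fail on a 2.8 GB file. Below is a minimal standalone sketch of that limit, assuming Guava on the classpath; the class name and guard constant are illustrative only and are not Candlepin code:

import com.google.common.io.ByteStreams;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CrlSizeCheck {
    // Java arrays are indexed by int, so a byte[] tops out just under 2 GiB.
    private static final long MAX_BYTE_ARRAY = Integer.MAX_VALUE - 8;

    public static void main(String[] args) throws IOException {
        Path crl = Paths.get("/var/lib/candlepin/candlepin-crl.crl");
        long size = Files.size(crl);

        if (size > MAX_BYTE_ARRAY) {
            // A 2.8 GB CRL lands here: reading it into one byte[] (what
            // ByteStreams.toByteArrayInternal attempts) fails with
            // "input is too large to fit in a byte array".
            System.err.printf("CRL is %d bytes; too large for a single byte array%n", size);
            return;
        }

        try (InputStream in = Files.newInputStream(crl)) {
            byte[] der = ByteStreams.toByteArray(in); // only safe below the ceiling
            System.out.printf("Read %d bytes of CRL data%n", der.length);
        }
    }
}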

WORKAROUND:

1) Stop services:

# satellite-maintain service stop

2) Start PostgreSQL:

# systemctl start postgresql

3) Move the CRL file out of the way:

# mv /var/lib/candlepin/candlepin-crl.crl /var/lib/candlepin/candlepin-crl.BAK

4) Update the database to mark revoked serials as collected:

# echo "UPDATE cp_cert_serial SET collected=true WHERE revoked=true;" | sudo -u postgres psql -d candlepin
UPDATE 134330

5) Start services and resume operations:

# satellite-maintain service start
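
For reference, one way to sidestep the single-byte-array ceiling is to hand the CRL to the parser as a stream rather than pre-reading it into a byte[]. The following is only a sketch of that idea, not necessarily how candlepin-4.1.6 addresses the bug, and the JCA provider may still buffer the full encoding internally, so heap sizing still matters for a multi-gigabyte CRL:

import java.io.BufferedInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.cert.CertificateFactory;
import java.security.cert.X509CRL;

public class CrlStreamParse {
    public static void main(String[] args) throws Exception {
        CertificateFactory cf = CertificateFactory.getInstance("X.509");

        // Parse straight from the stream; the caller never allocates one
        // giant byte[] for the whole file.
        try (InputStream in = new BufferedInputStream(
                Files.newInputStream(Paths.get("/var/lib/candlepin/candlepin-crl.crl")))) {
            X509CRL crl = (X509CRL) cf.generateCRL(in);
            int revoked = crl.getRevokedCertificates() == null
                ? 0 : crl.getRevokedCertificates().size();
            System.out.println("Parsed CRL with " + revoked + " revoked entries");
        }
    }
}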

Comment 1 Nikos Moumoulidis 2021-05-07 09:16:04 UTC
*** Bug 1806626 has been marked as a duplicate of this bug. ***

Comment 3 Samson Wick 2021-08-31 13:50:01 UTC
In case it assists in getting this bug assigned to a release, I'm working with a very large RH customer that has encountered this issue as well. In their case it seems to be caused by the way they're doing content management: regularly reassigning large numbers of hosts to different lifecycle environments (LCEs).