Bug 1620226 - CertificateRevocationListTask consumes 100% CPU and Memory for large lists
Summary: CertificateRevocationListTask consumes 100% CPU and Memory for large lists
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Candlepin
Classification: Community
Component: candlepin
Version: 2.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 2.6
Assignee: Michael Stead
QA Contact: Katello QA List
URL:
Whiteboard:
Depends On:
Blocks: 1672706
 
Reported: 2018-08-22 18:24 UTC by Shayne Riley
Modified: 2019-04-22 15:14 UTC
CC List: 6 users

Fixed In Version: candlepin-2.6.1-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-23 16:56:08 UTC
Embargoed:




Links
System ID: Github candlepin/candlepin pull 2204
Status: closed
Summary: 1620226: Perform CRL generation without exceeding memory and maxing CPU
Last Updated: 2020-06-01 10:21:24 UTC

Description Shayne Riley 2018-08-22 18:24:54 UTC
Description of problem:
When a CertificateRevocationListTask runs, it eventually hits 100% CPU use and maxes out the available heap (14GB in this case). The node also becomes unresponsive to any HTTP requests, as the majority of the CPU time is spent in full GC.

Additionally, other nodes that create/run async jobs, such as hypervisor check-ins or refreshPoolsJobs, become unresponsive to HTTP requests as well, even though they are barely using any CPU and their memory use is fine.
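For illustration only (not from the original report): one way to confirm that a node's CPU time is dominated by garbage collection is to sample the JVM's GarbageCollectorMXBeans. The sketch below assumes it runs inside the affected JVM; for a deployed Candlepin node you would attach over JMX or use jstat instead.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeSampler {
    public static void main(String[] args) throws InterruptedException {
        long last = 0;
        while (true) {
            long totalGcMillis = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                totalGcMillis += gc.getCollectionTime(); // cumulative GC time in ms
            }
            // If this prints close to 1000 every second, the JVM is spending nearly
            // all of its time in GC, matching the behavior described above.
            System.out.println("GC ms during last second: " + (totalGcMillis - last));
            last = totalGcMillis;
            Thread.sleep(1000);
        }
    }
}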

Version-Release number of selected component (if applicable):
2.3.9

How reproducible:
Always, but only in prod.


Steps to Reproduce:
1. Schedule a CertificateRevocationListTask
2. Wait 10-15 minutes
3. Try to make a benign HTTP call, like GET status

Actual results:
The task runs for over two hours and still doesn't complete, the node becomes unresponsive due to heavy GC, and the other "worker" nodes become unresponsive as well (with no GC pressure of their own).


Expected results:
CertificateRevocationListTask completes without consuming all 14GB of RAM and without locking out the other worker nodes.


Additional info:
This can be considered the sequel to BZ1566244.
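
For illustration only (not the actual change in pull 2204): a minimal sketch of memory-bounded CRL generation using BouncyCastle's X509v2CRLBuilder. The RevokedSerialSource pager, the batch size, and the revocation dates are hypothetical; the point is simply to add entries in fixed-size batches pulled from the database instead of materializing the entire revocation list in the heap at once.

import java.io.IOException;
import java.math.BigInteger;
import java.security.PrivateKey;
import java.util.Date;
import java.util.List;

import org.bouncycastle.asn1.x500.X500Name;
import org.bouncycastle.asn1.x509.CRLReason;
import org.bouncycastle.cert.X509CRLHolder;
import org.bouncycastle.cert.X509v2CRLBuilder;
import org.bouncycastle.operator.ContentSigner;
import org.bouncycastle.operator.OperatorCreationException;
import org.bouncycastle.operator.jcajce.JcaContentSignerBuilder;

public class StreamingCrlSketch {

    // Hypothetical pager: returns at most 'limit' revoked serials starting at 'offset',
    // or an empty list when there is nothing left to fetch.
    public interface RevokedSerialSource {
        List<BigInteger> fetchRevokedSerials(int offset, int limit);
    }

    public byte[] buildCrl(X500Name issuer, PrivateKey key, RevokedSerialSource source)
        throws OperatorCreationException, IOException {

        X509v2CRLBuilder builder = new X509v2CRLBuilder(issuer, new Date());

        int batchSize = 10_000;  // sized so one batch fits comfortably in the heap
        int offset = 0;
        List<BigInteger> batch;

        // Add CRL entries batch by batch instead of loading every revoked serial at once.
        while (!(batch = source.fetchRevokedSerials(offset, batchSize)).isEmpty()) {
            for (BigInteger serial : batch) {
                // Placeholder revocation date and reason; a real CRL would use the
                // stored revocation metadata for each serial.
                builder.addCRLEntry(serial, new Date(), CRLReason.privilegeWithdrawn);
            }
            offset += batch.size();
        }

        ContentSigner signer = new JcaContentSignerBuilder("SHA256withRSA").build(key);
        X509CRLHolder crl = builder.build(signer);
        return crl.getEncoded();
    }
}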

