Bug 1672706

Summary: candlepin's CertificateRevocationListTask does not scale well for 2M+ certificates
Product: Red Hat Satellite Reporter: Pavel Moravec <pmoravec>
Component: Candlepin Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA QA Contact: jcallaha
Severity: high Docs Contact:
Priority: high    
Version: 6.3.5 CC: andrew.schofield, bbuckingham, bcourt, pdwyer
Target Milestone: 6.6.0 Keywords: Performance, Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: candlepin-2.6.1-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-22 12:47:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1620226    
Bug Blocks:    

Description Pavel Moravec 2019-02-05 16:28:53 UTC
Description of problem:
With 2.2M certificates in cp_cert_serial on Satellite 6.3.5, we have already hit bz1620226, at least to some extent: candlepin became progressively slower, the CRL job started but silently terminated after one hour, and performance has been good again since then.

Please backport the fix for bz1620226 to 6.4.z and newer (by default, the fix will appear in 6.6 only).


Version-Release number of selected component (if applicable):
6.3.5 / 6.4


How reproducible:
100% with the customer DB.
Should be straightforward to reproduce on a scaled environment (many hosts with many subscriptions).


Steps to Reproduce:
1. Have 40k hosts with 50 subscriptions each (or a similarly scaled Satellite).
2. Observe Satellite/candlepin performance at noon, when the CRL job runs by default.
3. Optionally, change the "at noon" schedule by adding:

pinsetter.org.candlepin.pinsetter.tasks.CertificateRevocationListTask.schedule=0 0/5 * * * ?

to candlepin.conf and restarting tomcat
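
For readers unfamiliar with Quartz cron syntax, the schedule value above can be decoded with a small sketch like the following. It assumes the standard Quartz field order (seconds, minutes, hours, day-of-month, month, day-of-week, optional year); the helper names `describe` and `expand_minutes_field` are hypothetical and are not part of candlepin or Quartz.

```python
# Hypothetical helpers for reading a Quartz cron expression; not candlepin code.
QUARTZ_FIELDS = ["seconds", "minutes", "hours", "day-of-month", "month", "day-of-week", "year"]


def describe(expression: str) -> dict[str, str]:
    """Map each space-separated part of a Quartz expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("a Quartz cron expression has 6 or 7 fields")
    return dict(zip(QUARTZ_FIELDS, parts))


def expand_minutes_field(field: str) -> list[int]:
    """Expand a minutes field of the form '*', 'N', or 'START/STEP' into minute values."""
    if field == "*":
        return list(range(60))
    if "/" in field:
        start_text, step_text = field.split("/", 1)
        start = 0 if start_text == "*" else int(start_text)
        return list(range(start, 60, int(step_text)))
    return [int(field)]


fields = describe("0 0/5 * * * ?")
print(fields)
print(expand_minutes_field(fields["minutes"]))
```

Read this way, the expression fires at second 0 of every fifth minute (0, 5, 10, ..., 55) of every hour, so the CRL job runs every five minutes rather than once a day.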


Actual results:
Step 2 shows logs like:
2019-02-05 10:50:00,177 [thread=QuartzScheduler_Worker-12] [job=CertificateRevocationListTask-b5e0a33b-7ecb-4951-9ba6-30c87d95f73f, org=, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Starting job: org.candlepin.pinsetter.tasks.CertificateRevocationListTask
2019-02-05 10:50:00,178 [thread=QuartzScheduler_Worker-12] [job=CertificateRevocationListTask-b5e0a33b-7ecb-4951-9ba6-30c87d95f73f, org=, csid=] INFO  org.candlepin.pinsetter.tasks.CertificateRevocationListTask - Executing CRL Job. CRL filePath=/var/lib/candlepin/candlepin-crl.crl

but _without_ a completion log line like:

2019-02-04 12:00:00,110 [thread=QuartzScheduler_Worker-13] [job=CertificateRevocationListTask-f44c921f-8dc8-4928-ace1-3ebd9fb31f0c, org=, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Job completed: time=8

Step 2 also shows high CPU usage and worse latency, both worsening over time.
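
The symptom described above (a "Starting job" line with no matching "Job completed" line for the same job id) can be checked mechanically. The following is a minimal sketch, assuming the log line layout shown in this report; `find_unfinished_jobs` is a hypothetical helper, not a candlepin tool, and the sample lines are copied from this bug.

```python
import re

# Matches the job id inside "[job=<id>, org=, csid=]" as seen in the log lines above.
JOB_RE = re.compile(r"\[job=(?P<job>[^,\]]+),")


def find_unfinished_jobs(lines):
    """Return {job_id: start_timestamp} for jobs that started but never logged completion."""
    started = {}
    for line in lines:
        match = JOB_RE.search(line)
        if not match:
            continue
        job = match.group("job")
        if "Starting job:" in line:
            started[job] = line[:23]  # "YYYY-MM-DD HH:MM:SS,mmm" is 23 characters
        elif "Job completed:" in line:
            started.pop(job, None)
    return started


sample = [
    "2019-02-05 10:50:00,177 [thread=QuartzScheduler_Worker-12] [job=CertificateRevocationListTask-b5e0a33b-7ecb-4951-9ba6-30c87d95f73f, org=, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Starting job: org.candlepin.pinsetter.tasks.CertificateRevocationListTask",
    "2019-05-15 12:00:00,205 [thread=QuartzScheduler_Worker-12] [job=CertificateRevocationListTask-f6f6356c-6241-40fc-a41e-c72d38d2d2c6, org=, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Starting job: org.candlepin.pinsetter.tasks.CertificateRevocationListTask",
    "2019-05-15 12:00:00,266 [thread=QuartzScheduler_Worker-12] [job=CertificateRevocationListTask-f6f6356c-6241-40fc-a41e-c72d38d2d2c6, org=, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Job completed: time=61",
]

for job, started_at in find_unfinished_jobs(sample).items():
    print(f"{job} started at {started_at} and has no completion line")
```

On a real system the same function could be fed the lines of the candlepin log file; a job that is still running at the time of the check will also be reported, so the start timestamp should be compared against the current time.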


Expected results:
Step 2: the job completes in a reasonable time, with no significant CPU or latency impact.


Additional info:

Comment 5 Barnaby Court 2019-02-07 19:00:39 UTC
Moving to modified & tagging with candlepin-2.6.1-1 as that is the build that already contains a fix for this.

Comment 7 jcallaha 2019-05-15 17:23:36 UTC
Verified in Satellite 6.6.0 Snap 2

Set up a system with 50,011 content hosts.
I then attached 51 subscriptions to each.

I added the cron line to candlepin.conf and tailed the candlepin logs to look for the task.

Results:
The task completed within a very reasonable time (about 61 ms between the start and completion log lines).

2019-05-15 12:00:00,205 [thread=QuartzScheduler_Worker-12] [job=CertificateRevocationListTask-f6f6356c-6241-40fc-a41e-c72d38d2d2c6, org=, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Starting job: org.candlepin.pinsetter.tasks.CertificateRevocationListTask
2019-05-15 12:00:00,205 [thread=QuartzScheduler_Worker-12] [job=CertificateRevocationListTask-f6f6356c-6241-40fc-a41e-c72d38d2d2c6, org=, csid=] INFO  org.candlepin.pinsetter.tasks.CertificateRevocationListTask - Executing CRL Job. CRL filePath=/var/lib/candlepin/candlepin-crl.crl
2019-05-15 12:00:00,266 [thread=QuartzScheduler_Worker-12] [job=CertificateRevocationListTask-f6f6356c-6241-40fc-a41e-c72d38d2d2c6, org=, csid=] INFO  org.candlepin.util.CrlFileUtil - CRL sync processed a total of 0 serials.
2019-05-15 12:00:00,266 [thread=QuartzScheduler_Worker-12] [job=CertificateRevocationListTask-f6f6356c-6241-40fc-a41e-c72d38d2d2c6, org=, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Job completed: time=61
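
As a cross-check of the log excerpt above, the elapsed time can be recomputed from the first and last timestamps and compared with the reported `time=` value. This is a sketch only: it assumes the `time=` field is in milliseconds, which the log does not state explicitly, and `elapsed_ms` and `reported_time` are hypothetical helpers.

```python
import re
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_ms(start_stamp: str, end_stamp: str) -> int:
    """Whole milliseconds between two 'YYYY-MM-DD HH:MM:SS,mmm' timestamps."""
    start = datetime.strptime(start_stamp, TIMESTAMP_FORMAT)
    end = datetime.strptime(end_stamp, TIMESTAMP_FORMAT)
    return (end - start) // timedelta(milliseconds=1)


def reported_time(line: str) -> int:
    """Extract N from a 'Job completed: time=N' line (unit assumed to be milliseconds)."""
    match = re.search(r"Job completed: time=(\d+)", line)
    if match is None:
        raise ValueError("not a job completion line")
    return int(match.group(1))


start_line = "2019-05-15 12:00:00,205 [thread=QuartzScheduler_Worker-12] [job=CertificateRevocationListTask-f6f6356c-6241-40fc-a41e-c72d38d2d2c6, org=, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Starting job: org.candlepin.pinsetter.tasks.CertificateRevocationListTask"
end_line = "2019-05-15 12:00:00,266 [thread=QuartzScheduler_Worker-12] [job=CertificateRevocationListTask-f6f6356c-6241-40fc-a41e-c72d38d2d2c6, org=, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Job completed: time=61"

# The timestamp occupies the first 23 characters of each log line.
print(elapsed_ms(start_line[:23], end_line[:23]), reported_time(end_line))
```

The two values agree for this excerpt, which is consistent with the millisecond assumption.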

Comment 9 errata-xmlrpc 2019-10-22 12:47:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3172