Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 2229095 - CertificateCleanupJob fails with foreign key constraint violation on table cp_certificate
Summary: CertificateCleanupJob fails with foreign key constraint violation on table cp...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Candlepin
Version: 6.11.5
Hardware: Unspecified
OS: Unspecified
high
urgent
Target Milestone: 6.15.0
Assignee: Barnaby Court
QA Contact: Shweta Singh
URL:
Whiteboard:
Depends On: 2231503 2234486 2256950 2256951
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-08-04 06:16 UTC by Alexey Masolov
Modified: 2024-05-02 01:42 UTC (History)
12 users (show)

Fixed In Version: candlepin-4.3.12-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2231503 2256950 2256951 2266145 (view as bug list)
Environment:
Last Closed: 2024-04-23 17:12:03 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CANDLEPIN-643 0 None None None 2023-08-04 13:56:34 UTC
Red Hat Issue Tracker CANDLEPIN-741 0 None None None 2023-12-12 12:25:11 UTC
Red Hat Issue Tracker RHINRULE-573 0 None None None 2023-12-07 16:33:29 UTC
Red Hat Issue Tracker SAT-20446 0 None None None 2023-09-27 11:27:48 UTC
Red Hat Knowledge Base (Solution) 7048429 0 None None None 2023-12-07 15:20:44 UTC
Red Hat Product Errata RHSA-2024:2010 0 None None None 2024-04-23 17:12:06 UTC

Description Alexey Masolov 2023-08-04 06:16:55 UTC
Description of problem:

On a very loaded system "CertificateCleanupJob" job shows following error:

2023-08-03 17:45:22,829 [thread=Thread-3 (ActiveMQ-client-global-threads)] [job=8a35b3cf89b9b7aa0189bb44de043b1b, job_key=CertificateCleanupJob, org=, csid=] WARN  org.hibernate.engine.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 23503
2023-08-03 17:45:22,829 [thread=Thread-3 (ActiveMQ-client-global-threads)] [job=8a35b3cf89b9b7aa0189bb44de043b1b, job_key=CertificateCleanupJob, org=, csid=] ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper - ERROR: update or delete on table "cp_cert_serial" violates foreign key constraint "fk16370c54b9410fc" on table "cp_certificate"
  Detail: Key (id)=(1298353109733755480) is still referenced from table "cp_certificate".
2023-08-03 17:45:22,840 [thread=Thread-3 (ActiveMQ-client-global-threads)] [job=8a35b3cf89b9b7aa0189bb44de043b1b, job_key=CertificateCleanupJob, org=, csid=] ERROR org.candlepin.async.JobManager - Job "CertificateCleanupJob" failed in 20722815ms

Version-Release number of selected component (if applicable):
Satellite 6.11

Comment 4 William Poteat 2023-08-04 14:17:41 UTC
Can I get a database dump for this?

Comment 15 William Poteat 2023-08-15 13:53:01 UTC
Then certificates that are not yet expired but have been revoked can be used to retrieve content.

What specific performance is currently affected? Deleting rows with no concrete justification is not advisable.

If there are rows that the job is not removing, they are there for a reason.

Comment 22 Pavel Moravec 2023-12-07 09:07:56 UTC
My understanding is there are two issues behind this BZ, foreign key error and performance:


1) CertificateCleanupJob fails with foreign key constraint error:

ERROR: update or delete on table "cp_cert_serial" violates foreign key constraint "fk16370c54b9410fc" on table "cp_certificate"
  Detail: Key (id)=(1298353109733755480) is still referenced from table "cp_certificate".

Do we need a reproducer for this? Is this error acknowledged to be fixed, or is it recommended to just clean some redundant certs?

For the workaround / cleanup: what is the right command? Is it the

su - postgres -c "psql candlepin -c \"delete from cp_certificate where id in (select c.id as certificate_id from cp_cert_serial cs inner join cp_certificate c on c.serial_id = cs.id where cs.revoked = 't');\""

one?


2) Independent problem is the performance: deleting hundreds of thousands (or millions) of such certs takes hours to days what makes candlepin unresponsive during that time.

Is this planned to be fixed? What indices can be added as a workaround - and do we need to remove them before an upgrade (that would, as a part of fix, try to add them and re-adding an index would fail and halt the upgrade, potentially)? Is it the:

 ~~~
 CREATE INDEX cp_cont_access_cert_serial_id_idx on cp_cont_access_cert(serial_id); 
 CREATE INDEX cp_ueber_cert_serial_id_idx on cp_ueber_cert(serial_id); 
 ~~~

 Hao's created the index below too:
 ~~~
 CREATE INDEX cp_consumer_cont_acc_cert_id_idx on cp_consumer(cont_acc_cert_id);
 ~~~

?


(we should write KCS for it but I would like to double-check its content before writing)


(please consider the questions "will it be fixed" as informative, not pushing for specific fix/solution)

Comment 23 William Poteat 2023-12-07 13:31:21 UTC
Cleaning the data on an ad hoc basis is the plan. There is no need to write code to clean these rows. The current code does not leave them and it would not be an effective use of anyone's time.


The indices are not a workaround, they are the fix. They have been added to the current Candlepin for use in Satellite. No patch will be produced to create these indices. They should be applied to a database as needed.

Comment 24 Pavel Moravec 2023-12-07 15:20:45 UTC
Thanks for confirmation, KCS created: https://access.redhat.com/solutions/7048429

Comment 25 Pavel Moravec 2023-12-08 07:53:48 UTC
I am linking support cases where I found (by some limited search) sosreport shows the same error affects these Satellites. Despite the cases were opened for different issues, these customers are affected by this problem as well.

Comment 26 Nikos Moumoulidis 2023-12-12 12:16:51 UTC
To add even more context to Pavel's comment above (#c22), after further investigation:

This issue is really multiple issues:
1. A foreign key violation against fk54b0f288a0b39916 (referencing table cp_upstream_consumer) that was already fixed in https://bugzilla.redhat.com/show_bug.cgi?id=2084130
2. The job being very slow due to lack of appropriate indices.
3. A foreign key violation against fk16370c54b9410fc (referencing table cp_certificate) that has not yet to be addressed. This violation should have never happened under
   normal circumstances because certificates in that table should never be revoked. However, an old db migration script (https://github.com/candlepin/candlepin/blob/6f1a83ced858abc560666af90ecc659f138bb338/src/main/resources/db/changelog/20170301083925-update-revoked-cert-serial-data.xml#L24-L32) accidentally set those to revoked, and once they also were expired, this fk violation started happening.

We will use this BZ id to fix both issues 2 and 3, and also prevent any FK violations from happening in the future due to rogue bad data.

Comment 33 errata-xmlrpc 2024-04-23 17:12:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2010


Note You need to log in before you can comment on or make changes to this bug.