Bug 2229095
Summary: | CertificateCleanupJob fails with foreign key constraint violation on table cp_certificate | |||
---|---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Alexey Masolov <amasolov> | |
Component: | Candlepin | Assignee: | Barnaby Court <bcourt> | |
Status: | CLOSED ERRATA | QA Contact: | Shweta Singh <shwsingh> | |
Severity: | urgent | Docs Contact: | ||
Priority: | high | |||
Version: | 6.11.5 | CC: | ahumbe, alsouza, hyu, jpasqual, momran, nmoumoul, pmoravec, rlavi, saydas, shwsingh, snemeth, zhunting | |
Target Milestone: | 6.15.0 | Keywords: | PrioBumpGSS, Triaged, WorkAround | |
Target Release: | Unused | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | candlepin-4.3.12-1 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2231503 2256950 2256951 2266145 (view as bug list) | Environment: | ||
Last Closed: | 2024-04-23 17:12:03 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 2231503, 2234486, 2256950, 2256951 | |||
Bug Blocks: |
Description
Alexey Masolov
2023-08-04 06:16:55 UTC
Can I get a database dump for this? Then certificates that are not yet expired but have been revoked can be used to retrieve content. What specific performance is currently affected? Deleting rows with no concrete justification is not advisable. If there are rows that the job is not removing, they are there for a reason. My understanding is there are two issues behind this BZ, foreign key error and performance: 1) CertificateCleanupJob fails with foreign key constraint error: ERROR: update or delete on table "cp_cert_serial" violates foreign key constraint "fk16370c54b9410fc" on table "cp_certificate" Detail: Key (id)=(1298353109733755480) is still referenced from table "cp_certificate". Do we need a reproducer for this? Is this error acknowledged to be fixed, or is it recommended to just clean some redundant certs? For the workaround / cleanup: what is the right command? Is it the su - postgres -c "psql candlepin -c \"delete from cp_certificate where id in (select c.id as certificate_id from cp_cert_serial cs inner join cp_certificate c on c.serial_id = cs.id where cs.revoked = 't');\"" one? 2) Independent problem is the performance: deleting hundreds of thousands (or millions) of such certs takes hours to days what makes candlepin unresponsive during that time. Is this planned to be fixed? What indices can be added as a workaround - and do we need to remove them before an upgrade (that would, as a part of fix, try to add them and re-adding an index would fail and halt the upgrade, potentially)? Is it the: ~~~ CREATE INDEX cp_cont_access_cert_serial_id_idx on cp_cont_access_cert(serial_id); CREATE INDEX cp_ueber_cert_serial_id_idx on cp_ueber_cert(serial_id); ~~~ Hao's created the index below too: ~~~ CREATE INDEX cp_consumer_cont_acc_cert_id_idx on cp_consumer(cont_acc_cert_id); ~~~ ? (we should write KCS for it but I would like to double-check its content before writing) (please consider the questions "will it be fixed" as informative, not pushing for specific fix/solution) Cleaning the data on an ad hoc basis is the plan. There is no need to write code to clean these rows. The current code does not leave them and it would not be an effective use of anyone's time. The indices are not a workaround, they are the fix. They have been added to the current Candlepin for use in Satellite. No patch will be produced to create these indices. They should be applied to a database as needed. Thanks for confirmation, KCS created: https://access.redhat.com/solutions/7048429 I am linking support cases where I found (by some limited search) sosreport shows the same error affects these Satellites. Despite the cases were opened for different issues, these customers are affected by this problem as well. To add even more context to Pavel's comment above (#c22), after further investigation: This issue is really multiple issues: 1. A foreign key violation against fk54b0f288a0b39916 (referencing table cp_upstream_consumer) that was already fixed in https://bugzilla.redhat.com/show_bug.cgi?id=2084130 2. The job being very slow due to lack of appropriate indices. 3. A foreign key violation against fk16370c54b9410fc (referencing table cp_certificate) that has not yet to be addressed. This violation should have never happened under normal circumstances because certificates in that table should never be revoked. However, an old db migration script (https://github.com/candlepin/candlepin/blob/6f1a83ced858abc560666af90ecc659f138bb338/src/main/resources/db/changelog/20170301083925-update-revoked-cert-serial-data.xml#L24-L32) accidentally set those to revoked, and once they also were expired, this fk violation started happening. We will use this BZ id to fix both issues 2 and 3, and also prevent any FK violations from happening in the future due to rogue bad data. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:2010 |