1980023 – satellite-installer times out during long running SQL DELETE transactions

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	Candlepin
Sub Component:
Version:	6.9.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	low
Target Milestone:	6.11.0
Assignee:	satellite6-bugs
QA Contact:	Lai
Docs Contact:
URL:
Whiteboard:
Depends On:	1980370 1980678 1980681
Blocks:

Reported:	2021-07-07 15:39 UTC by Taft Sanders
Modified:	2024-12-20 20:25 UTC
CC List:	4 users
Fixed In Version:	candlepin-3.1.31-1, candlepin-4.0.7-1, candlepin-4.1.6-1
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1980370 1980678 1980681
Environment:
Last Closed:	2022-07-05 14:29:32 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2022:5498	0	None	None	None	2022-07-05 14:29:45 UTC

Description Taft Sanders 2021-07-07 15:39:03 UTC

Description of problem:
The default satellite-installer timeout is fairly short (300 seconds in the log below). DELETE statements on PostgreSQL have run longer than this timeout, causing the installer to fail and requiring it to be run again. Could DELETE be replaced with TRUNCATE where possible in DB updates/migrations/changes?

Version-Release number of selected component (if applicable):
candlepin-3.1.28-1

How reproducible:
n/a

Steps to Reproduce:
1.
2.
3.

Actual results:
/var/log/foreman-installer/satellite.log:
[DEBUG 2021-07-04T19:28:32 main]  Exec[cpdb update](provider=posix): Executing 'cpdb --update --dbhost=192.168.2.5 --dbport=5432 --database='candlepin' --user='candlepin' --password='candlepin' >> /var/log/candlepin/cpdb.log 2>&1 && touch /var/lib/candlepin/cpdb_update_done'
[ERROR 2021-07-04T19:33:32 main]  Command exceeded timeout
[ERROR 2021-07-04T19:33:32 main]  /Stage[main]/Candlepin::Database::Postgresql/Exec[cpdb update]/returns: change from 'notrun' to ['0'] failed: Command exceeded timeout
[ INFO 2021-07-04T19:33:32 main]  /Stage[main]/Candlepin::Database::Postgresql/Exec[cpdb update]: Evaluated in 300.01 seconds


postgres-<DAY>.log:
postgresql-Sun.log:2021-07-04 20:24:46 CEST STATEMENT:  DELETE FROM public.cp_cont_access_cert
postgresql-Sun.log:2021-07-04 22:45:06 CEST LOG:  duration: 8242870.193 ms  execute <unnamed>: DELETE FROM public.cp_cont_access_cert
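
To put the two log excerpts side by side, here is a minimal sketch that compares the time the installer waited with the duration PostgreSQL reported for the DELETE. All values are copied from the logs above; the variable names are only illustrative.

```python
from datetime import datetime

# Timestamps copied from the satellite.log excerpt above.
started = datetime.fromisoformat("2021-07-04T19:28:32")
timed_out = datetime.fromisoformat("2021-07-04T19:33:32")
installer_timeout_s = (timed_out - started).total_seconds()

# Duration PostgreSQL logged for the DELETE, converted from ms to seconds.
delete_duration_s = 8242870.193 / 1000

# Break the DELETE duration into hours, minutes and seconds.
hours, remainder = divmod(delete_duration_s, 3600)
minutes, seconds = divmod(remainder, 60)

print(f"installer gave up after {installer_timeout_s:.0f} s")
print(f"DELETE ran for {int(hours)} h {int(minutes)} min {seconds:.0f} s")
print(f"ratio: {delete_duration_s / installer_timeout_s:.1f}x the timeout")
```

In other words, the DELETE needed roughly 2 hours 17 minutes, about 27 times longer than the 300 seconds the installer waited for `cpdb --update`.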


Additional tries to run the installer while this DELETE was left running after the installer timed out resulted in the following errors in the cpdb.log:
########## ERROR ############
Error running command: /usr/share/candlepin/liquibase.sh --driver=org.postgresql.Driver --classpath=/var/lib/tomcat/webapps/candlepin/WEB-INF/lib/postgresql-42.2.2.jar:/var/lib/tomcat/webapps/candlepin/WEB-INF/classes/ --changeLogFile=db/changelog/changelog-update.xml --url="jdbc:postgresql://192.168.2.5:5432/candlepin" --username=$DBUSERNAME --password=$DBPASSWORD --logLevel=severe migrate -Dcommunity=False
Status code: 65280
Command output: SEVERE 7/4/21 7:26 PM:liquibase: db/changelog/20201103112757-purge-current-sca-certs.xml::20201103112757-1::crog: Could not release lock
liquibase.exception.LockException: liquibase.exception.UnexpectedLiquibaseException: liquibase.exception.DatabaseException: org.postgresql.util.PSQLException: This connection has been closed.
        at liquibase.lockservice.StandardLockService.releaseLock(StandardLockService.java:234)
--snip--
        ... 27 more
Liquibase update Failed: Migration failed for change set db/changelog/20201103112757-purge-current-sca-certs.xml::20201103112757-1::crog:
     Reason: liquibase.exception.UnexpectedLiquibaseException: org.postgresql.util.PSQLException: This connection has been closed.

Expected results:
A TRUNCATE on this table would have completed faster and allowed the installer to continue. I assume this is known and that DELETE was used deliberately for a specific reason. Filing an RFE just to be sure.

Additional info:

Comment 2 Mike McCune 2021-08-25 13:59:40 UTC

Moving this to a bug, as we need to handle these longer-running migration tasks better, both from the DB perspective and at the installer level.

Comment 5 Danny Synk 2022-06-15 17:56:34 UTC

The fix to this issue was verified by Candlepin dev in [1]. Satellite QE has observed no regressions of this behavior in Satellite 6.11 running on either RHEL 7 or RHEL 8. 

[1] https://github.com/candlepin/candlepin/pull/3099#pullrequestreview-725662506

Comment 8 errata-xmlrpc 2022-07-05 14:29:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498
