Bug 1980023

Summary:	satellite-installer times out during long running SQL DELETE transactions
Product:	Red Hat Satellite	Reporter:	Taft Sanders <tasander>
Component:	Candlepin	Assignee:	satellite6-bugs <satellite6-bugs>
Status:	CLOSED ERRATA	QA Contact:	Lai <ltran>
Severity:	low	Docs Contact:
Priority:	unspecified
Version:	6.9.0	CC:	dsynk, ehelms, mmccune, redakkan
Target Milestone:	6.11.0	Keywords:	Triaged
Target Release:	Unused
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	candlepin-3.1.31-1, candlepin-4.0.7-1, candlepin-4.1.6-1	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1980370 1980678 1980681		Environment:
Last Closed:	2022-07-05 14:29:32 UTC	Type:	Bug
Bug Depends On:	1980370, 1980678, 1980681
Bug Blocks:

Description Taft Sanders 2021-07-07 15:39:03 UTC

Description of problem:
The default satellite-installer timeout is not very long. DELETE statements on PostgreSQL have run longer than that timeout, causing the installer to fail and have to be run again. Could DELETE be replaced with TRUNCATE where possible in DB updates/migrations/changes?

Version-Release number of selected component (if applicable):
candlepin-3.1.28-1

How reproducible:
n/a

Actual results:
/var/log/foreman-installer/satellite.log:
[DEBUG 2021-07-04T19:28:32 main]  Exec[cpdb update](provider=posix): Executing 'cpdb --update --dbhost=192.168.2.5 --dbport=5432 --database='candlepin' --user='candlepin' --password='candlepin' >> /var/log/candlepin/cpdb.log 2>&1 && touch /var/lib/candlepin/cpdb_update_done'
[ERROR 2021-07-04T19:33:32 main]  Command exceeded timeout
[ERROR 2021-07-04T19:33:32 main]  /Stage[main]/Candlepin::Database::Postgresql/Exec[cpdb update]/returns: change from 'notrun' to ['0'] failed: Command exceeded timeout
[ INFO 2021-07-04T19:33:32 main]  /Stage[main]/Candlepin::Database::Postgresql/Exec[cpdb update]: Evaluated in 300.01 seconds


postgres-<DAY>.log:
postgresql-Sun.log:2021-07-04 20:24:46 CEST STATEMENT:  DELETE FROM public.cp_cont_access_cert
postgresql-Sun.log:2021-07-04 22:45:06 CEST LOG:  duration: 8242870.193 ms  execute <unnamed>: DELETE FROM public.cp_cont_access_cert
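
To put the two logs above side by side, here is a minimal Python sketch (not part of the original report; the helper names are hypothetical and the first sample line is abbreviated) that reads the installer timestamps and the PostgreSQL "duration" value and compares them. It assumes the log prefixes look exactly like the excerpts quoted above.

```python
import re
from datetime import datetime, timedelta

# Timestamp inside the installer's "[LEVEL 2021-07-04T19:28:32 main]" prefix.
INSTALLER_TS = re.compile(r"\[\s*\w+\s+(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s")
# "duration: <milliseconds> ms" as it appears in the PostgreSQL log excerpt.
PG_DURATION = re.compile(r"duration:\s*([\d.]+)\s*ms")


def installer_timestamp(line: str) -> datetime:
    """Extract the timestamp from one satellite.log line (hypothetical helper)."""
    match = INSTALLER_TS.search(line)
    if match is None:
        raise ValueError(f"no installer timestamp in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")


def pg_duration(line: str) -> timedelta:
    """Extract the statement duration from one PostgreSQL log line (hypothetical helper)."""
    match = PG_DURATION.search(line)
    if match is None:
        raise ValueError(f"no duration in: {line!r}")
    return timedelta(milliseconds=float(match.group(1)))


# Sample lines copied from the excerpts above (first one abbreviated).
started = "[DEBUG 2021-07-04T19:28:32 main]  Exec[cpdb update](provider=posix): Executing 'cpdb --update'"
failed = "[ERROR 2021-07-04T19:33:32 main]  Command exceeded timeout"
delete = ("postgresql-Sun.log:2021-07-04 22:45:06 CEST LOG:  duration: 8242870.193 ms  "
          "execute <unnamed>: DELETE FROM public.cp_cont_access_cert")

timeout_window = installer_timestamp(failed) - installer_timestamp(started)
delete_duration = pg_duration(delete)

print("installer gave up after:", timeout_window)
print("DELETE ran for:", delete_duration)
print("ratio:", round(delete_duration / timeout_window, 1))
```

With these inputs the installer window comes out at 300 seconds, while the DELETE ran for roughly 2 hours 17 minutes, so the statement outlived the installer's timeout many times over.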


Further attempts to run the installer while this DELETE was still running (after the installer had timed out) produced the following errors in cpdb.log:
########## ERROR ############
Error running command: /usr/share/candlepin/liquibase.sh --driver=org.postgresql.Driver --classpath=/var/lib/tomcat/webapps/candlepin/WEB-INF/lib/postgresql-42.2.2.jar:/var/lib/tomcat/webapps/candlepin/WEB-INF/classes/ --changeLogFile=db/changelog/changelog-update.xml --url="jdbc:postgresql://192.168.2.5:5432/candlepin" --username=$DBUSERNAME --password=$DBPASSWORD --logLevel=severe migrate -Dcommunity=False
Status code: 65280
Command output: SEVERE 7/4/21 7:26 PM:liquibase: db/changelog/20201103112757-purge-current-sca-certs.xml::20201103112757-1::crog: Could not release lock
liquibase.exception.LockException: liquibase.exception.UnexpectedLiquibaseException: liquibase.exception.DatabaseException: org.postgresql.util.PSQLException: This connection has been closed.
        at liquibase.lockservice.StandardLockService.releaseLock(StandardLockService.java:234)
--snip--
        ... 27 more
Liquibase update Failed: Migration failed for change set db/changelog/20201103112757-purge-current-sca-certs.xml::20201103112757-1::crog:
     Reason: liquibase.exception.UnexpectedLiquibaseException: org.postgresql.util.PSQLException: This connection has been closed.

Expected results:
A TRUNCATE action for this table would have completed faster and allowed the installer to keep moving along. I assume this is known and a DELETE was explicitly used here as it was necessary for a particular reason. Filing an RFE just to be sure.

Additional info:

Comment 2 Mike McCune 2021-08-25 13:59:40 UTC

Moving this to a bug, as we need to handle these longer-running migration tasks better, both from the DB perspective and at the installer level.

Comment 5 Danny Synk 2022-06-15 17:56:34 UTC

The fix for this issue was verified by Candlepin dev in [1]. Satellite QE has observed no regressions of this behavior in Satellite 6.11 running on either RHEL 7 or RHEL 8.

[1] https://github.com/candlepin/candlepin/pull/3099#pullrequestreview-725662506

Comment 8 errata-xmlrpc 2022-07-05 14:29:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498