Bug 1904963 - Command exceeded timeout while Installer executes foreman-rake db:migrate
Summary: Command exceeded timeout while Installer executes foreman-rake db:migrate
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Content Views
Version: 6.8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: 6.8.3
Assignee: Justin Sherrill
QA Contact: Lai
URL:
Whiteboard:
Depends On:
Blocks: 1908237
 
Reported: 2020-12-07 09:05 UTC by Evgeni Golov
Modified: 2024-03-25 17:25 UTC
CC List: 10 users

Fixed In Version: tfm-rubygem-katello-3.16.0.18-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1908237 (view as bug list)
Environment:
Last Closed: 2021-01-28 19:40:57 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 31540 0 High Closed Command exceeded timeout while Installer executes foreman-rake db:migrate 2021-02-19 19:53:05 UTC
Red Hat Product Errata RHBA-2021:0302 0 None None None 2021-01-28 19:41:09 UTC

Description Evgeni Golov 2020-12-07 09:05:42 UTC
Description of problem:
During an upgrade of 6.7.5 to 6.8.2 (snap 2) the db:migrate step in the installer fails, as it doesn't finish in 5 minutes (which is the default timeout for Puppet exec).

[DEBUG 2020-12-04T13:14:09 main]  Exec[foreman-rake-db:migrate](provider=posix): Executing '/usr/sbin/foreman-rake db:migrate'
[DEBUG 2020-12-04T13:14:09 main]  Executing with uid=foreman: '/usr/sbin/foreman-rake db:migrate'
[ERROR 2020-12-04T13:19:09 main]  Command exceeded timeout

One interesting detail is that Puppet does *NOT* kill the process, so the db:migrate keeps running in the background, and trying to re-run the installer results in other, confusing errors (ActiveRecord::ConcurrentMigrationError, Foreman::PermissionMissingException -- the latter seems to happen on the "unless" command).
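
For illustration, a rough Ruby sketch of why the second run trips up (this is not Rails or Foreman source, and the lock id below is made up): Rails guards db:migrate with a PostgreSQL advisory lock, so a migrator started while the orphaned one still holds the lock raises ActiveRecord::ConcurrentMigrationError instead of migrating twice.

require "active_record"

# Illustrative sketch only, not Rails/Foreman code.
def migrate_with_advisory_lock(connection)
  lock_id = 1234567890 # hypothetical; Rails derives its value from the database name
  unless connection.get_advisory_lock(lock_id)
    # This is the error seen when re-running the installer while the
    # orphaned foreman-rake db:migrate still holds the lock.
    raise ActiveRecord::ConcurrentMigrationError
  end
  begin
    yield # run the pending migrations
  ensure
    connection.release_advisory_lock(lock_id)
  end
end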

The migration run took roughly 6 hours on my setup, and I could finish the upgrade just fine after that.

Looking at the PostgreSQL activity during the migration, I've seen the following query over and over again:

postgres=# select backend_start,xact_start,query_start,query from pg_stat_activity where state='active';

 2020-12-04 13:15:07.457939-05 | 2020-12-04 13:15:08.027364-05 | 2020-12-04 14:07:56.01994-05  | SELECT COUNT(*) FROM "katello_rpms" WHERE "katello_rpms"."id" IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms" WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories" WHERE "katello_repositories"."content_view_version_id" = $1 AND (environment_id is NULL)))

As the migrations ran without a terminal attached, I don't know *which* migration triggered that, but looking at the Katello migrations between 6.7.5 and 6.8.2, I think it could be one of these:
db/migrate/20191204214919_add_content_view_version_counts.rb (most probably!)
db/migrate/20200129172534_add_epoch_version_release_arch_to_katello_installed_packages.rb
db/migrate/20200501155054_installed_package_unique_nvrea.rb


I think there are actually two bugs here:
1. the installer should not assume db:migrate can finish in 5 minutes; this is not realistic.
2. the migration should not require 6 hours to finish.

Version-Release number of selected component (if applicable):
Satellite 6.8.2 Snap 2

How reproducible:
100%


Steps to Reproduce:
1. take a 6.7.5 dogfood server (big db, rather slow-ish storage)
2. try to upgrade to 6.8.2

Actual results:
Fails

Expected results:
Succeeds ;-)

Additional info:
This is probably related to, but not exactly the same as, bug #1888983

Comment 1 Evgeni Golov 2020-12-09 06:49:57 UTC
When running the migrations outside the installer, I could confirm which migration is the culprit:
== 20191204214919 AddContentViewVersionCounts: migrated (27884.8553s) =========

Comment 3 Tomer Brisker 2020-12-15 15:13:05 UTC
This migration was added in the fix for BZ 1486446. It looks like the migration first pulls all CVVs into memory, then iterates over them, triggering multiple count queries for each CVV (of which there could be thousands), then saves them running all callbacks, which is also potentially slow. The migration should likely be changed to update the columns directly and possibly to calculate the totals in a different, more efficient manner.
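
Roughly, the difference would be something like the sketch below (the :rpm_count column and the SQL details are stand-ins, not the actual Katello migration): instead of instantiating every CVV, issuing several COUNT queries per record, and saving through callbacks, the counter columns could be filled with a single set-based UPDATE.

# Illustrative sketch only; :rpm_count and the SQL below are stand-ins,
# not the real Katello counter columns or migration code.
class AddContentViewVersionCounts < ActiveRecord::Migration[6.0]
  def up
    add_column :katello_content_view_versions, :rpm_count, :integer

    # Slow pattern (roughly what is described above): load every CVV, run
    # several COUNT queries per record, then save through all callbacks.
    #   Katello::ContentViewVersion.all.each do |v|
    #     v.update!(rpm_count: v.rpms.count)
    #   end

    # Set-based alternative: let PostgreSQL compute every count in one pass
    # and write the columns directly, skipping model callbacks entirely.
    execute(<<~SQL)
      UPDATE katello_content_view_versions cvv
      SET rpm_count = counts.n
      FROM (
        SELECT r.content_view_version_id AS cvv_id,
               COUNT(DISTINCT rr.rpm_id) AS n
        FROM katello_repositories r
        JOIN katello_repository_rpms rr ON rr.repository_id = r.id
        WHERE r.environment_id IS NULL
        GROUP BY r.content_view_version_id
      ) counts
      WHERE cvv.id = counts.cvv_id
    SQL
  end
end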

Comment 5 Justin Sherrill 2020-12-21 17:11:45 UTC
Created redmine issue https://projects.theforeman.org/issues/31540 from this bug

Comment 6 Bryan Kearney 2020-12-24 21:43:26 UTC
Upstream bug assigned to jsherril

Comment 7 Bryan Kearney 2020-12-24 21:43:28 UTC
Upstream bug assigned to jsherril

Comment 8 Bryan Kearney 2021-01-08 13:43:52 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/31540 has been resolved.

Comment 12 Justin Sherrill 2021-01-20 16:06:27 UTC
clearing needinfo

Comment 15 Mike McCune 2021-01-26 16:15:41 UTC
1) Cloned 6.7 dogfood server

2) Configured 6.8.3 snaps

3) upgraded to 6.8.3 with satellite-maintain

4) verified the db:migrate passed without error

VERIFIED

Comment 19 errata-xmlrpc 2021-01-28 19:40:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Satellite 6.8.3 Async Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0302

