Description of problem: During an upgrade of 6.7.5 to 6.8.2 (snap 2) the db:migrate step in the installer fails, as it doesn't finish in 5 minutes (which is the default timeout for Puppet exec). [DEBUG 2020-12-04T13:14:09 main] Exec[foreman-rake-db:migrate](provider=posix): Executing '/usr/sbin/foreman-rake db:migrate' [DEBUG 2020-12-04T13:14:09 main] Executing with uid=foreman: '/usr/sbin/foreman-rake db:migrate' [ERROR 2020-12-04T13:19:09 main] Command exceeded timeout One interesting detail is that Puppet does *NOT* kill the process, so the db:migrate is running in the background and tries to re-run the installer will result in other, confusing errors (ActiveRecord::ConcurrentMigrationError, Foreman::PermissionMissingException -- the later seems to happen on the "unless" command) The migration run took roughtly 6 hours on my setup, and I could finish the upgrade just fine after that. Looking at the PostgreSQL activity during the migration, I've seen the following query over and over again: postgres=# select backend_start,xact_start,query_start,query from pg_stat_activity where state='active'; 2020-12-04 13:15:07.457939-05 | 2020-12-04 13:15:08.027364-05 | 2020-12-04 14:07:56.01994-05 | SELECT COUNT(*) FROM "katello_rpms" WHERE "katello_rpms"."id" IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_r pms" WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories" WHERE "katello_repositories"."content_view_version_id" = $1 AND (environment_id is NULL))) As the migrations ran without a terminal attached, I don't know *which* migration triggered that, but looking at the Katello migrations between 6.7.5 and 6.8.2, I think it could be one of these: db/migrate/20191204214919_add_content_view_version_counts.rb (most probably!) db/migrate/20200129172534_add_epoch_version_release_arch_to_katello_installed_packages.rb db/migrate/20200501155054_installed_package_unique_nvrea.rb I think there are actually two bugs here: 1. the installer should not assume db:migrate can finish in 5 minutes, this is not realistic. 2. the migration should not require 6 hours to finish. Version-Release number of selected component (if applicable): Satellite 6.8.2 Snap 2 How reproducible: 100% Steps to Reproduce: 1. take a 6.7.5 dogfood server (big db, rather slow-ish storage) 2. try to upgrade to 6.8.2 Actual results: Fails Expected results: Succeeds ;-) Additional info: This is probably related, but not exactly the same as #1888983
When running the migrations outside the installer, I could confirm which migration is the culprit: == 20191204214919 AddContentViewVersionCounts: migrated (27884.8553s) =========
This migration was added in the fix for BZ 1486446, looks like the migration first pulls all CVVs to memory then iterates over them triggering multiple count queries for each CVV (which could be thousands), than saves them running all callbacks, which is also potentially slow. The migration should likely be changed to update the columns directly and possibly calculate the totals in a different more efficient manner.
Created redmine issue https://projects.theforeman.org/issues/31540 from this bug
Upstream bug assigned to jsherril
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/31540 has been resolved.
clearing needinfo
1) Cloned 6.7 dogfood server 2) Configured 6.8.3 snaps 3) upgraded to 6.8.3 with satellite-maintain 4) verified the db:migrate passed without error VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Satellite 6.8.3 Async Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0302