During an upgrade from public beta to an internal GA build I had pre-conditions such that the upgrade failed various steps but we never let the user know. It appears that the upgrade was OK but looking at the logs, lots of steps failed. We need to properly detect these failures and report.

[root@sat-perf-01 ~]# katello-installer --upgrade
Upgrading...
Upgrade Step: stop_services...
Upgrade Step: start_mongo...
Upgrade Step: migrate_pulp...
Upgrade Step: migrate_candlepin...
Upgrade Step: migrate_foreman...
Upgrade Step: Running installer...
Installing Done [100%] [..................................................]
The full log is at /var/log/katello-installer/katello-installer.log
Upgrade Step: migrate_pulp...
Upgrade Step: Restarting services...
Upgrade Step: db:seed...
Upgrade Step: Running errata import task (this may take a while)...
Upgrade Step: Update gpg key urls to support capsule isolation (this may take a while)...
Upgrade Step: Update repositories to specify metadata_expire (this may take a while)...
Katello upgrade completed!
[root@sat-perf-01 ~]# hammer ping
Error: Request Timeout
[root@sat-perf-01 ~]# hammer ping
Error: Request Timeout

# grep -C 5 ERROR /var/log/katello-installer/katello-installer.log
Seeding /opt/rh/ruby193/root/usr/share/gems/gems/foreman_bootdisk-4.0.2.13/db/seeds.d/50-bootdisk_templates.rb
Seeding /opt/rh/ruby193/root/usr/share/gems/gems/foreman_discovery-2.0.0.17/db/seeds.d/60_discovery_proxy_feature.rb
All seed files executed
[ INFO 2015-07-09 15:10:29 main] Upgrade Step: Running errata import task (this may take a while)...
[ERROR 2015-07-09 15:12:58 main] rake aborted!
500 Internal Server Error
Tasks: TOP => katello:upgrades:2.1:import_errata
(See full trace by running task with --trace)
Importing Errata
[ INFO 2015-07-09 15:12:58 main] Upgrade Step: Update gpg key urls to support capsule isolation (this may take a while)...
[DEBUG 2015-07-09 15:13:27 main] Importing GPG Key Urls to support Capsule Communication
[ INFO 2015-07-09 15:13:27 main] Upgrade Step: Update repositories to specify metadata_expire (this may take a while)...
[ERROR 2015-07-09 15:13:58 main] rake aborted!
There was an issue with the backend service candlepin: 404 Resource Not Found
Tasks: TOP => katello:upgrades:2.2:update_metadata_expire
(See full trace by running task with --trace)
Updating Expire Metadata for Custom Content
[ INFO 2015-07-09 15:13:58 main] Katello upgrade completed!
[DEBUG 2015-07-09 15:13:58 main] Hook /usr/share/katello-installer/hooks/post/30-upgrade.rb returned [<Logging::Logger:0xda1358 name="main">, <Logging::Logger:0xdd7d68 name="fatal">]
[ INFO 2015-07-09 15:13:58 main] All hooks in group post finished
[DEBUG 2015-07-09 15:13:58 main] Exit with status code: 2 (signal was 2)
[ERROR 2015-07-09 15:13:58 main] Repeating errors encountered during run:
[ERROR 2015-07-09 15:13:58 main] rake aborted!
500 Internal Server Error
Tasks: TOP => katello:upgrades:2.1:import_errata
(See full trace by running task with --trace)
Importing Errata
[ERROR 2015-07-09 15:13:58 main] rake aborted!
There was an issue with the backend service candlepin: 404 Resource Not Found
Tasks: TOP => katello:upgrades:2.2:update_metadata_expire
(See full trace by running task with --trace)
Updating Expire Metadata for Custom Content
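For anyone triaging a log like the one above, here is a rough Python sketch that pairs each ERROR entry with the "Upgrade Step:" line that preceded it. It is not part of katello-installer; the function name `failed_steps` and the regular expression are illustrative, and the entry format is assumed from the excerpt in this report only.

```python
import re

# Assumed entry format, taken from the excerpt in this report:
#   [ERROR 2015-07-09 15:12:58 main] rake aborted!
ENTRY = re.compile(
    r"^\[\s*(?P<level>[A-Z]+) \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w+\] (?P<msg>.*)$"
)
STEP_PREFIX = "Upgrade Step: "


def failed_steps(lines):
    """Return (step, error message) pairs for ERROR entries logged during a step."""
    current_step = None
    failures = []
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            continue  # continuation lines (e.g. rake output) carry no level prefix
        level, msg = match.group("level"), match.group("msg")
        if level == "INFO" and msg.startswith(STEP_PREFIX):
            current_step = msg[len(STEP_PREFIX):].rstrip(".")
        elif level == "INFO":
            current_step = None  # any other INFO entry ends the current step
        elif level == "ERROR" and current_step is not None:
            failures.append((current_step, msg))
    return failures


sample = """\
[ INFO 2015-07-09 15:10:29 main] Upgrade Step: Running errata import task (this may take a while)...
[ERROR 2015-07-09 15:12:58 main] rake aborted!
500 Internal Server Error
[ INFO 2015-07-09 15:13:58 main] Katello upgrade completed!
[ERROR 2015-07-09 15:13:58 main] Repeating errors encountered during run:
""".splitlines()

for step, error in failed_steps(sample):
    print(f"FAILED: {step}: {error}")
```

Resetting the step on any other INFO entry keeps the "Repeating errors encountered during run" summary at the end of the log from being counted a second time.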
The only errors occur in the post hook, and we're not checking the status of those steps, so we need to add the same checks we have in the pre section. The actual upgrade itself (the puppet run) was successful; however, it appears some of the underlying services are in a bad state, which is why the post tasks failed.

Can we see the foreman-debug to maybe understand the underlying cause? Or do we care? If I understand correctly, you were doing a lot of testing with the box?
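To illustrate the kind of check being discussed, here is a minimal Python sketch of a step runner that inspects each step's exit status and refuses to print the success message if any step fails. The real hooks are Ruby files in katello-installer and this is not their code; `run_step`, `run_upgrade`, and the stand-in commands are hypothetical, and stopping at the first failure is an assumption rather than the confirmed behaviour of the fix.

```python
import subprocess
import sys


def run_step(name, command):
    """Run one upgrade step; return True only if the command exits with status 0."""
    print(f"Upgrade Step: {name}...")
    result = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    if result.returncode != 0:
        print(
            f"Upgrade step {name} failed (exit {result.returncode}):\n{result.stdout}",
            file=sys.stderr,
        )
        return False
    return True


def run_upgrade(steps):
    """Run steps in order, stop at the first failure, and return an exit code."""
    for name, command in steps:
        if not run_step(name, command):
            print("Upgrade failed. Check the log before retrying.", file=sys.stderr)
            return 1
    print("Katello upgrade completed!")
    return 0


# Hypothetical stand-in commands; a real step would invoke rake or a service command.
ok = [sys.executable, "-c", "pass"]
bad = [sys.executable, "-c", "import sys; sys.exit(3)"]

exit_code = run_upgrade([("db:seed", ok), ("import_errata", bad), ("never_reached", ok)])
print(f"exit code: {exit_code}")
```

With a runner like this, the failed errata import in the report would have ended the run with an error instead of "Katello upgrade completed!".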
Created attachment 1050437
foreman-debug
I think we should care if steps in the upgrade fail and report back, even if the Satellite is in a bad state before the upgrade is run.
Created redmine issue http://projects.theforeman.org/issues/11086 from this bug
> I think we should care if steps in the upgrade fail and report back even if the Satellite is in a bad state before the upgrade is run.

Right, I understand. The PR is open upstream to fix the hooks:

https://github.com/Katello/katello-installer/pull/239

My question was whether you want me to look into the *reason* your Katello upgrade failed, or whether you already know or don't care.
I'm not that concerned about why the upgrade failed. Others and I had been doing some qpid load testing, stopping services and leaving things in a generally bad state, so in this case I'm not concerned about *why* it failed; I just want to note that it did.
Moving to POST since upstream bug http://projects.theforeman.org/issues/11086 has been closed
-------------
Anonymous
Applied in changeset commit:katello-installer|28f2c8b80a3b00f8b8f078bdaf56e5c688669fd0.
Verified
VERIFIED with the latest GA Snap 14. No such errors were observed. There is an error about the HTTPD service failing to restart, but that is tracked in BZ1245998 and is out of scope for this bug, so moving this one to VERIFIED.
This bug was fixed in Satellite 6.1.1 which was delivered on 12 August, 2015.