1241687 – Errors during upgrade do not get properly reported

Summary: Errors during upgrade do not get properly reported

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	Upgrades
Sub Component:
Version:	6.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	Unspecified
Assignee:	Stephen Benjamin
QA Contact:	Jitendra Yejare
Docs Contact:
URL:	http://projects.theforeman.org/issues...
Whiteboard:
Depends On:
Blocks:

Reported:	2015-07-09 19:42 UTC by Mike McCune
Modified:	2017-02-23 19:51 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-08-12 16:05:33 UTC
Target Upstream Version:
Embargoed:

Attachments
foreman-debug (437.08 KB, application/x-xz) 2015-07-09 21:15 UTC, Mike McCune

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Foreman Issue Tracker	11086	0	None	None	None	2016-04-22 16:34:48 UTC

Description Mike McCune 2015-07-09 19:42:46 UTC

During an upgrade from the public beta to an internal GA build, my system was in a state that caused various upgrade steps to fail, but we never let the user know. The installer output makes it appear that the upgrade was OK, but the logs show that lots of steps failed.

We need to properly detect these failures and report them.

[root@sat-perf-01 ~]# katello-installer --upgrade
Upgrading...
Upgrade Step: stop_services...
Upgrade Step: start_mongo...
Upgrade Step: migrate_pulp...
Upgrade Step: migrate_candlepin...
Upgrade Step: migrate_foreman...
Upgrade Step: Running installer...
Installing             Done                                               [100%] [..................................................]
  The full log is at /var/log/katello-installer/katello-installer.log
Upgrade Step: migrate_pulp...
Upgrade Step: Restarting services...
Upgrade Step: db:seed...
Upgrade Step: Running errata import task (this may take a while)...
Upgrade Step: Update gpg key urls to support capsule isolation (this may take a while)...
Upgrade Step: Update repositories to specify metadata_expire (this may take a while)...
Katello upgrade completed!
[root@sat-perf-01 ~]# hammer ping
Error: Request Timeout
[root@sat-perf-01 ~]# hammer ping
Error: Request Timeout

# grep -C 5 ERROR /var/log/katello-installer/katello-installer.log 
Seeding /opt/rh/ruby193/root/usr/share/gems/gems/foreman_bootdisk-4.0.2.13/db/seeds.d/50-bootdisk_templates.rb
Seeding /opt/rh/ruby193/root/usr/share/gems/gems/foreman_discovery-2.0.0.17/db/seeds.d/60_discovery_proxy_feature.rb
All seed files executed

[ INFO 2015-07-09 15:10:29 main] Upgrade Step: Running errata import task (this may take a while)...
[ERROR 2015-07-09 15:12:58 main] rake aborted!
500 Internal Server Error

Tasks: TOP => katello:upgrades:2.1:import_errata
(See full trace by running task with --trace)
Importing Errata

[ INFO 2015-07-09 15:12:58 main] Upgrade Step: Update gpg key urls to support capsule isolation (this may take a while)...
[DEBUG 2015-07-09 15:13:27 main] Importing GPG Key Urls to support Capsule Communication

[ INFO 2015-07-09 15:13:27 main] Upgrade Step: Update repositories to specify metadata_expire (this may take a while)...
[ERROR 2015-07-09 15:13:58 main] rake aborted!
There was an issue with the backend service candlepin: 404 Resource Not Found

Tasks: TOP => katello:upgrades:2.2:update_metadata_expire
(See full trace by running task with --trace)
Updating Expire Metadata for Custom Content

[ INFO 2015-07-09 15:13:58 main] Katello upgrade completed!
[DEBUG 2015-07-09 15:13:58 main] Hook /usr/share/katello-installer/hooks/post/30-upgrade.rb returned [<Logging::Logger:0xda1358 name="main">, <Logging::Logger:0xdd7d68 name="fatal">]
[ INFO 2015-07-09 15:13:58 main] All hooks in group post finished
[DEBUG 2015-07-09 15:13:58 main] Exit with status code: 2 (signal was 2)
[ERROR 2015-07-09 15:13:58 main] Repeating errors encountered during run:
[ERROR 2015-07-09 15:13:58 main] rake aborted!
500 Internal Server Error

Tasks: TOP => katello:upgrades:2.1:import_errata
(See full trace by running task with --trace)
Importing Errata

[ERROR 2015-07-09 15:13:58 main] rake aborted!
There was an issue with the backend service candlepin: 404 Resource Not Found

Tasks: TOP => katello:upgrades:2.2:update_metadata_expire
(See full trace by running task with --trace)
Updating Expire Metadata for Custom Content
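
Until the installer reports these failures itself, the log can be checked after an upgrade. The following is only a minimal sketch: `count_error_entries` and `DEFAULT_LOG` are hypothetical names, not part of katello-installer, and the sketch assumes ERROR entries start with `[ERROR ` as in the excerpt above.

```ruby
# Hypothetical helper, not part of katello-installer.
# Assumes ERROR-level entries begin with "[ERROR " as in the log excerpt above.
DEFAULT_LOG = '/var/log/katello-installer/katello-installer.log'

# Count the log lines written at ERROR level.
def count_error_entries(log_path = DEFAULT_LOG)
  File.foreach(log_path).count { |line| line.start_with?('[ERROR ') }
end
```

A non-zero result from `count_error_entries` would indicate that at least one step logged an error even though the console printed "Katello upgrade completed!". Because the installer re-lists errors under "Repeating errors encountered during run", the count can include duplicates.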

Comment 2 Stephen Benjamin 2015-07-09 20:10:37 UTC

The only errors occur in the post hook, and we're not checking the status of those, so we need to add the same checks we have in the pre section.

The actual upgrade itself (the puppet run) was successful, however it appears some of the underlying services are in a bad state which is why the post tasks failed.

Can we see the foreman-debug to maybe understand the underlying cause? Or do we care? If I understand correctly you were doing a lot of testing with the box?
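
The kind of status check described in this comment could look roughly like the sketch below. It is not the actual katello-installer hook code: `run_post_steps` and the step format (a description paired with a callable that returns true on success) are assumptions made for illustration.

```ruby
# Hypothetical sketch, not the actual katello-installer hook code.
# Each step is a [description, callable] pair; the callable returns
# a truthy value on success and a falsy value on failure.
def run_post_steps(steps, out = $stdout)
  steps.each do |description, action|
    out.puts "Upgrade Step: #{description}..."
    next if action.call

    # Stop at the first failure instead of continuing silently.
    out.puts "Upgrade step failed: #{description}"
    return false
  end
  out.puts 'Katello upgrade completed!'
  true
end
```

A callable could wrap `Kernel#system`, which returns true only when the command exits with status 0. The caller can then exit with a non-zero status when `run_post_steps` returns false, so the completion message is printed only if every step succeeded.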

Comment 3 Mike McCune 2015-07-09 21:15:11 UTC

Created attachment 1050437
foreman-debug

Comment 4 Mike McCune 2015-07-09 21:16:41 UTC

I think we should care if steps in the upgrade fail and report back, even if the Satellite is in a bad state before the upgrade is run.

Comment 5 Stephen Benjamin 2015-07-10 12:54:50 UTC

Created redmine issue http://projects.theforeman.org/issues/11086 from this bug

Comment 6 Stephen Benjamin 2015-07-10 13:06:59 UTC

> I think we should care if steps in the upgrade fail and report back, even if the Satellite is in a bad state before the upgrade is run.

Right, I understand. The PR to fix the hooks is open upstream: https://github.com/Katello/katello-installer/pull/239

My question was whether you want me to look into the *reason* your Katello upgrade failed, or whether you already know or don't care.

Comment 7 Mike McCune 2015-07-14 20:11:24 UTC

I'm not that concerned about why the upgrade failed. Others and I had been doing some qpid load testing, stopping services and leaving things in a generally bad state. So in this case I'm not concerned about *why* it failed; I just want to note that it did.

Comment 8 Bryan Kearney 2015-07-20 14:04:45 UTC

Moving to POST since upstream bug http://projects.theforeman.org/issues/11086 has been closed
-------------
Anonymous
Applied in changeset commit:katello-installer|28f2c8b80a3b00f8b8f078bdaf56e5c688669fd0.

Comment 11 Jitendra Yejare 2015-07-24 13:34:22 UTC

Verified

Comment 12 Jitendra Yejare 2015-07-24 13:38:25 UTC

VERIFIED.

Tested this bug with the latest GA Snap 14.

No such errors were observed.

However, there are errors about the HTTPD service failing to restart. Those are tracked in BZ1245998 and are out of scope for this bug.

So moving it to verified.

Comment 13 Bryan Kearney 2015-08-12 16:05:33 UTC

This bug was fixed in Satellite 6.1.1 which was delivered on 12 August, 2015.
