Bug 767271

Summary: Need better error handling for delayed_jobs
Product: Red Hat Satellite
Reporter: Mike McCune <mmccune>
Component: API
Assignee: Tomas Strachota <tstrachota>
Status: CLOSED CURRENTRELEASE
QA Contact: Katello QA List <katello-qa-list>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 6.0.0
CC: cwelton, lzap, tstrachota
Target Milestone: Unspecified   
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: katello-cli-0.1.29-1, katello-0.1.145-1
Doc Type: Bug Fix
Last Closed: 2012-08-22 18:13:31 UTC
Type: ---
Bug Blocks: 747354    

Description Mike McCune 2011-12-13 17:15:09 UTC
If an async job such as a promotion fails, the only trace we have is a record in the task_statuses table; there is no mention in a logfile, no notification, and no email.

For example, if a user starts a promotion of content and something goes wrong (pulp error, disk full, etc.), they have no idea it failed, and the only thing they can do is look in the database itself for errors using this procedure:

https://fedorahosted.org/katello/wiki/TaskStatuses

We need a few things to better handle this:

1) First, we should dump all errors and exceptions into a logfile: /var/log/katello/delayed_job.log would be a fine location. This would eliminate the need to query the DB for the stacktrace.

2) We should consider logging all exceptions as a notification to all users in an Org, so they would see a red Error in the UI.

If we could get (1) ASAP, that would help greatly with debugging.
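
For illustration only, a minimal sketch of what item (1) could look like. It is written in Python and is not the Katello backend code; make_job_logger and run_job are hypothetical names, and the log path is passed in by the caller rather than hard-coded.

```python
import logging


def make_job_logger(log_path):
    """Return a logger that appends to log_path (hypothetical helper)."""
    logger = logging.getLogger("delayed_job_sketch:" + log_path)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep job errors out of the root logger
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def run_job(job, logger):
    """Run a zero-argument callable; log any exception with its traceback,
    then re-raise so the caller still sees the failure."""
    try:
        return job()
    except Exception:
        # logger.exception() appends the full stack trace to the message.
        logger.exception("delayed job %r failed", getattr(job, "__name__", job))
        raise
```

With something like this in place, the stack trace ends up in the file handed to make_job_logger (for example the delayed_job.log location suggested above), so nobody has to query the DB for it.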

Comment 1 Lukas Zapletal 2011-12-15 08:50:15 UTC
I believe Tomas just fixed this.

Comment 2 Tomas Strachota 2011-12-15 09:25:58 UTC
We are now raising exceptions when any of the promotion subtasks fails.

1542d8c2 async tasks - raising exception when a task fails while waiting until it finishes
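
A rough sketch of the idea behind that commit, in Python and for illustration only; it is not the code from 1542d8c2. The poll callable, the TaskFailed class, and the state names ("finished", "error", and so on) are assumptions made for the example.

```python
import time


class TaskFailed(Exception):
    """Raised when a polled task reports a failure state (hypothetical name)."""


def wait_for_task(poll, interval=1.0,
                  failed_states=("error", "timed_out", "canceled"),
                  done_states=("finished",)):
    """Poll a task until it finishes; raise as soon as it reports a failure.

    poll is a zero-argument callable returning a dict with a "state" key
    and, on failure, an optional "error" message (assumed shape).
    """
    while True:
        status = poll()
        state = status["state"]
        if state in failed_states:
            # Surface the failure instead of silently returning.
            raise TaskFailed(status.get("error", "task ended in state %s" % state))
        if state in done_states:
            return status
        time.sleep(interval)
```

The point is that the waiting loop itself turns a failed subtask into an exception, so the parent job fails loudly rather than carrying on.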

I just realized that logging of delayed job exceptions is currently enabled in the development environment only. I'll enable it for production.

One more issue is in the CLI command 'product promote', where we don't check the result of the promotion and blindly print 'success'. I'm taking this one as well.
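
Roughly what the CLI side needs to do, sketched in Python for illustration; this is not the katello-cli code. The report_promotion function, the task dict shape, and the message texts are all assumptions.

```python
import sys


def report_promotion(task, out=sys.stdout, err=sys.stderr):
    """Print a message based on the task outcome and return an exit code.

    task is assumed to be a dict with a "state" key and, on failure,
    an optional "error" message.
    """
    if task.get("state") == "finished":
        out.write("Product promoted\n")
        return 0
    # Anything other than a clean finish is reported as a failure.
    err.write("Promotion failed: %s\n" % task.get("error", "unknown error"))
    return 1
```

Returning a non-zero exit code also lets scripts that call the CLI notice the failure.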

I'm leaving notifications to the UI folks to implement in their controller.

Comment 3 Tomas Strachota 2011-12-15 16:10:13 UTC
Fixed the above two backend issues in the following commits:

fc8f2931
767271 - logging for delayed jobs enabled in all environments

cc45e423
767271 - message after 'product promote' takes promotion failure into account


I created a new BZ for the UI part of this story (#768047).
Moving this one ON_QA.

Comment 4 Mike McCune 2011-12-15 19:42:28 UTC
nice work guys, thanks for the fast turnaround

Comment 6 Corey Welton 2012-02-14 02:59:12 UTC
QA Verified via the UI, in part through verifying bug #768047