Bug 1950836

Summary: [RFE] Generate alerts if tasks are in running/paused state for more than a certain number of days.
Product: Red Hat Satellite Reporter: Jaskaran Singh Narula <janarula>
Component: Notifications Assignee: Adam Ruzicka <aruzicka>
Status: CLOSED ERRATA QA Contact: Pavel Novotny <pnovotny>
Severity: high Docs Contact:
Priority: high    
Version: 6.8.0 CC: ahumbe, aruzicka, bbuckingham, mkalyat, ogajduse, pcreech, pnovotny, rkhadkik, sajha, saydas, thadzhie
Target Milestone: 6.14.0 Keywords: FutureFeature, PrioBumpGSS, Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rubygem-foreman-tasks-8.1.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-11-08 14:17:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jaskaran Singh Narula 2021-04-19 02:41:51 UTC
1. Proposed title of this feature request
[RFE] Generate Alerts if tasks are in Running/Paused state for more than 2 days. 

2. What is the nature and description of the request?

This RFE asks that the "Admin User" be notified about tasks that were created more than 2 days ago and have not yet reached the stopped state. If a running task takes more than 2 days to complete, something is wrong with either the performance of the Satellite or the task itself.

For paused tasks, it is clear that the user should check why the task is in the paused state.

3. Why does the customer need this? (List the business requirements here)

If a service fails and is then restarted, or for any similar reason, tasks that cannot reach the completed state get stuck in the paused/error state.
They are also not cleaned up, since they are not in the stopped state.

Also, tasks like repository sync should not take more than a day; if they do, we need to investigate what is causing the delay.

4. How would the customer like to achieve this? (List the functional requirements here)

It would be nice to get a banner message along with an email notification.

5. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.

NA

6. Is there already an existing RFE upstream or in Red Hat Bugzilla?
NA

7. Does the customer have any specific timeline dependencies and which release would they like to target (i.e. RHEL5, RHEL6)?

NA

Comment 2 Mike McCune 2021-05-05 13:49:40 UTC
Expanding this to include the ability for Satellite to attempt to self-repair tasks that are paused:

* Auto-resume tasks that are paused with a limited number of re-tries

* Cancel tasks that fail to resume and repeatedly end up in paused
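
The two points above could be sketched roughly as follows. This is only an illustration of the proposed policy, not the foreman-tasks implementation: the `Task` class, the `try_resume` callback, and the limit of 3 attempts are all hypothetical, since the comment does not specify them.

```python
from dataclasses import dataclass

# Hypothetical limit; the comment above does not name a number.
MAX_RESUME_ATTEMPTS = 3


@dataclass
class Task:
    """Hypothetical stand-in for a Satellite task, not the real model."""
    task_id: str
    state: str = "paused"
    resume_attempts: int = 0


def repair_paused_task(task, try_resume, max_attempts=MAX_RESUME_ATTEMPTS):
    """Resume a paused task up to max_attempts times, then cancel it.

    try_resume is a caller-supplied callable that returns True when the
    task left the paused state after being resumed.
    """
    while task.state == "paused" and task.resume_attempts < max_attempts:
        task.resume_attempts += 1
        if try_resume(task):
            task.state = "running"
            return task
    # Still paused after all attempts: give up and cancel.
    if task.state == "paused":
        task.state = "cancelled"
    return task
```

Bounding the number of attempts keeps a task that pauses again on every resume from being retried forever.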

Comment 11 Pavel Novotny 2023-06-30 18:31:56 UTC
Verified in 
foreman-3.7.0-2.el8sat.noarch
rubygem-foreman-tasks-8.1.1-1.el8sat.noarch
(6.14.0 snap 5)

An admin user can now subscribe to "Long running tasks" under the Email Preferences section in the user profile.

There is also a scheduled recurring task, "Check for long running tasks", which by default runs every day at midnight (local system time).

This task looks for all tasks that have been running or paused for more than 2 days and, if there are any,
sends an e-mail to the subscribed user.
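
The check described above amounts to a filter on task state and on the time of the last state change. Below is a minimal sketch of that logic, assuming tasks are plain dictionaries with `state` and `state_updated_at` keys and that the comparison against the cutoff is strict; the function name and data layout are hypothetical and do not come from foreman-tasks.

```python
from datetime import datetime, timedelta, timezone

LINGERING_STATES = ("running", "paused")
THRESHOLD = timedelta(days=2)  # the default interval described above


def find_long_running(tasks, now, threshold=THRESHOLD, states=LINGERING_STATES):
    """Return tasks in one of `states` whose state changed before now - threshold."""
    cutoff = now - threshold
    return [
        t for t in tasks
        if t["state"] in states and t["state_updated_at"] < cutoff
    ]


# Example data: the first task mirrors the one verified in this comment,
# the second is a made-up recent task that should not be reported.
now = datetime(2023, 6, 30, 18, 8, 1, tzinfo=timezone.utc)
tasks = [
    {
        "id": "e63c7b1e-c3bd-4e90-acc0-9b2f453ec46d",
        "state": "running",
        "state_updated_at": datetime(2023, 6, 28, 13, 40, 35, tzinfo=timezone.utc),
    },
    {
        "id": "hypothetical-recent-task",
        "state": "paused",
        "state_updated_at": datetime(2023, 6, 30, 9, 0, 0, tzinfo=timezone.utc),
    },
]
lingering = find_long_running(tasks, now)
```

With these inputs, only the first task falls before the cutoff of 2023-06-28 18:08:01 UTC, so only it would be listed in the notification.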

The e-mail looks as follows:

"""
Subject: [satellite] Tasks pending since 2023-06-28 14:08:01 -0400


Tasks lingering in states running, paused since 2023-06-28 14:08:01 -0400

_("ID") 	                        _("Action") 	                                _("Label") 	                        _("State")      _("State updated at")
e63c7b1e-c3bd-4e90-acc0-9b2f453ec46d 	Remote action: Run sleep 432000 on host1 	Actions::RemoteExecution::RunHostJob 	running 	2023-06-28 13:40:35 UTC

More details

This email was sent from Satellite identified by UUID 5a14ce4a-e184-5302-a691-bbe7b4d56eb9
"""

The task ID is linked to the task details page and "More details" is a link to all long running/paused tasks, i.e., those more than 2 days old.


One minor issue is with the table headers, where the _("") localization function (my guess) has not been translated/executed.
I can file a separate bug for this detail, if needed. Let me know @aruzicka

Comment 13 Pavel Novotny 2023-07-19 13:54:58 UTC
(In reply to Pavel Novotny from comment #11)
...
> 
> The e-mail looks as follows:
> 
> """
> Subject: [satellite] Tasks pending since 2023-06-28 14:08:01 -0400
> 
> 
> Tasks lingering in states running, paused since 2023-06-28 14:08:01 -0400
> 
> _("ID") 	                        _("Action") 	                              _("Label")                   _("State")      _("State updated at")
...

Filed bug 2223996 for the untranslated strings.

Comment 14 Dana Singleterry 2023-09-20 10:06:53 UTC
*** Bug 2183565 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2023-11-08 14:17:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.14 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6818