Bug 1623151
| Summary: | task "Pulp disk space notification" should not be getting into the paused state | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | anerurka |
| Component: | Tasks Plugin | Assignee: | Ivan Necas <inecas> |
| Status: | CLOSED ERRATA | QA Contact: | Nikhil Kathole <nkathole> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 6.4 | CC: | apatel, aruzicka, egolov, inecas, kgaikwad, mbacovsk, nkathole, peter.vreman |
| Target Milestone: | 6.4.0 | Keywords: | Triaged |
| Target Release: | Unused | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | tfm-rubygem-katello-3.7.0.26-1,foreman-1.18.0.24-1,tfm-rubygem-dynflow-1.0.5.1-1 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-10-16 19:16:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1122832, 1619394 | ||
|
Description
anerurka
2018-08-28 15:08:33 UTC
Additional info:
A task is already started before all upgrade processes (including the smart proxy restarts) have finished; the start of such tasks should be held off until the upgrade completes.
Log with timing of the upgrade process, showing that at '2018/08/21 09:10:28' the upgrade was still running:
------------
[DEBUG 2018-08-21T09:10:01 main] Class[Foreman::Plugin::Tasks]: The container Stage[main] will propagate my refresh event
[ INFO 2018-08-21T09:10:01 main] Computing checksum on file /etc/httpd/conf.d/05-foreman.d/katello.conf
[ INFO 2018-08-21T09:10:01 main] /Stage[main]/Katello::Application/Foreman::Config::Passenger::Fragment[katello]/File[/etc/httpd/conf.d/05-foreman.d/katello.conf]: Filebucketed /etc/httpd/conf.d/05-foreman.d/katello.conf to puppet with sum d3ea9a4a4bafa678303a935c71ab4f45
[DEBUG 2018-08-21T09:10:01 main] /Stage[main]/Katello::Application/Foreman::Config::Passenger::Fragment[katello]/File[/etc/httpd/conf.d/05-foreman.d/katello.conf]: Removing existing file for replacement with absent
[ WARN 2018-08-21T09:10:01 main] /Stage[main]/Katello::Application/Foreman::Config::Passenger::Fragment[katello]/File[/etc/httpd/conf.d/05-foreman.d/katello.conf]/ensure: removed
[ INFO 2018-08-21T09:10:01 main] /Stage[main]/Katello::Application/Foreman::Config::Passenger::Fragment[katello]/File[/etc/httpd/conf.d/05-foreman.d/katello.conf]: Scheduling refresh of Class[Apache::Service]
[DEBUG 2018-08-21T09:10:01 main] /Stage[main]/Katello::Application/Foreman::Config::Passenger::Fragment[katello]/File[/etc/httpd/conf.d/05-foreman.d/katello.conf]: The container Foreman::Config::Passenger::Fragment[katello] will propagate my refresh event
[DEBUG 2018-08-21T09:10:01 main] Executing: 'diff -u /etc/httpd/conf.d/05-foreman-ssl.d/katello.conf /tmp/puppet-file20180821-6593-11ufz9k'
[ WARN 2018-08-21T09:10:01 main] /Stage[main]/Katello::Application/Foreman::Config::Passenger::Fragment[katello]/File[/etc/httpd/conf.d/05-foreman-ssl.d/katello.conf]/content:
[ WARN 2018-08-21T09:10:01 main] --- /etc/httpd/conf.d/05-foreman-ssl.d/katello.conf 2018-06-25 13:34:55.095129760 +0000
[ WARN 2018-08-21T09:10:01 main] +++ /tmp/puppet-file20180821-6593-11ufz9k 2018-08-21 09:10:01.073081030 +0000
[ WARN 2018-08-21T09:10:01 main] @@ -1,17 +1,3 @@
[ WARN 2018-08-21T09:10:01 main] -### File managed with puppet ###
[ WARN 2018-08-21T09:10:01 main] -
[ WARN 2018-08-21T09:10:01 main] -<Location /pulp/api>
[ WARN 2018-08-21T09:10:01 main] - SSLUsername SSL_CLIENT_S_DN_CN
[ WARN 2018-08-21T09:10:01 main] -</Location>
[ WARN 2018-08-21T09:10:01 main] -
[ WARN 2018-08-21T09:10:01 main] -Alias /pub /var/www/html/pub
[ WARN 2018-08-21T09:10:01 main] -<Location /pub>
[ WARN 2018-08-21T09:10:01 main] - <IfModule mod_passenger.c>
[ WARN 2018-08-21T09:10:01 main] - PassengerEnabled off
...
[DEBUG 2018-08-21T09:16:01 main]
[DEBUG 2018-08-21T09:16:01 main] Success!
[DEBUG 2018-08-21T09:16:01 main] katello-service restart finished successfully!
[ INFO 2018-08-21T09:16:01 main] Upgrade Step: db_seed...
------------
- Below we can see that the task "Pulp disk space notification" was started at "2018/08/21 09:10:28", before the upgrade process had completed.
# hammer task list --search 'state == paused'
-------------------------------------|------------------------------|-------|---------------------|----------|--------|--------|------------------------------|------------
ID | NAME | OWNER | STARTED AT | ENDED AT | STATE | RESULT | TASK ACTION | TASK ERRORS
-------------------------------------|------------------------------|-------|---------------------|----------|--------|--------|------------------------------|------------
ee6fe282-0564-448d-bb1c-f0dbb7c4b4cd | Pulp disk space notification | | 2018/08/21 09:10:28 | | paused | error | Pulp disk space notification |
-------------------------------------|------------------------------|-------|---------------------|----------|--------|--------|------------------------------|------------
Action:
Dynflow::ActiveJob::QueueAdapters::JobWrapper
Input:
{"job_class"=>"CreatePulpDiskSpaceNotifications",
"job_arguments"=>[],
"queue"=>"default",
"locale"=>"en",
"current_user_id"=>nil}
Output:
{}
Exception:
Foreman::WrappedException: ERF50-5345 [Foreman::WrappedException]: Unable to connect ([ProxyAPI::ProxyException]: ERF12-3580 [ProxyAPI::ProxyException]: Unable to detect pulp storage ([Errno::ECONNREFUSED]: Failed to open TCP connection to satellite6.example.com:9090 (Connection refused - connect(2) for "satellite6.example.com" port 9090)) for Capsule https://satellite6.example.com:9090/pulp/status/disk_usage)
Backtrace:
/usr/share/foreman/app/services/proxy_status/base.rb:55:in `rescue in fetch_proxy_data'
/usr/share/foreman/app/services/proxy_status/base.rb:46:in `fetch_proxy_data'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.9/app/services/katello/proxy_status/pulp.rb:7:in `storage'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.9/app/services/katello/ui_notifications/pulp/proxy_disk_space.rb:9:in `block in deliver!'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/activerecord-5.1.6/lib/active_record/relation/delegation.rb:39:in `each'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/activerecord-5.1.6/lib/active_record/relation/delegation.rb:39:in `each'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.9/app/services/katello/ui_notifications/pulp/proxy_disk_space.rb:7:in
-----------
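The exception above is a low-level `Errno::ECONNREFUSED` (the smart proxy on port 9090 is not up yet) wrapped twice on the way up: once by the proxy API layer (ERF12) and once by Foreman (ERF50). A minimal illustrative sketch of that wrapping chain (the class and method names here are simplified stand-ins, not the actual Foreman code):

```ruby
# Simplified stand-ins for ProxyAPI::ProxyException and
# Foreman::WrappedException from the backtrace above.
class ProxyException < StandardError; end
class WrappedException < StandardError; end

def fetch_disk_usage
  # Simulate foreman-proxy being down during the upgrade: the TCP
  # connection to port 9090 is refused.
  raise Errno::ECONNREFUSED, 'connect(2) for "satellite6.example.com" port 9090'
rescue Errno::ECONNREFUSED => e
  # First wrap: the proxy API layer reports what it was trying to do.
  raise ProxyException, "Unable to detect pulp storage (#{e.message})"
end

def pulp_storage_status
  fetch_disk_usage
rescue ProxyException => e
  # Second wrap: Foreman's proxy-status service adds its own context.
  raise WrappedException, "Unable to connect (#{e.message})"
end

begin
  pulp_storage_status
rescue WrappedException => e
  msg = e.message
  puts msg
end
```

Because the notification job lets this exception escape, the Dynflow task enters the paused/error state instead of finishing, which is the core of this bug.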
I think the issue here is actually not foreman-maintain but the Pulp disk space task itself: since this is not an orchestration task, it should not be getting into the paused state, and we should fix this on the Dynflow side to prevent hitting issues like this.

Created redmine issue https://projects.theforeman.org/issues/24761 from this bug

Upstream bug assigned to inecas

I've filed 3 different issues upstream to address some behaviors regarding this kind of task.
Verification steps:
1. First search for an existing scheduled task ('label = CreatePulpDiskSpaceNotifications and state = scheduled') and cancel it if one exists (just to ensure we get a new task run at the next dynflowd start)
2. systemctl stop foreman-proxy # to cause the task to fail on next run
3. systemctl restart dynflowd # to trigger the new instance of CreatePulpDiskSpaceNotifications
4. Search for `label = CreatePulpDiskSpaceNotifications` under Monitor -> Tasks again
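The verification steps above can be sketched as a shell session (the commands are taken from the report; the exact hammer output columns may differ between versions):

```shell
# 1. Look for an already-scheduled instance; cancel it from
#    Monitor -> Tasks in the UI if one is listed, so the next
#    dynflowd start creates a fresh one.
hammer task list --search 'label = CreatePulpDiskSpaceNotifications and state = scheduled'

# 2. Stop the smart proxy so the next run fails with ECONNREFUSED
#    on port 9090.
systemctl stop foreman-proxy

# 3. Restart dynflowd to trigger a new CreatePulpDiskSpaceNotifications run.
systemctl restart dynflowd

# 4. Inspect the resulting tasks; with the fix, the failed run should be
#    stopped/warning and a new instance should be in the scheduled state.
hammer task list --search 'label = CreatePulpDiskSpaceNotifications'
```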
Expected result:
* the last task is in the 'stopped/warning' state
* there is a new instance of the CreatePulpDiskSpaceNotifications task in the scheduled state
Actual results:
* the last task is in the 'paused/error' state
* there is no new instance of the CreatePulpDiskSpaceNotifications task in the scheduled state
This means no new occurrence will happen later until dynflowd is restarted while the proxy is running; likewise, any failure in a new instance causes the task not to be scheduled again.
This also applies to other recurring events (such as the RSS notifications check), but they are not as easy to reproduce.
The fixes are addressing all instances I know of.
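The intended behavior can be illustrated with a small sketch (hypothetical names, not the actual Dynflow patch): a recurring job should record a failure and finish in stopped/warning, so the scheduler can still plan the next occurrence, rather than pause and block all future runs.

```ruby
# Illustrative model of the fix, not real Dynflow code: the job rescues
# its own errors so a failed run ends :stopped/:warning instead of :paused.
class RecurringJob
  attr_reader :state, :result, :error

  def initialize(&work)
    @work = work
  end

  def run
    @work.call
    @state, @result = :stopped, :success
  rescue StandardError => e
    # Record the failure but finish the run; a :paused state would
    # block the scheduler from ever planning the next occurrence.
    @state, @result = :stopped, :warning
    @error = e
  end

  def reschedule?
    @state == :stopped # paused tasks are never rescheduled
  end
end

job = RecurringJob.new { raise "Connection refused - port 9090" }
job.run
puts "#{job.state}/#{job.result} reschedule=#{job.reschedule?}"
# prints: stopped/warning reschedule=true
```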
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/24765 has been resolved.

VERIFIED

Version tested: Satellite 6.4 snap 22

out: Running Checks after upgrading to Satellite 6.4
out: ================================================================================
out: Check for verifying syntax for ISP DHCP configurations: [OK]
out: --------------------------------------------------------------------------------
out: Check for paused tasks: [OK]
out: --------------------------------------------------------------------------------
out: Check whether all services are running using hammer ping: [OK]
out: --------------------------------------------------------------------------------
out:
out:
out: --------------------------------------------------------------------------------
out: Upgrade finished.
out: HIGHLIGHT:upgrade_logging:Time taken for Satellite Upgrade - 1:07:21

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927