Bug 1623151
| Summary: | task "Pulp disk space notification" should not be getting into the paused state | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | anerurka |
| Component: | Tasks Plugin | Assignee: | Ivan Necas <inecas> |
| Status: | CLOSED ERRATA | QA Contact: | Nikhil Kathole <nkathole> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 6.4 | CC: | apatel, aruzicka, egolov, inecas, kgaikwad, mbacovsk, nkathole, peter.vreman |
| Target Milestone: | 6.4.0 | Keywords: | Triaged |
| Target Release: | Unused | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | tfm-rubygem-katello-3.7.0.26-1,foreman-1.18.0.24-1,tfm-rubygem-dynflow-1.0.5.1-1 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-10-16 19:16:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1122832, 1619394 | ||
|
Description
anerurka
2018-08-28 15:08:33 UTC
Additional info:
A task is already started before all upgrade processes (including the smart proxy restarts) have finished; the start of such tasks should be held off until the upgrade completes.
Log with timing of the upgrade process, showing that at '2018/08/21 09:10:28' the upgrade was still running:
------------
[DEBUG 2018-08-21T09:10:01 main] Class[Foreman::Plugin::Tasks]: The container Stage[main] will propagate my refresh event
[ INFO 2018-08-21T09:10:01 main] Computing checksum on file /etc/httpd/conf.d/05-foreman.d/katello.conf
[ INFO 2018-08-21T09:10:01 main] /Stage[main]/Katello::Application/Foreman::Config::Passenger::Fragment[katello]/File[/etc/httpd/conf.d/05-foreman.d/katello.conf]: Filebucketed /etc/httpd/conf.d/05-foreman.d/katello.conf to puppet with sum d3ea9a4a4bafa678303a935c71ab4f45
[DEBUG 2018-08-21T09:10:01 main] /Stage[main]/Katello::Application/Foreman::Config::Passenger::Fragment[katello]/File[/etc/httpd/conf.d/05-foreman.d/katello.conf]: Removing existing file for replacement with absent
[ WARN 2018-08-21T09:10:01 main] /Stage[main]/Katello::Application/Foreman::Config::Passenger::Fragment[katello]/File[/etc/httpd/conf.d/05-foreman.d/katello.conf]/ensure: removed
[ INFO 2018-08-21T09:10:01 main] /Stage[main]/Katello::Application/Foreman::Config::Passenger::Fragment[katello]/File[/etc/httpd/conf.d/05-foreman.d/katello.conf]: Scheduling refresh of Class[Apache::Service]
[DEBUG 2018-08-21T09:10:01 main] /Stage[main]/Katello::Application/Foreman::Config::Passenger::Fragment[katello]/File[/etc/httpd/conf.d/05-foreman.d/katello.conf]: The container Foreman::Config::Passenger::Fragment[katello] will propagate my refresh event
[DEBUG 2018-08-21T09:10:01 main] Executing: 'diff -u /etc/httpd/conf.d/05-foreman-ssl.d/katello.conf /tmp/puppet-file20180821-6593-11ufz9k'
[ WARN 2018-08-21T09:10:01 main] /Stage[main]/Katello::Application/Foreman::Config::Passenger::Fragment[katello]/File[/etc/httpd/conf.d/05-foreman-ssl.d/katello.conf]/content:
[ WARN 2018-08-21T09:10:01 main] --- /etc/httpd/conf.d/05-foreman-ssl.d/katello.conf 2018-06-25 13:34:55.095129760 +0000
[ WARN 2018-08-21T09:10:01 main] +++ /tmp/puppet-file20180821-6593-11ufz9k 2018-08-21 09:10:01.073081030 +0000
[ WARN 2018-08-21T09:10:01 main] @@ -1,17 +1,3 @@
[ WARN 2018-08-21T09:10:01 main] -### File managed with puppet ###
[ WARN 2018-08-21T09:10:01 main] -
[ WARN 2018-08-21T09:10:01 main] -<Location /pulp/api>
[ WARN 2018-08-21T09:10:01 main] - SSLUsername SSL_CLIENT_S_DN_CN
[ WARN 2018-08-21T09:10:01 main] -</Location>
[ WARN 2018-08-21T09:10:01 main] -
[ WARN 2018-08-21T09:10:01 main] -Alias /pub /var/www/html/pub
[ WARN 2018-08-21T09:10:01 main] -<Location /pub>
[ WARN 2018-08-21T09:10:01 main] - <IfModule mod_passenger.c>
[ WARN 2018-08-21T09:10:01 main] - PassengerEnabled off
...
[DEBUG 2018-08-21T09:16:01 main]
[DEBUG 2018-08-21T09:16:01 main] Success!
[DEBUG 2018-08-21T09:16:01 main] katello-service restart finished successfully!
[ INFO 2018-08-21T09:16:01 main] Upgrade Step: db_seed...
------------
- Below we can see that the task "Pulp disk space notification" was started at "2018/08/21 09:10:28", before the upgrade process had completed.
# hammer task list --search 'state == paused'
-------------------------------------|------------------------------|-------|---------------------|----------|--------|--------|------------------------------|------------
ID | NAME | OWNER | STARTED AT | ENDED AT | STATE | RESULT | TASK ACTION | TASK ERRORS
-------------------------------------|------------------------------|-------|---------------------|----------|--------|--------|------------------------------|------------
ee6fe282-0564-448d-bb1c-f0dbb7c4b4cd | Pulp disk space notification | | 2018/08/21 09:10:28 | | paused | error | Pulp disk space notification |
-------------------------------------|------------------------------|-------|---------------------|----------|--------|--------|------------------------------|------------
Action:
Dynflow::ActiveJob::QueueAdapters::JobWrapper
Input:
{"job_class"=>"CreatePulpDiskSpaceNotifications",
"job_arguments"=>[],
"queue"=>"default",
"locale"=>"en",
"current_user_id"=>nil}
Output:
{}
Exception:
Foreman::WrappedException: ERF50-5345 [Foreman::WrappedException]: Unable to connect ([ProxyAPI::ProxyException]: ERF12-3580 [ProxyAPI::ProxyException]: Unable to detect pulp storage ([Errno::ECONNREFUSED]: Failed to open TCP connection to satellite6.example.com:9090 (Connection refused - connect(2) for "satellite6.example.com" port 9090)) for Capsule https://satellite6.example.com:9090/pulp/status/disk_usage)
Backtrace:
/usr/share/foreman/app/services/proxy_status/base.rb:55:in `rescue in fetch_proxy_data'
/usr/share/foreman/app/services/proxy_status/base.rb:46:in `fetch_proxy_data'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.9/app/services/katello/proxy_status/pulp.rb:7:in `storage'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.9/app/services/katello/ui_notifications/pulp/proxy_disk_space.rb:9:in `block in deliver!'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/activerecord-5.1.6/lib/active_record/relation/delegation.rb:39:in `each'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/activerecord-5.1.6/lib/active_record/relation/delegation.rb:39:in `each'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.9/app/services/katello/ui_notifications/pulp/proxy_disk_space.rb:7:in
-----------
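The exception above is a low-level `Errno::ECONNREFUSED` (the smart proxy on port 9090 is not up yet) wrapped twice on the way up: once by the proxy API layer (ERF12) and once by Foreman (ERF50). A minimal illustrative sketch of that wrapping chain (the class and method names here are simplified stand-ins, not the actual Foreman code):

```ruby
# Simplified stand-ins for ProxyAPI::ProxyException and
# Foreman::WrappedException from the backtrace above.
class ProxyException < StandardError; end
class WrappedException < StandardError; end

def fetch_disk_usage
  # Simulate foreman-proxy being down during the upgrade: the TCP
  # connection to port 9090 is refused.
  raise Errno::ECONNREFUSED, 'connect(2) for "satellite6.example.com" port 9090'
rescue Errno::ECONNREFUSED => e
  # First wrap: the proxy API layer reports what it was trying to do.
  raise ProxyException, "Unable to detect pulp storage (#{e.message})"
end

def pulp_storage_status
  fetch_disk_usage
rescue ProxyException => e
  # Second wrap: Foreman's proxy-status service adds its own context.
  raise WrappedException, "Unable to connect (#{e.message})"
end

begin
  pulp_storage_status
rescue WrappedException => e
  msg = e.message
  puts msg
end
```

Because the notification job lets this exception escape, the Dynflow task enters the paused/error state instead of finishing, which is the core of this bug.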
I think the issue here is actually not foreman-maintain but the Pulp disk space task itself: since this is not an orchestration task, it should not be getting into the paused state, and we should fix this on the Dynflow side to prevent hitting issues like this.

Created redmine issue https://projects.theforeman.org/issues/24761 from this bug

Upstream bug assigned to inecas

I've filed 3 different issues upstream to address some behaviors regarding this kind of task.
Verification steps:
1. First search for an existing scheduled task ('label = CreatePulpDiskSpaceNotifications and state = scheduled') and cancel it if one exists (just to ensure we get a new task run at the next dynflowd start)
2. systemctl stop foreman-proxy # to cause the task to fail on next run
3. systemctl restart dynflowd # to trigger the new instance of CreatePulpDiskSpaceNotifications
4. Search for `label = CreatePulpDiskSpaceNotifications` under Monitor -> Tasks again
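The verification steps above can be sketched as a shell session (the commands are taken from the report; the exact hammer output columns may differ between versions):

```shell
# 1. Look for an already-scheduled instance; cancel it from
#    Monitor -> Tasks in the UI if one is listed, so the next
#    dynflowd start creates a fresh one.
hammer task list --search 'label = CreatePulpDiskSpaceNotifications and state = scheduled'

# 2. Stop the smart proxy so the next run fails with ECONNREFUSED
#    on port 9090.
systemctl stop foreman-proxy

# 3. Restart dynflowd to trigger a new CreatePulpDiskSpaceNotifications run.
systemctl restart dynflowd

# 4. Inspect the resulting tasks; with the fix, the failed run should be
#    stopped/warning and a new instance should be in the scheduled state.
hammer task list --search 'label = CreatePulpDiskSpaceNotifications'
```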
Expected result:
* the last task is in the 'stopped/warning' state
* there is a new instance of the CreatePulpDiskSpaceNotifications task in the scheduled state
Actual results:
* the last task is in the 'paused/error' state
* there is no new instance of the CreatePulpDiskSpaceNotifications task in the scheduled state
This means no new occurrence will happen later until dynflowd is restarted while the proxy is running; likewise, any failure in a new instance causes the task not to be scheduled again.
This also applies to other recurring events (such as the RSS notifications check), but they are not as easy to reproduce.
The fixes are addressing all instances I know of.
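The intended behavior can be illustrated with a small sketch (hypothetical names, not the actual Dynflow patch): a recurring job should record a failure and finish in stopped/warning, so the scheduler can still plan the next occurrence, rather than pause and block all future runs.

```ruby
# Illustrative model of the fix, not real Dynflow code: the job rescues
# its own errors so a failed run ends :stopped/:warning instead of :paused.
class RecurringJob
  attr_reader :state, :result, :error

  def initialize(&work)
    @work = work
  end

  def run
    @work.call
    @state, @result = :stopped, :success
  rescue StandardError => e
    # Record the failure but finish the run; a :paused state would
    # block the scheduler from ever planning the next occurrence.
    @state, @result = :stopped, :warning
    @error = e
  end

  def reschedule?
    @state == :stopped # paused tasks are never rescheduled
  end
end

job = RecurringJob.new { raise "Connection refused - port 9090" }
job.run
puts "#{job.state}/#{job.result} reschedule=#{job.reschedule?}"
# prints: stopped/warning reschedule=true
```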
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/24765 has been resolved.

VERIFIED

Version tested: Satellite 6.4 snap 22

out: Running Checks after upgrading to Satellite 6.4
out: ================================================================================
out: Check for verifying syntax for ISP DHCP configurations: [OK]
out: --------------------------------------------------------------------------------
out: Check for paused tasks: [OK]
out: --------------------------------------------------------------------------------
out: Check whether all services are running using hammer ping: [OK]
out: --------------------------------------------------------------------------------
out:
out:
out: --------------------------------------------------------------------------------
out: Upgrade finished.
out: HIGHLIGHT:upgrade_logging:Time taken for Satellite Upgrade - 1:07:21

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927