Bug 1939450

Summary: ERROR -- /parallel-executor-core: no manager for Dynflow::Director::Event
Product: Red Hat Satellite Reporter: Michael Vollmer <mivollme>
Component: DynflowAssignee: Adam Ruzicka <aruzicka>
Status: CLOSED ERRATA QA Contact: Lukáš Hellebrandt <lhellebr>
Severity: medium Docs Contact:
Priority: high    
Version: 6.8.0CC: aruzicka, jkrajice, pcreech
Target Milestone: 6.10.0Keywords: Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tfm-rubygem-foreman-tasks-4.13, tfm-rubygem-dynflow-1.4.9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-16 14:10:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Michael Vollmer 2021-03-16 12:17:03 UTC
Description of problem:
After any REX operation there is an error logged to /var/log/messages. This error is dumped even the REX operation completed successfully.

The delay between the start of the REX and the error message is exactly 10mins.

Version-Release number of selected component (if applicable):
Satellite Version: 6.8.4
tfm-rubygem-smart_proxy_dynflow-0.2.4-5.el7sat.noarch
foreman-dynflow-sidekiq-2.1.2.25-1.el7sat.noarch
tfm-rubygem-smart_proxy_dynflow_core-0.2.6-1.fm2_1.el7sat.noarch
tfm-rubygem-dynflow-1.4.7-1.fm2_1.el7sat.noarch

How reproducible:
any time after a rex operation

Steps to Reproduce:
1. Run hosts job: Run Ansible roles against hosts
2. wait 10mins
3. check the log

Actual results:
Mar 16 12:25:06 satellite dynflow-sidekiq@orchestrator: E, [2021-03-16T12:25:06.906185 #70350] ERROR -- /parallel-executor-core: no manager for Dynflow::Director::Event[request_id: 74481eab-f494-4bbb-8284-603189aca499-14, execution_plan_id: 638db861-064c-4988-aa38-09879c779f0d, step_id: 2, event: A
ctions::Middleware::WatchDelegatedProxySubTasks::CheckOnProxyActions, result: #<Concurrent::Promises::ResolvableFuture:0x000000001fadeea8 pending>] (Dynflow::Error)
Mar 16 12:25:06 satellite dynflow-sidekiq@orchestrator: /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/director.rb:167:in `handle_event'
Mar 16 12:25:06 satellite dynflow-sidekiq@orchestrator: /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/executors/abstract/core.rb:35:in `handle_event'
Mar 16 12:25:06 satellite dynflow-sidekiq@orchestrator: [ concurrent-ruby ]
Mar 16 12:25:06 satellite dynflow-sidekiq@orchestrator: /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/executors/abstract/core.rb:113:in `on_message'
Mar 16 12:25:06 satellite dynflow-sidekiq@orchestrator: [ concurrent-ruby ]
Mar 16 12:25:06 satellite dynflow-sidekiq@orchestrator: /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/executors/parallel.rb:42:in `delayed_event'
Mar 16 12:25:06 satellite dynflow-sidekiq@orchestrator: /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/clock.rb:36:in `apply'
Mar 16 12:25:06 satellite dynflow-sidekiq@orchestrator: /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/clock.rb:75:in `run_ready_timers'
Mar 16 12:25:06 satellite dynflow-sidekiq@orchestrator: /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/clock.rb:57:in `tick'
Mar 16 12:25:06 satellite dynflow-sidekiq@orchestrator: [ concurrent-ruby ]
Mar 16 12:25:06 satellite dynflow-sidekiq@orchestrator: /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/clock.rb:96:in `block in sleep_to'
Mar 16 12:25:06 satellite dynflow-sidekiq@orchestrator: [ concurrent-ruby ]

Expected results:
no error in the logs

Additional info:

Comment 1 Michael Vollmer 2021-03-17 09:31:03 UTC
*** Bug 1939454 has been marked as a duplicate of this bug. ***

Comment 2 Adam Ruzicka 2021-03-19 11:59:33 UTC
When a rex job starts we schedule an event to have the job check on the capsules in case they don't call Satellite back. Once this event is scheduled, there's currently no way to cancel it. If the job finishes before the event is delivered, dynflow complains that it cannot deliver an event to a stopped execution plan. From dynflow's point of view, this is an error state, from Satellite's point of view not at all. The job already finished so there's no need to check on the capsules.

While this is a valid bug, it is completely harmless since it has no other impact apart from logging an error. With this in mind, could we reconsider the priority and severity? I have patches for this ready, so no worries about that, it just might send a wrong message implying this is actually harmful.

Comment 4 Lukáš Hellebrandt 2021-09-02 09:15:09 UTC
Verified with Sat 6.10.0 snap 15.0.

Used reproducer from OP, after ~1 day after successfully running an Ansible role against a host, there is still no error message similar to the one mentioned, in either /var/log/messages or /var/log/foreman/production.log

Comment 7 errata-xmlrpc 2021-11-16 14:10:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.10 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4702