Bug 1665470

Summary: Dynflow executor termination may hang if there is an action which keeps the executor occupied
Product: Red Hat Satellite Reporter: Ivan Necas <inecas>
Component: Tasks PluginAssignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA QA Contact: jcallaha
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.3.0CC: aruzicka, inecas, jcallaha, zhunting
Target Milestone: 6.4.2Keywords: Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tfm-rubygem-dynflow-1.0.5.3-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1654975 Environment:
Last Closed: 2019-02-13 19:08:21 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1654975    
Bug Blocks:    
Attachments:
Description Flags
ongoing create none

Comment 10 jcallaha 2019-01-27 02:04:38 UTC
Failed QA in Satellite 6.4.2 Snap 1

I followed the steps outlined in #9.

Unfortunately, the process is still stuck on sleep and the executor was never restarted. 
Attached is a screenshot of the create task running for more than a day (>28hrs at this point)

Comment 12 jcallaha 2019-01-28 14:22:22 UTC
Created attachment 1524247 [details]
ongoing create

Totally forgot to attach the screenshot!

No, I see no evidence that the executor was attempted to be restarted.
However, I only know the dynflow_executor log location. If you have a more relevant one, I can check that. 

As of now, the task is still "going".

Comment 13 Ivan Necas 2019-02-05 16:25:34 UTC
With `sleep` in place, the tasks will not restart on it's own: the memory limit needs to be set accordingly and the threshold needs to be reached

So while doing https://bugzilla.redhat.com/show_bug.cgi?id=1665470#c9, reproducer steps from https://bugzilla.redhat.com/show_bug.cgi?id=1654217#c0 need to be performed as well to see the behavior when the memory recycler restarts the executor.

So the reproducer steps should be:

1. setup the memory limit
2. follow https://bugzilla.redhat.com/show_bug.cgi?id=1665470#c9 to simulate the stuck task
3. finish reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=1654217#c0 to hit the memory limit


expectation: the dynflowd service would get restarted, and the stuck task would eventually end up in paused state

Comment 14 jcallaha 2019-02-06 14:53:09 UTC
Verified in Satellite 6.4.2 Snap 1.

Followed the revised steps outlined in #13

The memory limit was reached, after publishing 10 content views and performing validation syncs on 6 RHEL repositories.
In the log, I can see that the executor reaches its limit and is then restarted after the error.

The product create task was then moved to a paused state.


E, [2019-02-06T15:36:09.507961 #2769] ERROR -- /parallel-executor-core: cannot accept event: Dynflow::Director::Event[execution_plan_id: 03e8929b-e814-4175-9f8e-08c7e8876351, step_id: 159, event: Dynflow::Action::Polling::Poll, result: <#Concurrent::Edge::CompletableFuture:0x7f7aa63285e8 pending>] core is terminating (Dynflow::Error)
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.0.5.5/lib/dynflow/executors/parallel/core.rb:40:in `handle_event'
...
/opt/theforeman/tfm/root/usr/share/gems/gems/logging-2.2.2/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
World has been terminatedExiting
Starting Rails environment
Starting dynflow with the following options: {:rails_root=>"/usr/share/foreman", :process_name=>"dynflow_executor", :pid_dir=>"/usr/share/foreman/tmp/pids", :log_dir=>"/usr/share/foreman/log", :wait_attempts=>300, :wait_sleep=>1, :executors_count=>1, :memory_limit=>419430400.0, :memory_init_delay=>60, :memory_polling_interval=>60}
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/content_facet_host_extensions.rb:7: warning: already initialized constant Katello::Concerns::ContentFacetHostExtensions::ERRATA_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/content_facet_host_extensions.rb:7: warning: previous definition of ERRATA_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/content_facet_host_extensions.rb:14: warning: already initialized constant Katello::Concerns::ContentFacetHostExtensions::TRACE_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/content_facet_host_extensions.rb:14: warning: previous definition of TRACE_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/subscription_facet_host_extensions.rb:13: warning: already initialized constant Katello::Concerns::SubscriptionFacetHostExtensions::SUBSCRIPTION_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/subscription_facet_host_extensions.rb:13: warning: previous definition of SUBSCRIPTION_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_docker-4.1.0/app/controllers/api/v2/containers_controller.rb:107: warning: constant ::Fixnum is deprecated
Everything ready for world: cd0889f8-aff2-4d01-b98d-6510c25c6e7c

Comment 16 errata-xmlrpc 2019-02-13 19:08:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0345