Bug 1665470 - Dynflow executor termination may hang if there is an action which keeps the executor occupied
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Tasks Plugin
Version: 6.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: 6.4.2
Assignee: satellite6-bugs
QA Contact: jcallaha
URL:
Whiteboard:
Depends On: 1654975
Blocks:
Reported: 2019-01-11 14:10 UTC by Ivan Necas
Modified: 2019-10-30 19:28 UTC (History)

Fixed In Version: tfm-rubygem-dynflow-1.0.5.3-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1654975
Environment:
Last Closed: 2019-02-13 19:08:21 UTC
Target Upstream Version:


Attachments
ongoing create (39.34 KB, image/png)
2019-01-28 14:22 UTC, jcallaha


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 25593 | 0 | Normal | Closed | Dynflow executor termination may hang if there is an action which keeps the executor occupied | 2021-02-19 14:51:50 UTC
Red Hat Product Errata | RHBA-2019:0345 | 0 | None | None | None | 2019-02-13 19:08:23 UTC

Comment 10 jcallaha 2019-01-27 02:04:38 UTC
Failed QA in Satellite 6.4.2 Snap 1

I followed the steps outlined in #9.

Unfortunately, the process is still stuck on sleep and the executor was never restarted.
Attached is a screenshot of the create task, which has been running for more than a day (over 28 hours at this point).

Comment 12 jcallaha 2019-01-28 14:22:22 UTC
Created attachment 1524247
ongoing create

Totally forgot to attach the screenshot!

No, I see no evidence of any attempt to restart the executor.
However, the dynflow_executor log is the only log location I know of. If there is a more relevant one, I can check it.

As of now, the task is still "going".

Comment 13 Ivan Necas 2019-02-05 16:25:34 UTC
With `sleep` in place, the executor will not restart on its own: the memory limit needs to be set accordingly, and the threshold needs to be reached.

So while doing https://bugzilla.redhat.com/show_bug.cgi?id=1665470#c9, reproducer steps from https://bugzilla.redhat.com/show_bug.cgi?id=1654217#c0 need to be performed as well to see the behavior when the memory recycler restarts the executor.

So the reproducer steps should be:

1. set up the memory limit
2. follow https://bugzilla.redhat.com/show_bug.cgi?id=1665470#c9 to simulate the stuck task
3. finish reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=1654217#c0 to hit the memory limit


Expectation: the dynflowd service gets restarted, and the stuck task eventually ends up in the paused state.
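
The interaction between the three steps can be illustrated with a minimal sketch. It is not Dynflow's implementation: `MemoryWatcher`, `terminate_workers`, the simulated memory readings, and the 0.2 second timeout are all hypothetical names and values chosen for illustration. It only shows the general idea that termination triggered by a memory limit should wait for workers up to a deadline, so that a worker stuck in `sleep` is reported as paused rather than blocking the shutdown.

```ruby
# Hypothetical sketch; these names are illustrative and are not Dynflow's API.

# Steps 1 and 3: a watcher that fires once when memory use crosses the limit.
class MemoryWatcher
  def initialize(limit_bytes:, read_memory:, on_exceeded:)
    @limit_bytes = limit_bytes
    @read_memory = read_memory # callable returning current usage in bytes
    @on_exceeded = on_exceeded # callable run once when the limit is crossed
    @triggered = false
  end

  # One polling tick. Returns true once the limit has been exceeded.
  def poll
    return true if @triggered
    return false unless @read_memory.call > @limit_bytes

    @triggered = true
    @on_exceeded.call
    true
  end
end

# Termination with a shared deadline: workers that do not finish in time
# are reported as :paused instead of blocking the shutdown forever.
def terminate_workers(workers, timeout_seconds)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_seconds
  workers.map do |name, thread|
    remaining = [deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC), 0].max
    [name, thread.join(remaining) ? :finished : :paused]
  end.to_h
end

# Step 2: one worker is stuck in sleep, the other finishes immediately.
workers = {
  'quick' => Thread.new { :done },
  'stuck' => Thread.new { sleep }
}

readings = [100, 200, 500].each # simulated memory readings in bytes
outcomes = nil
watcher = MemoryWatcher.new(
  limit_bytes: 400,
  read_memory: -> { readings.next },
  on_exceeded: -> { outcomes = terminate_workers(workers, 0.2) }
)

3.times { watcher.poll }
workers['stuck'].kill # clean up the thread that never finished
puts outcomes.inspect
```

Without the deadline in `terminate_workers`, the call to `join` on the sleeping thread would never return, which mirrors the hang described in this bug.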

Comment 14 jcallaha 2019-02-06 14:53:09 UTC
Verified in Satellite 6.4.2 Snap 1.

Followed the revised steps outlined in #13.

The memory limit was reached after publishing 10 content views and performing validation syncs on 6 RHEL repositories.
In the log, I can see that the executor reaches its limit and is then restarted after the error.

The product create task was then moved to a paused state.


E, [2019-02-06T15:36:09.507961 #2769] ERROR -- /parallel-executor-core: cannot accept event: Dynflow::Director::Event[execution_plan_id: 03e8929b-e814-4175-9f8e-08c7e8876351, step_id: 159, event: Dynflow::Action::Polling::Poll, result: <#Concurrent::Edge::CompletableFuture:0x7f7aa63285e8 pending>] core is terminating (Dynflow::Error)
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.0.5.5/lib/dynflow/executors/parallel/core.rb:40:in `handle_event'
...
/opt/theforeman/tfm/root/usr/share/gems/gems/logging-2.2.2/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
World has been terminatedExiting
Starting Rails environment
Starting dynflow with the following options: {:rails_root=>"/usr/share/foreman", :process_name=>"dynflow_executor", :pid_dir=>"/usr/share/foreman/tmp/pids", :log_dir=>"/usr/share/foreman/log", :wait_attempts=>300, :wait_sleep=>1, :executors_count=>1, :memory_limit=>419430400.0, :memory_init_delay=>60, :memory_polling_interval=>60}
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/content_facet_host_extensions.rb:7: warning: already initialized constant Katello::Concerns::ContentFacetHostExtensions::ERRATA_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/content_facet_host_extensions.rb:7: warning: previous definition of ERRATA_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/content_facet_host_extensions.rb:14: warning: already initialized constant Katello::Concerns::ContentFacetHostExtensions::TRACE_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/content_facet_host_extensions.rb:14: warning: previous definition of TRACE_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/subscription_facet_host_extensions.rb:13: warning: already initialized constant Katello::Concerns::SubscriptionFacetHostExtensions::SUBSCRIPTION_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/subscription_facet_host_extensions.rb:13: warning: previous definition of SUBSCRIPTION_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_docker-4.1.0/app/controllers/api/v2/containers_controller.rb:107: warning: constant ::Fixnum is deprecated
Everything ready for world: cd0889f8-aff2-4d01-b98d-6510c25c6e7c
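
For reference, the `:memory_limit` value in the options line above can be converted to a more readable unit. This is a small sketch that assumes the value is expressed in bytes; the log itself does not state the unit.

```ruby
# Values copied from the logged options hash; the byte unit is an assumption.
options = {
  memory_limit: 419430400.0,
  memory_init_delay: 60,
  memory_polling_interval: 60
}

BYTES_PER_MIB = 1024 * 1024
limit_mib = options[:memory_limit] / BYTES_PER_MIB
puts "memory limit: #{limit_mib} MiB" # prints "memory limit: 400.0 MiB"
```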

Comment 16 errata-xmlrpc 2019-02-13 19:08:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0345

