Bug 1665470 - Dynflow executor termination may hang if there is an action which keeps the executor occupied
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Tasks Plugin
Version: 6.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: 6.4.2
Assignee: satellite6-bugs
QA Contact: jcallaha
URL:
Whiteboard:
Depends On: 1654975
Blocks:
 
Reported: 2019-01-11 14:10 UTC by Ivan Necas
Modified: 2019-10-30 19:28 UTC (History)

Fixed In Version: tfm-rubygem-dynflow-1.0.5.3-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1654975
Environment:
Last Closed: 2019-02-13 19:08:21 UTC
Target Upstream Version:


Attachments
ongoing create (39.34 KB, image/png), 2019-01-28 14:22 UTC, jcallaha


Links
System | ID | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 25593 | Normal | Closed | Dynflow executor termination may hang if there is an action which keeps the executor occupied | 2020-04-20 04:43:38 UTC
Red Hat Product Errata | RHBA-2019:0345 | None | None | None | 2019-02-13 19:08:23 UTC

Comment 10 jcallaha 2019-01-27 02:04:38 UTC
Failed QA in Satellite 6.4.2 Snap 1

I followed the steps outlined in #9.

Unfortunately, the process is still stuck on sleep and the executor was never restarted. 
Attached is a screenshot of the create task running for more than a day (>28hrs at this point)

Comment 12 jcallaha 2019-01-28 14:22:22 UTC
Created attachment 1524247 [details]
ongoing create

Totally forgot to attach the screenshot!

No, I see no evidence of any attempt to restart the executor.
However, I only know the dynflow_executor log location. If you have a more relevant one, I can check that. 

As of now, the task is still "going".

Comment 13 Ivan Necas 2019-02-05 16:25:34 UTC
With `sleep` in place, the tasks will not restart on their own: the memory limit needs to be set accordingly and the threshold needs to be reached.

So while doing https://bugzilla.redhat.com/show_bug.cgi?id=1665470#c9, reproducer steps from https://bugzilla.redhat.com/show_bug.cgi?id=1654217#c0 need to be performed as well to see the behavior when the memory recycler restarts the executor.

So the reproducer steps should be:

1. set up the memory limit
2. follow https://bugzilla.redhat.com/show_bug.cgi?id=1665470#c9 to simulate the stuck task
3. finish reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=1654217#c0 to hit the memory limit


Expectation: the dynflowd service gets restarted, and the stuck task eventually ends up in a paused state.
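
The expected behavior hinges on termination not waiting forever for a busy worker. A minimal Ruby sketch of that idea follows, using a plain thread blocked in `sleep` as a stand-in for the stuck action. The helper name `terminate_with_timeout` and the timeout values are hypothetical and chosen for illustration; this is not the actual Dynflow change.

```ruby
# Hypothetical helper, for illustration only (not Dynflow code).
# Waits up to `timeout` seconds for the worker thread to finish.
# If the worker is still busy after that, it is killed so that
# the caller's termination cannot hang indefinitely.
def terminate_with_timeout(worker, timeout)
  # Thread#join returns the thread when it finished, nil on timeout.
  return :finished if worker.join(timeout)

  worker.kill
  worker.join # returns promptly once the thread has been killed
  :forced
end

# Stand-in for an action that keeps the executor occupied forever.
stuck = Thread.new { sleep }
# Stand-in for an action that completes normally.
quick = Thread.new { :done }

quick_result = terminate_with_timeout(quick, 1)
stuck_result = terminate_with_timeout(stuck, 0.1)

puts "quick worker: #{quick_result}"
puts "stuck worker: #{stuck_result}"
```

An unbounded `worker.join` on the `stuck` thread would never return, which is the kind of hang described in the bug title. Bounding the wait lets the restart proceed.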

Comment 14 jcallaha 2019-02-06 14:53:09 UTC
Verified in Satellite 6.4.2 Snap 1.

Followed the revised steps outlined in #13

The memory limit was reached after publishing 10 content views and performing validation syncs on 6 RHEL repositories.
In the log, I can see that the executor reaches its limit and is then restarted after the error.

The product create task was then moved to a paused state.


E, [2019-02-06T15:36:09.507961 #2769] ERROR -- /parallel-executor-core: cannot accept event: Dynflow::Director::Event[execution_plan_id: 03e8929b-e814-4175-9f8e-08c7e8876351, step_id: 159, event: Dynflow::Action::Polling::Poll, result: <#Concurrent::Edge::CompletableFuture:0x7f7aa63285e8 pending>] core is terminating (Dynflow::Error)
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.0.5.5/lib/dynflow/executors/parallel/core.rb:40:in `handle_event'
...
/opt/theforeman/tfm/root/usr/share/gems/gems/logging-2.2.2/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
World has been terminatedExiting
Starting Rails environment
Starting dynflow with the following options: {:rails_root=>"/usr/share/foreman", :process_name=>"dynflow_executor", :pid_dir=>"/usr/share/foreman/tmp/pids", :log_dir=>"/usr/share/foreman/log", :wait_attempts=>300, :wait_sleep=>1, :executors_count=>1, :memory_limit=>419430400.0, :memory_init_delay=>60, :memory_polling_interval=>60}
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/content_facet_host_extensions.rb:7: warning: already initialized constant Katello::Concerns::ContentFacetHostExtensions::ERRATA_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/content_facet_host_extensions.rb:7: warning: previous definition of ERRATA_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/content_facet_host_extensions.rb:14: warning: already initialized constant Katello::Concerns::ContentFacetHostExtensions::TRACE_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/content_facet_host_extensions.rb:14: warning: previous definition of TRACE_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/subscription_facet_host_extensions.rb:13: warning: already initialized constant Katello::Concerns::SubscriptionFacetHostExtensions::SUBSCRIPTION_STATUS_MAP
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/concerns/subscription_facet_host_extensions.rb:13: warning: previous definition of SUBSCRIPTION_STATUS_MAP was here
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_docker-4.1.0/app/controllers/api/v2/containers_controller.rb:107: warning: constant ::Fixnum is deprecated
Everything ready for world: cd0889f8-aff2-4d01-b98d-6510c25c6e7c
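
The options line in the log above shows `:memory_limit=>419430400.0` together with a 60-second `:memory_polling_interval`. As a rough illustration of how such a threshold check could work, the Ruby sketch below converts the limit to MiB and runs a few polling steps against made-up memory samples. The class `MemoryWatcherSketch`, its callbacks, and the sample values are hypothetical; Dynflow's real watcher may differ.

```ruby
# Hypothetical sketch, for illustration only (not Dynflow's watcher).
class MemoryWatcherSketch
  def initialize(limit_bytes:, read_memory:, on_limit:)
    @limit_bytes = limit_bytes
    @read_memory = read_memory # callable returning current usage in bytes
    @on_limit    = on_limit    # callable invoked once when the limit is exceeded
    @triggered   = false
  end

  # One polling step; returns true when a restart was requested.
  def poll
    return false if @triggered
    return false unless @read_memory.call > @limit_bytes

    @triggered = true
    @on_limit.call
    true
  end
end

MIB = 1024 * 1024
limit_bytes = 419_430_400 # value taken from the log line above
limit_mib = limit_bytes / MIB.to_f # => 400.0

# Made-up memory readings in bytes: 300 MiB, 380 MiB, 420 MiB.
samples = [300, 380, 420].map { |mib| mib * MIB }
events = []

watcher = MemoryWatcherSketch.new(
  limit_bytes: limit_bytes,
  read_memory: -> { samples.shift },
  on_limit: -> { events << :restart_requested }
)

results = 3.times.map { watcher.poll }
puts "limit: #{limit_mib} MiB, poll results: #{results.inspect}"
```

In this sketch only the third reading exceeds the 400 MiB limit, so the restart is requested once, on the third poll.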

Comment 16 errata-xmlrpc 2019-02-13 19:08:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0345

