Bug 1609371

Summary: The dynflow scheduling mechanism can cause tasks initiated later to be executed sooner, leaving older tasks waiting
Product: Red Hat Satellite
Reporter: Ivan Necas <inecas>
Component: Tasks Plugin
Assignee: Ivan Necas <inecas>
Status: CLOSED ERRATA
QA Contact: Peter Ondrejka <pondrejk>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 6.2.0
CC: andrew.schofield, aruzicka, bkearney, dgross, egolov, inecas, ktordeur, mmccune, pmoravec, sthirugn
Target Milestone: 6.6.0
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: tfm-rubygem-dynflow-1.2.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1717697 1721131
Environment:
Last Closed: 2019-10-22 12:46:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1718889
Bug Blocks: 1721131

Description Ivan Necas 2018-07-27 19:10:52 UTC
Description of problem:

Under heavy load, the original dynflow scheduling can disadvantage some tasks in favour of newer ones: even though the overall execution takes roughly the same amount of time, there are higher duration spikes on the longer-running tasks.


Version-Release number of selected component (if applicable):


How reproducible:
Requires a special setup to observe reliably

Steps to Reproduce:
1. Trigger a lot (200+) of small execution plans (applicability on a large number of hosts is a good example)
2. Watch the duration of the tasks

Actual results:
Some tasks finish relatively fast, while others wait unexpectedly long to finish: newer tasks of the same type get finished much sooner than older ones.

Expected results:
FIFO: tasks triggered sooner should finish sooner than newer tasks of the same type and complexity.
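
The difference between the actual and the expected behaviour can be illustrated with a toy model. This is only a sketch under stated assumptions, not dynflow's code: the `simulate` function, the two policy names, and the single worker that runs one step per tick are all made up for illustration. Plan i is submitted at tick i, and the scheduler picks either the oldest plan with pending steps ("fifo") or the newest one ("newest_first").

```python
def simulate(policy, n_plans=5, steps_per_plan=2):
    """Toy single-worker scheduler; returns {plan_id: turnaround in ticks}.

    Hypothetical model, not dynflow's implementation: plan i is submitted
    at tick i, needs `steps_per_plan` steps, and the worker runs exactly
    one step per tick.
    """
    if policy not in ("fifo", "newest_first"):
        raise ValueError(f"unknown policy: {policy}")

    remaining = {}    # plan_id -> steps still to run
    turnaround = {}   # plan_id -> finish tick minus submit tick
    tick = 0
    while len(turnaround) < n_plans:
        if tick < n_plans:
            remaining[tick] = steps_per_plan  # plan `tick` arrives now
        # Plan ids double as submission order: a lower id is an older plan.
        plan = min(remaining) if policy == "fifo" else max(remaining)
        remaining[plan] -= 1
        tick += 1
        if remaining[plan] == 0:
            del remaining[plan]
            turnaround[plan] = tick - plan
    return turnaround


if __name__ == "__main__":
    for policy in ("fifo", "newest_first"):
        result = simulate(policy)
        print(policy, "finish order:", list(result), "worst wait:", max(result.values()))
```

In this model both policies finish all the work in the same 10 ticks, but under "newest_first" the oldest plan finishes last and waits the full 10 ticks, while under "fifo" the plans finish in submission order and the worst turnaround is 6 ticks.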

Comment 1 Ivan Necas 2018-07-27 19:11:32 UTC
Created redmine issue https://projects.theforeman.org/issues/24435 from this bug

Comment 2 Ivan Necas 2018-07-27 19:30:48 UTC
The fix for the bug is proposed upstream https://github.com/Dynflow/dynflow/pull/293

Comment 7 Bryan Kearney 2019-01-23 09:03:05 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/24435 has been resolved.

Comment 15 Mike McCune 2019-06-13 19:07:14 UTC
Did a quick test of the throughput of a Satellite 6.4 server with this modification applied:

# DEFAULT SETTINGS (1 executor)
4000 Host::Update & GenerateApplicability tasks
18 CV Publishes
Duration:
Started at: 2019-06-13 16:43:53 UTC 
Ended at:   2019-06-13 17:19:02 UTC 
DURATION: 36min

# 2 EXECUTORS
4000 Host::Update & GenerateApplicability tasks
18 CV Publishes
Duration:
Started at: 2019-06-13 17:36:01 UTC 
Ended at: 2019-06-13 17:59:10 UTC 
DURATION: 23min

# 2 EXECUTORS & DYNFLOW FIFO MODIFICATION
4000 Host::Update & GenerateApplicability tasks
18 CV Publishes
Duration:
Started at: 2019-06-13 18:12:23 UTC
Ended at: 2019-06-13 18:25:31 UTC
DURATION: 13min

About a 76% throughput improvement over 2 executors alone (23 min down to 13 min)
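
For reference, the elapsed times and throughput ratios can be recomputed from the timestamps above. This is a small sketch; the `RUNS` table and the helper names are mine, and it assumes the workload was identical in all three runs, so the throughput ratio is simply the ratio of elapsed times.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (start, end) timestamps copied from the three runs above, all UTC.
RUNS = {
    "1 executor": ("2019-06-13 16:43:53", "2019-06-13 17:19:02"),
    "2 executors": ("2019-06-13 17:36:01", "2019-06-13 17:59:10"),
    "2 executors + FIFO": ("2019-06-13 18:12:23", "2019-06-13 18:25:31"),
}


def elapsed_seconds(start, end):
    """Wall-clock seconds between two timestamps in FMT."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


def speedup(baseline, candidate):
    """Throughput ratio for a fixed workload: baseline time / candidate time."""
    return elapsed_seconds(*RUNS[baseline]) / elapsed_seconds(*RUNS[candidate])


if __name__ == "__main__":
    for name, (start, end) in RUNS.items():
        print(f"{name}: {elapsed_seconds(start, end)} s")
    print(f"FIFO vs 2 executors: {speedup('2 executors', '2 executors + FIFO'):.2f}x")
    print(f"FIFO vs default:     {speedup('1 executor', '2 executors + FIFO'):.2f}x")
```

With these timestamps the run with the FIFO modification comes out at about 1.76x the throughput of 2 executors alone, and about 2.68x that of the default single executor.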

Comment 28 errata-xmlrpc 2019-10-22 12:46:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3172