Bug 1370139

Summary: there is no way to cancel a task which is still scheduling its child tasks
Product: Red Hat Satellite
Reporter: Jan Hutař <jhutar>
Component: Tasks Plugin
Assignee: Adam Ruzicka <aruzicka>
Status: CLOSED ERRATA
QA Contact: Peter Ondrejka <pondrejk>
Severity: urgent
Docs Contact:
Priority: high
Version: 6.2.0
CC: aruzicka, bbuckingham, bkearney, dcaplan, egolov, inecas, jcallaha, ktordeur, pmoravec, pm-sat, psuriset, zhunting
Target Milestone: Unspecified
Keywords: PrioBumpPM, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1446725 (view as bug list)
Environment:
Last Closed: 2018-02-21 16:54:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1516651
Bug Blocks:

Description Jan Hutař 2016-08-25 12:05:27 UTC
Description of problem:
There is no way to cancel a task that is still scheduling its child tasks


Version-Release number of selected component (if applicable):
satellite-6.2.1-1.3.el7sat.noarch


How reproducible:
always


Steps to Reproduce:
1. Schedule an errata upgrade task via katello-agent on a large number of systems
   (say 10k)
2. It takes the parent task multiple hours to start all 10k child tasks (see the
   console sketch below for a way to watch the rate)
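
A rough console sketch for watching that rate while the parent task is scheduling. Assumptions not confirmed by the report: that foreman-rake console is available on the Satellite and that the errata upgrade shows up as the newest Actions::BulkAction task (adjust the label for your version):

    # foreman-rake console -- poll how fast child tasks accumulate under the parent.
    # Assumption: the katello-agent errata upgrade is the newest Actions::BulkAction.
    parent = ForemanTasks::Task.where(label: 'Actions::BulkAction').order(:started_at).last
    3.times do
      count = ForemanTasks::Task.where(parent_task_id: parent.id).count
      puts "#{Time.now.utc}: #{count} child tasks created so far"
      sleep 60
    end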


Actual results:
During this time you are not able to cancel the parent task (e.g. when you have noticed that something in your infrastructure is broken and needs to be fixed first)


Expected results:
It should be possible to cancel a task even while it is still starting its child tasks
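
For reference, what a cancel would amount to from the console. This is a sketch only: cancellable? and cancel on the task model are assumptions that may not hold on all foreman-tasks versions, and the UUID is a placeholder:

    # foreman-rake console -- attempt to cancel a parent task by UUID.
    task = ForemanTasks::Task.find('PARENT-TASK-UUID')
    if task.cancellable?
      task.cancel                     # assumption: model-level cancel helper
      puts "cancel requested for #{task.label} (state: #{task.state})"
    else
      puts "task is not cancellable in state #{task.state}"
    end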

Comment 3 Adam Ruzicka 2016-11-30 11:48:56 UTC
Created redmine issue http://projects.theforeman.org/issues/17528 from this bug

Comment 4 Pradeep Kumar Surisetty 2017-01-27 12:35:34 UTC
*** Bug 1417180 has been marked as a duplicate of this bug. ***

Comment 5 Pavel Moravec 2017-01-27 12:43:47 UTC
This bug can be quite bothersome for customers in the following scenario:

- they want to run a job on, say, 5k clients

- they fire it with a typo / error and want to cancel it

- since sub-tasks are created at a rate of less than 1 task per second (1 task/sec seems to be the upper limit), the user has to wait almost 2 hours before the cancel succeeds; the time when the cancel will succeed can only be estimated with some tolerance (i.e. the job can be executed on a system _before_ some cancellation attempt succeeds)
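
(Back-of-the-envelope arithmetic, not from the case data: at the 1 task/sec ceiling, 5k sub-tasks need at least 5,000 s, i.e. about 83 minutes; at ~0.7 task/sec it comes to roughly the two hours estimated above.)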

Comment 8 Pavel Moravec 2017-01-30 13:44:47 UTC
Worth testing as well:

- launch a job invocation on >100 hosts with the time span set to some higher value

- ensure that individual dynflow tasks are picked up even during the phase of generating the foreman sub-tasks (i.e. dynflow picked up the 1st foreman (sub)task while the 100th foreman (sub)task had not been generated yet)

- to check that (or script it; see the sketch below):
  - open the foreman task with the job execution, click the sub-tasks link, sort the sub-tasks by start time
  - open the very oldest sub-task, click through to the dynflow console, then to the Execution history tab; the "start execution" timestamp is the time when dynflow picked this job up from foreman

Current behaviour:
- when a time span is set, dynflow picks up all tasks _after_ the latest one is generated (and they are generated at a rate of less than 1 task per second)

Expected behaviour:
- even with a time span set, dynflow picks up tasks while foreman is generating them
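
The manual check above can also be scripted from the console. A sketch, assuming started_at is an acceptable proxy for when dynflow picked the task up; the parent UUID is a placeholder:

    # foreman-rake console -- oldest sub-tasks of the job, sorted by start time.
    parent = ForemanTasks::Task.find('PARENT-TASK-UUID')
    subs = ForemanTasks::Task.where(parent_task_id: parent.id).order(:started_at)
    subs.limit(10).each do |t|
      puts format('%-28s %-18s %s', t.started_at, "#{t.state}/#{t.result}", t.id)
    end

If dynflow picks tasks up during generation, the oldest start times should predate the moment the last sub-task was created.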

Comment 10 Satellite Program 2017-02-12 09:11:24 UTC
Upstream bug assigned to aruzicka

Comment 11 Satellite Program 2017-02-12 09:11:28 UTC
Upstream bug assigned to aruzicka

Comment 12 Satellite Program 2017-03-28 12:10:54 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/17528 has been resolved.

Comment 15 Brad Buckingham 2017-10-17 12:35:54 UTC
*** Bug 1446725 has been marked as a duplicate of this bug. ***

Comment 16 Peter Ondrejka 2017-11-02 13:39:02 UTC
Testing in Sat 6.3 snap 22: when starting a remote job on 4000 hosts and subsequently cancelling it, the sub-tasks stop executing after n*100 blocks as expected, but the parent task remains in the running state forever, and hosts whose sub-tasks have not yet executed remain in state N/A. So putting back to ASSIGNED for further investigation.

Comment 18 Ivan Necas 2017-11-23 09:49:38 UTC
There is a related issue in remote execution, tracked as well in https://bugzilla.redhat.com/show_bug.cgi?id=1516651 - resolving that BZ should also move this BZ back to the ON_QA state.

Comment 19 Bryan Kearney 2018-01-03 13:25:47 UTC
Based on comment 18, I am moving this to ON_QA since https://bugzilla.redhat.com/show_bug.cgi?id=1516651 has been verified.

Comment 20 Peter Ondrejka 2018-01-03 15:02:01 UTC
Checked again in Sat 6.3 snap 30: when the job is cancelled in progress, it finishes scheduling the current batch of hosts (100) and the rest of the tasks are not started.
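
A quick console check of the same result (a sketch; the parent UUID is a placeholder):

    # foreman-rake console -- sub-task distribution after cancelling the parent.
    # Expect ~100 (the in-flight batch) finished and the remainder never started.
    parent = ForemanTasks::Task.find('PARENT-TASK-UUID')
    ForemanTasks::Task.where(parent_task_id: parent.id).group(:state, :result).count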

Comment 21 Satellite Program 2018-02-21 16:54:37 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336