Bug 1386283 - [RFE] Job invocations should have a priority
Summary: [RFE] Job invocations should have a priority
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Remote Execution
Version: 6.1.9
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: 6.4.0
Assignee: Ivan Necas
QA Contact: jcallaha
Docs Contact: satellite6-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-18 14:36 UTC by Daniel Lobato Garcia
Modified: 2021-12-10 14:54 UTC
CC List: 14 users

Fixed In Version: tfm-rubygem-foreman_remote_execution-1.5.2-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-16 15:26:32 UTC
Target Upstream Version:
Embargoed:


Attachments
verification screenshot (88.51 KB, image/png)
2018-09-19 15:51 UTC, jcallaha


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 20661 0 High Closed Job invocations should have a priority 2020-02-06 22:59:15 UTC
Red Hat Bugzilla 1561885 0 unspecified CLOSED Package install using Remote Execution is slow 2021-02-22 00:41:40 UTC
Red Hat Issue Tracker SAT-6891 0 None None None 2021-12-10 14:54:47 UTC
Red Hat Product Errata RHSA-2018:2927 0 None None None 2018-10-16 15:27:05 UTC

Internal Links: 1561885 1721679 1814424

Description Daniel Lobato Garcia 2016-10-18 14:36:39 UTC
Description of problem:

Every job should have a priority so that, when jobs are enqueued, new jobs with a higher priority are executed first.

For instance:

1- There are 100 reposync jobs currently running, and another 50 jobs waiting to run because the capsules cannot run all of them concurrently.

2- An erratum is published and the customer wants to apply it immediately. The customer should be able to assign a *higher* priority to the "apply errata" jobs, so that the queued reposync jobs run only *after* the errata jobs have finished.
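For illustration only, a minimal sketch of the requested ordering using a priority queue; the priority values and job names are hypothetical, not Satellite's actual scheduler:

  import heapq
  import itertools

  HIGH, NORMAL = 0, 10         # lower value pops first (hypothetical scale)
  counter = itertools.count()  # tie-breaker keeps FIFO order within a priority
  jobs = []

  def enqueue(name, priority=NORMAL):
      heapq.heappush(jobs, (priority, next(counter), name))

  for i in range(50):          # 50 reposync jobs are already waiting...
      enqueue(f"reposync-{i}")
  enqueue("apply-errata", priority=HIGH)  # ...then an urgent errata job arrives

  _, _, name = heapq.heappop(jobs)
  print(name)                  # -> apply-errata, ahead of all queued reposyncs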

Comment 5 Mike McCune 2017-06-28 21:29:26 UTC
Clarification:

The situation described in this RFE would only occur if the clients were applying errata using katello-agent and gofer, which are part of the Pulp infrastructure. For Remote Execution based errata updates, there should be no blocking while content operations take place. I ran some initial tests on a Satellite with 100 clients and ~10 content synchronization and publish tasks running at the same time.

The REX jobs interleaved asynchronously with the content operations and completed before the content operations did. I did not see any blocking situations where the REX jobs were waiting in a queue behind the content jobs.

I'd recommend that any customer affected by the scenarios outlined above use Remote Execution for errata application instead of the katello-agent based approach.
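For completeness, triggering the recommended REX-based errata application over the API might look roughly like this. The endpoint and field names follow the foreman_remote_execution API as I understand it, so treat the payload shape, host, and credentials as assumptions and verify against /apidoc on your own Satellite:

  import requests

  SATELLITE = "https://satellite.example.com"  # hypothetical host
  AUTH = ("admin", "changeme")                 # hypothetical credentials

  payload = {
      "job_invocation": {
          "feature": "katello_errata_install",  # REX feature for errata application
          "inputs": {"errata": "RHSA-2018:2927"},
          "search_query": "os ~ RedHat",        # which hosts to target (assumed query)
          "targeting_type": "static_query",
      }
  }

  resp = requests.post(f"{SATELLITE}/api/job_invocations", json=payload, auth=AUTH,
                       verify="/etc/pki/katello/certs/katello-server-ca.crt")
  resp.raise_for_status()
  print(resp.json()["id"])  # id of the new job invocation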

Comment 6 Bryan Kearney 2017-08-11 13:14:16 UTC
PM is aware of the priority; I am removing the urgent flag.

Comment 7 Ivan Necas 2017-08-21 09:09:16 UTC
Created redmine issue http://projects.theforeman.org/issues/20661 from this bug

Comment 8 Ivan Necas 2017-08-21 09:11:24 UTC
Given this will require some larger changes, I'm removing this from the 6.2.z proposed bugs. We can reconsider the target release once the issue is handled, but I would not recommend that based on current knowledge.

Comment 9 Pavel Moravec 2018-02-26 14:37:08 UTC
Will this improvement help in a situation where:

- a bulk REX job triggers other tasks (i.e. Katello::Host::Update or Katello::Host::UploadPackageProfile), practically one or two such tasks for each and every host executing the job
  - (e.g. assume the REX job contains installation of a package and a reboot, which afaik triggers both tasks)

- the cadence of new command executions slows down significantly from the time the other tasks start to pop up

- sorting the timestamps of the individual steps (RunHostsJob task created, RunHostJob task created, dynflow step started, ssh command executed, ..), the biggest latency (slowed cadence) does not yet affect creating the RunHostsJob task itself, but *does* affect the RunHostsJob dynflow step
  - I mean that even the latest RunHostsJob task is started reasonably soon, but the dynflow steps are started over a much, much longer period of time

Two examples of the _latest_ timestamps of that type (note that the biggest "step" is always between RunHostsJob.started_at and RunHostsJob.dynflow.started_at, which is what I described above); a small sketch for computing these deltas follows the listings:

02:01:03 RunHostsJob.started_at.txt
02:01:03 RunHostsJob.world.start.txt
02:22:23 RunHostsJob.dynflow.started_at.txt
02:22:24 RunHostJob.started_at.txt
02:22:25 RunHostJob.world.start.txt
02:22:53 RunHostJob.dynflow.started_at.txt
02:23:21 RunHostJob.job.beginning.txt
02:30:58 RunHostJob.job.ending.txt
02:31:01 RunHostJob.dynflow.ended_at.txt
02:31:02 RunHostJob.ended_at.txt
02:31:02 RunHostJob.world.finish.txt
02:31:02 RunHostsJob.dynflow.ended_at.txt
02:31:02 RunHostsJob.ended_at.txt
02:31:02 RunHostsJob.world.finish.txt

08:13:42 RunHostsJob.started_at.txt
08:13:44 RunHostsJob.world.start.txt
08:27:53 RunHostJob.started_at.txt
08:27:53 RunHostsJob.dynflow.started_at.txt
08:27:57 RunHostJob.world.start.txt
08:28:31 RunHostJob.dynflow.started_at.txt
08:28:32 RunHostJob.job.beginning.txt
08:33:34 RunHostJob.job.ending.txt
08:35:46 RunHostJob.dynflow.ended_at.txt
08:37:18 RunHostJob.ended_at.txt
08:37:18 RunHostJob.world.finish.txt
08:39:35 RunHostsJob.dynflow.ended_at.txt
08:39:35 RunHostsJob.ended_at.txt
08:39:35 RunHostsJob.world.finish.txt
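For anyone wanting to repeat this kind of analysis, a minimal sketch (assuming the listings keep the exact "HH:MM:SS name" format above) that prints the delta between consecutive events; on the second listing it exposes the ~14-minute gap before the RunHostsJob dynflow step starts:

  from datetime import datetime

  # First four lines of the second listing above, verbatim.
  listing = """\
  08:13:42 RunHostsJob.started_at.txt
  08:13:44 RunHostsJob.world.start.txt
  08:27:53 RunHostJob.started_at.txt
  08:27:53 RunHostsJob.dynflow.started_at.txt"""

  events = []
  for line in listing.splitlines():
      ts, name = line.strip().split(maxsplit=1)
      events.append((datetime.strptime(ts, "%H:%M:%S"), name))

  # The delta between consecutive events shows where latency accumulates.
  for (t1, n1), (t2, n2) in zip(events, events[1:]):
      print(f"{t2 - t1}  {n1} -> {n2}")
  # 0:00:02  RunHostsJob.started_at.txt -> RunHostsJob.world.start.txt
  # 0:14:09  RunHostsJob.world.start.txt -> RunHostJob.started_at.txt
  # 0:00:00  RunHostJob.started_at.txt -> RunHostsJob.dynflow.started_at.txt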

Comment 10 Ivan Necas 2018-02-26 16:32:38 UTC
Yes, it should.

Comment 11 Satellite Program 2018-03-13 22:17:37 UTC
Upstream bug assigned to inecas

Comment 12 Satellite Program 2018-03-13 22:17:41 UTC
Upstream bug assigned to inecas

Comment 13 Pavel Moravec 2018-04-05 12:42:24 UTC
When there are many tasks pending, hammer ping against foreman-tasks responds in 1000+ ms. Does it make sense for this improvement to also cover that response time?

Rationale:
- ping is a simple status check; it won't make the responding thread/worker busy
- ping response time should be low (or should it? a high response time can indicate the service is busy..)
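A trivial way to reproduce the measurement (hammer ping is the real command; the 1000+ ms figure is just Pavel's observation under load, not an official threshold):

  import subprocess
  import time

  start = time.monotonic()
  subprocess.run(["hammer", "ping"], check=True, capture_output=True)
  elapsed_ms = (time.monotonic() - start) * 1000
  print(f"hammer ping took {elapsed_ms:.0f} ms")  # 1000+ ms here means a busy backend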

Comment 14 Satellite Program 2018-04-25 16:16:32 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/20661 has been resolved.

Comment 16 jcallaha 2018-09-19 15:51:08 UTC
Verified in Satellite 6.4 Snap 22

Performed setup steps for remote execution against hosts.

Queued up 10 jobs to run at a specific time.

Once that time hit, I navigated to <sathost>/foreman_tasks/dynflow/worlds/execution_status

On that page, you can see that remote_execution has its own dedicated queue within dynflow to execute tasks.

See attached screenshot for verification.
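In miniature, the isolation that a dedicated queue buys (the queue names mirror the execution_status page above; the worker model is a simplification, not dynflow's actual implementation):

  import queue
  import threading
  import time

  # One worker pool per queue: a burst of content tasks cannot starve
  # remote-execution jobs because they never share workers.
  queues = {"default": queue.Queue(), "remote_execution": queue.Queue()}

  def worker(name, q):
      while True:
          job = q.get()
          print(f"[{name}] running {job}")
          time.sleep(0.01)  # stand-in for real work
          q.task_done()

  for name, q in queues.items():
      threading.Thread(target=worker, args=(name, q), daemon=True).start()

  for i in range(100):
      queues["default"].put(f"content-sync-{i}")
  queues["remote_execution"].put("rex-job-1")  # picked up immediately by its own worker

  for q in queues.values():
      q.join()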

Comment 17 jcallaha 2018-09-19 15:51:32 UTC
Created attachment 1484835 [details]
verification screenshot

Comment 19 errata-xmlrpc 2018-10-16 15:26:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927

