Bug 1386283 - [RFE] Job invocations should have a priority
Summary: [RFE] Job invocations should have a priority
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Remote Execution
Version: 6.1.9
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: 6.4.0
Assignee: Ivan Necas
QA Contact: jcallaha
Docs Contact: satellite6-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-18 14:36 UTC by Daniel Lobato Garcia
Modified: 2021-12-10 14:54 UTC
CC List: 14 users

Fixed In Version: tfm-rubygem-foreman_remote_execution-1.5.2-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-16 15:26:32 UTC
Target Upstream Version:
Embargoed:


Attachments
verification screenshot (88.51 KB, image/png)
2018-09-19 15:51 UTC, jcallaha


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 20661 0 High Closed Job invocations should have a priority 2020-02-06 22:59:15 UTC
Red Hat Bugzilla 1561885 0 unspecified CLOSED Package install using Remote Execution is slow 2021-02-22 00:41:40 UTC
Red Hat Issue Tracker SAT-6891 0 None None None 2021-12-10 14:54:47 UTC
Red Hat Product Errata RHSA-2018:2927 0 None None None 2018-10-16 15:27:05 UTC

Internal Links: 1561885 1721679 1814424

Description Daniel Lobato Garcia 2016-10-18 14:36:39 UTC
Description of problem:

Every job should have a priority so that, when jobs are enqueued, new jobs with a higher priority are executed first.

For instance:

1- There are 100 reposync jobs currently running, and another 50 jobs waiting to run because the capsules cannot run all of them concurrently.

2- An erratum is published and the customer wants to apply it immediately. The customer should be able to assign a *higher* priority to the "apply errata" jobs, so that the queued reposync jobs run only *after* the errata jobs have finished.
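For illustration only, a minimal sketch of the requested ordering using a priority queue; the priority values and job names are hypothetical, not Satellite's actual scheduler:

  import heapq
  import itertools

  HIGH, NORMAL = 0, 10         # lower value pops first (hypothetical scale)
  counter = itertools.count()  # tie-breaker keeps FIFO order within a priority
  jobs = []

  def enqueue(name, priority=NORMAL):
      heapq.heappush(jobs, (priority, next(counter), name))

  for i in range(50):          # 50 reposync jobs are already waiting...
      enqueue(f"reposync-{i}")
  enqueue("apply-errata", priority=HIGH)  # ...then an urgent errata job arrives

  _, _, name = heapq.heappop(jobs)
  print(name)                  # -> apply-errata, ahead of all queued reposyncs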

Comment 5 Mike McCune 2017-06-28 21:29:26 UTC
Clarification:

The situation described in this RFE would only occur if the clients were applying errata using katello-agent and gofer, which are part of the Pulp infrastructure. For Remote Execution based errata updates, there should be no blocking while content operations take place. I ran some initial tests on a Satellite with 100 clients and ~10 content synchronization and publish tasks running at the same time.

The REX jobs interleaved asynchronously with the content operations and completed before the content operations did. I did not see any blocking situations where the REX jobs were waiting in a queue behind the content jobs.

I'd recommend that any customer affected by the scenarios outlined above use Remote Execution for errata application instead of the katello-agent based approach.
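For completeness, triggering the recommended REX-based errata application over the API might look roughly like this. The endpoint and field names follow the foreman_remote_execution API as I understand it, so treat the payload shape, host, and credentials as assumptions and verify against /apidoc on your own Satellite:

  import requests

  SATELLITE = "https://satellite.example.com"  # hypothetical host
  AUTH = ("admin", "changeme")                 # hypothetical credentials

  payload = {
      "job_invocation": {
          "feature": "katello_errata_install",  # REX feature for errata application
          "inputs": {"errata": "RHSA-2018:2927"},
          "search_query": "os ~ RedHat",        # which hosts to target (assumed query)
          "targeting_type": "static_query",
      }
  }

  resp = requests.post(f"{SATELLITE}/api/job_invocations", json=payload, auth=AUTH,
                       verify="/etc/pki/katello/certs/katello-server-ca.crt")
  resp.raise_for_status()
  print(resp.json()["id"])  # id of the new job invocation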

Comment 6 Bryan Kearney 2017-08-11 13:14:16 UTC
PM is aware of the priority; I am removing the urgent flag.

Comment 7 Ivan Necas 2017-08-21 09:09:16 UTC
Created redmine issue http://projects.theforeman.org/issues/20661 from this bug

Comment 8 Ivan Necas 2017-08-21 09:11:24 UTC
Given this will require some larger changes, I'm removing this from the 6.2.z proposed bugs. We can reconsider the target release once the issue is handled, but I would not recommend that based on current knowledge.

Comment 9 Pavel Moravec 2018-02-26 14:37:08 UTC
Will this improvement help in a situation where:

- a bulk REX job triggers other tasks (i.e. Katello::Host::Update or Katello::Host::UploadPackageProfile), practically one or two such tasks for each and every host executing the job
  - (e.g. assume the REX job contains installation of a package and a reboot, which afaik triggers both tasks)

- the cadence of new command executions slows down significantly from the time the other tasks start to pop up

- sorting the timestamps of the individual steps (RunHostsJob task created, RunHostJob task created, dynflow step started, ssh command executed, ..), the biggest latency (slowed cadence) does not yet affect creating the RunHostsJob task itself, but *does* affect the RunHostsJob dynflow step
  - I mean that even the latest RunHostsJob task is started reasonably soon, but the dynflow steps are started over a much, much longer period of time

Two examples of the _latest_ timestamps of that type (note that the biggest "step" is always between RunHostsJob.started_at and RunHostsJob.dynflow.started_at, which is what I described above); a small sketch for computing these deltas follows the listings:

02:01:03 RunHostsJob.started_at.txt
02:01:03 RunHostsJob.world.start.txt
02:22:23 RunHostsJob.dynflow.started_at.txt
02:22:24 RunHostJob.started_at.txt
02:22:25 RunHostJob.world.start.txt
02:22:53 RunHostJob.dynflow.started_at.txt
02:23:21 RunHostJob.job.beginning.txt
02:30:58 RunHostJob.job.ending.txt
02:31:01 RunHostJob.dynflow.ended_at.txt
02:31:02 RunHostJob.ended_at.txt
02:31:02 RunHostJob.world.finish.txt
02:31:02 RunHostsJob.dynflow.ended_at.txt
02:31:02 RunHostsJob.ended_at.txt
02:31:02 RunHostsJob.world.finish.txt

08:13:42 RunHostsJob.started_at.txt
08:13:44 RunHostsJob.world.start.txt
08:27:53 RunHostJob.started_at.txt
08:27:53 RunHostsJob.dynflow.started_at.txt
08:27:57 RunHostJob.world.start.txt
08:28:31 RunHostJob.dynflow.started_at.txt
08:28:32 RunHostJob.job.beginning.txt
08:33:34 RunHostJob.job.ending.txt
08:35:46 RunHostJob.dynflow.ended_at.txt
08:37:18 RunHostJob.ended_at.txt
08:37:18 RunHostJob.world.finish.txt
08:39:35 RunHostsJob.dynflow.ended_at.txt
08:39:35 RunHostsJob.ended_at.txt
08:39:35 RunHostsJob.world.finish.txt
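For anyone wanting to repeat this kind of analysis, a minimal sketch (assuming the listings keep the exact "HH:MM:SS name" format above) that prints the delta between consecutive events; on the second listing it exposes the ~14-minute gap before the RunHostsJob dynflow step starts:

  from datetime import datetime

  # First four lines of the second listing above, verbatim.
  listing = """\
  08:13:42 RunHostsJob.started_at.txt
  08:13:44 RunHostsJob.world.start.txt
  08:27:53 RunHostJob.started_at.txt
  08:27:53 RunHostsJob.dynflow.started_at.txt"""

  events = []
  for line in listing.splitlines():
      ts, name = line.strip().split(maxsplit=1)
      events.append((datetime.strptime(ts, "%H:%M:%S"), name))

  # The delta between consecutive events shows where latency accumulates.
  for (t1, n1), (t2, n2) in zip(events, events[1:]):
      print(f"{t2 - t1}  {n1} -> {n2}")
  # 0:00:02  RunHostsJob.started_at.txt -> RunHostsJob.world.start.txt
  # 0:14:09  RunHostsJob.world.start.txt -> RunHostJob.started_at.txt
  # 0:00:00  RunHostJob.started_at.txt -> RunHostsJob.dynflow.started_at.txt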

Comment 10 Ivan Necas 2018-02-26 16:32:38 UTC
Yes, it should.

Comment 11 Satellite Program 2018-03-13 22:17:37 UTC
Upstream bug assigned to inecas

Comment 12 Satellite Program 2018-03-13 22:17:41 UTC
Upstream bug assigned to inecas

Comment 13 Pavel Moravec 2018-04-05 12:42:24 UTC
When there are many tasks pending, hammer ping against foreman-tasks responds in 1000+ ms. Does it make sense for this improvement to also cover that response time?

Rationale:
- ping is a simple status check; it won't make the responding thread/worker busy
- ping response time should be low (or should it? a high response time can indicate the service is busy..)
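A trivial way to reproduce the measurement (hammer ping is the real command; the 1000+ ms figure is just Pavel's observation under load, not an official threshold):

  import subprocess
  import time

  start = time.monotonic()
  subprocess.run(["hammer", "ping"], check=True, capture_output=True)
  elapsed_ms = (time.monotonic() - start) * 1000
  print(f"hammer ping took {elapsed_ms:.0f} ms")  # 1000+ ms here means a busy backend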

Comment 14 Satellite Program 2018-04-25 16:16:32 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/20661 has been resolved.

Comment 16 jcallaha 2018-09-19 15:51:08 UTC
Verified in Satellite 6.4 Snap 22

Performed setup steps for remote execution against hosts.

Queued up 10 jobs to run at a specific time.

Once that time hit, I navigated to <sathost>/foreman_tasks/dynflow/worlds/execution_status

On that page, you can see that remote_execution has its own dedicated queue within dynflow to execute tasks.

See attached screenshot for verification.
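In miniature, the isolation that a dedicated queue buys (the queue names mirror the execution_status page above; the worker model is a simplification, not dynflow's actual implementation):

  import queue
  import threading
  import time

  # One worker pool per queue: a burst of content tasks cannot starve
  # remote-execution jobs because they never share workers.
  queues = {"default": queue.Queue(), "remote_execution": queue.Queue()}

  def worker(name, q):
      while True:
          job = q.get()
          print(f"[{name}] running {job}")
          time.sleep(0.01)  # stand-in for real work
          q.task_done()

  for name, q in queues.items():
      threading.Thread(target=worker, args=(name, q), daemon=True).start()

  for i in range(100):
      queues["default"].put(f"content-sync-{i}")
  queues["remote_execution"].put("rex-job-1")  # picked up immediately by its own worker

  for q in queues.values():
      q.join()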

Comment 17 jcallaha 2018-09-19 15:51:32 UTC
Created attachment 1484835 [details]
verification screenshot

Comment 19 errata-xmlrpc 2018-10-16 15:26:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927

