Description of problem:
Every job should have a priority so that, when jobs are enqueued, new jobs with a higher priority are executed first.
1- There are 100 reposync jobs currently running. There are another 50 jobs waiting to run because the capsules cannot run all of them concurrently.
2- An erratum is published and the customer wants to apply it immediately. The customer should be able to assign a *higher* priority to the "apply errata" jobs, so the queued reposync jobs only run *after* the errata jobs have finished.
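The requested behavior amounts to a priority queue in the job dispatcher. A minimal sketch in Python (this is an illustration of the idea, not Foreman's actual scheduler; the priority values are made up):

```python
import heapq

ERRATA_PRIORITY = 0     # lower number = runs first (assumed convention)
REPOSYNC_PRIORITY = 10

queue = []
counter = 0  # tie-breaker preserves FIFO order within one priority level

def enqueue(priority, name):
    global counter
    heapq.heappush(queue, (priority, counter, name))
    counter += 1

# 3 reposync jobs are already waiting when the errata job arrives:
for i in range(3):
    enqueue(REPOSYNC_PRIORITY, f"reposync-{i}")
enqueue(ERRATA_PRIORITY, "apply-errata")

order = [heapq.heappop(queue)[2] for _ in range(4)]
print(order)  # apply-errata first, then reposyncs in FIFO order
```

The tie-breaking counter matters: without it, jobs of equal priority would be compared by name, losing the first-in-first-out behavior queued jobs have today.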
The situation described in this RFE would only occur if the clients were applying errata using katello-agent and gofer, which are part of the Pulp infrastructure. For Remote Execution based errata updates, there should be no blocking while content operations take place. I did some initial tests with a Satellite with 100 clients and ~10 content synchronization and publish tasks running at the same time.
The REX jobs interleaved asynchronously with the content operations and completed before them. I did not see any blocking situations where the REX jobs were waiting in a queue behind the content jobs.
I'd recommend any customer affected by the scenarios outlined above utilize Remote Execution for errata application instead of the katello-agent based approach.
PM is aware of the priority; I am removing the urgent flag.
Created redmine issue http://projects.theforeman.org/issues/20661 from this bug
Given this will require some larger changes, I'm removing this from 6.2.z proposed bugs. We can reconsider the target release once the issue is handled, but I would not recommend that based on current knowledge.
Will this improvement help in a situation where:
- a bulk REX job triggers other tasks (e.g. Katello::Host::Update or Katello::Host::UploadPackageProfile), practically one or two such tasks for each and every host executing the job
- (e.g. if the REX job contains installation of a package and a reboot, then afaik it triggers both steps)
- the cadence of new command executions slows down significantly from the time the other tasks start to pop up
- sorting the times of the individual steps (RunHostsJob task created, RunHostJob task created, dynflow step started, ssh command executed, ...), the biggest latency (slowed cadence) does not affect creating the RunHostsJob task itself, but *does* affect the RunHostsJob dynflow step
- I mean that the latest RunHostsJob task is started reasonably soon, but its dynflow steps are started over a much, much longer period of time
Two examples of the _latest_ time of that type (note the biggest "step" is always between RunHostsJob.started_at and RunHostsJob.dynflow.started_at, which is what I wrote above):
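The timing breakdown described above can be sketched as follows (the timestamps are hypothetical and the field names only mirror the step labels listed above; they are not actual foreman-tasks attributes):

```python
from datetime import datetime

# Made-up timestamps for one job run, in the order the steps occur:
events = [
    ("RunHostsJob task created",       "2018-01-01 10:00:00"),
    ("RunHostsJob.started_at",         "2018-01-01 10:00:02"),
    ("RunHostsJob.dynflow.started_at", "2018-01-01 10:07:31"),
    ("ssh command executed",           "2018-01-01 10:07:33"),
]

fmt = "%Y-%m-%d %H:%M:%S"
times = [(name, datetime.strptime(ts, fmt)) for name, ts in events]

# Gap between each consecutive pair of steps:
gaps = [(b[0], (b[1] - a[1]).total_seconds()) for a, b in zip(times, times[1:])]
for name, secs in gaps:
    print(f"gap before {name}: {secs:.0f}s")

worst = max(gaps, key=lambda g: g[1])
print("biggest gap is before:", worst[0])
```

With numbers like these, the gap before the dynflow step dominates, which is the pattern the comment above reports: the task is created quickly, but its dynflow step waits.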
Yes, it should.
Upstream bug assigned to firstname.lastname@example.org
When there are many tasks pending, hammer ping to foreman-tasks responds in 1000+ ms. Does it make sense for this improvement to also cover this response time?
- ping is a simple status check; it won't make the reacting thread/worker busy
- ping response time should be low (or should it? a high response time can show the service is busy...)
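One way to quantify the ping latency under load is to simply time the command invocation. A minimal sketch (timing a no-op placeholder so it runs anywhere; against a real Satellite you would time `hammer ping` instead):

```python
import subprocess
import time

def time_command(cmd):
    """Return wall-clock milliseconds taken by one invocation of a command."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return (time.perf_counter() - start) * 1000

# On a loaded Satellite you would run something like:
#   elapsed_ms = time_command(["hammer", "ping"])
# Here a no-op stands in so the sketch is runnable:
elapsed_ms = time_command(["true"])
print(f"{elapsed_ms:.1f} ms")
```

Note this measures the whole round trip including CLI startup, so it overstates the server-side response time; it is still useful for comparing an idle system against one with many tasks pending.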
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/20661 has been resolved.
Verified in Satellite 6.4 Snap 22
Performed setup steps for remote execution against hosts.
Queued up 10 jobs to run at a specific time.
Once that time hit, I navigated to <sathost>/foreman_tasks/dynflow/worlds/execution_status
On that page, you can see that remote_execution has its own dedicated queue within dynflow to execute tasks.
See attached screenshot for verification.
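The verified behavior can be modeled in a few lines (a toy model only, not dynflow's actual implementation): because remote_execution has its own queue and worker, REX jobs drain independently of any content-job backlog in the default queue.

```python
from collections import deque

# Two independent queues, each with its own dedicated worker (toy model):
queues = {
    "default": deque(f"content-sync-{i}" for i in range(3)),
    "remote_execution": deque(["rex-job-1", "rex-job-2"]),
}

def drain(queue_name):
    """Simulate that queue's dedicated worker running all of its jobs."""
    executed = []
    while queues[queue_name]:
        executed.append(queues[queue_name].popleft())
    return executed

# REX jobs complete without touching the content backlog:
print(drain("remote_execution"))
print(len(queues["default"]), "content jobs still queued")
```

This is the property the original report asked for via priorities: with a dedicated queue, errata applied over REX never wait behind queued content operations at all.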
Created attachment 1484835 [details]
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.