Bug 1844634

Summary: [RFE] publish task spawned from sync task should be executed immediately, not put to the end of resource_manager queue
Product: Red Hat Satellite
Reporter: Pavel Moravec <pmoravec>
Component: Repositories
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED NOTABUG
QA Contact: Cole Higgins <chiggins>
Severity: medium
Priority: medium
Version: 6.7.0
CC: ahumbe, dalley, dkliban, dsynk, jhutar, jsherril, ktordeur, ltran, ttereshc
Target Milestone: Unspecified
Keywords: FutureFeature
Target Release: Unused
Hardware: x86_64
OS: Linux
Last Closed: 2022-09-08 12:55:28 UTC
Type: Bug

Description Pavel Moravec 2020-06-05 20:27:41 UTC
Description of problem:
I am suggesting an improvement to how pulp tasks are scheduled. I am not sure if or how this applies to pulp-3, but in pulp-2, given the way katello requests a repo sync task, the sync task automatically spawns a publish task in a suboptimal way. Consider this example:

- multiple repos are synced concurrently (e.g. a CV publish/promote, a Capsule sync, a Sync plan or similar); say there are 10 such tasks
- there are, say, 4 pulp workers, so 6 sync tasks wait in the resource_manager backlog
- the first sync task to complete spawns, at the _end_ of the sync, a repo publish task (*)
- this new task is added to the end of the resource_manager's queue
- the remaining sync tasks are processed, and only then does the first publish task reach a pulp worker

We should reorder task execution so that the publish task is executed immediately after the sync task, and by the same worker. Rationale:

- a repo with new content becomes available to clients sooner. Other repos won't be affected, as they will be republished at the same time (the reordering of tasks happens *before* the other publish tasks, right?); a kind of optimisation
- also, dynflow waits for the whole sync+publish pair of tasks to complete; shortening this period saves some polling from dynflow to pulp, and fewer concurrent dynflow steps will be running at any given time
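The difference between the current and the proposed ordering can be illustrated with a toy discrete-event model. This is a sketch under simplifying assumptions (fixed task durations, one shared FIFO queue, no per-repo resource reservations), not pulp's actual resource_manager code; `simulate` and the policy names "fifo" and "chained" are hypothetical.

```python
import heapq
from collections import deque


def simulate(sync_times, workers, publish_time, policy):
    """Toy model of repo syncs on a shared queue served by `workers` workers.

    policy="fifo":    a finished sync appends its publish task to the END of
                      the shared queue (the current behaviour).
    policy="chained": the worker that finished the sync runs the publish
                      immediately (the proposed behaviour).

    Returns {repo_index: (sync_end, publish_end)}.
    """
    if policy not in ("fifo", "chained"):
        raise ValueError(f"unknown policy: {policy!r}")
    queue = deque(("sync", repo) for repo in range(len(sync_times)))
    free = [(0, w) for w in range(workers)]  # min-heap of (time worker is free, worker id)
    heapq.heapify(free)
    arrivals = []  # min-heap of (time, repo): publish tasks not yet enqueued
    done = {}
    while queue or arrivals:
        t, w = heapq.heappop(free)
        # Publish tasks whose sync finished by time t join the queue tail.
        while arrivals and arrivals[0][0] <= t:
            _, repo = heapq.heappop(arrivals)
            queue.append(("publish", repo))
        if not queue:
            # Nothing runnable yet: this worker idles until the next publish arrives.
            heapq.heappush(free, (arrivals[0][0], w))
            continue
        kind, repo = queue.popleft()
        if kind == "sync":
            end = t + sync_times[repo]
            if policy == "chained":
                # Same worker publishes right after the sync.
                done[repo] = (end, end + publish_time)
                heapq.heappush(free, (end + publish_time, w))
            else:
                # Publish becomes a new task at the back of the queue.
                done[repo] = (end, None)
                heapq.heappush(arrivals, (end, repo))
                heapq.heappush(free, (end, w))
        else:
            end = t + publish_time
            done[repo] = (done[repo][0], end)
            heapq.heappush(free, (end, w))
    return done


fifo = simulate([6] * 10, workers=4, publish_time=1, policy="fifo")
chained = simulate([6] * 10, workers=4, publish_time=1, policy="chained")
print(fifo[0], chained[0])
```

With 10 syncs of 6 time units, 4 workers and a 1-unit publish, repo 0 finishes syncing at t=6 under both policies, but its publish ends at t=13 with "fifo" versus t=7 with "chained". In this toy model the trade-off is also visible: since publish work no longer fills otherwise idle workers, the last two syncs finish at t=20 instead of t=18.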


(*) This step has another inefficiency: the publish task is spawned even when no new content has been synced or changed. The sync task knows this, yet it still spawns a no-op publish task that completes in zero time with 'Skipped: Repository content has not changed since last publish.'. As an optimization, the sync task should not spawn such a no-op publish.
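The footnote's suggestion amounts to a check at the end of the sync. The sketch below is hypothetical: `SyncSummary` and `follow_up_tasks` are illustrative names, not pulp's sync report API, and the `previous_publish_ok` flag is an assumption added so that a repo whose last publish failed is still republished.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncSummary:
    """Hypothetical summary of a finished sync (not pulp's actual report class)."""
    repo_id: str
    added: int
    updated: int
    removed: int
    previous_publish_ok: bool = True  # assumption: a failed last publish must be retried


def follow_up_tasks(summary):
    """Return the tasks a finished sync should spawn.

    A publish is spawned only if the sync changed repository content (or the
    previous publish failed); otherwise it would just end with
    'Skipped: Repository content has not changed since last publish.'
    """
    changed = summary.added + summary.updated + summary.removed > 0
    if changed or not summary.previous_publish_ok:
        return [("publish", summary.repo_id)]
    return []


print(follow_up_tasks(SyncSummary("repo-a", added=0, updated=0, removed=0)))
print(follow_up_tasks(SyncSummary("repo-b", added=12, updated=0, removed=3)))
```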


Version-Release number of selected component (if applicable):
Sat 6.7.0


How reproducible:
100%


Steps to Reproduce:
0. The steps below apply to any pulp server: Satellite or Capsule.
1. Optionally, to make the behaviour easier to see, artificially limit pulp to a single worker: e.g. in /etc/default/pulp_workers, set PULP_CONCURRENCY=1 and restart the pulp services.
2. Trigger several repo syncs; depending on where you experiment, run a Sync plan (on Sat), promote a CV with many repos (on Sat), or invoke a Caps sync to a new Caps.
3. Once the bulk action completes, determine the sequence of sync+publish tasks (e.g. by expanding the dynflow steps of the foreman task).


Actual results:
Step 3 shows:
pulp:action:sync     repo_id:8c1fb341-6b7b-43a1-bee8-28c1fc58ad8e     started:'2020-06-05T16:21:57Z'     completed:'2020-06-05T16:22:06Z' 
pulp:action:publish     repo_id:8c1fb341-6b7b-43a1-bee8-28c1fc58ad8e     started:'2020-06-05T16:23:21Z'     completed:'2020-06-05T16:23:21Z' 
pulp:action:sync     repo_id:a69cea2d-7ba4-4cff-85e4-d96ad30ca5ca     started:'2020-06-05T16:22:06Z'     completed:'2020-06-05T16:22:14Z' 
pulp:action:publish     repo_id:a69cea2d-7ba4-4cff-85e4-d96ad30ca5ca     started:'2020-06-05T16:23:21Z'     completed:'2020-06-05T16:23:21Z' 
pulp:action:sync     repo_id:4e20a22d-81a5-4b78-ad4d-b8f9ff3df953     started:'2020-06-05T16:22:15Z'     completed:'2020-06-05T16:22:21Z' 
pulp:action:publish     repo_id:4e20a22d-81a5-4b78-ad4d-b8f9ff3df953     started:'2020-06-05T16:23:21Z'     completed:'2020-06-05T16:23:21Z' 
pulp:action:sync     repo_id:8c1e1c97-ecf2-4885-b7f8-8b508be0b837     started:'2020-06-05T16:22:21Z'     completed:'2020-06-05T16:22:27Z' 
pulp:action:publish     repo_id:8c1e1c97-ecf2-4885-b7f8-8b508be0b837     started:'2020-06-05T16:23:22Z'     completed:'2020-06-05T16:23:22Z' 
pulp:action:sync     repo_id:0ef874dc-bf93-4e8f-a605-b2582d7a4929     started:'2020-06-05T16:21:50Z'     completed:'2020-06-05T16:21:56Z' 
pulp:action:publish     repo_id:0ef874dc-bf93-4e8f-a605-b2582d7a4929     started:'2020-06-05T16:23:20Z'     completed:'2020-06-05T16:23:20Z' 
pulp:action:sync     repo_id:5eb348a7-5e0c-443b-a46f-1a15d685af5f     started:'2020-06-05T16:22:34Z'     completed:'2020-06-05T16:22:40Z' 
pulp:action:publish     repo_id:5eb348a7-5e0c-443b-a46f-1a15d685af5f     started:'2020-06-05T16:23:22Z'     completed:'2020-06-05T16:23:22Z' 
pulp:action:sync     repo_id:19ab5ba9-b941-41d0-bb82-9370e4f3f7a5     started:'2020-06-05T16:22:47Z'     completed:'2020-06-05T16:22:55Z' 
pulp:action:publish     repo_id:19ab5ba9-b941-41d0-bb82-9370e4f3f7a5     started:'2020-06-05T16:23:23Z'     completed:'2020-06-05T16:23:23Z' 
pulp:action:sync     repo_id:574bb92b-fc6a-4d49-afb8-b4d903155bc3     started:'2020-06-05T16:22:27Z'     completed:'2020-06-05T16:22:33Z' 
pulp:action:publish     repo_id:574bb92b-fc6a-4d49-afb8-b4d903155bc3     started:'2020-06-05T16:23:22Z'     completed:'2020-06-05T16:23:22Z' 
pulp:action:sync     repo_id:7160dfcf-9025-4d95-ac1f-14b97829c5d3     started:'2020-06-05T16:22:40Z'     completed:'2020-06-05T16:22:46Z' 
pulp:action:publish     repo_id:7160dfcf-9025-4d95-ac1f-14b97829c5d3     started:'2020-06-05T16:23:22Z'     completed:'2020-06-05T16:23:22Z' 
pulp:action:sync     repo_id:42ffcaa8-4979-4d07-9538-221604ad8522     started:'2020-06-05T16:22:55Z'     completed:'2020-06-05T16:23:01Z' 
pulp:action:publish     repo_id:42ffcaa8-4979-4d07-9538-221604ad8522     started:'2020-06-05T16:23:23Z'     completed:'2020-06-05T16:23:23Z' 
pulp:action:sync     repo_id:4bf21816-6c63-4444-8e65-ad56a53201ff     started:'2020-06-05T16:23:02Z'     completed:'2020-06-05T16:23:20Z' 
pulp:action:publish     repo_id:4bf21816-6c63-4444-8e65-ad56a53201ff     started:'2020-06-05T16:23:23Z'     completed:'2020-06-05T16:23:23Z' 

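When I sort it with sort -nrk3, all sync tasks end up ahead of all publish tasks (the third field begins with the non-numeric "started:" prefix, so sort falls back to comparing whole lines in reverse rather than ordering by start time):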
pulp:action:sync     repo_id:a69cea2d-7ba4-4cff-85e4-d96ad30ca5ca     started:'2020-06-05T16:22:06Z'     completed:'2020-06-05T16:22:14Z' 
pulp:action:sync     repo_id:8c1fb341-6b7b-43a1-bee8-28c1fc58ad8e     started:'2020-06-05T16:21:57Z'     completed:'2020-06-05T16:22:06Z' 
pulp:action:sync     repo_id:8c1e1c97-ecf2-4885-b7f8-8b508be0b837     started:'2020-06-05T16:22:21Z'     completed:'2020-06-05T16:22:27Z' 
pulp:action:sync     repo_id:7160dfcf-9025-4d95-ac1f-14b97829c5d3     started:'2020-06-05T16:22:40Z'     completed:'2020-06-05T16:22:46Z' 
pulp:action:sync     repo_id:5eb348a7-5e0c-443b-a46f-1a15d685af5f     started:'2020-06-05T16:22:34Z'     completed:'2020-06-05T16:22:40Z' 
pulp:action:sync     repo_id:574bb92b-fc6a-4d49-afb8-b4d903155bc3     started:'2020-06-05T16:22:27Z'     completed:'2020-06-05T16:22:33Z' 
pulp:action:sync     repo_id:4e20a22d-81a5-4b78-ad4d-b8f9ff3df953     started:'2020-06-05T16:22:15Z'     completed:'2020-06-05T16:22:21Z' 
pulp:action:sync     repo_id:4bf21816-6c63-4444-8e65-ad56a53201ff     started:'2020-06-05T16:23:02Z'     completed:'2020-06-05T16:23:20Z' 
pulp:action:sync     repo_id:42ffcaa8-4979-4d07-9538-221604ad8522     started:'2020-06-05T16:22:55Z'     completed:'2020-06-05T16:23:01Z' 
pulp:action:sync     repo_id:19ab5ba9-b941-41d0-bb82-9370e4f3f7a5     started:'2020-06-05T16:22:47Z'     completed:'2020-06-05T16:22:55Z' 
pulp:action:sync     repo_id:0ef874dc-bf93-4e8f-a605-b2582d7a4929     started:'2020-06-05T16:21:50Z'     completed:'2020-06-05T16:21:56Z' 
pulp:action:publish     repo_id:a69cea2d-7ba4-4cff-85e4-d96ad30ca5ca     started:'2020-06-05T16:23:21Z'     completed:'2020-06-05T16:23:21Z' 
pulp:action:publish     repo_id:8c1fb341-6b7b-43a1-bee8-28c1fc58ad8e     started:'2020-06-05T16:23:21Z'     completed:'2020-06-05T16:23:21Z' 
pulp:action:publish     repo_id:8c1e1c97-ecf2-4885-b7f8-8b508be0b837     started:'2020-06-05T16:23:22Z'     completed:'2020-06-05T16:23:22Z' 
pulp:action:publish     repo_id:7160dfcf-9025-4d95-ac1f-14b97829c5d3     started:'2020-06-05T16:23:22Z'     completed:'2020-06-05T16:23:22Z' 
pulp:action:publish     repo_id:5eb348a7-5e0c-443b-a46f-1a15d685af5f     started:'2020-06-05T16:23:22Z'     completed:'2020-06-05T16:23:22Z' 
pulp:action:publish     repo_id:574bb92b-fc6a-4d49-afb8-b4d903155bc3     started:'2020-06-05T16:23:22Z'     completed:'2020-06-05T16:23:22Z' 
pulp:action:publish     repo_id:4e20a22d-81a5-4b78-ad4d-b8f9ff3df953     started:'2020-06-05T16:23:21Z'     completed:'2020-06-05T16:23:21Z' 
pulp:action:publish     repo_id:4bf21816-6c63-4444-8e65-ad56a53201ff     started:'2020-06-05T16:23:23Z'     completed:'2020-06-05T16:23:23Z' 
pulp:action:publish     repo_id:42ffcaa8-4979-4d07-9538-221604ad8522     started:'2020-06-05T16:23:23Z'     completed:'2020-06-05T16:23:23Z' 
pulp:action:publish     repo_id:19ab5ba9-b941-41d0-bb82-9370e4f3f7a5     started:'2020-06-05T16:23:23Z'     completed:'2020-06-05T16:23:23Z' 
pulp:action:publish     repo_id:0ef874dc-bf93-4e8f-a605-b2582d7a4929     started:'2020-06-05T16:23:20Z'     completed:'2020-06-05T16:23:20Z' 
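To order the tasks by their actual start time and measure how long each repo waited between the end of its sync and the start of its publish, the timestamps can be parsed directly. A minimal sketch; the regular expression assumes exactly the line format shown above, and `LOG` holds two repos copied from that output.

```python
import re
from datetime import datetime

# Matches one line of the task listing above.
LINE_RE = re.compile(
    r"pulp:action:(?P<action>\w+)\s+repo_id:(?P<repo>[0-9a-f-]+)\s+"
    r"started:'(?P<started>[^']+)'\s+completed:'(?P<completed>[^']+)'"
)
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse(lines):
    """Return (started, action, repo_id, completed) tuples ordered by start time."""
    tasks = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            tasks.append((
                datetime.strptime(m["started"], TS_FORMAT),
                m["action"],
                m["repo"],
                datetime.strptime(m["completed"], TS_FORMAT),
            ))
    return sorted(tasks)


def publish_delays(tasks):
    """Seconds between each repo's sync completion and the start of its publish."""
    sync_end = {repo: done for _, action, repo, done in tasks if action == "sync"}
    return {
        repo: (started - sync_end[repo]).total_seconds()
        for started, action, repo, _ in tasks
        if action == "publish"
    }


LOG = """\
pulp:action:sync     repo_id:0ef874dc-bf93-4e8f-a605-b2582d7a4929     started:'2020-06-05T16:21:50Z'     completed:'2020-06-05T16:21:56Z'
pulp:action:publish     repo_id:0ef874dc-bf93-4e8f-a605-b2582d7a4929     started:'2020-06-05T16:23:20Z'     completed:'2020-06-05T16:23:20Z'
pulp:action:sync     repo_id:4bf21816-6c63-4444-8e65-ad56a53201ff     started:'2020-06-05T16:23:02Z'     completed:'2020-06-05T16:23:20Z'
pulp:action:publish     repo_id:4bf21816-6c63-4444-8e65-ad56a53201ff     started:'2020-06-05T16:23:23Z'     completed:'2020-06-05T16:23:23Z'
"""

tasks = parse(LOG.splitlines())
for started, action, repo, _ in tasks:
    print(started.time(), action, repo[:8])
print(publish_delays(tasks))
```

Applied to the full output above, the same calculation gives gaps ranging from 3 s (4bf21816, synced last) up to 84 s (0ef874dc, synced first).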


See that all repos were synced first, and only then was even the very first synced repo published. For example, repo 0ef874dc finished syncing at 16:21:56 but was not published until 16:23:20. Meanwhile, consumers were still served the old content.


Expected results:
A publish immediately follows the sync of the same repo.


Additional info:

Comment 1 Brad Buckingham 2020-06-08 14:27:06 UTC
Hi Dennis,

Would this still be applicable for pulp 3 or only pulp 2?

Comment 3 Tanya Tereshchenko 2021-04-19 17:37:15 UTC
Hi Pavel,

The details might not be entirely relevant for Pulp 3, but currently it would have the same issue in terms of the sequence in which tasks are performed.
Pulp 3 does not have auto-publish functionality yet, though it will soon.
Katello triggers publication creation after pulp sync is done, I believe.

A redesign of the tasking system is going on at the moment, and auto-publish is just being added.
I'll keep this BZ open to keep the workflow you described in mind, and we'll see if we can improve something in this area.

Thanks!

Comment 4 Tanya Tereshchenko 2021-04-21 14:54:59 UTC
Pavel, the new auto-publish/distribute feature in Pulp 3 upstream makes it possible to perform Pulp's sync/publish inside one task.
Currently, Katello doesn't use this feature, so as of now, Satellite will behave in a similar way as before.
We'll see if there are any objections/obstacles to start using this feature on the Katello side.

Comment 5 Pavel Moravec 2021-04-21 20:18:25 UTC
(In reply to Tanya Tereshchenko from comment #4)
> Pavel, the new auto-publish/distribute feature in Pulp 3 upstream makes it
> possible to perform Pulp's sync/publish inside one task.
> Currently, Katello doesn't use this feature, so as of now, Satellite will
> behave in a similar way as before.
> We'll see if there are any objections/obstacles to start using this feature
> on the Katello side.

This sounds promising. Thanks for the review and feedback!

Comment 6 Tanya Tereshchenko 2021-06-04 20:10:28 UTC
The desired behaviour is available in the upstream releases of pulp_rpm 3.12+ when using the auto-publish/distribute feature.
In that case, Pulp sync and publish run inside one task.
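For reference, a sketch of what using the feature could look like from a client: set `autopublish` on the RPM repository, point a distribution at the repository, then sync as usual so the publication is created within the sync task. The code only builds the REST requests with the standard library; the host, hrefs and helper names are hypothetical, authentication is omitted, and the endpoint and field names are assumed from the pulp_rpm 3.12+ REST API and should be checked against the API reference for the version in use.

```python
import json
import urllib.request

PULP = "https://pulp.example.com"  # hypothetical Pulp 3 server


def _json_request(method, path, payload):
    """Build (but do not send) a JSON request against the Pulp REST API."""
    return urllib.request.Request(
        PULP + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def enable_autopublish(repo_href):
    # Assumed field: new repository versions get a publication automatically.
    return _json_request("PATCH", repo_href, {"autopublish": True})


def create_tracking_distribution(name, base_path, repo_href):
    # Assumed: a distribution tied to the repository serves the publication
    # of the repository's latest version.
    return _json_request(
        "POST",
        "/pulp/api/v3/distributions/rpm/rpm/",
        {"name": name, "base_path": base_path, "repository": repo_href},
    )


def sync(repo_href, remote_href):
    # With autopublish enabled, this single task both syncs and publishes.
    return _json_request("POST", repo_href + "sync/", {"remote": remote_href})


# Hypothetical hrefs; send each request with urllib.request.urlopen(req)
# after adding authentication.
repo = "/pulp/api/v3/repositories/rpm/rpm/0123/"
for req in (enable_autopublish(repo),
            create_tracking_distribution("demo", "demo", repo),
            sync(repo, "/pulp/api/v3/remotes/rpm/rpm/4567/")):
    print(req.get_method(), req.full_url, req.data.decode())
```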

Moving to Katello/Repositories component to consider using the feature in the future.

Comment 7 Daniel Alley 2021-09-24 17:10:07 UTC
Auto-publish would also (probably) simplify Katello's codepaths, as for mirror repos you expect a publication to be created during the sync, but for non-mirror sync you do not, and have to trigger it yourselves.

Letting non-mirrored repos be auto-published would mean that either way you can expect the publication to be there.

But there's probably more that Pulp can do to make the workflows streamlined, so as you start looking into this definitely let us know if you have any pain points.

Comment 8 Brad Buckingham 2022-09-02 20:25:18 UTC
Upon review of our valid but aging backlog the Satellite Team has concluded that this Bugzilla does not meet the criteria for a resolution in the near term, and are planning to close in a month. This message may be a repeat of a previous update and the bug is again being considered to be closed. If you have any concerns about this, please contact your Red Hat Account team.  Thank you.

Comment 9 Brad Buckingham 2022-09-05 22:59:20 UTC
Upon review of our valid but aging backlog the Satellite Team has concluded that this Bugzilla does not meet the criteria for a resolution in the near term, and are planning to close in a month. This message may be a repeat of a previous update and the bug is again being considered to be closed. If you have any concerns about this, please contact your Red Hat Account team.  Thank you.

Comment 10 Daniel Alley 2022-09-08 12:55:28 UTC
The task scheduling is improved in Pulp 3, so arguably this isn't much of an issue anymore regardless.