Bug 1899310

Summary: no-op repo sync leads to real-op repo publish in certain situations
Product: Red Hat Satellite Reporter: James Jeffers <jjeffers>
Component: PulpAssignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA QA Contact: Lai <ltran>
Severity: unspecified Docs Contact:
Priority: medium    
Version: 6.7.0CC: bmbouter, dkliban, egolov, ipanova, jjansky, ktordeur, mmccune, pmoravec, rchan, satellite6-bugs, ttereshc, wpinheir
Target Milestone: 6.8.2Keywords: Triaged
Target Release: Unused   
Hardware: x86_64   
OS: Linux   
Fixed In Version: pulp- Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1879126 Environment:
Last Closed: 2020-12-15 13:15:26 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Description Flags
sync times verified none

Description James Jeffers 2020-11-18 20:54:21 UTC
+++ This bug was initially created as a clone of Bug #1879126 +++

Description of problem:
There are few scenarios leading to a redundant "real-op" publish(*) of a repo in pulp. 

(recall: each repo sync automatically triggers publish of the repo; if nothing has changed, the publish is no-op - BUT the decision "nothing has changed" wrongly detects "a change" even if the sync didnt update the repo at all)

Particular reproducer:
- republish repository metadata of a repo
  - this will forcefully publish the repo
- then invoke optimised repo sync
  - this wont sync anything new, BUT the repo will be re-published, redundantly

(alternate reproducer: do a full repo sync, followed by optimised/normal repo sync - again, the 2nd (no-op) sync triggers "real-op" publish)

The cause is here: https://github.com/pulp/pulp/blob/2-master/server/pulp/server/controllers/repository.py#L1201

Here, we compare previous and current publish override options. So if a new repo sync does not change anything but it was called with different (even empty) override config, the repo *will be re-published - since same_override=False and https://github.com/pulp/pulp/blob/2-master/server/pulp/server/controllers/repository.py#L1215 evals to False .

As a consequence, redundant repo is re-published (which can also trigger redundant sync of the repo to Capsule, after all).

Version-Release number of selected component (if applicable):
Sat 6.7

How reproducible:

Steps to Reproduce:
1. Have a fully synced repo in Sat
2. Do either action:
  - invoke Complete Sync
  - or Republish Repository metadata
3. Check the repo was re-published
4. Invoke a regular sync
5. Check the repo was republished

Standalone pulp minimalistic reproducer: call repo publish once with override_config\":{\"force_full\":true}, then without it (or with force_full:false) - even the 2nd publish will be a "real op".

Actual results:
5. Repo was republished

Expected results:
5. Repo isnt republished

Additional info:
I understand there can be legit situations where diff between previous and current override_config *must* result in re-publish. I just dont know them and the current logic is simply too strict..

--- Additional comment from  on 2020-09-17T14:44:25Z 

Some minimalistic patch to ignore _changes_ in force_full override config:

--- a/pulp/server/controllers/repository.py
+++ b/pulp/server/controllers/repository.py
@@ -1160,13 +1160,17 @@ def check_publish(repo_obj, dist_id, dis
         skip_for_predistributor = published_after_predistributor or not \
-    same_override = dist.last_override_config == config_override
-    if not same_override:
+    if not (dist.last_override_config == config_override):
         # Use raw pymongo not to fire the signal hander
+    # do not compare force_full from config override - this particular difference itself does not
+    # determine the need of repo publish
+    config_override = {k: v for k, v in config_override.items() if k != 'force_full'}
+    dist.last_override_config = {k: v for k, v in dist.last_override_config.items() if k != 'force_full'}
+    same_override = dist.last_override_config == config_override
     # Check if content has not changed since last publish and a predistributor is not defined.
     unchanged_content_and_no_predistributor = last_published and not last_updated and \
         not units_removed and not predistributor_id

(a review from pulp devels is welcomed, though..)

--- Additional comment from  on 2020-09-18T06:43:21Z 

Pavel, thanks for the patch. It looks reasonable to me.
Feel free to open the upstream PR. I'll file the upstream bug.

--- Additional comment from  on 2020-09-18T07:05:41Z 

The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.

--- Additional comment from  on 2020-09-18T07:05:42Z 

The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

--- Additional comment from  on 2020-09-29T20:16:26Z 

(In reply to Tanya Tereshchenko from comment #2)
> Pavel, thanks for the patch. It looks reasonable to me.
> Feel free to open the upstream PR. I'll file the upstream bug.

Here you are :)


(majority of redundant repo publishes were in my case really due to config_override altering from {} to {force_full=False})

--- Additional comment from  on 2020-10-22T19:06:18Z 

The Pulp upstream bug status is at POST. Updating the external tracker on this bug.

--- Additional comment from  on 2020-10-26T13:06:45Z 

The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

--- Additional comment from  on 2020-10-26T14:07:41Z 

All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

--- Additional comment from  on 2020-11-02T16:08:24Z 

The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

--- Additional comment from  on 2020-11-04T22:19:28Z 

*** Satellite 6.7.4 and 6.7.5 Hotfix Available ***

1) Download pulp-server- from this bugzilla to your Satellite

2) stop services:

satellite-maintain service stop

3) Install:

rpm -Uvh pulp-server-

4) restart:

satellite-maintain service start

5) resume operations

--- Additional comment from  on 2020-11-04T22:20:26Z 

Created attachment 1726745 [details]

Comment 5 Lai 2020-12-09 16:35:30 UTC
Steps to retest

1. Upload manifest
2. Enabled rhel7 repo (content -> repositories)
3. Sync repo
4. Once repo is sync, resync it
5. Content -> product -> rhel7 product -> rhel7 repo -> select action -> advanced sync.  choose Complete sync and sync
6. Once sync is completed, do a regular sync

3. Should take a long time
4. Should take seconds
5. Should take a long time
6. Should take seconds

3. Regular sync took 45 minutes
4. Took 23 seconds
5. Complete sync took 52 minutes
6. Took 23 seconds

Originally, after the complete sync took place and resyncing happens, it took a long time.  After the fix, the syncing took seconds to complete which is what step 6 indicates.

Marking issue as verified.  Verified on 6.8.2 snap 3.

Comment 6 Lai 2020-12-09 16:37:03 UTC
Created attachment 1737961 [details]
sync times verified

Comment 11 errata-xmlrpc 2020-12-15 13:15:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Satellite 6.8.2 Async Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.