Bug 2229963

Summary: The "Retain package versions" option only partially works for AppStream repos or any other repos with module streams
Product: Red Hat Satellite
Reporter: Sayan Das <saydas>
Component: Pulp
Assignee: satellite6-bugs <satellite6-bugs>
Status: NEW ---
QA Contact: Satellite QE Team <sat-qe-bz-list>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 6.14.0
CC: dalley, sajha
Target Milestone: Unspecified
Target Release: Unused
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sayan Das 2023-08-08 10:12:34 UTC
Description of problem:

Setting the mirroring policy to additive and "Retain package versions" to 1 works great for normal repos, but it does not work the same way for repos that have modular metadata, e.g. the AppStream RPMs repo.


Version-Release number of selected component (if applicable):

Satellite 6.14 ( Snap 10 ) ( Tested by RH support )
Satellite 6.13 ( Tested and confirmed by the end-user )


How reproducible:

100%


Steps to Reproduce:

1. Install any of the affected versions of Satellite
2. Enable the "Red Hat Enterprise Linux 8 for x86_64 - AppStream RPMs 8" repo, set the mirroring policy to additive, and set "Retain package versions" to 1
3. Sync the repo
4. Check the count of packages and module streams synced, and compare it with the result of a sync where the "Retain package versions" option is left unset.


Actual results:

If not using: "Retain package versions"

Packages	32339
Source RPMs	0
Errata	3127
Package Groups	59
Module Streams	695


If using: "Retain package versions" and it is set to 1

Packages	15023
Source RPMs	0
Errata	3127
Package Groups	59
Module Streams	695


During the sync, Pulp logs this:
~~
Aug  8 15:11:03 vm206-40 pulpcore-worker-2[98989]: pulp [f894404a-7655-40a9-813d-d7adbfc2f0aa]: pulp_rpm.app.tasks.synchronizing:INFO: Excluding 17316 packages (duplicates, outdated or skipping was requested e.g. 'skip_types')
Aug  8 15:11:04 vm206-40 pulpcore-api[98953]: pulp [f894404a-7655-40a9-813d-d7adbfc2f0aa]:  - - [08/Aug/2023:09:41:04 +0000] "GET /pulp/api/v3/tasks/91c564b4-9dd3-45ec-aa46-cc3f986f6055/ HTTP/1.1" 200 1670 "-" "OpenAPI-Generator/3.22.4/ruby"
~~
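
As a quick consistency check (a sketch that uses only the counts reported above), the difference between the two package totals matches the number of packages that Pulp reports excluding, so the option does appear to take effect for part of the repo:

```python
# Package counts taken from the two sync results above.
packages_without_retention = 32339
packages_with_retention = 15023

# Count taken from the pulp_rpm log line above.
excluded_in_log = 17316

difference = packages_without_retention - packages_with_retention
print(difference)                     # 17316
print(difference == excluded_in_log)  # True
```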

But a DB query, or a hammer listing of the packages, still shows multiple versions of the same package:

# hammer package list --repository "Red Hat Enterprise Linux 8 for x86_64 - AppStream RPMs 8" --product "Red Hat Enterprise Linux for x86_64" --organization RedHat --search "name = postgresql" | head 
------|--------------------------------------------------------------|----------------------------------------------------------
ID    | FILENAME                                                     | SOURCE RPM                                               
------|--------------------------------------------------------------|----------------------------------------------------------
13845 | postgresql-9.6.10-1.module+el8+2470+d1bafa0e.x86_64.rpm      | postgresql-9.6.10-1.module+el8+2470+d1bafa0e.src.rpm     
7904  | postgresql-9.6.20-1.module+el8.3.0+8938+7f0e88b6.x86_64.rpm  | postgresql-9.6.20-1.module+el8.3.0+8938+7f0e88b6.src.rpm 
6909  | postgresql-9.6.22-1.module+el8.4.0+11244+beebcf7e.x86_64.rpm | postgresql-9.6.22-1.module+el8.4.0+11244+beebcf7e.src.rpm
12860 | postgresql-10.6-1.module+el8+2469+5ecd5aae.x86_64.rpm        | postgresql-10.6-1.module+el8+2469+5ecd5aae.src.rpm       
9361  | postgresql-10.14-1.module+el8.2.0+7801+be0fed80.x86_64.rpm   | postgresql-10.14-1.module+el8.2.0+7801+be0fed80.src.rpm  
7906  | postgresql-10.15-1.module+el8.3.0+8944+1ca16b1f.x86_64.rpm   | postgresql-10.15-1.module+el8.3.0+8944+1ca16b1f.src.rpm  
7414  | postgresql-10.17-1.module+el8.4.0+11249+895597ab.x86_64.rpm  | postgresql-10.17-1.module+el8.4.0+11249+895597ab.src.rpm 


As we can see, there are three "postgresql-9.6" related packages present in different versions, although they are part of the same module stream.

Also note that it syncs all of the module stream data, and perhaps that is what causes this behavior.

Expected results:

The expectation is that I would see just the latest version of postgresql-9.6 in the list above (and the same for any other RPMs).

"Retain package versions" should work on AppStream-type repos as well.


Additional info:

NA

Comment 2 Daniel Alley 2023-08-08 14:39:55 UTC
This isn't really a bug. Unfortunately, modular packages cannot safely be filtered out the way non-modular packages generally can, so we deliberately ignore them when determining which packages to retain during the sync.

It's a complicated subject, but I will try to explain in a bit more detail:

* These aren't exactly "postgresql-9.6" and "postgresql-10" packages as described. They are all just different versions of the "postgresql" package, so a naive analysis would throw out the 9.6.z packages, which would be undesirable.

* We can't throw out modular packages without also excluding the module metadata, or vice versa, because that would very easily break everything. For example, accidentally getting rid of the modular metadata but missing a package could cause breakages. Overloading the package version retention feature to apply to modules also would not be great, so it would probably need to be its own separate thing.

* But module versioning is not straightforward, and I've been told that removing older versions of streams (that is, not "older streams", but "older versions of streams") would also be sketchy and potentially broken for other reasons, which is why we never tried to do it. I cannot remember the exact details, as the discussion was probably at least 2 years ago, but I got the impression it wasn't something we should touch, for the time being at least. I can refresh on the details if needed.

* "Module Streams 695" is a bit misleading, because what it refers to is "695 versions of module streams". That is, if you had "module foo, stream 1.0, version 1.23" which is then updated to "module foo, stream 1.0, version 1.24", those count as two "module streams", in the same way that "module foo, stream 2.0, version 2.0" would count as another one.
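
The counting described in the last point can be sketched as follows. This is only an illustration: the entries are the hypothetical "module foo" examples from that point, and it assumes the counter simply counts distinct (module, stream, version) entries, which simplifies how module builds are actually identified.

```python
# Hypothetical (module, stream, version) entries mirroring the example above.
entries = [
    ("foo", "1.0", "1.23"),
    ("foo", "1.0", "1.24"),
    ("foo", "2.0", "2.0"),
]

# What the "Module Streams" counter reflects: every version of every stream.
module_stream_count = len(set(entries))

# What a reader might expect instead: distinct (module, stream) pairs only.
distinct_streams = len({(name, stream) for name, stream, _ in entries})

print(module_stream_count)  # 3
print(distinct_streams)     # 2
```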

My personal feeling is that it's not ideal, but the cure is probably worse than the disease. The impact is relatively minor: it affects only the AppStream repo, only about half of the packages in AppStream at that, and mostly just RHEL 8. RHEL 9 is impacted to an even lesser degree due to the more conservative use of modules and the removal of "default streams", and RHEL 10 is very likely not to have modules at all.

So the cost/benefit trade-off of making the logic much more complex and potentially error-prone doesn't seem favorable. I'm open to hearing counterarguments, though.
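
To illustrate the first point in the comment above, here is a minimal sketch of a naive "retain latest N" rule that groups by package name only. It is not Pulp's implementation: `retain_latest` and `version_key` are hypothetical helpers, the versions are taken from the hammer listing in the description, and the version comparison is a simplified numeric one rather than real RPM epoch/version/release ordering.

```python
from collections import defaultdict

# (name, version) pairs taken from the hammer listing in the description.
packages = [
    ("postgresql", "9.6.10"),
    ("postgresql", "9.6.20"),
    ("postgresql", "9.6.22"),
    ("postgresql", "10.6"),
    ("postgresql", "10.14"),
    ("postgresql", "10.15"),
    ("postgresql", "10.17"),
]


def version_key(version):
    # Simplified numeric comparison (assumption); real RPM ordering is more involved.
    return tuple(int(part) for part in version.split("."))


def retain_latest(pkgs, keep=1):
    # Naive rule: group by package name only and keep the newest `keep` versions.
    by_name = defaultdict(list)
    for name, version in pkgs:
        by_name[name].append(version)
    return {
        name: sorted(versions, key=version_key)[-keep:]
        for name, versions in by_name.items()
    }


print(retain_latest(packages))  # {'postgresql': ['10.17']}
```

With `keep=1`, every 9.6.z build is dropped even though the 9.6 stream metadata would still be present, which is the outcome the comment describes as undesirable.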