Description of problem:

Setting the mirroring policy to additive and "Retain package versions" to 1 works great for normal repos, but it does not work the same way for repos that have modular metadata, i.e. the AppStream RPMs repo.

Version-Release number of selected component (if applicable):

Satellite 6.14 (Snap 10) (tested by RH support)
Satellite 6.13 (tested and confirmed by the end user)

How reproducible:

100%

Steps to Reproduce:

1. Install any of the affected versions of Satellite.
2. Enable the "Red Hat Enterprise Linux 8 for x86_64 - AppStream RPMs 8" repo, set the mirroring policy to additive, and set "Retain package versions" to 1.
3. Sync the repo.
4. Check the count of packages and module streams synced, and compare it with the result of a sync where "Retain package versions" was left unmodified.

Actual results:

If not using "Retain package versions":

Packages 32339
Source RPMs 0
Errata 3127
Package Groups 59
Module Streams 695

If using "Retain package versions" and it is set to 1:

Packages 15023
Source RPMs 0
Errata 3127
Package Groups 59
Module Streams 695

During the sync, pulp does show this:

~~
Aug 8 15:11:03 vm206-40 pulpcore-worker-2[98989]: pulp [f894404a-7655-40a9-813d-d7adbfc2f0aa]: pulp_rpm.app.tasks.synchronizing:INFO: Excluding 17316 packages (duplicates, outdated or skipping was requested e.g. 'skip_types')
Aug 8 15:11:04 vm206-40 pulpcore-api[98953]: pulp [f894404a-7655-40a9-813d-d7adbfc2f0aa]: - - [08/Aug/2023:09:41:04 +0000] "GET /pulp/api/v3/tasks/91c564b4-9dd3-45ec-aa46-cc3f986f6055/ HTTP/1.1" 200 1670 "-" "OpenAPI-Generator/3.22.4/ruby"
~~

But when you do a DB query or use hammer to list the packages:

# hammer package list --repository "Red Hat Enterprise Linux 8 for x86_64 - AppStream RPMs 8" --product "Red Hat Enterprise Linux for x86_64" --organization RedHat --search "name = postgresql" | head
------|--------------------------------------------------------------|----------------------------------------------------------
ID    | FILENAME                                                     | SOURCE RPM
------|--------------------------------------------------------------|----------------------------------------------------------
13845 | postgresql-9.6.10-1.module+el8+2470+d1bafa0e.x86_64.rpm      | postgresql-9.6.10-1.module+el8+2470+d1bafa0e.src.rpm
7904  | postgresql-9.6.20-1.module+el8.3.0+8938+7f0e88b6.x86_64.rpm  | postgresql-9.6.20-1.module+el8.3.0+8938+7f0e88b6.src.rpm
6909  | postgresql-9.6.22-1.module+el8.4.0+11244+beebcf7e.x86_64.rpm | postgresql-9.6.22-1.module+el8.4.0+11244+beebcf7e.src.rpm
12860 | postgresql-10.6-1.module+el8+2469+5ecd5aae.x86_64.rpm        | postgresql-10.6-1.module+el8+2469+5ecd5aae.src.rpm
9361  | postgresql-10.14-1.module+el8.2.0+7801+be0fed80.x86_64.rpm   | postgresql-10.14-1.module+el8.2.0+7801+be0fed80.src.rpm
7906  | postgresql-10.15-1.module+el8.3.0+8944+1ca16b1f.x86_64.rpm   | postgresql-10.15-1.module+el8.3.0+8944+1ca16b1f.src.rpm
7414  | postgresql-10.17-1.module+el8.4.0+11249+895597ab.x86_64.rpm  | postgresql-10.17-1.module+el8.4.0+11249+895597ab.src.rpm

As we can see, there are three "postgresql-9.6" related packages present in different versions, even though they are part of the same module stream.

Also note that the sync brings in all the module stream data, and perhaps that is what results in this behavior.

Expected results:

I would expect to see just the latest version of postgresql-9.6 in the list (and the same for any other RPMs). "Retain package versions" should work on AppStream-type repos as well.

Additional info:

NA

This isn't really a bug. Unfortunately, modular packages cannot safely be filtered out in the way that non-modular packages generally can, so we deliberately ignore them when determining which packages to retain during the sync. It's a complicated subject, but I will try to explain in a bit more detail:

* These aren't exactly "postgresql-9.6" and "postgresql-10" packages as described. They're all just different versions of the "postgresql" package, so a naive analysis would simply throw out the 9.6.z packages as older than the 10.z ones, which would be undesirable.

* We can't throw out modular packages without also excluding the corresponding module metadata, because that would very easily break everything. The reverse holds too: accidentally getting rid of the modular metadata but missing a package could cause breakages. Overloading the package version retention feature to apply to modules as well would not be great, so it would probably need to be its own separate thing.

* But module versioning is not straightforward, and I've been told that removing older versions of streams (that is, not "older streams", but "older versions of streams") would also be sketchy and potentially broken for other reasons, which is why we never tried to do it. I cannot remember the exact details, as the discussion was probably at least 2 years ago, but I got the impression it wasn't something we should touch, for the time being at least. I can refresh on the details if needed.

* "Module Streams 695" is a bit misleading, because what it refers to is "695 versions of module streams". That is, if you had "module foo, stream 1.0, version 1.23" which is then updated to "module foo, stream 1.0, version 1.24", that counts as two "module streams", in the same way that "module foo, stream 2.0, version 2.0" would count as another.

My personal feeling is that it's not ideal, but the cure is probably worse than the disease...

The impact is relatively minor: it affects only the AppStream repo, only about half of the packages in AppStream at that, and mostly just RHEL 8. RHEL 9 is impacted to an even lesser degree due to the more conservative use of modules and the removal of "default streams", and RHEL 10 is very likely not to have modules at all. So the benefit doesn't seem to justify making the logic much more complex and potentially error-prone. I'm open to hearing counterarguments, though.
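
To make the first point above concrete, here is a minimal sketch of what a naive "keep the newest N per package name" filter would do to the seven filenames from the hammer output in the report. It is only an illustration, not how pulp_rpm implements retention. The function names (`parse_name_version`, `retain_latest_by_name`, `stream_of`, `dropped`) are hypothetical, the version comparison is a simplified numeric one rather than real RPM version comparison, and the stream lookup is a stand-in that guesses the stream from the version number, whereas real stream membership comes from the module metadata.

```python
from collections import defaultdict

# Filenames copied from the hammer output in the report.
FILENAMES = [
    "postgresql-9.6.10-1.module+el8+2470+d1bafa0e.x86_64.rpm",
    "postgresql-9.6.20-1.module+el8.3.0+8938+7f0e88b6.x86_64.rpm",
    "postgresql-9.6.22-1.module+el8.4.0+11244+beebcf7e.x86_64.rpm",
    "postgresql-10.6-1.module+el8+2469+5ecd5aae.x86_64.rpm",
    "postgresql-10.14-1.module+el8.2.0+7801+be0fed80.x86_64.rpm",
    "postgresql-10.15-1.module+el8.3.0+8944+1ca16b1f.x86_64.rpm",
    "postgresql-10.17-1.module+el8.4.0+11249+895597ab.x86_64.rpm",
]


def parse_name_version(filename):
    """Split 'name-version-release.arch.rpm' into (name, version tuple).

    Simplification: assumes the name has no dashes and the version is
    purely numeric and dot-separated, which holds for the sample data only.
    """
    name, version, _release_and_arch = filename.split("-", 2)
    return name, tuple(int(part) for part in version.split("."))


def retain_latest_by_name(filenames, keep=1):
    """Naive retention: keep the `keep` newest versions per package name."""
    by_name = defaultdict(list)
    for filename in filenames:
        name, version = parse_name_version(filename)
        by_name[name].append((version, filename))
    kept = []
    for entries in by_name.values():
        entries.sort(reverse=True)  # newest version first
        kept.extend(filename for _version, filename in entries[:keep])
    return kept


def stream_of(filename):
    """Hypothetical stream guess: 9.6.z -> '9.6', otherwise the major version."""
    _name, version = parse_name_version(filename)
    return "9.6" if version[:2] == (9, 6) else str(version[0])


def dropped(filenames, kept):
    """Packages that retained module metadata could still reference."""
    kept_set = set(kept)
    return [filename for filename in filenames if filename not in kept_set]


if __name__ == "__main__":
    kept = retain_latest_by_name(FILENAMES)
    print("kept:", kept)
    print("streams before:", sorted({stream_of(f) for f in FILENAMES}))
    print("streams after:", sorted({stream_of(f) for f in kept}))
    print("dropped:", len(dropped(FILENAMES, kept)))
```

Because every one of these files has the package name "postgresql", the naive filter keeps only the 10.17 build and drops the entire 9.6 stream. And since all module stream versions are still synced, each dropped file is a package that some retained module metadata could still point at, which is the breakage described in the second point.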