Bug 1903367
| Field | Value |
|---|---|
| Summary: | Publishing a new Content View version brings in old metadata files from multiple previous versions; regenerating CV metadata then fixes it |
| Product: | Red Hat Satellite |
| Component: | Repositories |
| Version: | 6.8.0 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | urgent |
| Reporter: | Pablo Hess <phess> |
| Assignee: | Justin Sherrill <jsherril> |
| QA Contact: | Cole Higgins <chiggins> |
| Target Milestone: | 6.9.5 |
| Target Release: | Unused |
| Keywords: | PrioBumpGSS, Triaged |
| Fixed In Version: | foreman-installer-2.3.1.19-1 |
| Doc Type: | If docs needed, set a value |
| Type: | Bug |
| Last Closed: | 2021-08-31 12:04:00 UTC |
| CC: | aeladawy, ahumbe, dsynk, grbrown, gscarbor, ipanova, janarula, jeanbaptiste.dancre, jjeffers, jkrajice, jsherril, juwatts, kkinge, mfalz, mkalyat, mmccune, musman, pdudley, satellite6-bugs, saydas, sbible, sboyron, smajumda, sraut, therman, ttereshc, vdeshpan, wpinheir, zhunting |
Description (Pablo Hess, 2020-12-01 22:14:50 UTC)
Script that may safely be run on any fully-published repository; it removes any repodata files that are not referenced in repomd.xml:

~~~
# ( grep '^<data type' repomd.xml | fgrep '.xml' | cut -f3 -d= | cut -f2 -d '"' | cut -f2 -d/ ; ls *-*.xml* ) | sort | uniq -u | xargs rm
~~~

This is an alternative to re-publishing metadata for the CV that also works to reduce storage space consumption on a published repository under /var/lib/pulp/.

Here is one confirmed permanent solution: remove the unnecessary "duplicated" metadata files from the root repository yum_distributor dir. From then on, new CV versions containing this repository will no longer contain the "duplicates" and will behave as expected.

The root repository dir at /var/lib/pulp/published/yum/master/yum_distributor/e463a3be-9beb-4b58-954b-3c70b4d76ff9/1606714503.72/repodata contains those "duplicated" metadata files:

~~~
[root@sat68a repodata]# ls /var/lib/pulp/published/yum/master/yum_distributor/e463a3be-9beb-4b58-954b-3c70b4d76ff9/1606714503.72/repodata/
170d2873c8c64bf2284ddf4cfd286b1a8d00e745bf059697ae0cc1a80595b27b-primary.xml.gz
29b35070e9e5e0ce665634a6695657dd2c0e90f984ecf9edc9ae0c2514317609-primary.xml.gz
50cfa17ec516642f998f9b4a1f3fbd69c7ae2a801725f74c47dc262f908a4bc0-filelists.xml.gz
81a50a950c81d298ed82532f06ff8e77b5c2baae6f614b2b7954ed52d2cfe217-other.xml.gz
887f98b08fd5a31c2f862b5a813212c2edacb7b71025f31739bdae1b056c6a58-other.xml.gz
8d4daf96-3bdc-41fd-b5ad-7ab37ef3cd58
9f9055ce8ec3f81556b037756b9db2646fe9a85c2b9edf607dd0305c7edb96d8-updateinfo.xml.gz
af22d713b7535abba900ee1d1163e04fdaca3aa5cbc80e631032553cb3afaa2e-filelists.xml.gz
b468996a0064ff1f778008a88e75ebda41dbb0e9067243c7fb83aad3bd8ea652-updateinfo.xml.gz
c716c2a27a0c069aa95fc982573b552da9089b870cb1534b412704ba4e66ca91-comps.xml
repomd.xml
~~~

A published CV's yum_distributor dir that contains this repository also contains the "duplicates":

~~~
[root@sat68a repodata]# ls /var/lib/pulp/published/yum/master/yum_distributor/1-cv_test_bz-v1_0-e463a3be-9beb-4b58-954b-3c70b4d76ff9/1606920752.03/repodata/
170d2873c8c64bf2284ddf4cfd286b1a8d00e745bf059697ae0cc1a80595b27b-primary.xml.gz
29b35070e9e5e0ce665634a6695657dd2c0e90f984ecf9edc9ae0c2514317609-primary.xml.gz
50cfa17ec516642f998f9b4a1f3fbd69c7ae2a801725f74c47dc262f908a4bc0-filelists.xml.gz
81a50a950c81d298ed82532f06ff8e77b5c2baae6f614b2b7954ed52d2cfe217-other.xml.gz
887f98b08fd5a31c2f862b5a813212c2edacb7b71025f31739bdae1b056c6a58-other.xml.gz
8d4daf96-3bdc-41fd-b5ad-7ab37ef3cd58
9f9055ce8ec3f81556b037756b9db2646fe9a85c2b9edf607dd0305c7edb96d8-updateinfo.xml.gz
af22d713b7535abba900ee1d1163e04fdaca3aa5cbc80e631032553cb3afaa2e-filelists.xml.gz
b468996a0064ff1f778008a88e75ebda41dbb0e9067243c7fb83aad3bd8ea652-updateinfo.xml.gz
c716c2a27a0c069aa95fc982573b552da9089b870cb1534b412704ba4e66ca91-comps.xml
repomd.xml
~~~

So I'll remove all the unnecessary "duplicates", i.e. the metadata files that are not referenced by repomd.xml:

~~~
# ( grep '^<data type' repomd.xml | fgrep '.xml' | cut -f3 -d= | cut -f2 -d '"' | cut -f2 -d/ ; ls *-*.xml* ) | sort | uniq -u | xargs rm
~~~

Then I'll publish a new version of my CV that contains this repository and check whether the latest version has the stubborn "duplicates":

~~~
[root@sat68a repodata]# ls /var/lib/pulp/published/yum/master/yum_distributor/1-cv_test_bz-v2_0-e463a3be-9beb-4b58-954b-3c70b4d76ff9/*/repodata/ -1
170d2873c8c64bf2284ddf4cfd286b1a8d00e745bf059697ae0cc1a80595b27b-primary.xml.gz
887f98b08fd5a31c2f862b5a813212c2edacb7b71025f31739bdae1b056c6a58-other.xml.gz
8d4daf96-3bdc-41fd-b5ad-7ab37ef3cd58
af22d713b7535abba900ee1d1163e04fdaca3aa5cbc80e631032553cb3afaa2e-filelists.xml.gz
b468996a0064ff1f778008a88e75ebda41dbb0e9067243c7fb83aad3bd8ea652-updateinfo.xml.gz
c716c2a27a0c069aa95fc982573b552da9089b870cb1534b412704ba4e66ca91-comps.xml
repomd.xml
~~~

End result: no more duplicates.
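For a less fragile take on that one-liner, here is a sketch in Python that parses repomd.xml properly instead of grepping it, and only lists the unreferenced files so you can review them before deleting. The helper name and the dry-run behaviour are my own for illustration; this is not part of Pulp or Satellite.

```python
import os
import xml.etree.ElementTree as ET

REPO_NS = '{http://linux.duke.edu/metadata/repo}'

def unreferenced_repodata(repodata_dir):
    """List repodata files not referenced by repomd.xml (dry run: deletes nothing)."""
    tree = ET.parse(os.path.join(repodata_dir, 'repomd.xml'))
    # Collect the basenames of every <location href="repodata/..."/> entry.
    referenced = {os.path.basename(loc.get('href'))
                  for loc in tree.getroot().iter(REPO_NS + 'location')}
    # Same candidate filter as the `ls *-*.xml*` in the shell one-liner.
    candidates = (f for f in os.listdir(repodata_dir)
                  if '-' in f and '.xml' in f)
    return sorted(f for f in candidates if f not in referenced)
```

Piping the returned names to os.remove would reproduce the `xargs rm` step once you are satisfied the list is correct.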
The question that remains is: why did the root repo's yum_distributor dir end up with duplicated metadata files?

This behaviour is expected. Old repodata files are stored by default for 14 days. This can be configured by setting remove_old_repodata_threshold, although I am not sure whether it is exposed in Katello: https://docs.pulpproject.org/en/2.21/plugins/pulp_rpm/tech-reference/yum-plugins.html?highlight=remove_old_repodata_threshold

To reproduce this behaviour it is enough to add new content to the repo multiple times and publish in between. Files older than 14 days (by default) are removed. Forcing metadata regeneration removes the old files, since in that case the publish operation is performed from scratch. If it is not desirable to store old repodata files for 14 days, the threshold can be set to a smaller time period. Steps to reproduce directly in Pulp are attached.

Created attachment 1738485 [details] bz1903367-pulp-steps

Hi Ina, to turn your comment into action: I understand we can enable a 1-day threshold by adding the contents below to the `/etc/pulp/server/plugins.conf.d/yum_distributor.json` file, which does not exist by default on Red Hat Satellite 6.x:
# cat /etc/pulp/server/plugins.conf.d/yum_distributor.json
{
"remove_old_repodata": True,
"remove_old_repodata_threshold": 1
}
New question: can we set the threshold to zero if we want to completely prevent old repodata from needlessly existing in new repos?
WARNING: my mistake: if you use the `yum_distributor.json` contents from my comment above you will get this error below when running `pulp-manage-db` either directly or indirectly (through e.g. `satellite-installer`):
Updating the database with types []
Found the following type definitions that were not present in the update collection [puppet_module, docker_tag, ostree, modulemd_defaults, package_langpacks, erratum, docker_blob, docker_manifest, yum_repo_metadata_file, package_group, package_category, iso, package_environment, drpm, distribution, modulemd, rpm, srpm, docker_image, docker_manifest_list]
Updating the database with types [puppet_module, drpm, ostree, modulemd_defaults, package_langpacks, docker_manifest, docker_blob, erratum, yum_repo_metadata_file, package_group, package_category, iso, package_environment, docker_tag, distribution, modulemd, rpm, srpm, docker_image, docker_manifest_list]
Content types loaded.
Ensuring the admin role and user are in place.
Admin role and user are in place.
Beginning database migrations.
No JSON object could be decoded
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 280, in main
return _auto_manage_db(options)
File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 347, in _auto_manage_db
migrate_database(options)
File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 83, in migrate_database
migration_packages = models.get_migration_packages()
File "/usr/lib/python2.7/site-packages/pulp/server/db/migrate/models.py", line 348, in get_migration_packages
migration_packages.append(MigrationPackage(migration_package_module))
File "/usr/lib/python2.7/site-packages/pulp/server/db/migrate/models.py", line 172, in __init__
available_versions = self.available_versions
File "/usr/lib/python2.7/site-packages/pulp/server/db/migrate/models.py", line 219, in available_versions
migrations = self.migrations
File "/usr/lib/python2.7/site-packages/pulp/server/db/migrate/models.py", line 248, in migrations
migration_modules.append(MigrationModule(module_name))
File "/usr/lib/python2.7/site-packages/pulp/server/db/migrate/models.py", line 91, in __init__
self._module = _import_all_the_way(python_module_name)
File "/usr/lib/python2.7/site-packages/pulp/server/db/migrate/models.py", line 365, in _import_all_the_way
module = __import__(module_string)
File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/migrations/0016_new_yum_distributor.py", line 33, in <module>
NEW_DISTRIBUTOR_CONF = read_json_config(NEW_DISTRIBUTOR_CONF_FILE_PATH)
File "/usr/lib/python2.7/site-packages/pulp/common/config.py", line 681, in read_json_config
config = json.load(f)
File "/usr/lib64/python2.7/json/__init__.py", line 290, in load
**kw)
File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
FIX:
Use true (lowercase) instead of True.
THE CORRECT CONTENTS ARE:
# cat /etc/pulp/server/plugins.conf.d/yum_distributor.json
{
"remove_old_repodata": true, <==== note `true` is all lowercase
"remove_old_repodata_threshold": 1
}
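The failure mode is easy to reproduce outside of Pulp. The traceback above shows `read_json_config` calling Python's `json.load`, and the json module only accepts the lowercase literals `true`/`false`, so the capitalized `True` fails to parse. A quick check:

```python
import json

good = '{"remove_old_repodata": true, "remove_old_repodata_threshold": 1}'
bad = '{"remove_old_repodata": True, "remove_old_repodata_threshold": 1}'

# Lowercase literal: parses fine, becomes a Python bool.
config = json.loads(good)
print(config["remove_old_repodata"])

# Capitalized literal: JSON has no `True`, so the decoder raises ValueError
# (the same error class pulp-manage-db surfaced above).
try:
    json.loads(bad)
except ValueError as exc:
    print("rejected:", exc)
```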
The docs are suboptimal and do not explicitly mention it, but the value for the threshold is expected to be specified in seconds. It should suffice to specify a small enough value, and any old repodata that passes that threshold will be removed. I believe setting the value to 0 will remove any old repodata regardless of its age; however, I think this case might not be fully tested.

After some testing, I see that this is not working to prevent the propagation of unnecessary old metadata:
# cat /etc/pulp/server/plugins.conf.d/yum_distributor.json
{
"remove_old_repodata": true,
"remove_old_repodata_threshold": 1
}
(In reply to Ina Panova from comment #11)
> The docs are suboptimal and do not explicitly mention but the value for the
> threshold is expected to be specified in seconds.
>
> It should suffice to specify small enough value and any old repodata that
> pass that threshold will be removed. I believe setting the value to 0 will
> remove any old repdata regardless of its age, however, I think this case
> might not be fully tested.
A value of 0 is also not working as a means to prevent the propagation of unnecessary old metadata.
Perhaps the logic that turns remove_old_repodata into action is misbehaving?
On Satellite 6.8, I've located and modified /usr/lib/python2.7/site-packages/pulp_rpm/plugins/distributors/yum/publish.py:
1270 threshold = self.get_config().get(
1271 'remove_old_repodata_threshold',
1272 #datetime.timedelta(days=14).total_seconds()) <== commented out the default value
1273 1) <== new default value is 1 second
This did not solve the issue either.
So here is the test I'm performing:
# ls ./published/yum/master/yum_distributor/1-cv_mycv-v13_0-*/*/repodata
./published/yum/master/yum_distributor/1-cv_mycv-v13_0-626293c0-3cc2-4a90-af5b-6d48a18ac53e/1612975312.32/repodata:
078bdcd5-96a5-45d2-b838-9414c0bc1a84 86b01422fe44ee18f18cf43be24b09362dfbf22efaadc5e8c497647bf9419794-updateinfo.xml.gz
2a3d2c604a604e3796e83a2deb58906d3186b05d999da5871a8ef07caf874480-other.xml.gz a27718cc28ec6d71432e0ef3e6da544b7f9d93f6bb7d0a55aacd592d03144b70-comps.xml
3efa8ee78de989cd08682cb605de03c4e2e8a005ffc2026adc0dc615e953898d-updateinfo.xml.gz a36be2bed4496e10a4af858bf743b4628c62175d2b11bab040ce0f1609405927-primary.xml.gz
3fbd7e543de1f82086d2f0b836d40f9b18c5359ba3b164df04018e509f11e2a9-filelists.xml.gz d6d40f1d6f4e5663f5dcafa03fa1585652a5a07fe84b04e02ea4568cf8aaebcb-filelists.xml.gz
5b2a6258eb64ba18142e37454d820499fa993b3ad458116caacb25878c9b43a3-primary.xml.gz repomd.xml
76bbf7f0bc7ac45ae01ff40a7b05681e703004f2b418f22afba32d06d7c5ed23-other.xml.gz
./published/yum/master/yum_distributor/1-cv_mycv-v13_0-7c326edf-b01b-4584-8f53-79696f4bb8d8/1612975312.39/repodata:
056dfc15b97ca51d6c0b17e6d96cfe9812a07bb9b9c0b6d13672b1082a784b05-updateinfo.xml.gz 91b29cd7bc395982f51ae7360297c6f4a107479d9b8a0504136037e3a5702d6c-filelists.xml.gz
0d053db8860939b4316087571ca80dcf91bd52adf3a4a8fb4258619b871262ac-other.xml.gz a27718cc28ec6d71432e0ef3e6da544b7f9d93f6bb7d0a55aacd592d03144b70-comps.xml
3fa96672cfc133a227eb733ed73507b65a46df74309fefa74f85aa0a0c0859dc-primary.xml.gz cc39646353313b6b41f0aceda68182416e443eccebc38840c3e8816bae7dd443-filelists.xml.gz
41ef974230c69d2bd9888190f2f827d2e1201f31597a81f92a6aa74fec688ba1-other.xml.gz fd3ab046ab62d410cf1e3a81a83dfcf56bb0fbca071542c159dfb58d1d63167f-primary.xml.gz
8745c868fea4065dcac832c8b343eeac7d9f82b5b11551dbac229264ee66a862-updateinfo.xml.gz repomd.xml
Then, publish new version.
NOTE: no changes to contents happened, I'm simply publishing a new version that is bound to have no difference in contents to v13.
Re-run `ls`:
./published/yum/master/yum_distributor/1-cv_mycv-v14_0-626293c0-3cc2-4a90-af5b-6d48a18ac53e/1612975428.77/repodata:
078bdcd5-96a5-45d2-b838-9414c0bc1a84 86b01422fe44ee18f18cf43be24b09362dfbf22efaadc5e8c497647bf9419794-updateinfo.xml.gz
2a3d2c604a604e3796e83a2deb58906d3186b05d999da5871a8ef07caf874480-other.xml.gz a27718cc28ec6d71432e0ef3e6da544b7f9d93f6bb7d0a55aacd592d03144b70-comps.xml
3efa8ee78de989cd08682cb605de03c4e2e8a005ffc2026adc0dc615e953898d-updateinfo.xml.gz a36be2bed4496e10a4af858bf743b4628c62175d2b11bab040ce0f1609405927-primary.xml.gz
3fbd7e543de1f82086d2f0b836d40f9b18c5359ba3b164df04018e509f11e2a9-filelists.xml.gz d6d40f1d6f4e5663f5dcafa03fa1585652a5a07fe84b04e02ea4568cf8aaebcb-filelists.xml.gz
5b2a6258eb64ba18142e37454d820499fa993b3ad458116caacb25878c9b43a3-primary.xml.gz repomd.xml
76bbf7f0bc7ac45ae01ff40a7b05681e703004f2b418f22afba32d06d7c5ed23-other.xml.gz
./published/yum/master/yum_distributor/1-cv_mycv-v14_0-7c326edf-b01b-4584-8f53-79696f4bb8d8/1612975428.58/repodata:
056dfc15b97ca51d6c0b17e6d96cfe9812a07bb9b9c0b6d13672b1082a784b05-updateinfo.xml.gz 91b29cd7bc395982f51ae7360297c6f4a107479d9b8a0504136037e3a5702d6c-filelists.xml.gz
0d053db8860939b4316087571ca80dcf91bd52adf3a4a8fb4258619b871262ac-other.xml.gz a27718cc28ec6d71432e0ef3e6da544b7f9d93f6bb7d0a55aacd592d03144b70-comps.xml
3fa96672cfc133a227eb733ed73507b65a46df74309fefa74f85aa0a0c0859dc-primary.xml.gz cc39646353313b6b41f0aceda68182416e443eccebc38840c3e8816bae7dd443-filelists.xml.gz
41ef974230c69d2bd9888190f2f827d2e1201f31597a81f92a6aa74fec688ba1-other.xml.gz fd3ab046ab62d410cf1e3a81a83dfcf56bb0fbca071542c159dfb58d1d63167f-primary.xml.gz
8745c868fea4065dcac832c8b343eeac7d9f82b5b11551dbac229264ee66a862-updateinfo.xml.gz repomd.xml
We can see that metadata files are simply being copied over to the new version -- no evaluation of each metadata file's age is being done.
I can't say this (a new CV version with no content differences) is exactly what is happening on all Satellites facing this issue, but at least in this scenario the issue appears. As far as I can tell, the 'remove_old_repodata_threshold' logic should be kicking in and removing from the repodata/ directory any metadata files older than 1 second.
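For reference, this is roughly the age check one would expect the threshold to drive during publish: compare each repodata file's mtime against the threshold and flag anything older. This is a simplified sketch with a hypothetical function name; the real logic lives in pulp_rpm's publish steps.

```python
import os
import time

def repodata_older_than(repodata_dir, threshold_seconds):
    """List repodata files whose age exceeds threshold_seconds.

    Simplified sketch of the check implied by the
    'remove_old_repodata_threshold' setting; repomd.xml itself is always kept.
    """
    now = time.time()
    old = []
    for name in os.listdir(repodata_dir):
        if name == 'repomd.xml':
            continue
        age = now - os.path.getmtime(os.path.join(repodata_dir, name))
        if age > threshold_seconds:
            old.append(name)
    return sorted(old)
```

The point of the complaint above is that this check runs only when metadata is actually republished, not when an unchanged repo is cloned into a new CV version.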
@Ina, any thoughts or suggested next steps?
I now see that not examining metadata file timestamps is part of the repository clone process that takes place when there are no changes to a given repo. Can this behavior be changed by some config setting? Can RemoveOldRepodataStep be added to the repository cloning process?

So, back to the original situation: on Satellite we are seeing old repository metadata files unnecessarily kept under /var/lib/pulp/, taking more and more storage space. If one publishes multiple CVs every day, and this is a perfectly valid usage scenario for Satellite, one may end up with hundreds of GB of wasted storage space. Setting remove_old_repodata_threshold to 1 inside /etc/pulp/server/plugins.conf.d/yum_distributor.json has not helped prevent this phenomenon. This is proving to be a big problem; please prioritize it as such.

@Pablo, I have checked once again that the distributor.json works on the Pulp side as it should, even with 1 second set. There are some specifics ongoing on the Katello side. I will defer the rest to @jsherrill.

Hey Pablo! There are a couple of issues at play here, I think:

1. Generating metadata during repo sync and content view publish is keeping old copies of metadata around (you may need a 'skip metadata sync' to see it actually republish the metadata).
2a. Promoting Content View versions with a repo with multiple copies of metadata is propagating that into lifecycle environment repos at promotion time.
2b. Even when you've published a new content view version with only a single set of metadata, promoting them is not causing any data to be

I believe the configuration that Ina mentions should resolve 1); can you confirm whether it does? I understand that even if 1) is solved, new content view versions may not contain changes and thus may not have their metadata regenerated when promoting to a lifecycle environment. If that's the case, you may want to manually trigger a 'regenerate repo metadata' for each content view version. You can do this from the content view version list or via the 'hammer content-view version republish-repositories' command.

There's a lot in Satellite that attempts to reduce the amount of work done in the cases where there are no changes, and I think this is preventing the fix from being seen (assuming 1) is working).

Thank you Justin for chiming in and for the comprehensive review, and Ina for the constructive discussion and testing.

Regarding #1:

> 1. Generating metadata during repo sync and content view publish are keeping old copies of metadata around (you may need a 'skip metadata sync' to see it actually republish the metadata)
(...)
> I understand that even if 1) is solved, new content view versions may not contain changes and thus may not have their metadata regenerated when promoting to a lifecycle environment? If thats the case you may want to manually trigger a 'regenerate repo metadata' for each content view versions.

I believe this is the core of the issue: having to regenerate metadata (which is expensive) simply to keep old metadata from being copied around doesn't seem more efficient than taking a quick, cheap look (it ought to be cheap) at metadata file age before blindly copying it over to the new repo (or CV version).

> There's a lot in satellite that attempts to reduce the amount of work done in the cases where there are no changes, and i think this is preventing the fix from being seen (assuming 1) is working)

I know this and I thoroughly appreciate it. I'm asking for the logic to be improved a (hopefully) tiny bit to accommodate these cases. In my experience this (hopefully) tiny change in the logic would benefit a significant number of customers.

One more question I think should be asked: why do we go with a default of 14 days for keeping old metadata alongside new metadata? Why even keep old metadata? What purpose is the old metadata expected to serve, since it's not referenced in repomd.xml?
Going forward, if we could have Satellite adopt a default of
"remove_old_repodata_threshold": 0
...it would serve our users much better in terms of preventing old metadata build-up.
If the reluctance to keep old metadata around sounds like an overreaction, please bear in mind that rhel-7-server-rpms metadata is now 809 MB in size, with a whopping 771 MB used for the "other" data type alone. Having this repo metadata duplicated in a single CV means the CV uses over 1 GB of space for metadata alone. And if you then publish a new CV version with no changes, the new version's metadata will also be over 1 GB in size. Now apply this problem to big CVs, and picture those big CVs being published, with or without changes, every week, and also promoted. The problem gets *really* big.
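To put a rough number on the build-up: the 809 MB repodata size is from this report, while the one-duplicate-per-version and weekly no-change-publish cadence are assumptions for illustration only.

```python
# Back-of-envelope estimate of stale-metadata growth for one CV.
metadata_mb = 809          # rhel-7-server-rpms repodata size, from this report
duplicate_copies = 1       # assumed: one stale copy carried alongside the live one
publishes_per_year = 52    # assumed: one no-change publish per week

wasted_gb = metadata_mb * duplicate_copies * publishes_per_year / 1024
print(f"~{wasted_gb:.0f} GB of stale metadata per CV per year")
```

Multiply that by tens of CVs per organization and the hundreds of GB figure above stops looking surprising.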
I second the comments made by Pablo, with real-life experience of the effect this has on large systems.

In my example, I'm going to publish a new version of the CV for RHEL 7, and then publish the associated CCVs (BASE is "just" rhel7 + sat repos; EXTENDED is rhel7 + sat tools + epel7 + extras + optionals). Each time, I'm looking at the free storage left on /var/lib/pulp:

- Prior to the new version of CV RHEL7: 288 645M
- Post new version of CV RHEL7: 285 499M (approx. 3 GB more storage used)
- Post new version of CCV (BASE), which creates the Library copy: 281 790M (approx. 4 GB more storage used)
- Post promote of CCV to Early: 281 561M (good news, stays approximately the same)
- Remove the old version of the CCV: 283 141M (frees only 2 GB, compared to the 4 GB used for the new version)
- Post new version of CCV EXTENDED: 279 513M (approx. 4 GB more storage used)
- Post promote of CCV to Early: 278 726M (approx. 1 GB used; strange compared to the BASE CCV)
- Remove the old version of the CCV: 280 776M (frees about 2 GB, in line with BASE)

Long story short, with approx. 100 CCVs to publish (20 orgs, approx. 5 CCVs per org), I end up churning 300 GB per publish cycle! This was not happening like this prior to 6.8.x.

(In reply to Pablo Hess from comment #17)
> Thank you Justin for chiming in and for the comprehensive review, and Ina for the constructive discussion and testing.
> Regarding #1:
> > 1. Generating metadata during repo sync and content view publish are keeping old copies of metadata around (you may need a 'skip metadata sync' to see it actually republish the metadata)
> (...)
> > I understand that even if 1) is solved, new content view versions may not contain changes and thus may not have their metadata regenerated when promoting to a lifecycle environment? If thats the case you may want to manually trigger a 'regenerate repo metadata' for each content view versions.
>
> I believe this is the core of the issue: having to regenerate metadata (this is expensive) in order to simply keep old metadata from being copied around doesn't seem more efficient than taking a quick, cheap look (it ought to be cheap) at metadata file age before blindly copying them over to the new repo (or CV version).
>
> > There's a lot in satellite that attempts to reduce the amount of work done in the cases where there are no changes, and i think this is preventing the fix from being seen (assuming 1) is working)
>
> I know this and I thoroughly appreciate it. I'm asking for the logic to just be improved a (hopefully) tiny bit to accommodate these cases. In my experience this (hopefully) tiny change in the logic would benefit a significant number of customers.

Hi Pablo, it's Tanya from Pulp. I understand the frustration, and, unfortunately, fixing the problem without noticeably degrading performance (e.g. removing some optimizations) is more complicated than it looks on the surface. The first step towards a solution is to ensure that the threshold setting on the Pulp side works as expected; then we can try to figure out what can be done on the Katello side. Could you confirm what Justin asks in bz1903367#c16?

> I believe the configuration that Ina mentions should resolve 1), can you confirm if it is?

FWIW, your tests in bz1903367#c13 with no changes to content and without 'skip metadata sync' didn't change metadata, as designed/expected. This is due to optimizations on the Pulp side. You need to change some content or bypass the optimizations explicitly with 'skip metadata sync'. Thank you.

(In reply to Tanya Tereshchenko from comment #24)
> Could you confirm what Justin asks in bz1903367#c16?
> > I believe the configuration that Ina mentions should resolve 1), can you confirm if it is?

I believe so.
However, like I said in c#18, we would need Satellite to adopt a default "remove_old_repodata_threshold" value of zero in order to prevent it from adding old metadata from the beginning. If old metadata sneaks in at any point in time, it will be blindly copied around from then on. Only by having a value of zero since the first content sync would Pulp be able to avoid holding old metadata.

On an existing Satellite, I have tried setting "remove_old_repodata_threshold" to 1 (I have not tried 0). This works if and only if we manually remove all unreferenced old metadata (e.g. with the commands in c#4) prior to doing any CV publish/promote. In that case, Satellite will not keep old unreferenced metadata around the next time we publish CVs.

> FWIW, your tests in bz1903367#c13 with no changes to content and without 'skip metadata sync' didn't change metadata as designed/expected.
> This is due to optimizations on Pulp side. You need to change some content or bypass the optimizations explicitly with 'skip metadata sync'.

Right, makes sense. Thank you for clarifying.

Plan of action:
1) Set remove_old_repodata_threshold to '0' by default.
2) Provide a foreman-maintain command to clean up these duplicates.

If a user upgrades to include 1), their content views should get cleaned up automatically as they publish/promote new versions (and delete old versions). This only happens if a new version has some changes in it, so it should happen over time. But if a user wants to speed this up, they can run the foreman-maintain script.

*** Bug 1921752 has been marked as a duplicate of this bug. ***

Upstream bug assigned to jsherril

Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/32966 has been resolved.
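Until a cleanup command is available, a quick way to spot affected CV versions is to count how many copies of each metadata type a repodata directory holds: a healthy directory has exactly one `<hash>-<type>.xml.gz` per type. This is a hypothetical helper based on the file naming shown in this bug, not part of foreman-maintain.

```python
import os
from collections import Counter

METADATA_TYPES = ('primary', 'filelists', 'other', 'updateinfo')

def duplicated_metadata(repodata_dir):
    """Return metadata types that appear more than once in repodata_dir.

    More than one file per type is the duplication symptom described
    in this bug (e.g. two *-primary.xml.gz files in one repodata dir).
    """
    counts = Counter()
    for name in os.listdir(repodata_dir):
        for mdtype in METADATA_TYPES:
            if name.endswith('-%s.xml.gz' % mdtype):
                counts[mdtype] += 1
    return {t: n for t, n in counts.items() if n > 1}
```

Running this over every `*/repodata` directory under /var/lib/pulp/published/yum/master/yum_distributor/ would identify which published versions still carry stale copies.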
The pulp puppet module didn't seem to get an update; I'd expect 8.2.0 according to https://gitlab.sat.engineering.redhat.com/satellite6/foreman-installer/-/commit/5d18354bda7d7e0f9926e2ce834d5319b46edf1b but I see 8.1.1:

~~~
# cat /usr/share/foreman-installer/modules/pulp/metadata.json | grep version
  "version": "8.1.1",
# rpm -q foreman-installer
foreman-installer-2.3.1.18-1.el7sat.noarch
~~~

This is a build issue; see https://bugzilla.redhat.com/show_bug.cgi?id=1903367#c37 for details.

Verified on Satellite 6.9.5, snap 3.

Steps to Test:

1. On a separate Satellite server from the Satellite 6.9.5 instance being tested, create a directory in /var/www/html/pub:

~~~
# mkdir /var/www/html/pub/repo
~~~

2. Change into the directory, download an RPM to it, and create a repository there:

~~~
# cd /var/www/html/pub/repo
# wget https://jlsherrill.fedorapeople.org/fake-repos/needed-errata/bear-4.1-1.noarch.rpm
# createrepo .
~~~

3. On the Content > Products page of the Satellite webUI, create a new custom product.
4. Create a custom repository in the product with the repository created in the previous step as the upstream URL.
5. Synchronize the repository.
6. On the Satellite hosting the upstream repository, download a second RPM to the repository directory and run `createrepo` again:

~~~
# wget https://jlsherrill.fedorapeople.org/fake-repos/needed-errata/camel-0.1-1.noarch.rpm
# createrepo .
~~~

7. On the Satellite 6.9.5 instance being tested, synchronize the custom repository a second time.
8. Create a new content view containing the custom repository.
9. Publish three versions of the content view.
10. Check each published content view version for duplicate repodata:

~~~
# ls /var/lib/pulp/published/yum/master/yum_distributor/1-repodata_dupe_test-v3_0-cb49cf3b-fe30-4dad-96ec-a689a705fc53/1630006867.47/repodata/
3cf3c635f08ac92c0483f0ca96eaf60f62131d1ae828d44d94ae43e2fc2d0346-primary.xml.gz
3d9cbba2f0b04239db929749a6d57ed975cb659240040a46bd4616bbe33049ff-other.xml.gz
a27718cc28ec6d71432e0ef3e6da544b7f9d93f6bb7d0a55aacd592d03144b70-comps.xml
ab2a4d42f8a2b35c95b8d9b5f946d76688f1316eb0bf48752948101da67447c9-updateinfo.xml.gz
cf62ad33787a800a447a349e4969df93e4e2b6c7ddb5395a2b98d9b84dfa5a26-filelists.xml.gz
repomd.xml
# ls /var/lib/pulp/published/yum/master/yum_distributor/1-repodata_dupe_test-v2_0-cb49cf3b-fe30-4dad-96ec-a689a705fc53/1630006855.15/repodata/
3cf3c635f08ac92c0483f0ca96eaf60f62131d1ae828d44d94ae43e2fc2d0346-primary.xml.gz
3d9cbba2f0b04239db929749a6d57ed975cb659240040a46bd4616bbe33049ff-other.xml.gz
a27718cc28ec6d71432e0ef3e6da544b7f9d93f6bb7d0a55aacd592d03144b70-comps.xml
ab2a4d42f8a2b35c95b8d9b5f946d76688f1316eb0bf48752948101da67447c9-updateinfo.xml.gz
cf62ad33787a800a447a349e4969df93e4e2b6c7ddb5395a2b98d9b84dfa5a26-filelists.xml.gz
repomd.xml
# ls /var/lib/pulp/published/yum/master/yum_distributor/1-repodata_dupe_test-v1_0-cb49cf3b-fe30-4dad-96ec-a689a705fc53/1630006838.87/repodata/
3cf3c635f08ac92c0483f0ca96eaf60f62131d1ae828d44d94ae43e2fc2d0346-primary.xml.gz
3d9cbba2f0b04239db929749a6d57ed975cb659240040a46bd4616bbe33049ff-other.xml.gz
a27718cc28ec6d71432e0ef3e6da544b7f9d93f6bb7d0a55aacd592d03144b70-comps.xml
ab2a4d42f8a2b35c95b8d9b5f946d76688f1316eb0bf48752948101da67447c9-updateinfo.xml.gz
cf62ad33787a800a447a349e4969df93e4e2b6c7ddb5395a2b98d9b84dfa5a26-filelists.xml.gz
repomd.xml
~~~

Expected Results: No duplicate repodata files are present in any of the content view versions.
Actual Results: No duplicate repodata files are present in any of the content view versions.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Satellite 6.9.5 Async Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3387