Bug 1255090
| Field | Value |
|---|---|
| Summary | Kickstart Trees duplicate packages in /var/lib/pulp/content/rpm |
| Product | Red Hat Satellite |
| Component | Pulp |
| Version | 6.1.0 |
| Status | CLOSED WONTFIX |
| Severity | high |
| Priority | unspecified |
| Reporter | Sebastian Hetze <shetze> |
| Assignee | Brad Buckingham <bbuckingham> |
| QA Contact | Peter Ondrejka <pondrejk> |
| CC | bbuckingham, bkearney, dgregor, mhrivnak, mverma, pmutha, pondrejk, sthirugn |
| Keywords | Triaged |
| Target Milestone | Unspecified |
| Target Release | Unused |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2016-12-19 21:29:12 UTC |

Description
Sebastian Hetze
2015-08-19 15:00:56 UTC
With the current 6.2 beta snap this problem remains unresolved.

    find /var/lib/pulp/content/units/rpm/ -name bash-4.2.46-19\*
    /var/lib/pulp/content/units/rpm/b5/701df8aa2a86d9884c8c946a55593fea073e6a9575207556378752dc768055/bash-4.2.46-19.el7.x86_64.rpm
    /var/lib/pulp/content/units/rpm/1a/53b1410864e0b5fd1bc2d71409c8e11c0c03fac1d0b65b30416eddeaa4f512/bash-4.2.46-19.el7.x86_64.rpm

    sha256sum /var/lib/pulp/content/units/rpm/b5/701df8aa2a86d9884c8c946a55593fea073e6a9575207556378752dc768055/bash-4.2.46-19.el7.x86_64.rpm /var/lib/pulp/content/units/rpm/1a/53b1410864e0b5fd1bc2d71409c8e11c0c03fac1d0b65b30416eddeaa4f512/bash-4.2.46-19.el7.x86_64.rpm
    88b662408745b64513268d6c3c57484a0625c264f63e4509174c6b6f2507cf96 /var/lib/pulp/content/units/rpm/b5/701df8aa2a86d9884c8c946a55593fea073e6a9575207556378752dc768055/bash-4.2.46-19.el7.x86_64.rpm
    88b662408745b64513268d6c3c57484a0625c264f63e4509174c6b6f2507cf96 /var/lib/pulp/content/units/rpm/1a/53b1410864e0b5fd1bc2d71409c8e11c0c03fac1d0b65b30416eddeaa4f512/bash-4.2.46-19.el7.x86_64.rpm

So the cryptic directory name apparently does not reflect the checksum anymore. We still have the complete content of the kickstart tree duplicated in the pulp space.

Michael, is the behavior described by Sebastian expected for 6.2?

What you see is expected. Pulp will not remove content from the filesystem unless two things happen in order:

1) the content is removed from all repositories. If the RPM is still associated with any other repo, pulp will keep it.
2) an "orphan purge" task is initiated

Both of those are in katello's control, so I'm not sure when they would be expected to happen.

To be clear, this duplication happens on a freshly installed Satellite. There is no existing content so the expectation is not that Pulp or Katello removes something but just does not store something in a second location that it already has.

Ok, thank you for that clarification.
I don't think the dual-storage problem is going to be completely resolved in the near future, but we have been working with RCM on a longer-term resolution. I believe RCM cleaned up the checksum mis-match on their end, so that should help future syncs.

We also changed pulp to always store RPMs using the sha256 checksum value for uniqueness, regardless of which algorithm is used in the remote repo metadata. That will also prevent this problem during future syncs. This change landed in pulp 2.9.0.

Based on comment 18, I am moving this bug to 6.3 for verification since 6.3 will include pulp 2.9.

In Satellite 6.3 snap 6 there is a change in paths but this seems to persist, for example:

    ~]# find /var/lib/pulp/ -name bash-4.2.46-12.el7.x86_64.rpm
    /var/lib/pulp/published/yum/master/yum_distributor/Default_Organization-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_7_Server_RPMs_x86_64_7Server/1480431662.42/bash-4.2.46-12.el7.x86_64.rpm
    /var/lib/pulp/published/yum/master/yum_distributor/Default_Organization-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_7_Server_Kickstart_x86_64_7_1/1480432704.29/Packages/bash-4.2.46-12.el7.x86_64.rpm
    /var/lib/pulp/published/yum/master/yum_distributor/Default_Organization-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_7_Server_Kickstart_x86_64_7_1/1480432704.29/bash-4.2.46-12.el7.x86_64.rpm

Also, you can list duplicate packages in UI at Content > Packages, revealing the checksum mismatch persists (compare attached screenshots)

Created attachment 1225908: 389-kickstart_repo
Created attachment 1225909: 389-server_repo
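
The manual check used in this thread (find the file by name, then compare `sha256sum` output) can be generalized to a whole content tree. The following is only a minimal sketch, not a Pulp or Satellite tool: the function names `sha256_of` and `find_duplicate_files` are made up for illustration, and it assumes that duplicates are regular files with identical bytes. Symlinks are skipped so that published links are not counted as extra copies.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(root, suffix=".rpm"):
    """Group regular files under root by content digest.

    Returns {digest: [paths]} for digests that occur more than once.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Ignore symlinks: only real on-disk copies count as duplicates.
            if name.endswith(suffix) and not os.path.islink(path):
                by_digest[sha256_of(path)].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

Calling `find_duplicate_files("/var/lib/pulp/content/units/rpm")` on a system like the one in the description would be expected to report the two `bash-4.2.46-19` copies under a single digest. Hashing every RPM is slow on a large tree, so this is a diagnostic aid rather than something to run routinely.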
(In reply to Peter Ondrejka from comment #20)
> In Satellite 6.3 snap 6 there is a change in paths but this seems to persist, for example:
>
> ~]# find /var/lib/pulp/ -name bash-4.2.46-12.el7.x86_64.rpm
> /var/lib/pulp/published/yum/master/yum_distributor/Default_Organization-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_7_Server_RPMs_x86_64_7Server/1480431662.42/bash-4.2.46-12.el7.x86_64.rpm
> /var/lib/pulp/published/yum/master/yum_distributor/Default_Organization-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_7_Server_Kickstart_x86_64_7_1/1480432704.29/Packages/bash-4.2.46-12.el7.x86_64.rpm
> /var/lib/pulp/published/yum/master/yum_distributor/Default_Organization-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_7_Server_Kickstart_x86_64_7_1/1480432704.29/bash-4.2.46-12.el7.x86_64.rpm
>
> Also, you can list duplicate packages in UI at Content > Packages, revealing the checksum mismatch persists (compare attached screenshots)

Was this using a download policy of on_demand or background? In that use case, there is nothing Pulp can do, because we only know the checksum for an rpm as it is listed in the repo metadata.

For example, consider that you have repo_a whose metadata uses sha1, and repo_b whose metadata uses sha256. They contain the same RPMs. If you sync repo_a with the on_demand policy, it will create an entry in the database for each RPM using a sha1 checksum. If you then sync repo_b with the on_demand policy, it will create new entries for each RPM using the sha256 checksum. Pulp has no way to compare these with the existing sha1 checksums, so the equivalence goes unrecognized.

If you sync'd both using the "immediate" policy, pulp calculates all supported checksum types, and always makes DB entries using the sha256 algorithm. That would enable pulp to recognize the equivalence during the sync of repo_b.
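
The repo_a / repo_b scenario above can be illustrated with a toy model. This is a sketch under stated assumptions, not Pulp's code or schema: `ToyUnitStore`, its `sync` method, and the stand-in payload are all hypothetical, and the only behavior modeled is that an on_demand sync knows just the checksum from the repo metadata, while an immediate sync has the file and can key it by sha256.

```python
import hashlib


class ToyUnitStore:
    """Hypothetical stand-in for a content database; not Pulp's real schema."""

    def __init__(self):
        # Key: (filename, checksum_type, checksum). Value: policy that created it.
        self.units = {}

    def sync(self, repo_metadata, policy, payloads=None):
        for filename, (checksum_type, checksum) in repo_metadata.items():
            if policy == "immediate":
                # The file is downloaded now, so sha256 can be computed locally.
                checksum_type = "sha256"
                checksum = hashlib.sha256(payloads[filename]).hexdigest()
            # With on_demand, only the checksum listed in the metadata is known.
            self.units.setdefault((filename, checksum_type, checksum), policy)
        return len(self.units)


# Stand-in bytes, not a real RPM.
payloads = {"bash-4.2.46-19.el7.x86_64.rpm": b"stand-in bytes, not a real RPM"}


def metadata(algorithm):
    """Build toy repo metadata listing each payload under one checksum type."""
    return {
        name: (algorithm, hashlib.new(algorithm, data).hexdigest())
        for name, data in payloads.items()
    }


repo_a = metadata("sha1")    # same RPMs, metadata uses sha1
repo_b = metadata("sha256")  # same RPMs, metadata uses sha256

lazy = ToyUnitStore()
lazy.sync(repo_a, "on_demand")
lazy.sync(repo_b, "on_demand")

eager = ToyUnitStore()
eager.sync(repo_a, "immediate", payloads)
eager.sync(repo_b, "immediate", payloads)

print(len(lazy.units), len(eager.units))  # prints "2 1"
```

In the toy model the on_demand store ends up with two entries for one file because the sha1 and sha256 keys cannot be compared without the file contents, while the immediate store collapses them into one sha256-keyed entry.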
I'm closing as "wontfix" since I believe Pulp is already doing everything it can given available information. We could consider an RFE to de-duplicate data after-the-fact, but that would not be possible without major changes to the way content is served [0]. Those changes would only be possible post-pulp3.

Let me know if you have any additional questions or feedback.

[0] To explain from a high level, if pulp recognized that two RPMs are in the DB twice, and thus on disk twice, that means published data likely exists with symlinks to both. Pulp doesn't currently track publications, so it would be impossible to remove one of the files without risk of breaking lots of published repos.

*** Bug 1418676 has been marked as a duplicate of this bug. ***
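
The risk described in footnote [0] can be made concrete with a small check. The sketch below is hypothetical (the function name `links_pointing_at` is invented here, and it is not part of Pulp): it assumes published repos reference content through file symlinks, as the footnote suggests, and lists every link under a published tree that resolves to a given content file. Any hit is a link that would dangle if that file were deleted.

```python
import os


def links_pointing_at(target, published_root):
    """Return symlinks under published_root that resolve to target."""
    wanted = os.path.realpath(target)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(published_root):
        for name in filenames:
            link = os.path.join(dirpath, name)
            # Only symlinks matter here; regular files are independent copies.
            if os.path.islink(link) and os.path.realpath(link) == wanted:
                hits.append(link)
    return sorted(hits)
```

Because Pulp does not track publications, a scan like this over the whole published tree would be the only way to know which links depend on a duplicate, which is part of why after-the-fact de-duplication is described as impractical here.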