Created attachment 1053020 [details]
inode_usage of pulp nodes published repos

Description of problem:

To make the pulp data available for Capsules (pulp nodes), the pulp repositories are made available in /var/lib/pulp/nodes/published/https/repos. With every repository added (Content Views also create implicit repositories), a new directory is added.

Example 1: Duplication per Content View, that is O(N):
- A Red Hat Server 6.5 repository will generate 60.000 inodes (45.000 directories and 15.000 symlinks). When this repository is used by 10 Content Views, it is created 10 times, which means 600.000 inodes.

Example 2: Red Hat CDN sync, which recreates directories:
- Syncing Red Hat channels always recreates the directories for every Red Hat repository. That means that with a daily sync, the roughly 60.000 inodes of each (both non-EUS and EUS) Server repository need to be created and deleted again.

We are syncing both the non-EUS and EUS Kickstart, Server, Optional and RHSCL channels for the releases 6.5, 6.6, 6Server, 7.1 and 7Server. This totals 71 directories:

/var/lib/pulp/nodes/published/https/repos# ls -1d Hilti-Red* Hilti-Oracle* | wc -l
71

Taking an average of 30.000 inodes per repository, that means 30.000 * 2 (both create and delete) * 71 = 4.260.000 inode IO actions per day.

The shared repository of all RPMs uses only: 228.000 inodes
The pulp nodes published directory contains: 6.352.462 inodes
The pulp nodes published directory contains: 550 directories

See the attached inode_usage.txt for details.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Sync both the non-EUS and EUS Kickstart, Server, Optional and RHSCL channels for the releases 6.5, 6.6, 6Server, 7.1 and 7Server.
2. Sync all Red Hat repositories once.
3. Do an incremental sync of the Red Hat repositories and monitor the number of IO transactions on the filesystem.
4. Create and publish 20 Content Views, each containing at least the Kickstart, Server and Optional repositories.
5. Check the /var/lib/pulp/nodes/published/https/repos directory.
6. Check for duplicate directories.
7. Count the number of inodes in /var/lib/pulp/content/rpm.
8. Count the number of inodes in /var/lib/pulp/nodes/published/https/repos.

Actual results:
- Many inode transactions during a Red Hat sync; in fact the time for an incremental sync is almost the same as for a full sync.
- Inode usage in the pulp nodes published directory is N times higher than in the shared content/rpm.

Expected results:
- An incremental sync should perform at most the IO actions related to the incremental changes.
- Inode usage in the pulp nodes published directory should be at most the number of inodes used by the shared rpm content.

Additional info:
Created attachment 1053021 [details] inode_usage script
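For reference, the kind of counting used in the attached script and in steps 7 and 8 of the description can be done with plain find. The following is only a minimal sketch of such an inode_usage helper, assuming the standard /var/lib/pulp layout; it is not the attached script itself.

#!/bin/bash
# Minimal sketch: report inode usage of the shared content vs. the node-published tree.
# Assumption: standard /var/lib/pulp layout; this is not the attached inode_usage script.
for dir in /var/lib/pulp/content/rpm /var/lib/pulp/nodes/published/https/repos; do
    total=$(find "$dir" | wc -l)           # every file, directory and symlink costs one inode
    dirs=$(find "$dir" -type d | wc -l)    # directories only
    links=$(find "$dir" -type l | wc -l)   # symlinks only
    printf '%s: %s inodes (%s directories, %s symlinks)\n' "$dir" "$total" "$dirs" "$links"
done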
Since this issue was entered in Red Hat Bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.
Please verify whether this requires a release note, and if so, please provide suitable doc text. Thanks.
This bug requires modifications to improve the inode usage; however, it shouldn't require a rel note at this time; therefore, removing the sat61-release-notes blocker.
Jeff, please add a link to an upstream issue describing your proposal.
Done.
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.
Stuart, after pulp is updated, the existing links (the inode usage) would be cleaned up on subsequent publishes. If we don't want to wait for publishes, we can include a migration script in the solution. -jeff
Hi Jeff, maybe I'm missing something here, but under what circumstances would an *existing* content view get re-published? Regards, Stuart
Please take Composite Content Views and Environments into account. These refer to dedicated Content View Versions, and re-publishing adds a new Content View Version. The current Sat6 (without automatic latest-selection support) cannot make assumptions about what to do with Lifecycle Environment Promotions or Content View Versions.
My understanding of content view lifecycle is limited. Thanks for the clarification, Peter. Looks like we'll want to include a migration script in the solution to clean up the unwanted symlinks.
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.
The solution makes it possible for the existing symlinks created during node publishing to be blindly deleted; after the upgrade they are no longer used by the child node (capsule). The links are published in /var/lib/pulp/nodes/published, and subsequent publishes will delete them. However, if admins want them deleted prior to the next publish, it can be done in one of two ways:

1. Admins can delete them manually. Example:
   find /var/lib/pulp/nodes/published/https/repos -type l -exec rm -f {} \;
2. The RPM spec file can run #1 during upgrade.

I prefer #1. Thoughts?
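For illustration only, option #2 would amount to a %post scriptlet along these lines; this is a sketch of how an (assumed) pulp-nodes spec file could invoke the manual cleanup during an upgrade, not a change that exists anywhere.

# Sketch only: how option #2 could look in an RPM spec file (package name assumed).
# $1 is greater than 1 when the package is being upgraded rather than freshly installed.
%post
if [ "$1" -gt 1 ] ; then
    find /var/lib/pulp/nodes/published/https/repos -type l -exec rm -f {} \; || :
fi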
Using the RPM %post is not a good idea, as the removal process can take hours. And if a user forgets the manual step in the upgrade process, it will never be done, because later upgrades no longer contain this cleanup step.

Recommendation:
- Have a special pulp background task do the cleanup. Then it is certain that the cleanup will be executed, and done in a controlled way.

Alternative:
- Add it to katello-upgrade, but that is more Satellite specific.

Removing only the symlinks still leaves the directories behind. Is this by design?
Yes. The content/ directory still contains a few real files.
The major inode use is by the directory tree, see Example 1 in the description (https://bugzilla.redhat.com/show_bug.cgi?id=1244130#c0). For RHEL 6.5 EUS:
45.000 directories
15.000 symlinks

Deleting the symlinks is therefore not enough; the empty directories also need to be cleaned up.
Moving to POST since an upstream fix is available.
Understood about the need to purge the directories as well. I still think a reasonable approach is to provide admins with suggested commands to clean things up. Something like:

find /var/lib/pulp/nodes/published -type l -delete
find /var/lib/pulp/nodes/published -type d -empty -delete

We could also provide a shell script that performs this cleanup, as sketched below. A background task in pulp to do this seems like overkill.
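If a wrapper script were provided, it could be as small as the following sketch; it just runs the two find commands above, and the name, messages and error handling are assumptions rather than anything that ships with the product.

#!/bin/bash
# Sketch of a possible cleanup wrapper around the two find commands above.
# Assumption: illustrative only; no such script is shipped at this point.
set -e
PUBLISHED=/var/lib/pulp/nodes/published

echo "Removing obsolete node-publishing symlinks under $PUBLISHED ..."
find "$PUBLISHED" -type l -delete

echo "Removing empty directories left behind ..."
find "$PUBLISHED" -type d -empty -delete

echo "Cleanup done."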
I did not know about the -empty and -delete features of find. I agree that those two simple find commands are good enough to perform the cleanup.
To verify this bz, I installed Sat 6.1.6 along with a capsule to see the real issue. Now I need some clarification before upgrading to Satellite 6.1.7.

1. I'm assuming that, before the upgrade, we need to check and delete the existing published links and directories manually on the Satellite and Capsule servers as per comment 25, right?

   find /var/lib/pulp/nodes/published -type l -delete
   find /var/lib/pulp/nodes/published -type d -empty -delete

2. Then upgrade to 6.1.7 and validate whether published links and directories are still being created:
   a) on the Sat server, by publishing some CVs
   b) by syncing content from the Sat server to the capsule?

Note: I synced the RHEL 6 and 7 Server, Kickstart, Optional and RHSCL repos on the Sat server.

@Jeff: could you please take a look and confirm whether my assumptions are correct based on the bz history?
(In reply to Jeff Ortel from comment #30)
> (In reply to Sachin Ghai from comment #28)
> > To verify this bz, I installed Sat 6.1.6 along with capsule to see the real
> > issue. Now I need some clarification before upgrading to satellite 6.1.7
> >
> > I'm assuming, before upgrade, we need to check and delete existing published
> > links and directories manually on satellite and capsule server as per
> > comment25, right ?
>
> Yes.
>
> > find /var/lib/pulp/nodes/published -type l -delete
> > find /var/lib/pulp/nodes/published -type d -empty -delete

Thanks Jeff. I removed all existing soft links and empty directories before the upgrade:

[root@ibm-x3550 ~]# find /var/lib/pulp/nodes/published -type l -delete
[root@ibm-x3550 ~]# find /var/lib/pulp/nodes/published -type l | wc -l
0
[root@ibm-x3550 ~]# find /var/lib/pulp/nodes/published -type d -empty -delete
[root@ibm-x3550 ~]# find /var/lib/pulp/nodes/published -type d -empty | wc -l
0

Now proceeding with the upgrade to 6.1.7 and will update the test results here.
Upgrade went fine from Sat 6.1.6 to 6.1.7 along with the external capsule. After the upgrade, I ran a capsule sync and it finished successfully.

--
[root@ibm-x3550 ~]# hammer -u admin -p changeme capsule content synchronize --id=2
[............................................................................................................................................] [100%]
--

Two observations:
-------------------
1) Apache's pulp_node.conf was updated with a new stanza:

<snip>
<Directory /var/www/pulp/nodes/content >
    Options FollowSymLinks Indexes
    SSLRequireSSL
<snip>

2) Metadata is re-generated on every capsule sync:

Feb 9 18:53:32 ibm-x3550m3-06 pulp: pulp.plugins.pulp_rpm.plugins.distributors.yum.metadata.metadata:WARNING: Overwriting existing metadata file [/var/lib/pulp/working/repos/Default_Organization-Dev-cv_rhel7-capsule_rhel7-cap_rhel7_617/distributors/yum_distributor/repodata/repomd.xml]
After the upgrade to 6.1.7 compose1, I performed the following tests:

- re-synced some of the existing Red Hat repos - success
- enabled new repos and synced them - success
- created new CVs with new and existing repos and published them - success
- published an existing CV - success
- re-synced contents from satellite -> capsule - success

I don't find any symlinks after the upgrade, and the inode count for the published repos is reduced to a very low number.

Before upgrade:
----------------
[root@ibm-x3550m3 ~]# find /var/lib/pulp/nodes/published -type l | wc -l
568186
[root@ibm-x3550m3 ~]# find /var/lib/pulp/nodes/published -type d | wc -l
2141711
[root@ibm-x3550 ~]# find /var/lib/pulp/nodes/published/https/repos -name '*' | wc -l
2710169

After upgrade:
----------------
[root@ibm-x3550 ~]# find /var/lib/pulp/nodes/published -type l | wc -l
0
[root@ibm-x3550 ~]# find /var/lib/pulp/nodes/published -type d -empty | wc -l
1
[root@ibm-x3550 ~]# find /var/lib/pulp/nodes/published -type d | wc -l
201
[root@ibm-x3550 ~]# find /var/lib/pulp/nodes/published/https/repos -name '*' | wc -l
499
As per comment 33, moving this bz to verified. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:0174
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.