Bug 1419867
Summary: | it takes too long to re-sync already synced channel | ||
---|---|---|---|
Product: | Red Hat Satellite 5 | Reporter: | Jan Hutař <jhutar> |
Component: | Satellite Synchronization | Assignee: | Jan Dobes <jdobes> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Jan Hutař <jhutar> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 580 | CC: | tlestach |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | spacewalk-backend-2.5.3-79 | Doc Type: | If docs needed, set a value |
Last Closed: | 2017-06-21 12:09:01 UTC | Type: | Bug |
Bug Blocks: | 1358815 |
Description
Jan Hutař
2017-02-07 09:45:20 UTC
Can you tell which part of the sync takes most of the time? My suspicion is the first part, where the packages that need to be synced are evaluated.

(In reply to Jan Dobes from comment #1)
> Can you tell which part of the sync takes most of the time? My suspicion is the
> first part, where the packages that need to be synced are evaluated.

IMO you are right. Please see attached output.

I think the main reason this part is slow when the channel is already synced is that a SHA checksum is computed for each package RPM on the file system that is referenced from repodata, to decide whether it matches the file in the repository. satellite-sync was most likely not doing this.

I'm not sure how this could be optimized other than by simplifying these checks. I don't like the fact that each cdn-sync/reposync run needs to compute the checksum of every RPM in the channel. It can't just compare against the checksum value from the Satellite database, because that one is sha256 and repodata often contains sha1 checksums. Maybe it would be useful to cache these computed checksums, or to compute them only with some special option.

I implemented a cache for this in the file /var/cache/rhn/reposync/checksum_cache. It caches computed checksums for a given path in the Satellite package storage. It is global for all packages in Satellite; for the RHEL 6 base channel it created a cache file of about 4 MB. Let's see how it performs, and split it into multiple files or compress it if necessary.
spacewalk master: 40f794c894a8cc4d5b664415ff0153fd93b2ff40

I'm going to look for other ways to optimize the run and maybe add more fixes here.
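For illustration only, the caching idea described above could be sketched roughly like this. This is not the code from the spacewalk commit: the class name `ChecksumCache`, the JSON file format, and the size/mtime invalidation rule are all assumptions made for the example.

```python
import hashlib
import json
import os


def file_checksum(path, checksum_type="sha1", chunk_size=1 << 20):
    """Compute the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(checksum_type)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class ChecksumCache:
    """Hypothetical on-disk cache of checksums keyed by type and path."""

    def __init__(self, cache_file):
        self.cache_file = cache_file
        self.entries = {}
        self.misses = 0  # how many times a checksum had to be computed
        if os.path.exists(cache_file):
            with open(cache_file) as f:
                self.entries = json.load(f)

    def get(self, path, checksum_type="sha1"):
        st = os.stat(path)
        key = "%s:%s" % (checksum_type, path)
        entry = self.entries.get(key)
        # Assumption: an entry is valid while size and mtime are unchanged.
        if (entry and entry["size"] == st.st_size
                and entry["mtime_ns"] == st.st_mtime_ns):
            return entry["checksum"]
        self.misses += 1
        checksum = file_checksum(path, checksum_type)
        self.entries[key] = {
            "size": st.st_size,
            "mtime_ns": st.st_mtime_ns,
            "checksum": checksum,
        }
        return checksum

    def save(self):
        with open(self.cache_file, "w") as f:
            json.dump(self.entries, f)
```

With something like this, a second run over an unchanged channel only needs one `os.stat` per RPM instead of reading every file, and the cache can hold the sha1 values that the sha256 in the database cannot provide.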
fixing previous commit in spacewalk master: 49f910e6b539336229ce4407f180df4eb35f57e9

fixing pylint in spacewalk master: 68a25b6efc6ede6e630a4d14e8bb6099fb578011

skip package to channel subscription when no package was synced, spacewalk master: b4456e8a87df3787561c6d6dca8efb7f40c53f05

correct commit from comment #10 is: b716ceb5fd7262958a72b1449551968232d9ce80

one small improvement in kickstart sync, spacewalk master: 8cc501a68d8ce491cc26aeeecbd91f993496ad59

improving errata import so that it does not always import all errata, adding --force-all-errata same as in satellite-sync, spacewalk master:
0614dc0a022e9a70bb68091e3d804fc7a656e020
786c7802a35988f8760e00fe60821d225fa0ec82

Another possible scenario:

1. I copy /var/satellite with synced rhel-x86_64-server-6 from another Satellite to my fresh Satellite
2. Run cdn-sync -c rhel-x86_64-server-6

I have a feeling that all packages are being downloaded (the sync seems to be very slow). Does this match the way it is supposed to work (checksum caching and so on)?

If so, should this be fixed, or at least reported as a different bugzilla?

one more fix in spacewalk master: 5833e299e4955c257170beab187e5798ad315c95

(In reply to Jan Hutař from comment #17)
> Another possible scenario:
>
> 1. I copy /var/satellite with synced rhel-x86_64-server-6 from another
> Satellite to my fresh Satellite
> 2. Run cdn-sync -c rhel-x86_64-server-6
>
> I have a feeling that all packages are being downloaded (the sync seems to be
> very slow). Does this match the way it is supposed to work (checksum caching
> and so on)?
>
> If so, should this be fixed, or at least reported as a different bugzilla?

I'm afraid this scenario is not supported. In this case the RPMs are on the filesystem but they are not in the database.
cdn-sync needs to download them again because it doesn't know where in the filesystem the RPMs could be. The path depends on the checksum of each RPM, so cdn-sync can only know about packages if they are in the database (where the path is already stored) or if they have been downloaded into the staging directory, /var/satellite/redhat/NULL/stage. It's not possible to derive the checksummed path from nothing and detect the RPM somewhere in /var/satellite/redhat/NULL/*.

In any case, after download, the RPM metadata needs to be imported into the database, and this is the slowest part by far.

(In reply to Jan Dobes from comment #21)
> (In reply to Jan Hutař from comment #17)
> > [...]
> I'm afraid this scenario is not supported. In this case the RPMs are on the
> filesystem but they are not in the database. cdn-sync needs to download them
> again because it doesn't know where in the filesystem the RPMs could be. The
> path depends on the checksum of each RPM, so cdn-sync can only know about
> packages if they are in the database (where the path is already stored) or if
> they have been downloaded into the staging directory,
> /var/satellite/redhat/NULL/stage. It's not possible to derive the checksummed
> path from nothing and detect the RPM somewhere in /var/satellite/redhat/NULL/*.

Hmm, so I can just move these packages into "stage/" and cdn-sync should not download them, right? Can I safely pre-cache any packages I'm about to download with this approach? I assume that if I place corrupted data into "stage/", cdn-sync will handle it and re-download?

(In reply to Jan Hutař from comment #23)
> (In reply to Jan Dobes from comment #21)
> > (In reply to Jan Hutař from comment #17)
> > > [...]
> > [...]
>
> Hmm, so I can just move these packages into "stage/" and cdn-sync should not
> download them, right? Can I safely pre-cache any packages I'm about to
> download with this approach? I assume that if I place corrupted data into
> "stage/", cdn-sync will handle it and re-download?

Correct: you can pick RPMs from those directories and put them into the stage directory, and cdn-sync will check whether they match the packages from the repository.
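
As a rough illustration of the check described above, pre-staged files could be compared against the repository checksums along these lines. This is a sketch under stated assumptions, not cdn-sync's actual code: the function name `split_staged`, the flat file-name layout inside the stage directory, and the shape of the `expected` mapping are all hypothetical.

```python
import hashlib
import os


def file_checksum(path, checksum_type, chunk_size=1 << 20):
    """Compute the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(checksum_type)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_staged(stage_dir, expected):
    """Decide which packages can be reused from the stage directory.

    expected maps a file name to (checksum_type, hexdigest), as it might
    be read from repository metadata (hypothetical structure).
    Returns (reusable, to_download), both sorted lists of file names.
    """
    reusable, to_download = [], []
    for name, (checksum_type, hexdigest) in sorted(expected.items()):
        path = os.path.join(stage_dir, name)
        # A missing or corrupted file simply ends up in the download list.
        if (os.path.isfile(path)
                and file_checksum(path, checksum_type) == hexdigest):
            reusable.append(name)
        else:
            to_download.append(name)
    return reusable, to_download
```

Under these assumptions, a corrupted file placed in the stage directory fails the comparison and is treated the same as a file that was never staged.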