Bug 1490836 - Satellite6 documentation does not contain detailed steps for importing incremental content on a disconnected Satellite
Status: NEW
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Docs User Guide
Version: 6.2.11
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Assigned To: satellite-doc-list
QA Contact: satellite-doc-list
Depends On:
Blocks:
Reported: 2017-09-12 07:10 EDT by Mihir Lele
Modified: 2018-10-08 01:22 EDT
CC: 3 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Mihir Lele 2017-09-12 07:10:02 EDT
Document URL: 

https://access.redhat.com/documentation/en-us/red_hat_satellite/6.2/html-single/content_management_guide/#importing_a_content_view_as_a_red_hat_repository

Section Number and Name: D.6.3. Incremental Updates 

Describe the issue: 

The customer has two Satellites in their setup: one connected and one disconnected. They export content from the connected Satellite and import it onto the disconnected one.

Even though the documentation describes the steps to export the content using the hammer CLI, the steps to import the incremental updates are not clear and require a lot of trial and error and guesswork.
Comment 1 Rich Jerrido 2017-09-12 09:30:59 EDT
See https://access.redhat.com/blogs/1169563/posts/2641311 under the Building a CDN mirror - A practical example using Content ISOs section.

The steps for importing multiple content ISOs are the same as for importing a base export and then an incremental export.
Comment 2 David Walser 2017-09-13 18:36:47 EDT
What are the listing files for?

Integrating an incremental export into what you already have, to keep a functional CDN mirror, is exactly what's desired. One thing you'd need to make that work is something to rebuild the repodata as well, and that document doesn't address that as far as I can tell. I created a script myself that does this by running createrepo_c --update --keep-all-metadata on all of the repositories, but it takes forever, and the updateinfo files that carry the Errata information only end up with what was in the last incremental export rather than everything.
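The metadata rebuild described above might be sketched as follows. This is only a sketch under assumptions: the mirror root path is hypothetical, and the function deliberately prints the createrepo_c commands instead of running them, so they can be reviewed first (pipe the output to `bash` to execute).

```shell
#!/bin/bash
# Sketch of the repodata rebuild described above. The mirror root
# (/var/www/html/pub/cdn-import) is an assumption; adjust for your layout.
# The function only PRINTS the createrepo_c commands; pipe to `bash` to run.
rebuild_repodata() {
    local cdn_root="$1"
    # Every directory containing a repodata/ subdirectory is a yum repository.
    find "$cdn_root" -type d -name repodata | while read -r repodata; do
        echo createrepo_c --update --keep-all-metadata "$(dirname "$repodata")"
    done
}

# Example: rebuild_repodata /var/www/html/pub/cdn-import | bash
```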
Comment 4 Rich Jerrido 2017-09-15 09:09:49 EDT
(In reply to David Walser from comment #2)
> What are the listing files for?
> 

The listing files provide Pulp with a listing (hence the name) of the directories in the output. Without them, when you attempt to select a repository (on the Content -> Red Hat Repositories page), no repositories will be listed. (Basically, the CDN is a series of directories and subdirectories, and Satellite uses the listing files to understand the hierarchy.)

> Integrating an incremental export into what you already have to keep a
> functional CDN mirror is exactly what's desired.  One thing you'd need to
> make that work is have something to rebuild the repodata as well, and that
> document doesn't address that as far as I can tell.  I created a script
> myself that does that by running createrepo_c --update --keep-all-metadata
> on all of the repositories, but it takes forever and the updateinfo files
> that have the information about the Errata only end up with what was in the
> last incremental export rather than everything.

You don't need to run any createrepo commands. Your incremental exports already have the metadata and updateinfo needed for a proper sync.


Assuming that you are doing an export of the Default Org View (which is the 'all the repos' view), you basically have a full export of the CDN, so you can sync it just as if you were internet-connected. Your import process (the first time) will be:

- stage the content somewhere the disconnected Satellite can access.
- change your CDN URL to http://destination/pub/cdn-import/ (or whatever it is called)
- Import Manifest
- Select repos (from the Content->Red Hat Repositories page) and sync.
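The first-time import steps above can be sketched as hammer CLI calls. This is a hedged sketch: the organization name, CDN URL, manifest path, and repository names are assumptions, and the function only prints each command so it can be reviewed before running.

```shell
#!/bin/bash
# Sketch of the first-time import steps above, as hammer CLI calls.
# ORG, CDN_URL, and MANIFEST are assumptions -- substitute your own values.
ORG="Default Organization"
CDN_URL="http://destination/pub/cdn-import/"
MANIFEST="/root/manifest.zip"

first_time_import() {
    # Point the organization's CDN at the staged export.
    echo hammer organization update --name "$ORG" --redhat-repository-url "$CDN_URL"
    # Import the subscription manifest.
    echo hammer subscription upload --organization "$ORG" --file "$MANIFEST"
    # Repos are then enabled (Content -> Red Hat Repositories page, or
    # `hammer repository-set enable`) and synced, e.g.:
    echo hammer repository synchronize --organization "$ORG" \
         --product "Red Hat Enterprise Linux Server" \
         --name "Red Hat Enterprise Linux 7 Server RPMs x86_64 7Server"
}
```

Running `first_time_import` prints the three commands; drop the `echo`s (or pipe to `bash`) to execute them.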

When you do your incremental export, the import process becomes:

- delete the original import contents (from http://destination/pub/cdn-import/).
- copy the incremental export to the sync directory. (which contains only the new RPMs and new repodata)
- Sync your repos.
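The incremental import steps above might look like this as a script. Again a sketch under assumptions: the import directory and tarball path are hypothetical, and the commands are only printed so the destructive `rm` can be reviewed before anything runs.

```shell
#!/bin/bash
# Sketch of the incremental import steps above. IMPORT_DIR and the export
# tarball path are assumptions; the function only PRINTS the commands so
# the destructive rm can be reviewed first.
IMPORT_DIR="/var/www/html/pub/cdn-import"

incremental_import() {
    local new_export="$1"   # e.g. /media/dvd/incremental-export.tar
    # Delete the previous import contents...
    echo rm -rf "$IMPORT_DIR"
    echo mkdir -p "$IMPORT_DIR"
    # ...extract the incremental export (new RPMs + new repodata only)...
    echo tar -C "$IMPORT_DIR" -xf "$new_export"
    # ...then sync the repos (here via hammer; the web UI works too).
    echo hammer repository synchronize --organization "Default Organization" \
         --product "Red Hat Enterprise Linux Server" \
         --name "Red Hat Enterprise Linux 7 Server RPMs x86_64 7Server"
}
```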
Comment 5 David Walser 2017-09-15 12:53:39 EDT
(In reply to Rich Jerrido from comment #4)
> You dont need to run any createrepo commands. Your incremental exports
> already have the metadata and updateinfo needed for a proper sync. 

No they don't.  That's the whole problem.  Incremental exports only include metadata and updateinfo for the packages in that incremental export, not for what the repositories should contain (in full) up to that point.  Therefore, something is needed to integrate that metadata into the metadata you already had from before.

> When you do your incremental export, the import process becomes:
> 
> - delete the original import contents (from
> http://destination/pub/cdn-import/).

Exactly what I don't want.  I want a full intact mirror of the CDN that I can sync from.

> - copy the incremental export to the sync directory. (which contains only
> the new RPMs and new repodata)
> - Sync your repos.

No, that doesn't work.  If you do that, it wipes out all of the content on the Satellite, and all that's left is the packages in the incremental export.  Everything else you had before gets blown away.  I know, because I've done this, and your documentation also warns that this will happen.  You can use the hammer command and do a sync with the incremental option, but that's the only way for this method to work.  If anything ever goes wrong (like accidentally forgetting the incremental option and blowing away all of your Satellite content) and you have to start over with a clean import, using your method you can't do it, because you've lost all of the content you transferred over to your disconnected network.  Hence why I want a method that actually produces an intact CDN mirror that you always have the ability to sync from, be it an initial import or an update sync, while also allowing the use of the web interface to do the sync (which your incremental method does not allow).
Comment 6 Rich Jerrido 2017-09-15 14:54:59 EDT
(In reply to David Walser from comment #5)
> (In reply to Rich Jerrido from comment #4)
> > You dont need to run any createrepo commands. Your incremental exports
> > already have the metadata and updateinfo needed for a proper sync. 
> 
> No they don't.  That's the whole problem.  Incremental exports only include
> metadata and updateinfo for the packages in that incremental export, not for
> what the repositories should contain (in full) up to that point.  Therefore,
> something is needed to integrate that metadata into the metadata you already
> had from before.
> 
> > When you do your incremental export, the import process becomes:
> > 
> > - delete the original import contents (from
> > http://destination/pub/cdn-import/).
> 
> Exactly what I don't want.  I want a full in-tact mirror of the CDN that I
> can sync from.

Then you have to rebuild the yum metadata. The import/export features were built with the belief that the admin doesn't want to keep a full mirror of the CDN in their disconnected environment (as you'd be using 2x the disk space: 1x for the storage in /var/lib/pulp, 1x for your mirror).

The expectation is that you export content from the connected side, set the 'mirror location' on the disconnected side, import from that well-known location, delete the content, and repeat as needed. Creating the CDN as a 'full mirror' on the disconnected side, while doable, isn't what the workflow was intended to do.

> 
> > - copy the incremental export to the sync directory. (which contains only
> > the new RPMs and new repodata)
> > - Sync your repos.
> 
> No, that doesn't work.  If you do that, it wipes out all of the content on
> the Satellite and all that's left is the packages in the incremental export.
> Everything else you had before gets blown away.  I know, because I've done
> this, and your documentation also warns that will happen.  You can use the
> hammer command and do a sync with the incremental option, but that's the
> only way for this method to work.  If anything ever goes wrong (like
> accidentally forgetting the incremental option and blowing away all of your
> Satellite content) and you have to start over with a clean import, using
> your method, you can't do it because you've lost all of the content you
> transferred over to your disconnected network.  Hence why I want a method
> that actually produces an in-tact CDN mirror that you always have the
> ability to sync from, be it an initial import or update sync, while also
> allowing the use of the web interface to do the sync (which your incremental
> method does not allow).


Are your repositories set to 'mirror on sync'? If yes, you'll have the behavior you describe (the repo containing only the contents of its last sync). If your repos aren't set to 'mirror on sync' (I believe the default for a repo is 'mirror on sync' == disabled), Satellite will sync the content from your incremental export and add it to the content the repository already contains.
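Checking and toggling the setting discussed above might look like the following. A hedged sketch: the repository id is hypothetical, the `--mirror-on-sync` option is assumed to exist in this hammer version, and the commands are only printed for review.

```shell
#!/bin/bash
# Sketch: inspect and (if needed) disable mirror-on-sync for one repository.
# The repository id passed in is hypothetical, and the --mirror-on-sync
# option is assumed present in this hammer version. Commands are only
# printed; remove the echo to actually run them.
check_and_disable_mirror_on_sync() {
    local repo_id="$1"
    # 'hammer repository info' output should include a mirror-on-sync field.
    echo hammer repository info --id "$repo_id"
    echo hammer repository update --id "$repo_id" --mirror-on-sync no
}
```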
Comment 7 David Walser 2017-09-15 15:19:08 EDT
(In reply to Rich Jerrido from comment #6)
> They you have to rebuild the yum metadata.

Exactly.  So as I mentioned earlier, I wrote a script to do that, but it takes a lot longer to run than it should, and it doesn't correctly incorporate all of the updateinfo into the rebuilt metadata.  I hope createrepo_c will be fixed to work better for this case.

> The import/export features were
> built with the belief that the admin doesnt want to keep a full mirror of
> the CDN in their disconnected environment (As you'd be using 2x the disk
> space - 1x for the  storage in /var/lib/pulp, 1x for your mirror). 
> 
> The expectation is that you export content from the connected side, set the
> 'mirror location' on the disconnected side, and then import from that well
> known location, delete the content and repeat as needed. Creating the CDN as
> a 'full mirror' on the disconnected side, while doable, isn't what the
> workflow was intended to do.

Right, it was built on the assumption that everything will go perfectly, all of the time.  From what I have seen working with Satellite 6 for most of this year, it's a very unstable and immature product, so I wouldn't count on that.  I also don't want to be dependent on a senior admin such as myself always being the one running things.  I'd like a more foolproof and failsafe procedure that I can feel comfortable with junior admins consistently running successfully.  Disk space is relatively cheap.  Time is not.

> Are your repositories set to 'mirror on sync'? If yes, you'll have the
> behavior you describe. (Which is the repo containing only the contents from
> its last sync). If your repos aren't set to 'mirror on sync' (which I
> believe the default for a repo is 'mirror on sync == disabled), Satellite
> will sync the content from your incremental export and it is to the content
> the repository already contains.

IINM, I remember reading about that setting somewhere.  I seem to remember checking it, and it was set to that, but that was the default; I haven't changed it.  I haven't tried messing with that setting (and it doesn't sound like changing it would fit the workflow I currently have in place).  So if you went through and changed that, in theory you could do your incremental syncs through the web interface, but again, that assumes that everything always works correctly.  In reality, it doesn't.  I've already seen bugs in Satellite 6 that have caused me to stop and re-do syncs because the Satellite errors out or gets stuck.  I know they were Satellite bugs and not just random failures, because I would see these behaviors on my disconnected Satellite right after seeing them on my connected one.
Comment 8 Rich Jerrido 2017-09-15 17:50:08 EDT
(In reply to David Walser from comment #7)
> (In reply to Rich Jerrido from comment #6)
> > They you have to rebuild the yum metadata.
> 
> Exactly.  So as I mentioned earlier, I wrote a script to do that, but it
> takes a lot longer to run than it should and it doesn't correctly
> incorporate all of the updateinfo into the rebuilt metadata.  I hope
> createrepo_c will be fixed to work better for this case.
> 

Feel free to file a BZ or open a support case on this. But IMHO, you don't need to run createrepo{,_c} at all.

> > The import/export features were
> > built with the belief that the admin doesnt want to keep a full mirror of
> > the CDN in their disconnected environment (As you'd be using 2x the disk
> > space - 1x for the  storage in /var/lib/pulp, 1x for your mirror). 
> > 
> > The expectation is that you export content from the connected side, set the
> > 'mirror location' on the disconnected side, and then import from that well
> > known location, delete the content and repeat as needed. Creating the CDN as
> > a 'full mirror' on the disconnected side, while doable, isn't what the
> > workflow was intended to do.
> 
> Right, it was built on the assumption that everything will go perfectly, all
> of the time.  From what I have seen working with Satellite 6 for most of
> this year, it's a very unstable and immature product, so I wouldn't count on
> that from that end.  I also don't want to be dependent on a senior admin
> such as myself always being the one running things.  I'd like a more
> foolproof and failsafe procedure that I can feel comfortable with junior
> admins consistently running successfully.  Disk space is relatively cheap. 
> Time is not.
> 

I am still trying to figure out how the process that you describe is somehow more foolproof than the process I described.

Fundamentally, you need to 

- sync the RPMs from the CDN
- export them to disk. 
- migrate them to the disconnected environment. 
- apply some post-processing. (in your scenario you recreate yum metadata; in mine we delete the old export and extract the new export in the same place)
- sync the content in the disconnected Satellite. 

Both your process and the process I described are easily automated so that it doesn't require a senior admin to complete. The _only_ place where incremental updates can be problematic is if you do not export with overlapping or adjacent date ranges.


Example: If I export 7Server on 01-Jul, import it, then create an incremental export from 15-Jul -> 01-Sep, I won't have the RPMs between 02-Jul and 14-Jul. But since pulp only stores RPMs once on import, you can easily solve this by ensuring in your automation (since you are building automation anyway) that your dates are aligned (or that you use overlapping dates).
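The date-alignment rule above can be checked mechanically in export automation. A minimal sketch (GNU date assumed): a new incremental export is safe only if it starts no later than the day after the previous export's end date.

```shell
#!/bin/bash
# Sketch of the date-alignment check described above. An incremental export
# starting at $2 leaves no gap after one ending at $1 only if it starts no
# later than the day after the previous end (adjacent or overlapping).
# Requires GNU date (-d with relative dates).
ranges_ok() {
    local prev_end="$1" next_start="$2"
    local latest_safe_start
    latest_safe_start="$(date -d "$prev_end + 1 day" +%s)"
    [ "$(date -d "$next_start" +%s)" -le "$latest_safe_start" ]
}

# ranges_ok 2017-07-01 2017-07-02   # adjacent: succeeds
# ranges_ok 2017-07-01 2017-07-15   # gap (02-Jul..14-Jul missing): fails
```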

> > Are your repositories set to 'mirror on sync'? If yes, you'll have the
> > behavior you describe. (Which is the repo containing only the contents from
> > its last sync). If your repos aren't set to 'mirror on sync' (which I
> > believe the default for a repo is 'mirror on sync == disabled), Satellite
> > will sync the content from your incremental export and it is to the content
> > the repository already contains.
> 
> IINM, I remember reading about that setting somewhere.  I seem to remember
> checking and it was set to that, but it was the default; 

A quick spot-check of my 6.2.11 instance shows it as the default. I seem to remember that it didn't use to be the default, but I could be misremembering. It is possible we changed the defaults. Mirror on sync is _very_ useful in connected use cases. More below.

> I haven't changed
> it.  I haven't tried messing with that setting (and it doesn't sound like
> changing it would fit with the workflow I currently have in place).  So if
> you went through and changed that, in theory you could do your incremental
> syncs through the web interface, but again, that assumes that everything
> always works correctly. 

The purpose of mirror_on_sync is to allow the downstream Satellite to reflect what its upstream repo has. This is useful in connected scenarios, for example when Red Hat publishes a package incorrectly: it allows us to fix the repo in the CDN, and your Satellite will get the updated RPMs on the next sync. It is mutually exclusive with a disconnected use case with incremental content imports (in an incremental use case you want the base plus the incrementals, not just the last incremental).


> In reality, it doesn't.  

Can you cite examples? I'd love to investigate further. The import/export features were specifically designed so that you don't have to keep a persistent full mirror of the CDN in the disconnected environment. You _can_ (and you currently do), but you definitely shouldn't _HAVE_ to.

 
> Already I've seen bugs in
> Satellite 6 that have caused me to have to stop and re-do syncs because the
> Satellite errors out or gets stuck.  I know they were Satellite bugs and not
> just random failures because I would see these behaviors on my disconnected
> Satellite right after seeing it happen on my connected one.
Comment 9 David Walser 2017-09-15 18:58:28 EDT
(In reply to Rich Jerrido from comment #8)
> Feel free to file a BZ or open a support case on this. But IMHO, you dont
> need to run createrepo{,_c} at all. 

Well, if I am going to keep an intact CDN mirror, I do need to (which in fact is the case with the suggestion you made in Comment 1), but yes, if you rely on the hammer command to do incremental imports (or turn off mirror-on-sync), you don't.

> I am still trying to figure out how the process that you describe is somehow
> more fool proof than the process I described. 
> 
> Fundamentally, you need to 
> 
> - sync the RPMs from the CDN
> - export them to disk. 
> - migrate them to the disconnected environment. 
> - apply some post-processing. (in your scenario you recreate yum metadata;
> in mine we delete the old export and extract the new export in the same
> place)
> - sync the content in the disconnected Satellite. 
> 
> Both your process and the process I described are easily automated so that
> is doesn't require a senior admin to complete.

The "delete the old export and extract the new export in the same place" concept didn't come up until I had this discussion with you (it's certainly not in the documentation), so it hadn't occurred to me before now.  I was thinking you'd be storing every incremental export in a different place and changing the CDN URLs (or renaming directories to keep the URL the same).  Given all the trouble it is to move content from the connected to the disconnected network, it wouldn't have occurred to me that you'd want to delete it, just in case something went wrong and you needed it again.  I'll grant you that deleting it and keeping it in the same place at least gives a consistent procedure, but like I said, it still assumes that everything always goes perfectly.

The initial export is hundreds of gigabytes, so the best way to do that was to put it on a hard disk that was then moved to the disconnected network.  As you should know, the rule with this kind of thing is that you can move things that way, but then you can't move them back.  So the incremental exports go on DVDs.  The RHEL 7.4 update was way too big for that, though (it would have been 18 Blu-rays or even more regular DVDs, which is crazy), so I sacrificed another hard disk.  Still, doing it that way didn't disrupt how I do things (with a normal sync on the Satellite).  After all that trouble to get the data moved over, I certainly don't want to delete it, just in case something goes wrong.

With that approach, even if I needed to start completely over, as long as I have an intact CDN mirror, I can do that.  If I have the initial export and all of the incremental ones separately, then I'd have to re-import everything in succession.  No fun.

> The _only_ place where
> incremental updates can be problematic is if you do not export with
> overlapping or adjacent date ranges. 

That'd be a problem with my process too, so no matter what, you have to get that part right when doing incremental updates.  Helpfully, the Dashboard on the connected Satellite shows when you do exports, so that shouldn't be too hard to get right.

> Mirror on sync is _very_ useful in connected use-cases.

Of course, it's the only way it can work.

> > In reality, it doesn't.  
> 
> Can you cite examples? I'd love to further investigate.

I already cited one in what you snipped after that.  I gave another earlier in this reply.

> The import/export
> features were specifically designed so that you dont have to keep a
> persistent full mirror of the CDN in the disconnected environment. You _can_
> (and you currently do), but you definitely shouldn't _HAVE_ to. 

I appreciate that, but as I've tried to explain, it doesn't give me much peace of mind.  Doing it by keeping an intact CDN mirror should at least be an option.
