Description of problem:
If a content sync is underway, and any one capsule is down or becomes unavailable during this stage, the synchronisation to ALL capsules fails.
The problem seems to be that the sync/publish/promote actions on Satellite perform the Satellite component of the task first, and only in the last 5-10% of the job plan and start the capsule sync jobs. At this point, if one capsule does not respond, the entire task fails, so none of the capsules begin syncing.
At the time of the task failure, the Satellite components have completed, so from the Satellite view there is nothing to do. Package search shows new packages in the content view but they are not available on the capsules.
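The failure mode described above can be modelled in a few lines of plain Ruby. This is only an illustrative sketch, not Katello code: `Capsule`, `CapsuleUnreachable`, `plan_capsule_sync` and `plan_all_or_nothing` are hypothetical names, and the assumption is simply that planning a capsule sync needs a response from that capsule.

```ruby
# Hypothetical model of the reported behaviour; none of these names exist in Katello.
Capsule = Struct.new(:name, :reachable)

class CapsuleUnreachable < StandardError; end

# Assumption: planning a sync requires the capsule to respond.
def plan_capsule_sync(capsule)
  raise CapsuleUnreachable, "#{capsule.name} did not respond" unless capsule.reachable
  "sync planned for #{capsule.name}"
end

# All-or-nothing planning: one unreachable capsule aborts the whole batch,
# so plans already made for healthy capsules are discarded as well.
def plan_all_or_nothing(capsules)
  capsules.map { |capsule| plan_capsule_sync(capsule) }
end

capsules = [
  Capsule.new("capsule-a", true),
  Capsule.new("capsule-b", false), # e.g. remote site lost connectivity
  Capsule.new("capsule-c", true)
]

begin
  plans = plan_all_or_nothing(capsules)
rescue CapsuleUnreachable => e
  plans = []
  failure = e.message
end

puts "planned: #{plans.inspect}, failure: #{failure.inspect}"
```

In this model the two reachable capsules end up with no sync planned, which mirrors the symptom: the Satellite side is complete, but no capsule receives the new content.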
Version-Release number of selected component (if applicable):
Satellite 6.2.x with multiple capsules.
How reproducible:
Any time a capsule is unavailable and a sync/publish/promote is triggered. This has happened a few times as network connectivity to a remote capsule has been lost whilst a sync is in progress.
Also, if a capsule is down due to a known condition (e.g. remote site power outage), a sync on the remaining capsules cannot be started because of the one that is down. The only way around this is to remove the capsule from the Satellite configuration, sync the rest, add the 'failed' capsule again and manually re-sync it when it has re-connected to Satellite. This is highly undesirable in a large enterprise environment.
Steps to Reproduce:
The environment is Satellite 6.2.x with at least 2 capsules, set up to receive the Library CV.
- Start a repo sync and, whilst the sync is in progress, disconnect one capsule from the network.
- Monitor the Satellite tasks. When the sync task reaches roughly 95%, new capsule sync tasks should be spawned in the planning state. At this point the main sync task will fail because it cannot plan the capsule sync on the failed capsule.
- Note that the capsule sync to the GOOD capsule is also not performed.
Couldn't a relatively easy fix be:
in /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.124/app/lib/actions/katello/content_view/promote_to_environment.rb :
replace:
def run
  environment = ::Katello::KTEnvironment.find(input[:environment_id])
  if ::Katello::CapsuleContent.sync_needed?(environment)
    ForemanTasks.async_task(ContentView::CapsuleGenerateAndSync,
                            ::Katello::ContentView.find(input[:content_view_id]),
                            environment)
  end
rescue ::Katello::Errors::CapsuleCannotBeReached # skip any capsules that cannot be connected to
end
by something like:
def run
  environment = ::Katello::KTEnvironment.find(input[:environment_id])
  content_view = ::Katello::ContentView.find(input[:content_view_id])
  ::Katello::CapsuleContent.with_environment(environment).each do |capsule_content|
    ForemanTasks.async_task(Actions::Katello::CapsuleContent::Sync, capsule_content, :content_view => content_view, :environment => environment)
  end
end
? I.e. invoke the underlying task independently for each Capsule.
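The idea of one independent task per capsule can be sketched outside Katello as follows. Everything here is a hypothetical stand-in (`Target`, `TargetUnreachable`, `trigger_sync`, `sync_each_independently`); in the proposal above the isolation would come from each capsule getting its own asynchronous task, whereas this sketch models it synchronously with a per-capsule rescue.

```ruby
# Hypothetical stand-ins for illustration; not Katello or ForemanTasks APIs.
Target = Struct.new(:name, :reachable)

class TargetUnreachable < StandardError; end

# Stand-in for triggering one independent sync task for a single capsule.
def trigger_sync(target)
  raise TargetUnreachable, "#{target.name} did not respond" unless target.reachable
  "sync started for #{target.name}"
end

# Handle each capsule separately: a failure is recorded and skipped
# instead of aborting the sync for the remaining capsules.
def sync_each_independently(targets)
  started = []
  skipped = {}
  targets.each do |target|
    begin
      started << trigger_sync(target)
    rescue TargetUnreachable => e
      skipped[target.name] = e.message
    end
  end
  { started: started, skipped: skipped }
end

result = sync_each_independently([
  Target.new("capsule-a", true),
  Target.new("capsule-b", false), # unreachable capsule
  Target.new("capsule-c", true)
])

puts result.inspect
```

Keeping a record of the skipped capsules matters here, since the unreachable capsule still needs a manual re-sync once it reconnects.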
However, comparing the old behaviour with the new one, the following dynflow steps are missing:
Actions::Pulp::Repository::UpdateImporter
Actions::Pulp::Repository::RefreshDistributor (twice)
They were present in the original Actions::Katello::ContentView::CapsuleGenerateAndSync but are absent from the new one (why?).
Hello Ché,
I apologize, I did not intend to clear needinfo without responding. The fix for this one is currently targeted for Satellite 6.2.12; however, that is pending QE verification.
Comment 11 Satellite Program 2017-08-03 22:00:51 UTC
Verified in satellite-6.2.12-1.0.el7sat.noarch with three capsules. If one of the capsules becomes unavailable during content sync, the sync on the other two is not affected.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:2803