Bug 1448777 - Content sync/promotion fails to all capsules if one capsule is down
Summary: Content sync/promotion fails to all capsules if one capsule is down
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Capsule - Content
Version: 6.2.8
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: 6.2
Assignee: Brad Buckingham
QA Contact: Peter Ondrejka
URL:
Whiteboard:
Keywords: PrioBumpField, PrioBumpGSS, Triaged
Duplicates: 1449862
Depends On:
Blocks:
 
Reported: 2017-05-08 04:38 UTC by cpatters
Modified: 2018-12-06 20:45 UTC
CC: 15 users

Clone Of:
Clones: 1480353
Last Closed: 2017-09-25 18:59:44 UTC


Attachments


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2803 normal SHIPPED_LIVE Satellite 6.2.12 bug fix update 2017-10-12 19:22:49 UTC
Foreman Issue Tracker 19659 None None None 2017-05-24 20:38 UTC

Description cpatters 2017-05-08 04:38:00 UTC
Description of problem:
If a content sync is underway, and any one capsule is down or becomes unavailable during this stage, the synchronisation to ALL capsules fails.  

The problem seems to be that the sync/publish/promote actions on Satellite perform the Satellite component of the task first and then, in the last 5-10% of the job plan, start the capsule sync jobs. At this point, if one capsule does not respond, the entire task fails, so none of the capsules begin syncing.
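The all-or-nothing behaviour described above can be sketched as a toy model in plain Ruby. All names here (the `CapsuleUnreachable` error, the planning methods) are illustrative stand-ins, not Katello's actual API; the point is only that planning every capsule sync inside one parent task lets a single unreachable capsule abort the whole plan:

```ruby
# Hypothetical model of the failure mode: one shared plan for all capsules.
class CapsuleUnreachable < StandardError; end

# Pretend to plan a capsule sync; raises if the capsule is unreachable.
def plan_capsule_sync(name, down)
  raise CapsuleUnreachable, name if down.include?(name)
  "sync planned for #{name}"
end

def plan_all_or_nothing(capsules, down: [])
  # The first unreachable capsule fails the parent task, so the
  # healthy capsules never get a sync task at all.
  capsules.map { |name| plan_capsule_sync(name, down) }
rescue CapsuleUnreachable => e
  "task failed: #{e.message} unreachable; no capsule sync started"
end

puts plan_all_or_nothing(%w[capsule1 capsule2 capsule3], down: %w[capsule2])
```

With `capsule2` down, the call returns a failure for the whole task even though `capsule1` and `capsule3` are healthy, matching the behaviour reported here.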

At the time of the task failure, the Satellite components have completed, so from the Satellite view there is nothing to do. Package search shows new packages in the content view but they are not available on the capsules.

Version-Release number of selected component (if applicable):

Satellite 6.2.x with multiple capsules.


How reproducible:

Any time a capsule is unavailable and a sync/publish/promote is triggered. This has happened a few times as network connectivity to a remote capsule has been lost whilst a sync is in progress.

Also, if a capsule is down due to a known condition (e.g. a remote-site power outage), a sync to the remaining capsules cannot be started because of the one that is down. The only workaround is to remove the capsule from the Satellite configuration, sync the rest, add the 'failed' capsule back, and manually re-sync it once it has reconnected to Satellite. This is highly undesirable in a large enterprise environment.

Steps to Reproduce:
Environment: Satellite 6.2.x with at least 2 capsules, set up to receive the Library CV.
- Start a repo sync and, whilst the sync is in progress, disconnect one capsule from the network.
- Monitor the Satellite tasks - when the sync task reaches roughly 95%, new capsule sync tasks should be spawned in the planning state. At this point the main sync task will fail because it is unable to plan the capsule sync on the failed capsule.
- Note that the capsule sync to the GOOD capsule is also not performed.

Comment 2 Brad Buckingham 2017-05-24 20:37:12 UTC
*** Bug 1449862 has been marked as a duplicate of this bug. ***

Comment 3 Brad Buckingham 2017-05-24 20:38:35 UTC
Created redmine issue http://projects.theforeman.org/issues/19659 from this bug

Comment 4 Pavel Moravec 2017-06-19 07:38:43 UTC
Couldn't a relatively easy fix be:

in /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.124/app/lib/actions/katello/content_view/promote_to_environment.rb :

replace:

        def run
          environment = ::Katello::KTEnvironment.find(input[:environment_id])
          if ::Katello::CapsuleContent.sync_needed?(environment)
            ForemanTasks.async_task(ContentView::CapsuleGenerateAndSync,
                                    ::Katello::ContentView.find(input[:content_view_id]),
                                    environment)
          end
        rescue ::Katello::Errors::CapsuleCannotBeReached # skip any capsules that cannot be connected to
        end

by something like:


        def run
          environment = ::Katello::KTEnvironment.find(input[:environment_id])
          content_view = ::Katello::ContentView.find(input[:content_view_id])
          ::Katello::CapsuleContent.with_environment(environment).each do |capsule_content|
            ForemanTasks.async_task(Actions::Katello::CapsuleContent::Sync, capsule_content, :content_view => content_view, :environment => environment)
          end
        end

I.e., invoke the underlying task independently for each capsule?


But comparing the old behaviour with the new one, I am missing the dynflow steps

Actions::Pulp::Repository::UpdateImporter
Actions::Pulp::Repository::RefreshDistributor (twice)

that were present in the original Actions::Katello::ContentView::CapsuleGenerateAndSync but are missing in the new one (why?)
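The fault-isolation idea in this comment (invoke the sync task independently per capsule, so one unreachable capsule no longer aborts the rest) can be sketched in plain Ruby. Again, the error class and sync methods are hypothetical stand-ins for Katello's real classes; this only demonstrates the per-capsule rescue pattern:

```ruby
# Hypothetical model of the proposed fix: rescue per capsule, not per batch.
class CapsuleUnreachable < StandardError; end

# Pretend sync; raises for any capsule listed as down.
def sync_capsule(name, down:)
  raise CapsuleUnreachable, name if down.include?(name)
  "#{name}: synced"
end

def sync_all(capsules, down: [])
  capsules.map do |name|
    begin
      sync_capsule(name, down: down)
    rescue CapsuleUnreachable => e
      # Skip only the unreachable capsule; the others keep syncing.
      "#{e.message}: skipped (unreachable)"
    end
  end
end

puts sync_all(%w[capsule1 capsule2 capsule3], down: %w[capsule2])
```

Here `capsule1` and `capsule3` complete their syncs and only `capsule2` is skipped, which is the behaviour Comment 14 later verifies in 6.2.12.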

Comment 8 cpatters 2017-07-13 02:13:59 UTC
Is it possible to get an update on this bug, please? I just want to keep the customer informed on its progress.

Thanks

Ché

Comment 9 pm-sat@redhat.com 2017-07-18 04:00:51 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/19659 has been resolved.

Comment 10 Brad Buckingham 2017-07-31 16:44:08 UTC
Hello Ché,

I apologize, I did not intend to clear needinfo without responding. The fix for this one is currently targeted for Satellite 6.2.12; however, that is pending QE verification.

Comment 11 pm-sat@redhat.com 2017-08-03 22:00:51 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/19659 has been resolved.

Comment 14 Peter Ondrejka 2017-08-29 16:06:25 UTC
Verified in satellite-6.2.12-1.0.el7sat.noarch with three capsules. If one of the capsules becomes unavailable during content sync, the sync on the other two is not affected.

Comment 17 errata-xmlrpc 2017-09-25 18:59:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2803

