Bug 1391298 - Race condition among capsule sync tasks to destroy/create pulp repos
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Capsule - Content
Version: 6.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: Unspecified
Assignee: Andrew Kofink
QA Contact: Peter Ondrejka
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-03 00:25 UTC by Hao Chang Yu
Modified: 2020-12-14 07:50 UTC (History)

Fixed In Version: tfm-rubygem-katello-3.0.0.138-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1463815 (view as bug list)
Environment:
Last Closed: 2017-08-10 17:02:29 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 18706 | 0 | High | Closed | Race condition among capsule sync tasks to destroy/create pulp repos | 2020-12-10 22:14:05 UTC
Red Hat Product Errata | RHBA-2017:2466 | 0 | normal | SHIPPED_LIVE | Satellite 6.2.11 Async Release | 2017-08-10 21:01:20 UTC

Description Hao Chang Yu 2016-11-03 00:25:31 UTC
Description of problem:

Bulk capsule syncs failed when they were trying to remove the same pulp repos ('Actions::Katello::CapsuleContent::RemoveUnneededRepos').

Please see the below task information:

> 'Actions::Katello::Repository::CapsuleGenerateAndSync'
> there are 7 Actions::Katello::CapsuleContent::Sync sub-tasks in it, which matches the number of capsules in the pnt-sysops org
> each CapsuleContent::Sync has one 'Actions::Katello::CapsuleContent::ConfigureCapsule'
> and a lot of 'Actions::Pulp::Repository::Refresh'
> under these 'Actions::Pulp::Repository::Refresh', some have a skipped 'Actions::Pulp::Repository::RefreshDistributor'
> under 'Actions::Katello::CapsuleContent::ConfigureCapsule', there is one 'Actions::Katello::CapsuleContent::RemoveUnneededRepos', under which lots and lots of 'Actions::Pulp::Repository::Destroy' are listed
> 'Actions::Pulp::Repository::Destroy' <-- lots of these are skipped or failed

The tasks paused after a large number of Repository::Destroy actions failed. A paused task cannot be resumed until the user has clicked the skip link next to every single errored Repository::Destroy; the parent task can be resumed only once all of the errored ones are skipped. After resuming, the user got Repository::Create errors.


More information:
http://paste-platops.itos.redhat.com/pzpwqg41u/yyulhy#line-8
http://paste-platops.itos.redhat.com/p6cfi7rzn/sqiixo#line-15

Comment 1 Hao Chang Yu 2016-11-03 00:26:12 UTC
While reading the 'katello/repository/capsule_generate_and_sync.rb' file, I noticed that each 'CapsuleGenerateAndSync' task syncs all the repositories between Katello and the Capsule's Pulp every time. If the repositories do not match, Katello will either create the needed repos or destroy the unneeded repos. This seems OK when only one sync task is running for a Capsule, but conflicts could happen when there are multiple sync tasks for a Capsule (I don't know how Foreman handles multiple tasks, so I could be wrong).

For example, if the following sync tasks are running at about the same time.

Sync task 1:
1) Get a list of repos from Pulp. Pulp returns Repo1, Repo2 (A)
2) Get a list of repos from Katello. Katello returns Repo1 (B)
3) Unneeded repos = A - B
4) Delete unneeded repos from Pulp by calling Pulp API
5) Pulp async tasks are created and queued (Pulp task 1)

Sync task 2:
1) Get a list of repos from Pulp. Pulp returns Repo1, Repo2 (A)
2) Get a list of repos from Katello. Katello returns Repo1 (B)
3) Unneeded repos = A - B.
4) This is the same set of "unneeded repos" as in Sync task 1, because Pulp task 1 is still queued
5) Pulp runs Pulp task 1. The unneeded repos are now deleted from Pulp.
6) Call the Pulp API to delete the unneeded repos. The API will return 404 because the repos have already been deleted
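
The interleaving above can be replayed with a minimal Python sketch. Everything in it is hypothetical and for illustration only: FakePulp is an in-memory stand-in rather than the real Pulp API, NotFound stands in for the HTTP 404, and delete_tolerating_missing shows just one possible way to make the delete step tolerant. It is not necessarily how Katello handles it.

```python
class NotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 returned by the Pulp API."""


class FakePulp:
    """In-memory stand-in for a Capsule's Pulp (assumption, not the real API)."""

    def __init__(self, repos):
        self.repos = set(repos)

    def list_repos(self):
        # Return a copy so callers hold a snapshot, as an API response would be.
        return set(self.repos)

    def delete_repo(self, repo_id):
        if repo_id not in self.repos:
            raise NotFound(repo_id)
        self.repos.remove(repo_id)


def unneeded_repos(pulp, katello_repos):
    # Step 3 in both tasks: unneeded repos = A - B.
    return pulp.list_repos() - set(katello_repos)


def delete_tolerating_missing(pulp, repo_id):
    # Treat "already deleted" as success instead of failing the task.
    try:
        pulp.delete_repo(repo_id)
        return "deleted"
    except NotFound:
        return "already gone"


pulp = FakePulp({"Repo1", "Repo2"})
katello = {"Repo1"}

# Both sync tasks compute their plan before either delete has run.
plan_task1 = unneeded_repos(pulp, katello)
plan_task2 = unneeded_repos(pulp, katello)

# Task 1's delete runs first; task 2 then acts on its stale plan.
results_task1 = {r: delete_tolerating_missing(pulp, r) for r in plan_task1}
results_task2 = {r: delete_tolerating_missing(pulp, r) for r in plan_task2}

print(results_task1, results_task2)
```

Without the try/except in delete_tolerating_missing, the second task would raise NotFound on Repo2, which corresponds to the failed Repository::Destroy actions in the description.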

Comment 2 Hao Chang Yu 2016-11-03 00:26:40 UTC
I can still reproduce the 409 conflict error when bulk syncing repos to a Capsule.

The first sync task ran successfully, but the later tasks got the following errors because the new repos had already been created by the first task:

7: Actions::Pulp::Repository::Create (skipped) [ 6.98s / 0.25s ]
9: Actions::Pulp::Repository::Create (skipped) [ 6.10s / 1.27s ]
11: Actions::Pulp::Repository::Create (skipped) [ 5.10s / 0.21s ]
13: Actions::Pulp::Repository::Create (skipped) [ 4.21s / 0.19s ]
15: Actions::Pulp::Repository::Create (skipped) [ 4.21s / 1.19s ]
17: Actions::Pulp::Consumer::SyncCapsule (success) [ 51.74s / 9.72s ]
19: Actions::Katello::CapsuleContent::RemoveOrphans (success) [ 0.15s / 0.15s ]

Comment 6 Justin Sherrill 2016-12-14 15:55:50 UTC
Couple of issues here:

1) conflicts on create.  This is likely exacerbated by https://bugzilla.redhat.com/show_bug.cgi?id=1398438
2) conflicts on delete.  This is completely solved by https://bugzilla.redhat.com/show_bug.cgi?id=1375075

for 1) we still need to do some work to handle the 409 conflicts that can still occur even with BZ 1398438
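
As a rough illustration of what handling the 409 on create could look like, here is a minimal Python sketch. It is an assumption-laden example, not the Katello fix: FakePulp, Conflict, ensure_repo and sync_task are all hypothetical names, and the in-memory FakePulp only mimics a server that rejects duplicate creates.

```python
import threading


class Conflict(Exception):
    """Hypothetical stand-in for an HTTP 409 returned by the Pulp API."""


class FakePulp:
    """In-memory stand-in for a Capsule's Pulp (assumption, not the real API)."""

    def __init__(self):
        self.repos = set()
        self._lock = threading.Lock()

    def create_repo(self, repo_id):
        # The check and the insert happen atomically, as on a real server.
        with self._lock:
            if repo_id in self.repos:
                raise Conflict(repo_id)
            self.repos.add(repo_id)


def ensure_repo(pulp, repo_id):
    # A repo that another task already created is treated as success.
    try:
        pulp.create_repo(repo_id)
        return "created"
    except Conflict:
        return "already exists"


def sync_task(pulp, needed, outcomes):
    # Each sync task tries to create every needed repo.
    outcomes.append([ensure_repo(pulp, r) for r in needed])


pulp = FakePulp()
needed = ["Repo1", "Repo2", "Repo3"]
outcomes = []

# Four sync tasks for the same Capsule run at about the same time.
threads = [
    threading.Thread(target=sync_task, args=(pulp, needed, outcomes))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

flat = [o for task in outcomes for o in task]
print(flat.count("created"), flat.count("already exists"))
```

Whichever task wins the race for a given repo creates it; every other task sees the conflict and carries on instead of pausing with an error.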

Comment 8 Brad Buckingham 2017-02-27 18:54:53 UTC
Created redmine issue http://projects.theforeman.org/issues/18706 from this bug

Comment 9 Satellite Program 2017-03-14 22:15:02 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/18706 has been resolved.

Comment 10 Peter Ondrejka 2017-07-25 13:47:12 UTC
Verified on satellite-6.2.11-2.0: successfully created and synced 50+ repositories to the Capsule, and removed them afterwards.

Comment 12 errata-xmlrpc 2017-08-10 17:02:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2466

