Bug 1391298

Summary: Race condition among capsule sync tasks to destroy/create pulp repos
Product: Red Hat Satellite Reporter: Hao Chang Yu <hyu>
Component: Capsule - Content Assignee: Andrew Kofink <akofink>
Status: CLOSED ERRATA QA Contact: Peter Ondrejka <pondrejk>
Severity: medium Docs Contact:
Priority: high    
Version: 6.2.0 CC: akofink, bbuckingham, erjohnso, ggatward, jalviso, jcallaha, jentrena, jsherril, mmccune, oshtaier, pdwyer, rballang, syangsao, zhunting
Target Milestone: Unspecified Keywords: Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tfm-rubygem-katello-3.0.0.138-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1463815 Environment:
Last Closed: 2017-08-10 17:02:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Hao Chang Yu 2016-11-03 00:25:31 UTC
Description of problem:

Bulk capsule syncs failed when they were trying to remove the same pulp repos ('Actions::Katello::CapsuleContent::RemoveUnneededRepos').

Please see the below task information:

> 'Actions::Katello::Repository::CapsuleGenerateAndSync'
> there are 7 Actions::Katello::CapsuleContent::Sync sub tasks in it, which matches the number of capsules in the pnt-sysops org
> each of the CapsuleContent::Sync tasks has one 'Actions::Katello::CapsuleContent::ConfigureCapsule'
> and a lot of 'Actions::Pulp::Repository::Refresh'
> under these 'Actions::Pulp::Repository::Refresh', some of them have skipped 'Actions::Pulp::Repository::RefreshDistributor'
> under 'Actions::Katello::CapsuleContent::ConfigureCapsule', there is one 'Actions::Katello::CapsuleContent::RemoveUnneededRepos', where lots and lots of 'Actions::Pulp::Repository::Destroy' actions are listed
> 'Actions::Pulp::Repository::Destroy' <-- lots of these are skipped or failed

The tasks paused after a large number of 'Actions::Pulp::Repository::Destroy' actions failed. The parent task cannot be resumed until the user has pressed the skip link next to every single errored Repository::Destroy action. After resuming, the user got Repository::Create errors.


More information:
http://paste-platops.itos.redhat.com/pzpwqg41u/yyulhy#line-8
http://paste-platops.itos.redhat.com/p6cfi7rzn/sqiixo#line-15

Comment 1 Hao Chang Yu 2016-11-03 00:26:12 UTC
While reading the 'katello/repository/capsule_generate_and_sync.rb' file, I noticed that each 'CapsuleGenerateAndSync' task syncs all the repositories between Katello and the Capsule's Pulp every time. If the repositories do not match, Katello will either create the needed repos or destroy the unneeded repos. This seems OK when only one sync task is running for a Capsule, but conflicts could happen when there are multiple sync tasks for a Capsule (I don't know how Foreman handles multiple tasks, so I could be wrong).
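
The reconciliation described above comes down to two set differences. A minimal sketch, assuming repository IDs are plain strings; `plan_reconcile` is a hypothetical helper for illustration, not a function from Katello:

```python
def plan_reconcile(pulp_repos, katello_repos):
    """Return (to_create, to_destroy) for one Capsule.

    pulp_repos: repo IDs currently present on the Capsule's Pulp.
    katello_repos: repo IDs Katello expects on the Capsule.
    """
    pulp, katello = set(pulp_repos), set(katello_repos)
    to_create = sorted(katello - pulp)    # needed but missing on the Capsule
    to_destroy = sorted(pulp - katello)   # present on the Capsule but unneeded
    return to_create, to_destroy


print(plan_reconcile(["Repo1", "Repo2"], ["Repo1", "Repo3"]))
```

Because the plan is computed from a snapshot of both sides, two tasks that take their snapshots at about the same time will compute the same plan.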

For example, suppose the following sync tasks are running at about the same time:

Sync task 1:
1) Get a list of repos from Pulp. Pulp returns Repo1, Repo2 (A)
2) Get a list of repos from Katello. Katello returns Repo1 (B)
3) Unneeded repos = A - B
4) Delete unneeded repos from Pulp by calling Pulp API
5) Pulp async tasks are created and queued (Pulp task 1)

Sync task 2:
1) Get a list of repos from Pulp. Pulp returns Repo1, Repo2 (A)
2) Get a list of repos from Katello. Katello returns Repo1 (B)
3) Unneeded repos = A - B.
4) It should get the same "unneeded repos" as Sync task 1 because Pulp task 1 is still queued
5) Pulp runs Pulp task 1. The unneeded repos are now deleted from Pulp.
6) Call Pulp API to delete the unneeded repos. The API will return 404 because the repos have already been deleted
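
The interleaving above can be replayed with a small in-memory simulation. This is only a sketch: `FakePulp`, `NotFound`, `unneeded`, and `delete_tolerant` are hypothetical names standing in for the Pulp API and its 404 response, and treating a 404 as "already deleted" is one possible way to tolerate the race, not necessarily how the fix was implemented.

```python
class NotFound(Exception):
    """Stands in for an HTTP 404 from the Pulp API (assumption)."""


class FakePulp:
    """In-memory stand-in for a Capsule's Pulp; deletes are queued, not immediate."""

    def __init__(self, repos):
        self.repos = set(repos)
        self.queue = []

    def list_repos(self):
        return set(self.repos)

    def delete_repo(self, repo_id):
        if repo_id not in self.repos:
            raise NotFound(repo_id)
        self.queue.append(repo_id)  # async task created, still queued

    def run_queued_tasks(self):
        for repo_id in self.queue:
            self.repos.discard(repo_id)
        self.queue.clear()


def unneeded(pulp, katello_repos):
    """Unneeded repos = A - B, as in step 3."""
    return sorted(pulp.list_repos() - set(katello_repos))


def delete_tolerant(pulp, repo_id):
    """Treat a 404 as 'already deleted' instead of failing the task."""
    try:
        pulp.delete_repo(repo_id)
        return "queued"
    except NotFound:
        return "already gone"


pulp = FakePulp(["Repo1", "Repo2"])
katello = ["Repo1"]

task1 = unneeded(pulp, katello)       # sync task 1, steps 1-3
for repo_id in task1:                 # sync task 1, steps 4-5
    pulp.delete_repo(repo_id)
task2 = unneeded(pulp, katello)       # sync task 2, steps 1-4: same list
pulp.run_queued_tasks()               # sync task 2, step 5
results = [delete_tolerant(pulp, r) for r in task2]  # sync task 2, step 6
print(task1, task2, results)
```

Without the `except NotFound` branch, step 6 would raise and the second task would stop, which corresponds to the failed Destroy actions in the description.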

Comment 2 Hao Chang Yu 2016-11-03 00:26:40 UTC
I can still reproduce the 409 conflict error when bulk syncing repos to the Capsule.

The first sync task ran successfully, but the later tasks got the following errors because the new repos had already been created by the first task:

7: Actions::Pulp::Repository::Create (skipped) [ 6.98s / 0.25s ]
9: Actions::Pulp::Repository::Create (skipped) [ 6.10s / 1.27s ]
11: Actions::Pulp::Repository::Create (skipped) [ 5.10s / 0.21s ]
13: Actions::Pulp::Repository::Create (skipped) [ 4.21s / 0.19s ]
15: Actions::Pulp::Repository::Create (skipped) [ 4.21s / 1.19s ]
17: Actions::Pulp::Consumer::SyncCapsule (success) [ 51.74s / 9.72s ]
19: Actions::Katello::CapsuleContent::RemoveOrphans (success) [ 0.15s / 0.15s ]
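
The create-side conflict can be illustrated the same way. In this sketch, `Conflict` stands in for an HTTP 409 from Pulp, and `create_repo` and `ensure_repo` are hypothetical helpers; treating "already exists" as success is an assumption about one way such conflicts could be handled, not a description of the actual Katello change.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 from the Pulp API (assumption)."""


def create_repo(existing, repo_id):
    """Hypothetical create call: fails if the repo already exists."""
    if repo_id in existing:
        raise Conflict(repo_id)
    existing.add(repo_id)


def ensure_repo(existing, repo_id):
    """Create the repo, treating 'already exists' as success."""
    try:
        create_repo(existing, repo_id)
        return "created"
    except Conflict:
        return "exists"


capsule_repos = set()
first = [ensure_repo(capsule_repos, r) for r in ["RepoA", "RepoB"]]  # first sync task
later = [ensure_repo(capsule_repos, r) for r in ["RepoA", "RepoB"]]  # later sync task
print(first, later)
```

Swallowing the conflict is only safe if the repo that already exists has the configuration the later task wanted, so a real handler would likely need to verify or refresh it afterwards.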

Comment 6 Justin Sherrill 2016-12-14 15:55:50 UTC
Couple of issues here:

1) conflicts on create.  This is likely exacerbated by https://bugzilla.redhat.com/show_bug.cgi?id=1398438
2) conflicts on delete.  This is completely solved by https://bugzilla.redhat.com/show_bug.cgi?id=1375075

for 1) we still need to do some work to handle the 409 conflicts that can still occur even with BZ 1398438

Comment 8 Brad Buckingham 2017-02-27 18:54:53 UTC
Created redmine issue http://projects.theforeman.org/issues/18706 from this bug

Comment 9 Satellite Program 2017-03-14 22:15:02 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/18706 has been resolved.

Comment 10 Peter Ondrejka 2017-07-25 13:47:12 UTC
Verified on satellite-6.2.11-2.0: successfully created and synced 50+ repositories to the Capsule, and removed them afterwards.

Comment 12 errata-xmlrpc 2017-08-10 17:02:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2466