Red Hat Bugzilla – Bug 1391298
Race condition among capsule sync tasks to destroy/create pulp repos
Last modified: 2017-08-10 13:02:29 EDT
Description of problem:
Bulk Capsule syncs failed when they tried to remove the same Pulp repos ('Actions::Katello::CapsuleContent::RemoveUnneededRepos'). See the task information below:
> 'Actions::Katello::Repository::CapsuleGenerateAndSync'
> there are 7 'Actions::Katello::CapsuleContent::Sync' subtasks in it, matching the number of Capsules in the pnt-sysops org
> each 'CapsuleContent::Sync' has one 'Actions::Katello::CapsuleContent::ConfigureCapsule'
> and many 'Actions::Pulp::Repository::Refresh'
> under these 'Actions::Pulp::Repository::Refresh', some have a skipped 'Actions::Pulp::Repository::RefreshDistributor'
> under 'Actions::Katello::CapsuleContent::ConfigureCapsule', there is one 'Actions::Katello::CapsuleContent::RemoveUnneededRepos', under which a very large number of 'Actions::Pulp::Repository::Destroy' actions are listed
> 'Actions::Pulp::Repository::Destroy' <-- many of these were skipped or failed
The tasks paused after a large number of repo::destroy actions failed. They could not be resumed until the user manually clicked the skip link next to every single errored repo::destroy. The parent task can be resumed only after all of the errored ones are skipped. After resuming, the user got Repo::create errors.
More information:
http://paste-platops.itos.redhat.com/pzpwqg41u/yyulhy#line-8
http://paste-platops.itos.redhat.com/p6cfi7rzn/sqiixo#line-15
While reading the 'katello/repository/capsule_generate_and_sync.rb' file, I noticed that each 'CapsuleGenerateAndSync' task syncs all the repositories between Katello and the Capsule's Pulp every time. If the repositories do not match, Katello either creates the needed repos or destroys the unneeded ones. This seems fine when only one sync task is running for a Capsule, but conflicts can happen when there are multiple sync tasks for the same Capsule (I don't know how Foreman handles multiple concurrent tasks, so I could be wrong). For example, suppose the following sync tasks run at about the same time:
Sync task 1:
1) Get a list of repos from Pulp. Pulp returns Repo1, Repo2 (A).
2) Get a list of repos from Katello. Katello returns Repo1 (B).
3) Unneeded repos = A - B.
4) Delete the unneeded repos from Pulp by calling the Pulp API.
5) Pulp async tasks are created and queued (Pulp task 1).
Sync task 2:
1) Get a list of repos from Pulp. Pulp returns Repo1, Repo2 (A).
2) Get a list of repos from Katello. Katello returns Repo1 (B).
3) Unneeded repos = A - B.
4) This yields the same "unneeded repos" as Sync task 1, because Pulp task 1 is still queued.
5) Pulp runs Pulp task 1. The unneeded repos are now deleted from Pulp.
6) Call the Pulp API to delete the unneeded repos. The API returns 404 because the repos have already been deleted.
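The interleaving above can be reproduced with a small simulation. This is a minimal Python sketch, not Katello code: `FakePulp` and `unneeded_repos` are hypothetical stand-ins, queued deletes model Pulp's asynchronous tasks, and the 202/404 status codes are illustrative assumptions.

```python
class FakePulp:
    """Hypothetical in-memory stand-in for the Capsule's Pulp server.

    Deletes are queued (like Pulp async tasks) and only take effect
    when run_queued_tasks() is called.
    """

    def __init__(self, repos):
        self.repos = set(repos)
        self.queue = []

    def list_repos(self):
        return set(self.repos)

    def delete_repo(self, repo_id):
        # Illustrative: deleting a repo that no longer exists is rejected.
        if repo_id not in self.repos:
            return 404
        self.queue.append(repo_id)
        return 202  # accepted; the delete runs later

    def run_queued_tasks(self):
        for repo_id in self.queue:
            self.repos.discard(repo_id)
        self.queue.clear()


def unneeded_repos(pulp, katello_repos):
    # Steps 1-3: A = repos in Pulp, B = repos Katello expects, unneeded = A - B
    return pulp.list_repos() - set(katello_repos)


pulp = FakePulp({"Repo1", "Repo2"})
katello = {"Repo1"}

# Sync task 1, steps 1-5: compute the diff and queue the deletes
task1_status = [pulp.delete_repo(r) for r in sorted(unneeded_repos(pulp, katello))]

# Sync task 2, steps 1-4: Pulp task 1 is still queued, so the diff is identical
task2_unneeded = unneeded_repos(pulp, katello)

# Step 5: Pulp runs Pulp task 1
pulp.run_queued_tasks()

# Step 6: task 2 tries to delete repos that are already gone
task2_status = [pulp.delete_repo(r) for r in sorted(task2_unneeded)]

print(task1_status, task2_status)  # [202] [404]
```

The stale diff is the core problem: task 2's view of Pulp (A) is taken before task 1's queued delete has run, so both tasks act on the same repo.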
I can still reproduce the 409 conflict error when bulk syncing repos to a Capsule. The first sync task ran successfully, but the later tasks get the following errors because the new repos had already been created by the first task:
7: Actions::Pulp::Repository::Create (skipped) [ 6.98s / 0.25s ]
9: Actions::Pulp::Repository::Create (skipped) [ 6.10s / 1.27s ]
11: Actions::Pulp::Repository::Create (skipped) [ 5.10s / 0.21s ]
13: Actions::Pulp::Repository::Create (skipped) [ 4.21s / 0.19s ]
15: Actions::Pulp::Repository::Create (skipped) [ 4.21s / 1.19s ]
17: Actions::Pulp::Consumer::SyncCapsule (success) [ 51.74s / 9.72s ]
19: Actions::Katello::CapsuleContent::RemoveOrphans (success) [ 0.15s / 0.15s ]
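The create side has the same shape: each task computes the needed repos (B - A) before any of them has created anything, so every task after the first hits a conflict. A minimal Python sketch, using a hypothetical in-memory `FakePulp` whose `create_repo` returns 409 for an existing repo ID (the status codes are illustrative assumptions, not captured Pulp responses):

```python
class FakePulp:
    """Hypothetical in-memory stand-in for the Capsule's Pulp server."""

    def __init__(self, repos):
        self.repos = set(repos)

    def list_repos(self):
        return set(self.repos)

    def create_repo(self, repo_id):
        if repo_id in self.repos:
            return 409  # conflict: a repo with this ID already exists
        self.repos.add(repo_id)
        return 201  # created


def needed_repos(pulp, katello_repos):
    # Needed = B - A: repos Katello expects that Pulp does not have yet
    return set(katello_repos) - pulp.list_repos()


pulp = FakePulp({"Repo1"})
katello = {"Repo1", "Repo3", "Repo4"}

# Both tasks compute their diff before either one creates anything
task1 = needed_repos(pulp, katello)
task2 = needed_repos(pulp, katello)

task1_status = {r: pulp.create_repo(r) for r in sorted(task1)}
task2_status = {r: pulp.create_repo(r) for r in sorted(task2)}

print(task1_status)  # {'Repo3': 201, 'Repo4': 201}
print(task2_status)  # {'Repo3': 409, 'Repo4': 409}
```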
A couple of issues here:
1) Conflicts on create. This is likely exacerbated by https://bugzilla.redhat.com/show_bug.cgi?id=1398438
2) Conflicts on delete. This is completely solved by https://bugzilla.redhat.com/show_bug.cgi?id=1375075
For 1), we still need to do some work to handle the 409 conflicts that can still occur even with BZ 1398438.
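One general way to tolerate these conflicts is to make the create and delete steps idempotent, treating a 409 on create and a 404 on delete as "already done". This is a hedged Python sketch of that idea, not the fix that shipped upstream. The exception classes, `FakePulp`, `ensure_created`, and `ensure_deleted` are all hypothetical. A real implementation would probably also need to confirm that a pre-existing repo has the expected configuration before treating the 409 as success.

```python
class RepoConflict(Exception):
    """Hypothetical error raised by the fake client on HTTP 409."""


class RepoNotFound(Exception):
    """Hypothetical error raised by the fake client on HTTP 404."""


class FakePulp:
    """Hypothetical in-memory Pulp client that raises on conflicts."""

    def __init__(self, repos):
        self.repos = set(repos)

    def create_repo(self, repo_id):
        if repo_id in self.repos:
            raise RepoConflict(repo_id)
        self.repos.add(repo_id)

    def delete_repo(self, repo_id):
        if repo_id not in self.repos:
            raise RepoNotFound(repo_id)
        self.repos.remove(repo_id)


def ensure_created(pulp, repo_id):
    """Create repo_id; a 409 means another task already created it."""
    try:
        pulp.create_repo(repo_id)
        return "created"
    except RepoConflict:
        return "already present"


def ensure_deleted(pulp, repo_id):
    """Delete repo_id; a 404 means another task already deleted it."""
    try:
        pulp.delete_repo(repo_id)
        return "deleted"
    except RepoNotFound:
        return "already gone"


pulp = FakePulp({"Repo1", "Repo2"})
# Two sync tasks acting on the same stale diff no longer fail:
print([ensure_created(pulp, "Repo3") for _ in range(2)])  # ['created', 'already present']
print([ensure_deleted(pulp, "Repo2") for _ in range(2)])  # ['deleted', 'already gone']
```

Making each step safe to repeat avoids the need to serialize sync tasks per Capsule. The trade-off is that each task still does redundant work against the same stale diff.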
Created redmine issue http://projects.theforeman.org/issues/18706 from this bug
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/18706 has been resolved.
Verified on satellite-6.2.11-2.0, successfully created and synced 50+ repositories to the Capsule, and removed afterwards.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2466