Bug 1421594 - Possible race condition when syncing multiple puppet repos at the same time
Summary: Possible race condition when syncing multiple puppet repos at the same time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.2.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Lukas Pramuk
URL:
Whiteboard:
Depends On:
Blocks: 1479962
 
Reported: 2017-02-13 08:42 UTC by Hao Chang Yu
Modified: 2022-07-09 09:21 UTC
CC List: 28 users

Fixed In Version: rubygem-katello-3.0.0.162-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned to: 1530705
Environment:
Last Closed: 2018-02-05 13:54:34 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 18920 0 Normal Closed Possible race condition when sync multiple puppet repos at the same time 2021-02-18 09:00:16 UTC
Foreman Issue Tracker 20540 0 Normal Closed Duplicate Unit Names in Smart-Proxy Sync after 3.4.4 upgrade 2021-02-18 09:00:16 UTC
Pulp Redmine 2606 0 High CLOSED - CURRENTRELEASE Fix the ability to forge importer to mirror upstream repository on sync 2017-04-12 14:33:49 UTC
Red Hat Product Errata RHSA-2018:0273 0 normal SHIPPED_LIVE Important: Red Hat Satellite 6 security, bug fix, and enhancement update 2018-02-08 00:35:29 UTC

Description Hao Chang Yu 2017-02-13 08:42:29 UTC
Description of problem:

Puppet repo sync has the following steps:
1) Unassociate all puppet modules from the repo
2) Sync the Repo
3) Remove all Orphans

Let's say we are syncing 'Repo 1' and 'Repo 2' at the same time, and 'Module A' is only associated with 'Repo 1'.

Unassociating 'Module A' from 'Repo 1' makes it an orphan, but the 'Repo 1' sync task assumes 'Module A' is still in Pulp and does not try to re-download it. At the same time, the 'Repo 2' sync task reaches the 'Remove all Orphans' step and removes 'Module A' from Pulp. This causes the 'Repo 1' sync to fail with the following error:

IOError: [Errno 2] No such file or directory: u'/var/lib/pulp/content/units/puppet_module/1f/37902b7b8dd30cd2f0a2a99914f703c1b1459bc905504b8c23225cee4b728b/mweetman-hosts-0.1.0.tar.gz'


How reproducible:
This issue is hard to reproduce because it is only triggered by very specific timing.
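
To make the interleaving concrete, here is a purely illustrative Python sketch of the race. The helper names and data structures are hypothetical stand-ins for the three steps above, not real Pulp APIs:

# Illustrative only: models the three sync steps above with hypothetical
# helpers; none of this is a real Pulp API.
import threading, time

pulp_content = {'Module A'}                              # content units on disk
repo_units = {'Repo 1': {'Module A'}, 'Repo 2': set()}   # repo associations

def orphans():
    # units no longer associated with any repo
    return pulp_content - set().union(*repo_units.values())

def sync(repo, pause=0.0):
    repo_units[repo].clear()    # 1) unassociate all puppet modules from the repo
    time.sleep(pause)           # window in which the other sync can run
    # 2) sync: 'Repo 1' assumes 'Module A' is still in Pulp and skips the
    #    re-download, but the other sync's orphan removal may have deleted it
    if repo == 'Repo 1':
        if 'Module A' not in pulp_content:
            raise IOError("No such file or directory: .../mweetman-hosts-0.1.0.tar.gz")
        repo_units[repo].add('Module A')
    # 3) remove all orphans
    for unit in orphans():
        pulp_content.discard(unit)

t1 = threading.Thread(target=sync, args=('Repo 1', 0.2))  # pauses between steps 1 and 2
t2 = threading.Thread(target=sync, args=('Repo 2',))      # reaches orphan removal first
t1.start(); t2.start(); t1.join(); t2.join()              # t1 fails with the IOError above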

Comment 3 Michael Hrivnak 2017-02-14 19:14:53 UTC
Do you have the full traceback available?

Comment 6 Michael Hrivnak 2017-02-15 21:13:15 UTC
Thank you for that. Was there any other exception in the system log from around that time? I'm hoping to find a Python traceback for an exception besides the "PulpCodedException".

Comment 10 Tanya Tereshchenko 2017-02-23 14:08:22 UTC
A potential temporary workaround is to put the missing files back in place.

One way, described in comment 7, is to copy the files from the Satellite:
"Restoring the missing file by copying it from the Satellite temporarily resolves the problem, but the problem comes back again and again, on a random puppet module file each time."


The other way is to remove the repository, clean orphans, recreate it, and sync, in that order - the orphan cleanup should happen after the repository removal.
This will help if the missing units were only in this repo; if they are in multiple repositories, all of them need to be recreated in the described way.

# remove repo
pulp-admin --username admin --password <password> puppet repo remove --repo-id <repo_id>

# check if there are puppet_module orphans
pulp-admin --username admin --password <password> orphan list

# remove the orphaned content
pulp-admin --username admin --password <password> orphan remove --type puppet_module

# create repository
pulp-admin --username admin --password <password> puppet repo create --repo-id <repo_id> --feed <feed from which you'd like to sync>

# sync repo
pulp-admin --username admin --password <password> puppet repo sync run --repo-id <repo_id> 

If the issue is indeed the race condition described in this BZ, then it is better to sync one repository at a time.

If there were no orphans, or you know that the same affected puppet module is present in multiple repositories, there is a way to find out which ones, but only directly in the database, in the mongo shell. Let me know if this approach will work for you and I will provide some commands.
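
For reference only, the kind of lookup meant here is a query for the repositories a given unit is associated with. The sketch below is an assumption about the Pulp 2 MongoDB layout (database 'pulp', collection 'repo_content_units' with repo_id/unit_id/unit_type_id fields); it is not the exact set of commands offered above:

# Assumption: Pulp 2 stores repo/unit associations in the 'repo_content_units'
# collection of the 'pulp' database. Illustrative sketch only.
from pymongo import MongoClient

db = MongoClient('localhost', 27017)['pulp']
unit_id = '<unit id of the affected puppet module>'   # placeholder

for assoc in db['repo_content_units'].find({'unit_type_id': 'puppet_module',
                                            'unit_id': unit_id}):
    print(assoc['repo_id'])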

Comment 12 pulp-infra@redhat.com 2017-02-23 22:31:40 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 13 pulp-infra@redhat.com 2017-02-23 22:31:45 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 15 Tanya Tereshchenko 2017-02-24 12:43:18 UTC
If the same module is missing again, then it is likely that it was in multiple repositories. In this case the easiest way is to copy the files from the main Satellite, as mentioned in comment 7.

Comment 16 pulp-infra@redhat.com 2017-02-28 22:32:01 UTC
The Pulp upstream bug priority is at High. Updating the external tracker on this bug.

Comment 17 pulp-infra@redhat.com 2017-02-28 23:01:48 UTC
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.

Comment 18 Dennis Kliban 2017-03-14 02:25:12 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 19 pulp-infra@redhat.com 2017-03-14 19:44:56 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 20 Tanya Tereshchenko 2017-03-15 16:40:15 UTC
The Pulp upstream issue fixed the remove_missing option for the puppet importer.
This fix allows Katello to drop the following steps:

> Puppet repo sync has the following steps:
> 1) Unassociate all puppet modules from the repo
> 2) Sync the Repo
> 3) Remove all Orphans

and instead just sync with the remove_missing option enabled.

This eliminates the race condition because there is no longer a remove-orphans task in this sync workflow.
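
For illustration, with the fix in place the whole workflow collapses to one sync call with remove_missing enabled. Below is a minimal sketch against the Pulp 2 REST API (endpoint and payload shape as commonly documented for Pulp 2; host, credentials and repo id are placeholders to adjust):

# Minimal sketch: trigger a Pulp 2 repo sync with remove_missing enabled.
# Host, credentials and repo id are placeholders.
import requests

repo_id = '<repo_id>'
resp = requests.post(
    'https://localhost/pulp/api/v2/repositories/%s/actions/sync/' % repo_id,
    json={'override_config': {'remove_missing': True}},
    auth=('admin', '<password>'),
    verify=False,           # or point this at the Satellite CA certificate
)
resp.raise_for_status()
print(resp.json())          # the spawned sync task(s)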

Comment 21 Justin Sherrill 2017-03-15 16:47:17 UTC
Created redmine issue http://projects.theforeman.org/issues/18920 from this bug

Comment 22 Justin Sherrill 2017-03-15 16:59:23 UTC
Cloning to katello for the katello change

Comment 23 Michael Hrivnak 2017-03-15 19:05:24 UTC
I'm changing the component since all Pulp work is done. Feel free to change the component if I didn't pick the right one.

Comment 24 pulp-infra@redhat.com 2017-04-05 19:03:37 UTC
The Pulp upstream bug status is at ON_QA. Updating the external tracker on this bug.

Comment 25 pulp-infra@redhat.com 2017-04-12 14:33:50 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

Comment 31 Satellite Program 2017-07-06 14:10:26 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/18920 has been resolved.

Comment 33 Satellite Program 2017-08-03 22:12:29 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/18920 has been resolved.

Comment 41 Andrew Kofink 2018-01-26 16:05:55 UTC
I believe I reproduced this (IIRC) by putting a breakpoint between steps 1 and 2. This helped greatly with the timing and allowed testing with small repos. Unfortunately, this is probably not an option in production.

> Puppet repo sync has the following steps:
> 1) Unassociate all puppet modules from the repo
> 2) Sync the Repo
> 3) Remove all Orphans

For the Katello-side verification, you can ensure the remove_missing option is passed to Pulp in the Sync action in foreman-tasks or Dynflow.
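
One way to do that check without clicking through the UI is a quick script against the foreman-tasks API. This is only a sketch under the assumption that /foreman_tasks/api/tasks is available and that the task details expose the planned 'input'; host, credentials and the label filter are placeholders, and the step-level input of the Pulp sub-actions may only be visible in the Dynflow console (/foreman_tasks/dynflow):

# Sketch only: list sync tasks via the foreman-tasks API and dump their input
# so remove_missing can be grepped for. API shape is an assumption; adjust
# host, credentials and the search filter to your environment.
import json
import requests

base = 'https://satellite.example.com'
auth = ('admin', '<password>')

resp = requests.get(base + '/foreman_tasks/api/tasks',
                    params={'search': 'label = Actions::Katello::CapsuleContent::Sync'},
                    auth=auth, verify=False)
for task in resp.json().get('results', []):
    detail = requests.get(base + '/foreman_tasks/api/tasks/' + task['id'],
                          auth=auth, verify=False).json()
    print(task['id'], json.dumps(detail.get('input'), indent=2))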

Comment 42 Lukas Pramuk 2018-01-29 12:00:33 UTC
FailedQA.

@tfm-rubygem-katello-3.0.0.161-1.el7sat.noarch

I tried to at least verify this as a sanity check on 6.2.14, so I wanted to sync the Library into an external capsule... but:

During the task planning phase, Actions::Katello::CapsuleContent::Sync failed with undefined method `content_type' for nil:NilClass (NoMethodError)



 Actions::Katello::CapsuleContent::Sync

Input:

---
smart_proxy:
  id: 2
  name: cap.example.com
services_checked:
- pulp
- pulp_auth

undefined method `content_type' for nil:NilClass (NoMethodError)
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.161/app/lib/actions/katello/capsule_content/sync.rb:65:in `block (3 levels) in sync_repos_to_capsule'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.6/lib/dynflow/execution_plan.rb:316:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.6/lib/dynflow/execution_plan.rb:316:in `switch_flow'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.6/lib/dynflow/action.rb:369:in `sequence'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.161/app/lib/actions/katello/capsule_content/sync.rb:54:in `block (2 levels) in sync_repos_to_capsule'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.161/app/lib/actions/katello/capsule_content/sync.rb:53:in `each'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.161/app/lib/actions/katello/capsule_content/sync.rb:53:in `block in sync_repos_to_capsule'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.6/lib/dynflow/execution_plan.rb:316:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.6/lib/dynflow/execution_plan.rb:316:in `switch_flow'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.6/lib/dynflow/action.rb:364:in `concurrence'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.161/app/lib/actions/katello/capsule_content/sync.rb:52:in `sync_repos_to_capsule'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.161/app/lib/actions/katello/capsule_content/sync.rb:46:in `block in plan'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.6/lib/dynflow/execution_plan.rb:316:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.6/lib/dynflow/execution_plan.rb:316:in `switch_flow'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.6/lib/dynflow/action.rb:369:in `sequence'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.161/app/lib/actions/katello/capsule_content/sync.rb:41:in `plan'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.6/lib/dynflow/action.rb:461:in `block (3 levels) in execute_plan'

Comment 43 pulp-infra@redhat.com 2018-01-29 12:03:18 UTC
Requesting needsinfo from upstream developer ttereshc because the 'FailedQA' flag is set.

Comment 45 Andrew Kofink 2018-01-29 16:30:06 UTC
The FailedQA regression is fixed in http://projects.theforeman.org/issues/20540

Comment 46 Andrew Kofink 2018-01-29 16:49:26 UTC
Setting to POST as upstream fix is merged.

Comment 47 Lukas Pramuk 2018-01-31 12:31:32 UTC
VERIFIED.

@satellite-6.2.14-4.0.el7sat.noarch
tfm-rubygem-katello-3.0.0.162-1.el7sat.noarch

Used the steps described in comment #40 and focused on overall sanity.


@UI > Monitor > Tasks
32: Actions::Pulp::Consumer::SyncCapsule (success) [ 34.65s / 2.07s ]

Started at: 2018-01-30 15:33:45 UTC

Ended at: 2018-01-30 15:34:20 UTC

Real time: 34.65s

Execution time (excluding suspended state): 2.07s

Input: 
capsule_id: 2
repo_pulp_id: Default_Organization-Capsule-Capsule_6_2_RHEL7
sync_options:
  remove_missing: true
  force_full: true
remote_user: admin
remote_cp_user: admin

>>> Actions::Pulp::Consumer::SyncCapsule succeeded for both the yum and puppet repos while the remove_missing sync option was enabled
>>> the Actions::Pulp::Consumer::UnassociateUnits step is no longer required to avoid the race condition


@SAT 
# foreman-rake katello:delete_orphaned_content RAILS_ENV=production
Orphaned content deletion started in background.

@UI > Monitor > Tasks
Remove orphans {"services_checked"=>["pulp", "pulp_auth"], "capsule_id"=>2, ... 	stopped 	success 	2018-01-31 13:19:44 +0100 	2018-01-31 13:19:44 +0100 	foreman_admin
Remove orphans {"services_checked"=>["pulp", "pulp_auth"], "capsule_id"=>1, ... 	stopped 	success 	2018-01-31 13:19:43 +0100 	2018-01-31 13:19:44 +0100 	foreman_admin

>>> the weekly cleanup cron job now creates separate cleanup tasks for all capsules, and these tasks run successfully

Comment 50 errata-xmlrpc 2018-02-05 13:54:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0273

