Bug 1244704

Summary: Nightly repo syncs often result in duplicate key error
Product: Red Hat Satellite
Reporter: Ivan Necas <inecas>
Component: Repositories
Assignee: Justin Sherrill <jsherril>
Status: CLOSED ERRATA
QA Contact: Roman Plevka <rplevka>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.0.4
CC: bbuckingham, bkearney, cwelton, ehelms, erik-fedora, jsherril, mmccune, oshtaier, pcfe, pmoravec, rplevka
Target Milestone: Unspecified
Keywords: Reopened, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
URL: http://projects.theforeman.org/issues/11028
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-27 11:37:48 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ivan Necas 2015-07-20 10:29:22 UTC
I have a number of repos set to sync at midnight each night. Each night, at least one repo sync seems to fail with an error similar to this:

PGError: ERROR:  duplicate key value violates unique constraint "katello_system_errata_eid_sid"
: INSERT INTO katello_system_errata (erratum_id, system_id) VALUES (2647, 5)

I can correct this by just resuming the task and it finishes with no additional errors.

Comment 1 Ivan Necas 2015-07-20 10:29:23 UTC
Created from redmine issue http://projects.theforeman.org/issues/11028

Comment 2 Ivan Necas 2015-07-20 10:31:19 UTC
As I wrote in the upstream issue:

I was able to reproduce the issue: seems like it happens when multiple repositories are being synchronized at the same time, so that the errata applicability import is happening for a given host multiple times at once.

After resuming the task, the operation seemed to proceed successfully (there was no need even to skip the failing step). A possible fix would be either to:

1. make the code at https://github.com/Katello/katello/blob/master/app/models/katello/glue/pulp/consumer.rb#L94 retry several times before failing, to account for the fact that some other repo sync might be recalculating the applicability at the same time
2. limit the errata touched by this operation within the repo sync to only those affected by the repository that is being synchronized

I believe the right answer would be a combination of the two.
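
To illustrate option 1, here is a minimal sketch of a retry wrapper around an applicability import. It is not the Katello code: it uses Python with an in-memory SQLite table as a stand-in for katello_system_errata, and the names retry_on_duplicate, make_db and import_applicability are hypothetical.

```python
import sqlite3
import time


def retry_on_duplicate(operation, attempts=3, delay=0.0):
    """Run operation(); if it hits a unique-constraint error, try again."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except sqlite3.IntegrityError:
            if attempt == attempts:
                raise  # out of attempts: surface the error as before
            time.sleep(delay)  # give the competing import time to finish


def make_db():
    """In-memory stand-in (assumption) for the katello_system_errata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE system_errata ("
        "erratum_id INTEGER, system_id INTEGER, "
        "UNIQUE (erratum_id, system_id))"
    )
    return conn


def import_applicability(conn, system_id, erratum_ids):
    """Replace one system's errata rows in a single transaction."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute("DELETE FROM system_errata WHERE system_id = ?", (system_id,))
        conn.executemany(
            "INSERT INTO system_errata (erratum_id, system_id) VALUES (?, ?)",
            [(erratum_id, system_id) for erratum_id in erratum_ids],
        )
```

The wrapper retries the whole import rather than the single failing INSERT, so each attempt starts again from the system's current rows. This mirrors what resuming the task by hand achieves.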

Comment 3 Ivan Necas 2015-07-20 10:33:13 UTC
@justin: what are your thoughts on this?

Comment 5 Justin Sherrill 2015-07-20 14:43:56 UTC
Ahhh similar to http://projects.theforeman.org/issues/8586 but related to applicability. 


2) may be a bit complicated, as an erratum could potentially be shared between two repos that the system is bound to.  However, since a sync should simply be an 'additive case' (i.e. errata/packages should not be removed), we could probably limit it to an 'insert errata from this repo into the profile unless they already exist' operation and not clear any of the existing entries.

The issue could still pop up with 2) if for example the system uploaded its new package profile during a repo sync's "applicability generation".  Then the system would be attempting to recalculate its full profile anyways.  So +1 to doing both I guess
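
The 'additive' insert described above could look roughly like the following sketch. It assumes an in-memory SQLite table in place of katello_system_errata and uses SQLite's INSERT OR IGNORE; the helper names are hypothetical. On PostgreSQL 9.5 or later, INSERT ... ON CONFLICT DO NOTHING gives a similar effect.

```python
import sqlite3


def open_profile_db():
    """In-memory stand-in (assumption) for the katello_system_errata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE system_errata ("
        "erratum_id INTEGER, system_id INTEGER, "
        "UNIQUE (erratum_id, system_id))"
    )
    return conn


def add_errata_from_repo(conn, system_id, repo_erratum_ids):
    """Add the synced repo's errata to a system; leave existing rows alone."""
    with conn:  # one transaction for the whole batch
        conn.executemany(
            # OR IGNORE skips rows that would violate the unique constraint
            "INSERT OR IGNORE INTO system_errata (erratum_id, system_id) "
            "VALUES (?, ?)",
            [(erratum_id, system_id) for erratum_id in repo_erratum_ids],
        )
```

Because nothing is deleted first, two syncs that both touch the same erratum for the same system no longer collide: the second insert is simply skipped.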

Comment 6 Mike McCune 2015-07-27 19:06:36 UTC
https://github.com/Katello/katello/pull/5368

Comment 8 Tazim Kolhar 2015-07-30 11:09:18 UTC
Hi,

   please provide verification steps
   thanks

Thanks and Regards,
Tazim

Comment 9 Justin Sherrill 2015-07-30 13:11:38 UTC
There isn't an easy way to reproduce this, as it's a race condition.

What you can try is:

* Register a few systems
* Assign them to a subscription
* Run yum repolist to make sure they are getting content
* Sync a few repositories

I'm fairly certain this wouldn't reproduce under normal conditions, as you likely need a large number of systems.  Without deploying thousands of systems, I'm not sure you could reproduce it without getting really lucky.

Comment 10 Corey Welton 2015-08-03 15:57:28 UTC
I have been watching the production logs for servers we're using for client testing, with many repos populated/syncing and clients registered.  I have not been able to track down any PGErrors as noted in the report, so I think I will mark this one verified in Snap 15.  We can reopen if it shows up again, or there is a less-intensive method by which to reproduce the race condition.

Comment 11 Bryan Kearney 2015-08-12 16:03:52 UTC
This bug was fixed in Satellite 6.1.1 which was delivered on 12 August, 2015.

Comment 15 Bryan Kearney 2016-01-28 13:03:20 UTC
Moving to POST since upstream bug http://projects.theforeman.org/issues/11028 has been closed
-------------
Justin Sherrill
Applied in changeset commit:katello|6b2570c9ad25f74dcafcb7e17d40207a332be827.

Comment 16 Bryan Kearney 2016-01-28 13:35:20 UTC
Turning off the sync script.

Comment 17 Bryan Kearney 2016-02-05 19:05:34 UTC
Upstream bug component is WebUI

Comment 18 Roman Plevka 2016-04-19 12:40:07 UTC
VERIFIED

Looking at the automation results, the error has been gone since sat 6.2.0 snap 6.2.

here are the 'duplicate key' message occurrences per 6.2.0 snap automation:

Snap                      Occurrences
Sat6.2.0-Beta-Snap8.2     0
Sat6.2.0-Beta-Snap8.1     0
Sat6.2.0-Beta-Snap8.0     0
Sat6.2.0-Beta-Snap7.1     0
Sat6.2.0-Beta-Snap7.0     0
Sat6.2.0-Beta-Snap6.2     0
Sat6.2.0-Beta-Snap6.1     5
Sat6.2.0-Beta-Snap6.0     0
Sat6.2.0-Beta-Snap5.1     5
Sat6.2.0-Beta-Snap5       4
Sat6.2.0-Beta-SNAP4       52
Sat6.2.0-Beta-SNAP3.1     46
Sat6.2.0-Beta-SNAP3.0     58
Sat6.2.0-Beta-SNAP3.0     51
Sat6.2.0-Beta-SNAP2.1     49
Sat6.2.0-Beta-SNAP2       0
Sat6.2.0-Beta-SNAP1       0
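
A count like the ones above can be obtained by scanning a log for the constraint-violation message. The sketch below is only an assumption about how such a check might be done, not the actual automation; the function name is hypothetical.

```python
def count_duplicate_key_errors(log_lines):
    """Count log lines that report a unique-constraint violation."""
    marker = "duplicate key value violates unique constraint"
    return sum(1 for line in log_lines if marker in line)
```

The function accepts any iterable of strings, so an open log file can be passed in directly and is read line by line.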

Comment 19 Bryan Kearney 2016-07-27 11:37:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1501