Bug 2043710

Summary: syncing tens of repos to capsule can cause deadlock: while updating tuple (...) in relation "core_content"
Product: Red Hat Satellite Reporter: Brad Buckingham <bbuckingham>
Component: Pulp Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA QA Contact: Lai <ltran>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.10.0 CC: dkliban, ggainey, hakon.gislason, hyu, jangerrit.kootstra, jhutar, jjansky, juwatts, ldelouw, ltran, momran, osousa, peter.vreman, pmendezh, pmoravec, sadas, saydas, zhunting
Target Milestone: 6.10.3 Keywords: Performance, Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: python-pulpcore-3.14.12 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2021406 Environment:
Last Closed: 2022-03-08 21:26:09 UTC Type: ---

Comment 4 Lai 2022-02-22 20:26:01 UTC
@ggainey @jhutar 

Should I use the steps in the description in c#0 or is there another way to test this?  Or can I just reassign to @jhutar to verify since he's more familiar with the deadlock case?

Please let me know!

Thanks!

Comment 5 Grant Gainey 2022-02-23 19:59:55 UTC
I don't have any reliable way to reproduce this deadlock other than the "run pathological tests from raw python inside pulpcore-manager", alas (which is part of why this took so long to get a fix for). Jan's test setup may be the best way to make sure it works for the actual Satellite workflow.

Comment 6 Grant Gainey 2022-02-23 20:02:37 UTC
This is going to be particularly hard to achieve on minimal-memory system setups, since the failure only happens when you're syncing multiple (as in, >5) repositories simultaneously that have overlapping content. So you need something like 20 pulp workers syncing 20 repos that share a significant amount of content, all at the same time. Minimal systems will run out of hardware before you're likely to see the failure.
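
For readers unfamiliar with this class of failure, here is a minimal sketch of why overlapping content matters. It is not taken from pulpcore and does not claim to be the actual fix: `row_locks`, `update_rows`, and `run_workers` are hypothetical names, and plain thread locks stand in for database row locks. If concurrent workers lock the same rows in different orders, each can end up waiting on a row the other already holds. Taking the locks in one agreed order (ascending key here) is one common way to rule that out.

```python
import threading

# Hypothetical stand-ins for row locks on a shared content table.
row_locks = {pk: threading.Lock() for pk in range(1, 6)}


def update_rows(pks, log):
    """Lock the requested rows in ascending key order, record that order, release."""
    ordered = sorted(set(pks))
    for pk in ordered:
        row_locks[pk].acquire()
    try:
        # The "update" happens here; we only record the order the locks were taken in.
        log.extend(ordered)
    finally:
        for pk in reversed(ordered):
            row_locks[pk].release()


def run_workers(batches):
    """Run one thread per batch of row keys and return each worker's lock order."""
    logs = [[] for _ in batches]
    threads = [
        threading.Thread(target=update_rows, args=(batch, log))
        for batch, log in zip(batches, logs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return logs
```

Calling `run_workers([[3, 1, 2], [2, 3, 1]])` gives both workers the same overlapping rows in different request orders. Because each sorts before locking, neither can hold a row the other needs while waiting on a lower-numbered one.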

Comment 7 Lai 2022-02-23 22:10:17 UTC
@jhutar can I pass this off to you to test since you're more familiar with it?

Comment 8 Jan Hutař 2022-02-24 07:23:44 UTC
Ack

Comment 12 Lai 2022-03-02 21:54:07 UTC
This was difficult to reproduce as a deadlock case for verification.  Instead, I have verified that the code is in 6.10.3 snap 2.

I also performed a capsule sync test with large repos (rhel7, 8 appstream, baseos) to ensure that capsule is synced successfully and content is accessible in capsule.

Verified on 6.10.3 snap 3 with python3-pulpcore-3.14.12-1.el7pc.noarch.
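
The package check above can be sketched as a small version comparison. This is an illustrative helper only, under stated assumptions: `FIXED_IN` is copied from the "Fixed In Version" field of this bug, `pulpcore_version` and `has_fix` are hypothetical names, and the package string is assumed to follow the name-version-release layout shown in this comment. It does not account for fixes backported into an older version number.

```python
import re

# From this bug's "Fixed In Version: python-pulpcore-3.14.12" field.
FIXED_IN = (3, 14, 12)


def pulpcore_version(package):
    """Extract (major, minor, patch) from a pulpcore package string."""
    match = re.match(r"python3?-pulpcore-(\d+)\.(\d+)\.(\d+)-", package)
    if match is None:
        raise ValueError(f"not a recognised pulpcore package string: {package!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(package):
    """Return True if the package version is at or above the fixed-in version."""
    return pulpcore_version(package) >= FIXED_IN
```

For example, `has_fix("python3-pulpcore-3.14.12-1.el7pc.noarch")` evaluates to `True` for the package named in this comment.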

Comment 16 errata-xmlrpc 2022-03-08 21:26:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: Satellite 6.10.3 Async Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0790