Bug 2043710 - syncing tens of repos to capsule can cause deadlock: while updating tuple (...) in relation "core_content"
Summary: syncing tens of repos to capsule can cause deadlock: while updating tuple (.....
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: 6.10.3
Assignee: satellite6-bugs
QA Contact: Lai
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-01-21 19:59 UTC by Brad Buckingham
Modified: 2022-08-04 05:25 UTC (History)
CC List: 18 users

Fixed In Version: python-pulpcore-3.14.12
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2021406
Environment:
Last Closed: 2022-03-08 21:26:09 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github pulp pulpcore issues | 2157 | 0 | None | closed | touch() path still has a potential deadlock window. | 2022-02-15 13:39:19 UTC
Red Hat Knowledge Base (Solution) | 6718181 | 0 | None | None | None | 2022-02-09 08:46:58 UTC
Red Hat Product Errata | RHSA-2022:0790 | 0 | None | None | None | 2022-03-08 21:26:23 UTC

Comment 4 Lai 2022-02-22 20:26:01 UTC
@ggainey @jhutar 

Should I use the steps in the description in c#0 or is there another way to test this?  Or can I just reassign to @jhutar to verify since he's more familiar with the deadlock case?

Please let me know!

Thanks!

Comment 5 Grant Gainey 2022-02-23 19:59:55 UTC
I don't have any reliable way to reproduce this deadlock other than running pathological tests from raw Python inside pulpcore-manager, alas (which is part of why the fix took so long). Jan's test setup may be the best way to make sure it works for the actual Satellite workflow.

Comment 6 Grant Gainey 2022-02-23 20:02:37 UTC
This is going to be particularly hard to achieve on minimal-memory system setups, since the failure only happens when you're simultaneously syncing multiple (as in, >5) repositories that have overlapping content. So you need something like 20 pulp workers syncing 20 repos that share a significant amount of content, all at the same time. Minimal systems will run out of hardware before you're likely to see the failure.
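
For readers unfamiliar with this failure class: many workers updating overlapping rows at once is the classic setup for a lock-ordering deadlock. The following is a minimal Python sketch of the general idea only, assuming in-process locks stand in for PostgreSQL row locks on "core_content". The names touch_rows and run_workers are hypothetical, this is not pulpcore code, and it is not claimed to be how the fix in python-pulpcore-3.14.12 works. Each simulated worker takes its row locks in sorted order, so no two workers can ever wait on each other in a cycle.

```python
import threading


def touch_rows(row_locks, touched, row_ids):
    """Simulate one worker updating a batch of shared content rows.

    Locks are always acquired in ascending row-id order. With a single
    global order, a cycle of workers waiting on each other cannot form.
    """
    held = []
    try:
        for rid in sorted(row_ids):
            row_locks[rid].acquire()
            held.append(rid)
        for rid in held:
            touched[rid] += 1  # safe: this row's lock is held
    finally:
        for rid in reversed(held):
            row_locks[rid].release()


def run_workers(batches, n_rows, timeout=10):
    """Run one thread per batch; return (touch counts, all_finished)."""
    row_locks = [threading.Lock() for _ in range(n_rows)]
    touched = [0] * n_rows
    threads = [
        threading.Thread(
            target=touch_rows, args=(row_locks, touched, batch), daemon=True
        )
        for batch in batches
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=timeout)
    return touched, all(not t.is_alive() for t in threads)


if __name__ == "__main__":
    # 20 hypothetical workers, 50 rows each, heavily overlapping; every
    # other worker lists its rows in reverse to mimic unordered input.
    batches = [list(range(w, w + 50)) for w in range(20)]
    batches = [b if i % 2 == 0 else b[::-1] for i, b in enumerate(batches)]
    touched, finished = run_workers(batches, n_rows=70)
    print("all workers finished:", finished)
```

If the sorted() call is removed, workers that receive the same rows in opposite orders can each hold a lock the other needs, which is the same shape as the "while updating tuple" deadlock reported here, and equally timing-dependent.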

Comment 7 Lai 2022-02-23 22:10:17 UTC
@jhutar can I pass this off to you to test since you're more familiar with it?

Comment 8 Jan Hutař 2022-02-24 07:23:44 UTC
Ack

Comment 12 Lai 2022-03-02 21:54:07 UTC
This was difficult to reproduce as a deadlock case for verification.  Instead, I have verified that the code is in 6.10.3 snap 2.

I also performed a capsule sync test with large repos (rhel7, 8 appstream, baseos) to ensure that capsule is synced successfully and content is accessible in capsule.

Verified on 6.10.3 snap 3 with python3-pulpcore-3.14.12-1.el7pc.noarch.

Comment 16 errata-xmlrpc 2022-03-08 21:26:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: Satellite 6.10.3 Async Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0790

