Bug 1475056

Summary: Manifest refresh fails with 'deadlock detected'
Product: Red Hat Satellite
Reporter: Paul Dudley <pdudley>
Component: Candlepin
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED WORKSFORME
QA Contact: Katello QA List <katello-qa-list>
Severity: high
Priority: high
Version: 6.2.10
CC: andrew.schofield, asanders, awood, bcourt, bkearney, brcoca, cdonnell, christopher.vincent, cmarinea, crog, csnyder, daniele, gkonda, hartsjc, khowell, ktordeur, lzap, pdudley, pdwyer, pmoravec, rbeyel, rdixon, smutkule, sraut, sthirugn, vanhoof, wpinheir
Target Milestone: Unspecified
Keywords: PrioBumpPM, Triaged
Target Release: Unused
Hardware: x86_64
OS: Linux
Whiteboard: sat-prio-proposed
Last Closed: 2018-12-07 16:25:14 UTC
Type: Bug
Bug Depends On: 1475886, 1481367    
Bug Blocks: 1478091    

Description Paul Dudley 2017-07-26 00:25:18 UTC
Description of problem:
Refreshing the manifest on Satellite 6.2.10 results in the following error:

2017-07-21 16:05:26 EDT ERROR:  deadlock detected
2017-07-21 16:05:26 EDT DETAIL:  Process 2848 waits for ShareLock on transaction 231777414; blocked by process 1382.
	Process 1382 waits for ShareLock on transaction 231777394; blocked by process 2848.
	Process 2848: insert into cp_pool_products (created, updated, pool_id, product_id, product_name, dtype, id) values ($1, $2, $3, $4, $5, 'provided', $6)
	Process 1382: insert into cp_entitlement (created, updated, consumer_id, dirty, endDateOverride, owner_id, pool_id, quantity, updatedOnStart, id) values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
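
Not part of the original report, but while the lock queue is building up, PIDs like 2848 and 1382 can be mapped back to live sessions by joining pg_locks with pg_stat_activity. A diagnostic sketch (assumes local psql access to the candlepin database on PostgreSQL 9.2+, where the activity column is named `query`):

```shell
# Diagnostic sketch (not from the bug): list backends waiting on ungranted
# locks, with the query each one is running. The database name "candlepin"
# and local psql access are assumptions.
LOCK_QUERY="
SELECT l.pid, l.locktype, l.mode, l.granted, a.query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
 WHERE NOT l.granted;"

# Run while the manifest refresh and client traffic are both active:
#   sudo -u postgres psql candlepin -c "$LOCK_QUERY"
```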

Version-Release number of selected component (if applicable):
satellite-6.2.10-4.0.el7sat.noarch
candlepin-0.9.54.21-1.el7.noarch
foreman-1.11.0.76-1.el7sat.noarch
katello-3.0.0-20.el7sat.noarch

How reproducible:
So far, reproduced only on the customer's Satellite.

Comment 11 Paul Dudley 2017-08-04 15:39:17 UTC
Hey Vritant,

That's right - in this case subsequent manifest refreshes continue to fail.

Comment 23 Chris Snyder 2017-08-25 16:46:31 UTC
The following workaround should enable those blocked by this issue to move forward with manifest import.


Approach:

Temporarily disable conflicting, external traffic to the satellite for the duration of manifest import.


Important notes before getting started:
For the duration of the manifest import, all attempts from client systems to register, unregister, or update entitlements will fail.
After this workaround is completed, operations should return to normal.


Steps to accomplish the above:

1) Add the following to /etc/httpd/conf.d/05-foreman-ssl.d/katello.conf (less the triple quotes):
"""
<Location /rhsm>
  PassengerEnabled off
</Location>
"""

2) Restart httpd: `systemctl restart httpd`

3) Navigate to the manifest import page (Content -> Red Hat Subscriptions -> Click "Manage Manifest" button)

4) Select the file to import using the dialog displayed after clicking "Browse".

5) Click "Upload"

6) Wait a while (in my case this took between 30 minutes and 1 hour; it may take longer depending on hardware and on the size and contents of the manifest)

7) Remove the stanza added to /etc/httpd/conf.d/05-foreman-ssl.d/katello.conf in step 1.

8) Restart httpd: `systemctl restart httpd`
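
Steps 1/2 and 7/8 can be scripted. A minimal sketch, where the config path is the one named in step 1 and the helper function is hypothetical (not part of Satellite):

```shell
# Sketch of steps 1/2 and 7/8 above. KATELLO_CONF is the path from step 1;
# the rhsm_block helper is illustrative, not a Satellite tool.
KATELLO_CONF=/etc/httpd/conf.d/05-foreman-ssl.d/katello.conf

rhsm_block() {
  cat <<'EOF'
<Location /rhsm>
  PassengerEnabled off
</Location>
EOF
}

# Steps 1 + 2: block /rhsm traffic, then restart httpd:
#   rhsm_block >> "$KATELLO_CONF" && systemctl restart httpd
# Steps 7 + 8 (after the import): remove the stanza again, e.g. by restoring
# a backup of the file taken beforehand, then restart httpd.
```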


After completing the above, the manifest import should succeed.

The above worked for me on my reproducer of the deadlock issue. I believe it will work for others as well.

Comment 35 Pavel Moravec 2018-10-21 19:04:47 UTC
FYI I was able to reproduce this simply by:

- calling manifest refresh
- invoking many "get me consumer serials" requests in parallel (these requests are normally triggered by rhsmd activity on the client systems)
- _no_ virt-who calls involved at all
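
A rough sketch of that reproducer. The Satellite hostname, the uuids.txt file of consumer UUIDs, and the client-certificate paths are all assumptions; the endpoint is the consumer certificate serials endpoint that rhsmd normally hits:

```shell
# Reproducer sketch (not a supported tool): fire many "get me consumer
# serials" requests in parallel while a manifest refresh is running.
serials_url() {
  # Build the /consumers/<uuid>/certificates/serials endpoint URL.
  printf 'https://%s/rhsm/consumers/%s/certificates/serials' "$1" "$2"
}

# While the manifest refresh is running:
#   while read -r uuid; do
#     curl -sk --cert /etc/pki/consumer/cert.pem --key /etc/pki/consumer/key.pem \
#       "$(serials_url satellite.example.com "$uuid")" &
#   done < uuids.txt
#   wait
```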

Comment 36 Brad Buckingham 2018-10-30 15:32:54 UTC
Pavel,

Thanks for the update.  To confirm, was that reproducer on 6.4?

Comment 37 Pavel Moravec 2018-11-04 19:41:10 UTC
(In reply to Brad Buckingham from comment #36)
> Pavel,
> 
> Thanks for the update.  To confirm, was that reproducer on 6.4?

Trying this on 6.4, I got a deadlock (on approximately the 10th attempt), but in candlepin, not postgres:

2018-11-04 17:32:30,830 [thread=http-bio-8443-exec-521] [req=7e4a9f4e-71d3-46b2-9a64-5f0ab4dd621d, org=, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/consumers/627e59b8-3f03-4c2f-9f67-db5d8a13e207/certificates/serials
2018-11-04 17:32:30,833 [thread=http-bio-8443-exec-647] [req=a31ce16a-d317-4204-b484-b41f77fbec7d, org=, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/consumers/5c6bbc4d-c9a9-436a-a84a-4a27a63330ca/certificates/serials
2018-11-04 17:33:40,482 [thread=C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-AdminTaskTimer] [=, org=, csid=] WARN  com.mchange.v2.async.ThreadPoolAsynchronousRunner - com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@e4cebf5 -- APPARENT DEADLOCK!!! Complete Status: 
	Managed Threads: 3
	Active Threads: 3
	Active Tasks: 
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@6b1485df
			on thread: C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-HelperThread-#0
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@3f938880
			on thread: C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-HelperThread-#1
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@2b608ded
			on thread: C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-HelperThread-#2
	Pending Tasks: 
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@7159f200
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@3eacbb89
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@1d58f354
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@32f80af7
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@c8f375a
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@5724a5b1
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@dbc3e52
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@7fe122a0
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@a21efe1
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@785aca39
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@14f825ce
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@62fae1d3
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@67a1548a
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@49045145
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@1c87cbfe
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@29c655c4
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@44ae5172
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@2fcbaedb
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@db158e7
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@16c244e0
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@20d1a28b
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@529e6d3c
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@556f61e8
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@518ecae1
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@7ce6ffa1
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@5d309a8c
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@737ec0e3
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@5480c5af
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@1f503369
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@46ed2ef5
		com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@43918713
		com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@11ee9f73
		com.mchange.v2.resourcepool.BasicResourcePool$1RefurbishCheckinResourceTask@fbb3d5c
Pool thread stack traces:
	Thread[C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-HelperThread-#0,5,main]
		com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:720)
	Thread[C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-HelperThread-#1,5,main]
		com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:720)
	Thread[C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-HelperThread-#2,5,main]
		com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:720)

(and tomcat/candlepin was stopped :-o )

The postgres logs show nothing, candlepin's error.log and candlepin.log contained just the info above, and the tomcat logs show nothing.
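
For context on the log above: c3p0's DeadlockDetector fires when all pool helper threads (3 by default, matching "Managed Threads: 3") are wedged on tasks, here connection acquire/destroy tasks. If someone hits this again, the standard c3p0 knobs below may at least change the failure mode. This is generic c3p0 tuning from its documented properties; whether it is appropriate for Candlepin's pool is an assumption, not a verified fix:

```properties
# Generic c3p0 tuning sketch (c3p0.properties on the classpath).
# More helper threads so slow acquire/destroy tasks don't starve the pool:
c3p0.numHelperThreads=6
# Interrupt administrative tasks (e.g. connection acquisition) that run
# longer than this many seconds, instead of wedging a helper thread forever:
c3p0.maxAdministrativeTaskTime=60
```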

Note to self: the reproducer was on provisioning.usersys.redhat.com, fetching serials of all cp_consumers in a loop during the manifest refresh.

Comment 42 Alex Wood 2018-12-07 16:25:14 UTC
Other developers on the team, support engineers, and I have all been unable to reproduce this issue.  I'm going to close this bug, but if someone encounters it again and has a reproducer case, please reopen it.

If you do have a reproducer, we will need a copy of the manifest you are importing as well as a dump of the current Candlepin database.  It would be inappropriate to attach these to a public bug, so you will either need to open a case with support (referencing this bug) and attach the necessary manifest and DB dump to the case or provide those files to a member of the development team out-of-band.