Bug 1475056
Summary: | Manifest refresh fails with 'deadlock detected' | |||
---|---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Paul Dudley <pdudley> | |
Component: | Candlepin | Assignee: | satellite6-bugs <satellite6-bugs> | |
Status: | CLOSED WORKSFORME | QA Contact: | Katello QA List <katello-qa-list> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 6.2.10 | CC: | andrew.schofield, asanders, awood, bcourt, bkearney, brcoca, cdonnell, christopher.vincent, cmarinea, crog, csnyder, daniele, gkonda, hartsjc, khowell, ktordeur, lzap, pdudley, pdwyer, pmoravec, rbeyel, rdixon, smutkule, sraut, sthirugn, vanhoof, wpinheir | |
Target Milestone: | Unspecified | Keywords: | PrioBumpPM, Triaged | |
Target Release: | Unused | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | sat-prio-proposed | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1475886 1478091 (view as bug list) | Environment: | ||
Last Closed: | 2018-12-07 16:25:14 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1475886, 1481367 | |||
Bug Blocks: | 1478091 |
Description
Paul Dudley
2017-07-26 00:25:18 UTC
Hey Vritant,

That's right - in this case subsequent manifest refreshes continue to fail.

The following workaround should enable those blocked by this issue to move forward with manifest import.

Approach: Temporarily disable conflicting, external traffic to the satellite for the duration of manifest import.

Important notes before getting started: This will cause all attempts to register/unregister or update entitlements from client systems to fail for the duration of manifest import. After this workaround is completed, operations should return to normal.

Steps to accomplish the above:

1) Add the following to /etc/httpd/conf.d/05-foreman-ssl.d/katello.conf (less the triple quotes):
"""
<Location /rhsm>
PassengerEnabled off
</Location>
"""
2) Restart httpd: `systemctl restart httpd`
3) Navigate to the manifest import page (Content -> Red Hat Subscriptions -> Click "Manage Manifest" button)
4) Select the file to import using the dialog displayed after clicking "Browse".
5) Click "Upload"
6) Wait a while (in my case this took somewhere between 30 minutes and 1 hour depending on hardware; it may take longer depending on the size and contents of the manifest)
7) Remove the passage added to /etc/httpd/conf.d/05-foreman-ssl.d/katello.conf in step 1.
8) Restart httpd: `systemctl restart httpd`

After completing the above, the manifest import should succeed. The above worked for me on my reproducer of the deadlock issue. I believe it will work for others as well.

FYI I was able to reproduce this simply by:
- calling manifest refresh
- invoking many "get me consumer serials" requests in parallel (these requests are normally triggered by rhsmd activity on the client systems)
- with _no_ virt-who call involved at all

Pavel,

Thanks for the update. To confirm, was that reproducer on 6.4?

(In reply to Brad Buckingham from comment #36)
> Pavel,
>
> Thanks for the update. To confirm, was that reproducer on 6.4?

Trying this on 6.4, I got (in approx.
10th attempt) a deadlock but in candlepin, not postgres:

2018-11-04 17:32:30,830 [thread=http-bio-8443-exec-521] [req=7e4a9f4e-71d3-46b2-9a64-5f0ab4dd621d, org=, csid=] INFO org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/consumers/627e59b8-3f03-4c2f-9f67-db5d8a13e207/certificates/serials
2018-11-04 17:32:30,833 [thread=http-bio-8443-exec-647] [req=a31ce16a-d317-4204-b484-b41f77fbec7d, org=, csid=] INFO org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/consumers/5c6bbc4d-c9a9-436a-a84a-4a27a63330ca/certificates/serials
2018-11-04 17:33:40,482 [thread=C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-AdminTaskTimer] [=, org=, csid=] WARN com.mchange.v2.async.ThreadPoolAsynchronousRunner - com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@e4cebf5 -- APPARENT DEADLOCK!!! Complete Status:
Managed Threads: 3
Active Threads: 3
Active Tasks:
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@6b1485df on thread: C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-HelperThread-#0
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@3f938880 on thread: C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-HelperThread-#1
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@2b608ded on thread: C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-HelperThread-#2
Pending Tasks:
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@7159f200
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@3eacbb89
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@1d58f354
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@32f80af7
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@c8f375a
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@5724a5b1
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@dbc3e52
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@7fe122a0
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@a21efe1
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@785aca39
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@14f825ce
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@62fae1d3
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@67a1548a
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@49045145
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@1c87cbfe
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@29c655c4
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@44ae5172
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@2fcbaedb
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@db158e7
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@16c244e0
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@20d1a28b
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@529e6d3c
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@556f61e8
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@518ecae1
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@7ce6ffa1
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@5d309a8c
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@737ec0e3
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@5480c5af
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@1f503369
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@46ed2ef5
  com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@43918713
  com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@11ee9f73
  com.mchange.v2.resourcepool.BasicResourcePool$1RefurbishCheckinResourceTask@fbb3d5c
Pool thread stack traces:
  Thread[C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-HelperThread-#0,5,main]
    com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:720)
  Thread[C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-HelperThread-#1,5,main]
    com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:720)
  Thread[C3P0PooledConnectionPoolManager[identityToken->1hgemtd9y1mo7gay1kun9qu|8211f0a]-HelperThread-#2,5,main]
    com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:720)

(and tomcat/candlepin was stopped :-o )

postgres logs show nothing, candlepin's error and candlepin.log contained just that info, tomcat logs nothing.

note to myself: reproducer was on provisioning.usersys.redhat.com, fetching serials of all cp_consumers in a loop during the manifest refresh.

Other developers on the team, support engineers, and I have all been unable to reproduce this issue. I'm going to close this bug, but if someone encounters it again and has a reproducer case, please reopen it.

If you do have a reproducer, we will need a copy of the manifest you are importing as well as a dump of the current Candlepin database. It would be inappropriate to attach these to a public bug, so you will either need to open a case with support (referencing this bug) and attach the necessary manifest and DB dump to the case or provide those files to a member of the development team out-of-band.
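
For anyone trying to build a reproducer along the lines described above (many parallel "consumer serials" requests while a manifest refresh is running), the following is only a rough sketch, not the script that was actually used. The URI path is taken from the candlepin.log excerpt; the function names, the base URL, the worker count, and the omission of authentication are all assumptions for illustration. A real Candlepin endpoint will require credentials or a client certificate that are not shown here.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def serials_url(base_url, consumer_uuid):
    """Build the serials URI seen in the candlepin.log excerpt.

    base_url is hypothetical, e.g. "https://localhost:8443".
    """
    return (f"{base_url.rstrip('/')}/candlepin/consumers/"
            f"{consumer_uuid}/certificates/serials")


def http_get_status(url, context=None, timeout=60):
    """Issue one GET and return the HTTP status code.

    Authentication is deliberately omitted (assumption); pass an
    ssl.SSLContext via `context` if a client certificate is needed.
    """
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.status


def fetch_serials_in_parallel(base_url, consumer_uuids,
                              fetch=http_get_status, workers=20):
    """Request the serials of every consumer concurrently.

    Returns a dict mapping each UUID to the value returned by `fetch`,
    or to the exception it raised, so one failure does not stop the run.
    """
    uuids = list(consumer_uuids)
    urls = [serials_url(base_url, uuid) for uuid in uuids]

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:  # keep going; record the failure
            return exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(safe_fetch, urls))
    return dict(zip(uuids, results))
```

Calling `fetch_serials_in_parallel` repeatedly in a loop, with the UUIDs of all consumers, while a manifest refresh is in progress would approximate the load described in the comments. This should only be pointed at a test system, since the 6.4 attempt above ended with tomcat/candlepin stopped.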