Bug 1377195 (sat6-qpid-deadlock)

Summary: Pulp tasks deadlock when using Qpid
Product: Red Hat Satellite
Reporter: vdhande
Component: Qpid
Assignee: Alan Conway <aconway>
Status: CLOSED ERRATA
QA Contact: jcallaha
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.2.0
CC: aconway, asahni, bbuckingham, bkearney, bvassova, chrobert, dkliban, egolov, gpatil, igreen, jcallaha, jhutar, ktordeur, mcressma, mhrivnak, mmccune, nitthoma, oshtaier, pdwyer, pmoravec, psuriset, sauchter, schamilt, sramacha, ssegal, zhunting
Target Milestone: Unspecified
Keywords: Regression, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: python-qpid-0.30-11
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1428533 (view as bug list)
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1338516, 1428533

Comment 4 Mike McCune 2016-10-06 04:58:11 UTC
This may be related to this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1279539

for which we can deliver a hotfix to anyone interested.

Comment 10 Mike Cressman 2016-12-02 22:58:15 UTC
We've attempted a few fixes with select Pulp users, but have not solved it. We are actively working on it. (QPID-7317)

Mike

Comment 12 Bryan Kearney 2016-12-13 15:59:51 UTC
*** Bug 1388814 has been marked as a duplicate of this bug. ***

Comment 13 Brian Bouterse 2016-12-13 16:28:14 UTC
Retitling to describe the symptoms more accurately

Comment 36 Brian Bouterse 2017-03-07 15:13:17 UTC
@igreen, I agree that this issue does not sound like the root cause of your case. Since we both think it's not the root cause, I am going to remove the association from case 01681610. If I am incorrect in that, please let me know or re-add it.

Even though that case was the one that started this BZ, we had been investigating this issue in upstream Pulp already.

Comment 37 Chris Roberts 2017-03-14 16:20:23 UTC
 *** HOTFIX INSTRUCTIONS ***

Before restarting services, make sure all sync/pulp tasks are finished.

RHEL 7:

http://people.redhat.com/chrobert/hf1377195/python-qpid-0.30-11.el7sat.noarch.rpm

# wget http://people.redhat.com/chrobert/hf1377195/python-qpid-0.30-11.el7sat.noarch.rpm
# yum localupdate python-qpid-0.30-11.el7sat.noarch.rpm
# katello-service restart

RHEL 6:

http://people.redhat.com/chrobert/hf1377195/python-qpid-0.30-11.el6sat.noarch.rpm

# wget http://people.redhat.com/chrobert/hf1377195/python-qpid-0.30-11.el6sat.noarch.rpm
# yum localupdate python-qpid-0.30-11.el6sat.noarch.rpm
# katello-service restart

After this is done, resume normal operations. Tested on ref7 (6.2.8) with the el7 steps; the update worked fine.
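
To double-check that the hotfix took effect, something like the following sketch may help. It is not part of the official instructions: the helper name `is_fixed` is made up for illustration, and GNU `sort -V` is only an approximation of how rpm compares versions. It assumes the fixed build is python-qpid-0.30-11, as listed in "Fixed In Version".

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Satellite): succeeds when the given
# python-qpid VERSION-RELEASE string is at or above the fixed build.
fixed="0.30-11"

is_fixed() {
    # $1 is a VERSION-RELEASE string such as "0.30-11.el7sat".
    # sort -V puts the lower version first; this approximates rpm ordering.
    lowest=$(printf '%s\n%s\n' "$fixed" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$fixed" ]
}

# Example use on the Satellite host (left commented out on purpose):
# is_fixed "$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' python-qpid)" \
#     && echo "hotfix present" || echo "hotfix missing"
```

If the check reports the hotfix as missing after the update, compare against the plain output of `rpm -q python-qpid` before drawing conclusions, since the version comparison above is only approximate.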

Comment 41 Brian Bouterse 2017-03-28 18:41:57 UTC
I agree w/ @mhrivnak's reading of that exception. It is not related to the deadlocking (which is good); it's an SSL trust issue when trying to publish results to Katello.

Comment 42 Brian Bouterse 2017-03-30 19:52:26 UTC
We discussed including this in 6.2.9 if possible. I'm setting that as the target milestone. To include it, bring in the one package in Comment 29: https://bugzilla.redhat.com/show_bug.cgi?id=1377195#c29

Comment 44 Brian Bouterse 2017-04-03 14:03:14 UTC
Confirming that it is fixed is difficult because we don't know how to trigger it and it occurs rarely. I would like QE to ensure that no new regressions are introduced with this change using normal regression testing. A few basic sync/publish actions in sat6 should do it.

Comment 48 jcallaha 2017-04-19 17:29:06 UTC
Verified in Satellite 6.2.9 Snap 3.

Based on no-break criteria as well as reports from customer environments, this issue is fixed in the version above.

Comment 49 Bryan Kearney 2017-05-01 14:28:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1191

Comment 52 Brian Bouterse 2017-05-17 20:52:02 UTC
This BZ was about a root cause that is very difficult to reproduce. If you can regularly reproduce a defect with similar symptoms, please file a new bug, because you then have a different root cause.