Bug 1366231

Summary: qpid dispatch router crashes when applying errata to a large number of hosts via Satellite 6.2
Product: Red Hat Satellite
Reporter: Pradeep Kumar Surisetty <psuriset>
Component: Errata Management
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED DUPLICATE
Severity: urgent
Priority: unspecified
Version: 6.2.0
CC: anazmy, arcsharm, cduryee, dcaplan, jhutar, jortel, mmccune, pmoravec, tross
Target Milestone: Unspecified
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Last Closed: 2016-08-11 11:18:33 UTC
Type: Bug

Description Pradeep Kumar Surisetty 2016-08-11 10:36:01 UTC
Description of problem:

Registered around 25k content hosts to the Satellite and its Capsules, with the scale tunings listed below applied. When applying errata to a large number of hosts via Satellite 6.2, the qpid dispatch router (qdrouterd) crashes with the following backtrace. The same crash was also observed while updating 2k nodes.
 
Backtrace:
----------
*** Error in `/usr/sbin/qdrouterd': double free or corruption (out): 0x00007f129c7d4ce0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7d053)[0x7f12e742d053]
/lib64/libqpid-proton.so.2(pn_class_decref+0x56)[0x7f12e8179806]
/lib64/libqpid-proton.so.2(+0x27580)[0x7f12e8187580]
/lib64/libqpid-proton.so.2(pn_class_decref+0x38)[0x7f12e81797e8]
/lib64/libqpid-proton.so.2(pn_collector_pop+0x22)[0x7f12e8187722]
/lib64/libqpid-dispatch.so.0(+0x16f00)[0x7f12e83c6f00]
/lib64/libqpid-dispatch.so.0(+0x29b9c)[0x7f12e83d9b9c]
/lib64/libpthread.so.0(+0x7dc5)[0x7f12e7f4bdc5]
/lib64/libc.so.6(clone+0x6d)[0x7f12e74a6ced]

Packages:
---------

qpid-proton-c-0.9-16.el7.x86_64
qpid-tools-0.30-4.el7.noarch
qpid-cpp-server-0.30-11.el7sat.x86_64
qpid-dispatch-router-0.4-13.el7sat.x86_64
python-qpid-qmf-0.30-5.el7.x86_64
tfm-rubygem-qpid_messaging-0.30.0-7.el7sat.x86_64
python-gofer-qpid-2.7.6-1.el7sat.noarch
qpid-qmf-0.30-5.el7.x86_64
qpid-cpp-client-devel-0.30-11.el7sat.x86_64
python-qpid-0.30-9.el7sat.noarch
libqpid-dispatch-0.4-13.el7sat.x86_64
qpid-cpp-client-0.30-11.el7sat.x86_64
qpid-cpp-server-linearstore-0.30-11.el7sat.x86_64

Tunings made for scale:
----------------------

 # cat /etc/systemd/system/qdrouterd.service.d/limits.conf
 # cat /etc/systemd/system/qpidd.service.d/limits.conf
 # cat /etc/systemd/system/httpd.service.d/limits.conf
 All three files have the same content:
 [Service]
 LimitNOFILE=1000000
 # systemctl daemon-reload

 # katello-service restart
 # echo 1000000 > /proc/sys/fs/aio-max-nr  # or, preferably, add "fs.aio-max-nr = 1000000" to /etc/sysctl.conf and run "sysctl -p"
 
 Ref:  https://access.redhat.com/solutions/222693
       https://access.redhat.com/solutions/1375253
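The tunings above only help if the running daemons actually picked them up after the restart. A minimal read-only check is sketched below, assuming a Linux host with /proc and the procps `pgrep` utility; the helper name `nofile_limit` is hypothetical and not part of Satellite or qpid.

```shell
#!/bin/sh
# Sketch: confirm that the scale tunings listed above are in effect.
# Assumptions: Linux /proc is mounted and procps "pgrep" is installed.
# "nofile_limit" is a hypothetical helper name used only for illustration.

# Print the soft "Max open files" limit of a PID, read from /proc/<pid>/limits.
# Line format: "Max open files   <soft>   <hard>   files", so field 4 is the soft limit.
nofile_limit() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

for svc in qdrouterd qpidd httpd; do
    # -x: exact process-name match, -o: pick the oldest (parent) process
    pid=$(pgrep -o -x "$svc")
    if [ -n "$pid" ]; then
        echo "$svc (pid $pid): soft LimitNOFILE=$(nofile_limit "$pid")"
    else
        echo "$svc: not running"
    fi
done

# Current kernel value of fs.aio-max-nr
echo "fs.aio-max-nr=$(cat /proc/sys/fs/aio-max-nr)"
```

Run it after "katello-service restart"; if a drop-in was loaded, that daemon should report a soft limit of 1000000. Reading /proc/<pid>/limits checks the live process rather than only the unit configuration.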

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
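1. Register a large number of content hosts to Satellite 6.2 and its Capsules (about 25k in this environment).
2. Apply errata to a large number of those hosts (the crash was also seen with 2k hosts).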

Actual results:

The qpid dispatch router (qdrouterd) crashes with "double free or corruption (out)".


Expected results:

Errata are applied successfully and qdrouterd keeps running.

Additional info:

Comment 1 Pradeep Kumar Surisetty 2016-08-11 10:41:02 UTC
A similar issue was reported against qpid-proton 0.9 (thanks to Ahmed):

https://issues.apache.org/jira/browse/PROTON-826

Comment 2 Pavel Moravec 2016-08-11 11:18:33 UTC

*** This bug has been marked as a duplicate of bug 1366232 ***