Description Pradeep Kumar Surisetty
2016-08-11 10:36:01 UTC
Description of problem:
Registered around 25k content hosts to Satellite and its Capsules, with the tunings listed below applied. When applying errata to a large number of hosts via Satellite 6.2, the qpid dispatch router (qdrouterd) crashes with the following backtrace. The same crash was also seen while updating 2k nodes.
Backtrace:
---------
*** Error in `/usr/sbin/qdrouterd': double free or corruption (out): 0x00007f129c7d4ce0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7d053)[0x7f12e742d053]
/lib64/libqpid-proton.so.2(pn_class_decref+0x56)[0x7f12e8179806]
/lib64/libqpid-proton.so.2(+0x27580)[0x7f12e8187580]
/lib64/libqpid-proton.so.2(pn_class_decref+0x38)[0x7f12e81797e8]
/lib64/libqpid-proton.so.2(pn_collector_pop+0x22)[0x7f12e8187722]
/lib64/libqpid-dispatch.so.0(+0x16f00)[0x7f12e83c6f00]
/lib64/libqpid-dispatch.so.0(+0x29b9c)[0x7f12e83d9b9c]
/lib64/libpthread.so.0(+0x7dc5)[0x7f12e7f4bdc5]
/lib64/libc.so.6(clone+0x6d)[0x7f12e74a6ced]
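Several frames above carry only a library offset, such as libqpid-dispatch.so.0(+0x16f00), with no function name. A minimal sketch of one way to prepare those frames for symbol lookup follows. The helper name frames_to_addr2line is hypothetical, and it only prints addr2line commands rather than running them, on the assumption that debuginfo matching the exact package versions listed in this report would need to be installed before the lookups return anything useful.

```shell
# Sketch only: turn unresolved "lib(+0xOFFSET)[addr]" backtrace frames read
# from stdin into addr2line commands. Frames that already name a symbol,
# e.g. "(pn_class_decref+0x56)", are skipped.
frames_to_addr2line() {
    sed -n 's|^\(/[^(]*\)(+\(0x[0-9a-f]*\))\[.*$|addr2line -f -C -e \1 \2|p'
}
```

Feeding the backtrace lines to this function on stdin should yield one command per unresolved frame, which can then be run on a host with the same package versions.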
Packages:
---------
qpid-proton-c-0.9-16.el7.x86_64
qpid-tools-0.30-4.el7.noarch
qpid-cpp-server-0.30-11.el7sat.x86_64
qpid-dispatch-router-0.4-13.el7sat.x86_64
python-qpid-qmf-0.30-5.el7.x86_64
tfm-rubygem-qpid_messaging-0.30.0-7.el7sat.x86_64
python-gofer-qpid-2.7.6-1.el7sat.noarch
qpid-qmf-0.30-5.el7.x86_64
qpid-cpp-client-devel-0.30-11.el7sat.x86_64
python-qpid-0.30-9.el7sat.noarch
libqpid-dispatch-0.4-13.el7sat.x86_64
qpid-cpp-client-0.30-11.el7sat.x86_64
qpid-cpp-server-linearstore-0.30-11.el7sat.x86_64
Tunings made for scale:
----------------------
# cat /etc/systemd/system/qdrouterd.service.d/limits.conf
# cat /etc/systemd/system/qpidd.service.d/limits.conf
# cat /etc/systemd/system/httpd.service.d/limits.conf
All three files have the same content:
[Service]
LimitNOFILE=1000000
# systemctl daemon-reload
# katello-service restart
# echo 1000000 > /proc/sys/fs/aio-max-nr # or, better, set the equivalent in sysctl.conf
Ref: https://access.redhat.com/solutions/222693
     https://access.redhat.com/solutions/1375253
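The tunings above could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure from the referenced solutions: the function names and the root-prefix argument are hypothetical (the prefix exists only so the script can be dry-run against a scratch directory), and the sysctl.d file name is an assumption standing in for "the equivalent in sysctl.conf".

```shell
# Sketch only: write the identical drop-in for each service named in the report.
# "$1" is a hypothetical root prefix: pass "" for the live system, or a scratch
# directory for a dry run.
write_limits() {
    root="$1"
    for svc in qdrouterd qpidd httpd; do
        dir="$root/etc/systemd/system/$svc.service.d"
        mkdir -p "$dir" || return 1
        printf '[Service]\nLimitNOFILE=1000000\n' > "$dir/limits.conf" || return 1
    done
}

# Persist the aio-max-nr value across reboots. The file name
# 99-aio-max-nr.conf is an assumption, not taken from the report.
persist_aio_max_nr() {
    root="$1"
    mkdir -p "$root/etc/sysctl.d" || return 1
    printf 'fs.aio-max-nr = 1000000\n' > "$root/etc/sysctl.d/99-aio-max-nr.conf"
}
```

On a live system this would presumably be run as root with an empty prefix (write_limits "" and persist_aio_max_nr ""), followed by systemctl daemon-reload, katello-service restart, and sysctl --system to load the persisted value.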
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Apply errata to a large number of hosts via Satellite 6.2.
Actual results:
The qpid dispatch router (qdrouterd) crashes with a "double free or corruption" error.
Expected results:
Errata should be applied successfully, without qdrouterd crashing.
Additional info:
Comment 1 Pradeep Kumar Surisetty
2016-08-11 10:41:02 UTC