Bug 1393128
Summary: | qdrouterd 0.4-19 segfault when qpidd down for longer time and goferd restarted | |
---|---|---|---
Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec>
Component: | katello-agent | Assignee: | Mike Cressman <mcressma>
Status: | CLOSED ERRATA | QA Contact: | jcallaha
Severity: | urgent | Docs Contact: |
Priority: | urgent | |
Version: | 6.2.4 | CC: | alexandre.chanu, bbuckingham, bkearney, cdonnell, gmurthy, jcallaha, jentrena, jhutar, mmccune, oshtaier, paul.seymour, pdwyer, sthirugn
Target Milestone: | Unspecified | Keywords: | PrioBumpField, PrioBumpGSS, PrioBumpQA, Triaged
Target Release: | Unused | |
Hardware: | x86_64 | |
OS: | Linux | |
URL: | https://bugzilla.redhat.com/show_bug.cgi?id=1367735 | |
Whiteboard: | | |
Fixed In Version: | qpid-dispatch-0.4-20 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1395700, 1396568 | Environment: |
Last Closed: | 2016-11-21 18:16:21 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1395700, 1396568 | |
Attachments: | | |
Description
Pavel Moravec
2016-11-08 22:30:18 UTC
Created attachment 1218737 [details]
coredump of one segfault
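
For reference, a minimal sketch of how such a coredump can be inspected with gdb (the core file path and debuginfo package names below are assumptions; adjust them to the actual environment):

    # Pull in debug symbols so the backtraces resolve (assumed package names).
    debuginfo-install -y qpid-dispatch-router qpid-proton-c

    # Open the attached coredump against the qdrouterd binary and dump all thread backtraces.
    gdb /usr/sbin/qdrouterd /var/tmp/core.qdrouterd -batch -ex 'thread apply all bt'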
I forgot to add:

- 0.4-16 does not exhibit this segfault.
- A similar segfault has existed in qdrouterd for a long time, see [1] (though with a different backtrace).
- Since that segfault has not been reported by customers, this one, based on a similar scenario, is also unlikely to be hit in the field.

[1] https://issues.jboss.org/browse/ENTMQIC-50

.. and now it has been hit by a first customer with qpidd running (but the backtrace matches).

(Below I describe what may be a different segfault - please investigate whether the cause is the same or not.)

Trying some different scenarios (with qpidd "running"), I managed to get a segfault by freezing qpidd for a while (and running the same script multiple times in the meantime), simply by running:

    gdb -p $(pgrep qpidd) $(which qpidd)

waiting more than 10 seconds, and then detaching from the qpidd process. Immediately after the detach, qpidd sends the postponed AMQP 1.0 heartbeats to the router, which tries to match them to the already closed session/connection.

Variant of the above: "freeze" qpidd just for a while and stop all the client processes in the meantime - again, sessions/links are deleted in qdrouterd while qpidd sends some traffic on them later on.

Different backtraces seen in those scenarios:

    #0  pn_session_connection (session=0x1a0) at /usr/src/debug/qpid-proton-0.9/proton-c/src/engine/engine.c:232
    232         return session->connection;
    (gdb) bt
    #0  pn_session_connection (session=0x1a0) at /usr/src/debug/qpid-proton-0.9/proton-c/src/engine/engine.c:232
    #1  0x00007f328f8d4e14 in qd_link_connection (link=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:914
    #2  0x00007f328f8e2cd5 in router_link_attach_handler (context=0x125a200, link=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/router_node.c:1686
    #3  0x00007f328f8d4105 in handle_link_open (container=<optimized out>, pn_link=0x7f327c0bce20) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:217
    #4  process_handler (unused=<optimized out>, qd_conn=0x7f327c00cb30, container=0x114e6e0) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:470
    #5  handler (handler_context=0x114e6e0, conn_context=<optimized out>, event=event@entry=QD_CONN_EVENT_PROCESS, qd_conn=0x7f327c00cb30) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:624
    #6  0x00007f328f8e69fc in process_connector (cxtr=0x7f327c010290, qd_server=0x1159d60) at /usr/src/debug/qpid-dispatch-0.4/src/server.c:398
    #7  thread_run (arg=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/server.c:626
    #8  0x00007f328f457dc5 in start_thread (arg=0x7f3282f9b700) at pthread_create.c:308
    #9  0x00007f328e9b2ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
    (gdb) p session
    $1 = (pn_session_t *) 0x1a0
    (gdb) p session->connection
    Cannot access memory at address 0x208
    (gdb)

or:

    #0  pni_record_find (record=<optimized out>, record=<optimized out>, key=key@entry=0) at /usr/src/debug/qpid-proton-0.9/proton-c/src/object/record.c:71
    71          if (field->key == key) {
    (gdb) bt
    #0  pni_record_find (record=<optimized out>, record=<optimized out>, key=key@entry=0) at /usr/src/debug/qpid-proton-0.9/proton-c/src/object/record.c:71
    #1  pn_record_get (record=<optimized out>, key=key@entry=0) at /usr/src/debug/qpid-proton-0.9/proton-c/src/object/record.c:120
    #2  0x00007f17168f1593 in pn_connection_get_context (conn=<optimized out>) at /usr/src/debug/qpid-proton-0.9/proton-c/src/engine/engine.c:184
    #3  0x00007f1716b35e21 in qd_link_connection (link=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:918
    #4  0x00007f1716b43cd5 in router_link_attach_handler (context=0x19eac50, link=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/router_node.c:1686
    #5  0x00007f1716b35105 in handle_link_open (container=<optimized out>, pn_link=0x7f1704099e60) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:217
    #6  process_handler (unused=<optimized out>, qd_conn=0x1b06b70, container=0x197e420) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:470
    #7  handler (handler_context=0x197e420, conn_context=<optimized out>, event=event@entry=QD_CONN_EVENT_PROCESS, qd_conn=0x1b06b70) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:624
    #8  0x00007f1716b479fc in process_connector (cxtr=0x1b0a1e0, qd_server=0x19853f0) at /usr/src/debug/qpid-dispatch-0.4/src/server.c:398
    #9  thread_run (arg=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/server.c:626
    #10 0x00007f17166b8dc5 in start_thread (arg=0x7f17091fa700) at pthread_create.c:308
    #11 0x00007f1715c13ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
    (gdb) p field
    $1 = (pni_field_t *) 0x6e696c6f72614320
    (gdb) p field->key
    Cannot access memory at address 0x6e696c6f72614320
    (gdb)

or:

    #0  0x00007fcdfd193585 in pn_connection_get_context (conn=0x30242d4b0) at /usr/src/debug/qpid-proton-0.9/proton-c/src/engine/engine.c:184
    184         return conn ? pn_record_get(conn->context, PN_LEGCTX) : NULL;
    (gdb) bt
    #0  0x00007fcdfd193585 in pn_connection_get_context (conn=0x30242d4b0) at /usr/src/debug/qpid-proton-0.9/proton-c/src/engine/engine.c:184
    #1  0x00007fcdfd3d7e21 in qd_link_connection (link=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:918
    #2  0x00007fcdfd3e5cd5 in router_link_attach_handler (context=0x2360200, link=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/router_node.c:1686
    #3  0x00007fcdfd3d7105 in handle_link_open (container=<optimized out>, pn_link=0x7fcde00decd0) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:217
    #4  process_handler (unused=<optimized out>, qd_conn=0x7fcde000cb30, container=0x22546e0) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:470
    #5  handler (handler_context=0x22546e0, conn_context=<optimized out>, event=event@entry=QD_CONN_EVENT_PROCESS, qd_conn=0x7fcde000cb30) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:624
    #6  0x00007fcdfd3e99fc in process_connector (cxtr=0x7fcde0010290, qd_server=0x225fd60) at /usr/src/debug/qpid-dispatch-0.4/src/server.c:398
    #7  thread_run (arg=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/server.c:626
    #8  0x00007fcdfcf5adc5 in start_thread (arg=0x7fcdf0a9e700) at pthread_create.c:308
    #9  0x00007fcdfc4b5ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
    (gdb) p conn
    $1 = (pn_connection_t *) 0x30242d4b0
    (gdb) p conn->context
    Cannot access memory at address 0x30242d5a0
    (gdb)

or a combination of the above two:

    #0  pn_record_get (record=0x379830885ace8618, key=key@entry=0) at /usr/src/debug/qpid-proton-0.9/proton-c/src/object/record.c:118
    118     {
    (gdb) bt
    #0  pn_record_get (record=0x379830885ace8618, key=key@entry=0) at /usr/src/debug/qpid-proton-0.9/proton-c/src/object/record.c:118
    #1  0x00007f75e940d593 in pn_connection_get_context (conn=<optimized out>) at /usr/src/debug/qpid-proton-0.9/proton-c/src/engine/engine.c:184
    #2  0x00007f75e9651e21 in qd_link_connection (link=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:918
    #3  0x00007f75e965fcd5 in router_link_attach_handler (context=0x17e3c50, link=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/router_node.c:1686
    #4  0x00007f75e9651105 in handle_link_open (container=<optimized out>, pn_link=0x7f75d00ca720) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:217
    #5  process_handler (unused=<optimized out>, qd_conn=0x7f75d400cb30, container=0x1777420) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:470
    #6  handler (handler_context=0x1777420, conn_context=<optimized out>, event=event@entry=QD_CONN_EVENT_PROCESS, qd_conn=0x7f75d400cb30) at /usr/src/debug/qpid-dispatch-0.4/src/container.c:624
    #7  0x00007f75e96639fc in process_connector (cxtr=0x7f75d4010290, qd_server=0x177e3f0) at /usr/src/debug/qpid-dispatch-0.4/src/server.c:398
    #8  thread_run (arg=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/server.c:626
    #9  0x00007f75e9664a80 in qd_server_run (qd=0x1504030) at /usr/src/debug/qpid-dispatch-0.4/src/server.c:971
    #10 0x0000000000401cd8 in main_process (config_path=config_path@entry=0x7ffd15fad6ba "/etc/qpid-dispatch/qdrouterd.conf", python_pkgdir=python_pkgdir@entry=0x402401 "/usr/lib/qpid-dispatch/python", fd=fd@entry=2) at /usr/src/debug/qpid-dispatch-0.4/router/src/main.c:135
    #11 0x0000000000401950 in main (argc=3, argv=0x7ffd15fac768) at /usr/src/debug/qpid-dispatch-0.4/router/src/main.c:335
    (gdb)

I am getting various qdrouterd segfaults since the 6.2.4 update:

    kernel: [166585.935944] qdrouterd[17142]: segfault at 98 ip 00007f16d1cc3ef0 sp 00007f16b37fc328 error 4 in libqpid-proton.so.2.0.0[7f16d1c9f000+4d000]
    kernel: qdrouterd[17142]: segfault at 98 ip 00007f16d1cc3ef0 sp 00007f16b37fc328 error 4 in libqpid-proton.so.2.0.0[7f16d1c9f000+4d000]
    systemd[1]: qdrouterd.service: main process exited, code=killed, status=11/SEGV
    systemd[1]: Unit qdrouterd.service entered failed state.
    lrprdrhs001 systemd[1]: qdrouterd.service failed.

and:

    kernel: [169696.067382] traps: qdrouterd[25173] general protection ip:7f665b8ece71 sp:7f664d8052d8 error:0 in libc-2.17.so[7f665b789000+1b6000]
    kernel: traps: qdrouterd[25173] general protection ip:7f665b8ece71 sp:7f664d8052d8 error:0 in libc-2.17.so[7f665b789000+1b6000]
    systemd[1]: qdrouterd.service: main process exited, code=killed, status=11/SEGV
    systemd[1]: Unit qdrouterd.service entered failed state.
    systemd[1]: qdrouterd.service failed.

Any ideas or workarounds?

(In reply to Paul Seymour from comment #5)
> I am getting various qdrouterd segfaults since the 6.2.4 update [...]
> Any ideas or workarounds ?

Bugzilla is not the (primary) tool for troubleshooting customer issues - please raise a customer case for that. If you could provide a coredump from the segfault (e.g. via an abrt report) to confirm that you are really hitting this bug, that would be great.

Currently no workaround is known. What *might* help (but does not have to) is installing packages or errata on fewer systems in parallel - at least one already _fixed_ segfault was in that area, so this suggestion may not be 100% accurate. Another workaround is to run a lower version of the qdrouterd packages: 6.2.3 ships 0.4-16, which has a big memory leak but does not exhibit this segfault, so it should be safe to roll back / downgrade qpid-dispatch-router and libqpid-dispatch to that version.
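
A minimal sketch of that downgrade, assuming the 0.4-16 packages are still available in an enabled repository (verify the exact NVRs on your system first):

    # Roll back the dispatch router packages to the 6.2.3 level (assumed to be resolvable by yum).
    yum downgrade qpid-dispatch-router-0.4-16 libqpid-dispatch-0.4-16

    # Restart the affected services; restarting everything via katello-service is the safe option.
    katello-service restart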
Yet another very similar backtrace from a customer:

    (gdb) bt
    #0  pn_string_get (string=0x25) at /usr/src/debug/qpid-proton-0.9/proton-c/src/object/string.c:120
    #1  0x00007f7c77abb01c in pn_link_name (link=<optimized out>) at /usr/src/debug/qpid-proton-0.9/proton-c/src/engine/engine.c:1316
    #2  0x00007f7c77d07fd4 in qd_router_link_name (link=0x7f7c1800a150) at /usr/src/debug/qpid-dispatch-0.4/src/router_agent.c:90
    #3  qd_entity_refresh_router_link (entity=0x7f7c08950010, impl=0x7f7c1800a150) at /usr/src/debug/qpid-dispatch-0.4/src/router_agent.c:98
    #4  0x00007f7c6b834dcc in ffi_call_unix64 () at ../src/x86/unix64.S:90
    #5  0x000000000000000c in ?? ()
    #6  0x00007f7c577f44a0 in ?? ()
    #7  0x00007f7c577f4450 in ?? ()
    #8  0x00007f7c6b8346f5 in ffi_call (cif=<optimized out>, fn=<optimized out>, rvalue=0x7f7c771239c9 <insertdict+25>, avalue=0x7f7c577f4480) at ../src/x86/ffi64.c:524
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)
    (gdb) frame 2
    #2  0x00007f7c77d07fd4 in qd_router_link_name (link=0x7f7c1800a150) at /usr/src/debug/qpid-dispatch-0.4/src/router_agent.c:90
    90          return pn_link_name(qd_link_pn(link->link));
    (gdb) p link->link
    $1 = (qd_link_t *) 0x7f7c18008110
    (gdb) p link->link->pn_link
    $2 = (pn_link_t *) 0x7f7c18002f60
    (gdb) p link->link->pn_link->name
    $3 = (pn_string_t *) 0x25
    (gdb)

(And by the way, an attempt to work around the issue by disabling logging did not help: a `log { enable: critical module: DEFAULT }` section in qdrouterd.conf still leads to the same segfault.)

Best reproducer for Satellite 6 QE (still a little tricky / not a scenario one would see in the field); a consolidated sketch follows this list:

- Have all Satellite services running.
- In a first terminal, freeze the qpidd process for at least 11 seconds and then unfreeze it: `kill -SIGSTOP $(pgrep qpidd); sleep 11; kill -SIGCONT $(pgrep qpidd)`
- Immediately after that freeze command starts running, restart goferd on some Content Host _twice_: `service goferd restart; sleep 3; service goferd restart`
- Check the qdrouterd service status.
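
The same steps as a runnable sketch (the choice of Content Host is arbitrary; it just needs to be registered to the Satellite with goferd running):

    # Terminal 1, on the Satellite server: freeze qpidd for at least 11 seconds, then let it continue.
    kill -SIGSTOP $(pgrep qpidd); sleep 11; kill -SIGCONT $(pgrep qpidd)

    # Terminal 2, on a registered Content Host, immediately after the freeze starts: restart goferd twice.
    service goferd restart; sleep 3; service goferd restart

    # Afterwards, back on the Satellite server, check whether qdrouterd survived.
    systemctl status qdrouterd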
Created attachment 1222002 [details]
qpid-dispatch-router-0.4-20.el7sat.x86_64.rpm
Created attachment 1222003 [details]
qpid-dispatch-tools-0.4-20.el7sat.x86_64.rpm
Created attachment 1222004 [details]
libqpid-dispatch-0.4-20.el7sat.x86_64.rpm

== HOTFIX Instructions ==

We are releasing version 0.4-20 to fix the segfault issue. To install it, download the RPMs attached to this Bugzilla and install them locally via RPM or yum, then run `katello-service restart`. This hotfix will be released in a formal Errata soon.
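
A minimal sketch of that install, assuming the three attached RPMs have been downloaded into the current directory on the Satellite server:

    # Install the hotfixed qpid-dispatch 0.4-20 packages downloaded from this bug.
    yum localinstall -y qpid-dispatch-router-0.4-20.el7sat.x86_64.rpm \
        qpid-dispatch-tools-0.4-20.el7sat.x86_64.rpm \
        libqpid-dispatch-0.4-20.el7sat.x86_64.rpm

    # Restart the Satellite services so qdrouterd picks up the fixed libraries.
    katello-service restart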
Verified in Satellite 6.2.4.a

I have been testing this over the weekend, culminating in an automation run against my remaining RHEL 6 and RHEL 7 systems. After reviewing the automation results, everything is looking good. I was unable to get the reproducer script to work properly (issues with python-qpid-proton), but I did not notice any performance impact or breakage during my stress testing on Saturday.

Since the problem described in this bug report should be resolved by a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2811