| Summary: | Broker crash due to large queue threshold recursion. |
|---|---|
| Product: | Red Hat Enterprise MRG |
| Component: | qpid-cpp |
| Version: | beta |
| Hardware: | Unspecified |
| OS: | Linux |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | urgent |
| Priority: | urgent |
| Reporter: | Ken Giusti <kgiusti> |
| Assignee: | Gordon Sim <gsim> |
| QA Contact: | ppecka <ppecka> |
| CC: | freznice, iboverma, jross, matt, ppecka |
| Target Milestone: | 2.0 |
| Fixed In Version: | qpid-cpp-0.9.1079953 |
| Doc Type: | Bug Fix |
| Doc Text: | N/A |
| Last Closed: | 2012-12-07 17:42:42 UTC |
Core file is too big to attach. Uploaded it to my home dir on grid0.lab.bos.redhat.com: /home/bos/kgiusti/BZ683216/core.24766.gz

Created attachment 483028 [details]: Proposed fix.

A bit of a cheat: record the rate-limiting timestamp before raising the event, since raising the event may cause more messages to be enqueued.
Fix and test committed upstream: http://svn.apache.org/viewvc?view=rev&rev=1079854 and to the 0.10 release branch: http://svn.apache.org/viewvc?rev=1079953&view=rev
*** Bug 683927 has been marked as a duplicate of this bug. ***

A more detailed description of how to reproduce is needed. Raising NEEDINFO.

The test test_alert_on_alert_queue in threshold.py (http://svn.apache.org/viewvc/qpid/trunk/qpid/tests/src/py/qpid_tests/broker_0_10/threshold.py?view=markup&pathrev=1079854) gives a simple reproducer; running it prior to the fix would crash the broker. Specifically, the bug was triggered when the queue subscribed to receive events itself had a threshold set and was allowed to reach that threshold. An event for that queue would then be generated and added to the queue, which would in turn trigger regeneration of the event in a recursive loop.

Using the same pattern as the system test above, you can trigger the issue with spout and drain:

```
drain -f "qmf.default.topic/agent.ind.event.org_apache_qpid_broker.queueThresholdExceeded.#; {link:{x-declare:{arguments:{'qpid.alert_count':1}}}}"
```

then

```
spout "ttq; {create:always, node: {x-declare:{auto_delete:True,exclusive:True,arguments:{'qpid.alert_count':1}}}}"
```

Verified on RHEL 5 / RHEL 6, i686 / x86_64:

```
# rpm -qa | grep qpid
python-qpid-0.10-1.el5
qpid-cpp-server-xml-0.10-7.el5
qpid-qmf-devel-0.10-10.el5
qpid-cpp-client-0.10-7.el5
qpid-java-client-0.10-6.el5
qpid-cpp-client-devel-0.10-7.el5
qpid-cpp-server-devel-0.10-7.el5
qpid-java-common-0.10-6.el5
qpid-qmf-0.10-10.el5
qpid-cpp-client-ssl-0.10-7.el5
qpid-cpp-server-cluster-0.10-7.el5
qpid-cpp-server-0.10-7.el5
qpid-java-example-0.10-6.el5
python-qpid-qmf-0.10-10.el5
qpid-cpp-client-devel-docs-0.10-7.el5
qpid-cpp-server-ssl-0.10-7.el5
qpid-tools-0.10-5.el5
qpid-cpp-server-store-0.10-7.el5
```

--> VERIFIED
Description of problem:

The broker has crashed due to a segmentation fault:

```
Program terminated with signal 11, Segmentation fault.
#0  0x0000003eee0728c8 in _int_malloc () from /lib64/libc.so.6
```

It appears that the queueThreshold event has caused a (nearly) infinite loop in which the alert is enqueued, which causes another alert to be issued, and so on (see below and the attached core file).

Version-Release number of selected component (if applicable):
qpid-cpp-server-0.9.1073306-1.el5

How reproducible:
Unknown.

Steps to Reproduce:
1. Unknown.

Actual results:

Expected results:

Additional info:

Here's the state of the Threshold queue observer. Note that sizeThreshold = 83886080 (80 MB) has been exceeded (size = 84360583), while lastAlert = {timepoint = 0} shows the rate-limiting timestamp was never recorded:

```
(gdb) up
#44 0x0000003515e174bd in qpid::broker::ThresholdAlerts::enqueued (this=0x2aaabc61c0c0, m=<value optimized out>)
    at qpid/broker/ThresholdAlerts.cpp:47
47          agent.raiseEvent(qmf::org::apache::qpid::broker::EventQueueThresholdExceeded(name, count, size));
(gdb) p *this
$12 = {<qpid::broker::QueueObserver> = {_vptr.QueueObserver = 0x35160d7dd0}, name = {static npos = 18446744073709551615,
    _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
      _M_p = 0x2aaabc5a57e8 "qmfc-v2-ui-grid0.lab.bos.redhat.com.7713.1"}}, agent = @0x2aaaaaaab010, countThreshold = 0,
  sizeThreshold = 83886080, repeatInterval = {nanosecs = 60000000000}, count = 7915, size = 84360583, lastAlert = {timepoint = 0}}
```

The stack is full of the same nine-frame cycle, repeated over and over:

```
#4433 0x0000003515e1f4c1 in qpid::broker::TopicExchange::route (this=0x16468dc8, msg=..., routingKey=...)
    at qpid/broker/TopicExchange.cpp:343
#4434 0x0000003515e39eea in qpid::management::ManagementAgent::sendBufferLH (this=0x2aaaaaaab010, data=<value optimized out>,
    cid=..., headers=..., content_type=..., exchange=..., routingKey=..., ttl_msec=0) at qpid/management/ManagementAgent.cpp:621
#4435 0x0000003515e3bf06 in qpid::management::ManagementAgent::raiseEvent(const qpid::management::ManagementEvent &,
    qpid::management::ManagementAgent::._131) (this=0x2aaaaaaab010, event=..., severity=<value optimized out>)
    at qpid/management/ManagementAgent.cpp:406
#4436 0x0000003515e174bd in qpid::broker::ThresholdAlerts::enqueued (this=0x2aaabc61c0c0, m=<value optimized out>)
    at qpid/broker/ThresholdAlerts.cpp:47
#4437 0x0000003515db4d1d in qpid::broker::Queue::enqueued (this=0x2aaabc5ad370, m=...) at qpid/broker/Queue.cpp:1126
#4438 0x0000003515db5954 in qpid::broker::Queue::push (this=0x2aaabc5ad370, msg=<value optimized out>, isRecovery=false)
    at qpid/broker/Queue.cpp:529
#4439 0x0000003515db8760 in qpid::broker::Queue::deliver (this=0x2aaabc5ad370, msg=...) at qpid/broker/Queue.cpp:168
#4440 0x0000003515d55302 in qpid::broker::DeliverableMessage::deliverTo (this=0x4477a350, queue=...)
    at qpid/broker/DeliverableMessage.cpp:33
#4441 0x0000003515d6e555 in qpid::broker::Exchange::doRoute (this=0x16468dc8, msg=..., b=...) at qpid/broker/Exchange.cpp:84
#4442 0x0000003515e1f4c1 in qpid::broker::TopicExchange::route (this=0x16468dc8, msg=..., routingKey=...)
    at qpid/broker/TopicExchange.cpp:343
#4443 0x0000003515e39eea in qpid::management::ManagementAgent::sendBufferLH (this=0x2aaaaaaab010, data=<value optimized out>,
    cid=..., headers=..., content_type=..., exchange=..., routingKey=..., ttl_msec=0) at qpid/management/ManagementAgent.cpp:621
#4444 0x0000003515e3bf06 in qpid::management::ManagementAgent::raiseEvent(const qpid::management::ManagementEvent &,
    qpid::management::ManagementAgent::._131) (this=0x2aaaaaaab010, event=..., severity=<value optimized out>)
    at qpid/management/ManagementAgent.cpp:406
#4445 0x0000003515e174bd in qpid::broker::ThresholdAlerts::enqueued (this=0x2aaabc61c0c0, m=<value optimized out>)
    at qpid/broker/ThresholdAlerts.cpp:47
#4446 0x0000003515db4d1d in qpid::broker::Queue::enqueued (this=0x2aaabc5ad370, m=...) at qpid/broker/Queue.cpp:1126
#4447 0x0000003515db5954 in qpid::broker::Queue::push (this=0x2aaabc5ad370, msg=<value optimized out>, isRecovery=false)
    at qpid/broker/Queue.cpp:529
#4448 0x0000003515db8760 in qpid::broker::Queue::deliver (this=0x2aaabc5ad370, msg=...) at qpid/broker/Queue.cpp:168
#4449 0x0000003515d55302 in qpid::broker::DeliverableMessage::deliverTo (this=0x4477bc10, queue=...)
    at qpid/broker/DeliverableMessage.cpp:33
#4450 0x0000003515d6e555 in qpid::broker::Exchange::doRoute (this=0x16468dc8, msg=..., b=...) at qpid/broker/Exchange.cpp:84
#4451 0x0000003515e1f4c1 in qpid::broker::TopicExchange::route (this=0x16468dc8, msg=..., routingKey=...)
    at qpid/broker/TopicExchange.cpp:343
```

before the broker finally runs out of memory during the route calls.