Description of problem: Repeatedly while running performance longevity tests, all brokers within a cluster will crash. Logs indicate that the crash occurs shortly after the loss and re-establishment of a federated link. Version-Release number of selected component (if applicable): qpid-cpp-server-0.12-6_ptc_hotfix_3.el6.x86_64 How reproducible: Frequently Steps to Reproduce: 1. Configure a federated, clustered collection of brokers 2. Send and receive messages at a high throughput Actual results: Broker cluster crashes Expected results: Broker cluster does not crash
Created attachment 568768 [details] Backtrace from first broker to crash
Created attachment 568769 [details] Backtrace from second broker to crash
Issue was recently repeated repeatedly in a client environment where debug-level logs were available. The chain of events appears to start with a non-responsive federated link. Sequence of Events: 1) Cluster elder establishes inter-broker link Jun 26 17:41:01 HOST1 qpidd[22138]: 2012-06-26 17:41:01 debug Inter-broker link connecting to HOST2:10000 Jun 26 17:41:01 HOST1 qpidd[22138]: 2012-06-26 17:41:01 info Set TCP_NODELAY on connection to HOST2:10000 Jun 26 17:41:01 HOST1 qpidd[22138]: 2012-06-26 17:41:01 info Inter-broker link established to HOST2:10000 Jun 26 17:41:01 HOST1 qpidd[22138]: 2012-06-26 17:41:01 debug cluster(2.0.0.0:22138 READY) local connection HOST1:54581-HOST2:10000(2.0.0.0:22138-14 local) 2) Cluster elder sends AMQP init frame Jun 26 17:41:01 HOST1 qpidd[22138]: 2012-06-26 17:41:01 debug SENT [HOST1:54581-HOST2:10000] INIT(0-10) 3) Cluster elder never receives any data from federated peer / connection never announced 4) Link ioThreadProcessing fires and attempts to create bridges over federated link Jun 26 17:41:03 HOST1 qpidd[22138]: 2012-06-26 17:41:03 debug Link::ioThreadProcessing() Jun 26 17:41:03 HOST1 qpidd[22138]: 2012-06-26 17:41:03 debug SessionState::SessionState @QPID.0748c28e-995a-47f3-9216-b1d0b2c54b10: 0x7fad0006ff10 Jun 26 17:41:03 HOST1 qpidd[22138]: 2012-06-26 17:41:03 debug @QPID.0748c28e-995a-47f3-9216-b1d0b2c54b10: attached on broker. Jun 26 17:41:03 HOST1 qpidd[22138]: 2012-06-26 17:41:03 debug SessionHandler::sendAttach attach id= Jun 26 17:41:03 HOST1 qpidd[22138]: 2012-06-26 17:41:03 debug Activated route from queue QUEUE1 to EXCHANGE1 Jun 26 17:41:03 HOST1 qpidd[22138]: 2012-06-26 17:41:03 debug SessionState::SessionState @QPID.98811980-fc85-4ad8-b662-1a52ad7f3b25: 0x7fad00094ce0 Jun 26 17:41:03 HOST1 qpidd[22138]: 2012-06-26 17:41:03 debug @QPID.98811980-fc85-4ad8-b662-1a52ad7f3b25: attached on broker. Jun 26 17:41:03 HOST1 qpidd[22138]: 2012-06-26 17:41:03 debug SessionHandler::sendAttach attach id= Jun 26 17:41:03 HOST1 qpidd[22138]: 2012-06-26 17:41:03 debug Activated route from queue QUEUE2 to EXCHANGE2 5) Cluster shuts down because connection was never announced Jun 26 17:41:03 HOST1 qpidd[22138]: 2012-06-26 17:41:03 debug Exception constructed: Unknown connection: Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=2048; }] control 2.0.0.0:22138-14 (qpid/cluster/Cluster.cpp:542) Jun 26 17:41:03 HOST1 qpidd[22138]: 2012-06-26 17:41:03 critical Error delivering frames: Unknown connection: Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=2048; }] control 2.0.0.0:22138-14 (qpid/cluster/Cluster.cpp:542) Jun 26 17:41:03 HOST1 qpidd[22138]: 2012-06-26 17:41:03 notice cluster(2.0.0.0:22138 LEFT) leaving cluster CLUSTER_NAME
Simplified Steps to Reproduce: 1. Configure a clustered broker 2. Start netcat in listen mode (nc -l 6000) 3. Add a route between the clustered broker and nc (qpid-route queue add localhost:5672 localhost:6000 amq.fanout foo) 4. Clustered broker shuts down
Merged to 0.18 r1362653
Tested on RHEL5.9 and RHEL6.3 (both i386 and x86_64). This issue has been fixed. Packages used for testing: RHEL5.9 qpid-cpp-client-0.18-13.el5 qpid-cpp-client-devel-0.18-13.el5 qpid-cpp-client-devel-docs-0.18-13.el5 qpid-cpp-client-ssl-0.18-13.el5 qpid-cpp-server-0.18-13.el5 qpid-cpp-server-cluster-0.18-13.el5 qpid-cpp-server-devel-0.18-13.el5 qpid-cpp-server-ssl-0.18-13.el5 qpid-cpp-server-store-0.18-13.el5 qpid-cpp-server-xml-0.18-13.el5 qpid-java-client-0.18-6.el5 qpid-java-common-0.18-6.el5 qpid-java-example-0.18-6.el5 qpid-qmf-0.18-13.el5 qpid-qmf-devel-0.18-13.el5 qpid-tools-0.18-7.el5 RHEL6.3 qpid-cpp-client-0.18-13.el6 qpid-cpp-client-devel-0.18-13.el6 qpid-cpp-client-devel-docs-0.18-13.el6 qpid-cpp-server-0.18-13.el6 qpid-cpp-server-cluster-0.18-13.el6 qpid-cpp-server-devel-0.18-13.el6 qpid-cpp-server-store-0.18-13.el6 qpid-cpp-server-xml-0.18-13.el6 qpid-java-client-0.18-6.el6 qpid-java-common-0.18-6.el6 qpid-java-example-0.18-6.el6 qpid-qmf-0.18-13.el6 qpid-tools-0.18-7.el6_3 -> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0561.html