Description of problem:
While trying a possible fix for BZ 667428, the broker core dumped with a segmentation fault. When sending a command, the Java client throws an exception if the session is transactional and in the detached state. It then tries to create a new session after connecting to the surviving node (the other node is killed as part of the test). The log of the surviving broker shows it receiving the session attach command, after which the seg fault occurs.

Version-Release number of selected component (if applicable):
qpid-cpp-client-0.7.946106-27.el5
qpid-cpp-server-store-0.7.946106-27.el5
qpid-cpp-server-cluster-0.7.946106-27.el5
qpid-cpp-server-0.7.946106-27.el5

How reproducible:
Always

Steps to Reproduce:
1. Copy the attached qpid-common and qpid-client jar files and bz667428.tar.gz to a directory, and modify the necessary paths in the bz667428.sh script.
2. Note that you need to use the attached jar files instead of the java client rpms, as the attached jar files contain the specific change that triggers this problem.
3. Run the bz667428.sh script; you should see the broker exit with a seg fault.

Actual results:
The broker exits with a seg fault.

Expected results:
The broker should not seg fault. If there is an error on the client side, the broker should handle it gracefully.
Created attachment 476272 [details] Backtrace obtained from the core file.
Created attachment 476273 [details] Reproducer Extract the attached file, including the bz667428.tar.gz contained inside it. Modify the paths within bz667428.sh to point to the correct jar files, etc. Run bz667428.sh to reproduce the problem.
Created attachment 476275 [details] Backtrace after installing debug symbols The attached backtrace includes line numbers, which makes it easier to track down the issue.
The broker logs qpidd.5672.0.log and qpidd2.10000.0.log are included in the bz667428.tar.gz (which in turn is bundled inside 674183.tar.gz). qpidd2.10000.0.log is the log for the broker that generates the core file due to seg fault.
From looking at the broker log, it appears that the session-attach normally occurs immediately following the creation of a connection. The crashing session-attach (the last line in the log) is an exception to this pattern because it is *not* preceded by a connection setup.

The code in stack frame #2 (from the backtrace) looks like:

    QPID_LOG(debug, getId() << ": attached on broker.");
    handler = &h;
    if (mgmtObject != 0)
    {
        mgmtObject->set_attached (1);
        mgmtObject->set_connectionRef (h.getConnection().GetManagementObject()->getObjectId());
        mgmtObject->set_channelId (h.getChannel());
    }

Is it possible that h.getConnection() is returning an invalid value or a reference to a connection that is being deleted and has no Management Object? For a session to be successfully attached, it must have a valid connection, right?
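To illustrate the suspicion above, here is a minimal sketch of a guarded version of that dereference. It is not the broker code and not the attached patch: ManagementObject, Connection, SessionMgmt and recordAttach are hypothetical stand-ins, and the only assumption carried over from the backtrace is that the connection's management object pointer may be null when the attach arrives.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the broker types; not the real qpid classes.
struct ManagementObject {
    std::uint64_t objectId;
};

struct Connection {
    // Assumed to stay null until the connection is fully set up.
    ManagementObject* mgmt = nullptr;
    ManagementObject* getManagementObject() const { return mgmt; }
};

struct SessionMgmt {
    bool attached = false;
    bool hasConnectionRef = false;
    std::uint64_t connectionRef = 0;
};

// Records the attach, copying the connection reference only when the
// connection actually has a management object to point at.
// Returns false when the reference could not be recorded.
inline bool recordAttach(SessionMgmt& session, const Connection& connection) {
    session.attached = true;
    ManagementObject* connMgmt = connection.getManagementObject();
    if (connMgmt == nullptr) {
        return false;  // nothing to dereference; the caller decides how to react
    }
    session.connectionRef = connMgmt->objectId;
    session.hasConnectionRef = true;
    assert(session.hasConnectionRef);  // invariant on the success path
    return true;
}
```

A guard like this would only avoid the crash. It says nothing about why the attach arrives on a connection without a management object in the first place.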
Created attachment 476653 [details] This patch fixes the segfault, not the underlying problem
The problem seems to be that the Java client sometimes sends an attach command on a connection that is not yet fully open. This is illegal; the broker needs to detect this and respond with a connection exception rather than crashing.
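A rough sketch of the kind of check described here, for discussion only. It is not the upstream fix: ConnectionState, ConnectionError and handleSessionAttach are hypothetical names, and the real broker tracks connection state differently. The idea is simply to reject the attach with a connection-level error unless the connection has finished opening.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical types for illustration; the real broker classes differ.
enum class ConnectionState { Negotiating, Open, Closed };

// Stand-in for a connection-level exception reported back to the client.
struct ConnectionError : std::runtime_error {
    explicit ConnectionError(const std::string& what) : std::runtime_error(what) {}
};

struct Connection {
    ConnectionState state = ConnectionState::Negotiating;
    int attachedSessions = 0;

    // Rejects session attach unless the connection handshake has completed.
    void handleSessionAttach(const std::string& sessionName) {
        if (state != ConnectionState::Open) {
            throw ConnectionError("session attach for '" + sessionName +
                                  "' received before the connection was open");
        }
        assert(state == ConnectionState::Open);  // invariant past the guard
        ++attachedSessions;
    }
};
```

With a check of this shape, an early attach leaves the session count untouched and surfaces as an error the client can see, instead of reaching code that assumes a fully initialised connection.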
Fixed upstream r1066661
The issue has been fixed, tested on RHEL 5.6 i386 / x86_64 on packages:

python-qpid-0.7.946106-15.el5
qpid-cpp-client-0.7.946106-28.el5
qpid-cpp-client-devel-0.7.946106-28.el5
qpid-cpp-client-devel-docs-0.7.946106-28.el5
qpid-cpp-client-ssl-0.7.946106-28.el5
qpid-cpp-mrg-debuginfo-0.7.946106-28.el5
qpid-cpp-server-0.7.946106-28.el5
qpid-cpp-server-cluster-0.7.946106-28.el5
qpid-cpp-server-devel-0.7.946106-28.el5
qpid-cpp-server-ssl-0.7.946106-28.el5
qpid-cpp-server-store-0.7.946106-28.el5
qpid-cpp-server-xml-0.7.946106-28.el5
qpid-java-client-0.7.946106-15.el5
qpid-java-common-0.7.946106-15.el5
qpid-java-example-0.7.946106-15.el5
qpid-tools-0.7.946106-12.el5

VERIFIED
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0217.html