Description of problem: The multihost clustered qpidd with persistance configuration suffer from error Error decoding events: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:113) issue after few minutes of running my qpid_test_mnode_cluster test. packages stored in: mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_framing_issue090212.tar.bz2 mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_framing_issue090213.tar.bz2 Version-Release number of selected component (if applicable): qpidd-0.4.743861-1.el5, rhm-0.4.3116-1.el5 and also qpidd-0.4.743861-1.el5 rhm-0.4.3116-1.el5 How reproducible: 80% Steps to Reproduce: run qpid_test_mnode_cluster or maybe run perftest against clustered qpidd 1. extract qpid_test_mnode_cluster_framing_issue*.tar.bz2 to a directory X 2. find another 1-2 machines and create that directory X shared among them via nfs 3. run s.sh on all three notes, wait untill all nodes get over compiling clients and finally start master m.sh on the node which is exporting dir X 4. wait approx 10 mins and it should hang, terminate all processes and investigate on qpidd logfiles Actual results: error Error decoding events: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:113) issue Expected results: no issue is expected Additional info: packages stored in: mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_framing_issue090212.tar.bz2 qpidd with framing error: qpid_test_mnode_cluster/dell-pe2850-01.rhts.bos.redhat.com/j0003.sh.log:46 mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_framing_issue090213.tar.bz2 qpidd with framing error: qpid_test_mnode_cluster/dell-pesc430-03.rhts.bos.redhat.com/j0000.sh.log:2285657
This looks like a problem with unpatched version of openais. Do you still see the problem with the following version: http://porkchop.redhat.com/brewroot/packages/openais/0.80.5/1.el5/
Not able to go further because of Bug 485689 encountered.
Retested on latest packages and the issue is still reproducible: [root@intel-d3c4702-02 qpid_test_mnode_cluster]# tail intel-d3c4702-02.rhts.bos.redhat.com/j0000.sh.log 2009-mar-30 05:24:26 info 10.16.64.100:4868(READY) member update: 10.16.64.99:5729(member) 10.16.64.100:4868(member) 2009-mar-30 05:25:35 info 10.16.64.100:4868(OFFER) send update-offer to 10.16.64.58:3255 2009-mar-30 05:25:35 info 10.16.64.100:4868(READY) cancelled update offer to 10.16.64.58:3255 2009-mar-30 05:27:39 info 10.16.64.100:4868(READY) member update: 10.16.64.58:3255(member) 10.16.64.99:5729(member) 10.16.64.100:4868(member) 2009-mar-30 05:30:55 info 10.16.64.100:4868(READY) member update: 10.16.64.58:3255(member) 10.16.64.100:4868(member) 2009-mar-30 05:31:19 info 10.16.64.100:4868(READY) member update: 10.16.64.100:4868(member) 2009-mar-30 05:31:19 notice 10.16.64.100:4868(READY) last broker standing, update queue policies 2009-mar-30 05:31:19 error Error decoding events: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:113) 2009-mar-30 05:31:19 notice 10.16.64.100:4868(LEFT) leaving cluster qpid_mnode_cluster_test j0000.sh: line 1: 4868 Segmentation fault (core dumped) qpidd --auth no --log-enable info+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0000 -p 0 [root@intel-d3c4702-02 qpid_test_mnode_cluster]# gdb `which qpidd` ./intel-d3c4702-02.rhts.bos.redhat.com/core.4868 GNU gdb Fedora (6.8-27.el5) Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu"... Reading symbols from /usr/lib64/libqpidbroker.so.0...Reading symbols from /usr/lib/debug/usr/lib64/libqpidbroker.so.0.1.0.debug...done. done. ... Loaded symbols for /usr/lib64/qpid/daemon/acl.so Core was generated by `qpidd --auth no --log-enable info+ --cluster-name qpid_mnode_cluster_test --dat'. Program terminated with signal 11, Segmentation fault. [New process 4868] [New process 4873] [New process 4871] [New process 4870] [New process 4869] #0 0x00002b2fee3e27e4 in memcpy () from /lib64/libc.so.6 (gdb) thread apply all bt Thread 5 (process 4869): #0 0x00002b2fee6cae74 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00002b2fee6c68c5 in _L_lock_965 () from /lib64/libpthread.so.0 #2 0x00002b2fee6c6727 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00002b2fec7deb90 in qpid::sys::Mutex::lock (this=0x2b2ff322a118) at qpid/sys/posix/Mutex.h:116 #4 0x00002b2fec8be93b in qpid::management::ManagementBroker::periodicProcessing (this=0x2b2ff322a010) at qpid/sys/Mutex.h:33 #5 0x00002b2fec8bf558 in qpid::management::ManagementBroker::Periodic::fire (this=0x2aaac2d73dd0) at qpid/management/ManagementBroker.cpp:252 #6 0x00002b2fec8a48ca in qpid::broker::Timer::run (this=0x2b2ff322a140) at qpid/broker/Timer.cpp:67 #7 0x00002b2fecc8ec4a in runRunnable (p=0x2b2ff322a118) at qpid/sys/posix/Thread.cpp:35 #8 0x00002b2fee6c4367 in start_thread () from /lib64/libpthread.so.0 #9 0x00002b2fee43a0ad in clone () from /lib64/libc.so.6 Thread 4 (process 4870): #0 0x00002b2fee6c8b00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002b2fec8a4c0f in qpid::broker::Timer::run (this=0x1a9cad68) at qpid/sys/posix/Condition.h:69 #2 0x00002b2fecc8ec4a in runRunnable (p=0x1a9cad9c) at qpid/sys/posix/Thread.cpp:35 #3 0x00002b2fee6c4367 in start_thread () from /lib64/libpthread.so.0 #4 0x00002b2fee43a0ad in clone () from /lib64/libc.so.6 Thread 3 (process 4871): #0 0x00002b2fee6c8b00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002b2fec8a4c0f in qpid::broker::Timer::run (this=0x1a9cae30) at qpid/sys/posix/Condition.h:69 #2 0x00002b2fecc8ec4a in runRunnable (p=0x1a9cae64) at qpid/sys/posix/Thread.cpp:35 #3 0x00002b2fee6c4367 in start_thread () from /lib64/libpthread.so.0 #4 0x00002b2fee43a0ad in clone () from /lib64/libc.so.6 Thread 2 (process 4873): #0 0x00002b2fee6c8b00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002b2fec8a4c0f in qpid::broker::Timer::run (this=0x1a9d8ae0) at qpid/sys/posix/Condition.h:69 #2 0x00002b2fecc8ec4a in runRunnable (p=0x1a9d8b14) at qpid/sys/posix/Thread.cpp:35 #3 0x00002b2fee6c4367 in start_thread () from /lib64/libpthread.so.0 #4 0x00002b2fee43a0ad in clone () from /lib64/libc.so.6 Thread 1 (process 4868): #0 0x00002b2fee3e27e4 in memcpy () from /lib64/libc.so.6 #1 0x00002b2fedc71fa8 in std::string::append () from /usr/lib64/libstdc++.so.6 #2 0x000000000040a7df in std::operator+<char, std::char_traits<char>, std::allocator<char> > (__lhs=0x2b2fec8d0d6b "console.event.1.0.", __rhs=@0x2b2fecb219f8) at /usr/include/c++/4.1.2/bits/basic_string.tcc:683 ---Type <return> to continue, or q <return> to quit--- #3 0x00002b2fec8b448b in qpid::management::ManagementBroker::raiseEvent (this=0x2b2ff322a010, event=@0x7fffbe5a73b0, severity=<value optimized out>) at qpid/management/ManagementBroker.cpp:241 #4 0x00002b2fec815fd2 in ~Connection (this=0x2aaacfa61c28) at qpid/broker/Connection.cpp:93 #5 0x00002b2fef1ab220 in ~Connection (this=0x2aaacfa61b00) at qpid/cluster/Connection.cpp:101 #6 0x00002b2fef18493a in ~Cluster (this=0x1a9cd5b0) at qpid/RefCounted.h:42 #7 0x00002b2fef17bff5 in qpid::cluster::Cluster::brokerShutdown (this=0x1a9cd5b0) at qpid/cluster/Cluster.cpp:416 #8 0x00002b2fecca9a2f in boost::function0<void, std::allocator<void> >::operator() (this=0x0) at /usr/include/boost/function/function_template.hpp:576 #9 0x00002b2fecca98b6 in std::for_each<__gnu_cxx::__normal_iterator<boost::function<void ()(), std::allocator<void> >*, std::vector<boost::function<void ()(), std::allocator<void> >, std::allocator<boost::function<void ()(), std::allocator<void> > > > >, void (*)(boost::function<void ()(), std::allocator<void> >)> (__first=<value optimized out>, __last={_M_current = 0x1a9db0d0}, __f=0x2b2fecca8fe0 <invoke>) at /usr/include/c++/4.1.2/bits/stl_algo.h:159 #10 0x00002b2fecca8f80 in qpid::Plugin::Target::finalize (this=0x1a9ca9a8) at qpid/Plugin.cpp:45 #11 0x00002b2fec7e5ae9 in ~Broker (this=0x1a9ca9a0) at qpid/broker/Broker.cpp:337 #12 0x00002b2fee39a63e in __cxa_finalize () from /lib64/libc.so.6 #13 0x00002b2fec7a4d86 in __do_global_dtors_aux () from /usr/lib64/libqpidbroker.so.0 #14 0x0000000000000000 in ?? () (gdb) bt #0 0x00002b2fee3e27e4 in memcpy () from /lib64/libc.so.6 #1 0x00002b2fedc71fa8 in std::string::append () from /usr/lib64/libstdc++.so.6 #2 0x000000000040a7df in std::operator+<char, std::char_traits<char>, std::allocator<char> > (__lhs=0x2b2fec8d0d6b "console.event.1.0.", __rhs=@0x2b2fecb219f8) at /usr/include/c++/4.1.2/bits/basic_string.tcc:683 #3 0x00002b2fec8b448b in qpid::management::ManagementBroker::raiseEvent (this=0x2b2ff322a010, event=@0x7fffbe5a73b0, severity=<value optimized out>) at qpid/management/ManagementBroker.cpp:241 #4 0x00002b2fec815fd2 in ~Connection (this=0x2aaacfa61c28) at qpid/broker/Connection.cpp:93 #5 0x00002b2fef1ab220 in ~Connection (this=0x2aaacfa61b00) at qpid/cluster/Connection.cpp:101 #6 0x00002b2fef18493a in ~Cluster (this=0x1a9cd5b0) at qpid/RefCounted.h:42 #7 0x00002b2fef17bff5 in qpid::cluster::Cluster::brokerShutdown (this=0x1a9cd5b0) at qpid/cluster/Cluster.cpp:416 #8 0x00002b2fecca9a2f in boost::function0<void, std::allocator<void> >::operator() (this=0x0) at /usr/include/boost/function/function_template.hpp:576 #9 0x00002b2fecca98b6 in std::for_each<__gnu_cxx::__normal_iterator<boost::function<void ()(), std::allocator<void> >*, std::vector<boost::function<void ()(), std::allocator<void> >, std::allocator<boost::function<void ()(), std::allocator<void> > > > >, void (*)(boost::function<void ()(), std::allocator<void> >)> (__first=<value optimized out>, __last={_M_current = 0x1a9db0d0}, __f=0x2b2fecca8fe0 <invoke>) at /usr/include/c++/4.1.2/bits/stl_algo.h:159 #10 0x00002b2fecca8f80 in qpid::Plugin::Target::finalize (this=0x1a9ca9a8) at qpid/Plugin.cpp:45 #11 0x00002b2fec7e5ae9 in ~Broker (this=0x1a9ca9a0) at qpid/broker/Broker.cpp:337 #12 0x00002b2fee39a63e in __cxa_finalize () from /lib64/libc.so.6 #13 0x00002b2fec7a4d86 in __do_global_dtors_aux () from /usr/lib64/libqpidbroker.so.0 #14 0x0000000000000000 in ?? () (gdb) quit package list: [root@intel-d3c4702-02 qpid_test_mnode_cluster]# rpm -qa | egrep '(qpid|rhm|openai)' | sort -u openais-0.80.3-22.el5_3.4 openais-debuginfo-0.80.3-22.el5_3.4 openais-devel-0.80.3-22.el5_3.4 python-qpid-0.5.758389-1.el5 qpidc-0.5.752581-3.el5 qpidc-debuginfo-0.5.752581-3.el5 qpidc-devel-0.5.752581-3.el5 qpidc-perftest-0.5.752581-3.el5 qpidc-rdma-0.5.752581-3.el5 qpidc-ssl-0.5.752581-3.el5 qpidd-0.5.752581-3.el5 qpidd-acl-0.5.752581-3.el5 qpidd-cluster-0.5.752581-3.el5 qpidd-devel-0.5.752581-3.el5 qpidd-rdma-0.5.752581-3.el5 qpidd-ssl-0.5.752581-3.el5 qpidd-xml-0.5.752581-3.el5 qpid-java-client-0.5.751061-1.el5 qpid-java-common-0.5.751061-1.el5 rhm-0.5.3206-1.el5 rhm-docs-0.5.756148-1.el5
Marking this urgent until we understand what the cause is more clearly and therefore can assess likely impact.
I partially succeeded to reproduce the issue: part a] framing error reproduced part b] qpidd crash not reproduced The part a] data are stored in mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_fail090401.tar.bz2 the test logfile is here mnt/sharedfs/qpid_test_mnode_cluster/server.log the qpidd with framing error is this one: mnt/sharedfs/qpid_test_mnode_cluster/intel-d3x1311-01.rhts.bos.redhat.com/j0000.sh.log all qpidd instances started during test are: intel-d3c4702-01.rhts.bos.redhat.com/j0000.sh.log:2009-mar-31 12:50:26 notice Listening on TCP port 38872 intel-d3c4702-02.rhts.bos.redhat.com/j0000.sh.log:2009-mar-31 12:51:15 notice Listening on TCP port 57386 intel-d3c4702-02.rhts.bos.redhat.com/j0007.sh.log:2009-mar-31 12:51:55 notice Listening on TCP port 54610 intel-d3x1311-01.rhts.bos.redhat.com/j0000.sh.log:2009-mar-31 12:50:15 notice Listening on TCP port 47148
Retested on mrg7-9 in clean network environment and neither framing-error nor qpidd crash detected. The test was looping for about 5x longer that longest run. I propose to put to VERIFIED based on this test. Could you review Alan?
Agreed.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-0434.html