Bug 485422 - clustered qpidd with persistance - error Error decoding events: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:113)
Summary: clustered qpidd with persistance - error Error decoding events: framing-error...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All
OS: Linux
urgent
high
Target Milestone: 1.1.1
: ---
Assignee: Alan Conway
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Depends On: 485689
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-02-13 14:41 UTC by Frantisek Reznicek
Modified: 2015-11-16 00:06 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-04-21 16:16:07 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2009:0434 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging and Grid Version 1.1.1 2009-04-21 16:15:50 UTC

Description Frantisek Reznicek 2009-02-13 14:41:27 UTC
Description of problem:
The multihost clustered qpidd with persistance configuration suffer from 
error Error decoding events: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:113)
issue after few minutes of running my qpid_test_mnode_cluster test.

packages stored in:
mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_framing_issue090212.tar.bz2
mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_framing_issue090213.tar.bz2


Version-Release number of selected component (if applicable):
qpidd-0.4.743861-1.el5, rhm-0.4.3116-1.el5
and also
qpidd-0.4.743861-1.el5 rhm-0.4.3116-1.el5
How reproducible:
80%

Steps to Reproduce:
run qpid_test_mnode_cluster or maybe run perftest against clustered qpidd
1. extract qpid_test_mnode_cluster_framing_issue*.tar.bz2 to a directory X
2. find another 1-2 machines and create that directory X shared among 
   them via nfs
3. run s.sh on all three notes, wait untill all nodes get over compiling
   clients and finally start master m.sh on the node which is exporting dir X
4. wait approx 10 mins and it should hang, terminate all processes and
   investigate on qpidd logfiles  

Actual results:
error Error decoding events: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:113) issue

Expected results:
no issue is expected

Additional info:
 packages stored in:
mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_framing_issue090212.tar.bz2
qpidd with framing error:
  qpid_test_mnode_cluster/dell-pe2850-01.rhts.bos.redhat.com/j0003.sh.log:46

mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_framing_issue090213.tar.bz2
qpidd with framing error:
  qpid_test_mnode_cluster/dell-pesc430-03.rhts.bos.redhat.com/j0000.sh.log:2285657

Comment 1 Alan Conway 2009-02-14 01:00:04 UTC
This looks like a problem with unpatched version of openais. Do  you still see the problem with the following version: 

http://porkchop.redhat.com/brewroot/packages/openais/0.80.5/1.el5/

Comment 2 Frantisek Reznicek 2009-02-16 11:12:45 UTC
Not able to go further because of Bug 485689 encountered.

Comment 4 Frantisek Reznicek 2009-03-30 09:40:41 UTC
Retested on latest packages and the issue is still reproducible:

[root@intel-d3c4702-02 qpid_test_mnode_cluster]# tail intel-d3c4702-02.rhts.bos.redhat.com/j0000.sh.log
2009-mar-30 05:24:26 info 10.16.64.100:4868(READY) member update: 10.16.64.99:5729(member) 10.16.64.100:4868(member)
2009-mar-30 05:25:35 info 10.16.64.100:4868(OFFER) send update-offer to 10.16.64.58:3255
2009-mar-30 05:25:35 info 10.16.64.100:4868(READY) cancelled update offer to 10.16.64.58:3255
2009-mar-30 05:27:39 info 10.16.64.100:4868(READY) member update: 10.16.64.58:3255(member) 10.16.64.99:5729(member) 10.16.64.100:4868(member)
2009-mar-30 05:30:55 info 10.16.64.100:4868(READY) member update: 10.16.64.58:3255(member) 10.16.64.100:4868(member)
2009-mar-30 05:31:19 info 10.16.64.100:4868(READY) member update: 10.16.64.100:4868(member)
2009-mar-30 05:31:19 notice 10.16.64.100:4868(READY) last broker standing, update queue policies
2009-mar-30 05:31:19 error Error decoding events: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:113)
2009-mar-30 05:31:19 notice 10.16.64.100:4868(LEFT) leaving cluster qpid_mnode_cluster_test
j0000.sh: line 1:  4868 Segmentation fault      (core dumped) qpidd --auth no --log-enable info+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0000 -p 0


[root@intel-d3c4702-02 qpid_test_mnode_cluster]# gdb `which qpidd` ./intel-d3c4702-02.rhts.bos.redhat.com/core.4868
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
Reading symbols from /usr/lib64/libqpidbroker.so.0...Reading symbols from /usr/lib/debug/usr/lib64/libqpidbroker.so.0.1.0.debug...done.
done.
...
Loaded symbols for /usr/lib64/qpid/daemon/acl.so
Core was generated by `qpidd --auth no --log-enable info+ --cluster-name qpid_mnode_cluster_test --dat'.
Program terminated with signal 11, Segmentation fault.
[New process 4868]
[New process 4873]
[New process 4871]
[New process 4870]
[New process 4869]
#0  0x00002b2fee3e27e4 in memcpy () from /lib64/libc.so.6
(gdb) thread apply all bt

Thread 5 (process 4869):
#0  0x00002b2fee6cae74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00002b2fee6c68c5 in _L_lock_965 () from /lib64/libpthread.so.0
#2  0x00002b2fee6c6727 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00002b2fec7deb90 in qpid::sys::Mutex::lock (this=0x2b2ff322a118) at qpid/sys/posix/Mutex.h:116
#4  0x00002b2fec8be93b in qpid::management::ManagementBroker::periodicProcessing (this=0x2b2ff322a010) at qpid/sys/Mutex.h:33
#5  0x00002b2fec8bf558 in qpid::management::ManagementBroker::Periodic::fire (this=0x2aaac2d73dd0) at qpid/management/ManagementBroker.cpp:252
#6  0x00002b2fec8a48ca in qpid::broker::Timer::run (this=0x2b2ff322a140) at qpid/broker/Timer.cpp:67
#7  0x00002b2fecc8ec4a in runRunnable (p=0x2b2ff322a118) at qpid/sys/posix/Thread.cpp:35
#8  0x00002b2fee6c4367 in start_thread () from /lib64/libpthread.so.0
#9  0x00002b2fee43a0ad in clone () from /lib64/libc.so.6

Thread 4 (process 4870):
#0  0x00002b2fee6c8b00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b2fec8a4c0f in qpid::broker::Timer::run (this=0x1a9cad68) at qpid/sys/posix/Condition.h:69
#2  0x00002b2fecc8ec4a in runRunnable (p=0x1a9cad9c) at qpid/sys/posix/Thread.cpp:35
#3  0x00002b2fee6c4367 in start_thread () from /lib64/libpthread.so.0
#4  0x00002b2fee43a0ad in clone () from /lib64/libc.so.6

Thread 3 (process 4871):
#0  0x00002b2fee6c8b00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b2fec8a4c0f in qpid::broker::Timer::run (this=0x1a9cae30) at qpid/sys/posix/Condition.h:69
#2  0x00002b2fecc8ec4a in runRunnable (p=0x1a9cae64) at qpid/sys/posix/Thread.cpp:35
#3  0x00002b2fee6c4367 in start_thread () from /lib64/libpthread.so.0
#4  0x00002b2fee43a0ad in clone () from /lib64/libc.so.6

Thread 2 (process 4873):
#0  0x00002b2fee6c8b00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b2fec8a4c0f in qpid::broker::Timer::run (this=0x1a9d8ae0) at qpid/sys/posix/Condition.h:69
#2  0x00002b2fecc8ec4a in runRunnable (p=0x1a9d8b14) at qpid/sys/posix/Thread.cpp:35
#3  0x00002b2fee6c4367 in start_thread () from /lib64/libpthread.so.0
#4  0x00002b2fee43a0ad in clone () from /lib64/libc.so.6

Thread 1 (process 4868):
#0  0x00002b2fee3e27e4 in memcpy () from /lib64/libc.so.6
#1  0x00002b2fedc71fa8 in std::string::append () from /usr/lib64/libstdc++.so.6
#2  0x000000000040a7df in std::operator+<char, std::char_traits<char>, std::allocator<char> > (__lhs=0x2b2fec8d0d6b "console.event.1.0.",
    __rhs=@0x2b2fecb219f8) at /usr/include/c++/4.1.2/bits/basic_string.tcc:683
---Type <return> to continue, or q <return> to quit---
#3  0x00002b2fec8b448b in qpid::management::ManagementBroker::raiseEvent (this=0x2b2ff322a010, event=@0x7fffbe5a73b0,
    severity=<value optimized out>) at qpid/management/ManagementBroker.cpp:241
#4  0x00002b2fec815fd2 in ~Connection (this=0x2aaacfa61c28) at qpid/broker/Connection.cpp:93
#5  0x00002b2fef1ab220 in ~Connection (this=0x2aaacfa61b00) at qpid/cluster/Connection.cpp:101
#6  0x00002b2fef18493a in ~Cluster (this=0x1a9cd5b0) at qpid/RefCounted.h:42
#7  0x00002b2fef17bff5 in qpid::cluster::Cluster::brokerShutdown (this=0x1a9cd5b0) at qpid/cluster/Cluster.cpp:416
#8  0x00002b2fecca9a2f in boost::function0<void, std::allocator<void> >::operator() (this=0x0)
    at /usr/include/boost/function/function_template.hpp:576
#9  0x00002b2fecca98b6 in std::for_each<__gnu_cxx::__normal_iterator<boost::function<void ()(), std::allocator<void> >*, std::vector<boost::function<void ()(), std::allocator<void> >, std::allocator<boost::function<void ()(), std::allocator<void> > > > >, void (*)(boost::function<void ()(), std::allocator<void> >)> (__first=<value optimized out>, __last={_M_current = 0x1a9db0d0}, __f=0x2b2fecca8fe0 <invoke>)
    at /usr/include/c++/4.1.2/bits/stl_algo.h:159
#10 0x00002b2fecca8f80 in qpid::Plugin::Target::finalize (this=0x1a9ca9a8) at qpid/Plugin.cpp:45
#11 0x00002b2fec7e5ae9 in ~Broker (this=0x1a9ca9a0) at qpid/broker/Broker.cpp:337
#12 0x00002b2fee39a63e in __cxa_finalize () from /lib64/libc.so.6
#13 0x00002b2fec7a4d86 in __do_global_dtors_aux () from /usr/lib64/libqpidbroker.so.0
#14 0x0000000000000000 in ?? ()
(gdb) bt
#0  0x00002b2fee3e27e4 in memcpy () from /lib64/libc.so.6
#1  0x00002b2fedc71fa8 in std::string::append () from /usr/lib64/libstdc++.so.6
#2  0x000000000040a7df in std::operator+<char, std::char_traits<char>, std::allocator<char> > (__lhs=0x2b2fec8d0d6b "console.event.1.0.",
    __rhs=@0x2b2fecb219f8) at /usr/include/c++/4.1.2/bits/basic_string.tcc:683
#3  0x00002b2fec8b448b in qpid::management::ManagementBroker::raiseEvent (this=0x2b2ff322a010, event=@0x7fffbe5a73b0,
    severity=<value optimized out>) at qpid/management/ManagementBroker.cpp:241
#4  0x00002b2fec815fd2 in ~Connection (this=0x2aaacfa61c28) at qpid/broker/Connection.cpp:93
#5  0x00002b2fef1ab220 in ~Connection (this=0x2aaacfa61b00) at qpid/cluster/Connection.cpp:101
#6  0x00002b2fef18493a in ~Cluster (this=0x1a9cd5b0) at qpid/RefCounted.h:42
#7  0x00002b2fef17bff5 in qpid::cluster::Cluster::brokerShutdown (this=0x1a9cd5b0) at qpid/cluster/Cluster.cpp:416
#8  0x00002b2fecca9a2f in boost::function0<void, std::allocator<void> >::operator() (this=0x0)
    at /usr/include/boost/function/function_template.hpp:576
#9  0x00002b2fecca98b6 in std::for_each<__gnu_cxx::__normal_iterator<boost::function<void ()(), std::allocator<void> >*, std::vector<boost::function<void ()(), std::allocator<void> >, std::allocator<boost::function<void ()(), std::allocator<void> > > > >, void (*)(boost::function<void ()(), std::allocator<void> >)> (__first=<value optimized out>, __last={_M_current = 0x1a9db0d0}, __f=0x2b2fecca8fe0 <invoke>)
    at /usr/include/c++/4.1.2/bits/stl_algo.h:159
#10 0x00002b2fecca8f80 in qpid::Plugin::Target::finalize (this=0x1a9ca9a8) at qpid/Plugin.cpp:45
#11 0x00002b2fec7e5ae9 in ~Broker (this=0x1a9ca9a0) at qpid/broker/Broker.cpp:337
#12 0x00002b2fee39a63e in __cxa_finalize () from /lib64/libc.so.6
#13 0x00002b2fec7a4d86 in __do_global_dtors_aux () from /usr/lib64/libqpidbroker.so.0
#14 0x0000000000000000 in ?? ()
(gdb) quit


package list:
[root@intel-d3c4702-02 qpid_test_mnode_cluster]# rpm -qa  | egrep '(qpid|rhm|openai)' | sort -u
openais-0.80.3-22.el5_3.4
openais-debuginfo-0.80.3-22.el5_3.4
openais-devel-0.80.3-22.el5_3.4
python-qpid-0.5.758389-1.el5
qpidc-0.5.752581-3.el5
qpidc-debuginfo-0.5.752581-3.el5
qpidc-devel-0.5.752581-3.el5
qpidc-perftest-0.5.752581-3.el5
qpidc-rdma-0.5.752581-3.el5
qpidc-ssl-0.5.752581-3.el5
qpidd-0.5.752581-3.el5
qpidd-acl-0.5.752581-3.el5
qpidd-cluster-0.5.752581-3.el5
qpidd-devel-0.5.752581-3.el5
qpidd-rdma-0.5.752581-3.el5
qpidd-ssl-0.5.752581-3.el5
qpidd-xml-0.5.752581-3.el5
qpid-java-client-0.5.751061-1.el5
qpid-java-common-0.5.751061-1.el5
rhm-0.5.3206-1.el5
rhm-docs-0.5.756148-1.el5

Comment 5 Gordon Sim 2009-03-30 12:56:43 UTC
Marking this urgent until we understand what the cause is more clearly and therefore can assess likely impact.

Comment 6 Frantisek Reznicek 2009-04-01 07:26:24 UTC
I partially succeeded to reproduce the issue:
part a] framing error reproduced
part b] qpidd crash not reproduced

The part a] data are stored in mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_fail090401.tar.bz2

the test logfile is here mnt/sharedfs/qpid_test_mnode_cluster/server.log
the qpidd with framing error is this one:
mnt/sharedfs/qpid_test_mnode_cluster/intel-d3x1311-01.rhts.bos.redhat.com/j0000.sh.log

all qpidd instances started during test are:
intel-d3c4702-01.rhts.bos.redhat.com/j0000.sh.log:2009-mar-31 12:50:26 notice Listening on TCP port 38872
intel-d3c4702-02.rhts.bos.redhat.com/j0000.sh.log:2009-mar-31 12:51:15 notice Listening on TCP port 57386
intel-d3c4702-02.rhts.bos.redhat.com/j0007.sh.log:2009-mar-31 12:51:55 notice Listening on TCP port 54610
intel-d3x1311-01.rhts.bos.redhat.com/j0000.sh.log:2009-mar-31 12:50:15 notice Listening on TCP port 47148

Comment 7 Frantisek Reznicek 2009-04-02 08:17:01 UTC
Retested on mrg7-9 in clean network environment and neither framing-error nor qpidd crash detected. The test was looping for about 5x longer that longest run.

I propose to put to VERIFIED based on this test.

Could you review Alan?

Comment 8 Alan Conway 2009-04-02 14:20:49 UTC
Agreed.

Comment 10 errata-xmlrpc 2009-04-21 16:16:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html


Note You need to log in before you can comment on or make changes to this bug.