Bug 485422 - clustered qpidd with persistance - error Error decoding events: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:113)
clustered qpidd with persistance - error Error decoding events: framing-error...
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp (Show other bugs)
1.1
All Linux
urgent Severity high
: 1.1.1
: ---
Assigned To: Alan Conway
Frantisek Reznicek
:
Depends On: 485689
Blocks:
  Show dependency treegraph
 
Reported: 2009-02-13 09:41 EST by Frantisek Reznicek
Modified: 2015-11-15 19:06 EST (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-04-21 12:16:07 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Frantisek Reznicek 2009-02-13 09:41:27 EST
Description of problem:
The multihost clustered qpidd with persistance configuration suffer from 
error Error decoding events: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:113)
issue after few minutes of running my qpid_test_mnode_cluster test.

packages stored in:
mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_framing_issue090212.tar.bz2
mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_framing_issue090213.tar.bz2


Version-Release number of selected component (if applicable):
qpidd-0.4.743861-1.el5, rhm-0.4.3116-1.el5
and also
qpidd-0.4.743861-1.el5 rhm-0.4.3116-1.el5
How reproducible:
80%

Steps to Reproduce:
run qpid_test_mnode_cluster or maybe run perftest against clustered qpidd
1. extract qpid_test_mnode_cluster_framing_issue*.tar.bz2 to a directory X
2. find another 1-2 machines and create that directory X shared among 
   them via nfs
3. run s.sh on all three notes, wait untill all nodes get over compiling
   clients and finally start master m.sh on the node which is exporting dir X
4. wait approx 10 mins and it should hang, terminate all processes and
   investigate on qpidd logfiles  

Actual results:
error Error decoding events: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:113) issue

Expected results:
no issue is expected

Additional info:
 packages stored in:
mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_framing_issue090212.tar.bz2
qpidd with framing error:
  qpid_test_mnode_cluster/dell-pe2850-01.rhts.bos.redhat.com/j0003.sh.log:46

mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_framing_issue090213.tar.bz2
qpidd with framing error:
  qpid_test_mnode_cluster/dell-pesc430-03.rhts.bos.redhat.com/j0000.sh.log:2285657
Comment 1 Alan Conway 2009-02-13 20:00:04 EST
This looks like a problem with unpatched version of openais. Do  you still see the problem with the following version: 

http://porkchop.redhat.com/brewroot/packages/openais/0.80.5/1.el5/
Comment 2 Frantisek Reznicek 2009-02-16 06:12:45 EST
Not able to go further because of Bug 485689 encountered.
Comment 4 Frantisek Reznicek 2009-03-30 05:40:41 EDT
Retested on latest packages and the issue is still reproducible:

[root@intel-d3c4702-02 qpid_test_mnode_cluster]# tail intel-d3c4702-02.rhts.bos.redhat.com/j0000.sh.log
2009-mar-30 05:24:26 info 10.16.64.100:4868(READY) member update: 10.16.64.99:5729(member) 10.16.64.100:4868(member)
2009-mar-30 05:25:35 info 10.16.64.100:4868(OFFER) send update-offer to 10.16.64.58:3255
2009-mar-30 05:25:35 info 10.16.64.100:4868(READY) cancelled update offer to 10.16.64.58:3255
2009-mar-30 05:27:39 info 10.16.64.100:4868(READY) member update: 10.16.64.58:3255(member) 10.16.64.99:5729(member) 10.16.64.100:4868(member)
2009-mar-30 05:30:55 info 10.16.64.100:4868(READY) member update: 10.16.64.58:3255(member) 10.16.64.100:4868(member)
2009-mar-30 05:31:19 info 10.16.64.100:4868(READY) member update: 10.16.64.100:4868(member)
2009-mar-30 05:31:19 notice 10.16.64.100:4868(READY) last broker standing, update queue policies
2009-mar-30 05:31:19 error Error decoding events: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:113)
2009-mar-30 05:31:19 notice 10.16.64.100:4868(LEFT) leaving cluster qpid_mnode_cluster_test
j0000.sh: line 1:  4868 Segmentation fault      (core dumped) qpidd --auth no --log-enable info+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0000 -p 0


[root@intel-d3c4702-02 qpid_test_mnode_cluster]# gdb `which qpidd` ./intel-d3c4702-02.rhts.bos.redhat.com/core.4868
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
Reading symbols from /usr/lib64/libqpidbroker.so.0...Reading symbols from /usr/lib/debug/usr/lib64/libqpidbroker.so.0.1.0.debug...done.
done.
...
Loaded symbols for /usr/lib64/qpid/daemon/acl.so
Core was generated by `qpidd --auth no --log-enable info+ --cluster-name qpid_mnode_cluster_test --dat'.
Program terminated with signal 11, Segmentation fault.
[New process 4868]
[New process 4873]
[New process 4871]
[New process 4870]
[New process 4869]
#0  0x00002b2fee3e27e4 in memcpy () from /lib64/libc.so.6
(gdb) thread apply all bt

Thread 5 (process 4869):
#0  0x00002b2fee6cae74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00002b2fee6c68c5 in _L_lock_965 () from /lib64/libpthread.so.0
#2  0x00002b2fee6c6727 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00002b2fec7deb90 in qpid::sys::Mutex::lock (this=0x2b2ff322a118) at qpid/sys/posix/Mutex.h:116
#4  0x00002b2fec8be93b in qpid::management::ManagementBroker::periodicProcessing (this=0x2b2ff322a010) at qpid/sys/Mutex.h:33
#5  0x00002b2fec8bf558 in qpid::management::ManagementBroker::Periodic::fire (this=0x2aaac2d73dd0) at qpid/management/ManagementBroker.cpp:252
#6  0x00002b2fec8a48ca in qpid::broker::Timer::run (this=0x2b2ff322a140) at qpid/broker/Timer.cpp:67
#7  0x00002b2fecc8ec4a in runRunnable (p=0x2b2ff322a118) at qpid/sys/posix/Thread.cpp:35
#8  0x00002b2fee6c4367 in start_thread () from /lib64/libpthread.so.0
#9  0x00002b2fee43a0ad in clone () from /lib64/libc.so.6

Thread 4 (process 4870):
#0  0x00002b2fee6c8b00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b2fec8a4c0f in qpid::broker::Timer::run (this=0x1a9cad68) at qpid/sys/posix/Condition.h:69
#2  0x00002b2fecc8ec4a in runRunnable (p=0x1a9cad9c) at qpid/sys/posix/Thread.cpp:35
#3  0x00002b2fee6c4367 in start_thread () from /lib64/libpthread.so.0
#4  0x00002b2fee43a0ad in clone () from /lib64/libc.so.6

Thread 3 (process 4871):
#0  0x00002b2fee6c8b00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b2fec8a4c0f in qpid::broker::Timer::run (this=0x1a9cae30) at qpid/sys/posix/Condition.h:69
#2  0x00002b2fecc8ec4a in runRunnable (p=0x1a9cae64) at qpid/sys/posix/Thread.cpp:35
#3  0x00002b2fee6c4367 in start_thread () from /lib64/libpthread.so.0
#4  0x00002b2fee43a0ad in clone () from /lib64/libc.so.6

Thread 2 (process 4873):
#0  0x00002b2fee6c8b00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b2fec8a4c0f in qpid::broker::Timer::run (this=0x1a9d8ae0) at qpid/sys/posix/Condition.h:69
#2  0x00002b2fecc8ec4a in runRunnable (p=0x1a9d8b14) at qpid/sys/posix/Thread.cpp:35
#3  0x00002b2fee6c4367 in start_thread () from /lib64/libpthread.so.0
#4  0x00002b2fee43a0ad in clone () from /lib64/libc.so.6

Thread 1 (process 4868):
#0  0x00002b2fee3e27e4 in memcpy () from /lib64/libc.so.6
#1  0x00002b2fedc71fa8 in std::string::append () from /usr/lib64/libstdc++.so.6
#2  0x000000000040a7df in std::operator+<char, std::char_traits<char>, std::allocator<char> > (__lhs=0x2b2fec8d0d6b "console.event.1.0.",
    __rhs=@0x2b2fecb219f8) at /usr/include/c++/4.1.2/bits/basic_string.tcc:683
---Type <return> to continue, or q <return> to quit---
#3  0x00002b2fec8b448b in qpid::management::ManagementBroker::raiseEvent (this=0x2b2ff322a010, event=@0x7fffbe5a73b0,
    severity=<value optimized out>) at qpid/management/ManagementBroker.cpp:241
#4  0x00002b2fec815fd2 in ~Connection (this=0x2aaacfa61c28) at qpid/broker/Connection.cpp:93
#5  0x00002b2fef1ab220 in ~Connection (this=0x2aaacfa61b00) at qpid/cluster/Connection.cpp:101
#6  0x00002b2fef18493a in ~Cluster (this=0x1a9cd5b0) at qpid/RefCounted.h:42
#7  0x00002b2fef17bff5 in qpid::cluster::Cluster::brokerShutdown (this=0x1a9cd5b0) at qpid/cluster/Cluster.cpp:416
#8  0x00002b2fecca9a2f in boost::function0<void, std::allocator<void> >::operator() (this=0x0)
    at /usr/include/boost/function/function_template.hpp:576
#9  0x00002b2fecca98b6 in std::for_each<__gnu_cxx::__normal_iterator<boost::function<void ()(), std::allocator<void> >*, std::vector<boost::function<void ()(), std::allocator<void> >, std::allocator<boost::function<void ()(), std::allocator<void> > > > >, void (*)(boost::function<void ()(), std::allocator<void> >)> (__first=<value optimized out>, __last={_M_current = 0x1a9db0d0}, __f=0x2b2fecca8fe0 <invoke>)
    at /usr/include/c++/4.1.2/bits/stl_algo.h:159
#10 0x00002b2fecca8f80 in qpid::Plugin::Target::finalize (this=0x1a9ca9a8) at qpid/Plugin.cpp:45
#11 0x00002b2fec7e5ae9 in ~Broker (this=0x1a9ca9a0) at qpid/broker/Broker.cpp:337
#12 0x00002b2fee39a63e in __cxa_finalize () from /lib64/libc.so.6
#13 0x00002b2fec7a4d86 in __do_global_dtors_aux () from /usr/lib64/libqpidbroker.so.0
#14 0x0000000000000000 in ?? ()
(gdb) bt
#0  0x00002b2fee3e27e4 in memcpy () from /lib64/libc.so.6
#1  0x00002b2fedc71fa8 in std::string::append () from /usr/lib64/libstdc++.so.6
#2  0x000000000040a7df in std::operator+<char, std::char_traits<char>, std::allocator<char> > (__lhs=0x2b2fec8d0d6b "console.event.1.0.",
    __rhs=@0x2b2fecb219f8) at /usr/include/c++/4.1.2/bits/basic_string.tcc:683
#3  0x00002b2fec8b448b in qpid::management::ManagementBroker::raiseEvent (this=0x2b2ff322a010, event=@0x7fffbe5a73b0,
    severity=<value optimized out>) at qpid/management/ManagementBroker.cpp:241
#4  0x00002b2fec815fd2 in ~Connection (this=0x2aaacfa61c28) at qpid/broker/Connection.cpp:93
#5  0x00002b2fef1ab220 in ~Connection (this=0x2aaacfa61b00) at qpid/cluster/Connection.cpp:101
#6  0x00002b2fef18493a in ~Cluster (this=0x1a9cd5b0) at qpid/RefCounted.h:42
#7  0x00002b2fef17bff5 in qpid::cluster::Cluster::brokerShutdown (this=0x1a9cd5b0) at qpid/cluster/Cluster.cpp:416
#8  0x00002b2fecca9a2f in boost::function0<void, std::allocator<void> >::operator() (this=0x0)
    at /usr/include/boost/function/function_template.hpp:576
#9  0x00002b2fecca98b6 in std::for_each<__gnu_cxx::__normal_iterator<boost::function<void ()(), std::allocator<void> >*, std::vector<boost::function<void ()(), std::allocator<void> >, std::allocator<boost::function<void ()(), std::allocator<void> > > > >, void (*)(boost::function<void ()(), std::allocator<void> >)> (__first=<value optimized out>, __last={_M_current = 0x1a9db0d0}, __f=0x2b2fecca8fe0 <invoke>)
    at /usr/include/c++/4.1.2/bits/stl_algo.h:159
#10 0x00002b2fecca8f80 in qpid::Plugin::Target::finalize (this=0x1a9ca9a8) at qpid/Plugin.cpp:45
#11 0x00002b2fec7e5ae9 in ~Broker (this=0x1a9ca9a0) at qpid/broker/Broker.cpp:337
#12 0x00002b2fee39a63e in __cxa_finalize () from /lib64/libc.so.6
#13 0x00002b2fec7a4d86 in __do_global_dtors_aux () from /usr/lib64/libqpidbroker.so.0
#14 0x0000000000000000 in ?? ()
(gdb) quit


package list:
[root@intel-d3c4702-02 qpid_test_mnode_cluster]# rpm -qa  | egrep '(qpid|rhm|openai)' | sort -u
openais-0.80.3-22.el5_3.4
openais-debuginfo-0.80.3-22.el5_3.4
openais-devel-0.80.3-22.el5_3.4
python-qpid-0.5.758389-1.el5
qpidc-0.5.752581-3.el5
qpidc-debuginfo-0.5.752581-3.el5
qpidc-devel-0.5.752581-3.el5
qpidc-perftest-0.5.752581-3.el5
qpidc-rdma-0.5.752581-3.el5
qpidc-ssl-0.5.752581-3.el5
qpidd-0.5.752581-3.el5
qpidd-acl-0.5.752581-3.el5
qpidd-cluster-0.5.752581-3.el5
qpidd-devel-0.5.752581-3.el5
qpidd-rdma-0.5.752581-3.el5
qpidd-ssl-0.5.752581-3.el5
qpidd-xml-0.5.752581-3.el5
qpid-java-client-0.5.751061-1.el5
qpid-java-common-0.5.751061-1.el5
rhm-0.5.3206-1.el5
rhm-docs-0.5.756148-1.el5
Comment 5 Gordon Sim 2009-03-30 08:56:43 EDT
Marking this urgent until we understand what the cause is more clearly and therefore can assess likely impact.
Comment 6 Frantisek Reznicek 2009-04-01 03:26:24 EDT
I partially succeeded to reproduce the issue:
part a] framing error reproduced
part b] qpidd crash not reproduced

The part a] data are stored in mrg3.lab.bos.redhat.com:/root/qpid_test_mnode_cluster_fail090401.tar.bz2

the test logfile is here mnt/sharedfs/qpid_test_mnode_cluster/server.log
the qpidd with framing error is this one:
mnt/sharedfs/qpid_test_mnode_cluster/intel-d3x1311-01.rhts.bos.redhat.com/j0000.sh.log

all qpidd instances started during test are:
intel-d3c4702-01.rhts.bos.redhat.com/j0000.sh.log:2009-mar-31 12:50:26 notice Listening on TCP port 38872
intel-d3c4702-02.rhts.bos.redhat.com/j0000.sh.log:2009-mar-31 12:51:15 notice Listening on TCP port 57386
intel-d3c4702-02.rhts.bos.redhat.com/j0007.sh.log:2009-mar-31 12:51:55 notice Listening on TCP port 54610
intel-d3x1311-01.rhts.bos.redhat.com/j0000.sh.log:2009-mar-31 12:50:15 notice Listening on TCP port 47148
Comment 7 Frantisek Reznicek 2009-04-02 04:17:01 EDT
Retested on mrg7-9 in clean network environment and neither framing-error nor qpidd crash detected. The test was looping for about 5x longer that longest run.

I propose to put to VERIFIED based on this test.

Could you review Alan?
Comment 8 Alan Conway 2009-04-02 10:20:49 EDT
Agreed.
Comment 10 errata-xmlrpc 2009-04-21 12:16:07 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html

Note You need to log in before you can comment on or make changes to this bug.