Bug 889241 - Cluster de-sync due to unknown connection
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 2.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: messaging-bugs
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-12-20 15:19 UTC by Pavel Moravec
Modified: 2022-06-30 22:50 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Attachments
reproducer (2.22 KB, application/x-gzip)
2012-12-20 15:26 UTC, Pavel Moravec

Description Pavel Moravec 2012-12-20 15:19:40 UTC
Description of problem:
While running a reproducer (to be attached), one cluster node shut down with the following log entry:

2012-12-20 14:44:55 critical Error delivering frames: Unknown connection: Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=2048; }] control 10.34.1.241:24393-3125 (qpid/cluster/Cluster.cpp:554)

A random node crashes, usually within 30 minutes.


Version-Release number of selected component (if applicable):
qpid-cpp-server-0.14-22.el6_3.x86_64


How reproducible:
30% (the longer the reproducer runs, the more likely the failure)


Steps to Reproduce:
Run the reproducer (to be attached).

  
Actual results:
After some time, one node stops with the critical error shown above.


Expected results:
No cluster de-sync.


Additional info:

Comment 1 Pavel Moravec 2012-12-20 15:26:21 UTC
Created attachment 666734
reproducer

To reproduce:
1) unpack
2) Compile the auxiliary C++ client:
g++ OptionParser.o spout_drain_in_one_session.cpp -o spout_drain_in_one_session -lqpidclient -lqpidcommon -lqpidmessaging -lqpidtypes
3) Have qpid-receive in $PATH
4) ./889241_reproducer.sh


In a nutshell, the reproducer sends messages to and consumes them from 30 durable queues in parallel, so that the queues are almost empty most of the time. A standard 3-node (old-style) cluster is used.

Comment 2 Pavel Moravec 2012-12-20 15:36:16 UTC
Sometimes a broker shut down for a different reason while running the same reproducer:

2012-12-20 15:19:47 error Execution exception: invalid-argument: anonymous.69a3b08c-c6ec-47fa-bc5b-d2be5a725b7d: Known-completed has invalid commands. (qpid/SessionState.cpp:219)
2012-12-20 15:19:47 critical cluster(10.34.1.241:18980 READY/error) local error 10152172 did not occur on member 10.34.1.241:18977: invalid-argument: anonymous.69a3b08c-c6ec-47fa-bc5b-d2be5a725b7d: Known-completed has invalid commands. (qpid/SessionState.cpp:219)
2012-12-20 15:19:47 critical Error delivering frames: local error did not occur on all cluster members : invalid-argument: anonymous.69a3b08c-c6ec-47fa-bc5b-d2be5a725b7d: Known-completed has invalid commands. (qpid/SessionState.cpp:219) (qpid/cluster/ErrorCheck.cpp:89)
2012-12-20 15:19:47 notice cluster(10.34.1.241:18980 LEFT/error) leaving cluster dst
2012-12-20 15:19:47 critical Error in cluster dispatch: Error in CPG dispatch: library (2)
2012-12-20 15:19:47 notice Shut down
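
Because the failure can take 30 minutes or more to appear, it may help to scan the broker logs automatically. The following is a minimal sketch, not part of the attached reproducer. It assumes every log line has the layout `<date> <time> <level> <message>` seen in the excerpts above, and it flags `critical` lines whose message starts with "Error delivering frames:", which both shutdowns quoted in this report have in common. The names `LogEntry`, `parseLogLine`, `isDeliveryFailure` and `findDeliveryFailures` are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One parsed broker log line, assuming "<date> <time> <level> <message>".
struct LogEntry {
    std::string timestamp;  // e.g. "2012-12-20 14:44:55"
    std::string level;      // e.g. "critical", "error", "notice"
    std::string message;    // remainder of the line
};

// Splits a log line into its fields. Returns false if the line does not
// contain a date, a time, a level and a non-empty message.
bool parseLogLine(const std::string& line, LogEntry& out) {
    std::istringstream in(line);
    std::string date, time;
    out.message.clear();
    if (!(in >> date >> time >> out.level)) return false;
    out.timestamp = date + " " + time;
    std::getline(in >> std::ws, out.message);  // skip the separator, keep the rest
    return !out.message.empty();
}

// True for critical entries whose message starts with the prefix shared by
// both shutdown signatures in this report.
bool isDeliveryFailure(const LogEntry& entry) {
    static const std::string prefix = "Error delivering frames:";
    return entry.level == "critical" &&
           entry.message.compare(0, prefix.size(), prefix) == 0;
}

// Reads a whole log stream and collects every matching entry in order.
std::vector<LogEntry> findDeliveryFailures(std::istream& log) {
    std::vector<LogEntry> hits;
    std::string line;
    LogEntry entry;
    while (std::getline(log, line)) {
        if (parseLogLine(line, entry) && isDeliveryFailure(entry)) {
            hits.push_back(entry);
        }
    }
    return hits;
}
```

To use it, open each node's log with `std::ifstream` and pass the stream to `findDeliveryFailures`; a non-empty result means that node hit one of the two shutdowns described here. Matching on the shared prefix rather than the full text keeps the check independent of connection IDs and session names, which differ on every run.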

Comment 3 Pavel Moravec 2012-12-20 15:38:30 UTC
(In reply to comment #1)

I forgot to attach OptionParser: take it from the qpid-cpp-client-devel package.

Comment 4 Pavel Moravec 2013-01-07 16:02:23 UTC
This scenario comes from my internal testing and is not based on a customer use case.

