Bug 889241 - Cluster de-sync due to unknown connection
Status: NEW
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 2.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Alan Conway
QA Contact: MRG Quality Engineering
Keywords: TestCaseProvided
Reported: 2012-12-20 10:19 EST by Pavel Moravec
Modified: 2015-02-01 18:11 EST (History)
Doc Type: Bug Fix
Type: Bug

Attachments
reproducer (2.22 KB, application/x-gzip)
2012-12-20 10:26 EST, Pavel Moravec

Description Pavel Moravec 2012-12-20 10:19:40 EST
Description of problem:
While running the reproducer (to be attached), one node shut down with the following log:

2012-12-20 14:44:55 critical Error delivering frames: Unknown connection: Frame[BEbe; channel=0; {ClusterConnectionDeliverDoOutputBody: limit=2048; }] control 10.34.1.241:24393-3125 (qpid/cluster/Cluster.cpp:554)

A random node crashes, usually within 30 minutes.


Version-Release number of selected component (if applicable):
qpid-cpp-server-0.14-22.el6_3.x86_64


How reproducible:
30% (the longer the run, the more likely)


Steps to Reproduce:
Run the reproducer (to be attached).

  
Actual results:
After some time, one node stops with the above critical error.


Expected results:
No cluster de-sync.


Additional info:
Comment 1 Pavel Moravec 2012-12-20 10:26:21 EST
Created attachment 666734
reproducer

To reproduce:
1) unpack
2) Compile the auxiliary C++ client:
g++ -lqpidclient -lqpidcommon -lqpidmessaging -lqpidtypes OptionParser.o spout_drain_in_one_session.cpp -o spout_drain_in_one_session
3) Have qpid-receive in $PATH
4) ./889241_reproducer.sh


In a nutshell, the reproducer sends messages to and consumes messages from 30 durable queues in parallel, so that the queues are usually almost empty. A normal 3-node (old) cluster is used.
Comment 2 Pavel Moravec 2012-12-20 10:36:16 EST
Sometimes a broker shuts down for a different reason while running the same reproducer:

2012-12-20 15:19:47 error Execution exception: invalid-argument: anonymous.69a3b08c-c6ec-47fa-bc5b-d2be5a725b7d: Known-completed has invalid commands. (qpid/SessionState.cpp:219)
2012-12-20 15:19:47 critical cluster(10.34.1.241:18980 READY/error) local error 10152172 did not occur on member 10.34.1.241:18977: invalid-argument: anonymous.69a3b08c-c6ec-47fa-bc5b-d2be5a725b7d: Known-completed has invalid commands. (qpid/SessionState.cpp:219)
2012-12-20 15:19:47 critical Error delivering frames: local error did not occur on all cluster members : invalid-argument: anonymous.69a3b08c-c6ec-47fa-bc5b-d2be5a725b7d: Known-completed has invalid commands. (qpid/SessionState.cpp:219) (qpid/cluster/ErrorCheck.cpp:89)
2012-12-20 15:19:47 notice cluster(10.34.1.241:18980 LEFT/error) leaving cluster dst
2012-12-20 15:19:47 critical Error in cluster dispatch: Error in CPG dispatch: library (2)
2012-12-20 15:19:47 notice Shut down
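
When sifting through logs from repeated reproducer runs, it can help to tell the two failure signatures apart mechanically: the "Unknown connection" error from the description and the "Known-completed has invalid commands" error above. The following is only a minimal sketch. It assumes the log layout "<date> <time> <level> <message> (<file>:<line>)", which is inferred from the lines quoted in this bug and not from any documented format. The names LogRecord, parseLogLine, Signature and classify are hypothetical helpers, not part of qpid.

```cpp
#include <cstddef>
#include <string>

// Fields recovered from one broker log line. The layout
// "<date> <time> <level> <message> (<file>:<line>)" is an ASSUMPTION
// inferred from the lines quoted in this bug, not a documented format.
struct LogRecord {
    std::string timestamp;  // e.g. "2012-12-20 15:19:47"
    std::string level;      // e.g. "critical", "error", "notice"
    std::string message;    // text after the level, without the source location
    std::string source;     // e.g. "qpid/cluster/ErrorCheck.cpp:89"; empty if absent
};

// Hypothetical helper: split one log line into its parts.
// Returns false if the line does not have date, time, level and a message.
bool parseLogLine(const std::string& line, LogRecord& rec) {
    rec = LogRecord();  // clear any previous contents

    std::size_t p1 = line.find(' ');
    if (p1 == std::string::npos) return false;
    std::size_t p2 = line.find(' ', p1 + 1);
    if (p2 == std::string::npos) return false;
    std::size_t p3 = line.find(' ', p2 + 1);
    if (p3 == std::string::npos) return false;

    rec.timestamp = line.substr(0, p2);
    rec.level = line.substr(p2 + 1, p3 - p2 - 1);
    rec.message = line.substr(p3 + 1);

    // Treat a trailing "(path:NNN)" without spaces as the source location.
    if (!rec.message.empty() && rec.message[rec.message.size() - 1] == ')') {
        std::size_t open = rec.message.rfind('(');
        if (open != std::string::npos) {
            std::string inner =
                rec.message.substr(open + 1, rec.message.size() - open - 2);
            if (inner.find(':') != std::string::npos &&
                inner.find(' ') == std::string::npos) {
                rec.source = inner;
                rec.message.erase(open);
                // Drop the space that separated the message from the location.
                while (!rec.message.empty() &&
                       rec.message[rec.message.size() - 1] == ' ') {
                    rec.message.erase(rec.message.size() - 1);
                }
            }
        }
    }
    return true;
}

// Hypothetical classification of the two failure signatures seen in this bug.
enum class Signature { UnknownConnection, KnownCompleted, Other };

Signature classify(const LogRecord& rec) {
    if (rec.message.find("Unknown connection") != std::string::npos)
        return Signature::UnknownConnection;
    if (rec.message.find("Known-completed has invalid commands") != std::string::npos)
        return Signature::KnownCompleted;
    return Signature::Other;
}
```

Feeding each line of a broker log through parseLogLine and counting the classify results per run would show how often each signature is the one that takes a node down.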
Comment 3 Pavel Moravec 2012-12-20 10:38:30 EST
(In reply to comment #1)

me--: I forgot to attach OptionParser; take it from the qpid-cpp-client-devel package.
Comment 4 Pavel Moravec 2013-01-07 11:02:23 EST
This scenario comes from my internal testing and is not based on a customer use case.
