Bug 640978 - catch-up connection closed prematurely
Summary: catch-up connection closed prematurely
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: beta
Hardware: All
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: messaging-bugs
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-10-07 12:33 UTC by Lubos Trilety
Modified: 2021-03-03 23:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Attachments
logs from test (860.86 KB, application/x-gzip)
2010-10-07 12:33 UTC, Lubos Trilety

Description Lubos Trilety 2010-10-07 12:33:23 UTC
Created attachment 452096
logs from test

Description of problem:
While running './cluster_authentication_soak 1', one of the brokers sometimes crashes. This is seen especially on slow machines.

Version-Release number of selected component (if applicable):
qpid-dotnet-0.4.738274-2.el5
qpid-cpp-server-rdma-0.7.946106-17.el5
qpid-cpp-mrg-debuginfo-0.7.946106-17.el5
qpid-java-common-0.7.946106-10.el5
qpid-tools-0.7.946106-11.el5
qpid-cpp-server-0.7.946106-17.el5
qpid-cpp-client-devel-0.7.946106-17.el5
qpid-cpp-server-cluster-0.7.946106-17.el5
python-qpid-0.7.946106-14.el5
qpid-cpp-client-rdma-0.7.946106-17.el5
qpid-cpp-server-ssl-0.7.946106-17.el5
qpid-cpp-server-devel-0.7.946106-17.el5
qpid-cpp-server-xml-0.7.946106-17.el5
qpid-java-client-0.7.946106-10.el5
qpid-cpp-client-ssl-0.7.946106-17.el5
qpid-cpp-client-0.7.946106-17.el5
qpid-cpp-client-devel-docs-0.7.946106-17.el5

How reproducible:
1%

Steps to Reproduce:
1. Raise the system load, e.g. run 'yes > /dev/null &' n times.
2. Run './cluster_authentication_soak 1' in a loop.
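
The two steps above could be automated roughly as follows. This is only a minimal sketch, not part of the original test suite: the helper names (`start_load`, `stop_load`, `run_until_failure`) and the example values in the trailing comment are assumptions. It also assumes the soak test reports a failure through a non-zero exit status, which this report does not confirm. Only the `yes` command and './cluster_authentication_soak 1' come from the report.

```python
import subprocess


def start_load(n, cmd=("yes",)):
    """Start n background processes running cmd, discarding their output."""
    return [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(n)
    ]


def stop_load(procs):
    """Terminate the load generators and wait for them to exit."""
    for p in procs:
        p.terminate()
    for p in procs:
        p.wait()


def run_until_failure(cmd, max_runs):
    """Run cmd repeatedly.

    Returns the 1-based number of the first run that exits non-zero,
    or None if all max_runs runs exit with status 0.
    """
    for run in range(1, max_runs + 1):
        if subprocess.call(cmd) != 0:
            return run
    return None


# Example use (assumed values: 4 load generators, at most 500 runs):
# load = start_load(4)
# try:
#     failed_run = run_until_failure(["./cluster_authentication_soak", "1"], 500)
# finally:
#     stop_load(load)
```

With a reproduction rate of about 1%, a cap of a few hundred runs is probably needed before a failing run shows up.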

  
Actual results:
The broker log contains something like this:
2010-10-06 17:50:16 critical cluster(ip-address1:6360 UPDATEE) catch-up connection closed prematurely ip-address2:51251(ip-address1:6360-1 local,catchup)
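
To find every occurrence of this message in the attached logs, something like the sketch below might help. The regular expression is inferred from the single sample line above, so the exact log layout is an assumption, and the names `CATCHUP_RE` and `find_premature_closes` are made up for illustration.

```python
import re

# Pattern inferred from one sample line; other broker versions may differ.
CATCHUP_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) critical "
    r"cluster\((?P<member>\S+) (?P<state>\w+)\) "
    r"catch-up connection closed prematurely (?P<connection>.+)$"
)


def find_premature_closes(lines):
    """Return one dict per matching log line.

    Each dict has the keys timestamp, member, state and connection.
    """
    hits = []
    for line in lines:
        match = CATCHUP_RE.match(line.strip())
        if match:
            hits.append(match.groupdict())
    return hits
```

Calling `find_premature_closes(open(path))` on a broker log (where `path` is whichever log file is being inspected) gives the timestamps needed to compare against the test's own log.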

Expected results:
No unexpected broker shutdown.

Additional info:

Comment 1 Alan Conway 2010-10-07 13:10:56 UTC
This is expected. cluster_authentication_soak does not check that updates are complete before killing a broker; if it kills the broker giving the update, the new broker receiving the update will exit with this error.

The test should be fixed to avoid the error message or document the fact that it is expected. Re-assigning to Mick to fix the test.
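
One way the test could avoid killing a broker in the middle of an update is to poll for completion first. The sketch below is only an illustration of that idea: `update_complete` and `kill_broker` are hypothetical callables supplied by the test, and this report does not say how update completion can actually be detected.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Returns True if the predicate became true, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def kill_when_safe(update_complete, kill_broker, timeout=30.0, interval=0.5):
    """Call kill_broker() only after update_complete() reports True.

    Both arguments are hypothetical hooks provided by the test.
    Returns True if the broker was killed, False if the wait timed out.
    """
    if wait_until(update_complete, timeout, interval):
        kill_broker()
        return True
    return False
```

On a timeout the broker is left running, so the test can report the stalled update instead of triggering the error itself.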

Comment 2 Lubos Trilety 2010-10-07 13:27:38 UTC
(In reply to comment #1)
> This is expected, cluster_authentication_soak does not check that updates are
> complete before killing a broker, if it kills the broker giving the update the
> new broker receiving the update will exit with this error.
> 
> The test should be fixed to avoid the error message or document the fact that
> it is expected. Re-assigning to Mick to fix the test.

I am not sure this is the case, because the broker stops even before the perftest starts (runs 169 and 216 in test.log). Also, for run 107, cluster_authentication_soak prints 'not all brokers are alive.', which is a message printed before any brokers are killed.

