Created attachment 452096
logs from test

Description of problem:
While running './cluster_authentication_soak 1', one of the brokers sometimes crashes. This is seen especially on slow machines.

Version-Release number of selected component (if applicable):
qpid-dotnet-0.4.738274-2.el5
qpid-cpp-server-rdma-0.7.946106-17.el5
qpid-cpp-mrg-debuginfo-0.7.946106-17.el5
qpid-java-common-0.7.946106-10.el5
qpid-tools-0.7.946106-11.el5
qpid-cpp-server-0.7.946106-17.el5
qpid-cpp-client-devel-0.7.946106-17.el5
qpid-cpp-server-cluster-0.7.946106-17.el5
python-qpid-0.7.946106-14.el5
qpid-cpp-client-rdma-0.7.946106-17.el5
qpid-cpp-server-ssl-0.7.946106-17.el5
qpid-cpp-server-devel-0.7.946106-17.el5
qpid-cpp-server-xml-0.7.946106-17.el5
qpid-java-client-0.7.946106-10.el5
qpid-cpp-client-ssl-0.7.946106-17.el5
qpid-cpp-client-0.7.946106-17.el5
qpid-cpp-client-devel-docs-0.7.946106-17.el5

How reproducible:
1%

Steps to Reproduce:
1. Raise the system load, e.g. run 'yes > /dev/null &' n times.
2. Run './cluster_authentication_soak 1' in a loop.

Actual results:
The broker log contains something like this:
2010-10-06 17:50:16 critical cluster(ip-address1:6360 UPDATEE) catch-up connection closed prematurely ip-address2:51251(ip-address1:6360-1 local,catchup)

Expected results:
No unexpected broker shutdown.

Additional info:
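Because the failure shows up in only about 1% of runs, it may help to scan the broker logs automatically after each run. The following is a minimal Python sketch, not part of the test suite. The names `CatchUpFailure`, `PATTERN`, and `find_catch_up_failures` are hypothetical, and the regular expression assumes that all such log lines follow the layout of the single line quoted under "Actual results".

```python
import re
from typing import Iterable, List, NamedTuple


class CatchUpFailure(NamedTuple):
    """One 'catch-up connection closed prematurely' log entry (hypothetical type)."""
    timestamp: str
    member: str
    connection: str


# Assumption: the layout is inferred from the one log line quoted in this report.
PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) critical "
    r"cluster\((?P<member>\S+) UPDATEE\) "
    r"catch-up connection closed prematurely (?P<connection>.+)$"
)


def find_catch_up_failures(lines: Iterable[str]) -> List[CatchUpFailure]:
    """Return every log line that reports a prematurely closed catch-up connection."""
    failures = []
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            failures.append(CatchUpFailure(**match.groupdict()))
    return failures
```

Passing an open log file (or any iterable of lines) to `find_catch_up_failures` returns one entry per matching line, so an empty list means the run did not hit this error.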
This is expected. cluster_authentication_soak does not check that updates are complete before killing a broker. If it kills the broker that is giving the update, the new broker receiving the update will exit with this error.

The test should be fixed to avoid the error message, or it should document the fact that the message is expected. Re-assigning to Mick to fix the test.
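The ordering suggested in this comment (wait until updates are complete, and only then kill a broker) could be sketched roughly as below. This is an illustrative Python sketch and not the code of cluster_authentication_soak. The callables `update_complete` and `kill` are hypothetical placeholders, since how the test would detect a finished update is not stated in this report.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 30.0,
               interval: float = 0.5) -> bool:
    """Poll condition() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def kill_when_safe(update_complete: Callable[[], bool],
                   kill: Callable[[], None],
                   timeout: float = 30.0,
                   interval: float = 0.5) -> bool:
    """Kill a broker only after updates are reported complete.

    update_complete and kill are hypothetical hooks supplied by the caller.
    Returns True if the broker was killed, False if the wait timed out.
    """
    if wait_until(update_complete, timeout, interval):
        kill()
        return True
    return False
```

With this shape, a timeout leaves the broker running, so a slow machine would produce a skipped kill rather than a receiving broker that exits mid-update.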
(In reply to comment #1)
> This is expected, cluster_authentication_soak does not check that updates are
> complete before killing a broker, if it kills the broker giving the update the
> new broker receiving the update will exit with this error.
>
> The test should be fixed to avoid the error message or document the fact that
> it is expected. Re-assigning to Mick to fix the test.

I am not sure that this is the case, because the broker stops even before the perftest starts (runs 169 and 216 in test.log). Also, in run 107 cluster_authentication_soak prints 'not all brokers are alive.', which is a message printed before any brokers are killed.