Description of problem:
An upstream test designed to verify consistent management messages in a cluster is failing due to inconsistencies.

Version-Release number of selected component (if applicable):
trunk r1060879

How reproducible:
moderate - fails about 1 out of 4 times

Steps to Reproduce:
Running the test in a qpid build: the test is disabled as it does not pass. To enable the test, remove these lines from cpp/src/tests/cluster_test_logs.py:91

    # FIXME aconway 2011-01-19: disable when called from unit tests
    # Causing sporadic failures, see https://issues.apache.org/jira/browse/QPID-3007
    if __name__ != "__main__": return

To run the test in src/tests:

    $ make check TESTS=run_cluster_tests CLUSTER_TESTS='*Long*test_management* -DDURATION=4' &> make-check.log

Actual results:
test fails about 1/4 times.

Expected results:
no failures

Additional info:
https://issues.apache.org/jira/browse/QPID-3007
Fixed by the following upstream revisions:

1066220 QPID-3007: Unique management identifier for connections.
1066219 QPID-3007: Ignore expected connection close warning in cluster_test_logs.py
1066217 QPID-3007: Don't hold on to consumer shared-pointers in UpdateClient::consumerNumbering
1066215 QPID-3007: Don't record management statistics in cluster-unsafe contexts.
Created attachment 476636 [details]
Reproducer that can be run on qpid installed from RPMs

The runme.sh script runs the test in a loop. Prior to the fix the test was failing every 4-5 iterations. With the fix it has not failed during an overnight run.
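For readers without access to the attachment, the loop-until-failure idea can be sketched as below. This is not the contents of runme.sh; the function name `run_until_failure` and its parameters are hypothetical, and it only assumes that the test command exits non-zero on failure.

```python
import subprocess


def run_until_failure(cmd, max_iterations):
    """Run cmd repeatedly and report where it first fails.

    Returns the 1-based iteration number of the first non-zero exit,
    or None if all max_iterations runs succeed.
    (Hypothetical helper, not taken from the attached runme.sh.)
    """
    for iteration in range(1, max_iterations + 1):
        # Discard output here; a real reproducer would likely keep logs.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            return iteration
    return None
```

It could be called with the test command as an argument list, for example `run_until_failure(["make", "check", "TESTS=run_cluster_tests"], 100)`. With a failure every 4-5 iterations, a broken build would normally stop the loop within the first few dozen runs.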
The issue is under test at the moment.
Alan, could you please confirm that the issue you saw is the following? (The dump below is from a test run on -27.)

...
cluster_tests.LongTests.test_management ................................................................................................................ pass
cluster_tests.LongTests.test_management_qmf2 ........................................................................................................... fail
Error during test:
  Traceback (most recent call last):
    File "./qpid-python-test", line 311, in run
      phase()
    File "/root/bz/bz674338/cluster_mgmt_674338/cluster_tests.py", line 454, in test_management_qmf2
      self.test_management(args=["--mgmt-qmf2=yes"])
    File "/root/bz/bz674338/cluster_mgmt_674338/cluster_tests.py", line 451, in test_management
      cluster_test_logs.verify_logs()
    File "/root/bz/bz674338/cluster_mgmt_674338/cluster_test_logs.py", line 106, in verify_logs
      raise Exception("Files differ in %s"%(os.getcwd())+"".join(errors))
  Exception: Files differ in /root/bz/bz674338/cluster_mgmt_674338/brokertest.tmp/cluster_tests.LongTests.test_management_qmf2
  cluster1-24.log.filter.8173859 cluster1-23.log.filter.8173859
Totals: 2 tests, 1 passed, 0 skipped, 0 ignored, 1 failed

Moreover, is it expected to execute just the tests cluster_tests.LongTests.test_management and cluster_tests.LongTests.test_management_qmf2 for this defect?

Ongoing testing indicates the issue is fixed...
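The "Files differ" exception names two filtered broker logs, which suggests the check compares one filtered log per cluster member. A minimal sketch of that kind of comparison follows. It is an assumption about the general technique, not the actual code in cluster_test_logs.py, and `find_differing_pairs` is a hypothetical name.

```python
import filecmp


def find_differing_pairs(paths):
    """Return adjacent (a, b) pairs from paths whose contents differ.

    Each file is compared byte-for-byte with the next one in the list,
    so a fully consistent cluster yields an empty list.
    (Hypothetical helper, not the real verify_logs implementation.)
    """
    return [
        (a, b)
        for a, b in zip(paths, paths[1:])
        # shallow=False forces a content comparison, not just os.stat().
        if not filecmp.cmp(a, b, shallow=False)
    ]
```

In this sketch a non-empty result would correspond to the failure above, with the returned pair playing the role of cluster1-24.log.filter.8173859 and cluster1-23.log.filter.8173859.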
Yes that is the issue.

> Moreover, it is expected to execute just tests
> cluster_tests.LongTests.test_management and
> cluster_tests.LongTests.test_management_qmf2 for this defect?

Those are the only tests that reliably show the defect. In sporadic cases where qpid-tool is used with a cluster it can cause brokers to exit with an "invalid-arg" error but I have not been able to reproduce that reliably.
Thanks Alan, I was able to reproduce the issue reliably on the -27 build and ran the tests with extended duration to prove that the issue has been fixed (on -28). Extensive testing was done in parallel on 6 machines (total time over 25 hours, over 150 runs).

The issue has been fixed, tested on RHEL 5.6 i386 / x86_64 on packages:

python-qpid-0.7.946106-15.el5.noarch
qpid-cpp-client-0.7.946106-28.el5.i386
qpid-cpp-client-devel-0.7.946106-28.el5.i386
qpid-cpp-client-devel-docs-0.7.946106-28.el5.i386
qpid-cpp-client-rdma-0.7.946106-28.el5.i386
qpid-cpp-client-ssl-0.7.946106-28.el5.i386
qpid-cpp-mrg-debuginfo-0.7.946106-28.el5.i386
qpid-cpp-server-0.7.946106-28.el5.i386
qpid-cpp-server-cluster-0.7.946106-28.el5.i386
qpid-cpp-server-devel-0.7.946106-28.el5.i386
qpid-cpp-server-rdma-0.7.946106-28.el5.i386
qpid-cpp-server-ssl-0.7.946106-28.el5.i386
qpid-cpp-server-store-0.7.946106-28.el5.i386
qpid-cpp-server-xml-0.7.946106-28.el5.i386
qpid-dotnet-0.4.738274-2.el5.i386
qpid-java-client-0.7.946106-15.el5.noarch
qpid-java-common-0.7.946106-15.el5.noarch
qpid-java-example-0.7.946106-15.el5.noarch
qpid-tests-0.7.946106-1.el5.noarch
qpid-tools-0.7.946106-12.el5.noarch
rh-qpid-cpp-tests-0.7.946106-28.el5.i386

-> VERIFIED
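As a rough sanity check on why 150 clean runs are convincing: if the defect were still present and each run failed independently at the rate of about 1 in 4 quoted in the description, the chance of seeing no failure at all would be (1 - 1/4)^150. The sketch below computes that; independence of runs and the 1/4 rate are assumptions, and `chance_of_clean_streak` is a hypothetical helper name.

```python
def chance_of_clean_streak(failure_rate, runs):
    """Probability that `runs` independent runs all pass,
    given a per-run failure probability of `failure_rate`.
    (Assumes independent runs; illustrative only.)
    """
    return (1.0 - failure_rate) ** runs


# Using the figures quoted in this report: ~1/4 failure rate, 150 runs.
p = chance_of_clean_streak(0.25, 150)
print("chance of 150 clean runs with the bug present: %.3g" % p)
```

Under these assumptions the result is on the order of 1e-19, so an unfixed build passing every run by luck is not a realistic explanation.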
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0217.html