| Summary: | Inconsistent management messages in a cluster, test fails sporadically | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Alan Conway <aconway> | ||||
| Component: | qpid-cpp | Assignee: | Alan Conway <aconway> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Frantisek Reznicek <freznice> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 1.3 | CC: | esammons, freznice, gsim, iboverma, jneedle, tross | ||||
| Target Milestone: | 1.3.2 | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | qpid-cpp-mrg-0.7.946106-28 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-02-15 12:11:38 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
Fixed by the following upstream revisions:
1066220 QPID-3007: Unique management identifier for connections.
1066219 QPID-3007: Ignore expected connection close warning in cluster_test_logs.py
1066217 QPID-3007: Don't hold on to consumer shared-pointers in UpdateClient::consumerNumbering
1066215 QPID-3007: Don't record management statistics in cluster-unsafe contexts.
Created attachment 476636
Reproducer that can be run on qpid installed from RPMs
The runme.sh script runs the test in a loop. Prior to the fix the test was failing every 4-5 iterations. With the fix it has not failed during an overnight run.
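For readers without the attachment, the loop-until-failure idea can be sketched roughly as follows. This is not the attached runme.sh; the function name `run_until_failure` and the placeholder command are assumptions made for illustration, and the real test invocation would need to be substituted.

```python
import subprocess
import sys


def run_until_failure(cmd, max_iterations):
    """Run cmd repeatedly.

    Returns the 1-based iteration at which cmd first exited non-zero,
    or None if every iteration passed.
    """
    for iteration in range(1, max_iterations + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return iteration
    return None


if __name__ == "__main__":
    # Placeholder command (assumption): replace with the real test invocation.
    placeholder = [sys.executable, "-c", "raise SystemExit(0)"]
    failed_at = run_until_failure(placeholder, max_iterations=5)
    print("first failure at iteration:", failed_at)
```

Stopping at the first failure keeps the broker logs from the failing iteration available for inspection.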
The issue is under test at the moment.
Alan, could you please confirm that the issue you saw is the following?
(The dump below is from a test run on -27.)
...
cluster_tests.LongTests.test_management
................................................................................................................
pass
cluster_tests.LongTests.test_management_qmf2
...........................................................................................................
fail
Error during test:
Traceback (most recent call last):
  File "./qpid-python-test", line 311, in run
    phase()
  File "/root/bz/bz674338/cluster_mgmt_674338/cluster_tests.py", line 454, in test_management_qmf2
    self.test_management(args=["--mgmt-qmf2=yes"])
  File "/root/bz/bz674338/cluster_mgmt_674338/cluster_tests.py", line 451, in test_management
    cluster_test_logs.verify_logs()
  File "/root/bz/bz674338/cluster_mgmt_674338/cluster_test_logs.py", line 106, in verify_logs
    raise Exception("Files differ in %s"%(os.getcwd())+"".join(errors))
Exception: Files differ in /root/bz/bz674338/cluster_mgmt_674338/brokertest.tmp/cluster_tests.LongTests.test_management_qmf2
cluster1-24.log.filter.8173859 cluster1-23.log.filter.8173859
Totals: 2 tests, 1 passed, 0 skipped, 0 ignored, 1 failed
Also, is it expected that only the tests
cluster_tests.LongTests.test_management and
cluster_tests.LongTests.test_management_qmf2 need to be run for this defect?
Ongoing testing indicates the issue is fixed.
Yes, that is the issue.
> Also, is it expected that only the tests
> cluster_tests.LongTests.test_management and
> cluster_tests.LongTests.test_management_qmf2 need to be run for this defect?
Those are the only tests that reliably show the defect. In sporadic cases, using qpid-tool with a cluster can cause brokers to exit with an "invalid-arg" error, but I have not been able to reproduce that reliably.
Thanks Alan, I was able to reproduce the issue reliably on the -27 build and ran the tests with extended duration to prove that the issue has been fixed on -28. Extensive testing ran in parallel on 6 machines (over 25 hours in total, over 150 runs).
The issue has been fixed, tested on RHEL 5.6 i386 / x86_64 on packages:
python-qpid-0.7.946106-15.el5.noarch
qpid-cpp-client-0.7.946106-28.el5.i386
qpid-cpp-client-devel-0.7.946106-28.el5.i386
qpid-cpp-client-devel-docs-0.7.946106-28.el5.i386
qpid-cpp-client-rdma-0.7.946106-28.el5.i386
qpid-cpp-client-ssl-0.7.946106-28.el5.i386
qpid-cpp-mrg-debuginfo-0.7.946106-28.el5.i386
qpid-cpp-server-0.7.946106-28.el5.i386
qpid-cpp-server-cluster-0.7.946106-28.el5.i386
qpid-cpp-server-devel-0.7.946106-28.el5.i386
qpid-cpp-server-rdma-0.7.946106-28.el5.i386
qpid-cpp-server-ssl-0.7.946106-28.el5.i386
qpid-cpp-server-store-0.7.946106-28.el5.i386
qpid-cpp-server-xml-0.7.946106-28.el5.i386
qpid-dotnet-0.4.738274-2.el5.i386
qpid-java-client-0.7.946106-15.el5.noarch
qpid-java-common-0.7.946106-15.el5.noarch
qpid-java-example-0.7.946106-15.el5.noarch
qpid-tests-0.7.946106-1.el5.noarch
qpid-tools-0.7.946106-12.el5.noarch
rh-qpid-cpp-tests-0.7.946106-28.el5.i386
-> VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2011-0217.html
Description of problem:
An upstream test designed to verify consistent management messages in a cluster is failing due to inconsistencies.

Version-Release number of selected component (if applicable):
trunk r1060879

How reproducible:
moderate - fails about 1 out of 4 times

Steps to Reproduce:
Running the test in a qpid build: the test is disabled as it does not pass. To enable the test, remove these lines from cpp/src/tests/cluster_test_logs.py:91
    # FIXME aconway 2011-01-19: disable when called from unit tests
    # Causing sporadic failures, see https://issues.apache.org/jira/browse/QPID-3007
    if __name__ != "__main__": return
To run the test in src/tests:
    $ make check TESTS=run_cluster_tests CLUSTER_TESTS='*Long*test_management* -DDURATION=4' &> make-check.log

Actual results:
test fails about 1/4 times.

Expected results:
no failures

Additional info:
https://issues.apache.org/jira/browse/QPID-3007
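The consistency check behind the "Files differ" failure can be illustrated with a rough sketch. This is not the code in cluster_test_logs.py, which is not shown in this report; the timestamp pattern and the names `filter_log` and `differing_members` are assumptions chosen for illustration. The idea is to strip details that legitimately vary between cluster members, then require the remaining log content to be identical.

```python
import re

# Assumed format: each line starts with a "YYYY-MM-DD HH:MM:SS " timestamp.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")


def filter_log(lines):
    """Strip the leading timestamp so only message content is compared."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def differing_members(logs):
    """Return the members whose filtered log differs from the first member's.

    logs maps a member name to its list of raw log lines.
    Members are compared in sorted name order.
    """
    names = sorted(logs)
    if not names:
        return []
    reference = filter_log(logs[names[0]])
    return [name for name in names[1:] if filter_log(logs[name]) != reference]
```

An empty result means every member logged the same sequence of messages; any name returned corresponds to a member whose management messages diverged.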