Bug 675921
| Summary: | clustered qpidd broker occasionally fails cluster_tests.ShortTests.test_route_update | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Frantisek Reznicek <freznice> | ||||||
| Component: | qpid-cpp | Assignee: | Alan Conway <aconway> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Frantisek Reznicek <freznice> | ||||||
| Severity: | low | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 1.3 | CC: | aconway, esammons, gsim, jneedle, jross | ||||||
| Target Milestone: | 2.0 | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | qpid-cpp-0.9.1079953 qpid-cpp-mrg-0.10-6.el5 | Doc Type: | Bug Fix | ||||||
| Doc Text: |
Cause: Internal test showed management objects for inter-broker bridges could be created inconsistently in a cluster.
Consequence: Could potentially cause brokers to shut down with "invalid argument" error.
Fix: Correct the inconsistency
Result: Bridge management objects are created consistently in a cluster.
|
Story Points: | --- | ||||||
| Clone Of: | |||||||||
| : | 704424 716239 (view as bug list) | Environment: | |||||||
| Last Closed: | 2011-06-23 15:46:50 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 704424, 716239 | ||||||||
| Attachments: |
|
||||||||
Created attachment 477583
The issue reproducer including logs
The reproducer is based on the attachment https://bugzilla.redhat.com/attachment.cgi?id=476636 and runs with a simple ./runme.sh
Fixed in r1068554
QPID-3045 - sporadic failure of cluster_tests.ShortTests.test_route_update
Sporadically the test was failing because the session associated with
an inter-broker bridge was created out of order with other
objects. This is unlikely to cause a fatal cluster inconsistency in
practice but it has been corrected in any case. The fix was to delay
creation of the management object for a bridge session till a point
which is consistent on all cluster members.
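The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the qpidd code or the actual fix: the event kinds ("cluster", "bridge_local", "bridge_sync"), the function name assign_ids and the idea of numbering objects in creation order are all hypothetical. It shows why an object created at a member-specific moment can end up in a different position on each member, while an object created at a cluster-ordered point cannot.

```python
def assign_ids(events, defer_bridge):
    """Number objects in creation order on one toy cluster member.

    events is a list of (kind, name) tuples. All names are hypothetical:
      "cluster"      - object created by an event every member sees in the same order
      "bridge_local" - the bridge session appears locally; timing differs per member
      "bridge_sync"  - a cluster-ordered point at which the bridge session is known
    With defer_bridge=False the bridge object is created at "bridge_local";
    with defer_bridge=True it is created at "bridge_sync" instead.
    """
    ids = {}
    for kind, name in events:
        if kind == "cluster":
            create = True
        elif kind == "bridge_local":
            create = not defer_bridge
        elif kind == "bridge_sync":
            create = defer_bridge
        else:
            raise ValueError("unknown event kind: %r" % (kind,))
        if create:
            ids[name] = len(ids) + 1
    return ids


# Same cluster-ordered events on both members; only the local bridge event moves.
member_a = [("cluster", "queue-1"), ("bridge_local", "bridge-session"),
            ("cluster", "exchange-1"), ("bridge_sync", "bridge-session")]
member_b = [("cluster", "queue-1"), ("cluster", "exchange-1"),
            ("bridge_local", "bridge-session"), ("bridge_sync", "bridge-session")]
```

Calling assign_ids(member_a, False) and assign_ids(member_b, False) gives different numbering for the two members, whereas passing defer_bridge=True gives identical numbering, which is the property the fix aims for.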
The issue has been fixed and long-term tested on RHEL 5.6 and 6.1s5, i[36]86 / x86_64, on packages:
python-qpid-0.10-1.el5.noarch
python-qpid-qmf-0.10-6.el5.x86_64
qpid-cpp-client-0.10-4.el5.x86_64
qpid-cpp-client-devel-0.10-4.el5.x86_64
qpid-cpp-client-devel-docs-0.10-4.el5.x86_64
qpid-cpp-client-rdma-0.10-4.el5.x86_64
qpid-cpp-client-ssl-0.10-4.el5.x86_64
qpid-cpp-mrg-debuginfo-0.10-4.el5.x86_64
qpid-cpp-server-0.10-4.el5.x86_64
qpid-cpp-server-cluster-0.10-4.el5.x86_64
qpid-cpp-server-devel-0.10-4.el5.x86_64
qpid-cpp-server-rdma-0.10-4.el5.x86_64
qpid-cpp-server-ssl-0.10-4.el5.x86_64
qpid-cpp-server-store-0.10-4.el5.x86_64
qpid-cpp-server-xml-0.10-4.el5.x86_64
qpid-dotnet-0.4.738274-2.el5.x86_64
qpid-java-client-0.10-4.el5.noarch
qpid-java-common-0.10-4.el5.noarch
qpid-java-example-0.10-4.el5.noarch
qpid-qmf-0.10-6.el5.x86_64
qpid-qmf-debuginfo-0.10-6.el5.x86_64
qpid-qmf-devel-0.10-6.el5.x86_64
qpid-tests-0.10-1.el5.noarch
qpid-tools-0.10-4.el5.noarch
rh-qpid-cpp-tests-0.10-4.el5.x86_64
ruby-qpid-qmf-0.10-6.el5.x86_64
sesame-0.10-1.el5.x86_64
sesame-debuginfo-0.10-1.el5.x86_64
-> VERIFIED
Created attachment 495537
The issue reproducer including logs
Approximately one hour after I kicked the bug to VERIFIED, I triggered a single failure on RHEL 6.1s5 x86_64.
I'm attaching data + logs, kicking to ASSIGNED and raising NEEDINFO.
Alan,
could you possibly review the logs please?
The way I ran the tests:
./runme.sh 1000 cluster_tests.ShortTests.test_route_update &>log2
last log is then ./log2
The cluster test is slightly modified to run on packaged binaries. Broker is forced to run with all installed plugins.
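The looping approach described above (run one test many times and stop at the first failure) can be sketched in Python. This is a hypothetical stand-in, not the contents of runme.sh from the attachment: the function name run_until_failure and its return convention are assumptions, and the command to run is left entirely to the caller.

```python
import subprocess


def run_until_failure(command, max_runs):
    """Run command repeatedly and stop at the first failing run.

    command is an argument list as accepted by subprocess.run.
    Returns None if all max_runs runs exit with status 0; otherwise returns
    a tuple (run_number, output), where run_number is 1-based and output is
    the combined stdout and stderr of the failing run.
    """
    for run_number in range(1, max_runs + 1):
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep stderr interleaved with stdout
            text=True,
        )
        if result.returncode != 0:
            return run_number, result.output if hasattr(result, "output") else result.stdout
    return None
```

Stopping at the first failure keeps the temporary directory and broker logs of the failing run intact for inspection, which matters for a failure that shows up only once in many runs.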
The logs show an inconsistency in management messages; I'll look into this.
Committed to trunk r1097838
Committed to 2.0.x branch http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=mrg_2.0.x&id=ed150c5e1a3fe5558304d1693a0fbb67afed7858
Committed to aconway-hotfix2 branch: http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=aconway-hotfix2&id=12bbd89ae7932f3bd6bb4e50dcc90810e16a312d
The issue has been fixed; the cluster no longer fails the above unit test (two long stress tests).
Tested on RHEL 5.6 i386 / x86_64 on packages:
python-qpid-0.10-1.el5.noarch
python-qpid-qmf-0.10-6.el5.i386
qpid-cpp-client-0.10-6.el5.i386
qpid-cpp-client-devel-0.10-6.el5.i386
qpid-cpp-client-devel-docs-0.10-6.el5.i386
qpid-cpp-client-rdma-0.10-6.el5.i386
qpid-cpp-client-ssl-0.10-6.el5.i386
qpid-cpp-mrg-debuginfo-0.10-6.el5.i386
qpid-cpp-server-0.10-6.el5.i386
qpid-cpp-server-cluster-0.10-6.el5.i386
qpid-cpp-server-devel-0.10-6.el5.i386
qpid-cpp-server-rdma-0.10-6.el5.i386
qpid-cpp-server-ssl-0.10-6.el5.i386
qpid-cpp-server-store-0.10-6.el5.i386
qpid-cpp-server-xml-0.10-6.el5.i386
qpid-dotnet-0.4.738274-2.el5.i386
qpid-java-client-0.10-4.el5.noarch
qpid-java-common-0.10-4.el5.noarch
qpid-java-example-0.10-4.el5.noarch
qpid-java-jca-0.10-1.el5.noarch
qpid-qmf-0.10-6.el5.i386
qpid-qmf-debuginfo-0.10-6.el5.i386
qpid-qmf-devel-0.10-6.el5.i386
qpid-tests-0.10-1.el5.noarch
qpid-tools-0.10-4.el5.noarch
rh-qpid-cpp-tests-0.10-6.el5.i386
ruby-qpid-qmf-0.10-6.el5.i386
sesame-0.10-1.el5.i386
sesame-debuginfo-0.10-1.el5.i386
-> VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHEA-2011-0890.html
|
Description of problem:
When the cluster tests are run in a loop against the installed rpms, I can reliably see that the cluster_tests.ShortTests.test_route_update test case fails from time to time.
cluster_tests.ShortTests.test_route_update .......................................................................................... fail
Error during test:
  Traceback (most recent call last):
    File "./qpid-python-test", line 311, in run
      phase()
    File "/root/cluster_test_bz674338/cluster_tests.py", line 305, in test_route_update
      cluster_test_logs.verify_logs()
    File "/root/cluster_test_bz674338/cluster_test_logs.py", line 106, in verify_logs
      raise Exception("Files differ in %s"%(os.getcwd())+"".join(errors))
  Exception: Files differ in /root/cluster_test_bz674338/brokertest.tmp/cluster_tests.ShortTests.test_route_update
  cluster9-1.log.filter.322 cluster9-0.log.filter.322
Totals: 24 tests, 8 passed, 0 skipped, 15 ignored, 1 failed
Version-Release number of selected component (if applicable):
[root@mrg-qe-09 _s]# rpm -qa |grep qpid | sort
python-qpid-0.7.946106-15.el5
qpid-cpp-client-0.7.946106-28.el5
qpid-cpp-client-devel-0.7.946106-28.el5
qpid-cpp-client-devel-docs-0.7.946106-28.el5
qpid-cpp-client-rdma-0.7.946106-28.el5
qpid-cpp-client-ssl-0.7.946106-28.el5
qpid-cpp-mrg-debuginfo-0.7.946106-28.el5
qpid-cpp-server-0.7.946106-28.el5
qpid-cpp-server-cluster-0.7.946106-28.el5
qpid-cpp-server-devel-0.7.946106-28.el5
qpid-cpp-server-rdma-0.7.946106-28.el5
qpid-cpp-server-ssl-0.7.946106-28.el5
qpid-cpp-server-store-0.7.946106-28.el5
qpid-cpp-server-xml-0.7.946106-28.el5
qpid-dotnet-0.4.738274-2.el5
qpid-java-client-0.7.946106-15.el5
qpid-java-common-0.7.946106-15.el5
qpid-java-example-0.7.946106-15.el5
qpid-tools-0.7.946106-12.el5
rh-qpid-cpp-tests-0.7.946106-28.el5
How reproducible:
once per 4-8 runs
Steps to Reproduce:
1. ./runme.sh
2. the test loops and finally hangs with failure
Actual results:
cluster_tests.ShortTests.test_route_update rarely fails.
Expected results:
cluster_tests.ShortTests.test_route_update should always pass.
Additional info:
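The traceback above shows the test failing because per-broker filtered logs differ. A minimal sketch of that kind of check follows. It is not the code of cluster_test_logs.verify_logs, whose filtering and file naming are not shown in this report: the names find_mismatches and verify_same are hypothetical, and the caller is assumed to pass in paths to logs that have already been filtered.

```python
import filecmp
import os


def find_mismatches(paths):
    """Return the paths whose contents differ from the first path.

    paths is a list of already-filtered log files, one per cluster member.
    The first file is used as the reference; comparison is byte-for-byte.
    """
    if len(paths) < 2:
        return []
    reference = paths[0]
    return [p for p in paths[1:]
            if not filecmp.cmp(reference, p, shallow=False)]


def verify_same(paths):
    """Raise an Exception naming the differing files, if there are any."""
    mismatches = find_mismatches(paths)
    if mismatches:
        names = " ".join(os.path.basename(p) for p in [paths[0]] + mismatches)
        raise Exception("Files differ: %s" % names)
```

Passing shallow=False forces filecmp.cmp to compare file contents rather than only the os.stat signatures, so two logs with the same size and timestamp are still read and compared.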