Bug 674338 - Inconsistent management messages in a cluster, test fails sporadically
Summary: Inconsistent management messages in a cluster, test fails sporadically
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: 1.3.2
: ---
Assignee: Alan Conway
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-02-01 14:46 UTC by Alan Conway
Modified: 2015-11-16 01:13 UTC

Fixed In Version: qpid-cpp-mrg-0.7.946106-28
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-02-15 12:11:38 UTC
Target Upstream Version:


Attachments
Reproducer that can be run on qpid installed from RPMs (90.00 KB, application/x-tar)
2011-02-02 19:17 UTC, Alan Conway


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0217 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging and Grid bug fix and enhancement update 2011-02-15 12:10:15 UTC

Description Alan Conway 2011-02-01 14:46:21 UTC
Description of problem: An upstream test designed to verify consistent management messages in a cluster is failing due to inconsistencies.


Version-Release number of selected component (if applicable): trunk r1060879


How reproducible: moderate - fails about 1 out of 4 times


Steps to Reproduce:

Running the test in a qpid build: the test is disabled because it does not pass. To enable it, remove these lines from cpp/src/tests/cluster_test_logs.py:91:

    # FIXME aconway 2011-01-19: disable when called from unit tests
    # Causing sporadic failures, see https://issues.apache.org/jira/browse/QPID-3007
    if __name__ != "__main__": return

To run the test in src/tests:

$ make check TESTS=run_cluster_tests  CLUSTER_TESTS='*Long*test_management* -DDURATION=4' &> make-check.log

Actual results: the test fails in about 1 out of 4 runs.

Expected results: no failures

Additional info: https://issues.apache.org/jira/browse/QPID-3007

Comment 1 Alan Conway 2011-02-01 21:35:05 UTC
Fixed by the following upstream revisions:

1066220 QPID-3007: Unique management identifier for connections.
1066219 QPID-3007: Ignore expected connection close warning in cluster_test_logs.py
1066217 QPID-3007: Don't hold on to consumer shared-pointers in UpdateClient::consumerNumbering
1066215 QPID-3007: Don't record management statistics in cluster-unsafe contexts.

Comment 2 Alan Conway 2011-02-02 19:17:59 UTC
Created attachment 476636
Reproducer that can be run on qpid installed from RPMs

The runme.sh script runs the test in a loop. Prior to the fix the test was failing every 4-5 iterations. With the fix it has not failed during an overnight run.
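
The actual runme.sh is in the attachment and is not reproduced here. As a rough illustration of the approach it describes (run the test repeatedly and count the failures), a minimal Python sketch might look like the following. The function name `count_failures` is hypothetical, and the command to run is supplied by the caller.

```python
import subprocess
import sys

def count_failures(cmd, iterations):
    """Run cmd `iterations` times; return how many runs exited non-zero."""
    failures = 0
    for i in range(iterations):
        # Discard the test output; only the exit status matters here.
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures += 1
            print("iteration %d failed (exit code %d)" % (i + 1, result.returncode))
    return failures

if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # real test command for an actual reproduction attempt.
    demo = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print("failures:", count_failures(demo, 5))
```

For a source build, the command could be the make invocation from the description passed as a list, for example `["make", "check", "TESTS=run_cluster_tests", "CLUSTER_TESTS=*Long*test_management* -DDURATION=4"]`, run from src/tests.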

Comment 5 Frantisek Reznicek 2011-02-04 09:01:51 UTC
The issue is under test at the moment.

Comment 7 Frantisek Reznicek 2011-02-04 11:03:16 UTC
Alan,
could you please confirm that the issue you saw is the following?
(The dump below is from a test run on -27.)


...
cluster_tests.LongTests.test_management
................................................................................................................
pass
cluster_tests.LongTests.test_management_qmf2
...........................................................................................................
fail
Error during test:
  Traceback (most recent call last):
    File "./qpid-python-test", line 311, in run
      phase()
    File "/root/bz/bz674338/cluster_mgmt_674338/cluster_tests.py", line 454, in test_management_qmf2
      self.test_management(args=["--mgmt-qmf2=yes"])
    File "/root/bz/bz674338/cluster_mgmt_674338/cluster_tests.py", line 451, in test_management
      cluster_test_logs.verify_logs()
    File "/root/bz/bz674338/cluster_mgmt_674338/cluster_test_logs.py", line 106, in verify_logs
      raise Exception("Files differ in %s"%(os.getcwd())+"".join(errors))
  Exception: Files differ in /root/bz/bz674338/cluster_mgmt_674338/brokertest.tmp/cluster_tests.LongTests.test_management_qmf2
      cluster1-24.log.filter.8173859 cluster1-23.log.filter.8173859
Totals: 2 tests, 1 passed, 0 skipped, 0 ignored, 1 failed

Also, is it expected that only the tests
cluster_tests.LongTests.test_management and
cluster_tests.LongTests.test_management_qmf2 need to be run for this defect?


Ongoing testing indicates the issue is fixed.
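
The "Files differ" exception in the dump above is raised when the filtered logs of two cluster members do not match. The real check lives in cluster_test_logs.py and is not shown in this report. The following is only a sketch of that kind of check, assuming that filtering means stripping a leading timestamp from each line. The names `filter_log` and `compare_filtered_logs` and the timestamp pattern are hypothetical; only the error message format is taken from the traceback.

```python
import filecmp
import os
import re
import tempfile

# Assumed log prefix: "YYYY-MM-DD HH:MM:SS " (hypothetical filtering rule).
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")

def filter_log(path):
    """Write a copy of `path` with leading timestamps removed; return its path."""
    filtered = path + ".filter"
    with open(path) as src, open(filtered, "w") as dst:
        for line in src:
            dst.write(TIMESTAMP.sub("", line))
    return filtered

def compare_filtered_logs(paths):
    """Raise if any filtered log differs from the first filtered log."""
    filtered = [filter_log(p) for p in paths]
    errors = []
    for other in filtered[1:]:
        # shallow=False forces a byte-by-byte comparison of file contents.
        if not filecmp.cmp(filtered[0], other, shallow=False):
            errors.append("\n    %s %s" % (os.path.basename(filtered[0]),
                                           os.path.basename(other)))
    if errors:
        raise Exception("Files differ in %s" % os.getcwd() + "".join(errors))

def write_log(directory, name, text):
    """Helper for trying the sketch out: create a small log file."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write(text)
    return path
```

With two logs that differ only in their timestamps the call returns quietly; a log with a different message body triggers the exception, which names the mismatching pair.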

Comment 8 Alan Conway 2011-02-04 15:08:50 UTC
Yes that is the issue.

> Moreover, it is expected to execute just tests
> cluster_tests.LongTests.test_management and
> cluster_tests.LongTests.test_management_qmf2 for this defect?

Those are the only tests that reliably show the defect. In sporadic cases where qpid-tool is used with a cluster it can cause brokers to exit with an "invalid-arg" error but I have not been able to reproduce that reliably.

Comment 9 Frantisek Reznicek 2011-02-04 16:33:06 UTC
Thanks Alan,
I was able to reproduce the issue reliably on the -27 build, and ran the tests with extended duration to confirm that the issue has been fixed on -28.

Extensive testing ran in parallel on 6 machines (over 25 hours in total, over 150 runs).

The issue has been fixed; tested on RHEL 5.6 i386 / x86_64 with packages:
python-qpid-0.7.946106-15.el5.noarch
qpid-cpp-client-0.7.946106-28.el5.i386
qpid-cpp-client-devel-0.7.946106-28.el5.i386
qpid-cpp-client-devel-docs-0.7.946106-28.el5.i386
qpid-cpp-client-rdma-0.7.946106-28.el5.i386
qpid-cpp-client-ssl-0.7.946106-28.el5.i386
qpid-cpp-mrg-debuginfo-0.7.946106-28.el5.i386
qpid-cpp-server-0.7.946106-28.el5.i386
qpid-cpp-server-cluster-0.7.946106-28.el5.i386
qpid-cpp-server-devel-0.7.946106-28.el5.i386
qpid-cpp-server-rdma-0.7.946106-28.el5.i386
qpid-cpp-server-ssl-0.7.946106-28.el5.i386
qpid-cpp-server-store-0.7.946106-28.el5.i386
qpid-cpp-server-xml-0.7.946106-28.el5.i386
qpid-dotnet-0.4.738274-2.el5.i386
qpid-java-client-0.7.946106-15.el5.noarch
qpid-java-common-0.7.946106-15.el5.noarch
qpid-java-example-0.7.946106-15.el5.noarch
qpid-tests-0.7.946106-1.el5.noarch
qpid-tools-0.7.946106-12.el5.noarch
rh-qpid-cpp-tests-0.7.946106-28.el5.i386


-> VERIFIED
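
As a rough sanity check on why this many clean runs is convincing: if the failure rate were still about 1 in 4 per run (the figure from the description) and runs were independent, the chance of 150 consecutive passes would be vanishingly small. Both the independence assumption and treating each of the 150 runs as a single trial are simplifications, and the function name below is hypothetical.

```python
def chance_of_clean_streak(failure_rate, runs):
    """Probability that `runs` independent runs all pass."""
    return (1.0 - failure_rate) ** runs

# Pre-fix failure rate of roughly 1 in 4, over 150 runs.
p = chance_of_clean_streak(0.25, 150)
print("chance of 150 clean runs without a fix: %.1e" % p)
```

Under these assumptions the result is many orders of magnitude below one in a million, so a clean streak of this length is strong evidence that the failure rate has dropped.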

Comment 10 errata-xmlrpc 2011-02-15 12:11:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0217.html

