Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 675921

Summary: clustered qpidd broker occasionally fails cluster_tests.ShortTests.test_route_update
Product: Red Hat Enterprise MRG
Reporter: Frantisek Reznicek <freznice>
Component: qpid-cpp
Assignee: Alan Conway <aconway>
Status: CLOSED ERRATA
QA Contact: Frantisek Reznicek <freznice>
Severity: low
Docs Contact:
Priority: medium
Version: 1.3
CC: aconway, esammons, gsim, jneedle, jross
Target Milestone: 2.0
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qpid-cpp-0.9.1079953 qpid-cpp-mrg-0.10-6.el5
Doc Type: Bug Fix
Doc Text:
Cause: Internal test showed management objects for inter-broker bridges could be created inconsistently in a cluster. Consequence: Could potentially cause brokers to shut down with "invalid argument" error. Fix: Correct the inconsistency. Result: Bridge management objects are created consistently in a cluster.
Story Points: ---
Clone Of:
: 704424 716239
Environment:
Last Closed: 2011-06-23 15:46:50 UTC Type: ---
Bug Depends On:    
Bug Blocks: 704424, 716239    
Attachments:
Description Flags
The issue reproducer including logs none
The issue reproducer including logs none

Description Frantisek Reznicek 2011-02-08 09:42:34 UTC
Description of problem:

When the cluster tests are run in a loop against the installed RPMs, the cluster_tests.ShortTests.test_route_update test case fails intermittently.

cluster_tests.ShortTests.test_route_update .......................................................................................... fail
Error during test:
  Traceback (most recent call last):
    File "./qpid-python-test", line 311, in run
      phase()
    File "/root/cluster_test_bz674338/cluster_tests.py", line 305, in test_route_update
      cluster_test_logs.verify_logs()
    File "/root/cluster_test_bz674338/cluster_test_logs.py", line 106, in verify_logs
      raise Exception("Files differ in %s"%(os.getcwd())+"".join(errors))
  Exception: Files differ in /root/cluster_test_bz674338/brokertest.tmp/cluster_tests.ShortTests.test_route_update
      cluster9-1.log.filter.322 cluster9-0.log.filter.322
Totals: 24 tests, 8 passed, 0 skipped, 15 ignored, 1 failed
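
The exception above comes from a step that compares the filtered logs of the cluster members and reports the files that differ. A minimal sketch of that kind of check follows. It is an illustration only, not the actual cluster_test_logs.py code: the function names and the dict-of-lines input are assumptions made for this example.

```python
def find_mismatched_logs(logs):
    """Return (name, reference) pairs for every log that differs from the first.

    `logs` maps a log file name to its list of (already filtered) lines.
    This input shape is hypothetical, chosen to keep the sketch self-contained.
    """
    names = sorted(logs)
    if not names:
        return []
    reference = names[0]
    return [(name, reference) for name in names[1:]
            if logs[name] != logs[reference]]


def verify(logs):
    """Raise if any member's log differs from the reference member's log."""
    mismatches = find_mismatched_logs(logs)
    if mismatches:
        details = "".join("\n    %s %s" % pair for pair in mismatches)
        raise Exception("Files differ" + details)
```

With this shape, two members that record the same events in a different order are reported as a mismatch, which is the symptom seen in the traceback.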


Version-Release number of selected component (if applicable):

[root@mrg-qe-09 _s]# rpm -qa |grep qpid | sort
python-qpid-0.7.946106-15.el5
qpid-cpp-client-0.7.946106-28.el5
qpid-cpp-client-devel-0.7.946106-28.el5
qpid-cpp-client-devel-docs-0.7.946106-28.el5
qpid-cpp-client-rdma-0.7.946106-28.el5
qpid-cpp-client-ssl-0.7.946106-28.el5
qpid-cpp-mrg-debuginfo-0.7.946106-28.el5
qpid-cpp-server-0.7.946106-28.el5
qpid-cpp-server-cluster-0.7.946106-28.el5
qpid-cpp-server-devel-0.7.946106-28.el5
qpid-cpp-server-rdma-0.7.946106-28.el5
qpid-cpp-server-ssl-0.7.946106-28.el5
qpid-cpp-server-store-0.7.946106-28.el5
qpid-cpp-server-xml-0.7.946106-28.el5
qpid-dotnet-0.4.738274-2.el5
qpid-java-client-0.7.946106-15.el5
qpid-java-common-0.7.946106-15.el5
qpid-java-example-0.7.946106-15.el5
qpid-tools-0.7.946106-12.el5
rh-qpid-cpp-tests-0.7.946106-28.el5

How reproducible:
once per 4-8 runs
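
For sizing a verification loop, a rough estimate of how many consecutive passing runs are needed before a pass streak is unlikely to be luck can be computed as below. This is a sketch under the assumption that runs fail independently with a fixed probability p (1/4 to 1/8, taken from the rate above); the function name and the 1% threshold are choices made for this example.

```python
def runs_needed(p_fail, alpha=0.01):
    """Smallest n such that n consecutive passes would occur with
    probability <= alpha if the bug were still present.

    Assumes independent runs with a fixed per-run failure probability.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    n = 0
    while (1.0 - p_fail) ** n > alpha:
        n += 1
    return n


print(runs_needed(1.0 / 4))  # failure once per 4 runs
print(runs_needed(1.0 / 8))  # failure once per 8 runs
```

Under these assumptions a few dozen clean runs are enough for the original failure rate; a much rarer residual failure needs a far longer loop.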

Steps to Reproduce:
1. ./runme.sh
2. The test loops until it hits the failure.
  
Actual results:
cluster_tests.ShortTests.test_route_update fails intermittently (roughly once per 4-8 runs).

Expected results:
cluster_tests.ShortTests.test_route_update should always pass.

Additional info:

Comment 1 Frantisek Reznicek 2011-02-08 09:47:51 UTC
Created attachment 477583 [details]
The issue reproducer including logs

The reproducer is based on the attachment https://bugzilla.redhat.com/attachment.cgi?id=476636

and runs with a simple ./runme.sh

Comment 2 Alan Conway 2011-02-08 20:30:06 UTC
Fixed in r1068554

    QPID-3045 - sporadic failure of cluster_tests.ShortTests.test_route_update
    
    Sporadically the test was failing because the session associated with
    an inter-broker bridge was created out of order with other
    objects. This is unlikely to cause a fatal cluster inconsistency in
    practice but it has been corrected in any case. The fix was to delay
    creation of the management object for a bridge session till a point
    which is consistent on all cluster members.
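
The ordering problem and the fix described above can be illustrated with a toy model. This is not qpidd code: the Member class, its method names, and the object names are hypothetical, and the model only shows why deferring creation to a point that all members reach in the same order makes the resulting object sequence identical.

```python
class Member:
    """Toy cluster member that records management objects in creation order."""

    def __init__(self, defer):
        self.defer = defer
        self.objects = []   # management objects, in creation order
        self.pending = []   # bridge sessions waiting for a consistent point

    def local_bridge_session(self, name):
        # Happens at a member-specific time, so its position relative to
        # cluster events can differ from member to member.
        if self.defer:
            self.pending.append(name)
        else:
            self.objects.append(name)

    def cluster_event(self, name):
        # Delivered in the same order on every member.
        self.objects.append(name)

    def consistent_point(self):
        # Reached at the same position in the event stream on every member.
        self.objects.extend(self.pending)
        self.pending = []


def simulate(defer):
    a, b = Member(defer), Member(defer)
    # Member a sees the bridge session before the queue, member b after it.
    a.local_bridge_session("bridge-session")
    a.cluster_event("queue")
    b.cluster_event("queue")
    b.local_bridge_session("bridge-session")
    for member in (a, b):
        member.consistent_point()
    return a.objects, b.objects
```

In this model, simulate(False) yields different object orders on the two members, while simulate(True) yields the same order on both.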

Comment 3 Alan Conway 2011-03-11 22:08:28 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: Internal test showed management objects for inter-broker bridges could be created inconsistently in a cluster. 
Consequence: Could potentially cause brokers to shut down with "invalid argument" error.
Fix: Correct the inconsistency
Result: Bridge management objects are created consistently in a cluster.

Comment 5 Frantisek Reznicek 2011-04-28 12:27:08 UTC
The issue has been fixed; long-term tested on RHEL 5.6 and 6.1s5, i[36]86 / x86_64, with packages:
python-qpid-0.10-1.el5.noarch
python-qpid-qmf-0.10-6.el5.x86_64
qpid-cpp-client-0.10-4.el5.x86_64
qpid-cpp-client-devel-0.10-4.el5.x86_64
qpid-cpp-client-devel-docs-0.10-4.el5.x86_64
qpid-cpp-client-rdma-0.10-4.el5.x86_64
qpid-cpp-client-ssl-0.10-4.el5.x86_64
qpid-cpp-mrg-debuginfo-0.10-4.el5.x86_64
qpid-cpp-server-0.10-4.el5.x86_64
qpid-cpp-server-cluster-0.10-4.el5.x86_64
qpid-cpp-server-devel-0.10-4.el5.x86_64
qpid-cpp-server-rdma-0.10-4.el5.x86_64
qpid-cpp-server-ssl-0.10-4.el5.x86_64
qpid-cpp-server-store-0.10-4.el5.x86_64
qpid-cpp-server-xml-0.10-4.el5.x86_64
qpid-dotnet-0.4.738274-2.el5.x86_64
qpid-java-client-0.10-4.el5.noarch
qpid-java-common-0.10-4.el5.noarch
qpid-java-example-0.10-4.el5.noarch
qpid-qmf-0.10-6.el5.x86_64
qpid-qmf-debuginfo-0.10-6.el5.x86_64
qpid-qmf-devel-0.10-6.el5.x86_64
qpid-tests-0.10-1.el5.noarch
qpid-tools-0.10-4.el5.noarch
rh-qpid-cpp-tests-0.10-4.el5.x86_64
ruby-qpid-qmf-0.10-6.el5.x86_64
sesame-0.10-1.el5.x86_64
sesame-debuginfo-0.10-1.el5.x86_64


-> VERIFIED

Comment 6 Frantisek Reznicek 2011-04-28 13:30:56 UTC
Created attachment 495537 [details]
The issue reproducer including logs

Approximately one hour after I moved the bug to VERIFIED, I triggered a single failure on RHEL 6.1s5 x86_64.

I'm attaching data + logs, moving the bug back to ASSIGNED and raising NEEDINFO.

Alan,
could you possibly review the logs please?


The way I ran the tests:
  ./runme.sh 1000 cluster_tests.ShortTests.test_route_update &>log2

  last log is then ./log2

The cluster test is slightly modified to run on packaged binaries. The broker is forced to run with all installed plugins.
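
The contents of runme.sh are in the attachment and not reproduced here. As a sketch of the same idea (repeat a test command up to N times and stop at the first failure), something like the following could be used. The function name and the stand-in command are assumptions for this example; the real loop invokes the cluster test instead.

```python
import subprocess
import sys

# Stand-in command that always succeeds; replace with the real test invocation.
PASSING_COMMAND = [sys.executable, "-c", "pass"]


def run_until_failure(command, max_runs):
    """Run `command` up to `max_runs` times.

    Returns the 1-based index of the first run that exits non-zero,
    or None if every run passed.
    """
    for run_index in range(1, max_runs + 1):
        result = subprocess.run(command,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            return run_index
    return None
```

Stopping at the first failure keeps the broker logs of the failing iteration intact for later inspection.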

Comment 7 Alan Conway 2011-04-28 15:33:22 UTC
The logs show an inconsistency in management messages; I'll look into this.

Comment 8 Alan Conway 2011-04-29 15:22:49 UTC
Committed to trunk r1097838

Comment 11 Frantisek Reznicek 2011-05-13 05:15:54 UTC
The issue has been fixed; the cluster no longer fails the above unit test (two long stress tests).

Tested on RHEL5.6 i386 / x86_64 on packages:
python-qpid-0.10-1.el5.noarch
python-qpid-qmf-0.10-6.el5.i386
qpid-cpp-client-0.10-6.el5.i386
qpid-cpp-client-devel-0.10-6.el5.i386
qpid-cpp-client-devel-docs-0.10-6.el5.i386
qpid-cpp-client-rdma-0.10-6.el5.i386
qpid-cpp-client-ssl-0.10-6.el5.i386
qpid-cpp-mrg-debuginfo-0.10-6.el5.i386
qpid-cpp-server-0.10-6.el5.i386
qpid-cpp-server-cluster-0.10-6.el5.i386
qpid-cpp-server-devel-0.10-6.el5.i386
qpid-cpp-server-rdma-0.10-6.el5.i386
qpid-cpp-server-ssl-0.10-6.el5.i386
qpid-cpp-server-store-0.10-6.el5.i386
qpid-cpp-server-xml-0.10-6.el5.i386
qpid-dotnet-0.4.738274-2.el5.i386
qpid-java-client-0.10-4.el5.noarch
qpid-java-common-0.10-4.el5.noarch
qpid-java-example-0.10-4.el5.noarch
qpid-java-jca-0.10-1.el5.noarch
qpid-qmf-0.10-6.el5.i386
qpid-qmf-debuginfo-0.10-6.el5.i386
qpid-qmf-devel-0.10-6.el5.i386
qpid-tests-0.10-1.el5.noarch
qpid-tools-0.10-4.el5.noarch
rh-qpid-cpp-tests-0.10-6.el5.i386
ruby-qpid-qmf-0.10-6.el5.i386
sesame-0.10-1.el5.i386
sesame-debuginfo-0.10-1.el5.i386


-> VERIFIED

Comment 12 errata-xmlrpc 2011-06-23 15:46:50 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0890.html