Bug 872934

Summary: Unit test causing segfault on clustered broker
Product: Red Hat Enterprise MRG Reporter: Petr Matousek <pematous>
Component: qpid-cpp Assignee: mick <mgoulish>
Status: CLOSED CURRENTRELEASE QA Contact: Petr Matousek <pematous>
Severity: urgent Docs Contact:
Priority: high    
Version: Development CC: freznice, iboverma, jross, lzhaldyb, mcressma
Target Milestone: 2.3 Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qpid-cpp-0.18-10 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-03-19 16:38:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                    Flags
clustered broker log (5672)    none
clustered broker log (5673)    none
clustered broker log (5672)    none
clustered broker log (5673)    none
broker coredump (5672)         none
broker coredump (5673)         none

Description Petr Matousek 2012-11-04 12:48:23 UTC
Description of problem:

The following QMF unit test causes a segfault on a RHEL 5 clustered broker in the test's exit phase:
qpid_tests.broker_0_10.qmf_events.EventTests.test_queue_autodelete_exclusive ......................................................................... fail
Error during teardown:  Traceback (most recent call last):
    File "/usr/bin/qpid-python-test", line 340, in run
      phase()
    File "/usr/lib/python2.4/site-packages/qpid/tests/messaging/__init__.py", line 55, in teardown
      self.teardown_connection(self.conn)
    File "/usr/lib/python2.4/site-packages/qpid/tests/messaging/__init__.py", line 59, in teardown_connection
      conn.close(timeout=self.timeout())
    File "<string>", line 6, in close
    File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 316, in close
      ssn.close(timeout=timeout)
    File "<string>", line 6, in close
    File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 749, in close
      if not self._ewait(lambda: self.closed, timeout=timeout):
    File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 566, in _ewait
      result = self.connection._ewait(lambda: self.error or predicate(), timeout)
    File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 209, in _ewait
      self.check_error()
    File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 202, in check_error
      raise self.error
  ConnectionError: (104, 'Connection reset by peer')
Totals: 1 tests, 0 passed, 0 skipped, 0 ignored, 1 failed
[1]-  Segmentation fault      (core dumped) qpidd --cluster-name pematous --data-dir=broker1 >&broker1.log
[2]+  Segmentation fault      (core dumped) qpidd -p 5673 --cluster-name pematous --data-dir=broker2 >&broker2.log

The brokers' log files are attached.

A standalone broker does not suffer from this problem.

Version-Release number of selected component (if applicable):
qpid-cpp-*-0.18-6

How reproducible:
100%

Steps to Reproduce:
1. qpidd  --cluster-name pematous --data-dir=broker1 &>broker1.log &
2. qpidd -p 5673 --cluster-name pematous --data-dir=broker2 &>broker2.log &
3. PYTHONPATH=${PYTHONPATH}:. qpid-python-test -m qpid_tests qpid_tests.broker_0_10.qmf_events.EventTests.test_queue_autodelete_exclusive
  
Actual results:
Segfault on both clustered brokers

Expected results:
The test passes and the brokers keep running

Additional info:

Comment 2 Petr Matousek 2012-11-04 13:00:53 UTC
NOTE: this was also seen on RHEL 6.

Comment 3 Petr Matousek 2012-11-04 13:02:00 UTC
Created attachment 637988 [details]
clustered broker log (5672)

Comment 4 Petr Matousek 2012-11-04 13:02:33 UTC
Created attachment 638000 [details]
clustered broker log (5673)

Comment 5 Petr Matousek 2012-11-04 13:44:29 UTC
Created attachment 638007 [details]
clustered broker log (5672)

Comment 6 Petr Matousek 2012-11-04 13:45:07 UTC
Created attachment 638008 [details]
clustered broker log (5673)

Comment 7 Petr Matousek 2012-11-04 13:46:43 UTC
Created attachment 638009 [details]
broker coredump (5672)

Comment 8 Petr Matousek 2012-11-04 13:47:18 UTC
Created attachment 638010 [details]
broker coredump (5673)

Comment 9 mick 2012-11-09 13:38:39 UTC
This problem was introduced to the 0.18-mrg branch at this point:

  Bug 869002 - QPID-4394
  sha: 795e416d4a3a07c655ae47e23c48b7244436a87c

Broker::createQueue thinks it has created the queue.
Cluster::deliverToQueue disagrees.   

...still investigating...

Comment 10 mick 2012-11-19 18:46:46 UTC
partial fix:

to avoid the SEGV, 

in file src/qpid/cluster/Cluster.cpp

in fn   Cluster::deliverToQueue

in the code block    if ( ! q )

there should be a return; after the call to leave(l);

Otherwise, control will pass to the code below that block, and we will attempt to deliver a message to a queue that we already know to be nonexistent, using a pointer that is null.
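
A minimal sketch of the guard described above. This is not the actual Cluster.cpp code: MiniCluster, Queue, findQueue and the bool return value are hypothetical stand-ins, used only so the early return can be shown and checked in isolation (the real function would use a plain return;).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the real qpid::broker / qpid::cluster types.
struct Queue {
    std::vector<std::string> messages;
    void deliver(const std::string& msg) { messages.push_back(msg); }
};

struct MiniCluster {
    std::map<std::string, std::shared_ptr<Queue>> queues;
    bool left = false;  // set once this member has left the cluster

    // Stand-in for the leave(l) call; the real one takes a lock argument.
    void leave() { left = true; }

    // Look up a queue by name; returns a null pointer when it does not exist.
    std::shared_ptr<Queue> findQueue(const std::string& name) const {
        auto it = queues.find(name);
        if (it == queues.end()) {
            return nullptr;
        }
        return it->second;
    }

    // Returns true if the message was delivered (assumption: the real function is void).
    bool deliverToQueue(const std::string& name, const std::string& msg) {
        std::shared_ptr<Queue> q = findQueue(name);
        if (!q) {
            leave();
            return false;  // without this, control falls through and dereferences null
        }
        assert(q);  // invariant once the guard has been passed
        q->deliver(msg);
        return true;
    }
};
```

With the early return in place, a delivery to a missing queue only marks the member as having left; it never reaches q->deliver(msg).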


This change must definitely be made, but it is still only a partial fix, as the test in question still fails.  

...still investigating...

Comment 11 mick 2012-11-20 19:28:27 UTC
The SEGV has been fixed by a recent checkin that added a throw to the code mentioned in comment 10, above.

( The fix came from Alan's checkin for BZ 875660 )
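
For comparison, a sketch of the throw-based variant of the same guard. The actual checkin is not quoted in this bug, so everything here is an assumption for illustration: ThrowingCluster, handleDelivery, the exception type and the message text are all hypothetical.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in; not the real qpid::cluster::Cluster.
struct ThrowingCluster {
    std::map<std::string, std::vector<std::string>> queues;

    // Throws instead of carrying on with a queue that does not exist.
    void deliverToQueue(const std::string& name, const std::string& msg) {
        auto it = queues.find(name);
        if (it == queues.end()) {
            throw std::runtime_error("cluster delivery to non-existent queue: " + name);
        }
        it->second.push_back(msg);
    }
};

// Hypothetical caller: reports whether the member can keep running.
bool handleDelivery(ThrowingCluster& c, const std::string& name, const std::string& msg) {
    try {
        c.deliverToQueue(name, msg);
        return true;
    } catch (const std::runtime_error&) {
        // A real broker would presumably log the error and shut down here
        // rather than crash with a SEGV.
        return false;
    }
}
```

Like the early return, the throw means the null queue is never used; the difference is that the failure is reported to the caller as an error instead of being dropped silently.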

I believe that the continuing test failure is benign, and only indicates that the testing code needs to be made a little smarter.  ( I will explain in the new BZ. )

Please redefine this bug as concerning the SEGV only.
I am moving the failure of the test to a new BZ.

Comment 12 mick 2012-11-20 20:37:12 UTC
new BZ is 878638.

Comment 13 Petr Matousek 2012-11-28 10:06:27 UTC
The SEGV has been fixed; however, the above-mentioned test still fails and causes the broker to shut down due to a cluster delivery to a non-existent queue.
That issue is tracked by bug 878638.

Verified on RHEL 5.9 and RHEL 6.3 (x86_64, i386)

packages used for testing:
qpid-cpp-*-0.18-10

-> VERIFIED