Bug 1075684

Summary: openstack-cinder-volume service doesn't reconnect to qpidd after qpidd restart

Product: Red Hat OpenStack
Component: openstack-cinder
Version: 3.0
Status: CLOSED INSUFFICIENT_DATA
Severity: medium
Priority: medium
Target Milestone: ---
Target Release: 5.0 (RHEL 7)
Hardware: x86_64
OS: Linux
Whiteboard: storage

Reporter: Chris Roberts <chrobert>
Assignee: Flavio Percoco <fpercoco>
QA Contact: Dafna Ron <dron>
CC: breeler, chrobert, dyocum, eharney, erich, fpercoco, gsim, jdexter, mlopes, ramon, sclewis, scohen, sgotliv, yeylon

Doc Type: Bug Fix
Clone Of: Bug 1025025
Type: Bug
Last Closed: 2014-05-29 15:38:29 UTC

Description Chris Roberts 2014-03-12 15:14:50 UTC
+++ This bug was initially created as a clone of Bug #1025025 +++

Description of problem:

When the qpidd service is restarted, openstack-cinder-volume requires a restart to reconnect to the qpidd service.
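
As a point of reference, the recovery implied above is just a restart of the volume service; a minimal sketch, assuming a RHEL 6 sysvinit install:

    service openstack-cinder-volume restart    # re-establishes the connection to qpidd until the broker is restarted again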

Version-Release number of selected component (if applicable):


openstack-cinder-2013.1.3-2.el6ost.noarch


How reproducible:

Every time

Steps to Reproduce:
1. Start qpidd and openstack-cinder-volume and verify cinder functionality (create/delete volumes, etc.)
2. Restart qpidd service
3. Again, try to create/delete cinder volumes.
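
A concrete sketch of these steps (assuming the cinder CLI is configured with admin credentials on the controller; volume names are illustrative):

    # Step 1: baseline -- create a volume and confirm it reaches "available"
    cinder create --display-name pre-restart-test 1
    cinder list
    # Step 2: restart the broker
    service qpidd restart
    # Step 3: retry the same volume operations
    cinder create --display-name post-restart-test 1
    cinder list    # with the bug present, the new volume ends up in "error"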

Actual results:

Volumes fail to create/delete and are shown in the "error" state in the dashboard.

Expected results:

Volumes create/delete successfully.

Additional info:

--- Additional comment from Eric Harney on 2013-11-13 15:30:17 EST ---

Here is my attempt to reproduce this -- I couldn't. No qpid errors are generated in /var/log/messages and the behavior is as expected.

--- Additional comment from Eric Harney on 2013-11-13 15:31:26 EST ---

Dan: Any more info you can give about how to reproduce this?  It doesn't seem to happen on my machine.  (3.0.z on RHEL 6.5 Beta)

--- Additional comment from Dan Yocum on 2013-11-20 12:57:11 EST ---

Sadly, no, I don't have any more details right now.  I'll try to reproduce the problem when I find some spare time (buwahahahaha!  I crack me up), but I don't see that happening any time soon.  :/

I'm ok with closing the ticket and if/when I run into it again, I'll re-open with more details.

--- Additional comment from Eric Harney on 2013-11-27 09:37:01 EST ---

Closing since I can't find any way to make this happen.  Note that the errors in the volume log appear to be related to driver issues and not AMQP.

--- Additional comment from Dan Yocum on 2014-01-08 08:48:03 EST ---

Let's leave this open for the next couple of weeks.

If/when the problem reproduces itself, we won't touch the volume service and you can log in and see what it's failing on.  Sound like a plan?

--- Additional comment from Gordon Sim on 2014-01-08 11:22:21 EST ---

The qpidd errors sound like it might be https://bugzilla.redhat.com/show_bug.cgi?id=1041575 (see https://bugs.launchpad.net/cinder/+bug/1187298).

--- Additional comment from Eric Harney on 2014-01-14 18:04:54 EST ---

Bug 1050215 could be related/helpful here.

--- Additional comment from Ayal Baron on 2014-02-05 09:29:11 EST ---

Since this is a test-only bug, the bug it is tracking is already closed, and we have not been able to reproduce it, I'm moving this to ON_QA.

--- Additional comment from Dafna Ron on 2014-02-19 05:05:09 EST ---

This specific issue does not reproduce, but we hit a different one, which I will open a separate bug for.
Closing this as verified with the following packages:

[root@orange-vdsf ~]# rpm -qa |grep cinder
openstack-cinder-2013.2.2-1.el6ost.noarch
python-cinderclient-1.0.7-2.el6ost.noarch
python-cinder-2013.2.2-1.el6ost.noarch

[root@puma31 ~]# rpm -qa |grep qpid
qpid-cpp-client-0.18-14.el6.x86_64
qpid-cpp-server-0.18-14.el6.x86_64
python-qpid-0.18-4.el6.noarch

--- Additional comment from errata-xmlrpc on 2014-03-04 15:12:56 EST ---

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0213.html

Comment 2 Flavio Percoco 2014-03-12 15:47:26 UTC
This seems somehow related to #1060689

Do we have the logs of the server? Can this be reproduced?

Comment 3 Chris Roberts 2014-03-12 15:53:48 UTC
Jeff Dexter is working on getting the logs from the system.

Comment 5 Chris Roberts 2014-03-14 16:26:02 UTC
/var/log/cinder/volume.log
2014-03-11 14:49:06    ERROR [root] Unexpected exception occurred 1 time(s)... retrying.
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/excutils.py", line 62, in inner_func
    return infunc(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/impl_qpid.py", line 529, in _consumer_thread
    self.consume()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/impl_qpid.py", line 520, in consume
    it.next()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/impl_qpid.py", line 437, in iterconsume
    yield self.ensure(_error_callback, _consume)
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/impl_qpid.py", line 377, in ensure
    return method(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/impl_qpid.py", line 428, in _consume
    nxt_receiver = self.session.next_receiver(timeout=timeout)
  File "<string>", line 6, in next_receiver
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 660, in next_receiver
    if self._ecwait(lambda: self.incoming, timeout):
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
    result = self._ewait(lambda: self.closed or predicate(), timeout)
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 567, in _ewait
    self.check_error()
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 556, in check_error
    raise self.error
SessionError: Queue cinder-volume has been deleted. (qpid/broker/Queue.cpp:1866)(408)
2014-03-11 14:49:16     INFO [cinder.service] Caught SIGTERM, stopping children
2014-03-11 14:49:16     INFO [cinder.service] Waiting on 1 children to exit
2014-03-11 14:49:16     INFO [cinder.service] Caught SIGTERM, exiting
2014-03-11 14:49:16     INFO [cinder.service] Child 24392 exited with status 1
2014-03-11 14:49:16     INFO [cinder.service] Starting 1 workers
2014-03-11 14:49:16     INFO [cinder.service] Started child 11841
2014-03-11 14:49:16    AUDIT [cinder.service] Starting cinder-volume node (version 2013.1.4)
2014-03-11 14:50:39     INFO [cinder.volume.manager] volume volume-b9a7f284-dfd5-42e7-a157-e4beb7427e0f: skipping export
2014-03-11 14:50:39     INFO [cinder.volume.manager] Updating volume status
2014-03-11 14:50:39     INFO [cinder.openstack.common.rpc.impl_qpid] Connected to AMQP server on public-msg1.os1.phx2.redhat.com:5672
2014-03-11 14:50:39     INFO [cinder.openstack.common.rpc.impl_qpid] Connected to AMQP server on public-msg1.os1.phx2.redhat.com:5672

There are no other red flags around this time: the system was not under high load (confirmed via sar data) and there are no related messages in the log file.
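
One way to confirm the broker-side state when this hits (a diagnostic sketch, assuming qpid-tools is installed on the broker host; the queue name is taken from the SessionError above) is to list the broker's queues and see whether cinder-volume was recreated after the restart:

    # run on the broker host (defaults to localhost:5672)
    qpid-stat -q | grep cinder-volume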

Comment 6 Gordon Sim 2014-03-14 16:33:08 UTC
(In reply to Chris Roberts from comment #5)
> /var/log/cinder/volume.log
> 2014-03-11 14:49:06    ERROR [root] Unexpected exception occurred 1 time(s)... retrying.
> [...]
> SessionError: Queue cinder-volume has been deleted. (qpid/broker/Queue.cpp:1866)(408)


I believe this is https://bugzilla.redhat.com/show_bug.cgi?id=1041575 and should be fixed by upgrading to python-qpid-0.18-8

Comment 7 Dan Yocum 2014-03-25 19:18:41 UTC
I can verify that python-qpid-0.18-9 is in NONE of the MRG channels in RHN classic - I've just checked.  

Nor is it in the ost-3 or ost-4 channels.

[root@int-comp040 ~]# rhn-channel -l
rhel-x86_64-server-6
rhel-x86_64-server-6-mrg-grid-2
rhel-x86_64-server-6-mrg-grid-execute-2
rhel-x86_64-server-6-mrg-management-2
rhel-x86_64-server-6-mrg-messaging-2
rhel-x86_64-server-6-mrg-realtime-2
rhel-x86_64-server-6-ost-3
rhel-x86_64-server-6-rhscl-1
rhel-x86_64-server-optional-6
rhel-x86_64-server-supplementary-6
[root@int-comp040 ~]# yum list python-qpid\*
Loaded plugins: downloadonly, priorities, rhnplugin, security
This system is receiving updates from RHN Classic or RHN Satellite.
867 packages excluded due to repository priority protections
Installed Packages
python-qpid.noarch                   0.14-11.el6_3         @rhel-x86_64-server-6
Available Packages
python-qpid-proton.x86_64            0.6-2.el6             epel                 
python-qpid-proton-doc.noarch        0.6-2.el6             epel                 
python-qpid-qmf.x86_64               0.14-14.el6_3         rhel-x86_64-server-6

Comment 8 Jeff Dexter 2014-03-25 19:46:50 UTC
Dan, could python-qpid-0.18-9.el6.noarch be one of the 867 packages excluded due to repository priority protections?
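
One way to check (a sketch, assuming yum-plugin-priorities is the plugin doing the excluding) would be to re-run the query with that plugin disabled and list every available version:

    yum --disableplugin=priorities --showduplicates list python-qpid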

Comment 11 Dan Yocum 2014-06-09 16:13:33 UTC
So, yes, I got the yum priorities sorted out and upgraded qpidd and python-qpid, and I'm fairly certain that _this_ issue went away. There are still cinder-qpid issues (openstack-cinder-volume periodically requires a restart), but that issue is not this issue (I don't think).