Bug 1399654 - Cinder service-list is seen as down
Summary: Cinder service-list is seen as down
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-oslo-messaging
Version: 6.0 (Juno)
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Victor Stinner
QA Contact: Udi Shkalim
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-29 14:05 UTC by Jaison Raju
Modified: 2020-01-17 16:18 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-30 03:55:36 UTC
Target Upstream Version:



Description Jaison Raju 2016-11-29 14:05:55 UTC
Description of problem:
The cinder-volume service is reported as down in cinder service-list.
Restarting the service brings it back up for only a few minutes.
No related events were noticed in the cinder logs.

Version-Release number of selected component (if applicable):
RHOS6
python-oslo-messaging-1.4.1-3.el7ost.noarch
rabbitmq-server is provided by a third party (Pivotal):
{rabbit,"RabbitMQ","3.3.1"}

How reproducible:
Only in the customer's environment.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
rabbitmq-server logs:
=ERROR REPORT==== 29-Nov-2016::04:19:02 ===
connection <0.14981.69>, channel 1 - soft error:
{amqp_error,precondition_failed,
            "parameters for queue 'engine' in vhost '/' not equivalent",
            'queue.declare'}

Comment 8 Jaison Raju 2016-11-30 03:55:36 UTC
(In reply to John Eckersberg from comment #7)
> I've seen countless times in the past where the ceph code blocks inside of
> the eventloop, which prevents the periodic job running causing the service
> to be marked as down.  Usually if you strace the process, you will see it
> spinning on futex() with FUTEX_WAIT.

Thanks a lot, John.
This appears to have been the issue.
The problem went away once all the Ceph concerns were resolved.
As soon as all PGs were active+clean, the cinder service came back and all OpenStack services were back online.
Closing as not a bug.
I think it would be a good idea to create a doc listing what needs to be checked for each service when it is seen as down.
I will start working on it.
Thanks, Victor and Petr, for helping out on this.

Regards,
Jaison R
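
For reference, the failure mode described in comment #7 can be reproduced in miniature. The sketch below is not Cinder code: it uses Python's asyncio as a stand-in for the service's event loop, and Service, report_state, REPORT_INTERVAL and SERVICE_DOWN_TIME are hypothetical names and values chosen only for illustration. It shows the general mechanism: a heartbeat task that shares a cooperative event loop with a blocking call stops updating its timestamp, so the service looks down even though the process is still alive.

```python
import asyncio
import time

# Hypothetical values for illustration only (not Cinder's real settings).
REPORT_INTERVAL = 0.05    # seconds between heartbeats
SERVICE_DOWN_TIME = 0.3   # heartbeat older than this => considered down


class Service:
    """Stand-in for a service record that carries a heartbeat timestamp."""

    def __init__(self):
        self.last_heartbeat = time.monotonic()

    def is_up(self):
        # The service counts as up only while its heartbeat is fresh.
        return time.monotonic() - self.last_heartbeat <= SERVICE_DOWN_TIME


async def report_state(service):
    """Periodic job: refresh the heartbeat, then yield to the event loop."""
    while True:
        service.last_heartbeat = time.monotonic()
        await asyncio.sleep(REPORT_INTERVAL)


async def run(blocking):
    """Run some 'storage work' next to the heartbeat and report liveness."""
    service = Service()
    reporter = asyncio.create_task(report_state(service))
    await asyncio.sleep(REPORT_INTERVAL)  # let the reporter start

    if blocking:
        # A call that never yields: the heartbeat task cannot run meanwhile.
        time.sleep(0.6)
    else:
        # Cooperative wait: the heartbeat task keeps running.
        await asyncio.sleep(0.6)

    up = service.is_up()
    reporter.cancel()
    return up


if __name__ == "__main__":
    print("cooperative work, service up:", asyncio.run(run(blocking=False)))
    print("blocking work, service up:   ", asyncio.run(run(blocking=True)))
```

In the blocking case the process is busy but alive, which matches the symptom here: nothing in the service logs, yet the service is listed as down until the blocking call returns or the service is restarted.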

