
Bug 1066936

Summary:

cinder: volume stuck in creating/deleting when command is sent while qpid is down and then started (restart qpid race)

Product:

Red Hat OpenStack

Reporter:

Dafna Ron <dron>

Component:

openstack-cinder

Assignee:

Sergey Gotliv <sgotliv>

Status:

CLOSED UPSTREAM

QA Contact:

Dafna Ron <dron>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

4.0

CC:

eharney, fpercoco, scohen, sgotliv, yeylon

Target Milestone:

---

Target Release:

6.0 (Juno)

Hardware:

x86_64

OS:

Linux

Whiteboard:

storage

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2014-08-26 19:56:05 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
logs	none

Description Dafna Ron 2014-02-19 10:57:33 UTC

Created attachment 865024
logs

Description of problem:

I think it's a race, but if you slow it down it reproduces 100% of the time.
Volume create/delete gets stuck with no timeout when the command is sent while qpid is being restarted.

Version-Release number of selected component (if applicable):

[root@puma31 ~]# rpm -qa |grep qpid
qpid-cpp-client-0.18-14.el6.x86_64
qpid-cpp-server-0.18-14.el6.x86_64
python-qpid-0.18-4.el6.noarch

[root@orange-vdsf ~(keystone_admin)]# rpm -qa |grep cinder
openstack-cinder-2013.2.2-1.el6ost.noarch
python-cinderclient-1.0.7-2.el6ost.noarch
python-cinder-2013.2.2-1.el6ost.noarch


How reproducible:

100%

Steps to Reproduce:

My setup is remote cinder and controller
 
1. stop qpid service
2. create a volume 
3. start qpid

Actual results:

The command is accepted and the volume status changes to creating.
In reality, the command only shows up in the API log (so I don't think it is actually sent), and there is no timeout.

Expected results:

We should:
1. fail the command, either right away or after a timeout
2. change the volume status to error
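
To illustrate the "after a timeout" option, here is a minimal sketch and not actual Cinder code. It assumes volumes are plain dicts with hypothetical `id`, `status` and `updated_at` keys, and that a periodic job calls the hypothetical `expire_stuck_volumes` helper; the 60-second default is an arbitrary assumption.

```python
from datetime import datetime, timedelta

# Transient states that should not last forever (taken from the bug summary).
TRANSIENT_STATES = ("creating", "deleting")


def expire_stuck_volumes(volumes, now, timeout=timedelta(seconds=60)):
    """Move volumes stuck in a transient state past `timeout` to 'error'.

    `volumes` is a list of dicts (hypothetical shape, not the Cinder model).
    Returns the ids of the volumes that were changed.
    """
    expired = []
    for vol in volumes:
        stuck_for = now - vol["updated_at"]
        if vol["status"] in TRANSIENT_STATES and stuck_for > timeout:
            vol["status"] = "error"
            expired.append(vol["id"])
    return expired
```

Whether such a check belongs in the API node, the scheduler, or a separate periodic task is left open here.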

Additional info: logs

Comment 1 Flavio Percoco 2014-03-17 09:18:29 UTC

Looks like the API node (and most probably this needs to be fixed in the scheduler node too) has all the info needed to change the status. I'd assume this happens with other commands too.

In the case of volume creation in stable/havana, this call may need to be wrapped[0], although this sounds like something that could be improved in taskflow too. I... think this was fixed in Icehouse; at least, it should have a better way to handle this kind of failure.

[0] https://github.com/openstack/cinder/blob/stable/havana/cinder/volume/flows/create_volume/__init__.py#L1504
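
A rough sketch of what "wrapping" the call could look like, not the actual stable/havana code. `create_volume_request`, `BrokerUnavailable` and the `cast` callable are hypothetical names, and the sketch assumes the messaging layer raises when the broker is unreachable, which the report suggests may not happen today.

```python
class BrokerUnavailable(Exception):
    """Hypothetical error for an unreachable message broker."""


def create_volume_request(volume, cast):
    """Mark the volume 'creating', send the request, and fall back to
    'error' if sending fails.

    `volume` is a plain dict and `cast` is any callable that sends the
    request (both are stand-ins, not Cinder objects).
    """
    volume["status"] = "creating"
    try:
        cast(volume)
    except Exception:
        # The request never left the API node, so do not leave the
        # volume in 'creating'; record the failure and re-raise.
        volume["status"] = "error"
        raise
    return volume
```

If the send blocks instead of raising, a wrapper like this would also need a timeout around `cast` to be of any use.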