
Bug 1066936

Summary: cinder: volume stuck in creating/deleting when the command is sent while qpid is down and qpid is then started (restart qpid race)
Product: Red Hat OpenStack
Reporter: Dafna Ron <dron>
Component: openstack-cinder
Assignee: Sergey Gotliv <sgotliv>
Status: CLOSED UPSTREAM
QA Contact: Dafna Ron <dron>
Severity: high
Priority: unspecified
Version: 4.0
CC: eharney, fpercoco, scohen, sgotliv, yeylon
Target Release: 6.0 (Juno)
Hardware: x86_64
OS: Linux
Whiteboard: storage
Doc Type: Bug Fix
Last Closed: 2014-08-26 19:56:05 UTC
Type: Bug
Attachments: logs (flags: none)

Description Dafna Ron 2014-02-19 10:57:33 UTC
Created attachment 865024: logs

Description of problem:

I think it's a race, but if you slow it down it reproduces 100% of the time: volume create/delete gets stuck, with no timeout, when the command is sent while qpid is being restarted.

Version-Release number of selected component (if applicable):

[root@puma31 ~]# rpm -qa |grep qpid
qpid-cpp-client-0.18-14.el6.x86_64
qpid-cpp-server-0.18-14.el6.x86_64
python-qpid-0.18-4.el6.noarch

[root@orange-vdsf ~(keystone_admin)]# rpm -qa |grep cinder
openstack-cinder-2013.2.2-1.el6ost.noarch
python-cinderclient-1.0.7-2.el6ost.noarch
python-cinder-2013.2.2-1.el6ost.noarch


How reproducible:

100%

Steps to Reproduce:

My setup: cinder runs on a separate (remote) host from the controller.

1. stop qpid service
2. create a volume 
3. start qpid

Actual results:

The command appears to be sent, and the volume status changes to "creating".
In fact, the command only shows up in the API log (so I don't think it is actually sent), and there is no timeout.

Expected results:

We should:
1. fail the command, either right away or after a timeout
2. change the volume status to error
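One way to get the "fail after a timeout" behavior is a periodic sweep that moves volumes stuck in a transient state to an error state. This is a hypothetical sketch: `Volume`, `fail_stuck_volumes`, `TIMEOUT_STATUS` and the in-memory dict are illustrative stand-ins, not cinder's API or database schema. Mapping "deleting" to "error_deleting" is an assumption based on cinder's status names.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical mapping: transient state -> state to use once it times out.
# 'error_deleting' mirrors cinder's status name; the mapping itself is an assumption.
TIMEOUT_STATUS = {"creating": "error", "deleting": "error_deleting"}


@dataclass
class Volume:
    """Hypothetical in-memory stand-in for a row in cinder's volumes table."""
    id: str
    status: str
    updated_at: float  # seconds since the epoch of the last status change


def fail_stuck_volumes(volumes: Dict[str, Volume], timeout: float,
                       now: Optional[float] = None) -> List[str]:
    """Move volumes that sat in a transient state for more than `timeout`
    seconds to the matching error state, and return their ids."""
    now = time.time() if now is None else now
    failed = []
    for vol in volumes.values():
        new_status = TIMEOUT_STATUS.get(vol.status)
        if new_status is not None and now - vol.updated_at > timeout:
            vol.status = new_status
            vol.updated_at = now
            failed.append(vol.id)
    return failed


if __name__ == "__main__":
    vols = {
        "a": Volume("a", "creating", updated_at=0.0),    # stuck for 600 s
        "b": Volume("b", "creating", updated_at=590.0),  # only 10 s old
        "c": Volume("c", "available", updated_at=0.0),   # not transient
    }
    print(fail_stuck_volumes(vols, timeout=300, now=600.0))  # ['a']
    print(vols["a"].status)  # error
```

A sweep like this is only a safety net: it bounds how long a volume can stay in "creating", but it cannot tell a lost message from a slow backend, so the timeout has to be generous.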

Additional info: logs

Comment 1 Flavio Percoco 2014-03-17 09:18:29 UTC
It looks like the API node has all the information it needs to change the status (and this most probably needs to be fixed in the scheduler node too). I'd assume this happens with other commands as well.

In the case of volume creation (in stable/havana), this call may need to be wrapped [0], although this sounds like something that could be improved in taskflow too. I... think this was fixed in Icehouse; at least, Icehouse should have a better way to handle this kind of failure.

[0] https://github.com/openstack/cinder/blob/stable/havana/cinder/volume/flows/create_volume/__init__.py#L1504
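A minimal sketch of the kind of wrapping suggested in Comment 1, assuming a messaging layer that raises an exception when the broker is unreachable. `cast_create_volume`, `MessagingUnavailable` and the `cast` callable are hypothetical and do not correspond to the stable/havana code at [0]. There is a limit to this approach: according to the report, the request never gets past the API log and no error surfaces, so if the cast silently drops the message or blocks, this wrapper never fires and a timeout is still needed.

```python
import logging
from typing import Any, Callable, Dict

LOG = logging.getLogger(__name__)


class MessagingUnavailable(Exception):
    """Hypothetical error a messaging driver might raise when the broker is down."""


def cast_create_volume(statuses: Dict[str, str], volume_id: str,
                       cast: Callable[[str, Dict[str, Any]], None]) -> None:
    """Set the volume to 'creating' and cast a hypothetical 'create_volume' message.

    If the cast raises, the volume is moved to 'error' instead of being left in
    'creating', and the exception is re-raised so the API can report the failure.
    """
    statuses[volume_id] = "creating"
    try:
        cast("create_volume", {"volume_id": volume_id})
    except Exception:
        LOG.exception("Failed to send create_volume for volume %s", volume_id)
        statuses[volume_id] = "error"
        raise


if __name__ == "__main__":
    def broker_down(method: str, kwargs: Dict[str, Any]) -> None:
        raise MessagingUnavailable("qpid is not running")

    demo: Dict[str, str] = {}
    try:
        cast_create_volume(demo, "vol-1", broker_down)
    except MessagingUnavailable:
        pass
    print(demo["vol-1"])  # error
```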