Bug 1066936
| Summary: | cinder: volume stuck in creating/deleting when command is sent while qpid is down and then started (restart qpid race) | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Dafna Ron <dron> | ||||
| Component: | openstack-cinder | Assignee: | Sergey Gotliv <sgotliv> | ||||
| Status: | CLOSED UPSTREAM | QA Contact: | Dafna Ron <dron> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 4.0 | CC: | eharney, fpercoco, scohen, sgotliv, yeylon | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 6.0 (Juno) | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | storage | ||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2014-08-26 19:56:05 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
It looks like the API node has all the information needed to change the status (and most probably this needs to be fixed in the scheduler node too). I'd assume this happens with other commands as well.

In the case of volume creation in stable/havana, this call may need to be wrapped [0], although this sounds like something that could be improved in taskflow too.

I think this was fixed in Icehouse; at least, it should have a better way to handle this kind of failure.

[0] https://github.com/openstack/cinder/blob/stable/havana/cinder/volume/flows/create_volume/__init__.py#L1504
Created attachment 865024: logs

Description of problem:

I think it's a race, but if you slow it down it reproduces 100% of the time. Volume create/delete gets stuck with no timeout when the command is sent while qpid is being restarted.

Version-Release number of selected component (if applicable):

[root@puma31 ~]# rpm -qa |grep qpid
qpid-cpp-client-0.18-14.el6.x86_64
qpid-cpp-server-0.18-14.el6.x86_64
python-qpid-0.18-4.el6.noarch

[root@orange-vdsf ~(keystone_admin)]# rpm -qa |grep cinder
openstack-cinder-2013.2.2-1.el6ost.noarch
python-cinderclient-1.0.7-2.el6ost.noarch
python-cinder-2013.2.2-1.el6ost.noarch

How reproducible:

100%

Steps to Reproduce:

My setup is a remote cinder and a controller.
1. stop qpid service
2. create a volume
3. start qpid

Actual results:

The command is sent, leaving the volume status as creating. In actuality, the command is only shown in the API log (so I don't think it is actually sent), and there is no timeout.

Expected results:

We should:
1. either fail the command right away or fail it after a timeout
2. change the volume status to error

Additional info:

logs
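To make the expected results concrete, here is a minimal, standalone Python sketch of the behaviour asked for above: bound the send with a timeout and move the volume to error if the broker cannot be reached. This is not cinder code. `BrokerUnavailable`, `cast_with_timeout`, `create_volume`, the `volumes` dict, and the `send` callable are all hypothetical names invented for illustration, and a plain dict stands in for the volume database.

```python
import threading


class BrokerUnavailable(Exception):
    """Hypothetical error: the message could not be handed to the broker."""


def cast_with_timeout(send, message, timeout):
    """Run send(message) in a worker thread and give up after `timeout` seconds.

    `send` is an assumed stand-in for whatever call publishes the message;
    it is not a real cinder or qpid API.
    """
    outcome = {}

    def worker():
        try:
            send(message)
        except Exception as exc:  # record the failure for the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # The send is still blocked, e.g. waiting for the broker to come back.
        raise BrokerUnavailable("send did not complete within %.1fs" % timeout)
    if "error" in outcome:
        raise BrokerUnavailable(str(outcome["error"]))


def create_volume(volumes, volume_id, send, timeout=5.0):
    """Mark the volume as creating, then flip it to error if the send fails."""
    volumes[volume_id] = "creating"
    message = {"method": "create_volume", "volume_id": volume_id}
    try:
        cast_with_timeout(send, message, timeout)
    except BrokerUnavailable:
        volumes[volume_id] = "error"  # expected result 2 from the report
        raise
    return volumes[volume_id]
```

With a `send` that blocks forever (simulating a stopped qpid), `create_volume` raises after the timeout and leaves the volume in error instead of creating. A real fix would have to live in the RPC layer or the create-volume flow, and would also need to cover the delete path.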