Bug 1085005

Summary: openstack-nova: several instances are able to be configured with the same bootable volume
Product: Red Hat OpenStack Reporter: Vladan Popovic <vpopovic>
Component: openstack-nova Assignee: Nikola Dipanov <ndipanov>
Status: CLOSED ERRATA QA Contact: Yogev Rabl <yrabl>
Severity: high Docs Contact:
Priority: medium    
Version: 4.0 CC: ajeain, breeler, dallan, dron, eglynn, ndipanov, sclewis, scohen, tshefi, xqueralt, yeylon, yrabl
Target Milestone: z4 Keywords: Triaged, ZStream
Target Release: 4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: openstack-nova-2013.2.1-1.el6ost Doc Type: Bug Fix
Doc Text:
Cause: There is a race condition when checking the status of requested volumes in the API. Combined with the fact that a failed instance would be rescheduled to a different host, this could cause a volume to be "stolen" from an instance that had already attached it successfully.
Consequence: If several instances were requested close together and asked for the same volume, an instance that got rescheduled because its volume setup failed (the volume was already taken) could, during the reschedule cleanup, disconnect that volume from the instance that already had it attached.
Fix: If an instance fails volume setup during boot (because the volume is being set up by a different instance), it is no longer rescheduled to a different host, so a fully attached volume is never disconnected.
Result: Only one of the instances that requested the same volume succeeds, while all others go into the ERROR state. (An illustrative sketch of this no-reschedule behavior follows the header fields below.)
Story Points: ---
Clone Of: 1020501 Environment:
Last Closed: 2014-05-29 20:35:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1020501    
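
The Doc Text above describes the fix as skipping the reschedule when volume setup fails during boot. The following is only an illustrative sketch of that pattern, not the actual nova patch; every name in it (prep_block_devices, spawn, build_instance, the exception classes) is a stand-in rather than a real nova code path.

class RescheduledException(Exception):
    """Build failure that may be retried on another host."""

class BlockDeviceSetupFailed(Exception):
    """Build failure that must not trigger a reschedule."""

def prep_block_devices(instance, volume_id):
    # Stand-in for the volume reserve/attach step; fails when the volume has
    # already been claimed by another instance.
    raise RuntimeError("volume %s already attached elsewhere" % volume_id)

def spawn(instance):
    # Stand-in for the hypervisor spawn step.
    pass

def build_instance(instance, volume_id):
    try:
        prep_block_devices(instance, volume_id)
    except Exception as exc:
        # Before the fix this fell through to the generic reschedule path,
        # whose cleanup could detach the volume from the instance that had
        # already attached it. After the fix the instance simply goes to
        # ERROR and no reschedule (and no volume cleanup) happens.
        raise BlockDeviceSetupFailed(str(exc))
    try:
        spawn(instance)
    except Exception as exc:
        # Other build failures remain eligible for rescheduling.
        raise RescheduledException(str(exc))

if __name__ == "__main__":
    try:
        build_instance("race-test-1", "11111111-2222-3333-4444-555555555555")
    except BlockDeviceSetupFailed as exc:
        print("instance goes to ERROR, not rescheduled: %s" % exc)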

Comment 1 Dafna Ron 2014-04-07 14:12:16 UTC
We will verify this bug so I am taking it. 
To verify:
1. Run the original script, booting several instances from the same volume (an illustrative reproduction sketch is included after this list).
2. Make sure that only one instance becomes available while the rest move to ERROR.
3. Make sure the volume is not damaged (detach/reattach the volume).
4. Run the same script but, instead of booting instances from the volume, attach the volume to several running instances.
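
The "original script" referenced in step 1 is not attached to this bug, so this is only a minimal reproduction sketch under the following assumptions: the OS_* auth variables are exported in the environment, VOLUME_ID names an existing bootable Cinder volume, the nova CLI (python-novaclient) is installed, and the flavor and instance names are placeholders.

#!/usr/bin/env python
# Hypothetical reproduction/verification sketch -- not the original script.
import os
import subprocess

VOLUME_ID = os.environ["VOLUME_ID"]            # existing bootable Cinder volume
FLAVOR = os.environ.get("FLAVOR", "m1.small")  # placeholder flavor name
COUNT = 5

# Fire off several boot-from-volume requests close together so they all race
# for the same volume.
procs = []
for i in range(COUNT):
    cmd = ["nova", "boot",
           "--flavor", FLAVOR,
           "--block-device-mapping", "vda=%s:::0" % VOLUME_ID,
           "race-test-%d" % i]
    procs.append(subprocess.Popen(cmd))

for p in procs:
    p.wait()

# Expected result with the fix: exactly one instance reaches ACTIVE, the rest
# go to ERROR, and the winner's volume is never detached by a losing instance.
subprocess.call(["nova", "list"])

Step 4 is the same idea against already running instances, using nova volume-attach instead of nova boot (a similar sketch is included under comment 6 below).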

Comment 6 Nikola Dipanov 2014-04-10 11:47:56 UTC
Dafna - Comment 1 sounds like a good plan to test it.

Attaching volumes should not really be part of this bug, as the issue was with how we handled rescheduling on failures. But I do urge you to test attaching as well and possibly report a different bug.
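
A minimal sketch of that extra attach check, under the same assumptions as the sketch in comment 1 (OS_* auth variables exported; VOLUME_ID is an existing volume; SERVER_IDS is a comma-separated list of running instance IDs; all names are placeholders):

#!/usr/bin/env python
# Hypothetical sketch of the suggested attach test -- not part of this bug's fix.
import os
import subprocess

VOLUME_ID = os.environ["VOLUME_ID"]
SERVER_IDS = os.environ["SERVER_IDS"].split(",")

for server in SERVER_IDS:
    # Only the first attach should succeed; later attempts should be rejected
    # cleanly without disturbing the existing attachment.
    subprocess.call(["nova", "volume-attach", server, VOLUME_ID, "/dev/vdb"])

# The volume should end up "in-use" with a single attachment.
subprocess.call(["cinder", "show", VOLUME_ID])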

Comment 11 Nikola Dipanov 2014-04-30 08:44:23 UTC
Moving back to 4.0, as this is indeed fixed in 4.0.

Comment 14 Yogev Rabl 2014-05-04 13:24:36 UTC
Verified on:

python-novaclient-2.15.0-4.el6ost.noarch
openstack-nova-conductor-2013.2.3-7.el6ost.noarch
openstack-nova-scheduler-2013.2.3-7.el6ost.noarch
openstack-nova-common-2013.2.3-7.el6ost.noarch
openstack-nova-api-2013.2.3-7.el6ost.noarch
openstack-nova-console-2013.2.3-7.el6ost.noarch
openstack-nova-network-2013.2.3-7.el6ost.noarch
openstack-nova-cert-2013.2.3-7.el6ost.noarch
python-nova-2013.2.3-7.el6ost.noarch
openstack-nova-compute-2013.2.3-7.el6ost.noarch
openstack-nova-novncproxy-2013.2.3-7.el6ost.noarch
python-cinderclient-1.0.7-2.el6ost.noarch

Comment 16 errata-xmlrpc 2014-05-29 20:35:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0578.html