Bug 1698159

Summary: [OSP16] Swap volume of multiattached volume will corrupt data
Product: Red Hat OpenStack Reporter: Lee Yarwood <lyarwood>
Component: openstack-nova Assignee: Lee Yarwood <lyarwood>
Status: CLOSED CURRENTRELEASE QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high Docs Contact:
Priority: high    
Version: 16.0 (Train) CC: dasmith, eglynn, jhakimra, kchamart, mbooth, sbauza, sgordon, vromanso
Target Milestone: Upstream M2 Keywords: Patch, Triaged
Target Release: 16.0 (Train on RHEL 8.1)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1698162 Environment:
Last Closed: 2019-07-18 14:40:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1698162, 1698167, 1698175    

Description Lee Yarwood 2019-04-09 17:41:36 UTC
Description of problem:

https://bugs.launchpad.net/nova/+bug/1775418

We currently permit the following:

Create multiattach volumes a and b
Create servers 1 and 2
Attach volume a to servers 1 and 2
swap_volume(server 1, volume a, volume b)

In fact, we have a tempest test which tests exactly this sequence: api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach

The problem is that writes from server 2 during the copy operation on server 1 will continue to hit the underlying storage, but because server 1 doesn't know about them, they won't be reflected in the copy on volume b. This leads to an inconsistent copy, and therefore data corruption on volume b.
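To illustrate the race described above, here is a toy model, not Nova or libvirt code. It assumes a volume can be pictured as a list of blocks and that the copy proceeds block by block; the function name and its parameters are hypothetical and exist only for this sketch.

```python
def copy_with_concurrent_write(source, write_at_step, write_index, write_value):
    """Copy `source` block by block into a new list.

    Just before copying block number `write_at_step`, a second attachment
    writes `write_value` to block `write_index` of the shared source.
    """
    dest = [None] * len(source)
    for step in range(len(source)):
        if step == write_at_step:
            # Server 2 writes straight to the shared volume. The copy job
            # running on behalf of server 1 is not told about this write.
            source[write_index] = write_value
        dest[step] = source[step]
    return dest


# Volume a is attached to servers 1 and 2; server 1 triggers the swap.
volume_a = ["a0", "a1", "a2", "a3"]

# Server 2 overwrites block 0 after the copy has already passed it.
volume_b = copy_with_concurrent_write(
    volume_a, write_at_step=2, write_index=0, write_value="s2"
)

print("volume a:", volume_a)
print("volume b:", volume_b)
print("consistent copy:", volume_a == volume_b)
```

In this model, block 0 on volume b keeps the stale value because the write from server 2 landed after that block had been copied.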

Also, this whole flow makes no sense for a multiattached volume because even if we managed a consistent copy all we've achieved is forking our data between the 2 volumes. The purpose of this call is to allow the operator to move volumes. We need a fundamentally different approach for multiattached volumes.

In the short term we should at least prevent data corruption by rejecting swap volume requests for a multiattached volume. This would also cause the above tempest test to fail, but as I don't believe it's possible to implement the test safely, this would be correct.
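A minimal sketch of the kind of guard this implies, under stated assumptions: the volume is represented as a plain dict with a `multiattach` flag and an `attachments` list, and each attachment carries an `attach_mode` of `"rw"` or `"ro"`. The dict shape, `SwapRejected`, and `check_swap_allowed` are hypothetical names for illustration and are not the actual Nova change.

```python
class SwapRejected(Exception):
    """Hypothetical error raised when a swap volume request is refused."""


def count_rw_attachments(attachments):
    # Treat a missing attach_mode as read/write, to err on the safe side.
    return sum(1 for a in attachments if a.get("attach_mode", "rw") == "rw")


def check_swap_allowed(volume):
    """Raise SwapRejected if the volume has more than one R/W attachment."""
    attachments = volume.get("attachments", [])
    if volume.get("multiattach") and count_rw_attachments(attachments) > 1:
        raise SwapRejected(
            "volume %s has multiple read/write attachments" % volume.get("id")
        )


# Volume a attached read/write to servers 1 and 2: the swap is refused.
volume_a = {
    "id": "a",
    "multiattach": True,
    "attachments": [
        {"server_id": "1", "attach_mode": "rw"},
        {"server_id": "2", "attach_mode": "rw"},
    ],
}

try:
    check_swap_allowed(volume_a)
    result = "allowed"
except SwapRejected as exc:
    result = "rejected: %s" % exc

print(result)
```

The check is meant to run before any copy starts, so a rejected request leaves both volumes untouched.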

Version-Release number of selected component (if applicable):
OpenStack Train

How reproducible:
Always

Steps to Reproduce:
- Create multiattach volumes a and b
- Create servers 1 and 2
- Attach volume a to servers 1 and 2
- swap_volume(server 1, volume a, volume b)

Actual results:
Volume corruption due to multiple active R/W attachments.

Expected results:
Attempt to swap volumes is rejected.

Additional info:
https://review.openstack.org/#/q/topic:bug/1775418+(status:open+OR+status:merged)

Comment 3 Lee Yarwood 2019-07-18 14:40:52 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1698162 will track this into the OSP 15.0 release.