Bug 1516660 - Persist device metadata also in the legacy flow (AttributeError: volumeID for storage Drives)
Summary: Persist device metadata also in the legacy flow (AttributeError: volumeID for storage Drives)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.20.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: 4.20.9.1
Assignee: Francesco Romani
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On: 1542117
Blocks:
 
Reported: 2017-11-23 08:37 UTC by Francesco Romani
Modified: 2018-02-22 09:58 UTC
CC: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-02-22 09:58:37 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.2+
michal.skrivanek: devel_ack+




Links:
oVirt gerrit 84565 [master, MERGED]: virt: make sure to store device metadata (last updated 2017-11-24 12:27:30 UTC)

Description Francesco Romani 2017-11-23 08:37:49 UTC
Description of problem:
Vdsm 4.20.z supports two ways to start VMs: Domain XML (preferred in oVirt >= 4.2.0) and the legacy vm.conf style (the only choice in oVirt <= 4.1).
Furthermore, Vdsm 4.20.z took advantage of the new domain XML initialization system to keep all of a VM's data in the domain XML specification, removing the so-called "recovery files", which used to store all the VM parameters and some of its state.

Those two changes may interact in obscure ways, leading to a failure to persist key device information. The VM will (in most cases) silently operate in a degraded state, with some operations failing in unexpected ways, such as raising 'AttributeError'.

The most commonly lost pieces of this 'key device information' are the drive UUIDs (imageID, volumeID, ...). Those are supposed to be stored in the per-device metadata section of the domain XML.
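
To make this concrete, here is a minimal sketch (Python, standard library only) of the kind of per-device metadata block these UUIDs are expected to end up in. The ovirt-vm namespace URI, the element names and the sample UUIDs are illustrative assumptions based on the oVirt VM metadata convention, not an excerpt from real Vdsm output:

import xml.etree.ElementTree as ET

# Illustrative metadata fragment; the layout is an assumption, not the
# authoritative Vdsm format.
SAMPLE_METADATA = """
<metadata xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
  <ovirt-vm:vm>
    <ovirt-vm:device devtype="disk" name="vda">
      <ovirt-vm:imageID>11111111-1111-1111-1111-111111111111</ovirt-vm:imageID>
      <ovirt-vm:volumeID>22222222-2222-2222-2222-222222222222</ovirt-vm:volumeID>
    </ovirt-vm:device>
  </ovirt-vm:vm>
</metadata>
"""

NS = {"ovirt-vm": "http://ovirt.org/vm/1.0"}
root = ET.fromstring(SAMPLE_METADATA)
for dev in root.findall(".//ovirt-vm:device", NS):
    # If these elements are missing, recovery has nothing to restore from.
    print(dev.get("name"), dev.findtext("ovirt-vm:volumeID", namespaces=NS))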

1. Regardless of how the VM is started, the VM is recovered using the data present in the domain XML spec, and nothing else.
2. When the VM is started with Domain XML, Engine *MUST* send valid initial values. Vdsm's job is to keep those values updated whenever storage management happens (e.g. any operation which changes the drive's active layer, like a snapshot).
3. When the VM is started with the vm.conf style, the drive UUIDs are sent along with all the other parameters. BUT those UUIDs are oVirt-specific information with no counterpart in the domain XML data (unlike most other device data), so they must be saved in the device metadata.

Point #3 is where this bug lies: Vdsm 4.20.8 does not update the domain XML with the device metadata taken from vm.conf, so the recovery will silently miss some data.
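
To illustrate the missing step, a hedged sketch follows (this is not the actual change merged in gerrit 84565): when a VM is created from legacy vm.conf parameters, the oVirt-only drive UUIDs have to be written into the per-device metadata, otherwise they exist only in memory. The helper name save_device_metadata and the exact key list are assumptions for illustration.

# Keys that exist only in the vm.conf / oVirt representation and have no
# counterpart in the plain libvirt domain XML (illustrative list).
OVIRT_ONLY_DRIVE_KEYS = ("imageID", "volumeID", "domainID", "poolID")

def persist_legacy_drive_metadata(vm_conf_devices, save_device_metadata):
    """Copy oVirt-specific drive UUIDs into the per-device metadata.

    save_device_metadata is a hypothetical callable standing in for
    Vdsm's real metadata descriptor API.
    """
    for dev in vm_conf_devices:
        if dev.get("type") != "disk":
            continue
        md = {k: dev[k] for k in OVIRT_ONLY_DRIVE_KEYS if k in dev}
        if md:
            # Without this write the UUIDs survive only in memory and are
            # lost on the next Vdsm restart, which is exactly this bug.
            save_device_metadata(dev.get("name"), md)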

Please note that if Vdsm is *not* restarted while the affected VM runs, this bug will not trigger: the in-memory representation of the VM is correct.

Version-Release number of selected component (if applicable):


How reproducible:
100%


Steps to Reproduce:
1. Start a VM with the legacy vm.conf style, with Engine 4.1.z and Vdsm 4.19.z
2. Migrate the VM (either live or through hibernation) to a host running Vdsm 4.20.z
3. Restart Vdsm 4.20.z once the VM is running again
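
One hedged way to check, from the host, whether the drive UUIDs survived step 3 is to dump the running domain XML and look for the volumeID metadata. This assumes libvirt-python is available and that the metadata uses the layout sketched above; the VM name is a placeholder.

import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("my-vm")  # placeholder VM name
xml_desc = dom.XMLDesc(0)
if "volumeID" in xml_desc:
    print("drive UUIDs present in the domain XML metadata")
else:
    print("drive UUIDs missing from the domain XML: bug likely reproduced")
conn.close()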

Actual results:
Some device attributes, most notably the drive UUIDs (volumeID), are lost; some storage-related operations will fail with unexpected errors such as AttributeError.

Expected results:
The VM operates normally across any number of Vdsm restarts. No device data is lost.


Additional info:

Comment 1 Francesco Romani 2017-11-23 08:39:07 UTC
We don't need a doc_text. This is a regression users should never see.

Comment 2 Francesco Romani 2017-11-24 13:32:57 UTC
patch merged, to appear in Vdsm 4.20.9 -> MODIFIED

Comment 3 Israel Pinto 2018-02-22 09:44:47 UTC
Verified with:
4.1 engine version: 4.1.10-0.1.el7
4.2 engine version: 4.2.2.1-0.1.el7
Hosts:
4.2 host: 
OS Version: RHEL - 7.5 - 6.el7
Kernel Version: 3.10.0 - 855.el7.x86_64
KVM Version: 2.9.0 - 16.el7_4.13.1
LIBVIRT Version: libvirt-3.9.0-13.el7
VDSM Version: vdsm-4.20.19-1.el7ev
4.1 host:
OS Version: RHEL - 7.5 - 6.el7
Kernel Version: 3.10.0 - 851.el7.x86_64
KVM Version: 2.10.0 - 21.el7
LIBVIRT Version: libvirt-3.9.0-13.el7
VDSM Version: vdsm-4.19.46-1.el7ev

Steps:
On the 4.1 engine: cluster compatibility 4.1
On the 4.2 engine: cluster compatibility 4.1
1. Start the VM on the 4.1 host (Vdsm 4.19)
2. Migrate the VM to the 4.2 host (Vdsm 4.20.19)
3. Restart Vdsm on the 4.2 host

Results:
The VM is running and the disk device is active.

Comment 4 Sandro Bonazzola 2018-02-22 09:58:37 UTC
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be
resolved in the oVirt 4.2.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

