Description of problem:

Vdsm 4.20.z supports two ways to start VMs: Domain XML (preferred in oVirt >= 4.2.0) and the legacy vm.conf style (the only choice in oVirt <= 4.1). Furthermore, Vdsm 4.20.z took advantage of the new Domain XML initialization flow to keep all of a VM's data in the domain XML specification, removing the so-called "recovery files", which used to store all the VM parameters and some of its state.

Those two changes may interact in obscure ways, leading to a failure to persist key device information. The VM will (in most cases) silently operate in a degraded state, with some operations failing in odd ways, such as raising AttributeError. The most commonly lost 'key device information' are the drive UUIDs (imageID, volumeID, ...), which are supposed to be stored in the per-device metadata section of the domain XML.

1. Regardless of how the VM is started, the VM is recovered using the data present in the domain XML spec, and nothing else.
2. When the VM is started with Domain XML, Engine *MUST* send valid initial values. Vdsm's job is to keep those values updated should storage management happen (e.g. any operation which changes the drive active layer, like a snapshot).
3. When the VM is started with the vm.conf style, the parameters are sent among all the others. BUT those UUIDs are oVirt-specific information which has no counterpart in the domain XML data (contrary to most other device data), so they must be saved in the device metadata.

Point #3 is where this bug lies. Vdsm 4.20.8 does not update the domain XML with the device metadata taken from vm.conf, thus the recovery will silently miss some data.

Please note that if Vdsm is *not* restarted while the offending VM runs, this bug will not trigger - the in-memory representation of the VM is correct.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Start a VM with the legacy vm.conf style, with Engine 4.1.z and Vdsm 4.19.z
2. Migrate (either live or passing through hibernation) the VM to a host running Vdsm 4.20.z
3. Restart Vdsm 4.20.z once the VM is running again

Actual results:
Some device attributes, most notably drive UUIDs (volumeID), are lost; some storage-related operations will fail with odd results such as AttributeError.

Expected results:
VM operates normally across any number of Vdsm restarts. No device data is lost.

Additional info:
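For reference, a minimal sketch of the kind of metadata sync that is missing: when a VM is created with vm.conf-style parameters, the oVirt-only drive UUIDs need to be written into the per-device metadata section of the libvirt domain XML so that recovery after a Vdsm restart can read them back. The helper below is purely illustrative; the function name, the metadata namespace URI and the element layout are assumptions, not Vdsm's actual API.

    # Illustrative sketch only: serialize the oVirt-specific drive UUIDs coming
    # from a vm.conf-style drive dict into a per-device metadata element, so
    # they survive a Vdsm restart (recovery reads the domain XML and nothing
    # else). Namespace URI and element names are assumptions, not Vdsm's schema.
    import xml.etree.ElementTree as ET

    OVIRT_VM_NS = 'http://ovirt.org/vm/1.0'  # assumed metadata namespace

    # drive UUID keys that have no counterpart in plain libvirt domain XML
    DRIVE_UUID_KEYS = ('domainID', 'poolID', 'imageID', 'volumeID')

    def drive_metadata_from_conf(drive_conf):
        """Build a per-device metadata element for one vm.conf-style drive."""
        dev = ET.Element('{%s}device' % OVIRT_VM_NS,
                         {'devtype': 'disk', 'name': drive_conf.get('name', '')})
        for key in DRIVE_UUID_KEYS:
            if key in drive_conf:
                elem = ET.SubElement(dev, '{%s}%s' % (OVIRT_VM_NS, key))
                elem.text = drive_conf[key]
        return dev

    if __name__ == '__main__':
        # example drive parameters as they could arrive in a legacy vm.conf create
        drive = {
            'name': 'vda',
            'domainID': '11111111-1111-1111-1111-111111111111',
            'poolID': '22222222-2222-2222-2222-222222222222',
            'imageID': '33333333-3333-3333-3333-333333333333',
            'volumeID': '44444444-4444-4444-4444-444444444444',
        }
        ET.register_namespace('ovirt-vm', OVIRT_VM_NS)
        print(ET.tostring(drive_metadata_from_conf(drive)).decode())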
We don't need a doc_text. This is a regression users should never see.
Patch merged, to appear in Vdsm 4.20.9 -> MODIFIED
Verified with:
4.1 engine version: 4.1.10-0.1.el7
4.2 engine version: 4.2.2.1-0.1.el7

Hosts:
4.2 host:
OS Version: RHEL - 7.5 - 6.el7
Kernel Version: 3.10.0 - 855.el7.x86_64
KVM Version: 2.9.0 - 16.el7_4.13.1
LIBVIRT Version: libvirt-3.9.0-13.el7
VDSM Version: vdsm-4.20.19-1.el7ev

4.1 host:
OS Version: RHEL - 7.5 - 6.el7
Kernel Version: 3.10.0 - 851.el7.x86_64
KVM Version: 2.10.0 - 21.el7
LIBVIRT Version: libvirt-3.9.0-13.el7
VDSM Version: vdsm-4.19.46-1.el7ev

Steps (run on a 4.1 cluster and on a 4.2 cluster, both at compatibility level 4.1):
1. Start VM on the 4.1 host (vdsm 4.19)
2. Migrate VM to the 4.2 host (vdsm 4.20.19)
3. Restart vdsm on the 4.2 host

Results:
VM is running, disk device is active.
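A quick way to double-check during verification (a sketch, not part of the official steps): after restarting vdsm, dump the running domain XML read-only via libvirt and confirm the per-device metadata still carries the drive UUIDs. The ovirt-vm namespace URI and element names below are assumptions about how Vdsm 4.20 stores its metadata.

    # Sketch of a verification helper: after restarting vdsm, confirm the
    # running domain still carries oVirt drive UUIDs in its metadata section.
    # Namespace URI and element names are assumptions about the Vdsm 4.20 layout.
    import sys
    import xml.etree.ElementTree as ET

    import libvirt  # python-libvirt bindings

    OVIRT_VM_NS = 'http://ovirt.org/vm/1.0'  # assumed metadata namespace

    def check_drive_uuids(vm_name):
        conn = libvirt.openReadOnly('qemu:///system')
        try:
            dom = conn.lookupByName(vm_name)
            root = ET.fromstring(dom.XMLDesc(0))
        finally:
            conn.close()
        # look for imageID/volumeID elements anywhere in the dumped XML
        wanted = ('{%s}volumeID' % OVIRT_VM_NS, '{%s}imageID' % OVIRT_VM_NS)
        found = [el.tag for el in root.iter() if el.tag in wanted]
        if found:
            print('OK: %d drive UUID element(s) found in domain metadata' % len(found))
            return 0
        print('WARNING: no drive UUIDs in domain metadata - data may have been lost')
        return 1

    if __name__ == '__main__':
        sys.exit(check_drive_uuids(sys.argv[1]))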
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.