Description of problem:
A Windows VM ends up with a ghost NIC and missing secondary disks after the cluster compatibility version is upgraded to 4.6 and the VM is restarted: the vNIC goes offline and the secondary disks disappear from the guest.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.6

How reproducible:

Steps to Reproduce:
1. Prepare an RHV-4.4 environment, but do not update the cluster compatibility version to 4.6 yet.
2. Install a Windows VM with at least 2 disks and one vNIC.
3. Update the cluster compatibility version to 4.6 and restart the VM.

Actual results:
The VM ends up with a ghost NIC and missing secondary disks.

Expected results:
The VM should continue to see the vNIC and all disks after the cluster compatibility version is updated to 4.6.

Additional info:
This can be resolved by re-configuring the vNIC and re-attaching the secondary disks.
Further investigation revealed that:
1. When importing Windows 2016/2019 VMs to either cluster level 4.5 or 4.6, the secondary disks are set to offline.
2. When changing the emulated machine of the VM from pc-q35-rhel8.3.0 to pc-q35-rhel8.4.0, an additional NIC is discovered within the guest and the secondary disks go offline.

We relied on the fix to bz 1939546 to fix this, but #1 happens also with the latest virtio-win version that is supposed to include this fix (1.9.17-3).

We assume that after importing the VMs the user set a static IP on the NIC and set the secondary disks online - which was expected. The problem is that the network settings are lost and the disks go offline again when the emulated machine changes, since a new NIC device that doesn't hold the previous settings is identified and the secondary disks are changed back to offline.

AFAIK, that didn't happen before when changing the machine type, so I'll clone this to qemu in order to investigate why this happens now.
Summing up an offline discussion on this bz: As the platform team is unable to prevent this issue, we need to do something on the oVirt/RHV side. The solution suggested by platform - setting a custom emulated machine for the VM - is a complex and risky change on our side. So we agreed on providing a warning in the update cluster dialog when transitioning from a machine type older than RHEL 8.4 to a machine type of RHEL 8.4 or newer, with an explanation of the issue that may happen and what users can do to address it (fix their guests after the VM is upgraded to the new cluster level, set a custom compatibility level before the cluster upgrade, or avoid upgrading the cluster level).
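The trigger condition for that warning can be sketched as follows. This is an illustrative sketch only, not the actual ovirt-engine code: the function names are hypothetical, and it assumes the RHEL version can be parsed out of emulated-machine names such as "pc-q35-rhel8.3.0".

```python
# Hypothetical sketch of the warning trigger discussed above: parse the
# RHEL version out of an emulated-machine name and warn only when a
# cluster upgrade crosses the RHEL 8.4 boundary.
import re

def rhel_version(machine_type):
    """Extract (major, minor) from e.g. 'pc-q35-rhel8.3.0' -> (8, 3)."""
    m = re.search(r"rhel(\d+)\.(\d+)", machine_type)
    if not m:
        return None  # unknown naming scheme; be conservative and skip
    return int(m.group(1)), int(m.group(2))

def needs_emulated_machine_warning(old_machine, new_machine):
    """Warn when transitioning from < RHEL 8.4 to >= RHEL 8.4."""
    old_v, new_v = rhel_version(old_machine), rhel_version(new_machine)
    if old_v is None or new_v is None:
        return False
    return old_v < (8, 4) <= new_v

print(needs_emulated_machine_warning("pc-q35-rhel8.3.0", "pc-q35-rhel8.4.0"))  # True
print(needs_emulated_machine_warning("pc-q35-rhel8.4.0", "pc-q35-rhel8.5.0"))  # False
```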
Hi Arik,

Some follow-up questions please:
1. What would be the actions required from the user? And once we have those, we should document them in a KCS article (or the official documentation?).
2. When would this warning be given? After we change the CL to 4.6, or after upgrading the Manager and before changing the CL? Does it matter at all?
3. To what VMs would this apply? All the VMs that had machine type < pc-q35-rhel8.4.0?
4. When is the machine type set? When the VM is first created, or when the VM is started? I.e., would the machine type change if the VM is shut down and is then started on a RHEL 8.4 host?

p.s. QEMU machine types in RHV: https://access.redhat.com/articles/3229221
Hi Marina,

> 1. What would be the actions required from the user? And once we have those,
> we should document it in KCS (or official documentation?).

It really depends on what the user will choose; the user has several options here:
1. Proceed with the upgrade of the cluster version and "fix" the guests afterwards (restore the static network configuration, online the secondary disks).
2. Avoid upgrading the cluster level, i.e., keep the previous compatibility version of the cluster.
3. Set a custom compatibility level on the VMs that are more likely to be affected by this.

We already documented it in the release notes and the upgrade guide when introducing cluster level 4.6 (https://bugzilla.redhat.com/show_bug.cgi?id=1940232#c26) - back then we didn't know about the issue with secondary disks, though, so we can update that text.

> 2. This warning would be given when? After we change the CL to 4.6 or after
> upgrading the manager and before changing the CL or? does it matter at all?

The latter - after upgrading the Manager and before changing the cluster level, so users will have a chance to choose one of the aforementioned actions.

> 3. To what VMs would this apply? All the VMs that had machine type <
> pc-q35-rhel8.4.0?

That's still under review. I like the patch that was posted by Lucia with two warnings, shown when the machine type changes from < 8.4 to >= 8.4:
1. "Due to the emulated machine change, some Windows virtual machines might lose static IP configuration. This would need to be resolved manually. The following virtual machines might be affected:"
2. "Due to the emulated machine change, some Windows virtual machines might start with offline secondary disks. This would need to be resolved manually. The following virtual machines might be affected:"

> 4. When is the machine type set? When the VM is created first or when the VM
> is started? i.e. would the machine type change if the VM is shut down and is
> now started on a RHEL 8.4 host?

The machine type is set when the VM is created.
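The interplay between the cluster level and option 3 above can be sketched like this. This is a hypothetical illustration, not engine code: the function and field names are invented, and the mapping values come from this bz (cluster 4.5 defaults to pc-q35-rhel8.3.0, cluster 4.6 to pc-q35-rhel8.4.0).

```python
# Illustrative sketch: how the effective emulated machine could be
# resolved from the cluster level, with a per-VM custom compatibility
# version or custom emulated machine pinning the older machine type.
# Mapping values are taken from this bz; names are hypothetical.
DEFAULT_MACHINE_TYPE = {
    "4.5": "pc-q35-rhel8.3.0",
    "4.6": "pc-q35-rhel8.4.0",
}

def effective_machine_type(cluster_level, custom_compat=None, custom_machine=None):
    if custom_machine:  # an explicit emulated-machine override wins
        return custom_machine
    level = custom_compat or cluster_level
    return DEFAULT_MACHINE_TYPE[level]

# A VM pinned to 4.5 keeps the old machine type after the cluster moves to 4.6:
print(effective_machine_type("4.6", custom_compat="4.5"))  # pc-q35-rhel8.3.0
print(effective_machine_type("4.6"))                       # pc-q35-rhel8.4.0
```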
(In reply to Arik from comment #27)
> It really depends on what the user will choose, the user has several options
> here:
> 1. To proceed with the upgrade of the cluster version and "fix" the guests
> afterwards (restore the static network configuration, online secondary disks)
> 2. To avoid upgrading the cluster level, i.e., keeping the previous
> compatibility version of the cluster
> 3. To set the VMs that there's a better chance that would be affected by
> this with custom compatibility level

Could this option (3) above be part of the cluster compatibility level upgrade dialog? A button to make the engine automatically set a custom CL/machine type for the affected VMs, with values equal to those used on the current CL. That would be handy for large deployments.
(In reply to Germano Veit Michel from comment #28)
> Could this option (3) above be part of the Cluster CL level upgrade dialog?
> A button to make the engine automatically set custom CL/machine for the
> affected VMs with values equal to those used on the current CL level.
> Would be handy for large deployments.

I agree, and it was discussed, but we didn't reach an agreement on this.
Arik, Michal: I modified this KCS to reflect all the decisions. FYI: https://access.redhat.com/solutions/6179481
(In reply to Marina Kalinin from comment #33) > Arik, Michal: I modified this KCS to reflect all the decisions. FYI: > https://access.redhat.com/solutions/6179481 Looks good, thanks
Tested with:
ovirt-engine-4.4.9.2-0.6.el8ev.noarch

Steps:
1. Create a cluster with Compatibility Version 4.5.
2. Create VMs according to the following matrix:

   OS type  | Disks | NICs | Custom Compatibility Version | Custom Emulated Machine
   ---------|-------|------|------------------------------|------------------------
   OtherOS  |   0   |  0   | -                            | -
   OtherOS  |   0   |  1   | -                            | -
   OtherOS  |   1   |  0   | -                            | -
   OtherOS  |   1   |  1   | -                            | -
   OtherOS  |   2   |  0   | -                            | -
   OtherOS  |   2   |  1   | -                            | -
   OtherOS  |   2   |  1   | 4.5                          | -
   OtherOS  |   2   |  1   | -                            | pc-q35-rhel8.3.0
   Windows  |   0   |  0   | -                            | -
   Windows  |   0   |  1   | -                            | -
   Windows  |   1   |  0   | -                            | -
   Windows  |   1   |  1   | -                            | -
   Windows  |   2   |  0   | -                            | -
   Windows  |   2   |  1   | -                            | -
   Windows  |   2   |  1   | 4.5                          | -
   Windows  |   2   |  1   | -                            | pc-q35-rhel8.3.0
   RHEL 8.x |   2   |  1   | -                            | -

3. Update the cluster compatibility version to 4.6.

Results:
1. There is a confirmation dialog saying:
"
Due to the emulated machine change, some Windows virtual machines might start with offline secondary disks. This would need to be resolved manually. The following virtual machines might be affected:
OtherOS_2Disks_0Nic
OtherOS_2Disks_1Nic
Windows_2Disks_0Nic
Windows_2Disks_1Nic

Due to the emulated machine change, some Windows virtual machines might lose static IP configuration. This would need to be resolved manually. The following virtual machines might be affected:
Windows_0Disk_1Nic [Windows 2012 x64]
OtherOS_2Disks_0Nic [Other OS]
OtherOS_1Disk_0Nic [Other OS]
OtherOS_1Disk_1Nic [Other OS]
Windows_0Disk_0Nic [Windows 10 x64]
OtherOS_0Disk_0Nic [Other OS]
Windows_1Disk_0Nic [Windows 2019 x64]
Windows_1Disk_1Nic [Windows 2019 x64]
OtherOS_0Disk_1Nic [Other OS]
OtherOS_2Disks_1Nic [Other OS]
Windows_2Disks_0Nic [Windows 2019 x64]
Windows_2Disks_1Nic [Windows 2019 x64]

Are you sure you want to continue?
"
2. According to the info in the confirmation dialog:
- All OtherOS and Windows VMs with 2 disks, except the ones with a Custom Compatibility Version or Custom Emulated Machine configured, are listed in the "offline secondary disks" warning section.
- All OtherOS and Windows VMs with fewer than 2 disks are not listed in the "offline secondary disks" warning section.
- All OtherOS and Windows VMs, except the ones with a Custom Compatibility Version or Custom Emulated Machine configured, are listed in the "lose static IP configuration" warning section.
- The RHEL VM isn't listed in any warning section.

The VMs are listed in the warning message as expected, except that it would be better not to list OtherOS/Windows VMs without any NIC in the "lose static IP configuration" warning section.

Lucia, what do you think? Do we need to filter out OtherOS/Windows VMs without any NIC?
Yeah, we could have filtered to only VMs with at least one NIC that is set with a NIC profile.
But what's the consequence of not doing that? We'll also display VMs with no NIC, or with no NIC that has a non-empty NIC profile.
After all, we say those VMs "might lose static IP configuration" - and considering that it's unlikely to have only such VMs, they will typically be shown along with other VMs that this warning may apply to. I don't think we should block 4.4.9 on that. That's something we can improve on the master branch, though.
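The filtering improvement discussed above could look like the following. This is a hypothetical sketch: the dataclasses are stand-ins for the engine's VM/NIC entities, and the predicate name is invented for illustration.

```python
# Sketch of the proposed filter: include a VM in the "might lose static
# IP configuration" warning only if it has at least one NIC with a
# non-empty NIC (vNIC) profile. Entity shapes are illustrative only.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Nic:
    name: str
    profile: Optional[str] = None  # vNIC profile name, if any

@dataclass
class Vm:
    name: str
    nics: List[Nic] = field(default_factory=list)

def might_lose_static_ip(vm: Vm) -> bool:
    # True only if some NIC carries a non-empty profile
    return any(nic.profile for nic in vm.nics)

vms = [
    Vm("Windows_0Disk_0Nic"),                               # no NIC -> filtered out
    Vm("Windows_2Disks_1Nic", [Nic("nic1", "ovirtmgmt")]),  # listed
    Vm("OtherOS_1Disk_1Nic", [Nic("nic1", None)]),          # NIC without profile -> filtered out
]
print([vm.name for vm in vms if might_lose_static_ip(vm)])  # ['Windows_2Disks_1Nic']
```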
(In reply to Arik from comment #38)
> Yeah, we could have filtered only VMs with at least one NIC that is set with
> a NIC-profile
> But what's the consequence of that? we'll display also VMs with no NIC or
> with no NIC that has a non-empty NIC profile
> After all, we say those VMs "might lose static IP configuration" -
> considering that it's unlikely to have only such VMs, these VMs will
> typically be shown along with other VMs that this warning may apply for. I
> don't think we should block 4.4.9 on that. That's something we can improve
> on the master branch though.

Ok, this makes sense; moving this bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) security update [ovirt-4.4.9]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4626