Bug 1850220

Summary: Backward compatibility of vm devices monitoring
Product: [oVirt] ovirt-engine
Reporter: Arik <ahadas>
Component: BLL.Virt
Assignee: Liran Rotenberg <lrotenbe>
Status: CLOSED CURRENTRELEASE
QA Contact: Qin Yuan <qiyuan>
Severity: high
Priority: unspecified
Version: 4.4.1
CC: bugs, lrotenbe, michal.skrivanek, rdlugyhe
Target Milestone: ovirt-4.4.1
Flags: pm-rhel: ovirt-4.4+
Target Release: 4.4.1.5
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ovirt-engine-4.4.1.5
Doc Type: Bug Fix
Doc Text:
Old virtual machines that have not been restarted since user aliases were introduced in RHV version 4.2 use old device aliases created by libvirt. The current release adds support for those old device aliases and links them to the new user-aliases to prevent correlation issues and devices being unplugged.
Last Closed: 2020-08-05 06:25:06 UTC
Type: Bug
oVirt Team: Virt

Description Arik 2020-06-23 18:33:42 UTC
Engine 4.4 monitors devices only by their user-alias (VmDevicesMonitoring::correlateWithUserAlias).
This is problematic: for VMs that were started before user-aliases were introduced (in 4.2), the engine cannot correlate the reported devices with the stored ones, and the devices end up unplugged.
In addition, devices may get unplugged after restoring memory state from a snapshot.
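
For illustration only, here is a minimal sketch of what correlation purely by user-alias looks like, and why a device reported with a libvirt-generated alias (e.g. "net0" or "scsi0-0-0-0") never matches a stored device. The "ua-<deviceId>" form is the user-alias convention introduced in 4.2; the class and method shapes below are assumptions, not the actual ovirt-engine code:

    // Minimal sketch, NOT the actual ovirt-engine implementation.
    // Assumes user-aliases have the form "ua-<deviceId>" (introduced in 4.2).
    import java.util.Set;
    import java.util.UUID;

    class UserAliasOnlyCorrelation {

        private static final String USER_ALIAS_PREFIX = "ua-";

        /**
         * Returns the id of the stored device that matches the reported alias,
         * or null when the alias is not a user-alias (e.g. a libvirt alias
         * such as "net0" on a VM started before 4.2) -- in which case the
         * device is not correlated and ends up treated as unplugged.
         */
        static UUID correlateWithUserAlias(Set<UUID> dbDeviceIds, String reportedAlias) {
            if (reportedAlias == null || !reportedAlias.startsWith(USER_ALIAS_PREFIX)) {
                return null;
            }
            UUID deviceId = UUID.fromString(reportedAlias.substring(USER_ALIAS_PREFIX.length()));
            return dbDeviceIds.contains(deviceId) ? deviceId : null;
        }
    }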

Comment 1 Michal Skrivanek 2020-06-24 05:20:04 UTC
Where/how do you get to monitor a <4.2 VM when we do not support <4.2 cluster levels?

Comment 2 Arik 2020-06-24 05:44:17 UTC
Let's say a VM was started in oVirt 4.1 and you then upgrade to 4.2 -
so you have a VM that was started in 4.1, without user-aliases, running in a 4.1 cluster managed by engine 4.2 (and set with a next-run configuration).
Then you upgrade to engine 4.4 - this engine will inherit that VM.

As long as we don't require users to restart their VMs during the upgrade process, we cannot rely on engine >= 4.2 having set their device aliases to user-aliases.

Comment 3 Liran Rotenberg 2020-06-24 08:36:04 UTC
Yes, but when the engine creates the domxml for the VM, it takes the device id and sets it as the alias of that device.
The correlation happens when the VM is reported by libvirt.

I think it will only happen for unmanaged VMs, or for VMs whose machine type/chipset we update (UpdateVmCommand::updateDeviceAddresses) or whose NIC we activate (ActivateDeactivateVmNicCommand::updateDevice).

We may change VmDevicesMonitoring::correlateWithUserAlias to check the device ids in case a device has no user-alias.
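
As a hedged sketch of that idea (not the actual patch; the method shape and parameter names are assumptions), the fallback would try the reported deviceId, which the host report carries alongside the alias as seen in the FullListVDSCommand output in comment 4, whenever the alias is not a user-alias:

    // Sketch of the proposed fallback, NOT the actual fix.
    // Assumes each reported device carries both an 'alias' and a 'deviceId',
    // as visible in the FullListVDSCommand output in comment 4.
    import java.util.Set;
    import java.util.UUID;

    class CorrelationWithDeviceIdFallback {

        private static final String USER_ALIAS_PREFIX = "ua-";

        static UUID correlate(Set<UUID> dbDeviceIds, String reportedAlias, UUID reportedDeviceId) {
            // Preferred path: user-alias set by engine >= 4.2 ("ua-<deviceId>").
            if (reportedAlias != null && reportedAlias.startsWith(USER_ALIAS_PREFIX)) {
                UUID deviceId = UUID.fromString(reportedAlias.substring(USER_ALIAS_PREFIX.length()));
                return dbDeviceIds.contains(deviceId) ? deviceId : null;
            }
            // Fallback for old libvirt aliases ("net0", "scsi0-0-0-0", ...):
            // correlate by the reported device id instead of dropping the device.
            return reportedDeviceId != null && dbDeviceIds.contains(reportedDeviceId)
                    ? reportedDeviceId
                    : null;
        }
    }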

Comment 4 Qin Yuan 2020-07-13 01:10:56 UTC
Verified with:
ovirt-engine-4.4.1.7-0.3.el8ev.noarch

Steps:
1. Create and run a VM that has old device aliases generated by libvirt:
   1) prepare a 4.3 engine
   2) create a data center, set compatibility version to 4.1
   3) add a cluster, set compatibility version to 4.1
   4) add a 4.3 host to the cluster
   5) add nfs storage domain
   6) create a VM, setting custom compatibility version to 4.3
   7) add two nics, two disks to the VM
   8) run the VM

2. Upgrade engine to 4.4:
   1) change the cluster compatibility version from 4.1 to 4.3
   2) change the data center compatibility version from 4.1 to 4.3
   3) back up engine 4.3
   4) fresh install an engine 4.4 on a different machine
   5) copy backup file to engine 4.4 machine and restore it
   6) stop engine 4.3
   7) run engine-setup on engine 4.4
   8) run /usr/share/ovirt-engine/setup/bin/ovirt-engine-rename
   9) login to engine 4.4, check the VM

Results:
1. The VM created in step 1 has the old device aliases generated by libvirt. See the engine log:

2020-07-13 05:51:21,386+08 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (EE-ManagedThreadFactory-engine-Thread-235) [e3180550-60a2-436a-8f23-f402f5b50160] VM {memGuaranteedSize=1024, smpThreadsPerCore=1, cpuType=Opteron_G5, vmId=171586de-8e03-4e80-9f0b-d98049d3dd07, acpiEnable=true, vmType=kvm, smp=1, smpCoresPerSocket=1, emulatedMachine=pc-i440fx-rhel7.6.0, smartcardEnable=false, guestNumaNodes=[{cpus=0, nodeIndex=0, memory=1024}], transparentHugePages=true, numOfIoThreads=1, displayNetwork=ovirtmgmt, vmName=vm_41, maxVCpus=16, kvmEnable=true, devices=[{type=video, specParams={vgamem=16384, heads=1, vram=8192, ram=65536}, device=qxl, deviceId=30329518-6f44-4342-ad68-a757ce40dea1}, {type=graphics, specParams={keyMap=en-us, fileTransferEnable=true, spiceSecureChannels=smain,sinputs,scursor,splayback,srecord,sdisplay,ssmartcard,susbredir, copyPasteEnable=true}, device=vnc, deviceId=b9118f2f-361a-4552-ad86-c845a8d2fece}, {type=graphics, specParams={keyMap=en-us, fileTransferEnable=true, spiceSecureChannels=smain,sinputs,scursor,splayback,srecord,sdisplay,ssmartcard,susbredir, copyPasteEnable=true}, device=spice, deviceId=723c39ee-be2f-467a-86be-8989a0980744}, {iface=ide, shared=false, path=, readonly=true, index=2, type=disk, specParams={path=}, device=cdrom, deviceId=1e48ef64-15de-44fa-95c0-d8e98d43101b}, {discard=false, shared=false, address={bus=0, controller=0, unit=0, type=drive, target=0}, imageID=77ab7183-dd2f-47ed-a051-cc7406eaf589, format=raw, index=0, optional=false, type=disk, deviceId=77ab7183-dd2f-47ed-a051-cc7406eaf589, domainID=fcd24f19-27a5-45ac-9197-a337b48ff3dd, propagateErrors=off, iface=scsi, readonly=false, bootOrder=1, poolID=63d19353-8342-4b63-b52f-bd6331548834, volumeID=d646b47c-ea17-4376-b53e-59b31720dbe9, diskType=file, specParams={}, device=disk}, {discard=false, shared=false, address={bus=0, controller=0, unit=1, type=drive, target=0}, imageID=cf7e4b84-ad54-4038-b04c-0cbc4f5abbde, format=raw, optional=false, type=disk, deviceId=cf7e4b84-ad54-4038-b04c-0cbc4f5abbde, domainID=fcd24f19-27a5-45ac-9197-a337b48ff3dd, propagateErrors=off, iface=scsi, readonly=false, poolID=63d19353-8342-4b63-b52f-bd6331548834, volumeID=32c927d6-39aa-471d-b916-223c8cc72ebf, diskType=file, specParams={}, device=disk}, {filter=vdsm-no-mac-spoofing, nicModel=pv, filterParameters=[], type=interface, specParams={inbound={}, outbound={}}, device=bridge, linkActive=true, deviceId=8e5c02c2-0945-4172-8d29-a1ac160f43c0, macAddr=56:6f:8d:b5:00:01, network=ovirtmgmt}, {filter=vdsm-no-mac-spoofing, nicModel=pv, filterParameters=[], type=interface, specParams={inbound={}, outbound={}}, device=bridge, linkActive=true, deviceId=401091d6-cce7-4987-b834-2811a74e13b7, macAddr=56:6f:8d:b5:00:02, network=ovirtmgmt}, {index=0, model=piix3-uhci, type=controller, specParams={}, device=usb, deviceId=0c2aacc1-762d-4f3b-a572-2fb839323621}, {type=balloon, specParams={model=virtio}, device=memballoon, deviceId=68582f62-688b-4d5b-86fc-0584b06beb7f}, {index=0, model=virtio-scsi, type=controller, specParams={ioThreadId=1}, device=scsi, deviceId=cdf511ef-3948-4635-8868-dbc87fb879b3}, {type=controller, specParams={}, device=virtio-serial, deviceId=85e1c29e-4999-4cc2-bd63-47d7c93bfed5}, {model=virtio, type=rng, specParams={source=urandom}, device=virtio, deviceId=c8ceb23f-ea99-4a93-b81b-a529207bb27b}], custom={}, timeOffset=0, nice=0, maxMemSize=4096, maxMemSlots=16, bootMenuEnable=false, memSize=1024, agentChannelName=ovirt-guest-agent.0}
2020-07-13 05:51:25,728+08 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-10) [] VM '171586de-8e03-4e80-9f0b-d98049d3dd07'(vm_41) moved from 'WaitForLaunch' --> 'PoweringUp'
2020-07-13 05:51:25,770+08 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (ForkJoinPool-1-worker-10) [] FINISH, FullListVDSCommand, return: [{acpiEnable=true, emulatedMachine=pc-i440fx-rhel7.6.0, vmId=171586de-8e03-4e80-9f0b-d98049d3dd07, guestDiskMapping={}, transparentHugePages=true, timeOffset=0, cpuType=Opteron_G5, smp=1, guestNumaNodes=[{nodeIndex=0, cpus=0, memory=1024}], smartcardEnable=false, custom={}, vmType=kvm, memSize=1024, smpCoresPerSocket=1, vmName=vm_41, nice=0, status=Up...{nicModel=pv, macAddr=56:6f:8d:b5:00:01, linkActive=true, network=ovirtmgmt, filterParameters=[], specParams={inbound={}, outbound={}}, vmid=171586de-8e03-4e80-9f0b-d98049d3dd07, filter=vdsm-no-mac-spoofing, alias=net0, deviceId=8e5c02c2-0945-4172-8d29-a1ac160f43c0, address={slot=0x03, bus=0x00, domain=0x0000, type=pci, function=0x0}, device=bridge, type=interface, vm_custom={}, name=vnet0}, {nicModel=pv, macAddr=56:6f:8d:b5:00:02, linkActive=true, network=ovirtmgmt, filterParameters=[], specParams={inbound={}, outbound={}}, vmid=171586de-8e03-4e80-9f0b-d98049d3dd07, filter=vdsm-no-mac-spoofing, alias=net1, deviceId=401091d6-cce7-4987-b834-2811a74e13b7, address={slot=0x04, bus=0x00, domain=0x0000, type=pci, function=0x0}, device=bridge, type=interface, vm_custom={}, name=vnet1},...format=raw, deviceId=77ab7183-dd2f-47ed-a051-cc7406eaf589, address={bus=0, controller=0, type=drive, target=0, unit=0}, device=disk, path=/rhev/data-center/mnt/x.x.x.x:_home_nfs_data1/fcd24f19-27a5-45ac-9197-a337b48ff3dd/images/77ab7183-dd2f-47ed-a051-cc7406eaf589/d646b47c-ea17-4376-b53e-59b31720dbe9, propagateErrors=off, optional=false, name=sda, vm_custom={}, bootOrder=1, vmid=171586de-8e03-4e80-9f0b-d98049d3dd07, volumeID=d646b47c-ea17-4376-b53e-59b31720dbe9, diskType=file, alias=scsi0-0-0-0, discard=false, volumeChain=[{domainID=fcd24f19-27a5-45ac-9197-a337b48ff3dd, leaseOffset=0, volumeID=d646b47c-ea17-4376-b53e-59b31720dbe9, leasePath=/rhev/data-center/mnt/x.x.x.x:_home_nfs_data1/fcd24f19-27a5-45ac-9197-a337b48ff3dd/images/77ab7183-dd2f-47ed-a051-cc7406eaf589/d646b47c-ea17-4376-b53e-59b31720dbe9.lease, imageID=77ab7183-dd2f-47ed-a051-cc7406eaf589, path=/rhev/data-center/mnt/x.x.x.x:_home_nfs_data1/fcd24f19-27a5-45ac-9197-a337b48ff3dd/images/77ab7183-dd2f-47ed-a051-cc7406eaf589/d646b47c-ea17-4376-b53e-59b31720dbe9}]}, {poolID=63d19353-8342-4b63-b52f-bd6331548834, reqsize=0, index=1, iface=scsi, apparentsize=5368709120, specParams={}, imageID=cf7e4b84-ad54-4038-b04c-0cbc4f5abbde, readonly=False, shared=false, truesize=0, type=disk, domainID=fcd24f19-27a5-45ac-9197-a337b48ff3dd, volumeInfo={path=/rhev/data-center/mnt/x.x.x.x:_home_nfs_data1/fcd24f19-27a5-45ac-9197-a337b48ff3dd/images/cf7e4b84-ad54-4038-b04c-0cbc4f5abbde/32c927d6-39aa-471d-b916-223c8cc72ebf, type=file}, format=raw, deviceId=cf7e4b84-ad54-4038-b04c-0cbc4f5abbde, address={bus=0, controller=0, type=drive, target=0, unit=1}, device=disk, path=/rhev/data-center/mnt/x.x.x.x:_home_nfs_data1/fcd24f19-27a5-45ac-9197-a337b48ff3dd/images/cf7e4b84-ad54-4038-b04c-0cbc4f5abbde/32c927d6-39aa-471d-b916-223c8cc72ebf, propagateErrors=off, optional=false, name=sdb, vm_custom={}, vmid=171586de-8e03-4e80-9f0b-d98049d3dd07, volumeID=32c927d6-39aa-471d-b916-223c8cc72ebf, diskType=file, alias=scsi0-0-0-1...


2. After upgrading the engine to 4.4, the VM runs normally; its NICs and disks are not unplugged.
   The aliases in the VM dumpxml are the same as the aliases before the engine was upgraded to 4.4:

    ...
    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
      <source file='/rhev/data-center/mnt/x.x.x.x:_home_nfs_data1/fcd24f19-27a5-45ac-9197-a337b48ff3dd/images/77ab7183-dd2f-47ed-a051-cc7406eaf589/d646b47c-ea17-4376-b53e-59b31720dbe9'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>77ab7183-dd2f-47ed-a051-cc7406eaf589</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
      <source file='/rhev/data-center/mnt/x.x.x.x:_home_nfs_data1/fcd24f19-27a5-45ac-9197-a337b48ff3dd/images/cf7e4b84-ad54-4038-b04c-0cbc4f5abbde/32c927d6-39aa-471d-b916-223c8cc72ebf'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore/>
      <target dev='sdb' bus='scsi'/>
      <serial>cf7e4b84-ad54-4038-b04c-0cbc4f5abbde</serial>
      <alias name='scsi0-0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <interface type='bridge'>
      <mac address='56:6f:8d:b5:00:01'/>
      <source bridge='ovirtmgmt'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <link state='up'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='56:6f:8d:b5:00:02'/>
      <source bridge='ovirtmgmt'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <link state='up'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>
    ...

Comment 5 Sandro Bonazzola 2020-08-05 06:25:06 UTC
This bug is included in the oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.