Nir, this looks like the snapshot failed to apply. Are we aware of any outstanding snapshot bugs here?
not losing data -> VM still running -> high severity
(In reply to Ryan Barry from comment #1)
> Nir, this looks like the snapshot failed to apply. Are we aware of any
> outstanding snapshot bugs here?

I don't know, the snapshot code in vdsm is maintained now by Milan.
Storage snapshotting, too
Liran, you played with snapshots quite a bit recently - did it happen to you? Could it be related to the frozen filesystem?
No, in the given version the freeze issue didn't exist yet. When previewing we are not creating a new volume; when we commit we do. In RunVmCommand we sometimes delete volumes (RunVmCommand::removeMemoryFromActiveSnapshot). Maybe the combination of preview -> run VM -> commit of a memory snapshot does unwanted things to the VM domxml.
The problem within this bug splits into two. I will concentrate on the subject of this bug (feel free to open a new bug for the VDSM traceback).

The problem here is with the VM devices. When we run a VM we take each device's ID and set it as the alias. This is the state saved for the memory snapshot. When previewing/undoing the snapshot, we read the OVF, but we create a new ID for some of the devices. Afterwards, when we correlate the devices in the DB against what we got from libvirt's dumpxml, we see that the device ID does not match the existing alias (the alias still carries the original ID from before the preview). This makes the device look unplugged from the engine's point of view. Therefore some devices are broken and you get a VM without them.

In your engine log you can see:

2020-03-31 18:21:15,340+03 WARN [org.ovirt.engine.core.vdsbroker.libvirt.VmDevicesConverter] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53) [] Received unexpected user-alias: video0
2020-03-31 18:21:15,340+03 WARN [org.ovirt.engine.core.vdsbroker.libvirt.VmDevicesConverter] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53) [] unmanaged video device with address '' is ignored
2020-03-31 18:21:15,354+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53) [] VM 'fb2566a3-ab0a-4f16-8d2f-dc33f13e4815' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='1b78a69e-0ac9-43ec-8e8b-9d20053fbb23', vmId='fb2566a3-ab0a-4f16-8d2f-dc33f13e4815'}', device='cdrom', type='DISK', specParams='[]', address='', managed='true', plugged='false', readOnly='true', deviceAlias='ua-cb2bcd90-93fc-44c7-bba5-582ddc3528f0', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}'
2020-03-31 18:21:15,354+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53) [] VM 'fb2566a3-ab0a-4f16-8d2f-dc33f13e4815' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='2b82518d-36df-4cd5-86d7-6a2a43fd0880', vmId='fb2566a3-ab0a-4f16-8d2f-dc33f13e4815'}', device='qxl', type='VIDEO', specParams='[vgamem=16384, heads=1, ram=65536, vram=8192]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='ua-b028dd53-2536-4d34-b8b3-8504f441935a', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}'
2020-03-31 18:21:15,354+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53) [] VM 'fb2566a3-ab0a-4f16-8d2f-dc33f13e4815' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='c212ff9d-0d60-4f44-b43b-4d3ffea42768', vmId='fb2566a3-ab0a-4f16-8d2f-dc33f13e4815'}', device='virtio-scsi', type='CONTROLLER', specParams='[]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='ua-b7c1a08c-a8fa-45d6-9b4a-28d622143b5c', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}'

libvirt actually did add a video device at some point:

<video>
  <model type='none'/>
  <alias name='video0'/>
</video>

But it's type='none', which makes it unusable. The problem is not only with the console; as you can see, you will also lose the CDROM and the virtio-scsi controller (= any virtio-scsi disk won't be recognized by the VM).

A workaround that might do is updating the VM video (qxl->vga or vga->qxl) and changing the disk interface.
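The correlation failure described above can be sketched roughly like this. This is a minimal illustration only: the function and names are hypothetical, not the actual VmDevicesConverter code (which is Java); the "ua-<deviceId>" alias convention is taken from the aliases visible in the log.

```python
# Minimal sketch of the alias correlation described above. The engine
# builds a user alias "ua-<deviceId>" for each managed device; on
# preview, restoring from the OVF generates new device IDs, so the
# aliases libvirt still reports no longer match anything in the DB.
# Names here are illustrative, not real engine code.

UA_PREFIX = "ua-"

def plugged_state(db_device_ids, libvirt_aliases):
    """Return {device_id: plugged?} by matching "ua-<id>" aliases."""
    reported = {a[len(UA_PREFIX):] for a in libvirt_aliases
                if a.startswith(UA_PREFIX)}
    return {dev: dev in reported for dev in db_device_ids}

# Before preview: aliases were built from the same IDs -> all plugged.
print(plugged_state(["cdrom-1", "qxl-1"], ["ua-cdrom-1", "ua-qxl-1"]))
# -> {'cdrom-1': True, 'qxl-1': True}

# After preview: the OVF restore produced fresh IDs, so the old aliases
# (and unmanaged ones like "video0") match nothing -> devices appear
# unplugged, which is exactly what the monitoring errors report.
print(plugged_state(["cdrom-1", "qxl-1"], ["ua-cdrom-2", "video0"]))
# -> {'cdrom-1': False, 'qxl-1': False}
```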
(In reply to Liran Rotenberg from comment #7)
> I will concentrate on the subject within the bug to this one(Feel free to
> open a new bug on the VDSM traceback).

I think the reason for this is that the ownership of the memory dump's disk image changes. Before previewing the snapshot I get:

-rw-rw----. 1 vdsm kvm 467013632 Jun 23 20:20 cfe25b27-9692-4616-8b21-e1eb7e60a131

After the VM restores its memory from that volume I get:

-rw-rw----. 1 root root 467013632 Jun 23 20:20 cfe25b27-9692-4616-8b21-e1eb7e60a131

Need to check what has changed.

> libvirt actually at some point, at least for video did add a device:
> <video>
> <model type='none'/>
> <alias name='video0'/>
> </video>
> But it's type='none', which make it unusable. The problem is not only for
> the console, but as you can see, you will loose the CDROM, and the
> virtio-scsi controller(=any virtio-scsi disk won't be recognized on the VM.
>
> A workaround that might do is updating the VM video(qxl->vga or vga->qxl)
> and changing the disk interface.

This part looks like a comedy of mistakes that started with unneeded changes in VmDevicesConverter and ended with the fix for bz 1729424, which revealed those bugs. Filed bz 1850220 to address this (much more important).
Yes, it is about the volume ownership. Libvirt changes it when shutting down the VM.

2020-06-23 15:03:49.847+0000: 1455: info : virSecuritySELinuxRestoreFileLabel:1464 : Restoring SELinux context on '/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22'

I already opened a little discussion with Michal from libvirt. The easy solution is to change the memory volume ownership back when restoring, but maybe there is another option.
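The "change the ownership back" option could look roughly like the sketch below on the vdsm side. Everything here is an assumption, not an actual vdsm API: the helper names are hypothetical, and the hardcoded 36:36 uid/gid is the common vdsm:kvm default on oVirt hosts (a real implementation should resolve them via pwd/grp).

```python
import os

# Typical vdsm:kvm uid/gid on oVirt hosts (assumption; resolve via the
# pwd/grp modules on a real host instead of hardcoding).
VDSM_UID = 36
KVM_GID = 36

def ownership_lost(st_uid, st_gid, uid=VDSM_UID, gid=KVM_GID):
    """True if libvirt's restore-file relabeling left the memory
    volume owned by someone else (e.g. root:root on VM shutdown)."""
    return (st_uid, st_gid) != (uid, gid)

def fix_memory_volume(path, uid=VDSM_UID, gid=KVM_GID):
    """Chown the memory dump back to vdsm:kvm after the VM restored
    from it, so the snapshot can be previewed again later."""
    st = os.stat(path)
    if ownership_lost(st.st_uid, st.st_gid, uid, gid):
        os.chown(path, uid, gid)  # needs CAP_CHOWN / root
        return True
    return False
```

Calling `fix_memory_volume(volume_path)` after the restored VM shuts down would undo the root:root ownership seen in comment #8.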
We are starting the VM with virDomainRestoreFlags.

2020-06-23 15:38:48.993+0000: 1457: debug : virThreadJobSet:94 : Thread 1457 (virNetServerHandleJob) is now running job remoteDispatchDomainRestoreFlags
2020-06-23 15:38:48.993+0000: 1457: debug : virDomainRestoreFlags:1026 : conn=0x7fe2a8004f80, from=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22, dxml=<?xml version='1.0' encoding='utf-8'?>
<domain xmlns:ns0="http://libvirt.org/schemas/domain/qemu/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
  <name>snapshot_test</name>
  <uuid>793e0441-5621-47b3-8591-32cabc39652b</uuid>
  ...
</domain>, flags=0x4
2020-06-23 15:38:48.993+0000: 1457: info : virObjectRef:386 : OBJECT_REF: obj=0x556ee5767460
2020-06-23 15:38:48.996+0000: 1457: debug : virFileIsSharedFSType:3478 : Check if path /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22 with FS magic 26985 is shared
...
2020-06-23 15:38:49.471+0000: 1457: debug : qemuProcessStart:7077 : conn=0x7fe2a8004f80 driver=0x556ee577a620 vm=0x7fe29002eb20 name=snapshot_test id=-1 asyncJob=start migrateFrom=stdio migrateFd=35 migratePath=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22 snapshot=(nil) vmop=2 flags=0x22
2020-06-23 15:38:49.471+0000: 1457: debug : qemuProcessInit:5526 : vm=0x7fe29002eb20 name=snapshot_test id=-1 migration=1
2020-06-23 15:38:49.471+0000: 1457: debug : qemuProcessInit:5528 : Beginning VM startup process
...
2020-06-23 15:38:49.975+0000: 1457: debug : qemuProcessLaunch:6674 : conn=0x7fe2a8004f80 driver=0x556ee577a620 vm=0x7fe29002eb20 name=snapshot_test if=54 asyncJob=6 incoming.launchURI=defer incoming.deferredURI=fd:35 incoming.fd=35 incoming.path=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22 snapshot=(nil) vmop=2 flags=0x22

And on shutdown:

2020-06-23 15:38:58.791+0000: 1457: info : virObjectNew:251 : OBJECT_NEW: obj=0x7fe284081860 classname=virDomainEventLifecycle
2020-06-23 15:38:58.791+0000: 1457: info : virObjectNew:251 : OBJECT_NEW: obj=0x7fe28408f210 classname=virDomainEventLifecycle
2020-06-23 15:38:58.792+0000: 1457: info : virObjectRef:386 : OBJECT_REF: obj=0x556ee579cb90
2020-06-23 15:38:58.792+0000: 1457: info : virObjectRef:386 : OBJECT_REF: obj=0x556ee576c6a0
2020-06-23 15:38:58.792+0000: 1457: info : virSecuritySELinuxRestoreFileLabel:1464 : Restoring SELinux context on '/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22'

I think it's done with security intentions, and I didn't see anything in the flags to prevent it; it's also not part of the domxml, where I could put a seclabel element. As pointed out, the last option is to call chown on this volume. Is there anything else we can do?

* Michal, the needinfo is on you because I briefly talked with you about it. Feel free to change it to another person.
What's the libvirt version? The attached libvirt logs are at info severity, not debug, which makes this harder to debug.

Anyway, when I do a domain restore from a file I can see the owner changing (restore through virDomainRestoreFlags). But I went all the way down to v2.0.0 (released 1st July 2016) and the restore still changed the owner of the file. IOW, libvirt has been chown()-ing the file it's restoring the domain from for the past 4 years (if not since forever).

And it makes sense, sort of. Libvirt has to change the SELinux context on the file to allow QEMU to read the migration stream. Then, after QEMU is done reading, the SELinux context is restored. Changing the owner, however, doesn't make sense, I agree. But that doesn't seem to be the problem, since that part hasn't changed. So I'm wondering what else has changed.

But I probably still don't understand the issue here. What exactly is happening, and what can libvirt do better?
I used libvirt-daemon-6.0.0-24.module+el8.2.1+6997+c666f621.x86_64. It is strange because the ownership doesn't change in libvirt 4.5 as far as I can tell (I will have to check on RHV 4.3 with a libvirt debug log whether we see that line there).

The issue is that VDSM doesn't have the right ownership of the volume after restoring and shutting down the VM. Meaning, after the initial use, if we wish to restore that snapshot again with the memory, we can't.
(In reply to Michal Privoznik from comment #11)
> Anyway, when I do restore domain from a file I can see the owner changing
> (restore through virDomainRestoreFlags). But, I've went all the way down to
> v2.0.0 (released 1st July 2016) and the restore changed the owner of the
> file. IOW, libvirt has been chown()-ing the file it's restoring the domain
> from for the past 4 years (if not since forever).

In oVirt 3.3 we've tested: "For stateless VM, the memory will be restored on each run." [1] This means the VM restores its memory from the same file each time it starts - so back then it worked. But that was in mid-2013 (~July), so it may have changed in libvirt even before v2.0.0. It's not covered in our tests, and it's not a common flow for users.

[1] https://www.ovirt.org/develop/release-management/features/virt/ram-snapshots.html
Verified on RHV 4.4.1-10 (RHEL 8 hosts and engine).

Steps to Reproduce:
1. Create a data center, cluster, and VM with a SPICE console.
2. Run the VM.
3. Find the VM's device IDs in the DB and save them.
4. Access the VM's console using virt-viewer with the SPICE protocol.
5. Create a directory and a file inside it.
6. Create a snapshot of the VM with memory and disks.
7. Create another file inside the directory.
8. Stop the VM.
9. Preview the VM's snapshot.
10. Run the VM and access the console (I checked that the first file exists and the second doesn't).
11. Stop the VM.
12. Commit the VM's changes with memory and disks.
13. Run the VM.
14. Access the console using virt-viewer with the SPICE protocol.
15. Find the VM's device IDs in the DB, save them, and compare them with the previously collected data.

Results:
The console opened and worked fine. The VM's device IDs are the same.
I created the VM with an ISO attached to the CD-ROM and a virtio-scsi disk interface.
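For steps 3 and 15, the device IDs can be pulled from the engine DB with a query along these lines. This is a sketch: the vm_device table and column names are as I recall them from the ovirt-engine schema, so verify against your version, and the vm_id value is a placeholder.

```sql
-- Hypothetical query against the ovirt-engine database; table and
-- column names may differ between engine versions.
SELECT device_id, type, device, alias, is_plugged
  FROM vm_device
 WHERE vm_id = '<your-vm-uuid>';
```

Comparing the device_id column before and after the preview/commit cycle shows whether the IDs survived.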
The changed-ownership issue for memory volumes moves to bz 1857554.
This bugzilla is included in the oVirt 4.4.1 release, published on July 8th 2020. Since the problem described in this bug report should be resolved in the oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.