Bug 1819299 - Cannot access VM console after previewing and committing a snapshot
Summary: Cannot access VM console after previewing and committing a snapshot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.4.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.1
Target Release: 4.4.1.5
Assignee: Liran Rotenberg
QA Contact: Tamir
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-31 16:16 UTC by Tamir
Modified: 2020-08-17 06:27 UTC
CC List: 9 users

Fixed In Version: ovirt-engine-4.4.1.5
Doc Type: Bug Fix
Doc Text:
Previously when loading a memory snapshot, the RHV Manager did not load existing device IDs. Instead, it created new IDs for each device. The Manager was unable to correlate the IDs with the devices and treated them as though they were unplugged. The current release fixes this issue. Now, the Manager consumes the device IDs and correlates them with the devices.
Clone Of:
: 1851016 (view as bug list)
Environment:
Last Closed: 2020-08-05 06:24:57 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.4+
ahadas: devel_ack+
mavital: testing_ack+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 109893 0 master MERGED core: parse device id from OVF 2020-10-12 06:55:11 UTC

Comment 1 Ryan Barry 2020-04-01 00:57:18 UTC
Nir, this looks like the snapshot failed to apply. Are we aware of any outstanding snapshot bugs here?

Comment 2 Lukas Svaty 2020-04-03 09:36:15 UTC
not losing data -> VM still running -> high severity

Comment 3 Nir Soffer 2020-05-20 23:37:52 UTC
(In reply to Ryan Barry from comment #1)
> Nir, this looks like the snapshot failed to apply. Are we aware of any
> outstanding snapshot bugs here?

I don't know, the snapshot code in vdsm is maintained now by Milan.

Comment 4 Ryan Barry 2020-05-21 00:28:13 UTC
Storage snapshotting, too

Comment 5 Arik 2020-06-11 11:36:25 UTC
Liran, you played with snapshots quite a bit recently - did this happen to you? Can it be related to the frozen filesystem?

Comment 6 Liran Rotenberg 2020-06-11 12:20:48 UTC
No, in the given version the freeze issue didn't exist yet.

When previewing we are not creating a new volume; when we commit we do.
In RunVmCommand we sometimes delete volumes (RunVmCommand::removeMemoryFromActiveSnapshot).
Maybe the combination of preview, run VM, and commit of a memory snapshot does unwanted things to the VM domxml.

Comment 7 Liran Rotenberg 2020-06-15 16:00:17 UTC
The problem within this bug will be split into two.

I will concentrate here on the subject of this bug (feel free to open a new bug on the VDSM traceback).
The problem here is with the VM devices.

When we run a VM, we take each device's ID and set it as the device alias. This is the state saved for the memory snapshot.
Then, when previewing/undoing to a snapshot, we read the OVF, but we create a new ID for some of the devices.
Afterwards, when we correlate the devices in the DB with what we got from libvirt's dumpxml, we see that the device ID does not match the existing alias (the alias still carries the original ID from before the preview).
This marks the device as unplugged from the engine's point of view. Therefore some devices are affected and you get a VM without them.

In your engine log you can see:
2020-03-31 18:21:15,340+03 WARN  [org.ovirt.engine.core.vdsbroker.libvirt.VmDevicesConverter] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53) [] Received unexpected user-alias: video0
2020-03-31 18:21:15,340+03 WARN  [org.ovirt.engine.core.vdsbroker.libvirt.VmDevicesConverter] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53) [] unmanaged video device with address '' is ignored
2020-03-31 18:21:15,354+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53) [] VM 'fb2566a3-ab0a-4f16-8d2f-dc33f13e4815' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='1b78a69e-0ac9-43ec-8e8b-9d20053fbb23', vmId='fb2566a3-ab0a-4f16-8d2f-dc33f13e4815'}', device='cdrom', type='DISK', specParams='[]', address='', managed='true', plugged='false', readOnly='true', deviceAlias='ua-cb2bcd90-93fc-44c7-bba5-582ddc3528f0', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}'
2020-03-31 18:21:15,354+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53) [] VM 'fb2566a3-ab0a-4f16-8d2f-dc33f13e4815' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='2b82518d-36df-4cd5-86d7-6a2a43fd0880', vmId='fb2566a3-ab0a-4f16-8d2f-dc33f13e4815'}', device='qxl', type='VIDEO', specParams='[vgamem=16384, heads=1, ram=65536, vram=8192]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='ua-b028dd53-2536-4d34-b8b3-8504f441935a', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}'
2020-03-31 18:21:15,354+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53) [] VM 'fb2566a3-ab0a-4f16-8d2f-dc33f13e4815' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='c212ff9d-0d60-4f44-b43b-4d3ffea42768', vmId='fb2566a3-ab0a-4f16-8d2f-dc33f13e4815'}', device='virtio-scsi', type='CONTROLLER', specParams='[]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='ua-b7c1a08c-a8fa-45d6-9b4a-28d622143b5c', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}'

At some point libvirt actually did add a device, at least for the video:
    <video>
      <model type='none'/>
      <alias name='video0'/>
    </video>
But it's type='none', which makes it unusable. The problem is not only with the console; as you can see, you will also lose the CDROM and the virtio-scsi controller (= any virtio-scsi disk won't be recognized on the VM).

A workaround that might help is updating the VM video device (qxl->vga or vga->qxl) and changing the disk interface.
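
To make the correlation failure concrete, here is a minimal sketch (not the actual ovirt-engine code; the helper names are illustrative) of how the "ua-<uuid>" user-alias convention visible in the logs above ties a libvirt device back to an engine device ID, and why regenerating the IDs on preview leaves devices unmatched:

    import uuid

    # Illustrative sketch only (not ovirt-engine code): the engine builds user
    # aliases as "ua-<deviceId>", as seen in the log lines above. If preview
    # regenerates device IDs instead of reusing the ones from the OVF, the IDs
    # in the DB no longer match any alias reported by libvirt, and the engine
    # treats those devices as unplugged.
    USER_ALIAS_PREFIX = "ua-"

    def device_id_from_alias(alias):
        """Extract the engine device ID from a libvirt user alias, or None."""
        if alias and alias.startswith(USER_ALIAS_PREFIX):
            return uuid.UUID(alias[len(USER_ALIAS_PREFIX):])
        return None

    def unmatched_devices(db_device_ids, libvirt_aliases):
        """Return DB device IDs that no alias reported by libvirt points at."""
        seen = {device_id_from_alias(a) for a in libvirt_aliases}
        return [dev for dev in db_device_ids if dev not in seen]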

Comment 8 Arik 2020-06-23 18:42:56 UTC
(In reply to Liran Rotenberg from comment #7)
> I will concentrate here on the subject of this bug (feel free to open a new
> bug on the VDSM traceback).

I think the reason for this is that the ownership of the memory dump's disk image changes.

Before previewing the snapshot I get:
-rw-rw----. 1 vdsm kvm 467013632 Jun 23 20:20 cfe25b27-9692-4616-8b21-e1eb7e60a131

After the VM restores its memory from that volume I get:
-rw-rw----. 1 root root 467013632 Jun 23 20:20 cfe25b27-9692-4616-8b21-e1eb7e60a131

Need to check what has changed.
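
For reference, a minimal sketch (not part of vdsm or the engine; the file name is just the example volume from this comment) that reports the owner and group of a memory volume, so the vdsm:kvm -> root:root change is easy to spot before and after the restore:

    import grp
    import os
    import pwd

    # Minimal sketch: print the owner and group of a memory dump volume so the
    # unexpected ownership change is easy to spot. The file name is the example
    # volume from this comment; substitute the full path on your storage domain.
    def report_ownership(path):
        st = os.stat(path)
        owner = pwd.getpwuid(st.st_uid).pw_name
        group = grp.getgrgid(st.st_gid).gr_name
        print(f"{path}: {owner}:{group} (mode {oct(st.st_mode & 0o777)})")

    report_ownership("cfe25b27-9692-4616-8b21-e1eb7e60a131")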

> libvirt actually at some point, at least for video did add a device:
>     <video>
>       <model type='none'/>
>       <alias name='video0'/>
>     </video>
> But it's type='none', which makes it unusable. The problem is not only with
> the console; as you can see, you will also lose the CDROM and the
> virtio-scsi controller (= any virtio-scsi disk won't be recognized on the VM).
> 
> A workaround that might help is updating the VM video device (qxl->vga or
> vga->qxl) and changing the disk interface.

This part looks like a comedy of mistakes that started with unneeded changes in VmDevicesConverter and ended up with the fix for bz 1729424 that revealed those bugs. Filed bz 1850220 to address this (much more important).

Comment 9 Liran Rotenberg 2020-06-23 20:18:36 UTC
Yes, it is regarding the volume ownership.

Libvirt changes it when shutting down the VM.
2020-06-23 15:03:49.847+0000: 1455: info : virSecuritySELinuxRestoreFileLabel:1464 : Restoring SELinux context on '/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22'

I already started a short discussion with Michal from libvirt. The easy solution is to fix the memory volume ownership when restoring, but maybe there is another option.
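
A rough sketch of what that easy solution would look like as a manual workaround (hypothetical helper, not vdsm code; it assumes a standard oVirt host where storage volumes are owned by vdsm:kvm):

    import grp
    import os
    import pwd

    # Hypothetical workaround sketch (not vdsm code): give the memory volume
    # back to vdsm:kvm after libvirt's restore/shutdown left it owned by root.
    # Must run as root on the host.
    def reset_volume_ownership(path, user="vdsm", group="kvm"):
        uid = pwd.getpwnam(user).pw_uid
        gid = grp.getgrnam(group).gr_gid
        os.chown(path, uid, gid)

    # Example (path elided; point it at the memory volume under
    # /rhev/data-center/...):
    # reset_volume_ownership("/rhev/data-center/mnt/.../<memory-volume-id>")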

Comment 10 Liran Rotenberg 2020-06-24 12:05:48 UTC
We are starting the VM with virDomainRestoreFlags.

2020-06-23 15:38:48.993+0000: 1457: debug : virThreadJobSet:94 : Thread 1457 (virNetServerHandleJob) is now running job remoteDispatchDomainRestoreFlags
2020-06-23 15:38:48.993+0000: 1457: debug : virDomainRestoreFlags:1026 : conn=0x7fe2a8004f80, from=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__
0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22, dxml=<?xml version='1.0' encoding='utf-8'?>
<domain xmlns:ns0="http://libvirt.org/schemas/domain/qemu/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
    <name>snapshot_test</name>
    <uuid>793e0441-5621-47b3-8591-32cabc39652b</uuid>
...
</domain>, flags=0x4
2020-06-23 15:38:48.993+0000: 1457: info : virObjectRef:386 : OBJECT_REF: obj=0x556ee5767460
2020-06-23 15:38:48.996+0000: 1457: debug : virFileIsSharedFSType:3478 : Check if path /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22 with FS magic 26985 is shared
...

2020-06-23 15:38:48.993+0000: 1457: debug : virDomainRestoreFlags:1026 : conn=0x7fe2a8004f80, from=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__
0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22, dxml=<?xml version='1.0' encoding='utf-8'?>
..
2020-06-23 15:38:48.993+0000: 1457: info : virObjectRef:386 : OBJECT_REF: obj=0x556ee5767460
2020-06-23 15:38:48.996+0000: 1457: debug : virFileIsSharedFSType:3478 : Check if path /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22 with FS magic 26985 is shared

...

2020-06-23 15:38:49.471+0000: 1457: debug : qemuProcessStart:7077 : conn=0x7fe2a8004f80 driver=0x556ee577a620 vm=0x7fe29002eb20 name=snapshot_test id=-1 asyncJob=start migrateFrom=stdio migrateFd=35 migratePath=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22 snapshot=(nil) vmop=2 flags=0x22
2020-06-23 15:38:49.471+0000: 1457: debug : qemuProcessInit:5526 : vm=0x7fe29002eb20 name=snapshot_test id=-1 migration=1
2020-06-23 15:38:49.471+0000: 1457: debug : qemuProcessInit:5528 : Beginning VM startup process
...
2020-06-23 15:38:49.975+0000: 1457: debug : qemuProcessLaunch:6674 : conn=0x7fe2a8004f80 driver=0x556ee577a620 vm=0x7fe29002eb20 name=snapshot_test if=54 asyncJob=6 incoming.launchURI=defer incoming.deferredURI=fd:35 incoming.fd=35 incoming.path=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22 snapshot=(nil) vmop=2 flags=0x22

And on shutdown:
2020-06-23 15:38:58.791+0000: 1457: info : virObjectNew:251 : OBJECT_NEW: obj=0x7fe284081860 classname=virDomainEventLifecycle
2020-06-23 15:38:58.791+0000: 1457: info : virObjectNew:251 : OBJECT_NEW: obj=0x7fe28408f210 classname=virDomainEventLifecycle
2020-06-23 15:38:58.792+0000: 1457: info : virObjectRef:386 : OBJECT_REF: obj=0x556ee579cb90
2020-06-23 15:38:58.792+0000: 1457: info : virObjectRef:386 : OBJECT_REF: obj=0x556ee576c6a0
2020-06-23 15:38:58.792+0000: 1457: info : virSecuritySELinuxRestoreFileLabel:1464 : Restoring SELinux context on '/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_co
mpute-ge-4_nfs__0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/39f4c6bd-2de0-4bab-89b3-a4926631761d/72d85c14-d910-43ef-8597-826d8f14ff22'

I think it's done for security reasons. I didn't see anything in the flags to prevent it, and it's not part of the domxml where I could put a seclabel element.
As pointed out, the last option is to call chown on this volume. Is there anything else we can do?
* Michal, the needinfo is on you because I briefly talked with you about this. Feel free to change it to another person.
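
For context, the restore call seen in the logs above corresponds roughly to the following libvirt-python usage (illustrative sketch only; the path and domain XML are placeholders, and flags=0x4 in the log appears to be VIR_DOMAIN_SAVE_PAUSED):

    import libvirt

    # Illustrative sketch of the restore path seen in the logs above; paths and
    # XML are placeholders and vdsm's real call site is more involved.
    MEMORY_VOLUME = "/rhev/data-center/mnt/.../<memory-volume-id>"  # placeholder
    DOMAIN_XML = "<domain type='kvm'>...</domain>"                  # placeholder

    conn = libvirt.open("qemu:///system")
    try:
        # libvirt relabels the memory volume so QEMU can read the migration
        # stream, then restores the label (and, as discussed here, the owner
        # ends up changed) once the domain is done with it.
        conn.restoreFlags(MEMORY_VOLUME, DOMAIN_XML, libvirt.VIR_DOMAIN_SAVE_PAUSED)
    finally:
        conn.close()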

Comment 11 Michal Privoznik 2020-06-25 10:01:12 UTC
What's the libvirt version? The attached libvirt logs are at info severity rather than debug, which makes this harder to debug.

Anyway, when I restore a domain from a file I can see the owner changing (restore through virDomainRestoreFlags). But I've gone all the way down to v2.0.0 (released 1st July 2016) and the restore changed the owner of the file there too. IOW, libvirt has been chown()-ing the file it restores the domain from for the past 4 years (if not since forever). And it makes sense, sort of: libvirt has to change the SELinux context on the file to allow QEMU to read the migration stream, and then, after QEMU is done reading, the SELinux context is restored. Changing the owner, however, does not make sense, I agree. But that doesn't seem to be the problem since that part hasn't changed. So I'm wondering what else has changed.

But I probably still don't understand the issue here. What exactly is happening and what can Libvirt do better?

Comment 12 Liran Rotenberg 2020-06-25 10:51:34 UTC
I used libvirt-daemon-6.0.0-24.module+el8.2.1+6997+c666f621.x86_64.

It is strange because the ownership doesn't change in libvirt 4.5 as far as I can tell (I will have to check on RHV 4.3 with a libvirt debug log whether we see that line there).
The issue is that VDSM doesn't have the right ownership of the volume after restoring and shutting down the VM.
Meaning, after the initial use, if we wish to restore that snapshot with its memory again, we can't.

Comment 13 Arik 2020-06-25 14:22:52 UTC
(In reply to Michal Privoznik from comment #11)
> Anyway, when I restore a domain from a file I can see the owner changing
> (restore through virDomainRestoreFlags). But I've gone all the way down to
> v2.0.0 (released 1st July 2016) and the restore changed the owner of the
> file there too. IOW, libvirt has been chown()-ing the file it restores the
> domain from for the past 4 years (if not since forever).

In oVirt 3.3 we tested: "For stateless VM, the memory will be restored on each run." [1]
This means the VM restores the memory from the same file each time it starts - so back then it worked.
But that was in mid-2013 (~July), so it may have changed in libvirt even before v2.0.0.
It's not covered in our tests and it's not a common flow users would do.

[1] https://www.ovirt.org/develop/release-management/features/virt/ram-snapshots.html

Comment 14 Tamir 2020-07-09 14:02:25 UTC
Verified on RHV 4.4.1-10 (RHEL 8 hosts and engine)

Steps to Reproduce:
1. Create a data center, a cluster, and a VM with a SPICE console
2. Run the VM
3. Find the VM's device IDs in the DB and save them
4. Access the VM's console using virt-viewer over SPICE
5. Create a directory and a file inside it
6. Create a snapshot of the VM with memory and disks
7. Create another file inside the directory
8. Stop the VM
9. Preview the VM's snapshot
10. Run the VM and access the console (I checked that the first file exists and the second doesn't)
11. Stop the VM
12. Commit the previewed snapshot with memory and disks
13. Run the VM
14. Access the console using virt-viewer over SPICE
15. Find the VM's device IDs in the DB, save them, and compare them with the data collected in step 3 (see the query sketch below)

Results:
The console opened and worked fine. The VM's device IDs are the same.
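
For reference, a minimal sketch of the device-ID comparison in steps 3 and 15 (illustrative only; it assumes direct access to the engine database and queries the vm_device table, whose device_id/vm_id columns are inferred from the engine logs above; connection parameters are placeholders):

    import psycopg2

    # Illustrative sketch for steps 3 and 15: dump the VM's device IDs from the
    # engine DB so two runs can be diffed. Connection details are placeholders;
    # the vm_device table and columns are inferred from the engine logs above.
    VM_ID = "fb2566a3-ab0a-4f16-8d2f-dc33f13e4815"  # example VM ID from the logs

    conn = psycopg2.connect(dbname="engine", user="engine",
                            password="...", host="localhost")
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT device_id, type, device FROM vm_device"
            " WHERE vm_id = %s ORDER BY device_id",
            (VM_ID,),
        )
        for device_id, dev_type, device in cur.fetchall():
            print(device_id, dev_type, device)
    conn.close()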

Comment 15 Tamir 2020-07-09 14:08:47 UTC
I created the VM with an ISO attached to the CD-ROM and a virtio-scsi disk interface.

Comment 18 Arik 2020-07-16 06:47:05 UTC
The changed ownership on memory volumes issue moved to bz 1857554.

Comment 19 Sandro Bonazzola 2020-08-05 06:24:57 UTC
This bug is included in the oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

