Bug 1273044 - VFIO: Snapshot of VM with GPU attached cannot be created.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 3.6.0.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ovirt-3.6.3
Target Release: 3.6.3
Assignee: Martin Betak
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-10-19 12:49 UTC by Nisim Simsolo
Modified: 2016-02-29 16:03 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-18 11:01:17 UTC
oVirt Team: Virt
rule-engine: ovirt-3.6.z+
rule-engine: planning_ack+
tjelinek: devel_ack+
rule-engine: testing_ack+


Attachments
engine.log (11.37 MB, text/plain)
2015-10-19 12:50 UTC, Nisim Simsolo
vdsm.log (12.86 MB, text/plain)
2015-10-19 12:52 UTC, Nisim Simsolo


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 49644 0 master MERGED backend: Disable Live Snapshot on VMs with PCI passthrough 2020-06-29 12:48:03 UTC
oVirt gerrit 50018 0 ovirt-engine-3.6 MERGED backend: Disable Live Snapshot on VMs with PCI passthrough 2020-06-29 12:48:03 UTC

Description Nisim Simsolo 2015-10-19 12:49:39 UTC
Description of problem:
Snapshot of VM with GPU attached cannot be created.
- Relevant only for PCI devices.
- Relevant for both Windows and Linux VMs.

Version-Release number of selected component (if applicable):
rhevm-3.6.0.1-0.1.el6
sanlock-3.2.4-1.el7.x86_64
vdsm-4.17.9-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7.x86_64
libvirt-client-1.2.17-5.el7.x86_64

How reproducible:
Consistently.

Steps to Reproduce:
1. Run VM with GPU device attached.
2. Try to create snapshot.

Actual results:
Snapshot failed:

- engine log: 2015-10-19 15:20:16,730 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-18) [2fbfc07e] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM intel-vfio.tlv.redhat.com command failed: Snapshot failed

- vdsm.log: Thread-2855::ERROR::2015-10-19 15:20:42,616::vm::3158::virt.vm::(snapshot) vmId=`103623d3-9e88-42f5-9fcf-3aea6c4ace96`::Unable to take snapshot
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 3156, in snapshot
    self._dom.snapshotCreateXML(snapxml, snapFlags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2581, in snapshotCreateXML
    if ret is None:raise libvirtError('virDomainSnapshotCreateXML() failed', dom=self)
libvirtError: Requested operation is not valid: domain has assigned non-USB host devices

Expected results:
Snapshot should be created.

Additional info:
Vdsm and engine logs attached.
vmId (win10_intel1): '103623d3-9e88-42f5-9fcf-3aea6c4ace96'

Issue occurred at (engine time): 15:20:11,379
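
The libvirt error above says the domain has non-USB host devices assigned. As a rough illustration only, the sketch below scans a domain XML for `<hostdev>` elements whose type is not `usb`. The sample XML, the device addresses and the helper name `non_usb_hostdevs` are made up for this example, and libvirt's real check may consider more than the `type` attribute.

```python
import xml.etree.ElementTree as ET
from typing import List


def non_usb_hostdevs(domain_xml: str) -> List[str]:
    """Return the 'type' attribute of every <hostdev> that is not USB."""
    root = ET.fromstring(domain_xml)
    return [
        dev.get("type", "unknown")
        for dev in root.findall("./devices/hostdev")
        if dev.get("type") != "usb"
    ]


# Hypothetical domain XML: one PCI hostdev (e.g. a GPU) and one USB hostdev.
SAMPLE_XML = """
<domain type='kvm'>
  <name>win10_intel1</name>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1234'/>
        <product id='0xbeef'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""

blocking = non_usb_hostdevs(SAMPLE_XML)
if blocking:
    print("live snapshot would likely be refused; non-USB hostdevs:", blocking)
```

On a real host the XML would come from the running domain (for example via `XMLDesc()` in the libvirt Python bindings) rather than from a string literal.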

Comment 1 Nisim Simsolo 2015-10-19 12:50:53 UTC
Created attachment 1084401
engine.log

Comment 2 Nisim Simsolo 2015-10-19 12:52:06 UTC
Created attachment 1084402
vdsm.log

Comment 3 Red Hat Bugzilla Rules Engine 2015-10-20 15:14:13 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 4 Yaniv Lavi 2015-10-29 12:45:50 UTC
In oVirt, testing is done on a single release by default. Therefore I'm removing the 4.0 flag. If you think this bug must be tested in 4.0 as well, please re-add the flag. Please note we might not have testing resources to handle the 4.0 clone.

Comment 5 Martin Polednik 2015-12-01 10:16:10 UTC
We should disable snapshots for VMs with non-USB hostdev assigned (as libvirt states). Moving to Martin as that is an engine request.

Comment 6 Red Hat Bugzilla Rules Engine 2016-01-28 12:10:38 UTC
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.

Comment 7 Red Hat Bugzilla Rules Engine 2016-01-28 12:14:10 UTC
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.

Comment 8 Red Hat Bugzilla Rules Engine 2016-01-28 12:18:36 UTC
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.

Comment 9 Red Hat Bugzilla Rules Engine 2016-01-28 12:24:15 UTC
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.

Comment 10 Red Hat Bugzilla Rules Engine 2016-01-28 12:39:22 UTC
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.

Comment 11 Red Hat Bugzilla Rules Engine 2016-01-28 13:02:13 UTC
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.

Comment 12 Tomas Jelinek 2016-01-29 07:27:03 UTC
Moving back to ON_QA with the correct target release.

Comment 13 Nisim Simsolo 2016-02-11 15:28:28 UTC
Verified using builds: 
rhevm-3.6.3.1-0.1.el6
sanlock-3.2.4-1.el7.x86_64
libvirt-client-1.2.17-13.el7_2.2.x86_64
vdsm-4.17.20-0.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.4.x86_64

Verification scenario: 
1. Run VM with GPU attached and try to create a snapshot.
Snapshot creation is rejected by webadmin with the following error (operation canceled dialog message):
    "Error while executing action:
    1_VFIO_rhel7_amd:
    Cannot create Snapshot. VM has PCI host devices attached."
2. Run VM with USB passthrough only and create a snapshot.
Verify snapshot created and preview/commit/clone actions can be done on it.
3. Run VM with both PCI and USB devices attached to it. Try to create a snapshot.
Verify snapshot creation rejected by webadmin, with the same error as in step 1.
4. Power off VM, detach all devices, run VM and create snapshot.
Verify snapshot created and preview/commit/clone actions can be done on it.
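
The four scenarios above can be summarised as a small table-driven check. This is only a sketch in Python: the actual fix lives in the engine backend, and `snapshot_rejection`, its parameters and the device type strings are hypothetical names chosen for illustration. The message text is taken from step 1, and limiting the rule to running VMs is an assumption based on the "Live Snapshot" wording of the merged patches.

```python
from typing import Iterable, Optional

# Message text as shown by webadmin in step 1 of the verification scenario.
PCI_HOSTDEV_MESSAGE = "Cannot create Snapshot. VM has PCI host devices attached."


def snapshot_rejection(vm_running: bool, hostdev_types: Iterable[str]) -> Optional[str]:
    """Return a rejection message, or None if the snapshot may proceed.

    Assumption: only a running VM with a non-USB host device is rejected.
    """
    if vm_running and any(t != "usb" for t in hostdev_types):
        return PCI_HOSTDEV_MESSAGE
    return None


# One row per verification step: (label, attached hostdev types, expected result).
scenarios = [
    ("1. GPU (PCI) attached", ["pci"], PCI_HOSTDEV_MESSAGE),
    ("2. USB passthrough only", ["usb"], None),
    ("3. PCI and USB attached", ["pci", "usb"], PCI_HOSTDEV_MESSAGE),
    ("4. all devices detached", [], None),
]

for label, devices, expected in scenarios:
    result = snapshot_rejection(True, devices)
    assert result == expected, label
    print(label, "->", result or "snapshot allowed")
```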

Comment 14 Michal Skrivanek 2016-02-12 09:14:07 UTC
(In reply to Nisim Simsolo from comment #13)
> 2. Run VM with USB passthrough only and create a snapshot.
> Verify snapshot created and preview/commit/clone actions can be done on it.

I don't think we could support USB passthrough operations. I think we lack the checks on resume/dst for the same host device.
Martin?

Comment 15 Martin Betak 2016-02-29 16:03:03 UTC
@Michal: after discussion with mpolednik and the libvirt devs, this seems to be safe for the guest. The only risk is that some pending writes may not be persisted for "stateful" devices (e.g. a USB stick), which is equivalent to not removing a USB device safely, so it may be worth a warning in the frontend. Otherwise the USB drivers and guest OSs seem to be quite used to this kind of plug-and-play behavior.
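
A minimal sketch of what such a warning check could look like, written in Python purely for illustration. The function name, the device type strings and the warning wording are all hypothetical; nothing like this is confirmed to exist in the engine or frontend.

```python
from typing import Iterable, Optional


def usb_snapshot_warning(hostdev_types: Iterable[str]) -> Optional[str]:
    """Return a warning if any USB host device is attached, else None.

    The snapshot is still allowed; the warning only flags that pending
    writes to a stateful USB device may be lost.
    """
    if any(t == "usb" for t in hostdev_types):
        return (
            "USB host devices are attached; writes still pending on a "
            "stateful device such as a USB stick may not be persisted."
        )
    return None


warning = usb_snapshot_warning(["usb"])
if warning:
    print("Warning:", warning)
```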

