Bug 1595636 - vdsm-hook-vfio-mdev failed to run VM with Intel GVT-g device.
Summary: vdsm-hook-vfio-mdev failed to run VM with Intel GVT-g device.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.2.6
: ---
Assignee: Milan Zamazal
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-27 08:55 UTC by Nisim Simsolo
Modified: 2018-09-03 15:07 UTC

Fixed In Version: v4.20.37
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-03 15:07:27 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.2+
mtessun: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+


Attachments
vdsm.log (623.47 KB, application/x-xz)
2018-06-27 08:56 UTC, Nisim Simsolo
engine.log (784.29 KB, application/x-gzip)
2018-06-27 08:56 UTC, Nisim Simsolo


Links
System        ID     Private  Priority   Status  Summary                                  Last Updated
oVirt gerrit  92674  0        master     MERGED  virt: Don't crash on missing vGPU info   2018-08-03 12:47:57 UTC
oVirt gerrit  93479  0        ovirt-4.2  MERGED  virt: Don't crash on missing vGPU info   2018-08-07 14:53:14 UTC

Description Nisim Simsolo 2018-06-27 08:55:08 UTC
Description of problem:
Trying to run VM with mdev_type GVTg failed because there is no file named "name" under /sys/class/mdev_bus/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_#/

- vdsm.log ERROR:
2018-06-26 14:48:32,547+0300 ERROR (vm/c4608698) [virt.vm] (vmId='c4608698-f036-4345-b52a-c71c2cb4c00c') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2862, in _run
    self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 153, in before_vm_start
    return _runHooksDir(domxml, 'before_vm_start', vmconf=vmconf)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
    raise exception.HookError(err)
HookError: Hook Error: ('',)

- engine.log ERROR
2018-06-26 14:48:34,373+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_DOWN_ERROR(119), VM 1rhel7_sealed_UI is down with error. Exit message: Hook Error: ('',).


Version-Release number of selected component (if applicable):
ovirt-engine-4.2.4.5-0.1.el7_3
vdsm-4.20.31-1.el7ev.x86_64
libvirt-client-3.9.0-14.el7_5.6.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64
vdsm-hook-vfio-mdev-4.20.31-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Add host with Intel GVT-g enabled to RHVM
2. Edit VM -> custom properties and add mdev_type -> i915-GVTg_V5_$
3. Run VM

Actual results:
VM failed to run

Expected results:
VM should be able to run with GVT-g device

Additional info:
vdsm.log and engine.log attached

Comment 1 Nisim Simsolo 2018-06-27 08:56:02 UTC
Created attachment 1454986
vdsm.log

Comment 2 Nisim Simsolo 2018-06-27 08:56:27 UTC
Created attachment 1454987
engine.log

Comment 3 Michal Skrivanek 2018-06-27 09:28:20 UTC
what are the available types then? what does this return:

for device in /sys/class/mdev_bus/*; do for mdev_type in \
$device/mdev_supported_types/*; do echo "mdev_type: \
\"$(basename $mdev_type)\" --- description: $(cat $mdev_type/description)"; \
done; done

Comment 4 Michal Skrivanek 2018-06-27 09:29:59 UTC
also, what's the actual content of your /sys/class/mdev_bus/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_#/ directory

Comment 5 Nisim Simsolo 2018-06-27 10:23:42 UTC
(In reply to Michal Skrivanek from comment #3)
> what are the available types then? what does this return:
> 
> for device in /sys/class/mdev_bus/*; do for mdev_type in \
> $device/mdev_supported_types/*; do echo "mdev_type: \
> \"$(basename $mdev_type)\" --- description: $(cat $mdev_type/description)"; \
> done; done

mdev_type: "i915-GVTg_V5_4" --- description: low_gm_size: 128MB
high_gm_size: 512MB
fence: 4
resolution: 1920x1200
weight: 4
mdev_type: "i915-GVTg_V5_8" --- description: low_gm_size: 64MB
high_gm_size: 384MB
fence: 4
resolution: 1024x768
weight: 2

Comment 6 Nisim Simsolo 2018-06-27 10:25:43 UTC
(In reply to Michal Skrivanek from comment #4)
> also, what's the actual content of your
> /sys/class/mdev_bus/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_#/
> directory

# ls -l /sys/class/mdev_bus/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_4/
total 0
-r--r--r--. 1 root root 4096 Jun 26 16:59 available_instances
--w-------. 1 root root 4096 Jun 26 17:37 create
-r--r--r--. 1 root root 4096 Jun 26 16:59 description
-r--r--r--. 1 root root 4096 Jun 26 16:59 device_api
drwxr-xr-x. 2 root root    0 Jun 27 10:54 devices
#

Comment 7 Milan Zamazal 2018-06-27 11:07:20 UTC
The problem is the missing `name' file in the i915-GVTg_V5_4 directory. The mdev hook and Vdsm assume that the file is present, which is apparently not the case with Intel.

Comment 8 Martin Tessun 2018-06-27 11:30:25 UTC
Idea to fix it: If no name is present, just use the directory name (in this case i915-GVTg_V5_4)
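
A minimal sketch of that fallback, for illustration only. This is not the merged patch (gerrit 92674); the function name mdev_type_name is hypothetical and does not come from Vdsm or the hook. It assumes the type directory layout shown in comment 6.

```python
import os


def mdev_type_name(type_dir):
    """Return a display name for an mdev type directory.

    Hypothetical helper: reads the optional `name` attribute and,
    if the file is absent, falls back to the directory's own name
    (for example i915-GVTg_V5_4).
    """
    name_path = os.path.join(type_dir, "name")
    try:
        with open(name_path) as f:
            return f.read().strip()
    except IOError:
        # No `name` file (as with Intel GVT-g): use the directory name.
        return os.path.basename(os.path.normpath(type_dir))
```

With the directory from comment 6, which has no `name' file, this would return the directory name instead of raising.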

Comment 9 Milan Zamazal 2018-06-27 11:46:06 UTC
OK, thank you for the idea, I'll try it.

Comment 10 Alex Williamson 2018-06-27 14:35:37 UTC
Not that this is under debate, but note that the kernel documentation shows both the name and description as optional:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-vfio-mdev

Clearly we shouldn't depend on optional attributes.
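
As a rough illustration of treating both attributes as optional, the listing loop from comment 3 could be written so that a missing `name' or `description' yields None rather than an error. This is a sketch under assumptions, not Vdsm code: list_mdev_types, read_optional and the bus_root parameter are hypothetical names introduced here.

```python
import glob
import os


def read_optional(path):
    """Return the stripped file content, or None if the file is missing."""
    try:
        with open(path) as f:
            return f.read().strip()
    except IOError:
        return None


def list_mdev_types(bus_root="/sys/class/mdev_bus"):
    """List supported mdev types under bus_root.

    `name` and `description` are read as optional attributes, so a
    type directory lacking either one is still reported.
    """
    types = []
    pattern = os.path.join(bus_root, "*", "mdev_supported_types", "*")
    for type_dir in sorted(glob.glob(pattern)):
        types.append({
            "mdev_type": os.path.basename(type_dir),
            "name": read_optional(os.path.join(type_dir, "name")),
            "description": read_optional(
                os.path.join(type_dir, "description")),
        })
    return types
```

The bus_root parameter exists only so the function can be pointed at a test directory; on a real host the default sysfs path would be used.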

Comment 11 Milan Zamazal 2018-06-27 15:34:06 UTC
Indeed, thank you for the reference!

Comment 12 Nisim Simsolo 2018-08-14 11:06:09 UTC
Verification build: 
ovirt-engine-4.2.6.1_SNAPSHOT-89.g295078e.0.scratch.master.el7ev.noarch
libvirt-client-3.9.0-14.el7_5.7.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.5.x86_64
vdsm-4.20.37-1.el7ev.x86_64
sanlock-3.6.0-1.el7.x86_64

Verification HW:
VGA compatible controller: Intel Corporation HD Graphics 530
Model name: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz

Verification scenario: 
1. Verify GVTg mdev devices are listed under /sys/class/mdev_bus/0000\:00\:0X.0/mdev_supported_types/
2. Browse Webadmin -> edit VM, add mdev_type hook with GVTg device name
3. Run VM
4. Verify VM is running properly and there are no related errors in vdsm.log and engine.log
Verify GVTg device is added to VM PCI device with the correct kernel driver, for example:

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 530 [8086:191b] (rev 06)
	Subsystem: Lenovo Device [17aa:5050]
	Kernel driver in use: i915
	Kernel modules: i915
5. Reboot VM. After the reboot has completed, verify VM is running properly with GVTg mdev device
6. Power off VM and run VM. Verify VM is running properly with GVTg mdev device

