Bug 1595636

Summary: vdsm-hook-vfio-mdev failed to run VM with Intel GVT-g device.
Product: [oVirt] vdsm
Reporter: Nisim Simsolo <nsimsolo>
Component: Core
Assignee: Milan Zamazal <mzamazal>
Status: CLOSED CURRENTRELEASE
QA Contact: Nisim Simsolo <nsimsolo>
Severity: medium
Docs Contact:
Priority: medium
Version: ---
CC: alex.williamson, bugs, mtessun, mzamazal, nsimsolo
Target Milestone: ovirt-4.2.6
Flags: rule-engine: ovirt-4.2+, mtessun: planning_ack+, rule-engine: devel_ack+, rule-engine: testing_ack+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: v4.20.37
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-03 15:07:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description    Flags
vdsm.log       none
engine.log     none

Description Nisim Simsolo 2018-06-27 08:55:08 UTC
Description of problem:
Trying to run VM with mdev_type GVTg failed because there is no file named "name" under /sys/class/mdev_bus/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_#/

- vdsm.log ERROR:
2018-06-26 14:48:32,547+0300 ERROR (vm/c4608698) [virt.vm] (vmId='c4608698-f036-4345-b52a-c71c2cb4c00c') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2862, in _run
    self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 153, in before_vm_start
    return _runHooksDir(domxml, 'before_vm_start', vmconf=vmconf)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
    raise exception.HookError(err)
HookError: Hook Error: ('',)

- engine.log ERROR
2018-06-26 14:48:34,373+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_DOWN_ERROR(119), VM 1rhel7_sealed_UI is down with error. Exit message: Hook Error: ('',).


Version-Release number of selected component (if applicable):
ovirt-engine-4.2.4.5-0.1.el7_3
vdsm-4.20.31-1.el7ev.x86_64
libvirt-client-3.9.0-14.el7_5.6.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64
vdsm-hook-vfio-mdev-4.20.31-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Add host with Intel GVT-g enabled to RHVM
2. Edit VM -> custom properties and add mdev_type -> i915-GVTg_V5_$
3. Run VM

Actual results:
VM failed to run

Expected results:
VM should be able to run with GVT-g device

Additional info:
vdsm.log and engine.log attached

Comment 1 Nisim Simsolo 2018-06-27 08:56:02 UTC
Created attachment 1454986 [details]
vdsm.log

Comment 2 Nisim Simsolo 2018-06-27 08:56:27 UTC
Created attachment 1454987 [details]
engine.log

Comment 3 Michal Skrivanek 2018-06-27 09:28:20 UTC
what are the available types then? what does this return:

for device in /sys/class/mdev_bus/*; do for mdev_type in \
$device/mdev_supported_types/*; do echo "mdev_type: \
\"$(basename $mdev_type)\" --- description: $(cat $mdev_type/description)"; \
done; done

Comment 4 Michal Skrivanek 2018-06-27 09:29:59 UTC
also, what's the actual content of your /sys/class/mdev_bus/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_#/ directory

Comment 5 Nisim Simsolo 2018-06-27 10:23:42 UTC
(In reply to Michal Skrivanek from comment #3)
> what are the available types then? what does this return:
> 
> for device in /sys/class/mdev_bus/*; do for mdev_type in \
> $device/mdev_supported_types/*; do echo "mdev_type: \
> \"$(basename $mdev_type)\" --- description: $(cat $mdev_type/description)"; \
> done; done

mdev_type: "i915-GVTg_V5_4" --- description: low_gm_size: 128MB
high_gm_size: 512MB
fence: 4
resolution: 1920x1200
weight: 4
mdev_type: "i915-GVTg_V5_8" --- description: low_gm_size: 64MB
high_gm_size: 384MB
fence: 4
resolution: 1024x768
weight: 2

Comment 6 Nisim Simsolo 2018-06-27 10:25:43 UTC
(In reply to Michal Skrivanek from comment #4)
> also, what's the actual content of your
> /sys/class/mdev_bus/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_#/
> directory

# ls -l /sys/class/mdev_bus/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_4/
total 0
-r--r--r--. 1 root root 4096 Jun 26 16:59 available_instances
--w-------. 1 root root 4096 Jun 26 17:37 create
-r--r--r--. 1 root root 4096 Jun 26 16:59 description
-r--r--r--. 1 root root 4096 Jun 26 16:59 device_api
drwxr-xr-x. 2 root root    0 Jun 27 10:54 devices
#

Comment 7 Milan Zamazal 2018-06-27 11:07:20 UTC
The problem is the missing `name' file in the i915-GVTg_V5_4 directory. The mdev hook and Vdsm assume that the file is present, which is apparently not the case with Intel.

Comment 8 Martin Tessun 2018-06-27 11:30:25 UTC
Idea to fix it: If no name is present, just use the directory name (in this case i915-GVTg_V5_4)
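
The fallback suggested above could look roughly like the following. This is a minimal sketch, not the actual Vdsm patch: the function name mdev_type_name is hypothetical, and it assumes the caller already has the path of one directory under mdev_supported_types.

```python
import os


def mdev_type_name(type_dir):
    """Return a usable name for one mdev_supported_types/<type> directory.

    Hypothetical helper: the `name` attribute is read if it exists;
    otherwise the directory name (e.g. "i915-GVTg_V5_4") is used.
    """
    name_path = os.path.join(type_dir, "name")
    try:
        with open(name_path) as f:
            name = f.read().strip()
    except (IOError, OSError):
        # `name` is missing or unreadable, as on the Intel GVT-g host here.
        name = ""
    # normpath drops a trailing slash so basename is never empty.
    return name or os.path.basename(os.path.normpath(type_dir))
```

With the directory listing from comment 6, which has no `name' file, this would return "i915-GVTg_V5_4" for the i915-GVTg_V5_4 directory.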

Comment 9 Milan Zamazal 2018-06-27 11:46:06 UTC
OK, thank you for the idea, I'll try it.

Comment 10 Alex Williamson 2018-06-27 14:35:37 UTC
Not that this is under debate, but note that the kernel documentation shows both the name and description as optional:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-vfio-mdev

Clearly we shouldn't depend on optional attributes.
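
To illustrate the point, here is a sketch of enumerating the supported types without relying on either optional attribute. It mirrors the shell loop from comment 3, but reports a missing `name' or `description' as None instead of failing. The helper names list_mdev_types and _read_optional are made up for this example, and the default path is the one used elsewhere in this report.

```python
import glob
import os


def _read_optional(path):
    """Return the stripped file content, or None if the file is absent."""
    try:
        with open(path) as f:
            return f.read().strip()
    except (IOError, OSError):
        return None


def list_mdev_types(mdev_bus="/sys/class/mdev_bus"):
    """List every supported mdev type under every device in mdev_bus."""
    pattern = os.path.join(mdev_bus, "*", "mdev_supported_types", "*")
    types = []
    for type_dir in sorted(glob.glob(pattern)):
        device_dir = os.path.dirname(os.path.dirname(type_dir))
        types.append({
            "device": os.path.basename(device_dir),
            # The directory name always exists, so it is the reliable key.
            "mdev_type": os.path.basename(type_dir),
            # Both attributes below are optional and may be None.
            "name": _read_optional(os.path.join(type_dir, "name")),
            "description": _read_optional(
                os.path.join(type_dir, "description")),
        })
    return types


if __name__ == "__main__":
    for entry in list_mdev_types():
        print("%(device)s %(mdev_type)s name=%(name)r" % entry)
```

On a host without any mdev-capable device the function simply returns an empty list.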

Comment 11 Milan Zamazal 2018-06-27 15:34:06 UTC
Indeed, thank you for the reference!

Comment 12 Nisim Simsolo 2018-08-14 11:06:09 UTC
Verification build: 
ovirt-engine-4.2.6.1_SNAPSHOT-89.g295078e.0.scratch.master.el7ev.noarch
libvirt-client-3.9.0-14.el7_5.7.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.5.x86_64
vdsm-4.20.37-1.el7ev.x86_64
sanlock-3.6.0-1.el7.x86_64

Verification HW:
VGA compatible controller: Intel Corporation HD Graphics 530
Model name: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz

Verification scenario: 
1. Verify GVTg mdev devices are listed under /sys/class/mdev_bus/0000\:00\:0X.0/mdev_supported_types/
2. Browse Webadmin -> edit VM, add mdev_type hook with GVTg device name
3. Run VM
4. Verify the VM is running properly and there are no related errors in vdsm.log and engine.log.
Verify the GVTg device is added to the VM's PCI devices with the correct kernel driver, for example:

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 530 [8086:191b] (rev 06)
	Subsystem: Lenovo Device [17aa:5050]
	Kernel driver in use: i915
	Kernel modules: i915
5. Reboot VM. After the reboot has completed, verify VM is running properly with GVTg mdev device
6. Power off VM and run VM. Verify VM is running properly with GVTg mdev device