Bug 1852433 - vGPU: VM failed to run with mdev_type instance.
Summary: vGPU: VM failed to run with mdev_type instance.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: Documentation
Version: 4.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: rhev-docs@redhat.com
QA Contact: rhev-docs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1852718
 
Reported: 2020-06-30 12:20 UTC by Michal Skrivanek
Modified: 2021-12-14 04:02 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-14 04:02:07 UTC
oVirt Team: Docs
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-36934 0 None None None 2021-12-10 18:42:10 UTC

Description Michal Skrivanek 2020-06-30 12:20:34 UTC
This bug was initially created as a copy of Bug #1846343

I am copying this bug because: 



Description of problem:
After adding an Nvidia vGPU instance using WebAdmin -> VM -> Host Devices -> Manage vGPU button,
or using Edit VM -> Custom Properties -> mdev_type,
the VM fails to run with the following vdsm.log errors:
 
2020-06-11 15:04:14,007+0300 ERROR (vm/6099c96f) [virt.vm] (vmId='6099c96f-d79d-47ae-b39f-9489bc552cf0') The vm start process failed (vm:871)
Traceback (most recent call last):
.
.
libvirt.libvirtError: internal error: Process exited prior to exec: libvirt:  error : failed to access '/sys/bus/mdev/devices/e1f27070-b062-4ea3-a689-89e37a56f677/iommu_group': No such file or directory

2020-06-11 15:04:18,533+0300 ERROR (jsonrpc/1) [root] Couldn't parse NVDIMM device data (hostdev:755)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/common/hostdev.py", line 753, in list_nvdimms
    data = json.loads(output)
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
--------------------------
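
The first traceback indicates that libvirt could not access the iommu_group of the mediated device it was asked to assign, i.e. the mdev device with that UUID was not (fully) created under /sys/bus/mdev/devices at the time the VM started. A quick manual check on the host, using the UUID from the error above, would be (diagnostic sketch, not part of the original report):

# ls /sys/bus/mdev/devices/
# ls -l /sys/bus/mdev/devices/e1f27070-b062-4ea3-a689-89e37a56f677/iommu_group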

The Nvidia vGPU host drivers are installed and the Nvidia service is running.
Also, the supported vGPU instance types are visible on the host, for example (a sketch of such a listing script follows the output below):
# /home/nsimsolo/vgpu_instances1.sh 
mdev_type: nvidia-11 --- description: num_heads=2, frl_config=45, framebuffer=512M, max_resolution=2560x1600, max_instance=16 --- name: GRID M60-0B
mdev_type: nvidia-12 --- description: num_heads=2, frl_config=60, framebuffer=512M, max_resolution=2560x1600, max_instance=16 --- name: GRID M60-0Q
mdev_type: nvidia-13 --- description: num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=8 --- name: GRID M60-1A
mdev_type: nvidia-14 --- description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=8 --- name: GRID M60-1B
----------------
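
The listing script itself is not attached. A minimal sketch that produces similar output by reading the kernel's standard mdev sysfs attributes might look like the following, assuming the usual /sys/class/mdev_bus layout exposed by the Nvidia vGPU host driver:

#!/bin/bash
# Enumerate the mediated device types offered by the host's vGPU-capable PCI devices.
for t in /sys/class/mdev_bus/*/mdev_supported_types/*; do
    echo "mdev_type: $(basename "$t") --- description: $(cat "$t"/description) --- name: $(cat "$t"/name)"
done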

This issue is not related to the emulated machine type (it occurred on both pc-i440fx and Q35).

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.1.2-0.10.el8ev
vdsm-4.40.19-1.el8ev.x86_64
libvirt-daemon-6.0.0-22.module+el8.2.1+6815+1c792dc8.x86_64
qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64
Nvidia host drivers (Tesla M60): NVIDIA-vGPU-rhel-8.2-450.36.01.x86_64

How reproducible:
100%

Steps to Reproduce:
1. In WebAdmin, click the VM name -> Host Devices tab -> Manage vGPU, select an Nvidia instance and click the "Save" button.
2. Run VM
3.
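
For the custom-property route mentioned in the description, the value of the mdev_type custom property is the mdev type name as reported by the host, e.g. mdev_type = nvidia-12 (illustrative value taken from the listing above; use whichever type matches the required vGPU profile).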

Actual results:
The VM fails to run.

Expected results:
The VM should run with the attached vGPU device.

Additional info:
vdsm.log and engine.log attached

Comment 1 Michal Skrivanek 2020-06-30 12:23:47 UTC
We need to document the workaround until bug 1846343 is fixed: either https://bugzilla.redhat.com/show_bug.cgi?id=1846343#c18 or https://bugzilla.redhat.com/show_bug.cgi?id=1846343#c24. Milan, your call.

