Bug 1905417
| Field | Value |
|---|---|
| Summary | vGPU: VM failed to run with mdev_type instance (java NPE in engine.log) |
| Product | [oVirt] ovirt-engine |
| Reporter | Nisim Simsolo <nsimsolo> |
| Component | BLL.Virt |
| Assignee | Liran Rotenberg <lrotenbe> |
| Status | CLOSED CURRENTRELEASE |
| QA Contact | Nisim Simsolo <nsimsolo> |
| Severity | high |
| Docs Contact | |
| Priority | unspecified |
| Version | 4.4.4.3 |
| CC | ahadas, bugs, gveitmic, lrotenbe, nsimsolo |
| Target Milestone | ovirt-4.4.4 |
| Flags | pm-rhel: ovirt-4.4+, pm-rhel: planning_ack+, ahadas: devel_ack+, pm-rhel: testing_ack+ |
| Target Release | 4.4.4.4 |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | ovirt-engine-4.4.4.4 |
| Doc Type | Bug Fix |
| Doc Text | Previously, running a VM with an MDEV device resulted in a NullPointerException. Now the VM boots as expected without errors. |
| Story Points | --- |
| Clone Of | |
| Environment | |
| Last Closed | 2021-01-12 16:23:55 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | Virt |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Attachments | |
Description
Nisim Simsolo
2020-12-08 09:59:56 UTC
Created attachment 1737561: engine.log
Created attachment 1737562: vdsm.log
Created attachment 1737563: VM QEMU log
The VM failed to start due to an NPE while getting the host device capability.
When we write the max memory value to the domain, we calculate the total NVDIMM size, if any NVDIMM devices exist.
To do that, we go over the host devices and look for NVDIMM devices:
(VmInfoBuildUtils::getNvdimmTotalSize)
if (hostDevice.getCapability().equals("nvdimm"))
But if the capability is null, we get that NPE.
From the engine log, it looks like the problematic device is `hostdev0`, which is the MDEV host device.
The question is whether it is OK for an MDEV device to have no capability, or whether some kernel modules are not loaded, which could cause this.
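To make the failure mode described above concrete, here is a minimal, self-contained sketch. It is not the engine's code: `HostDeviceSketch` and `NullCapabilityDemo` are hypothetical stand-ins, and only the `getCapability().equals("nvdimm")` pattern is taken from this report. It shows that calling `equals` on a null capability throws, while putting the literal first does not.

```java
// Hypothetical stand-in for the engine's HostDevice; only the capability is modeled.
class HostDeviceSketch {
    private final String capability; // may be null in this sketch

    HostDeviceSketch(String capability) {
        this.capability = capability;
    }

    String getCapability() {
        return capability;
    }
}

class NullCapabilityDemo {
    // Same shape as the check quoted in this report: throws a
    // NullPointerException when getCapability() returns null.
    static boolean isNvdimmUnsafe(HostDeviceSketch device) {
        return device.getCapability().equals("nvdimm");
    }

    // Null-safe variant: calling equals() on the literal simply
    // returns false for a null capability.
    static boolean isNvdimmSafe(HostDeviceSketch device) {
        return "nvdimm".equals(device.getCapability());
    }

    public static void main(String[] args) {
        HostDeviceSketch noCapability = new HostDeviceSketch(null);
        System.out.println("safe check: " + isNvdimmSafe(noCapability));
        try {
            isNvdimmUnsafe(noCapability);
        } catch (NullPointerException e) {
            System.out.println("unsafe check threw NullPointerException");
        }
    }
}
```

The literal-first comparison only guards against a null capability. It does not help when the device object itself is null.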
Thanks to Nisim, I could debug his environment. The problem is that the mdev is a host device that the engine does not know about. It is listed among the VM devices, so when we do:
HostDevice hostDevice = hostDevicesSupplier.get().get(device.getDevice());
hostDevice is null. We should check that the device exists before checking whether it is nvdimm.

Verified:
ovirt-engine-4.4.4.5-0.10.el8ev
vdsm-4.40.40-1.el8ev.x86_64
qemu-kvm-5.1.0-14.module+el8.3.0+8790+80f9c6d8.1.x86_64
libvirt-daemon-6.6.0-7.1.module+el8.3.0+8852+b44fca9f.x86_64
Nvidia drivers for host and VM: grid12.0_beta
host: NVIDIA-vGPU-rhel-8.3-460.26.x86_64
VM: NVIDIA-Linux-x86_64-460.26-grid.run

This bugzilla is included in the oVirt 4.4.4 release, published on December 21st 2020.
Since the problem described in this bug report should be resolved in the oVirt 4.4.4 release, it has been closed with a resolution of CURRENT RELEASE.
If the solution does not work for you, please open a new bug report.
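The root cause identified in this report is that the MDEV device is missing from the engine's host device map, so the lookup returns null. The sketch below illustrates the proposed guard (check that the device exists before testing for nvdimm). It is only an illustration and not the actual patch: `NvdimmTotalSizeSketch`, `HostDev`, `sizeMb`, and the device names are hypothetical, and only the `hostDevicesSupplier.get().get(...)` lookup shape comes from the report.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class NvdimmTotalSizeSketch {
    // Hypothetical minimal host device: a capability plus a size in MB.
    static final class HostDev {
        final String capability;
        final long sizeMb;

        HostDev(String capability, long sizeMb) {
            this.capability = capability;
            this.sizeMb = sizeMb;
        }
    }

    // Sums the size of NVDIMM host devices referenced by the VM's devices.
    // Names with no entry in the map (such as an mdev the engine does not
    // track) are skipped instead of being dereferenced.
    static long nvdimmTotalSizeMb(List<String> vmDeviceNames,
                                  Supplier<Map<String, HostDev>> hostDevicesSupplier) {
        Map<String, HostDev> hostDevices = hostDevicesSupplier.get();
        long total = 0;
        for (String name : vmDeviceNames) {
            HostDev hostDevice = hostDevices.get(name);
            if (hostDevice == null) {
                continue; // unknown device: nothing to add
            }
            if ("nvdimm".equals(hostDevice.capability)) {
                total += hostDevice.sizeMb;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, HostDev> known = new HashMap<>();
        known.put("pmem0", new HostDev("nvdimm", 1024));
        known.put("pci0", new HostDev("pci", 0));
        // "mdev0" is deliberately absent from the map.
        long total = nvdimmTotalSizeMb(Arrays.asList("pmem0", "mdev0", "pci0"), () -> known);
        System.out.println("NVDIMM total (MB): " + total);
    }
}
```

Skipping unknown devices keeps the max memory calculation working for VMs that mix MDEV and NVDIMM devices, which matches the behaviour described in the Doc Text.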