Bug 1846343
| Summary: | vGPU: VM failed to run with mdev_type instance. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Nisim Simsolo <nsimsolo> |
| Component: | dracut | Assignee: | Lukáš Nykrýn <lnykryn> |
| Status: | CLOSED ERRATA | QA Contact: | Frantisek Sumsal <fsumsal> |
| Severity: | urgent | Priority: | unspecified |
| Version: | 8.2 | Target Release: | 8.0 |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | dracut-049-92.git20200702.el8 | Doc Type: | If docs needed, set a value |
| Last Closed: | 2020-11-04 01:42:48 UTC | Type: | Bug |
| oVirt Team: | Virt | Bug Blocks: | 1852718 |
| CC: | abpatil, ahadas, alex.williamson, bugs, coli, dmarchan, dracut-maint-list, fsumsal, hbarcomb, juzhang, knoel, lnykryn, michal.skrivanek, mkalinin, mzamazal, nsimsolo, ovasik, zhguo | | |
| Attachments: | vdsm.log, engine.log, VM QEMU log | | |
Created attachment 1696752 [details]
vdsm.log
Created attachment 1696754 [details]
engine.log
Created attachment 1696755 [details]
VM QEMU log
> 2020-06-11 15:04:18,533+0300 ERROR (jsonrpc/1) [root] Couldn't parse NVDIMM
> device data (hostdev:755)
> Traceback (most recent call last):
> File "/usr/lib/python3.6/site-packages/vdsm/common/hostdev.py", line 753,
> in list_nvdimms
> data = json.loads(output)
> File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
> return _default_decoder.decode(s)
> File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
> obj, end = self.raw_decode(s, idx=_w(s, 0).end())
> File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
> raise JSONDecodeError("Expecting value", s, err.value) from None
> json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
> --------------------------
Milan, please keep me honest here - I think the error above wouldn't prevent the VM from starting, but this one does:
2020-06-11 15:16:28,067+0300 ERROR (vm/0cc01cbb) [virt.vm] (vmId='0cc01cbb-8cf0-499d-b7b9-afb822cde4f7') The vm start process failed (vm:871)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 801, in _startUnderlyingVm
self._run()
File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2608, in _run
dom.createWithFlags(flags)
File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
return func(inst, *args, **kwargs)
File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1265, in createWithFlags
if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirt.libvirtError: internal error: Process exited prior to exec: libvirt: error : failed to access '/sys/bus/mdev/devices/71cba851-7aad-44e3-be0d-6046e4aa0c34/iommu_group': No such file or directory
2020-06-11 15:16:28,067+0300 INFO (vm/0cc01cbb) [virt.vm] (vmId='0cc01cbb-8cf0-499d-b7b9-afb822cde4f7') Changed state to Down: internal error: Process exited prior to exec: libvirt: error : failed to access '/sys/bus/mdev/devices/71cba851-7aad-44e3-be0d-6046e4aa0c34/iommu_group': No such file or directory (code=1) (vm:1629)
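A minimal diagnostic sketch (not part of the original thread), using the device UUID from the log above: the libvirt failure says it cannot find an iommu_group link under the mdev device node, so a quick first check is whether that link exists and which vfio/mdev modules are loaded.

# ls -l /sys/bus/mdev/devices/71cba851-7aad-44e3-be0d-6046e4aa0c34/
# lsmod | egrep '(vfio|mdev)'

If the device directory exists but iommu_group is missing, and vfio_mdev is absent from lsmod, the module chain rather than the engine configuration is the likely suspect, which is where the thread below ends up.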
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Arik, you're right, the JSON traceback is just a harmless annoyance in the log (already fixed in Vdsm master), the real error is the libvirt error. I think I've seen such an error in the past when the host wasn't properly configured for mdev. But it can also be a platform error or a change in el8.

Nisim, could you please check the host was booted with proper kernel command line options? I think `intel_iommu=on iommu=pt' should be present.

(In reply to Milan Zamazal from comment #6)
> Nisim, could you please check the host was booted with proper kernel command
> line options? I think `intel_iommu=on iommu=pt' should be present.

It's running with proper kernel cmdline:

# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-193.7.1.el8_2.x86_64 root=/dev/mapper/rhel_lion01-root ro crashkernel=auto resume=/dev/mapper/rhel_lion01-swap rd.lvm.lv=rhel_lion01/root rd.lvm.lv=rhel_lion01/swap rhgb quiet rdblacklist=nouveau intel_iommu=on

And both Nvidia vGPUs in this host are installed and running:

# nvidia-smi
Thu Jun 11 16:50:29 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.01    Driver Version: 450.36.01    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:84:00.0 Off |                  Off |
| N/A   38C    P8    24W / 150W |     14MiB /  8191MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           On   | 00000000:85:00.0 Off |                  Off |
| N/A   33C    P8    24W / 150W |     14MiB /  8191MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           On   | 00000000:8B:00.0 Off |                  Off |
| N/A   35C    P8    24W / 150W |     14MiB /  8191MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           On   | 00000000:8C:00.0 Off |                  Off |
| N/A   46C    P8    24W / 150W |     14MiB /  8191MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

OK, Nisim, let's check a bit more:

- Which vfio kernel modules are loaded? Specifically, is vfio_mdev loaded?
- Is there anything in /sys/kernel/iommu_groups/?
- Is there anything in /sys/class/iommu/?

(In reply to Milan Zamazal from comment #8)
> OK, Nisim, let's check a bit more:
>
> - Which vfio kernel modules are loaded? Specifically, is vfio_mdev loaded?
> - Is there anything in /sys/kernel/iommu_groups/?
> - Is there anything in /sys/class/iommu/?

# lsmod | grep nvidia_vgpu_vfio
nvidia_vgpu_vfio       53248  0
nvidia              19501056  10 nvidia_vgpu_vfio
mdev                   20480  1 nvidia_vgpu_vfio
vfio                   36864  1 nvidia_vgpu_vfio

# ls /sys/kernel/iommu_groups/
0  10  12  14  16  18  2   21  23  25  27  29  30  32  34  36  38  4   41  43  45  47  49  50  52  54  56  58  6   61  63  7   9
1  11  13  15  17  19  20  22  24  26  28  3   31  33  35  37  39  40  42  44  46  48  5   51  53  55  57  59  60  62  64  8

# cat /sys/kernel/iommu_groups/55/devices/0000\:84\:00.0/mdev_supported_types/nvidia-22/description
num_heads=4, frl_config=60, framebuffer=8192M, max_resolution=5120x2880, max_instance=1

# ls /sys/class/iommu/
dmar0  dmar1

# ls /sys/class/iommu/dmar0/devices/
0000:80:01.0  0000:80:03.0  0000:80:04.1  0000:80:04.3  0000:80:04.5  0000:80:04.7  0000:81:00.1  0000:83:08.0  0000:84:00.0  0000:86:00.0  0000:87:10.0  0000:8a:08.0  0000:8b:00.0
0000:80:02.0  0000:80:04.0  0000:80:04.2  0000:80:04.4  0000:80:04.6  0000:81:00.0  0000:82:00.0  0000:83:10.0  0000:85:00.0  0000:87:08.0  0000:89:00.0  0000:8a:10.0  0000:8c:00.0

How about the vfio_mdev module?

(In reply to Milan Zamazal from comment #10)
> How about the vfio_mdev module?

I can't see it in lsmod, but:

# modinfo vfio_mdev
filename:       /lib/modules/4.18.0-193.7.1.el8_2.x86_64/kernel/drivers/vfio/mdev/vfio_mdev.ko.xz
description:    VFIO based driver for Mediated device
author:         NVIDIA Corporation
license:        GPL v2
version:        0.1
rhelversion:    8.2
srcversion:     20FFF915712EA2E529A6752
depends:        mdev,vfio
intree:         Y
name:           vfio_mdev
vermagic:       4.18.0-193.7.1.el8_2.x86_64 SMP mod_unload modversions
sig_id:         PKCS#7
signer:         Red Hat Enterprise Linux kernel signing key
sig_key:        70:5C:5F:89:3D:91:85:84:58:94:B6:EC:AE:44:FF:B7:8A:27:82:5C
sig_hashalgo:   sha256
signature:      0D:98:63:4E:B0:22:B7:FD:D1:D2:1F:2B:17:57:B0:CB:7B:E4:C2:65:

After loading the vfio_mdev kernel module and rebooting the host,
it is now possible to run VM with vGPU:
# lsmod | grep nvidia_vgpu_vfio
nvidia_vgpu_vfio 53248 19
nvidia 19501056 145 nvidia_vgpu_vfio
mdev 20480 2 vfio_mdev,nvidia_vgpu_vfio
vfio 36864 6 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
# ls -l /sys/bus/mdev/devices/
total 0
lrwxrwxrwx. 1 root root 0 Jun 11 18:14 06b068d1-92ae-469e-bdce-9243050092ef -> ../../../devices/pci0000:80/0000:80:02.0/0000:82:00.0/0000:83:08.0/0000:84:00.0/06b068d1-92ae-469e-bdce-9243050092ef
# nvidia-smi
Thu Jun 11 18:31:59 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.01 Driver Version: 450.36.01 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla M60 On | 00000000:84:00.0 Off | Off |
| N/A 39C P8 24W / 150W | 2050MiB / 8191MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 On | 00000000:85:00.0 Off | Off |
| N/A 33C P8 24W / 150W | 14MiB / 8191MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M60 On | 00000000:8B:00.0 Off | Off |
| N/A 36C P8 24W / 150W | 14MiB / 8191MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M60 On | 00000000:8C:00.0 Off | Off |
| N/A 47C P8 24W / 150W | 14MiB / 8191MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 9640 C+G vgpu 2031MiB |
+-----------------------------------------------------------------------------+
#
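A stop-gap that is sometimes used to force the module at every boot, relevant to the discussion that follows, is a systemd modules-load.d drop-in; the file name below is hypothetical and this is not what the bug ultimately settles on (the eventual fix is in dracut, further down):

# cat /etc/modules-load.d/vfio_mdev.conf
vfio_mdev

systemd-modules-load.service reads such files at boot and loads the listed modules unconditionally, which corresponds to the "load it unconditionally" option weighed below.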
So the problem is that the vfio_mdev kernel module is not loaded by default.

It apparently used to be loaded automatically in 4.3; maybe there used to be a module dependency that caused it to be loaded, which is not present anymore.

What options do we have to make the module available? We can load it unconditionally. While it can probably be always loaded and may be harmless (is it?), it's of no use on hosts unless related hardware is used. Loading it on a vGPU VM start looks weird. But maybe we could check its presence on VM start failure and provide a hint in the log if the module is missing?

The question is whether oVirt should be responsible for loading the module at all. If the user is responsible for installing the vGPU drivers, perhaps the user should be responsible for making the module loaded too, and we should just mention the possible problem in the documentation? And which component or entity is responsible for adding the intel_iommu=on kernel command line option -- perhaps the same entity should add the module? Opinions?

Starts with documenting it and a clearer error message. We have kernel cmdline configuration in the UI so maybe we can do that there? Though... it rather looks like a bug. Maybe libvirt, kernel.

(In reply to Michal Skrivanek from comment #15)
> starts with documenting it and clearer error message

+1

> we have kernel cmdline configuration in UI so maybe we can do that there?
> though...it rather looks like a bug. maybe libvirt, kernel.

So I think the question is whether it is really a bug or an intentional change - in case of the latter we should probably report this on the host capabilities and schedule VMs accordingly, no?

No, modules are supposed to be loaded automatically by any decent OS. If there's a good reason why this one cannot be, then please find out and document that.

The `mdev' module has a soft post module dependency on `vfio_mdev'. Indeed, if I run `modprobe mdev' on my el8 machine then `vfio_mdev' gets (and apparently remains) loaded as well. I'll need to look into Nisim's environment to see why `vfio_mdev' is not loaded when `mdev' is.

[Alex, we have trouble with the vfio_mdev module not being loaded automatically on a vGPU host.]

Looking at Nisim's environment, which should be basically a freshly installed RHEL 8 machine with Nvidia drivers installed from rpm:

- After reboot, the vfio_mdev module is not loaded although nvidia_vgpu_vfio is:

# lsmod | egrep '(vfio|mdev)'
nvidia_vgpu_vfio       53248  0
nvidia              19501056  10 nvidia_vgpu_vfio
mdev                   20480  1 nvidia_vgpu_vfio
vfio                   36864  1 nvidia_vgpu_vfio

- Let's remove nvidia_vgpu_vfio, looks all right:

# modprobe -r nvidia_vgpu_vfio
# lsmod | egrep '(vfio|mdev)'

- Let's insert nvidia_vgpu_vfio again:

# modprobe nvidia_vgpu_vfio
# lsmod | egrep '(vfio|mdev)'
nvidia_vgpu_vfio       53248  0
vfio_mdev              16384  0
mdev                   20480  2 vfio_mdev,nvidia_vgpu_vfio
vfio_iommu_type1       32768  0
vfio                   36864  3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
nvidia              19501056  10 nvidia_vgpu_vfio

Both vfio_mdev and vfio_iommu_type1 modules are loaded now!

Alex, I was told you may be able to help. I can see in the kernel module dependencies that mdev soft depends on vfio_mdev. How is it possible that vfio_mdev gets loaded when nvidia_vgpu_vfio is inserted manually, but not after boot although nvidia_vgpu_vfio is present? Any idea, can there be something wrong with the initramfs or anything else? BTW, it used to work in RHEL 7.
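A hedged verification sketch, not from the original report, for the two things Milan asks about above: the declared soft-dependency chain and whether vfio_mdev actually made it into the generated initramfs. The commands are standard kmod/dracut tools; the path of modules.dep inside the image may be lib/... or usr/lib/... depending on the layout.

Show the post: soft dependencies declared by the hard dependencies of nvidia_vgpu_vfio (the same softdep lines are quoted in Alex's analysis further down):

# modinfo -F softdep mdev
post: vfio_mdev
# modinfo -F softdep vfio
post: vfio_iommu_type1 vfio_iommu_spapr_tce

Check whether the nvidia/vfio modules were pulled into the current initramfs, and whether vfio_mdev is resolvable inside it:

# lsinitrd /boot/initramfs-$(uname -r).img | egrep 'nvidia|vfio'
# lsinitrd -f usr/lib/modules/$(uname -r)/modules.dep /boot/initramfs-$(uname -r).img | grep vfio_mdev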
(In reply to Milan Zamazal from comment #19)
> [Alex, we have trouble with vfio_mdev module not being loaded automatically
> on a vGPU host.]
> [...]

In the second case, you're showing that the kernel module dependencies are all correct. My guess would be that some sort of local configuration on this system is causing the nvidia modules to get loaded from the initramfs but the soft dependencies aren't present. Is there anything under /etc/dracut.conf.d that might explain this? Or perhaps inspect the initramfs with lsinitrd? Cc Zhiyi who likely also has experience here.

(In reply to Alex Williamson from comment #20)
> [...]
> Is there anything under /etc/dracut.conf.d that might explain this? Or
> perhaps inspect the initramfs with lsinitrd? Cc Zhiyi who likely also has
> experience here.

Yes, I also hit this issue

(In reply to Guo, Zhiyi from comment #21)
> [...]
> Yes, I also hit this issue

Oops, I submitted the unfinished comment by mistake.

Yes, I also hit this issue in my environment. I think it cannot be reproduced with the 8.2.1 tree RHEL-8.2.1-20200508.n.0 (tested with a Tesla V100) and happens with a recent eng release (RHEL-8.2.1-20200608.n.0, tested with a Tesla T4 + SR-IOV mode). But after downgrading the kernel to the one included with RHEL-8.2.1-20200508.n.0 (4.18.0-193.2.1.el8_2), the issue still happens.

/etc/dracut.conf.d is empty (as well as /etc/dracut.conf). But after unpacking the initramfs, I can see a difference in module dependencies.

On the main system:

4.18.0-193.7.1.el8_2.x86_64/modules.order:kernel/drivers/vfio/mdev/vfio_mdev.ko
4.18.0-193.7.1.el8_2.x86_64/modules.dep:kernel/drivers/vfio/mdev/vfio_mdev.ko.xz: kernel/drivers/vfio/mdev/mdev.ko.xz kernel/drivers/vfio/vfio.ko.xz
4.18.0-193.7.1.el8_2.x86_64/modules.softdep:softdep mdev post: vfio_mdev

While in the initramfs:

4.18.0-193.7.1.el8_2.x86_64/modules.order:kernel/drivers/vfio/mdev/vfio_mdev.ko
4.18.0-193.7.1.el8_2.x86_64/modules.softdep:softdep mdev post: vfio_mdev

So while the soft dependency is defined there, it's missing in modules.dep. When I generate an initrd image manually, the dependency is present in modules.dep. Alex, what else can I check?

dracut appears to be failing us here and I don't see that this is a regression. The regression might simply be that something was installed that caused the initramfs for the kernel to be regenerated that wasn't previously. For a workaround you can create the following:
# cat /etc/dracut.conf.d/nvidia.conf
omit_drivers+="nvidia"
Then rebuild initramfs with 'dracut -f --regenerate-all'. On to the problem...
dracut seems to be identifying nvidia_vgpu_vfio as a module that it needs to add to the initramfs, presumably because of the alias:
# modinfo nvidia_vgpu_vfio
filename: /lib/modules/4.18.0-193.el8.x86_64/weak-updates/nvidia/nvidia-vgpu-vfio.ko
version: 440.87
supported: external
license: MIT
rhelversion: 8.2
srcversion: 5D4064D3E109D020922911D
alias: pci:v000010DEd*sv*sd*bc03sc02i00*
alias: pci:v000010DEd*sv*sd*bc03sc00i00*
depends: nvidia,mdev,vfio
name: nvidia_vgpu_vfio
vermagic: 4.18.0-167.el8.x86_64 SMP mod_unload modversions
The nvidia module is similar (with a bunch of parm entries omitted here):
# modinfo nvidia
filename: /lib/modules/4.18.0-193.el8.x86_64/weak-updates/nvidia/nvidia.ko
alias: char-major-195-*
version: 440.87
supported: external
license: NVIDIA
rhelversion: 8.2
srcversion: EDD83534DD78C3B1B5A0F6E
alias: pci:v000010DEd*sv*sd*bc03sc02i00*
alias: pci:v000010DEd*sv*sd*bc03sc00i00*
depends: ipmi_msghandler
name: nvidia
vermagic: 4.18.0-167.el8.x86_64 SMP mod_unload modversions
Our starting point is here:
depends: nvidia,mdev,vfio
The mdev and vfio modules are direct dependencies; those get added to the initramfs. The trouble starts here:
# modinfo mdev
filename: /lib/modules/4.18.0-193.el8.x86_64/kernel/drivers/vfio/mdev/mdev.ko.xz
softdep: post: vfio_mdev
# modinfo vfio
filename: /lib/modules/4.18.0-193.el8.x86_64/kernel/drivers/vfio/vfio.ko.xz
softdep: post: vfio_iommu_type1 vfio_iommu_spapr_tce
So our dependent modules have soft dependencies. This seems to be partially fixed by:
commit c38f9e980c1ee03151dd1c6602907c6228b78d30
Author: Harald Hoyer <harald>
Date: Tue Dec 4 10:02:45 2018 +0100
install/dracut-install.c: install module dependencies of dependencies
The trouble is that this carries forward a poor (imo) assumption made here when softdep support was first introduced:
commit 4cdee66c8ed5f82bbd0638e30d867318343b0e6c
Author: Jeremy Linton <lintonrjeremy>
Date: Mon Jul 2 23:25:05 2018 -0500
dracut-install: Support modules.softdep
Dracut uses the module deps to determine module dependencies
but that only works for modules with hard symbolic dependencies.
Some modules have dependencies created via callback API's or other
methods which aren't reflected in the modules.dep but rather in
modules.softdep through the use of "pre:" and "post:" commands
created in the kernel with MODULE_SOFTDEP().
Since in dracut we are only concerned about early boot, this patch
only looks at the pre: section of modules which are already being
inserted in the initrd under the assumption that the pre: section
lists dependencies required for the functionality of the module being
installed in the initrd.
That latter paragraph tries to make the argument that only pre: softdeps are required for functionality of the module, but we can see here that's not the case. In the case of these post: softdeps, the kernel module is making a request module call to provide the remainder of the functionality. In the case of mdev, the vfio-mdev driver is what bridges mdev devices into the vfio ecosystem. In the case of vfio, the softdep IOMMU backend drivers provides the functionality that actually makes vfio devices useful. If the module is not available when the kernel module makes a request for it, who would be responsible for manually loading that module later?
So in addition to pulling the functionality of c38f9e980c1e into RHEL, I think we also need something like:
diff --git a/install/dracut-install.c b/install/dracut-install.c
index 3d64ed7a..57f4c557 100644
--- a/install/dracut-install.c
+++ b/install/dracut-install.c
@@ -1484,6 +1484,8 @@ static int install_dependent_modules(struct kmod_list *modlist)
ret = kmod_module_get_softdeps(mod, &modpre, &modpost);
if (ret == 0)
ret = install_dependent_modules(modpre);
+ if (ret == 0)
+ ret = install_dependent_modules(modpost);
}
} else {
log_error("dracut_install '%s' '%s' ERROR", path, &path[kerneldirlen]);
@@ -1547,6 +1549,8 @@ static int install_module(struct kmod_module *mod)
ret = kmod_module_get_softdeps(mod, &modpre, &modpost);
if (ret == 0)
ret = install_dependent_modules(modpre);
+ if (ret == 0)
+ ret = install_dependent_modules(modpost);
}
return ret;
This completes what 4cdee66c8ed5 should have done originally so that we have pre: and post: softdep modules installed, both for the directly included module, but also for the dependent modules thanks to c38f9e980c1e.
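A quick post-fix check (a sketch, assuming a dracut build with the change above is installed): regenerate the initramfs and confirm the post: softdep modules now land next to the nvidia modules.

# dracut -f --regenerate-all
# lsinitrd /boot/initramfs-$(uname -r).img | egrep 'vfio_mdev|vfio_iommu_type1'

Without the change, the vfio_mdev module file is missing from the image, which matches the failure analyzed above.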
Moving to RHEL8/dracut to accept or refute this solution.
Another piece of the mystery here is why we don't see this issue more regularly. In my testing with the GRID 10.2 GA driver I'm ONLY able to reproduce when I use the rpm install. When I use the run file install, dracut never includes the nvidia modules in the initramfs, therefore the modules are only loaded from the filesystem where all dependencies are available.

It appears this is due to where the modules are installed. When using the rpm file, the modules reside here:

/lib/modules/`uname -r`/weak-updates/nvidia/nvidia-vgpu-vfio.ko
/lib/modules/`uname -r`/weak-updates/nvidia/nvidia.ko

When using the run file, the modules are instead installed here:

/lib/modules/`uname -r`/kernel/drivers/video/nvidia-vgpu-vfio.ko
/lib/modules/`uname -r`/kernel/drivers/video/nvidia.ko

If I use an rpm install and move the modules from the former location to the latter location (and update dependencies with 'depmod -a'), then dracut generates an initramfs WITHOUT the nvidia modules. If I use a run file install and move the modules to the "extra" directory for the kernel and update depmod, dracut generates an initramfs WITH the nvidia modules.

I'm tempted to suspect this is due to /etc/depmod.d/dist.conf:

#
# depmod.conf
#
# override default search ordering for kmod packaging
search updates extra built-in weak-updates

The comment and man page suggest this should only control ordering, for example as I read it, to allow modules in these directories to override modules that might be provided elsewhere. However, if I comment out this directive, the nvidia modules in the extra directory disappear from my initramfs.

To test the opposite, my system includes an audio device making use of the snd-hda-intel driver. This driver does not exist in my initramfs by default, but if I copy the module to the extra directory, update depmod, and regenerate, it does now appear in the initramfs. If I do the same with a sound driver for which I don't have hardware, snd-emu10k1, the module does not appear in the initramfs.

My hypothesis is therefore that any modules in the search path with an alias matching installed hardware will make it to the initramfs. Perhaps dracut folks can confirm this.

Lukáši, what do you think about Alex's analysis? Do you have any idea how to proceed? It would be helpful to know about your plans and/or estimates so that we can handle the issue on the RHV side accordingly.

I talked to Harald, dracut upstream, and he is fine with the patch Alex mentioned. I've made some slight modification, since the original version did not install the post modules when there was an error with the pre ones.

https://github.com/dracutdevs/dracut/pull/848

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (dracut bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4473

*** Bug 1898664 has been marked as a duplicate of this bug. ***
Description of problem:

After adding an Nvidia vGPU instance using WebAdmin -> VM -> host devices -> manage vGPU button, or using edit VM -> custom properties -> mdev_type, the VM fails to run with the following vdsm.log errors:

2020-06-11 15:04:14,007+0300 ERROR (vm/6099c96f) [virt.vm] (vmId='6099c96f-d79d-47ae-b39f-9489bc552cf0') The vm start process failed (vm:871)
Traceback (most recent call last):
.
.
libvirt.libvirtError: internal error: Process exited prior to exec: libvirt: error : failed to access '/sys/bus/mdev/devices/e1f27070-b062-4ea3-a689-89e37a56f677/iommu_group': No such file or directory

2020-06-11 15:04:18,533+0300 ERROR (jsonrpc/1) [root] Couldn't parse NVDIMM device data (hostdev:755)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/common/hostdev.py", line 753, in list_nvdimms
    data = json.loads(output)
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
--------------------------

vGPU Nvidia drivers are installed and the Nvidia service is running. Also, it is possible to see vGPU instances in the host, for example:

# /home/nsimsolo/vgpu_instances1.sh
mdev_type: nvidia-11
---
description: num_heads=2, frl_config=45, framebuffer=512M, max_resolution=2560x1600, max_instance=16
---
name: GRID M60-0B

mdev_type: nvidia-12
---
description: num_heads=2, frl_config=60, framebuffer=512M, max_resolution=2560x1600, max_instance=16
---
name: GRID M60-0Q

mdev_type: nvidia-13
---
description: num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=8
---
name: GRID M60-1A

mdev_type: nvidia-14
---
description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=8
---
name: GRID M60-1B
----------------

This issue is not related to the emulated machine type (it occurred on both pc-i440fx and Q35).

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.1.2-0.10.el8ev
vdsm-4.40.19-1.el8ev.x86_64
libvirt-daemon-6.0.0-22.module+el8.2.1+6815+1c792dc8.x86_64
qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64
Nvidia host drivers (Tesla M60): NVIDIA-vGPU-rhel-8.2-450.36.01.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Browse Webadmin -> click on the VM name -> host devices tab -> manage vGPU, select an Nvidia instance and click the "save" button.
2. Run the VM.

Actual results:
VM failed to run

Expected results:
VM should run with the attached vGPU device.

Additional info:
vdsm.log and engine.log attached
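The /home/nsimsolo/vgpu_instances1.sh script quoted above is not attached to the bug; the following is a minimal sketch of what such a listing script might look like, assuming it only walks the standard mdev_supported_types sysfs layout that also appears earlier in this report:

#!/bin/bash
# Print every mdev type exposed by PCI devices on this host, with its
# description and marketing name, in a format similar to the output above.
for type_dir in /sys/bus/pci/devices/*/mdev_supported_types/*; do
    [ -d "$type_dir" ] || continue
    echo "mdev_type: $(basename "$type_dir")"
    echo "---"
    echo "description: $(cat "$type_dir/description")"
    echo "---"
    echo "name: $(cat "$type_dir/name")"
    echo
done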