Bug 1943141 - vGPU with SecureBoot and Nvidia enrolled key: NVRAM file got truncated after host crash.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.40.50.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.4.6
Target Release: 4.40.60.3
Assignee: Tomáš Golembiovský
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-25 13:06 UTC by Nisim Simsolo
Modified: 2021-11-04 19:28 UTC (History)

Fixed In Version: vdsm-4.40.60.3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-14 07:30:24 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.4+


Attachments
/var/log/messages (Mar 21 12:34:31 lion01 kernel: Reserving 256MB of memory at 1680MB for crashkernel (System RAM: 262032MB) (10.65 KB, application/x-xz)
2021-03-25 13:18 UTC, Nisim Simsolo
vdsm.log (701.94 KB, application/x-xz)
2021-03-25 13:22 UTC, Nisim Simsolo
engine.log (3.18 MB, application/gzip)
2021-03-25 13:27 UTC, Nisim Simsolo


Links
System ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit 114015 | 0 | master | MERGED | virt: optionally forbid reading of empty external data | 2021-03-29 16:34:01 UTC
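
The title of the linked patch suggests a guard that refuses to accept empty external data. The following is only a minimal sketch of that idea, not the merged vdsm code: the function name `read_external_data`, the `allow_empty` parameter, and the exception class are hypothetical names chosen for illustration.

```python
class EmptyExternalDataError(Exception):
    """Raised when external data is empty and empty data is forbidden."""


def read_external_data(path, allow_empty=True):
    # Hypothetical helper, not vdsm's real API: read the whole file and
    # optionally reject an empty result (for example a truncated NVRAM file).
    with open(path, "rb") as f:
        data = f.read()
    if not data and not allow_empty:
        raise EmptyExternalDataError(f"{path} is empty")
    return data
```

With a guard of this kind, a caller that handles NVRAM content could pass `allow_empty=False` so that a zero-length file is reported as an error instead of being stored or reused.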

Description Nisim Simsolo 2021-03-25 13:06:30 UTC
Description of problem:
After a host crash, the NVRAM file was truncated, which caused the VM to fail to start with the following errors in vdsm.log:
2021-03-21T17:18:56.905464Z qemu-kvm: info: its size must be a non-zero multiple of 0x1000
2021-03-21 19:18:57,973+0200 INFO  (vm/9a0bced3) [virt.vm] (vmId='9a0bced3-3858-404d-9d4d-b57d693509ea') Changed state to Down: internal error: qemu unexpectedly closed the monitor: 2021-03-21T17:18:56.905325Z qemu-kvm: system firmware block device  has invalid size 0
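
The qemu-kvm messages above state the rule the firmware block device must satisfy: its size must be a non-zero multiple of 0x1000. A minimal sketch of the same check applied to a file on the host follows; the function name `nvram_size_is_valid` is hypothetical, and the path of the NVRAM file is not given in this report, so it has to be supplied by the caller.

```python
import os

PAGE_SIZE = 0x1000  # granularity named in the qemu-kvm message


def nvram_size_is_valid(path):
    """Return True if the file size is a non-zero multiple of 0x1000."""
    size = os.path.getsize(path)
    return size > 0 and size % PAGE_SIZE == 0
```

A truncated (zero-byte) file fails this check, which matches the "invalid size 0" error reported by qemu-kvm.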
-------------------------------------------
Setup used during this issue:
1. RHEL 8 VM with UEFI Secure Boot, vGPU installed, and the Nvidia public key enrolled.
2. Windows 10 VM with UEFI Secure Boot, vGPU installed, and the Nvidia public key enrolled.
-------------------------------------------
After the host crashed, only the RHEL 8 VM failed to start (possibly because I enrolled the Nvidia key with mokutil).
-------------------------------------------
This issue was not reproduced by the following tests:
host reboot and cold restart
vdsm restarts
leaving the setup running with vGPU VMs overnight.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.5.9-0.1.el8ev
vdsm-4.40.50.8-1.el8ev.x86_64
qemu-kvm-5.1.0-20.module+el8.3.1+9918+230f5c26.x86_64
libvirt-daemon-6.6.0-13.module+el8.3.1+9548+0a8fede5.x86_64
host: NVIDIA-vGPU-rhel-8.3-460.32.04.x86_64
VM: NVIDIA-Linux-x86_64-460.56-grid

How reproducible:
hard to reproduce.

Steps to Reproduce:
1. Install a Secure Boot RHEL 8 VM with vGPU and enrolled Nvidia keys.
2. Install a Secure Boot Windows 10 VM with vGPU and enrolled Nvidia keys.
3. The host crashes.

Actual results:
The NVRAM file is truncated after the host crash, and the RHEL 8 VM fails to start again.

Expected results:

Additional info:
vdsm.log, engine.log and /var/log/messages attached.

Comment 1 Nisim Simsolo 2021-03-25 13:18:37 UTC
Created attachment 1766304
/var/log/messages (Mar 21 12:34:31 lion01 kernel: Reserving 256MB of memory at 1680MB for crashkernel (System RAM: 262032MB)

Comment 2 Nisim Simsolo 2021-03-25 13:22:01 UTC
Created attachment 1766305
vdsm.log

Comment 3 Nisim Simsolo 2021-03-25 13:27:03 UTC
Created attachment 1766306
engine.log

Comment 5 Nisim Simsolo 2021-05-06 13:17:51 UTC
Verified (issue could not be reproduced):
ovirt-engine-4.4.6.6-0.10.el8ev
vdsm-4.40.60.6-1.el8ev.x86_64
qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0.x86_64
libvirt-daemon-7.0.0-13.module+el8.4.0+10604+5608c2b4.x86_64
host Nvidia drivers: NVIDIA-vGPU-rhel-8.4-460.73.02.x86_64
VM Nvidia drivers (for Windows and Linux): GRID 12.0 GA

