Bug 1943141

Summary: vGPU with SecureBoot and Nvidia enrolled key: NVRAM file got truncated after host crash.
Product: [oVirt] vdsm
Reporter: Nisim Simsolo <nsimsolo>
Component: Core
Assignee: Tomáš Golembiovský <tgolembi>
Status: CLOSED CURRENTRELEASE
QA Contact: Nisim Simsolo <nsimsolo>
Severity: medium
Priority: unspecified
Version: 4.40.50.8
CC: ahadas, bugs, nsimsolo
Target Milestone: ovirt-4.4.6
Flags: pm-rhel: ovirt-4.4+
Target Release: 4.40.60.3
Hardware: Unspecified
OS: Unspecified
Fixed In Version: vdsm-4.40.60.3
Last Closed: 2021-05-14 07:30:24 UTC
Type: Bug
oVirt Team: Virt
Attachments:
  Description: /var/log/messages (Mar 21 12:34:31 lion01 kernel: Reserving 256MB of memory at 1680MB for crashkernel (System RAM: 262032MB)    Flags: none
  Description: vdsm.log    Flags: none
  Description: engine.log    Flags: none

Description Nisim Simsolo 2021-03-25 13:06:30 UTC
Description of problem:
After a host crash, the NVRAM file was truncated, which caused the VM to fail to start with the following error in vdsm.log:
2021-03-21T17:18:56.905464Z qemu-kvm: info: its size must be a non-zero multiple of 0x1000
2021-03-21 19:18:57,973+0200 INFO  (vm/9a0bced3) [virt.vm] (vmId='9a0bced3-3858-404d-9d4d-b57d693509ea') Changed state to Down: internal error: qemu unexpectedly closed the monitor: 2021-03-21T17:18:56.905325Z qemu-kvm: system firmware block device  has invalid size 0
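The qemu-kvm messages above state the constraint the firmware block device failed: its size must be a non-zero multiple of 0x1000. As an illustration only, a minimal Python sketch that checks firmware/NVRAM image files against that constraint before starting a VM could look like this. The script and the function name `nvram_size_ok` are hypothetical and are not part of vdsm, libvirt or qemu; the file paths are whatever the user passes on the command line.

```python
import os
import sys

# Constraint quoted in the qemu-kvm error: the size must be a
# non-zero multiple of 0x1000 (4 KiB).
FLASH_ALIGNMENT = 0x1000


def nvram_size_ok(size: int) -> bool:
    """Return True if `size` (in bytes) is a non-zero multiple of 0x1000."""
    return size > 0 and size % FLASH_ALIGNMENT == 0


if __name__ == "__main__":
    # Hypothetical usage: python check_nvram.py <image> [<image> ...]
    for path in sys.argv[1:]:
        size = os.path.getsize(path)
        status = "ok" if nvram_size_ok(size) else "INVALID (empty or truncated?)"
        print(f"{path}: {size} bytes, {status}")
```

A file truncated to 0 bytes, as in this report, is flagged as invalid; the check only covers size and says nothing about whether the variable store contents are intact.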
-------------------------------------------
Setup used during this issue:
1. RHEL 8 VM with UEFI Secure Boot, vGPU installed, and the Nvidia public key enrolled.
2. Windows 10 VM with UEFI Secure Boot, vGPU installed, and the Nvidia public key enrolled.
-------------------------------------------
After the host crashed, the start failure occurred only on the RHEL 8 VM (maybe because I enrolled the Nvidia key with mokutil).
-------------------------------------------
The issue did not reproduce in the following tests:
host reboot and cold restart
vdsm restarts
leaving the setup running with vGPU VMs overnight

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.5.9-0.1.el8ev
vdsm-4.40.50.8-1.el8ev.x86_64
qemu-kvm-5.1.0-20.module+el8.3.1+9918+230f5c26.x86_64
libvirt-daemon-6.6.0-13.module+el8.3.1+9548+0a8fede5.x86_64
host: NVIDIA-vGPU-rhel-8.3-460.32.04.x86_64
VM: NVIDIA-Linux-x86_64-460.56-grid

How reproducible:
Hard to reproduce.

Steps to Reproduce:
1. Install a Secure Boot RHEL 8 VM with vGPU and the Nvidia keys enrolled.
2. Install a Secure Boot Windows 10 VM with vGPU and the Nvidia keys enrolled.
3. The host crashes.

Actual results:
The NVRAM file is truncated after the host crash, and the RHEL 8 VM fails to start again.

Expected results:

Additional info:
vdsm.log, engine.log and /var/log/messages attached.

Comment 1 Nisim Simsolo 2021-03-25 13:18:37 UTC
Created attachment 1766304
/var/log/messages (Mar 21 12:34:31 lion01 kernel: Reserving 256MB of memory at 1680MB for crashkernel (System RAM: 262032MB)

Comment 2 Nisim Simsolo 2021-03-25 13:22:01 UTC
Created attachment 1766305
vdsm.log

Comment 3 Nisim Simsolo 2021-03-25 13:27:03 UTC
Created attachment 1766306
engine.log

Comment 5 Nisim Simsolo 2021-05-06 13:17:51 UTC
Verified (the issue could not be reproduced) with:
ovirt-engine-4.4.6.6-0.10.el8ev
vdsm-4.40.60.6-1.el8ev.x86_64
qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0.x86_64
libvirt-daemon-7.0.0-13.module+el8.4.0+10604+5608c2b4.x86_64
host Nvidia drivers: NVIDIA-vGPU-rhel-8.4-460.73.02.x86_64
VM Nvidia drivers (for Windows and Linux): GRID 12.0 GA