
Bug 1622451

Summary: 【OVMF】guest kernel crash when do the GPU passthrough under OVMF
Product: Red Hat Enterprise Linux 7
Component: ovmf
Version: 7.6
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: high
Reporter: Michael <choma>
Assignee: Laszlo Ersek <lersek>
QA Contact: FuXiangChun <xfu>
CC: alex.williamson, chayang, choma, jinzhao, juzhang, michen, zhguo
Target Milestone: rc
Target Release: ---
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2018-08-27 13:15:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Attachments:
Description: The FULL boot log when passthrough the GPU under the OVMF.
Flags: none

Description Michael 2018-08-27 08:45:23 UTC
Created attachment 1478893 [details]
The FULL boot log when passthrough the GPU under the OVMF.

Description of problem:
The guest kernel crashes when a GPU is passed through under OVMF.

The guest display is not visible over VNC, so I tried to connect to the guest through the serial console instead. The console cannot reach the guest either, and shows the following error:
"
!!!! X64 Exception Type - 0D(#GP - General Protection)  CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000000
RIP  - 000000007D7A7522, CS  - 0000000000000038, RFLAGS - 0000000000010202
RAX  - 000000007D79D940, RCX - AFAFAFAF6C617470, RDX - 0000000000000004
RBX  - 000000007E000198, RSP - 000000007EF7F6E0, RBP - 0000000000000000
RSI  - 0000000000000011, RDI - 000000007DD80111
R8   - 000000007DD80F98, R9  - 0000000000000000, R10 - 000000007D79C8E0
R11  - 000000007DD80918, R12 - 0000000000000000, R13 - 0000000000000000
R14  - 0000000000000000, R15 - 0000000000000000
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010033, CR2 - 0000000000000000, CR3 - 000000007EC01000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000007EBEDA98 0000000000000047, LDTR - 0000000000000000
IDTR - 000000007E16C018 0000000000000FFF,   TR - 0000000000000000
FXSAVE_STATE - 000000007EF7F340
!!!! Find image based on IP(0x7D7A7522) (No PDB)  (ImageBase=000000007D79C000, EntryPoint=000000007D79E1E8) !!!!
"

Version-Release number of selected component (if applicable):
[1] kernel:3.10.0-938.el7.x86_64
[2] qemu-kvm-rhev-2.12.0-11.el7.x86_64
[3] OVMF-20180508-3.gitee3198e672e2.el7.noarch


How reproducible:
100%


Steps to Reproduce:
1. Unbind the GPU card from its host driver and bind it to vfio-pci.

# lspci
05:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [Quadro K5000] (rev a1)
05:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)

# lspci -s 05:00.0 -n
05:00.0 0300: 10de:11ba (rev a1)
# lspci -s 05:00.1 -n
05:00.1 0403: 10de:0e0a (rev a1)

# modprobe vfio-pci
# modprobe vfio
# modprobe vfio_iommu_type1
# echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts

# echo "10de 11ba" > /sys/bus/pci/drivers/vfio-pci/new_id
# echo 0000:05:00.0 > /sys/bus/pci/devices/0000\:05\:00.0/driver/unbind
# echo 0000:05:00.0 > /sys/bus/pci/drivers/vfio-pci/bind

# echo "10de 0e0a" > /sys/bus/pci/drivers/vfio-pci/new_id
# echo 0000:05:00.1 > /sys/bus/pci/devices/0000\:05\:00.1/driver/unbind
# echo 0000:05:00.1 > /sys/bus/pci/drivers/vfio-pci/bind

# ls /sys/bus/pci/drivers/vfio-pci/
0000:05:00.0  0000:05:00.1  bind  module  new_id  remove_id  uevent  unbind
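
The listing above shows both functions under vfio-pci. The same check can be made per device by reading the "driver" symlink in sysfs. The following is a minimal sketch, not part of the original report; the function name pci_driver_of and the SYSFS_ROOT override are hypothetical and exist only so the helper can be tried against a fake tree.

```shell
# SYSFS_ROOT is a hypothetical override (an assumption, not from the report);
# it defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the name of the driver bound to a PCI function, or "none" if the
# function has no driver (or does not exist on this host).
pci_driver_of() {
    link="$SYSFS_ROOT/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# The two functions of the Quadro K5000 from the lspci output above.
for dev in 0000:05:00.0 0000:05:00.1; do
    echo "$dev -> $(pci_driver_of "$dev")"
done
```

After a successful rebind, both lines should report vfio-pci; any other driver name, or "none", means the bind step did not take effect for that function.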


2. Boot the guest with the GPU device assigned. Make sure the guest can be reached over the serial console.

Add "console=tty0 console=ttyS0,115200" to the guest kernel command line.
Add "-serial unix:/tmp/console,server,nowait" to qemu command line. 

# /usr/libexec/qemu-kvm -enable-kvm -M q35 -nodefaults \
-smp 4,cores=2,threads=2,sockets=1 -m 4G -name vm1 \
-global driver=cfi.pflash01,property=secure,value=on \
-drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,readonly=on,unit=0 \
-drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1,readonly=on \
-boot menu=on,splash-time=12000 \
-drive file=/usr/share/OVMF/UefiShell.iso,if=none,cache=none,snapshot=off,aio=native,media=cdrom,id=cdrom1 \
-device ahci,id=ahci0 -device ide-cd,drive=cdrom1,id=ide-cd1,bus=ahci0.1 \
-monitor stdio \
-drive file=/home/choma/rhel-ovmf-7.6-choma/rhel-ovmf-7.6-choma.qcow2,if=none,id=guest-img,format=qcow2,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=guest-img,id=os-disk,bootindex=1 \
-vnc :1 -vga qxl \
-debugcon file:/home/test/ovmf.log -global isa-debugcon.iobase=0x402 \
-netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0 \
-device vfio-pci,host=05:00.0,id=GPU1,addr=05.0 \
-serial unix:/tmp/console,server,nowait


3. Connect to the console.
(on the host) # nc -U /tmp/console  (you may need to wait about 10 seconds)


Actual results:
The guest crashes.

Expected results:
The guest boots successfully and the GPU device is visible inside the guest.

Additional info:
The issue only happens with the GPU device under OVMF. If the GPU is passed through under SeaBIOS, the guest works well. If a NIC is passed through under OVMF instead of the GPU, the guest also works well.

The full boot log is in the attachment.

Comment 2 Laszlo Ersek 2018-08-27 09:40:32 UTC
(+Alex)

Thank you, Michael, for diligently capturing the serial port log and the QEMU debug port log. In combination, these tell me that the crash doesn't occur in the guest kernel but in the guest firmware (OVMF), and more precisely in the UEFI driver that comes from the assigned device's option ROM. Here's the log from when the driver is loaded:

> [Security] 3rd party image[7E001C28] can be loaded after EndOfDxe: PciRoot(0x0)/Pci(0x5,0x0)/Offset(0xF200,0x1EFFF).
> InstallProtocolInterface: [EfiLoadedImageProtocol] 7E000340
> Loading driver at 0x0007D79C000 EntryPoint=0x0007D79E1E8 
> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 7E109C18
> ProtectUefiImageCommon - 0x7E000340
>   - 0x000000007D79C000 - 0x000000000001CDE0
> InstallProtocolInterface: [EfiDriverBindingProtocol] 7D79C380
> InstallProtocolInterface: [EfiComponentName2Protocol] 7D79C638
> InstallProtocolInterface: [EfiDriverSupportedEfiVersionProtocol] 7D79C650

Note "Loading driver at 0x0007D79C000 EntryPoint=0x0007D79E1E8", and compare:

> !!!! Find image based on IP(0x7D7A7522) (No PDB)  (ImageBase=000000007D79C000, EntryPoint=000000007D79E1E8) !!!!

So this looks like a bug in the GPU driver.
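
The comparison can also be made arithmetically. The sketch below checks that the faulting RIP falls inside the loaded image and prints its offset from the image base. The three values are copied from the logs above; reading the second number on the "ProtectUefiImageCommon" line as the image size is an assumption, and the helper name in_image is hypothetical.

```shell
# Values copied from the exception dump and the driver load log.
rip=0x7D7A7522
image_base=0x7D79C000
image_size=0x1CDE0   # assumed to be the image size

# in_image RIP BASE SIZE: succeed if BASE <= RIP < BASE + SIZE.
in_image() {
    [ $(( $1 )) -ge $(( $2 )) ] && [ $(( $1 )) -lt $(( $2 + $3 )) ]
}

if in_image "$rip" "$image_base" "$image_size"; then
    printf 'RIP is inside the image, at offset 0x%X\n' $(( $rip - $image_base ))
else
    echo 'RIP is outside the image'
fi
```

With the values from this report, the RIP lies at offset 0xB522 within the image, which is consistent with the fault being raised by code from the option ROM driver rather than by OVMF itself.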

What happens if you disable the option ROM? Please try to modify the QEMU command line as follows:

  -device vfio-pci,host=05:00.0,id=GPU1,addr=05.0,romfile=''
                                                  ^^^^^^^^^^

Thanks.

Comment 3 Michael 2018-08-27 12:05:05 UTC
Hi Laszlo,

Thank you for the information.

First I reproduced the bug, then I disabled the option ROM as you suggested. With the option ROM disabled the issue is gone, so the workaround is verified.

I can also provide one more data point.

In this bug I passed through a [Quadro K5000]. If I use a [Quadro P2000] instead and repeat the same steps, the guest works well. So the newer model may have fixed the problem.


Thanks.

Comment 4 Laszlo Ersek 2018-08-27 13:15:19 UTC
OK, so this is NOTABUG for OVMF; I'm updating the status accordingly.

Comment 6 FuXiangChun 2018-08-28 01:43:40 UTC
Alex,

Please take a look at this issue. Is it a bug in the GPU driver or in VFIO? Thanks.

Comment 7 Alex Williamson 2018-09-02 20:32:45 UTC
Sounds like a bad option ROM to me, with a clear workaround: disable the on-card ROM.  Look for a firmware update, or be satisfied that it works on Pascal-based cards; Kepler cards are old at this point.  VFIO only supplies the on-card ROM to the VM, and in my experience cards that manage to fault OVMF will generally fault on bare metal too, and rely on the CSM/Legacy support of the firmware to use the legacy image rather than the UEFI image.  I don't see that there's anything to do in vfio here.
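
To see which images (legacy and/or UEFI) an option ROM carries, a ROM dump can be walked image by image. This is a minimal sketch, not something from the thread: the function name rom_images is hypothetical, and the offsets follow the standard PCI expansion ROM layout (0x55 0xAA signature, "PCIR" structure pointer at 0x18, code type at PCIR+0x14, last-image flag in bit 7 of PCIR+0x15).

```shell
# Hypothetical helper: list the images contained in a PCI expansion ROM dump.
# Usage: rom_images FILE
rom_images() {
    file=$1
    off=0
    size=$(( $(wc -c < "$file") ))
    while [ "$off" -lt "$size" ]; do
        # Each image starts with the 0x55 0xAA signature.
        sig=$(od -An -tx1 -j "$off" -N 2 "$file" | tr -d ' \n')
        [ "$sig" = "55aa" ] || break
        # Offset 0x18: little-endian pointer to the "PCIR" data structure.
        set -- $(od -An -tu1 -j $((off + 24)) -N 2 "$file")
        pcir=$((off + $1 + $2 * 256))
        # PCIR+0x10: image length in 512-byte units; +0x14: code type;
        # +0x15: indicator byte.
        set -- $(od -An -tu1 -j $((pcir + 16)) -N 6 "$file")
        len=$(( ($1 + $2 * 256) * 512 ))
        case $5 in
            0) kind="legacy x86" ;;
            3) kind="EFI" ;;
            *) kind="other ($5)" ;;
        esac
        echo "offset $off: $kind"
        # Bit 7 of the indicator marks the last image in the ROM.
        [ $(( $6 & 128 )) -eq 0 ] || break
        [ "$len" -gt 0 ] || break
        off=$((off + len))
    done
}

# Run on a dump file if one is given on the command line.
if [ $# -gt 0 ]; then
    rom_images "$1"
fi
```

On Linux, a dump can usually be obtained from the device's sysfs "rom" file (write 1 to it, read it, then write 0). A ROM that lists both a legacy x86 image and an EFI image matches the situation described above, where a CSM can fall back to the legacy image while OVMF loads the EFI one.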