Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1626053

Summary: q35+OVMF guest booted with "-device vfio-pci,host=04:00.0,id=GPU1,addr=05.0" and an NVIDIA GPU cannot display the screen after the NVIDIA driver is installed
Product: Red Hat Enterprise Linux 7
Reporter: liunana <nanliu>
Component: qemu-kvm-rhev
Assignee: Alex Williamson <alex.williamson>
Status: CLOSED NOTABUG
QA Contact: Guo, Zhiyi <zhguo>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.6
CC: alex.williamson, chayang, jinzhao, juzhang, lersek, michen, nanliu, virt-maint, zhguo
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-21 17:31:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  Description             Flags
  journalctl log          none
  log of q35 + seabios    none
  log of q35 + ovmf       none
  dmesg log               none
  Xorg.0.log              none
  ovmf debug log          none

Description liunana 2018-09-06 13:14:43 UTC
Description of problem:
A q35+OVMF guest booted with "-device vfio-pci,host=04:00.0,id=GPU1,addr=05.0" and an assigned NVIDIA GPU cannot display the screen after the NVIDIA driver is installed. The same guest works fine when booted with -machine pc, and with -machine q35 under SeaBIOS.

Version-Release number of selected component (if applicable):
host:
# uname -r
3.10.0-938.el7.x86_64

# rpm -qa | grep qemu
qemu-kvm-common-rhev-2.12.0-11.el7.x86_64
libvirt-daemon-driver-qemu-4.5.0-7.el7.x86_64
qemu-kvm-rhev-debuginfo-2.12.0-11.el7.x86_64
qemu-img-rhev-2.12.0-11.el7.x86_64
qemu-kvm-tools-rhev-2.12.0-11.el7.x86_64
qemu-kvm-rhev-2.12.0-11.el7.x86_64

OVMF-20180508-3.gitee3198e672e2.el7.noarch.rpm

guest:
# rpm -qa | grep kernel
kernel-devel-3.10.0-940.el7.x86_64
kernel-tools-libs-3.10.0-940.el7.x86_64
kernel-3.10.0-940.el7.x86_64
kernel-headers-3.10.0-940.el7.x86_64
kernel-tools-3.10.0-940.el7.x86_64

# lspci | grep VGA
00:01.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04)
00:05.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)


How reproducible: 100%


Steps to Reproduce:
1. Boot the guest with command [1] and enable screen sharing.
2. Boot the guest with command [2] at runlevel 3.
3. Install the NVIDIA driver with the steps below:
   Edit /etc/default/grub and append the following to "GRUB_CMDLINE_LINUX":
     rd.driver.blacklist=nouveau nouveau.modeset=0

   Generate a new grub configuration to include the above changes:
         grub2-mkconfig -o /boot/grub2/grub.cfg
     or  grub2-mkconfig -o /boot/grub2/grubenv

   Edit/create /etc/modprobe.d/blacklist.conf and append:
     blacklist nouveau

   Reboot the guest, then download and install the NVIDIA driver:
     wget http://us.download.nvidia.com/XFree86/Linux-x86_64/390.87/NVIDIA-Linux-x86_64-390.87.run
     sh NVIDIA-Linux-x86_64-390.87.run

   Reboot the guest, then download the EDID file and configure /etc/X11/xorg.conf:
     cd /etc/X11
     wget http://10.66.9.237/gpu-testing/edid/edid-p2415q.bin
     cat /etc/X11/xorg.conf
       output is shown in [3]

   Reboot the guest and connect to it with remote-viewer via screen sharing.
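For illustration, the nouveau-blacklist part of step 3 can be sketched as a small idempotent helper; the config path is a parameter here only to make the sketch self-contained (the real file in the steps above is /etc/modprobe.d/blacklist.conf):

```shell
# blacklist_nouveau: append "blacklist nouveau" to a modprobe config file,
# but only if the exact line is not already present (safe to re-run)
blacklist_nouveau() {
    conf=$1
    grep -qx 'blacklist nouveau' "$conf" 2>/dev/null \
        || printf 'blacklist nouveau\n' >> "$conf"
}
```

Running the helper twice leaves a single entry, which matches the intent of the manual "edit/create and append" step.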

Actual results: the display cannot be connected via screen sharing, /var/log/Xorg.0.log contains the failure messages in [4], and the guest boot screen ends with:
  Started Update UTMP about System Runlevel changes...cates......d or progress polling....


Expected results: the guest boots normally and the screen can be connected via screen sharing


Additional info:
[1]
/usr/libexec/qemu-kvm -enable-kvm -M q35 -nodefaults \
-smp 2,cores=2,threads=1,sockets=1 -m 4G -name gpu  \
-global driver=cfi.pflash01,property=secure,value=on -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,readonly=on,unit=0 -drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1,readonly=on \
-boot menu=on,splash-time=12000 -drive file=/usr/share/OVMF/UefiShell.iso,if=none,cache=none,snapshot=off,aio=native,media=cdrom,id=cdrom1 -device ahci,id=ahci0 -device ide-cd,drive=cdrom1,id=ide-cd1,bus=ahci0.1 \
-monitor stdio \
-drive file=/home/4-92-ovmf/rhel76.qcow2,if=none,id=guest-img,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,drive=guest-img,id=os-disk,bootindex=1 \
-vnc :2 \
-vga qxl \
-debugcon file:/home/test/ovmf2.log \
-global isa-debugcon.iobase=0x402 \
-netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=48:0f:cf:61:f9:89 \
-serial unix:/tmp/console,server,nowait \


[2]
/usr/libexec/qemu-kvm -enable-kvm -M q35 -nodefaults \
-smp 2,cores=2,threads=1,sockets=1 -m 4G -name gpu  \
-global driver=cfi.pflash01,property=secure,value=on -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,readonly=on,unit=0 -drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1,readonly=on \
-boot menu=on,splash-time=12000 -drive file=/usr/share/OVMF/UefiShell.iso,if=none,cache=none,snapshot=off,aio=native,media=cdrom,id=cdrom1 -device ahci,id=ahci0 -device ide-cd,drive=cdrom1,id=ide-cd1,bus=ahci0.1 \
-monitor stdio \
-drive file=/home/4-92-ovmf/rhel76.qcow2,if=none,id=guest-img,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,drive=guest-img,id=os-disk,bootindex=1 \
-vnc :2 \
-vga qxl \
-debugcon file:/home/test/ovmf2.log \
-global isa-debugcon.iobase=0x402 \
-netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=48:0f:cf:61:f9:89 \
-serial unix:/tmp/console,server,nowait \
-device vfio-pci,host=04:00.0,id=GPU1,addr=05.0 \


[3]
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 390.87  (buildmeister@swio-display-x64-rhel04-14)  Tue Aug 21 17:33:38 PDT 2018

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
    FontPath        "/usr/share/fonts/default/Type1"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/input/mice"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:0:5:0"  # Sample: "PCI:0:2:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Option         "UseDisplayDevice" "DFP-0"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0:/etc/X11/edid-p2415q.bin"
    Option         "Coolbits" "5"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection



[4]
[    16.783] (II) NVIDIA(0): Validated MetaModes:
[    16.783] (II) NVIDIA(0):     "DFP-0:nvidia-auto-select"
[    16.783] (II) NVIDIA(0): Virtual screen size determined to be 3840 x 2160
[    16.790] (--) NVIDIA(0): DPI set to (184, 182); computed from "UseEdidDpi" X config
[    16.790] (--) NVIDIA(0):     option
[    16.792] (II) NVIDIA: Using 24576.00 MB of virtual memory for indirect memory
[    16.792] (II) NVIDIA:     access.
[    16.810] (II) NVIDIA(0): ACPI: failed to connect to the ACPI event daemon; the daemon
[    16.810] (II) NVIDIA(0):     may not be running or the "AcpidSocketPath" X
[    16.810] (II) NVIDIA(0):     configuration option may not be set correctly.  When the
[    16.810] (II) NVIDIA(0):     ACPI event daemon is available, the NVIDIA X driver will
[    16.810] (II) NVIDIA(0):     try to use it to receive ACPI event notifications.  For
[    16.810] (II) NVIDIA(0):     details, please see the "ConnectToAcpid" and
[    16.810] (II) NVIDIA(0):     "AcpidSocketPath" X configuration options in Appendix B: X
[    16.810] (II) NVIDIA(0):     Config Options in the README.
[    16.813] (EE) NVIDIA(0): Failed to allocate software rendering cache surface: out of
[    16.813] (EE) NVIDIA(0):     memory.
[    16.813] (EE) NVIDIA(0):  *** Aborting ***
[    16.824] (EE)
Fatal server error:
[    16.824] (EE) NVIDIA: A GPU exception occurred during X server initialization(EE)
[    16.824] (EE)
Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.
[    16.824] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[    16.824] (EE)
[    16.933] (EE) Server terminated with error (1). Closing log file.

Comment 2 liunana 2018-09-06 13:17:03 UTC
Created attachment 1481301 [details]
journalctl log

Comment 3 Guo, Zhiyi 2018-09-06 13:46:07 UTC
Hi Laszlo,

   Could you help to look at this issue?

BR/
Zhiyi, Guo

Comment 4 Laszlo Ersek 2018-09-06 14:50:37 UTC
(1) I don't have the slightest idea what "screen sharing" means, or how it
    is configured in the RHEL guest.

(2) As far as I understand, the proprietary NVIDIA graphics driver works in
    the guest if we use i440fx (with SeaBIOS only), or if we use q35 with
    SeaBIOS. It doesn't work if we use q35 with OVMF. Is my understanding
    correct?

If so:

- Do we officially support the proprietary NVIDIA graphics driver with
  assigned GPUs? (The guest dmesg from comment 2 reports tainting, so I
  kinda doubt it -- we provide a kABI guarantee so external modules will
  continue loading, but we don't take responsibility for what they do.)

- If we do support this setup, then let's ask NVIDIA for help. Their
  proprietary driver is proprietary. We can't see what their code does, and
  why the guest firmware could be a problem.

NVIDIA GPU assignment works reliably for me in Windows guests, but I've
never tried with Linux guests.

Googling the error message "Failed to allocate software rendering cache
surface" leads to
<https://devtalk.nvidia.com/default/topic/691565/linux/geforce-driver-problem-on-centos-6-4-with-xen-installed/post/4278494/#4278494>,
which is eminently unhelpful.

I suggest capturing the Linux guest dmesg for both Q35+SeaBIOS and Q35+OVMF,
and comparing them. Same for the X.org logfile. I vaguely suspect that the
guest memory map (which is different between SeaBIOS and OVMF) somehow
affects the allocation of the "software rendering cache surface", in the
proprietary NVIDIA guest driver. But that's really just a guess.

(

I also have a side comment, about the QEMU command lines in comment 0: the
option

  -drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1,readonly=on

is bogus. I see that you guys at least added "readonly=on", so the guest at
least wouldn't modify the system-wide "OVMF_VARS.fd" file. Nevertheless,
regarding the UEFI runtime variable services, this option remains broken.

You *really* have to create a writeable copy of "OVMF_VARS.fd" that is
private to the domain, and use that. In this particular scenario, you
(re)boot the domain four times, and the modifications to non-volatile UEFI
variables survive none of that.

I don't know why this is so hard to implement; I shouldn't have to explain
it so frequently. Libvirt handles this automatically BTW; maybe use libvirt
for testing?

Furthermore, it's irrelevant whether the incorrect option plays any role in
the symptom presently seen. QEMU should not be launched with known broken
options, for testing any feature.

)
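For illustration, the per-domain varstore copy described in the side comment amounts to something like this sketch (the template and target paths are illustrative; libvirt performs the equivalent automatically):

```shell
# make_domain_vars: give each domain its own writeable copy of the varstore
# template, so non-volatile UEFI variables survive reboots of the domain
make_domain_vars() {
    template=$1     # e.g. /usr/share/OVMF/OVMF_VARS.fd
    domain_vars=$2  # e.g. /var/lib/libvirt/qemu/nvram/gpu_VARS.fd
    [ -e "$domain_vars" ] || cp "$template" "$domain_vars"
}
# the private copy is then passed writeable (no readonly=on):
#   -drive file=$domain_vars,if=pflash,format=raw,unit=1
```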

Thanks.

Comment 5 Laszlo Ersek 2018-09-06 14:51:36 UTC
Ugh, I meant to clear needinfo.

Comment 7 liunana 2018-09-07 09:08:08 UTC
Following Laszlo Ersek's suggestion, I can still reproduce the problem with the simpler steps below. I have also attached the dmesg logs for q35+SeaBIOS and q35+OVMF:

1. Boot the guest with the command:
/usr/libexec/qemu-kvm -enable-kvm -M q35 -nodefaults \
-smp 2,cores=2,threads=1,sockets=1 -m 4G -name gpu  \
-global driver=cfi.pflash01,property=secure,value=on -drive file=/home/1-ovmf/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0 -drive file=/home/1-ovmf/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
-boot menu=on,splash-time=12000 -drive file=/home/1-ovmf/OVMF/UefiShell.iso,if=none,cache=none,snapshot=off,aio=native,media=cdrom,id=cdrom1 -device ahci,id=ahci0 -device ide-cd,drive=cdrom1,id=ide-cd1,bus=ahci0.1 \
-monitor stdio \
-drive file=/home/1-ovmf/rhel76.qcow2,if=none,id=guest-img,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,drive=guest-img,id=os-disk,bootindex=1 \
-vnc :2 \
-vga qxl \
-debugcon file:/home/test/ovmf0.log \
-global isa-debugcon.iobase=0x402 \
-netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=48:0f:cf:61:f9:82 \
-serial unix:/tmp/console,server,nowait \
-device vfio-pci,host=04:00.0,id=GPU1,addr=05.0 \


2. Install the NVIDIA driver and boot to runlevel 5.


3. The guest then stops at the boot screen, and /var/log/Xorg.0.log contains these errors:

[    16.668] (II) NVIDIA: Using 24576.00 MB of virtual memory for indirect memory
[    16.668] (II) NVIDIA:     access.
[    16.699] (II) NVIDIA(0): ACPI: failed to connect to the ACPI event daemon; the daemon
[    16.699] (II) NVIDIA(0):     may not be running or the "AcpidSocketPath" X
[    16.699] (II) NVIDIA(0):     configuration option may not be set correctly.  When the
[    16.699] (II) NVIDIA(0):     ACPI event daemon is available, the NVIDIA X driver will
[    16.699] (II) NVIDIA(0):     try to use it to receive ACPI event notifications.  For
[    16.699] (II) NVIDIA(0):     details, please see the "ConnectToAcpid" and
[    16.699] (II) NVIDIA(0):     "AcpidSocketPath" X configuration options in Appendix B: X
[    16.699] (II) NVIDIA(0):     Config Options in the README.
[    16.702] (EE) NVIDIA(0): Failed to allocate software rendering cache surface: out of
[    16.702] (EE) NVIDIA(0):     memory.
[    16.702] (EE) NVIDIA(0):  *** Aborting ***
[    16.719] (EE)
Fatal server error:
[    16.719] (EE) NVIDIA: A GPU exception occurred during X server initialization(EE)

Comment 8 liunana 2018-09-07 09:09:12 UTC
Created attachment 1481532 [details]
log of q35 + seabios

Comment 9 liunana 2018-09-07 09:09:54 UTC
Created attachment 1481534 [details]
log of q35 + ovmf

Comment 10 liunana 2018-09-07 09:57:33 UTC
(In reply to Laszlo Ersek from comment #4)
> (1) I don't have the slightest idea what "screen sharing" means, or how it
>     is configured in the RHEL guest.


Sorry for the unclear info. "Screen sharing" is a GNOME component; for this bug we can ignore the Windows guest case.





> (2) As far as I understand, the proprietary NVIDIA graphics driver works in
>     the guest if we use i440fx (with SeaBIOS only), or if we use q35 with
>     SeaBIOS. It doesn't work if we use q35 with OVMF. Is my understanding
>     correct?


Yes. It can be reproduced with q35+OVMF, and the screen with the NVIDIA GPU works well with q35+SeaBIOS.




> - Do we officially support the proprietary NVIDIA graphics driver with
>   assigned GPUs? (The guest dmesg from comment 2 reports tainting, so I
>   kinda doubt it -- we provide a kABI guarantee so external modules will
>   continue loading, but we don't take responsibility for what they do.)
> 
> - If we do support this setup, then let's ask NVIDIA for help. Their
>   proprietary driver is proprietary. We can't see what their code does, and
>   why the guest firmware could be a problem.
> 

Maybe we should ask Alex about this; he knows this area better.




> Googling the error message "Failed to allocate software rendering cache
> surface" leads to
> <https://devtalk.nvidia.com/default/topic/691565/linux/geforce-driver-
> problem-on-centos-6-4-with-xen-installed/post/4278494/#4278494>,
> which is eminently unhelpful.
> 

Yeah, thanks for that. I have also been researching this myself and haven't found a solution so far.





> I suggest capturing the Linux guest dmesg for both Q35+SeaBIOS and Q35+OVMF,
> and comparing them. Same for the X.org logfile. I vaguely suspect that the
> guest memory map (which is different between SeaBIOS and OVMF) somehow
> affects the allocation of the "software rendering cache surface", in the
> proprietary NVIDIA guest driver. But that's really just a guess.


Both dmesg logs have been attached. Since I can't interpret the guest memory map correctly myself, please help check this. Thanks.




> (
>   -drive
> file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1,readonly=on
> 
> is bogus. I see that you guys at least added "readonly=on", so the guest at
> least wouldn't modify the system-wide "OVMF_VARS.fd" file. Nevertheless,
> regarding the UEFI runtime variable services, this option remains broken.


Sorry about that. I have retested the bug using a private copy of the OVMF_VARS.fd file on the command line.


If any more information or logs are needed for this bug, please let me know.

Comment 11 Laszlo Ersek 2018-09-07 11:48:57 UTC
The message "using 24576.00 MB of virtual memory for indirect memory access"
suggests, I guess, that the driver attempts to mmap a 24GB MMIO BAR?

Hmm... from comment 8 (SeaBIOS):

grep 0000:00:05.0 log-q35+seabios.doc

> pci 0000:00:05.0: [10de:1c30] type 00 class 0x030000
> pci 0000:00:05.0: reg 0x10: [mem 0xfc000000-0xfcffffff]
> pci 0000:00:05.0: reg 0x14: [mem 0xe0000000-0xefffffff 64bit pref]
> pci 0000:00:05.0: reg 0x1c: [mem 0xf0000000-0xf1ffffff 64bit pref]
> pci 0000:00:05.0: reg 0x24: [io  0xc000-0xc07f]
> pci 0000:00:05.0: reg 0x30: [mem 0xfd000000-0xfd07ffff pref]
> vgaarb: device added:
>         PCI:0000:00:05.0,decodes=io+mem,owns=io+mem,locks=none
> vgaarb: bridge control possible 0000:00:05.0
> vgaarb: device changed decodes:
>         PCI:0000:00:05.0,olddecodes=io+mem,decodes=none:owns=io+mem
> vgaarb: transferring owner from PCI:0000:00:05.0 to PCI:0000:00:01.0
> [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:00:05.0 on minor 1
> nvidia 0000:00:05.0: irq 32 for MSI/MSI-X

from comment 9 (OVMF):

> pci 0000:00:05.0: [10de:1c30] type 00 class 0x030000
> pci 0000:00:05.0: reg 0x10: [mem 0x98000000-0x98ffffff]
> pci 0000:00:05.0: reg 0x14: [mem 0x800000000-0x80fffffff 64bit pref]
> pci 0000:00:05.0: reg 0x1c: [mem 0x810000000-0x811ffffff 64bit pref]
> pci 0000:00:05.0: reg 0x24: [io  0x6000-0x607f]
> pci 0000:00:05.0: reg 0x30: [mem 0xfff80000-0xffffffff pref]
> vgaarb: device added:
>         PCI:0000:00:05.0,decodes=io+mem,owns=io+mem,locks=none
> vgaarb: bridge control possible 0000:00:05.0
> pci 0000:00:05.0: can't claim BAR 6 [mem 0xfff80000-0xffffffff pref]: no
>                   compatible bridge window
> pci 0000:00:05.0: BAR 6: assigned [mem 0x99080000-0x990fffff pref]
> vgaarb: device changed decodes:
>         PCI:0000:00:05.0,olddecodes=io+mem,decodes=none:owns=io+mem
> vgaarb: transferring owner from PCI:0000:00:05.0 to PCI:0000:00:01.0
> [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:00:05.0 on minor 1
> nvidia 0000:00:05.0: irq 31 for MSI/MSI-X
> nvidia 0000:00:05.0: irq 31 for MSI/MSI-X
> nvidia 0000:00:05.0: irq 31 for MSI/MSI-X
> nvidia 0000:00:05.0: irq 31 for MSI/MSI-X
> nvidia 0000:00:05.0: irq 31 for MSI/MSI-X
> nvidia 0000:00:05.0: irq 31 for MSI/MSI-X

BAR 6 (at header offset 0x30, size 512KB) is the ROM BAR, so I don't think
it matters. The other MMIO BARs (16MB, 256MB, 32MB) seem to be assigned
successfully, and they are nowhere near 24GB.

Furthermore, the differences seem to be:

- SeaBIOS maps the 64-bit MMIO BARs under 4GB,

- SeaBIOS maps the ROM BAR too (the guest kernel doesn't have to patch it
  up),

- different IRQ (likely irrelevant -- grepping the logs for IRQ assignments,
  e.g. with "grep -E 'irq [0-9]+ for MSI/MSI-X$'", there are some
  differences; for example the OVMF VM seems to have two AHCI controllers)

- different number of IRQ-related messages (possibly triggered by multiple
  X.org startup attempts?)

Comment 12 Laszlo Ersek 2018-09-07 13:43:41 UTC
I installed a new RHEL-7.6 (20180907.0-Server-x86_64) guest on my
workstation, with my GTX750 assigned. The proprietary nvidia driver's
installation succeeded (390.87), and I've got a 1920x1200 physical monitor
working fine. For this I used a Fedora 27 host (4.17.14-102.fc27.x86_64) and
an upstream OVMF build (@ 98257f982072).

I didn't use "CustomEDID". The only thing I had to add to the nvidia-xconfig
output was "BusID", under the "Device" section.

[   363.303] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[   363.303] (==) NVIDIA(0): RGB weight 888
[   363.303] (==) NVIDIA(0): Default visual is TrueColor
[   363.303] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[   363.303] (**) NVIDIA(0): Enabling 2D acceleration
[   364.099] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:2:0:0
[   364.099] (--) NVIDIA(0):     CRT-0
[   364.099] (--) NVIDIA(0):     DFP-0 (boot)
[   364.099] (--) NVIDIA(0):     DFP-1
[   364.100] (--) NVIDIA(0):     DFP-2
[   364.100] (--) NVIDIA(0):     DFP-3
[   364.109] (II) NVIDIA(0): NVIDIA GPU GeForce GTX 750 (GM107-A) at PCI:2:0:0 (GPU-0)
[   364.109] (--) NVIDIA(0): Memory: 1048576 kBytes
[   364.109] (--) NVIDIA(0): VideoBIOS: 82.07.25.00.50
[   364.109] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[   364.123] (--) NVIDIA(GPU-0): CRT-0: disconnected
[   364.123] (--) NVIDIA(GPU-0): CRT-0: 400.0 MHz maximum pixel clock
[   364.123] (--) NVIDIA(GPU-0):
[   364.151] (--) NVIDIA(GPU-0): Idek Iiyama PLE2400WS (DFP-0): connected
[   364.151] (--) NVIDIA(GPU-0): Idek Iiyama PLE2400WS (DFP-0): Internal TMDS
[   364.151] (--) NVIDIA(GPU-0): Idek Iiyama PLE2400WS (DFP-0): 340.0 MHz maximum pixel clock
[   364.151] (--) NVIDIA(GPU-0):
[   364.152] (--) NVIDIA(GPU-0): DFP-1: disconnected
[   364.152] (--) NVIDIA(GPU-0): DFP-1: Internal TMDS
[   364.152] (--) NVIDIA(GPU-0): DFP-1: 165.0 MHz maximum pixel clock
[   364.152] (--) NVIDIA(GPU-0):
[   364.152] (--) NVIDIA(GPU-0): DFP-2: disconnected
[   364.152] (--) NVIDIA(GPU-0): DFP-2: Internal TMDS
[   364.152] (--) NVIDIA(GPU-0): DFP-2: 165.0 MHz maximum pixel clock
[   364.152] (--) NVIDIA(GPU-0):
[   364.152] (--) NVIDIA(GPU-0): DFP-3: disconnected
[   364.152] (--) NVIDIA(GPU-0): DFP-3: Internal DisplayPort
[   364.152] (--) NVIDIA(GPU-0): DFP-3: 960.0 MHz maximum pixel clock
[   364.152] (--) NVIDIA(GPU-0):
[   364.163] (==) NVIDIA(0):
[   364.163] (==) NVIDIA(0): No modes were requested; the default mode "nvidia-auto-select"
[   364.163] (==) NVIDIA(0):     will be used as the requested mode.
[   364.163] (==) NVIDIA(0):
[   364.164] (II) NVIDIA(0): Validated MetaModes:
[   364.164] (II) NVIDIA(0):     "DFP-0:nvidia-auto-select"
[   364.164] (II) NVIDIA(0): Virtual screen size determined to be 1920 x 1200
[   364.183] (--) NVIDIA(0): DPI set to (93, 92); computed from "UseEdidDpi" X config
[   364.183] (--) NVIDIA(0):     option
[   364.187] (II) NVIDIA: Using 6144.00 MB of virtual memory for indirect memory
[   364.187] (II) NVIDIA:     access.
[   364.199] (II) NVIDIA(0): ACPI: failed to connect to the ACPI event daemon; the daemon
[   364.199] (II) NVIDIA(0):     may not be running or the "AcpidSocketPath" X
[   364.199] (II) NVIDIA(0):     configuration option may not be set correctly.  When the
[   364.199] (II) NVIDIA(0):     ACPI event daemon is available, the NVIDIA X driver will
[   364.199] (II) NVIDIA(0):     try to use it to receive ACPI event notifications.  For
[   364.199] (II) NVIDIA(0):     details, please see the "ConnectToAcpid" and
[   364.199] (II) NVIDIA(0):     "AcpidSocketPath" X configuration options in Appendix B: X
[   364.199] (II) NVIDIA(0):     Config Options in the README.
[   364.235] (II) NVIDIA(0): Setting mode "DFP-0:nvidia-auto-select"
[   364.296] (==) NVIDIA(0): Disabling shared memory pixmaps
[   364.296] (==) NVIDIA(0): Backing store enabled
[   364.296] (==) NVIDIA(0): Silken mouse enabled
[   364.296] (**) NVIDIA(0): DPMS enabled
[   364.297] (II) Loading sub module "dri2"
[   364.297] (II) LoadModule: "dri2"
[   364.297] (II) Module "dri2" already built-in
[   364.297] (II) NVIDIA(0): [DRI2] Setup complete
[   364.297] (II) NVIDIA(0): [DRI2]   VDPAU driver: nvidia

Comment 13 Laszlo Ersek 2018-09-07 13:48:05 UTC
(I did specify:

    <kvm>
      <hidden state='on'/>
    </kvm>

which is in fact necessary in my setup.)

Comment 14 Laszlo Ersek 2018-09-07 13:54:14 UTC
The error message "Failed to allocate software rendering cache surface" comes from "/usr/lib64/xorg/modules/drivers/nvidia_drv.so", according to the "strings" utility.
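For reference, the same attribution can be made without `strings`, using `grep -a` to scan the binary directly (the helper name here is mine):

```shell
# msg_in_binary: check whether a binary object embeds a given message
# (grep -a treats the binary as text; equivalent to `strings | grep` here)
msg_in_binary() {
    grep -aq "$2" "$1"
}
# e.g.:
# msg_in_binary /usr/lib64/xorg/modules/drivers/nvidia_drv.so \
#     'Failed to allocate software rendering cache surface'
```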

Comment 15 Laszlo Ersek 2018-09-07 13:58:16 UTC
I notice that in QE's case, the pixel count (3840 x 2160) is approx. four times my pixel count (1920 x 1200). The same ratio is observable between the virtual memory sizes used for "indirect memory access" (24576 MB vs. 6144 MB).
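The "approx. four times" observation checks out numerically; a quick arithmetic sketch (whether the driver actually scales this allocation with pixel count is a guess, not something the log states):

```shell
# compare the two virtual screen pixel counts and the two reported
# "indirect memory access" sizes from the X logs
echo $(( 3840 * 2160 ))   # QE's virtual screen: 8294400 pixels
echo $(( 1920 * 1200 ))   # my virtual screen:   2304000 pixels
echo $(( 24576 / 6144 ))  # memory size ratio:   4
```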

Comment 16 Laszlo Ersek 2018-09-07 14:18:19 UTC
After looking at the guest kernel dmesg (in the OVMF case) in comment 9, I
see six runs of the following:

> nvidia 0000:00:05.0: irq 31 for MSI/MSI-X
> NVRM: GPU at PCI:0000:00:05: GPU-c2d1e1a5-f729-f869-21cf-f94397570eac
> NVRM: GPU Board Serial Number: 0324016107690
> NVRM: Xid (PCI:0000:00:05): 31, Ch 00000008, engmask 00000101, intr 10000000

According to <https://docs.nvidia.com/deploy/xid-errors/index.html>, XID 31
means "GPU memory page fault", marked as both "Driver Error" and "User App
Error". More details are provided at
<https://docs.nvidia.com/deploy/xid-errors/index.html#topic_5_2>:

> XID 31: Fifo: MMU Error
>
> This event is logged when a fault is reported by the MMU, such as when an
> illegal address access is made by an applicable unit on the chip Typically
> these are application-level bugs, but can also be driver bugs or hardware
> bugs.
>
> When this event is logged, NVIDIA recommends the following:
>
>     Run the application in cuda-gdb or cuda-memcheck , or
>     Run the application with CUDA_DEVICE_WAITS_ON_EXCEPTION=1 and then
>     attach later with cuda-gdb, or
>     File a bug if the previous two come back inconclusive to eliminate
>     potential NVIDIA driver or hardware bug.
>
> Note: The cuda-memcheck tool instruments the running application and
> reports which line of code performed the illegal read.

Does this card+driver combination work on a physical UEFI install?

Comment 17 Alex Williamson 2018-09-07 15:50:08 UTC
Can this be reproduced without the step of using a custom EDID AND with a physical monitor connected?  We generally recommend QE to use either a physically connected monitor or dummy EDID plug rather than a software workaround for EDID.

Comment 19 Alex Williamson 2018-09-07 20:45:52 UTC
(In reply to Alex Williamson from comment #17)
> Can this be reproduced without the step of using a custom EDID AND with a
> physical monitor connected?  We generally recommend QE to use either a
> physically connected monitor or dummy EDID plug rather than a software
> workaround for EDID.

Another experiment perhaps more related to OVMF vs SeaBIOS, does the problem disappear if the option ",rombar=0" is added to the vfio-pci device?

Comment 20 liunana 2018-09-14 05:41:28 UTC
(In reply to Alex Williamson from comment #19)
> (In reply to Alex Williamson from comment #17)
> > Can this be reproduced without the step of using a custom EDID AND with a
> > physical monitor connected?  We generally recommend QE to use either a
> > physically connected monitor or dummy EDID plug rather than a software
> > workaround for EDID.
> 
Yes, it can still be reproduced without using a custom EDID, but I can't be sure whether the physical monitor is found by the driver.
If there is no message like "can't find X screen" in /var/log/Xorg.0.log, can we be sure the driver found the physical monitor?



> Another experiment perhaps more related to OVMF vs SeaBIOS, does the problem
> disappear if the option ",rombar=0" is added to the vfio-pci device?

No, the problem doesn't disappear; there seems to be no difference.

Comment 21 Alex Williamson 2018-09-14 23:05:49 UTC
(In reply to liunana from comment #20)
> (In reply to Alex Williamson from comment #19)
> > (In reply to Alex Williamson from comment #17)
> > > Can this be reproduced without the step of using a custom EDID AND with a
> > > physical monitor connected?  We generally recommend QE to use either a
> > > physically connected monitor or dummy EDID plug rather than a software
> > > workaround for EDID.
> > 
> Yes,It still can reproduce it without using a custom EDID.But I can't be
> sure that whether the physical monitor is found by driver.
> So if there is no log like "can't find X screen" in /var/log/Xorg.0.log ,can
> we be sure that driver find the physical monitor ? 

Please provide the Xorg log file of this configuration.  The monitor probing should be evident in the log unless the error occurs before that probing.  Additionally, the error is:

"Failed to allocate software rendering cache surface: out of memory"

Does increasing the VM memory size change anything?  We don't really know where this allocation draws from, but 4G is rather minimal.


> > Another experiment perhaps more related to OVMF vs SeaBIOS, does the problem
> > disappear if the option ",rombar=0" is added to the vfio-pci device?
> 
> No,the problem doesn't disappear and It seems no different.

Ok.

Some notes:

The host bridge apertures are different between SeaBIOS and OVMF:

SeaBIOS:
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
pci_bus 0000:00: root bus resource [mem 0x280000000-0xa7fffffff window]
pci_bus 0000:00: root bus resource [bus 00-ff]

64bit window: 32G
32bit window: 0.98G

OVMF:
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
pci_bus 0000:00: root bus resource [mem 0x90000000-0xfebfffff window]
pci_bus 0000:00: root bus resource [mem 0x800000000-0x97fffffff window]
pci_bus 0000:00: root bus resource [bus 00-ff]

64bit window: 6G
32bit window: 1.73G

SeaBIOS device mappings:
pci 0000:00:05.0: [10de:1c30] type 00 class 0x030000
pci 0000:00:05.0: reg 0x10: [mem 0xfc000000-0xfcffffff]
pci 0000:00:05.0: reg 0x14: [mem 0xe0000000-0xefffffff 64bit pref]
pci 0000:00:05.0: reg 0x1c: [mem 0xf0000000-0xf1ffffff 64bit pref]
pci 0000:00:05.0: reg 0x24: [io  0xc000-0xc07f]
pci 0000:00:05.0: reg 0x30: [mem 0xfd000000-0xfd07ffff pref]

Note that only the 32bit window is used.

OVMF:
pci 0000:00:05.0: [10de:1c30] type 00 class 0x030000
pci 0000:00:05.0: reg 0x10: [mem 0x98000000-0x98ffffff]
pci 0000:00:05.0: reg 0x14: [mem 0x800000000-0x80fffffff 64bit pref]
pci 0000:00:05.0: reg 0x1c: [mem 0x810000000-0x811ffffff 64bit pref]
pci 0000:00:05.0: reg 0x24: [io  0x6000-0x607f]
pci 0000:00:05.0: reg 0x30: [mem 0xfff80000-0xffffffff pref]

Note OVMF did use the 64bit window, but didn't program the option ROM.  Linux fixed it later:

pci 0000:00:05.0: can't claim BAR 6 [mem 0xfff80000-0xffffffff pref]: no compatible bridge window
pci 0000:00:05.0: BAR 6: assigned [mem 0x99080000-0x990fffff pref]

Comment 22 Laszlo Ersek 2018-09-16 08:49:46 UTC
(In reply to Alex Williamson from comment #21)

> "Failed to allocate software rendering cache surface: out of memory"
>
> Does increasing the VM memory size change anything?  We don't really know
> where this allocation draws from, but a 4G is rather minimal.

Good point! I assumed the memory sizes logged by "nvidia_drv.so" were
allocated from MMIO resources owned by the card.

However, that can't be the case. Consulting wikipedia:

- According to [1], the Quadro P2000 (GP106GL) has 5GB of on-board memory.
  That could never satisfy the 24GB allocation reported in comment 0.
  However, the card does work under SeaBIOS, with the same resolution,
  apparently. (Note we don't have a QEMU command line using SeaBIOS in this
  BZ thus far, so we can't be sure whether *that* VM was launched with only
  4GB of guest RAM as well, or with significantly more.)

- According to [2], my GTX750 can't have more than 4GB on-board memory. (I
  can't check my card or my VM configuration right now.) The 6GB logged in
  comment 12 clearly doesn't fit in that. Nonetheless, the GTX750 works with
  OVMF just fine for me. AFAIR, my VM has at least 8GB RAM though.

[1] https://en.wikipedia.org/wiki/Nvidia_Quadro#Quadro
[2] https://en.wikipedia.org/wiki/GeForce_700_series#GeForce_700_(7xx)_series

Comment 23 Laszlo Ersek 2018-09-16 14:35:20 UTC
I have to correct myself -- my VM in question only has 4GB of RAM, and no swap. I don't know where the 6GB seen in the X server log comes from (unless it's a memory overcommit trick of Linux).

Comment 24 liunana 2018-09-17 06:03:15 UTC
Created attachment 1483876 [details]
dmesg log

Comment 25 liunana 2018-09-17 06:03:49 UTC
Created attachment 1483877 [details]
Xorg.0.log

Comment 26 liunana 2018-09-17 06:06:13 UTC
(In reply to Alex Williamson from comment #21)
> (In reply to liunana from comment #20)
> > (In reply to Alex Williamson from comment #19)
> > > (In reply to Alex Williamson from comment #17)
> > > > Can this be reproduced without the step of using a custom EDID AND with a
> > > > physical monitor connected?  We generally recommend QE to use either a
> > > > physically connected monitor or dummy EDID plug rather than a software
> > > > workaround for EDID.
> > > 
> > Yes, it can still be reproduced without using a custom EDID. But I can't be
> > sure whether the physical monitor is found by the driver.
> > So if there is no log like "can't find X screen" in /var/log/Xorg.0.log, can
> > we be sure that the driver finds the physical monitor?
> 
> Please provide the Xorg log file of this configuration.  The monitor probing
> should be evident in the log unless the error occurs before that probing. 
> Additionally, the error is:
> 
This time I booted the guest with -m 8G and -device vfio-pci,host=04:00.0,id=GPU1,addr=05.0,rombar=0. It seems to make no difference...

The Xorg.0.log and dmesg logs have been attached; please check them.

Comment 27 Laszlo Ersek 2018-09-18 09:59:51 UTC
Please capture the OVMF debug log as well (in the failing case) and attach it to the BZ. You can find the instructions in "/usr/share/doc/OVMF/README". Thanks.

Comment 28 liunana 2018-09-18 11:57:10 UTC
Created attachment 1484340 [details]
ovmf debug log

Comment 29 liunana 2018-09-18 12:00:05 UTC
(In reply to Laszlo Ersek from comment #27)
> Please capture the OVMF debug log as well (in the failing case) and attach
> it to the BZ. You can find the instructions in "/usr/share/doc/OVMF/README".
> Thanks.

:-)

Please check attachment of "ovmf debug log".

Comment 30 Laszlo Ersek 2018-09-18 23:39:42 UTC
(click "unwrap comments" near the top)

Thanks!

Comment 28 looks inconsistent with comment 9.

From comment 9:

> [    0.571313] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
> ...
> [    0.574518] pci_bus 0000:00: root bus resource [mem 0x800000000-0x97fffffff window]

As pointed out by Alex in comment 21, this is a 6GB aperture.

From comment 28 however:

> PciBus: Resource Map for Root Bridge PciRoot(0x0)
> ...
> Type =  Mem64; Base = 0x800000000;      Length = 0x12200000;    Alignment = 0xFFFFFFF
>    Base = 0x800000000;  Length = 0x10000000;    Alignment = 0xFFFFFFF;  Owner = PCI [00|05|00:14]; Type = PMem64
>    Base = 0x810000000;  Length = 0x2000000;     Alignment = 0x1FFFFFF;  Owner = PCI [00|05|00:1C]; Type = PMem64
>    Base = 0x812000000;  Length = 0x100000;      Alignment = 0xFFFFF;    Owner = PPB [00|01|00:**]; Type = PMem64
>    Base = 0x812100000;  Length = 0x100000;      Alignment = 0xFFFFF;    Owner = PPB [00|03|00:**]; Type = PMem64

This is just 290MB.
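(The 290MB figure is just the sum of the four resource-map entries above:)

```python
# The Mem64 resource map above totals 0x12200000 bytes = 290 MiB:
# 256 MiB (BAR 1) + 32 MiB (BAR 2) + two 1 MiB bridge reservations.
MiB = 1 << 20
total = 0x10000000 + 0x2000000 + 0x100000 + 0x100000
print(hex(total), total // MiB)  # 0x12200000, 290
```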

I have no idea why QEMU, through the ACPI DSDT that it generates, tells the
guest kernel that 6GB is needed. It's not.

Specifically the assigned device (at 00:05.0) has the following BARs:

> PciBus: Discovered PCI @ [00|05|00]
>    BAR[0]: Type =  Mem32; Alignment = 0xFFFFFF; Length = 0x1000000;     Offset = 0x10
>    BAR[1]: Type = PMem64; Alignment = 0xFFFFFFF;        Length = 0x10000000;    Offset = 0x14
>    BAR[2]: Type = PMem64; Alignment = 0x1FFFFFF;        Length = 0x2000000;     Offset = 0x1C
>    BAR[3]: Type =   Io32; Alignment = 0x7F;     Length = 0x80;  Offset = 0x24

16MB (32-bit non-prefetchable), 256MB (64-bit prefetchable), and 32MB
(64-bit prefetchable), as I wrote in comment 11.

... Hm. QEMU commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit
PCI hole", 2017-11-16) explains it. (Part of v2.11.0.)

On Q35, the low RAM split is at 2GB, so when the guest is started with 4GB
total RAM, the RAM ends in GPA space at 6GB. This is what
pc_pci_hole64_start() returns, most likely.

Then q35_host_get_pci_hole64_end() sets "hole64_end" to 38GB, from
"hole64_start" being 6GB (see above) and "mch.pci_hole64_size" defaulting to
32GB (see Q35_PCI_HOST_HOLE64_SIZE_DEFAULT).

Because the 38GB end address is larger than the firmware-programmed one:
0x8_1220_0000 (=32GB+290MB), q35_host_get_pci_hole64_end() returns the
former. This is what we see in the guest kernel dmesg (0x97fffffff=38GB-1).
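The hole64 arithmetic in the last three paragraphs can be sketched as follows (a simplified model of the behavior described, not the actual QEMU code; `pc_pci_hole64_start` here is a hypothetical stand-in):

```python
GiB = 1 << 30

# Simplified model: Q35 splits low RAM at 2 GiB, and RAM above the split
# is relocated to start at 4 GiB, so the 64-bit hole starts where that
# high RAM ends.
def pc_pci_hole64_start(ram_gib, low_split_gib=2):
    high_ram_gib = max(ram_gib - low_split_gib, 0)
    return (4 + high_ram_gib) * GiB

hole64_start = pc_pci_hole64_start(4)      # 6 GiB for a 4 GiB guest
hole64_end = hole64_start + 32 * GiB       # default pci_hole64_size: 32 GiB
fw_end = 0x812200000                       # firmware-programmed end (32 GiB + 290 MiB)
value = max(hole64_end, fw_end)            # the larger of the two wins
print(hex(value - 1))                      # 0x97fffffff, matching the guest dmesg
```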

Two remarks:

* q35_host_get_pci_hole64_end() doesn't precisely live up to its leading
  comment. Because, currently the extension is not relative to
  q35_host_get_pci_hole64_start(), but to pc_pci_hole64_start(). In other
  words, the extension ignores the base address programmed by the firmware.
  OVMF sets that base address to 32GB, not to 6GB, hence the extension to
  the end address 38GB only provides 6GB extra, not the intended 32GB. IMO,
  this should be fixed as follows:

> diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> index 02f95765880a..ea413f40955c 100644
> --- a/hw/pci-host/q35.c
> +++ b/hw/pci-host/q35.c
> @@ -138,7 +138,7 @@ static void q35_host_get_pci_hole64_end(Object *obj, Visitor *v,
>  {
>      PCIHostState *h = PCI_HOST_BRIDGE(obj);
>      Q35PCIHost *s = Q35_HOST_DEVICE(obj);
> -    uint64_t hole64_start = pc_pci_hole64_start();
> +    uint64_t hole64_start = q35_host_get_pci_hole64_start();
>      Range w64;
>      uint64_t value, hole64_end;
>

* I guess I understand now why the guest kernel dmesg says what it says
  (namely the 0x97fffffff inclusive end address) -- but I still have no idea
  why that is a problem for the NVIDIA guest driver. :/

NaNa Liu, can you please add the following option, and retry (with the
original 4GB guest RAM size):

  -global q35-pcihost.pci-hole64-size=58G

This option should mitigate the above QEMU bug (i.e., mitigate the above
behavior that I claim is a bug), and it should produce the following in the
guest dmesg:

> pci_bus 0000:00: root bus resource [mem 0x800000000-0xfffffffff window]

And then the real question is whether that will matter to the NVIDIA driver.
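(A quick check of why 58G yields exactly that end address:)

```python
# With pci-hole64-size=58G and a hole64 start of 6 GiB (4 GiB guest),
# the window ends at 6 + 58 = 64 GiB, i.e. inclusive end 0xfffffffff.
GiB = 1 << 30
hole64_start = 6 * GiB
hole64_end = hole64_start + 58 * GiB
print(hex(hole64_end - 1))  # 0xfffffffff
```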

Comment 31 Laszlo Ersek 2018-09-19 00:00:11 UTC
Side comment:

- I totally can't see why, or even how, the nvidia guest driver would
  allocate unused PCI MMIO space in the aperture of the root bridge. That
  space isn't decoded by any device at all.

- Fixing QEMU commit 9fa99d2519cb now, as I suggest in comment 30, might not
  be worth the effort. Namely, its goal is reserving 64-bit MMIO for hotplug
  purposes.

  - But the root bridge (pcie.0) doesn't support hot-plug anyway (see
    "docs/pcie.txt"), so enlarging its aperture *directly* doesn't make much
    sense.

  - PCIe hotplug into root ports and downstream ports does work, but for
    those we have the dedicated resource reservation capability now, and
    then the firmware will include those reservations in the root bridge's
    programmed aperture too.

  - The same applies to the PCIe-to-PCI bridge.

  - And the same should apply even to PCI-PCI bridges soon (see
    "[Qemu-devel] [PULL 4/7] hw/pci: add PCI resource reserve capability to
    legacy PCI bridge", msgid <20180907215109.146867-5-mst>).

Comment 32 Alex Williamson 2018-09-20 23:04:48 UTC
I was finally able to reproduce this on a Quadro K4000 using the QEMU commandline provided in comment 0.  Note that the SeaBIOS command line was never provided, but I can now spot the following difference between them:

SeaBIOS:
smpboot: CPU0: Intel Xeon E312xx (Sandy Bridge) (fam: 06, model: 2a, stepping: 01)

OVMF:
smpboot: CPU0: Intel QEMU Virtual CPU version 2.5+ (fam: 06, model: 0d, stepping: 03)

The command line in comment 0 does not specify a CPU type, defaulting to qemu64.  If I modify the command line that reproduces to include '-cpu host', Xorg starts without issue.

NaNa, please verify that this works in your configuration as well.

I suspect this should be closed NOTABUG unless someone wants to go to bat with NVIDIA for a business case why they should debug and add support for a virtual CPU model that customers likely do not use.  Or alternatively I suppose if someone wants to spend too much time iterating through CPU flags to try to determine which one(s) NVIDIA relies on.

Comment 33 liunana 2018-09-21 05:34:19 UTC
(In reply to Alex Williamson from comment #32)
> I was finally able to reproduce this on a Quadro K4000 using the QEMU
> commandline provided in comment 0.  Note that the SeaBIOS command line was
> never provided, but I can now spot the following difference between them:
> 
> SeaBIOS:
> smpboot: CPU0: Intel Xeon E312xx (Sandy Bridge) (fam: 06, model: 2a,
> stepping: 01)
> 
> OVMF:
> smpboot: CPU0: Intel QEMU Virtual CPU version 2.5+ (fam: 06, model: 0d,
> stepping: 03)
> 
> The command line in comment 0 does not specify a CPU type, defaulting to
> qemu64.  If I modify the command line that reproduces to include '-cpu
> host', Xorg starts without issue.
> 
> NaNa, please verify that this works in your configuration as well.
> 


Sorry, I was away at a training program for about two days.

Yes! It works well with EDID, thank you very much!
But the driver still can't find the physical monitor even though there is a monitor connected to the NVIDIA card. I need to figure that out.
So is NVIDIA GPU pass-through related to the CPU type?

Comment 34 Alex Williamson 2018-09-21 17:31:28 UTC
Curiosity got the better of me: the nvidia driver depends on PAT support on the processor, which at first glance seems an entirely reasonable requirement in 2018.  The qemu64 CPU model can be made to work using:

  -cpu qemu64,model=15,+pat

We have to bump the model or else Linux will disable PAT support with a note about errata in Pentium III and Core Duo/Solo CPUs.  PAT is supported on Pentium 4 and Core 2 or later processors.
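(The errata check described above can be sketched roughly as follows; this is a simplified, non-verbatim model of the kernel's old PAT validation logic, not the actual code:)

```python
# Simplified sketch of the Linux PAT errata check described above:
# on Intel, PAT is trusted only on Pentium 4 (family 0xF) and on
# family-6 parts from Core 2 onward (model >= 15); earlier family-6
# CPUs (Pentium III, Core Duo/Solo) have PAT errata, so PAT is disabled.
def pat_usable(vendor, family, model):
    if vendor == "GenuineIntel":
        return family == 0xF or (family == 6 and model >= 15)
    return True  # other vendors elided in this sketch

print(pat_usable("GenuineIntel", 6, 0x0d))  # default qemu64: False
print(pat_usable("GenuineIntel", 6, 15))    # qemu64,model=15: True
```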

Closing this, qemu64 is not a suitable CPU model for the NVIDIA driver.