Bug 1661147
| Summary: | guest shows black screen after uninstalling qxl WDDM-DOD driver on UEFI (ovmf) platform --Fast Train | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | liunana <nanliu> |
| Component: | spice-qxl-wddm-dod | Assignee: | ybendito |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | SPICE QE bug list <spice-qe-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | --- | CC: | ailan, chayang, dblechte, gzaidman, jinzhao, juzhang, lersek, lijin, michal.skrivanek, nanliu, rbalakri, tpelka, uril, virt-maint, vrozenfe, wyu, xiagao, ybendito, zhguo |
| Target Milestone: | rc | | |
| Target Release: | 8.3 | | |
| Hardware: | x86_64 | | |
| OS: | Windows | | |
| Whiteboard: | | | |
| Fixed In Version: | spice-qxl-wddm-dod-0.19-0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-08-04 05:03:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Spice | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1713700, 1851845 | | |
Attachments:
Description
liunana
2018-12-20 07:50:10 UTC
(In reply to liunana from comment #0)
> Description of problem:
> guest shows a black screen after uninstalling the qxl driver on a Windows 2019 guest and the latest
> Windows 10 guest; an older (1803) Windows 10 guest works well

1803 appears to refer to a Windows update, so my best guess is that this is something to be fixed in the drivers. Changing component to virtio-win for further investigation.

Hi liunana,

Could you provide the full qemu command line?

BTW, is this issue still reproducible without any virtio devices?

Thanks.

(In reply to lijin from comment #3)
> Hi liunana,
>
> Could you provide full qemu cli?

Sure, full qemu command line:

/usr/libexec/qemu-kvm -name win10 -M pc -enable-kvm \
 -cpu SandyBridge \
 -monitor stdio \
 -nodefaults -rtc base=utc \
 -m 4G \
 -smp 2,sockets=4,cores=1,threads=1,maxcpus=4 \
 -global isa-debugcon.iobase=0x402 \
 -object secret,id=sec0,data=redhat \
 -drive file=/home/2-w10-pc/luks.qcow2,encrypt.format=luks,encrypt.key-secret=sec0,if=none,id=drive_image1,snapshot=off,aio=threads,cache=none,format=qcow2 \
 -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1 \
 -device virtio-net-pci,mac=70:5a:0f:38:cd:d3,id=idhRa7sf,vectors=4,netdev=idNIlYmb \
 -netdev tap,id=idNIlYmb,vhost=on \
 -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/2-w10-pc/virtio-win-prewhql-0.1-163.iso \
 -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
 -device ich9-usb-uhci6 \
 -device usb-tablet,id=mouse \
 -device qxl-vga,id=vga1 \
 -spice port=5900,disable-ticketing \
 -device virtio-serial-pci,id=virtio-serial1 \
 -chardev spicevmc,id=charchannel0,name=vdagent \
 -device virtserialport,bus=virtio-serial1.0,nr=3,chardev=charchannel0,id=channel0,name=com.redhat.spice.0

> BTW, is this issue still reproducible without any virtio devices?

It can't be reproduced without any virtio devices; the guest behaves normally after uninstalling the qxl driver.

(In reply to lijin from comment #3)
> Hi liunana,
>
> Could you provide full qemu cli?

The full qemu command line without any virtio devices is as follows:

/usr/libexec/qemu-kvm -name win2019 -M pc -enable-kvm \
 -cpu SandyBridge \
 -monitor stdio \
 -nodefaults -rtc base=utc \
 -m 4G \
 -smp 4,sockets=4,cores=1,threads=1 \
 -global isa-debugcon.iobase=0x402 \
 -object secret,id=sec0,data=redhat \
 -drive file=/home/6-w2019-ide/luks.qcow2,encrypt.format=luks,encrypt.key-secret=sec0,if=none,id=drive_image1,snapshot=off,aio=threads,cache=none,format=qcow2 \
 -device ide-hd,id=image1,drive=drive_image1,bootindex=1 \
 -device e1000e,mac=70:5a:0f:38:cd:d6,id=idhRa7sf,netdev=idNIlYmb \
 -netdev tap,id=idNIlYmb,vhost=on \
 -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/iso/ISO/Win2019/en_windows_server_2019_x64_dvd_4cb967d8.iso \
 -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
 -device ich9-usb-uhci6 \
 -device usb-tablet,id=mouse \
 -device qxl-vga,id=vga1 \
 -spice port=5900,disable-ticketing,ipv6

wyu,

Could you reproduce this issue and check if it's a regression?

Thanks.

(In reply to liunana from comment #4)
> (In reply to lijin from comment #3)
> [...]
> sure, full qemu commands:
>
> /usr/libexec/qemu-kvm -name win10 -M pc -enable-kvm \
> [...]

Using the same command line as above, I cannot reproduce this issue. Is there any other special configuration?

qemu-kvm-3.1.0-1.module+el8+2538+1516be75.x86_64
kernel-4.18.0-57.el8.x86_64
iso version: en_windows_server_2019_x64_dvd_4cb967d8.iso
virtio-win: virtio-win-prewhql-0.1-163.iso
qxl driver: spice-qxl-wddm-dod-0.18-1
I can reproduce this issue with and without virtio devices when booting with OVMF, but cannot reproduce it with SeaBIOS.

Boot command line:

/usr/libexec/qemu-kvm -name ovmf+win10 -M q35 -enable-kvm \
 -cpu SandyBridge \
 -monitor stdio \
 -nodefaults -rtc base=utc \
 -m 4G \
 -smp 2,sockets=4,cores=1,threads=1,maxcpus=4 \
 -drive file=/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd,if=pflash,format=raw,readonly=on,unit=0 \
 -drive file=OVMF_VARS.fd,if=pflash,format=raw,unit=1,readonly=off \
 -debugcon file:/home/1-win10/ovmf.log \
 -global isa-debugcon.iobase=0x402 \
 -drive file=/home/kvm_autotest_root/images/win10-64-virtio.qcow2,if=none,id=drive-ide0-0-0,format=qcow2,cache=none \
 -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
 -device ich9-usb-uhci6 \
 -device usb-tablet,id=mouse \
 -vga qxl \
 -spice port=5913,disable-ticketing

qemu-kvm-3.1.0-1.module+el8+2538+1516be75.x86_64
kernel-4.18.0-57.el8.x86_64
iso: win10/win2016
virtio-win: virtio-win-prewhql-0.1-163.iso
qxl driver: spice-qxl-wddm-dod-0.18-1
edk2-ovmf-20180508gitee3198e672e2-8.el8.noarch

(In reply to Yu Wang from comment #7)
> (In reply to liunana from comment #4)
> [...]
> The same command line as above, but I cannot reproduce this issue.
> Any other special configuration?
> [...]

I can't reproduce it with SeaBIOS either; it seems it can be reproduced easily with OVMF + qxl.

According to comment 8 and comment 9, it does not seem related to the virtio devices (virtio-win); it is related to OVMF, qemu-kvm, or qxl. Could you help check and change this component to the right one?
Thanks,
Yu Wang

It is plausible that the removal of the native QXL driver triggers a symptom that is only visible when using OVMF. Without a native graphics driver, Windows uses different paths to display video: on UEFI, it uses direct framebuffer access, inherited from the firmware; with BIOS, it uses the VGA BIOS services.

Some ideas for narrowing down the symptom:

(1) As far as I remember, the native QXL driver still (?) doesn't satisfy the new driver signing requirements (with Secure Boot, SB, enabled); therefore, when SB is enabled, Windows Server 2016 (at the least) rejects the native QXL driver and falls back to the basic Microsoft display driver, even though the QXL driver is installed just fine.

If this is still the case, then it happens to offer another way to test the QXL->basic transition, but *without* uninstalling the QXL driver. That is, install the QXL driver with SB disabled, then, rather than uninstalling the QXL driver, enable SB. Windows should (again, to my latest knowledge) revert to the basic driver. I wonder if the "black screen" issue pops up in that case too. If it doesn't (i.e. the basic / builtin driver works in this case), then it might suggest the symptom is somehow tied to the QXL driver uninstall path.

(2) In the "black screen" state, we should check whether the guest is otherwise responsive:

(2a) I always install Cygwin in my Windows guests (and configure sshd too), for easier file transfer from the host. If we can ping and/or log into the guest while the screen is dead, we could collect some information.

(2b) A more flexible variant of the above would be enabling remote desktop access while the display works fine, and then triggering the symptom. If RDP still works, we could go into the Event Log and/or Device Manager, and see if there is an error code associated with the basic display driver.

Upon reading through the bug again, some contradictions appear.

* Comment 4 and comment 5 quote "-M pc", which excludes OVMF as a factor. However, comment 8 and comment 9 state the issue is "q35 + OVMF" only.

  Were the initial comments 4 and 5 incorrect?

* Generally the BZ talks about v1809-based Windows builds, such as Windows Server 2019 and Windows 10. However, comment 8 states "win2016" as well.

  Can we please clarify whether "win2016" refers to a Windows build that *precedes* v1809?

Thanks!

I've installed a new domain from "en_windows_10_enterprise_ltsc_2019_x64_dvd_be3c8ffb.iso" ("Windows 10 Enterprise LTSC 2019 (x64) - DVD (English)"). I've confirmed it is v1809-based, via Settings | System | About.

I've installed Cygwin, and ssh access works. I've also enabled rdesktop connections, with the help of the following two articles:

- https://docs.microsoft.com/en-us/windows-server/remote/remote-desktop-services/clients/remote-desktop-allow-access
- https://blog.syskit.com/credssp-required-by-server-solutions

On the host side, I use the following (note: this is still my RHEL-7.5.z Workstation based laptop, with select virt components updated from Brew to the latest RHEL7 development builds):

- kernel: 3.10.0-862.15.1.el7.x86_64
- qemu-kvm-rhev-2.12.0-20.el7.x86_64
- libvirt-daemon-4.5.0-10.el7.x86_64
- OVMF-20180508-4.gitee3198e672e2.el7.noarch

Regarding virtio-win, I expose the ISO from the latest RHEL8 build to the guest, virtio-win-1.9.6-6.el8. And, because virtio-win does not contain the QXL WDDM driver (see bug 1218784), I got spice-qxl-wddm-dod-0.18-1 manually from Brew, and installed the driver successfully from that zip file.

(This is admittedly a somewhat "Frankenstein" setup; if it fails to reproduce the symptom in the first place, I'll attempt to tweak it.)

The steps I plan to do next were described in comment 12.

(In reply to Laszlo Ersek from comment #12)
> (1) As far as I remember, the native QXL driver still (?) doesn't satisfy
> the new driver signing requirements (with SB enabled); therefore, when SB
> is enabled, Windows Server 2016 (at the least) rejects the native QXL
> driver, and falls back to the basic Microsoft display driver, even while
> the QXL driver is installed just fine.
>
> If this is still the case, then it happens to offer another way to test
> the QXL->basic transition, but *without* uninstalling the QXL driver.

This idea doesn't work; after enabling SB in the domain described in comment 16, and rebooting the guest, Confirm-SecureBootUEFI reports True in PowerShell, however the QXL DOD driver *continues* to work fine. (From a different perspective, this is actually what we want, but for the present BZ, it means that idea (1) can't help.)

(In reply to Laszlo Ersek from comment #16)
> after enabling SB in the domain described in comment 16

Typo, I meant comment 15.

I'm confirming the reported behavior, using the domain from comment 15. In order to "uninstall" the QXL DOD, I used Device Manager | Roll Back Driver. The rollback is successful, but the display goes dark immediately. Using ssh (comment 12 / (2a)) and rdesktop (comment 12 / (2b)), however, everything works fine. I'll try to collect more information in this state.

Seen with Device Manager through rdesktop, the Microsoft Display Adapter is now reporting an error:

- General | Device status:

  > Windows has stopped this device because it has reported problems. (Code
  > 43)

- Events:

  > Device not started (BasicDisplay)
  > Device PCI\VEN_1B36&DEV_0100&SUBSYS_11001AF4&REV_04\3&2411e6fe&0&08 had a
  > problem starting.
  >
  > Driver Name: display.inf
  > Class Guid: {4d36e968-e325-11ce-bfc1-08002be10318}
  > Service: BasicDisplay
  > Lower Filters:
  > Upper Filters:
  > Problem: 0x15
  > Problem Status: 0x0

- Resources:

  > This device isn't using any resources because it has a problem.

...

I've saved the full event log related to the device, using Events | View All Events, to a file. I'll attach it soon.
Created attachment 1518143
windows event log for comment 20

A web search suggests that "Problem: 0x15" stands for ERROR_NOT_READY ("The device is not ready"), and that it is associated with resource conflicts.

A resource conflict would be consistent with this behavior's UEFI-specific nature. When the OS inherits the framebuffer address from the firmware (i.e., when the OS boot loader carries it over from the boot-time-only Graphics Output Protocol), that address generally points into an MMIO BAR of a PCI(E) device. Now, if the OS starts driving the device with a native driver (such as the QXL DOD), possibly even re-assigning PCI resources first, then the originally inherited framebuffer address may entirely lose its meaning. In other words, there would be no returning to the UEFI framebuffer at OS runtime, save a reboot.

Indeed: I rebooted the domain (following comment 20) still via "rdesktop", and after the reboot, the Microsoft Basic Display Adapter resumed functioning properly.

Honestly, I would actually call the "black screen" in this scenario *expected* behavior. I don't know how it would be possible, per design, to prevent it from happening right after the native GPU driver is removed but before the OS is rebooted.

I wonder why this problem does *not* occur with Windows builds that precede v1809.

In my opinion, the way to solve it is the following: when the video driver is rolled back, Windows should say that the change will take effect only after the next boot, and it should offer to reboot at once. (Note: some other GPU drivers do just that when rolled back: for example, when NVIDIA's GTX driver is rolled back, the rollback only takes effect after the next reboot, and Windows warns the user about this.)

Can we change the QXL DOD driver so that Windows is aware of this requirement? I.e. that no in-place rollback is supported?

Aha! Now we're getting somewhere.
See the following WDDM test case:

  WDDM PnPStop Test - Update and Rollback (Windows 10 HLK)
  https://msdn.microsoft.com/en-us/office/dn942114(v=vs.90)

The documentation says,

> This automated test updates the driver to the same version, and then rolls
> back to the version that is currently installed. By doing so, it verifies
> that the driver does not hold any resources, and allows for clean driver
> updates and rollbacks without restarting the system.

Note: "without restarting the system".

Is it possible that the QXL DOD driver does not release all resources (e.g. MMIO PCI BARs) when it is rolled back, despite advertising itself as such?

...

The following commit, between v0.17 and v0.18 of the QXL DOD <https://github.com/daynix/qxl-wddm-dod>, looks somewhat relevant:

  commit 9b55ed704196a5747a2beede179d9a6f82ccf20e
  Author: yuri.benditovich <yuri.benditovich>
  Date:   Thu May 25 16:06:37 2017 +0300

      qxl-wddm-dod: Fix unmapping of physical memory

      https://bugzilla.redhat.com/show_bug.cgi?id=1454866
      Due to wrong addresses passed to class driver, it never does
      unmapping of physical memory, causing a leak of virtual address
      range. On x86 systems the device fails to start due to failure
      to map physical memory range after 10-50 cycles of disable-enable.

      Signed-off-by: Yuri Benditovich <yuri.benditovich>
      Acked-by: Frediano Ziglio <fziglio>

(Note: I don't claim that the above commit is incorrect, just that it touches a plausibly related area.)

I'm CC'ing Yuri, and changing the Component field to "spice-qxl-wddm-dod", for further analysis.

Thanks.

What I currently see on my setups:
1. Driver uninstallation consistently causes a black screen on OVMF with 2019, and also on much older Win10.
2. I do not reproduce any problem with 2019 on '-M pc'; the uninstall works exactly as on Win10-1803 and earlier (no black screen).

It looks like we have 2 different problems here. First I will try to dig into the problem that I was able to reproduce (i.e. with OVMF); probably it is a good idea to create

Regarding the originally reported problem (with -M pc): the other virtio devices have no relation to the QXL and should not affect it, except virtio-serial, which is used by the spice agent. So, can you please try it as in comment #4, but stop the spice agent before uninstalling the driver (from an administrator command line: 'sc stop vdservice', then verify that it is stopped: 'sc query vdservice')? Does the driver uninstall still cause a black screen?

(In reply to Laszlo Ersek from comment #14)
> Upon reading through the bug again, some contradictions appear.
>
> * Comment 4 and comment 5 quote "-M pc", which excludes OVMF as a factor.
> However, comment 8 and comment 9 state the issue is "q35 + OVMF" only.
>
> Were the initial comments 4 and 5 incorrect?
>
> * Generally the BZ talks about v1809-based Windows builds, such as Windows
> Server 2019 and Windows 10. However, comment 8 states "win2016" as well.

Sorry for my mistake, it is win10 and win2019. I wrote the wrong guest.

> Can we please clarify whether "win2016" refers to a Windows build that
> *precedes* v1809?
>
> Thanks!

(In reply to ybendito from comment #26)
> It looks like we have 2 different problems here.
> [...]
> So, can you please try it as in comment #4, but before driver uninstallation
> stop the spice agent (from administrator command line: 'sc stop vdservice',
> then verify that it is stopped: 'sc query vdservice').
> Whether the driver uninstall still causes black screen?
Hi, I executed the command 'sc stop vdservice' inside a Windows 10 (1809) guest, and got this output:

[SC] OpenService FAILED 1060:

The specified service does not exist as an installed service.

It seems this service isn't installed by default.

Besides, I'm sorry, but I can't reproduce this bug with "-M pc" anymore; maybe I did some operations incorrectly before, so that the screen showed black. But it's easy to reproduce this bug with OVMF.

(In reply to ybendito from comment #26)
> What I currently see on my setups:
> 1.Driver uninstallation consistently causes black screen on ovmf with 2019
> and also on much older Win10.
> 2.I do not reproduce any problem with 2019 on '-m pc', the uninstall works
> exactly as on Win10-1803 and earlier (no black screen)
>
> It looks like we have 2 different problems here.
> [...]

Please note (everyone) that symptoms #1 and #2 are not comparable. The OVMF packages that we provide in RHEL7 and RHEL8 *require* the use of the Q35 machine type. So, with "-M pc" (case #2), two factors change at once, relative to case #1: machine type *and* guest firmware.

Therefore, for full coverage, three cases exist:
- OVMF (necessarily with q35)
- SeaBIOS with q35
- SeaBIOS with pc

I do agree that investigating the reliably reproducible symptom (ie, OVMF/q35) is a good choice; I just wanted to point out that testing with "pc" necessarily removes OVMF from the picture, and that doesn't help narrowing down the root cause (without explicitly testing SeaBIOS/q35).

Thanks,
Laszlo

(In reply to Laszlo Ersek from comment #29)
> [...]
> Therefore, for full coverage, three cases exist:
> - OVMF (necessarily with q35)
> - SeaBIOS with q35
> - SeaBIOS with pc

Thanks for your reminder; I agree with you and always follow these rules strictly.

> I do agree that investigating the reliably reproducible symptom (ie,
> OVMF/q35) is a good choice; I just wanted to point out that testing with
> "pc" necessarily removes OVMF from the picture, and that doesn't help
> narrowing down the root cause (without explicitly testing SeaBIOS/q35).

Yes, I do use the qemu option "-M q35" when booting the guest with the OVMF packages.

I've checked the driver with q35 without OVMF, and there is no problem. So the problem exists only with OVMF; it is not limited to 2019 and the latest Win10, and it exists on all Windows revisions as far as I can see. So I'm changing the name of the BZ to reflect the exact problem (this will also help to target it properly).

(In reply to ybendito from comment #32)
> Fix posted
> https://gitlab.freedesktop.org/spice/win32/qxl-wddm-dod/commit/61193f9cab26484036c61146cf8a5aa7c088f225

Amazing! :) Thank you.

Yuri, a comment for the ExGetFirmwareEnvironmentVariable() call in the IsUefiMode() function:

It seems that you intend to tell apart a "variable not found" error from a "function not implemented" error.
The comment says,

  // on UEFI system the status is STATUS_VARIABLE_NOT_FOUND

This is actually not possible to guarantee, *unless* you own that variable. (If you own the variable, then you can decide whether you expect it to exist or not.)

"Ownership" of a UEFI variable is tracked by vendor (aka "namespace") GUID. A variable name counts as unique only within that namespace. See the VendorGuid param in <https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/nf-wdm-exgetfirmwareenvironmentvariable>.

The important question is how vendor GUIDs are "handed out". Simple: run "uuidgen", and whatever you get is now your vendor GUID (well, one of the many you can generate for yourself). Also importantly, the zero GUID, or any other GUID that you generate non-(pseudo-)randomly, carries the risk of conflict, so those should never be used.

Long story short: I suggest replacing

  GUID guid = {};

with an actual GUID that you generate for yourself with "uuidgen".

(Note: I realize that the patch currently implements *exactly* what the "Remarks" section recommends, at <https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/nf-wdm-exgetfirmwareenvironmentvariable#remarks>. Unfortunately, that recommendation is wrong. It should not suggest

  a dummy GUID such as "{00000000-0000-0000-0000-000000000000}"

instead, it should suggest

  a dummy GUID that you generate for yourself, following best practices,
  with sufficient randomness, such as the "uuidgen" utility

I've now filed <https://github.com/MicrosoftDocs/windows-driver-docs-ddi/issues/354> about this documentation issue.

Thanks.)

(In reply to Laszlo Ersek from comment #34)
> Yuri, a comment for the ExGetFirmwareEnvironmentVariable() call in the
> IsUefiMode() function:
> [...]
> Long story short: I suggest replacing
>
>   GUID guid = {};
>
> with an actual GUID that you generate for yourself with "uuidgen".
> [...]

IIUC, you describe a possible scenario where the UEFI finds {NULL_GUID + variable}. Even in this case, the returned value might be 'success' or some kind of error (if the variable is larger than the provided buffer); the return code will still be different from STATUS_NOT_IMPLEMENTED (which in the code designates a non-UEFI system), so we still recognize the UEFI correctly. I agree that it is a good idea to provide a unique GUID (and probably also a unique variable name, like "{the same GUID}"); then we will surely receive STATUS_VARIABLE_NOT_FOUND.

(In reply to ybendito from comment #35)
> IIUC, you describe possible scenario when the UEFI finds {NULL_GUID +
> variable}.

Correct.

> Even in this case the returned value might be 'success' or some kind of error
> if the variable is larger than provided buffer; the return code will still be
> different from STATUS_NOT_IMPLEMENTED (which in the code designates non-UEFI
> system), so we still recognize the UEFI correctly.

Also correct :) It's just general good practice to keep GUIDs actually unique.

> I agree that this is good idea to provide unique GUID (and probably also
> unique variable name, like "{the same GUID}", then we for sure shall receive
> STATUS_VARIABLE_NOT_FOUND.

The variable name can be anything we like, as long as the GUID is unique. If the GUID was generated with enough randomness, then the variable name cannot diminish that, and also need not make it more random.

Thank you!

spice-qxl-wddm-dod-0.19-0
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=881523

Fixed in spice-qxl-wddm-dod-0.19-0.

We are hitting this with the latest virtio-win installer and qxldod 19.2 on SeaBIOS.

(In reply to Gal Zaidman from comment #43)
> We are hitting this on virtio-win latest installer with qxldod 19.2 on
> seabios

Definitely not exactly this. This BZ is dedicated to UEFI machines only, and the problem happens due to the driver. If you hit this on SeaBIOS, please provide more info about the exact host and qemu.
Similar behavior (due to a different component) was seen several years ago on SeaBIOS, and it was caused by a qemu bug.

(In reply to ybendito from comment #44)
> (In reply to Gal Zaidman from comment #43)
> > We are hitting this on virtio-win latest installer with qxldod 19.2 on
> > seabios
>
> Definitely not exactly this. This BZ is dedicated to UEFI machine only

Do you want to open a separate one for SeaBIOS?

> and the problem happens due the driver.

It can be due to the driver. I hit it when I uninstalled the driver with its MSI. I installed the virtio-win drivers/agents (which triggers the qxldod MSI installer) and tried uninstalling just the driver with the MSI. I got a black screen, and only after rebooting the machine could I log into Device Manager and see that the display driver is recognized as "microsoft generic display adapter".

> If you hit this on seabios please provide more info about the exact host and
> qemu.

You can see the bug that we opened: https://bugzilla.redhat.com/show_bug.cgi?id=1851845

Adding xiagao: can you provide more information on the host and qemu?

> Similar behavior (due to different component) several years ago on seabios
> and it was caused by qemu bug.

Hit the same issue on win10/win2019 with SeaBIOS; refer to this bug:
Bug 1851845 - [virtio-win-installer] Hit black screen on win2019/win10 after uninstalling qxl WDDM-DOD driver with seabios

Test version:
(host)
kernel-4.18.0-214.el8.x86_64
qemu-img-5.0.0-0.module+el8.3.0+6620+5d5e1420.x86_64
seabios-bin-1.13.0-1.module+el8.3.0+6124+819ee737.noarch
(guest)
os: win2019, win10 (2004)
virtio-win: virtio-win-1.9.12-1.iso
qxl driver: spice-qxl-wddm-dod-0.19-2

Steps:
1. Boot a Windows 2019 guest or a latest-version (2004) Windows 10 guest.
2. Install the qxl driver.
3. Uninstall the qxl driver.

After step 3, the guest shows a black screen.

Note that all the issues with SeaBIOS are not related to this BZ.
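The resource-conflict hypothesis raised after attachment 1518143 (the framebuffer address inherited from the firmware's Graphics Output Protocol points into an MMIO BAR, and may lose its meaning once the device's PCI resources are re-assigned) can be illustrated with a toy containment check. This is a minimal sketch under stated assumptions: the `sketch_bar` type and `sketch_fb_inside_bar()` are hypothetical names, and real BAR placement is decided by the firmware and the OS PCI bus driver, not by code like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one PCI memory BAR: base address and size in bytes. */
typedef struct {
    uint64_t base;
    uint64_t size;
} sketch_bar;

/* True if the framebuffer range [fb_base, fb_base + fb_size) lies entirely
 * inside the BAR. The check never computes fb_base + fb_size, so it cannot
 * overflow near the top of the 64-bit address space. */
bool sketch_fb_inside_bar(uint64_t fb_base, uint64_t fb_size,
                          const sketch_bar *bar)
{
    if (fb_size == 0 || fb_base < bar->base)
        return false;
    uint64_t offset = fb_base - bar->base;
    return offset < bar->size && fb_size <= bar->size - offset;
}
```

For instance, a framebuffer address recorded at boot at the start of the BAR passes the check while the BAR stays where the firmware put it, and fails once the same BAR has been moved; per the analysis above, only a reboot makes the firmware hand the OS a fresh, valid address.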
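The firmware-detection rule agreed on for IsUefiMode() in comments 34 to 36 can be written as a pure function over the returned status. The rule is that only STATUS_NOT_IMPLEMENTED designates a non-UEFI system, and that with a unique vendor GUID a UEFI system should answer STATUS_VARIABLE_NOT_FOUND. This is a minimal sketch, not the code from the posted fix: the helper names are hypothetical, ExGetFirmwareEnvironmentVariable() itself is not called, and the NTSTATUS values are redefined locally (same numeric values as in the Windows DDK's ntstatus.h) so the sketch builds outside a driver project.

```c
#include <stdbool.h>
#include <stdint.h>

/* NTSTATUS values as in ntstatus.h, redefined as unsigned 32-bit
 * constants so that this sketch compiles without the DDK. */
#define SKETCH_STATUS_SUCCESS            0x00000000u
#define SKETCH_STATUS_NOT_IMPLEMENTED    0xC0000002u
#define SKETCH_STATUS_BUFFER_TOO_SMALL   0xC0000023u
#define SKETCH_STATUS_VARIABLE_NOT_FOUND 0xC0000100u

/* Hypothetical helper: decide "UEFI or not" from the status that a
 * firmware-variable query returned. Only STATUS_NOT_IMPLEMENTED means
 * there are no UEFI variable services; success, "not found",
 * "buffer too small" and other errors all mean UEFI answered. */
bool sketch_is_uefi_from_status(uint32_t status)
{
    return status != SKETCH_STATUS_NOT_IMPLEMENTED;
}

/* Hypothetical helper: with a freshly generated vendor GUID that nobody
 * else owns, the only expected answer on UEFI is "variable not found".
 * Success, for example, would mean the GUID/name pair is already in use. */
bool sketch_is_expected_uefi_answer(uint32_t status)
{
    return status == SKETCH_STATUS_VARIABLE_NOT_FOUND;
}
```

A caller would pass the status returned by ExGetFirmwareEnvironmentVariable() to sketch_is_uefi_from_status(); sketch_is_expected_uefi_answer() only becomes a meaningful sanity check once the zero GUID has been replaced with a uniquely generated one.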
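Replacing `GUID guid = {};` with a GUID from uuidgen, as suggested in comment 34, amounts to transcribing the canonical text form into the GUID fields. This is a minimal sketch under the following assumptions: `sketch_guid` is a local stand-in with the same field layout as the Windows GUID type (Data1, Data2, Data3, Data4[8]), and the function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in with the same field layout as the Windows GUID type. */
typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} sketch_guid;

/* Parse the canonical 36-character form printed by uuidgen
 * ("xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"). Returns 1 on success, 0 otherwise. */
int sketch_parse_guid(const char *text, sketch_guid *out)
{
    unsigned int d1, d2, d3, b[8];

    if (text == NULL || out == NULL || strlen(text) != 36)
        return 0;
    /* Validate the layout first, so sscanf cannot accept signs, spaces or "0x". */
    for (size_t i = 0; i < 36; i++) {
        int dash_pos = (i == 8 || i == 13 || i == 18 || i == 23);
        if (dash_pos ? text[i] != '-' : !isxdigit((unsigned char)text[i]))
            return 0;
    }
    if (sscanf(text, "%8x-%4x-%4x-%2x%2x-%2x%2x%2x%2x%2x%2x",
               &d1, &d2, &d3, &b[0], &b[1], &b[2], &b[3],
               &b[4], &b[5], &b[6], &b[7]) != 11)
        return 0;
    out->Data1 = (uint32_t)d1;
    out->Data2 = (uint16_t)d2;
    out->Data3 = (uint16_t)d3;
    for (int i = 0; i < 8; i++)
        out->Data4[i] = (uint8_t)b[i];
    return 1;
}

/* The all-zero GUID is exactly the value comment 34 warns against. */
int sketch_guid_is_nil(const sketch_guid *g)
{
    if (g->Data1 != 0 || g->Data2 != 0 || g->Data3 != 0)
        return 0;
    for (int i = 0; i < 8; i++)
        if (g->Data4[i] != 0)
            return 0;
    return 1;
}

/* Render the GUID as a C initializer for a declaration like "GUID guid = ...;". */
int sketch_format_initializer(const sketch_guid *g, char *buf, size_t len)
{
    return snprintf(buf, len,
        "{ 0x%08lX, 0x%04X, 0x%04X, { 0x%02X, 0x%02X, 0x%02X, 0x%02X, "
        "0x%02X, 0x%02X, 0x%02X, 0x%02X } }",
        (unsigned long)g->Data1, (unsigned)g->Data2, (unsigned)g->Data3,
        (unsigned)g->Data4[0], (unsigned)g->Data4[1], (unsigned)g->Data4[2],
        (unsigned)g->Data4[3], (unsigned)g->Data4[4], (unsigned)g->Data4[5],
        (unsigned)g->Data4[6], (unsigned)g->Data4[7]);
}
```

For example, parsing the placeholder "6f1b0c3e-9a7d-4c52-8e1f-2b3a4c5d6e7f" and formatting it yields `{ 0x6F1B0C3E, 0x9A7D, 0x4C52, { 0x8E, 0x1F, 0x2B, 0x3A, 0x4C, 0x5D, 0x6E, 0x7F } }`. That placeholder is for illustration only; following the discussion above, a driver should embed a value you generate yourself with uuidgen rather than reuse it.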