Bug 1314600 - [virtio-win][svvp][ws2016][uefi]job "UEFI GOP mode test" failed
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: virtio-win
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Dmitry Fleytman
QA Contact: Virtualization Bugs

Reported: 2016-03-03 21:50 EST by lijin
Modified: 2017-05-11 01:37 EDT
Fixed In Version: 0.4-1
Last Closed: 2017-05-11 01:37:26 EDT
Type: Bug

Attachments
"UEFI GOP mode test" job log (9.42 MB, application/zip)
2016-03-03 21:50 EST, lijin

Description lijin 2016-03-03 21:50:31 EST
Created attachment 1133026
"UEFI GOP mode test" job log

Description of problem:
When the guest boots with UEFI, the SVVP job "UEFI GOP mode test" fails.

Version-Release number of selected component (if applicable):
qxlwddm-0.1-12
qemu-kvm-rhev-2.3.0-31.el7_2.7.x86_64
kernel-3.10.0-350.el7.x86_64
virtio-win-1.8.0-4.el7.noarch
seabios-1.7.5-11.el7.x86_64
Guest: Windows Server 2016 Technical Preview 4
HLK version: 10.1.10586.0
HLK playlist version: 1.9

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with UEFI:
/usr/libexec/qemu-kvm --nodefaults --nodefconfig -m 8G -smp 8 -boot menu=on -cpu Nehalem,+kvm_pv_unhalt,hv_spinlocks=0x1fff,hv_relaxed,hv_vapic,hv_time -machine pc,iommu=on -drive file=/usr/share/virtio-win/virtio-win.iso,id=drive-cd1,media=cdrom,if=none -device ide-drive,drive=drive-cd1,id=cd111,bus=ide.0,unit=0 -uuid 0e55aa0d-9ce3-43a1-936b-a6879b95778e -smbios type=1,manufacturer="Red Hat",product="Red Hat Enterprise OpenStack & Red Hat Enterprise Virtualization",version=7Server-0.1,serial="44454C4C-5700-1058-804B-B7 C04 F483258_00:21:9b:58:d2:90",uuid=6ea97abc-6ff3-4eed-a340-9823615b347b -usb -device usb-tablet,id=tablet0 -object iothread,id=thread0 -drive file=win2016-SUT-uefi.qcow2,if=none,id=drive-virtio0-0-0,format=qcow2,werror=stop,rerror=stop,cache=none,serial=number -device virtio-blk-pci,iothread=thread0,drive=drive-virtio0-0-0,id=virti0-0-0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup1,queues=4 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e2:52:5e:62:77,addr=0x04,mq=on,vectors=10 -name win2016R2-SUT -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -device usb-ehci,id=ehci0 -drive file=usb-storage-uefi.raw,if=none,id=drive-usb-2-0,media=disk,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -device usb-storage,bus=ehci0.0,drive=drive-usb-2-0,id=usb-2-0,removable=on -rtc base=localtime,clock=host,driftfix=slew -chardev socket,id=b111a,path=/tmp/monitor-win2016-sut,server,nowait -mon chardev=b111a,mode=readline -monitor stdio \
-drive file=OVMF-test/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=OVMF-test/OVMF_VARS.fd,if=pflash,format=raw,unit=1 -drive id=cdrom1,if=none,cache=none,snapshot=off,aio=threads,file=OVMF-test/UefiShell.iso,media=cdrom -device ide-cd,drive=cdrom1,id=ide-cd1 \
-vnc :0 -vga cirrus \
#-spice disable-ticketing,port=5900 -vga qxl-

2. Submit the job in HLK.

Actual results:
1. When booting with VNC/Cirrus, the resolution is 800*600 (it cannot be changed to other values), and the job fails as follows:
Start Test 3/4/2016 10:48:07.681 AM System without Internal panel mode requirements 
Error 3/4/2016 10:48:07.685 AM Failure: Expecting 1024 x 768 resolution during boot with no available EDID 
File:   base\test\boot\uefilogo\graphics.c Line: 182 
Error Type:   NT_STATUS 
Error Code:   0xc0000001 
Error Text:   Error 0xc0000001 
End Test 3/4/2016 10:48:07.704 AM System without Internal panel mode requirements 
Result:   Fail 

2. When booting with QXL/SPICE, the resolution is 1024*768 (this is the minimum resolution), and the job fails as follows:
Start Test 3/4/2016 11:05:25.050 AM System with Internal panel mode requirements 
Error 3/4/2016 11:05:25.054 AM Failure: Boot resolution (800x600) does not match preferred (0x0) monitor resolution 
File:   base\test\boot\uefilogo\graphics.c Line: 124 
Error Type:   NT_STATUS 
Error Code:   0xc0000001 
Error Text:   Error 0xc0000001 
End Test 3/4/2016 11:05:25.060 AM System with Internal panel mode requirements 
Result:   Fail 

Expected results:
The job passes.

Additional info:
When the guest boots with BIOS, the job passes with "Server SKU not running UEFI mode, skipping tests."
Comment 3 Laszlo Ersek 2016-03-18 18:38:56 EDT
(In reply to lijin from comment #0)

> Actual results:
> 1.when boot with vnc/cirrus,the resolution is 800*600(can not change to
> other values),job failed as following:
> Start Test 3/4/2016 10:48:07.681 AM System without Internal panel mode
> requirements 
> Error 3/4/2016 10:48:07.685 AM Failure: Expecting 1024 x 768 resolution
> during boot with no available EDID 

Cirrus should not be used for UEFI + Windows at all.

stdvga is possible but still not recommended. QXL is recommended.

> 2.when boot with qxl/spice,the resolution is 1024*768(this is the minimum
> resolution),job failed as following:
> Start Test 3/4/2016 11:05:25.050 AM System with Internal panel mode
> requirements 
> Error 3/4/2016 11:05:25.054 AM Failure: Boot resolution (800x600) does not
> match preferred (0x0) monitor resolution 
> File:   base\test\boot\uefilogo\graphics.c Line: 124

The above error message doesn't tell me anything. What is "preferred (0x0) monitor resolution"? There is no preferred monitor resolution in a virtual machine. Can we look at the source code of the test (base\test\boot\uefilogo\graphics.c)? Is it perhaps complaining about the lack of EFI_EDID_DISCOVERED_PROTOCOL / EFI_EDID_ACTIVE_PROTOCOL?

I tried to look at attachment 1133026. It's a ZIP file of TXT files, XML files, and binary blobs. None of it is comprehensible. I don't think I can do anything here until I get a detailed, human-readable error message, or access to the source code of the test.
Comment 15 Laszlo Ersek 2016-03-21 10:07:16 EDT
While I'm waiting for dashboard account creation, I also logged in to the remote test server with rdesktop, and reviewed the test failure more closely. I'm noticing (and it's also captured in comment 0) that the test that fails says:

  System with Internal panel mode requirements

Our virtual machines have no internal panel. The "Integrated display" requirements from "System.Server.Firmware.UEFI.GOP.Display" don't apply.
Comment 16 Laszlo Ersek 2016-03-21 19:22:33 EDT
(CC Vadim)

After spending the day installing, reinstalling, and re-re-re-reinstalling
these gosh-darn Windows Server 2016 Tech Preview 4 guests, and the Win 10
HLK Controller and Test Client packages on them, the two virtual machines
finally connected, and I managed to re-run the test myself.

I think setting this up was one of the most frustrating work experiences
I've ever had.

The issue reproduces for me. Here's how I tried to remedy it:

* I changed OVMF to install the EDID Discovered and EDID Active Protocols on
  the same handle as the Graphics Output Protocol. The contents that I used
  for these EDID protocols follows the UEFI spec: I zeroed everything out,
  which -- according to the UEFI spec -- means that no EDID information is
  available from the display device.

* In addition, I changed the UEFI boot resolution to 1024x768 (this is
  actually configurable from within OVMF).
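
For reference, here is a minimal sketch of the "no EDID available" convention mentioned above. It is written in Python for illustration and is not OVMF code. The helper name edid_available() is made up. The zero-size / NULL-pointer rule comes from the UEFI spec's EDID protocol definitions, and the header and checksum checks follow the standard 128-byte EDID base block layout.

```python
from typing import Optional

# Fixed 8-byte header that opens every EDID base block.
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
EDID_BLOCK_SIZE = 128


def edid_available(size_of_edid: int, edid: Optional[bytes]) -> bool:
    """Hypothetical helper: does an EDID protocol instance carry usable data?

    A size of 0 (or a missing buffer) means that no EDID information is
    available from the display device.
    """
    if size_of_edid == 0 or edid is None:
        return False
    base = edid[:size_of_edid][:EDID_BLOCK_SIZE]
    if len(base) < EDID_BLOCK_SIZE:
        return False
    # A well-formed base block starts with the fixed header, and all of its
    # 128 bytes sum to 0 modulo 256.
    return base[:8] == EDID_HEADER and sum(base) % 256 == 0
```

With the zeroed-out protocol contents described above, edid_available(0, None) returns False, which is the "display provides no EDID" case.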

The test results didn't change. I'm still getting the following:

> Native  Resolution: 0x0
> Current Resolution: 1024x768
> Start: System with Internal panel mode requirements, TUID=
> Error: 0xc0000001, Error 0xc0000001
> 	Failure: Boot resolution (1024x768) does not match preferred (0x0)
>                monitor resolution
> 	File=base\test\boot\uefilogo\graphics.c Line=124
> End: Fail, System with Internal panel mode requirements, TUID=, Repro=

Now let me quote what this test case is supposed to verify:

> This job verifies a device implements the GOP mode protocol in accordance
> with WHCK requirements.
>
> The following is verified:
>
> Mode Selection
>
> 1. Once UEFI has determined which display to enabled to display the Pre-OS
>    screen, it must select the mode to apply based on the following logic
>    a. System with an Integrated display(Laptop, All In One, Tablet): The
>       display must always be set to its native resolution and native
>       timing
>    b. System without an Integrated display (desktop):
>       i.   UEFI must attempt to set the native resolution and timing of
>            the display by obtaining it from the EDID.
>       ii.  If that is not supported, UEFI must select an alternate mode
>            that matches the same aspect ratio as the native resolution of
>            the display.
>       iii. At the minimum, UEFI must set a mode of 1024 x 768
>       iv.  If the display device does not provide an EDID, UEFI must set a
>            mode of 1024 x 768
>   c. The firmware must always use a 32 bit linear frame buffer to display
>      the Pre-OS screen
>   d. PixelsPerScanLine must be equal to the HorizontalResolution.
>   e. PixelFormat must be PixelBlueGreenRedReserved8BitPerColor. Note that
>      a physical frame buffer is required; PixelBltOnly is not supported.

OVMF satisfies requirements (c), (d) and (e).

Requirement (a) doesn't apply, because the virtual machine does not have an
integrated display.

Requirement (b) applies, and OVMF satisfies it with sub-case (iv). (I also
tested a resolution of 1280x800, with identical -- failing -- results.)

However! Note that the test output says:

    "System with Internal panel mode requirements"

The job is trying to verify requirement (a) against OVMF, but that
requirement doesn't apply at all, because the virtual machine is not a
laptop or tablet. It has a (virtual) *discrete* PCI graphics card.

So, this test job is bogus. Until Microsoft tells us exactly what HLK
relies on to conclude that the VM has an integrated display, there's
nothing I can do.
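
To make the argument above concrete, here is a rough Python model of mode-selection rules (a) and (b) as quoted from the job description. This is only a sketch: the source of graphics.c is not available in this bug, so check_boot_mode(), its parameters, and its messages are assumptions, and the reading of (b)(i)-(iii) is deliberately loose.

```python
from typing import Optional, Tuple

Resolution = Tuple[int, int]
FALLBACK: Resolution = (1024, 768)  # mandated when the display provides no EDID


def check_boot_mode(boot: Resolution,
                    native: Optional[Resolution],
                    internal_panel: bool) -> Tuple[bool, str]:
    """Hypothetical model of the mode-selection rules (a) and (b).

    `native` is None when no EDID is available (the job log prints 0x0).
    """
    if internal_panel:
        # Requirement (a): an integrated display must run at its native mode.
        if native is not None and boot == native:
            return True, "boot resolution matches native resolution"
        return False, "boot resolution does not match preferred monitor resolution"

    if native is None:
        # Requirement (b)(iv): no EDID, so 1024 x 768 is expected.
        if boot == FALLBACK:
            return True, "1024 x 768 resolution during boot with no available EDID"
        return False, "expecting 1024 x 768 resolution during boot with no available EDID"

    # Requirements (b)(i)-(iii), read loosely: the native mode, or a mode with
    # the same aspect ratio that is no smaller than 1024 x 768.
    same_aspect = boot[0] * native[1] == boot[1] * native[0]
    large_enough = boot[0] >= FALLBACK[0] and boot[1] >= FALLBACK[1]
    if boot == native or (same_aspect and large_enough):
        return True, "boot resolution is acceptable for the reported native mode"
    return False, "boot resolution does not suit the reported native mode"
```

Under this model, check_boot_mode((1024, 768), None, internal_panel=True) can never pass, because a display without EDID has no native mode to match. The same call with internal_panel=False passes, which is why the "internal panel" classification is the part that matters here.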

... I also grepped the QXLDOD source code for EDID. I see that the
QxlDod::QueryDeviceDescriptor() function returns
STATUS_MONITOR_NO_MORE_DESCRIPTOR_DATA, and not
STATUS_MONITOR_NO_DESCRIPTOR. The difference is, according to MSDN:

> STATUS_MONITOR_NO_DESCRIPTOR
>
>   The child device identified by ChildUid is connected to a monitor that
>   does not support an EDID descriptor.
>
> STATUS_MONITOR_NO_MORE_DESCRIPTOR_DATA
>
>   The child device identified by ChildUid is connected to a monitor that
>   does support an EDID descriptor, but the descriptor does not have the
>   EDID extension block specified by the DescriptorOffset and
>   DescriptorLength members of DeviceDescriptor.

Maybe Windows notices this discrepancy between the firmware and the native
OS driver; i.e., that the firmware says "monitor has no EDID", while the
native OS driver says "monitor does have EDID, just never the extension
block(s) you are asking for". I dunno.

... I also notice that the native OS driver has a (bit)field called
"CURRENT_BDD_MODE.Flags.IsInternal", with the comment

> 1 if it was determined (i.e. through ACPI) that an internal panel is being
> driven

However, this flag is never set to TRUE.

... I do find it interesting that with Cirrus (which has no native OS
driver, only the inherited GOP framebuffer), the error message is different:

    "System without Internal panel mode requirements"

At least in that case the job correctly sees that the VM has no internal
panel. The QXL DOD might mean some difference after all.
Comment 18 Vadim Rozenfeld 2016-03-21 19:51:44 EDT
(In reply to Laszlo Ersek from comment #16)

I will look into this problem shortly. Just need some time for building a test system.
Comment 19 Laszlo Ersek 2016-03-21 20:01:46 EDT
Update:

I removed the QXL DOD driver from within the guest (i.e., the HLK test
system), using the Roll Back Driver button in Device Manager. The video
device is again recognized as a Microsoft Basic Display Adapter only. Note
that I didn't change the graphics type in the domain XML; it remains QXL.

This time the UEFI GOP mode test *passed*. The UEFI resolution is 1024x768,
and the EDID protocols (with zero contents) are installed.

> Native  Resolution: 0x0
> Current Resolution: 1024x768
> Start: System without Internal panel mode requirements, TUID=
> Pass: 1024 x 768 resolution during boot with no available EDID
> End: Pass, System without Internal panel mode requirements, TUID=, Repro=

Next test: try it without the EDID protocols... this test passes too

Next test: same, but change the initial firmware resolution back to
800x600... this fails with the same error message as the Cirrus test.

So, we do have a bit of information:


device  UEFI        EDID    native OS
        resolution  protos  driver     result
------  ----------  ------  ---------  -------------------------------------
QXL     1024x768    yes     QXLDOD     job expects internal panel, fails
QXL     1024x768    yes     basic fb   success
QXL     1024x768    no      basic fb   success
QXL      800x600    no      basic fb   job expects 1024x768, fails

The moral is the following:

- The QXLDOD driver makes the test job think that the virtual machine
  has an internal display panel. I don't know why this happens, but
  it is incorrect, and if possible, the QXLDOD driver should not make
  this impression.

- The OVMF default resolution should be changed to 1024x768. It should take
  a trivial two-liner patch, but I don't think it's worth doing until the
  internal panel misunderstanding is fixed. Thus, for now I'll leave this
  bug assigned to virtio-win.
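
The same four runs can be written down as data to see which factor decides the outcome. This is only a sketch in Python; the Run record and the outcomes_by() helper are made-up names, and the values are copied from the table above.

```python
from collections import namedtuple

Run = namedtuple("Run", "device uefi_resolution edid_protocols os_driver passed")

# The four configurations from the table above.
RUNS = [
    Run("QXL", (1024, 768), True, "QXLDOD", False),
    Run("QXL", (1024, 768), True, "basic fb", True),
    Run("QXL", (1024, 768), False, "basic fb", True),
    Run("QXL", (800, 600), False, "basic fb", False),
]


def outcomes_by(field, resolution=None):
    """Map each value of `field` to the set of pass/fail outcomes seen with it.

    If `resolution` is given, only runs at that UEFI resolution are counted.
    """
    result = {}
    for run in RUNS:
        if resolution is not None and run.uefi_resolution != resolution:
            continue
        result.setdefault(getattr(run, field), set()).add(run.passed)
    return result
```

Restricted to 1024x768, outcomes_by("os_driver", (1024, 768)) maps QXLDOD only to failure and the basic framebuffer driver only to success, while outcomes_by("edid_protocols", (1024, 768)) shows both outcomes with the EDID protocols installed. So at that resolution the driver, not the EDID protocols, separates pass from fail.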
Comment 20 Laszlo Ersek 2016-03-21 20:04:26 EDT
(In reply to Vadim Rozenfeld from comment #18)

> I will look into this problem shortly. Just need some time for building a
> test system.

Thanks Vadim!
Comment 21 Laszlo Ersek 2016-03-22 10:00:20 EDT
I think I may have a halfway-educated guess at the QXLDOD problem. See "qxldod/QxlDod.cpp", method QxlDod::QueryChildRelations():

  pChildRelations[ChildIndex].ChildCapabilities.Type.VideoOutput.InterfaceTechnology =
      (DeviceId == 0) ? D3DKMDT_VOT_INTERNAL : D3DKMDT_VOT_HD15;

where

  D3DKMDT_VOT_HD15

    Indicates that the video output device connects to an external display
    device through an HD15 (VGA) connector.

  D3DKMDT_VOT_INTERNAL

    Indicates that the video output device connects internally to a display
    device (for example, the internal connection in a laptop computer).

    This constant value is not a bit-field value. Instead, it's a standalone
    video output type.

https://msdn.microsoft.com/en-us/library/windows/hardware/ff546605%28v=vs.85%29.aspx

Apparently D3DKMDT_VOT_INTERNAL was "always" there in the QXLDOD source; D3DKMDT_VOT_HD15 was added in commit 861b2d2d444f2 ("add multi-monitor support in QXL mode").

Can we use D3DKMDT_VOT_HD15 unconditionally? Or replace D3DKMDT_VOT_INTERNAL with D3DKMDT_VOT_OTHER?

  D3DKMDT_VOT_OTHER

    Indicates that the video output device connects to an external display
    device through a connector that is not one of the types that is indicated
    by the following values in this enumeration.

(Emphasis being on "external display device".)
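
To spell out the two alternatives, here is a small Python model of the selection logic. It is not driver code. The enum member names mirror the D3DKMDT_VOT_* constants, but their numeric values here are arbitrary placeholders, and both function names are invented for illustration.

```python
from enum import Enum, auto


class VideoOutputTechnology(Enum):
    # Names mirror D3DKMDT_VOT_*; the values are arbitrary, NOT the real ones.
    OTHER = auto()
    HD15 = auto()
    INTERNAL = auto()


def current_interface_technology(device_id: int) -> VideoOutputTechnology:
    """Model of the existing expression: child 0 is reported as internal."""
    if device_id == 0:
        return VideoOutputTechnology.INTERNAL
    return VideoOutputTechnology.HD15


def proposed_interface_technology(device_id: int) -> VideoOutputTechnology:
    """Model of the first suggestion: report HD15 for every child."""
    return VideoOutputTechnology.HD15
```

With the current logic, child 0 (the only child in a single-monitor guest) is always reported as an internal panel. With the proposed logic no child ever is. Whether HD15 or OTHER is the better external type is still the open question above.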

Thanks!
Comment 22 Laszlo Ersek 2016-04-13 11:46:01 EDT
Vadim, do you have any news? Should I assign this BZ to you? Thanks.
Comment 23 Vadim Rozenfeld 2016-04-14 02:16:49 EDT
(In reply to Laszlo Ersek from comment #22)
> Vadim, do you have any news? Should I assign this BZ to you? Thanks.

It's mine.

Best regards,
Vadim.
Comment 24 lijin 2016-09-14 02:42:03 EDT
This job is excluded when using the latest HLK playlist; postponing to RHEL 7.4.
Comment 26 Yu Wang 2016-11-01 02:19:29 EDT
Hi,

With the latest QXL driver (qxlwddm-0.4-1), the job still fails, and the error info is the same as before.

Since this job has been excluded by the latest HLK playlist, and it still fails this time, could we close this bug, as the job is no longer in our job list?

Error info:

Start Test 11/2/2016 4:01:54.165 AM System without Internal panel mode requirements 
Error 11/2/2016 4:01:54.165 AM Failure: Expecting 1024 x 768 resolution during boot with no available EDID 
File:   base\test\boot\uefilogo\graphics.c Line: 182 
Error Type:   NT_STATUS 
Error Code:   0xc0000001 
Error Text:   Error 0xc0000001 
End Test 11/2/2016 4:01:54.166 AM System without Internal panel mode requirements 
Result:   Fail 

Thanks
Yu Wang
Comment 27 lijin 2017-05-11 01:37:26 EDT
Closing this bug, as the job has been removed from the latest test suite.
