Bug 1457662 - Windows guest cannot boot with interrupt remapping (VT-d)
Summary: Windows guest cannot boot with interrupt remapping (VT-d)
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Amnon Ilan
QA Contact: jingzhao
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-06-01 06:36 UTC by Peter Xu
Modified: 2018-04-11 00:25 UTC (History)
13 users (show)

(edit)
Clone Of:
(edit)
Last Closed: 2018-04-11 00:23:04 UTC


Attachments (Terms of Use)
Proof-of-concept fix (1.99 KB, application/mbox)
2017-06-15 12:22 UTC, Ladi Prosek
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:1104 None None None 2018-04-11 00:25 UTC

Description Peter Xu 2017-06-01 06:36:44 UTC
Description of problem:

Windows guest with interrupt remapping enabled is not bootable. A black window shows:

Your PC needs to restart.
Please hold down the power button.
Error Code: 0x0000005C
Parameters:
0x0000000000007000
0x0000000000000003
0x0000000000000000
0x0000000000000000

Version-Release number of selected component (if applicable):

Guest OS: Windows 2012 R2
Qemu: upstream latest

How reproducible:

100%

Steps to Reproduce:

Boot guest with following parameter:

$bin -M q35,accel=kvm,kernel-irqchip=split -m 4G \
        -snapshot -monitor stdio \
        -device intel-iommu,intremap=on \
        -spice addr=0.0.0.0,port=5900,disable-ticketing \
        /images/windows2012-r2.raw


Actual results:

The guest hangs at boot.

Expected results:

The guest can boot.

Additional info:

QEMU debug info:

(vtd)vtd_reset:
(vtd)vtd_mem_read: addr 0x10 size 4 val 0xf00f1a
(vtd)vtd_mem_read: addr 0x10 size 4 val 0xf00f1a
(vtd)vtd_mem_read: addr 0x38 size 4 val 0x80000000
(vtd)vtd_mem_read: addr 0x1c size 4 val 0x0
(vtd)vtd_mem_read: addr 0x1c size 4 val 0x0
(vtd)vtd_mem_read: addr 0x34 size 4 val 0x0
(vtd)vtd_mem_read: addr 0x34 size 4 val 0x0
(vtd)vtd_mem_read: addr 0x1c size 4 val 0x0
(vtd)vtd_mem_read: addr 0x1c size 4 val 0x0
(vtd)vtd_mem_write: DMAR_IRTA_REG write addr 0xb8, size 8, val 0x190807
(vtd)vtd_mem_read: addr 0xb8 size 8 val 0x190807
(vtd)vtd_mem_read: addr 0xb8 size 4 val 0x190807
(vtd)vtd_mem_read: addr 0xb8 size 8 val 0x190807
(vtd)vtd_mem_read: addr 0x1c size 4 val 0x0
(vtd)vtd_mem_write: DMAR_GCMD_REG write addr 0x18, size 4, val 0x1000000
(vtd)vtd_handle_gcmd_write: value 0x1000000 status 0x0
(vtd)vtd_handle_gcmd_sirtp: set Interrupt Remap Table Pointer
(vtd)vtd_interrupt_remap_table_setup: int remap table addr 0x190000 size 256
(vtd)vtd_mem_read: addr 0x1f size 4 val 0x1
(vtd)vtd_mem_write: DMAR_IQA_REG write addr 0x90, size 8, val 0x10a000
(vtd)vtd_mem_write: DMAR_IQT_REG write addr 0x88, size 8, val 0x20
(vtd)vtd_handle_iqt_write: set iq tail 2
(vtd)vtd_mem_read: addr 0x1c size 4 val 0x1000000
(vtd)vtd_mem_write: DMAR_GCMD_REG write addr 0x18, size 4, val 0x5000000
(vtd)vtd_handle_gcmd_write: value 0x5000000 status 0x1000000
(vtd)vtd_handle_gcmd_qie: Queued Invalidation Enable on
(vtd)vtd_handle_gcmd_qie: error: can't enable Queued Invalidation: tail 2
(vtd)vtd_handle_gcmd_sirtp: set Interrupt Remap Table Pointer
(vtd)vtd_interrupt_remap_table_setup: int remap table addr 0x190000 size 256
(vtd)vtd_mem_read: addr 0x1c size 4 val 0x1000000
(vtd)vtd_mem_read: addr 0x1c size 4 val 0x1000000
(vtd)vtd_mem_read: addr 0x1c size 4 val 0x1000000
(vtd)vtd_mem_read: addr 0x1c size 4 val 0x1000000
(vtd)vtd_mem_read: addr 0x1c size 4 val 0x1000000

Comment 2 Peter Xu 2017-06-01 06:44:08 UTC
From the debug tracing, it's strange that the Windows guest firstly set IQ_TAIL to 2:

(vtd)vtd_handle_iqt_write: set iq tail 2

Then it tries to enable QI:

(vtd)vtd_handle_gcmd_qie: error: can't enable Queued Invalidation: tail 2

While this is forbidden by VT-d spec (chap 6.5.2):

"""
To enable queued invalidations, software must:

Ensure all invalidation requests submitted to hardware through the
register-based invalidation registers are completed. (i.e. no pending
invalidation requests in hardware).

- Initialize the Invalidation Queue Tail Register (see Section 10.4.22)
  to zero.

- Setup the IQ address and size through the Invalidation Queue Address
  Register (see Section 10.4.23).

- Enable the queued invalidation interface through the Global Command
  Register (see Section 10.4.4).

When enabled, hardware sets the QIES field in the Global Status Register (see Section 10.4.5).
"""

I am not familiar with how should we move on with this, since from what I captured it seems a Windows bug.

Peter

Comment 3 Peter Xu 2017-06-01 06:47:18 UTC
Amnon, 

Since it might be a Windows OS bug, do you have suggestion on how we should proceed with it? 

Thanks in adnvance.

Comment 11 Ladi Prosek 2017-06-15 12:20:06 UTC
I was able to reproduce this and can only confirm Peter's analysis in comment 2. It seems to work fine on Windows 10 so reporting this to Microsoft is probably pointless. Windows 10 trace (excerpt):

(vtd)vtd_handle_iqt_write: set iq tail 0
...
(vtd)vtd_handle_gcmd_qie: Queued Invalidation Enable on
(vtd)vtd_handle_gcmd_qie: DMAR_IQA_REG 0x10d000
(vtd)vtd_handle_gcmd_qie: Invalidation Queue addr 0x10d000 size 256
(vtd)vtd_handle_gcmd_sirtp: set Interrupt Remap Table Pointer
(vtd)vtd_interrupt_remap_table_setup: int remap table addr 0x10e000 size 256
...
(vtd)vtd_handle_iqt_write: set iq tail 1


Peter, would you be fine with making QEMU tolerant to the incorrect order of register writes performed by older Windows? As far as I can tell, we currently silently fail the attempt to enable queued invalidations so we're not going to lose anything by being more permissive.

I can successfully boot Windows Server 2012 R2 with this patch:

--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1477,13 +1477,15 @@ static inline bool vtd_queued_inv_disable_check(IntelIOMMUState *s)
            (s->iq_last_desc_type == VTD_INV_DESC_WAIT);
 }
 
+static void vtd_fetch_inv_desc(IntelIOMMUState *s);
+
 static void vtd_handle_gcmd_qie(IntelIOMMUState *s, bool en)
 {
     uint64_t iqa_val = vtd_get_quad_raw(s, DMAR_IQA_REG);
 
     VTD_DPRINTF(INV, "Queued Invalidation Enable %s", (en ? "on" : "off"));
     if (en) {
-        if (vtd_queued_inv_enable_check(s)) {
+        if (true) {
             s->iq = iqa_val & VTD_IQA_IQA_MASK;
             /* 2^(x+8) entries */
             s->iq_size = 1UL << ((iqa_val & VTD_IQA_QS) + 8);
@@ -1493,9 +1495,12 @@ static void vtd_handle_gcmd_qie(IntelIOMMUState *s, bool en)
                         s->iq, s->iq_size);
             /* Ok - report back to driver */
             vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_QIES);
-        } else {
-            VTD_DPRINTF(GENERAL, "error: can't enable Queued Invalidation: "
-                        "tail %"PRIu16, s->iq_tail);
+        }
+
+        if (s->iq_tail != 0 &&
+            !(vtd_get_long_raw(s, DMAR_FSTS_REG) & VTD_FSTS_IQE)) {
+                VTD_DPRINTF(INV, "iq tail is already set, processing queue");
+                vtd_fetch_inv_desc(s);
         }
     } else {
         if (vtd_queued_inv_disable_check(s)) {

Comment 12 Ladi Prosek 2017-06-15 12:22 UTC
Created attachment 1288050 [details]
Proof-of-concept fix

Attaching the patch.

Comment 15 Peter Xu 2017-06-16 08:32:01 UTC
(In reply to Ladi Prosek from comment #11)
> I was able to reproduce this and can only confirm Peter's analysis in
> comment 2. It seems to work fine on Windows 10 so reporting this to
> Microsoft is probably pointless. Windows 10 trace (excerpt):
> 
> (vtd)vtd_handle_iqt_write: set iq tail 0
> ...
> (vtd)vtd_handle_gcmd_qie: Queued Invalidation Enable on
> (vtd)vtd_handle_gcmd_qie: DMAR_IQA_REG 0x10d000
> (vtd)vtd_handle_gcmd_qie: Invalidation Queue addr 0x10d000 size 256
> (vtd)vtd_handle_gcmd_sirtp: set Interrupt Remap Table Pointer
> (vtd)vtd_interrupt_remap_table_setup: int remap table addr 0x10e000 size 256
> ...
> (vtd)vtd_handle_iqt_write: set iq tail 1
> 
> 
> Peter, would you be fine with making QEMU tolerant to the incorrect order of
> register writes performed by older Windows? As far as I can tell, we
> currently silently fail the attempt to enable queued invalidations so we're
> not going to lose anything by being more permissive.
> 
> I can successfully boot Windows Server 2012 R2 with this patch:
> 
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1477,13 +1477,15 @@ static inline bool
> vtd_queued_inv_disable_check(IntelIOMMUState *s)
>             (s->iq_last_desc_type == VTD_INV_DESC_WAIT);
>  }
>  
> +static void vtd_fetch_inv_desc(IntelIOMMUState *s);
> +
>  static void vtd_handle_gcmd_qie(IntelIOMMUState *s, bool en)
>  {
>      uint64_t iqa_val = vtd_get_quad_raw(s, DMAR_IQA_REG);
>  
>      VTD_DPRINTF(INV, "Queued Invalidation Enable %s", (en ? "on" : "off"));
>      if (en) {
> -        if (vtd_queued_inv_enable_check(s)) {
> +        if (true) {
>              s->iq = iqa_val & VTD_IQA_IQA_MASK;
>              /* 2^(x+8) entries */
>              s->iq_size = 1UL << ((iqa_val & VTD_IQA_QS) + 8);
> @@ -1493,9 +1495,12 @@ static void vtd_handle_gcmd_qie(IntelIOMMUState *s,
> bool en)
>                          s->iq, s->iq_size);
>              /* Ok - report back to driver */
>              vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_QIES);
> -        } else {
> -            VTD_DPRINTF(GENERAL, "error: can't enable Queued Invalidation: "
> -                        "tail %"PRIu16, s->iq_tail);
> +        }
> +
> +        if (s->iq_tail != 0 &&
> +            !(vtd_get_long_raw(s, DMAR_FSTS_REG) & VTD_FSTS_IQE)) {
> +                VTD_DPRINTF(INV, "iq tail is already set, processing
> queue");
> +                vtd_fetch_inv_desc(s);
>          }
>      } else {
>          if (vtd_queued_inv_disable_check(s)) {

Hello, Ladi,

Good to know that this patch works!

I verified it on my buggy windows. It boots well.

However I would suggest we use a property for it, rather than allowing it by default. Like, can we introduce a x-buggy-qtail bit for this? Then old windows users can turn this on if they want to use it.

Also, new Windows users won't be affected as well even without this bit. That's good news.

How do you think?

(If you like, you can post a patch directly for it upstream)

Thanks!

Peter

Comment 16 Ladi Prosek 2017-06-16 08:56:59 UTC
Hi Peter,

(In reply to Peter Xu from comment #15)
> Hello, Ladi,
> 
> Good to know that this patch works!
> 
> I verified it on my buggy windows. It boots well.
> 
> However I would suggest we use a property for it, rather than allowing it by
> default. Like, can we introduce a x-buggy-qtail bit for this? Then old
> windows users can turn this on if they want to use it.

The problem with adding a new property is that we would also need to add support for it to libvirt and RHV/OpenStack and they would need to set it if they know that the guest is running affected versions of Windows.

That's unnecessarily complex. Unless there is a real concern that changing the behavior can break someone, I would advise against it.

I think that in general QEMU often chooses to emulate real hardware's quirks, even if they are in conflict with the spec. I'm assuming that real chips do allow the formally incorrect order of operations, otherwise Windows would be very broken -- would it be easy for us to verify that? Thanks!

> Also, new Windows users won't be affected as well even without this bit.
> That's good news.
> 
> How do you think?
> 
> (If you like, you can post a patch directly for it upstream)
> 
> Thanks!
> 
> Peter

Comment 17 Peter Xu 2017-06-16 09:20:06 UTC
(In reply to Ladi Prosek from comment #16)
> Hi Peter,
> 
> (In reply to Peter Xu from comment #15)
> > Hello, Ladi,
> > 
> > Good to know that this patch works!
> > 
> > I verified it on my buggy windows. It boots well.
> > 
> > However I would suggest we use a property for it, rather than allowing it by
> > default. Like, can we introduce a x-buggy-qtail bit for this? Then old
> > windows users can turn this on if they want to use it.
> 
> The problem with adding a new property is that we would also need to add
> support for it to libvirt and RHV/OpenStack and they would need to set it if
> they know that the guest is running affected versions of Windows.
> 
> That's unnecessarily complex. Unless there is a real concern that changing
> the behavior can break someone, I would advise against it.
> 
> I think that in general QEMU often chooses to emulate real hardware's
> quirks, even if they are in conflict with the spec. I'm assuming that real
> chips do allow the formally incorrect order of operations, otherwise Windows
> would be very broken -- would it be easy for us to verify that? Thanks!

Makes sense to me. Then let's do what you suggested for this one. Thanks,

Peter

Comment 18 Ladi Prosek 2017-06-16 11:47:33 UTC
Posted a polished version of the patch upstream:

https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg03836.html

Comment 19 Ladi Prosek 2017-07-11 09:20:47 UTC
Committed upstream as 8991c460be5a0811194fd4d2b49ba7146a23526b

Comment 21 jingzhao 2017-10-11 07:57:44 UTC
Reproduce the issue on qemu-kvm-rhev-2.9.0-16.el7_4.9 and verified it on qemu-kvm-rhev-2.10.0-1.el7.x86_64

According to above test result, changed the bz to verified

Thanks
Jing

Comment 24 errata-xmlrpc 2018-04-11 00:23:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104


Note You need to log in before you can comment on or make changes to this bug.