Bug 1377155

Summary: "HID Button over Interrupt Driver" not working properly in windows10 and win2016
Product: Red Hat Enterprise Linux 7
Reporter: lijin <lijin>
Component: qemu-kvm-rhev
Assignee: Yan Vugenfirer <yvugenfi>
Status: CLOSED WORKSFORME
QA Contact: FuXiangChun <xfu>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 7.3
CC: ailan, chayang, craigcrawford1988, hhuang, imammedo, jan.public, juzhang, knoel, lersek, lijin, michen, ovasik, pbonzini, redhat-bugzilla, rvkagan, virt-maint, yvugenfi, zhguo
Target Milestone: rc
Keywords: Regression, Reopened, TestBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
On early versions of Windows 10 and Windows Server 2016, a Microsoft bug wrongly installs the "HID Button over Interrupt Driver" on the ACPI0010 device, which is used as the CPU container descriptor. This interferes with the CPU hot-add feature. Microsoft fixed this bug in later versions of Windows (Windows 10 version 1803 and Windows Server 2019 preview build 10.0.17623.0).
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-16 15:42:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1558351, 1445603, 1473046
Attachments:
Description                                                 Flags
HID Button over Interrupt Driver on windows 2016            none
ACPI0010 is no longer wrongly identified as HID button      none

Description lijin 2016-09-19 01:14:37 UTC
Created attachment 1202283
HID Button over Interrupt Driver on windows 2016

Description of problem:
The "HID Button over Interrupt Driver" device in Windows 10 and Windows Server 2016 is not working properly (Code 31), which makes the RHEL 7.3 SVVP test fail.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.6.0-20.el7.x86_64 and qemu-kvm-rhev-2.6.0-25.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a Windows 10 or Windows Server 2016 guest:
/usr/libexec/qemu-kvm -name 122BLKWIN2016 -enable-kvm -m 4G -smp 4 \
  -nodefconfig -nodefaults \
  -rtc base=localtime,driftfix=slew \
  -boot order=cd,menu=on \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0 \
  -device e1000,netdev=hostnet0,id=net0,mac=00:52:4c:23:6d:a7 \
  -chardev pty,id=charserial0 \
  -device isa-serial,chardev=charserial0,id=isa_serial0 \
  -device usb-tablet,id=input0 \
  -vnc 0.0.0.0:1 -vga cirrus \
  -cdrom virtio-win-prewhql-126.iso \
  -monitor stdio -qmp tcp:0:4445,server,nowait \
  -fda virtio-win-prewhql-126.vfd \
  -object iothread,id=thread0 \
  -drive file=win2016-iso.raw,if=none,id=drive-ide0-0-0,format=raw,serial=mike_cao,cache=none \
  -device virtio-blk-pci,iothread=thread0,drive=drive-ide0-0-0,id=ide0-0-0

2. Check Device Manager.


Actual results:
The "HID Button over Interrupt Driver" device is marked with a yellow exclamation mark, and the device status is "The device is not working properly because Windows cannot load the drivers required for this device (Code 31)". Please check the attachment.

This device causes the RHEL 7.3 SVVP test to fail, so I set the TestBlocker keyword.

Expected results:
The device works normally.

Additional info:
rhel7.2 qemu-kvm-rhev-2.3.0-31.el7.x86_64 does NOT hit this issue;
qemu-kvm-rhev-2.6.0-16.el7.x86_64 does NOT hit this issue

Comment 3 juzhang 2016-09-20 01:14:49 UTC
Hi Lijin,

Could you reply to comment 2?

Best Regards,
Junyi

Comment 5 Igor Mammedov 2016-09-20 08:48:28 UTC
The issue is caused by the ACPI0010 device, which per the ACPI spec is the "Processor Container Device".
It's a regression in Windows 10/Windows Server 2016.

It looks like MS is trying to extend their ACPI implementation to support more recent ACPI specs (which is a good thing), but they made a typo and ACPI0011, "Generic Buttons Device", became ACPI0010.


Windows 8 and older don't have this issue, as their INF files have no ACPI0010 entry, so they treat the device node according to its _CID field, PNP0A05 ("Generic Container Device").
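A rough way to see why a single INF entry changes the outcome: Windows prefers a driver that matches one of a device's hardware IDs over one that only matches a compatible ID. The sketch below is a deliberately simplified model of that preference, not Windows' actual ranking algorithm (which also weighs driver signing and other factors). The `select_driver` helper and the ID strings assumed for QEMU's container (`ACPI\ACPI0010`/`*ACPI0010` as hardware IDs, `ACPI\PNP0A05`/`*PNP0A05` as compatible IDs) are illustrative assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple


def select_driver(hardware_ids: List[str], compatible_ids: List[str],
                  inf_ids: Dict[str, Set[str]]) -> Optional[str]:
    """Return the name of the INF that best matches the device, or None.

    Simplified model: any hardware-ID match outranks any compatible-ID
    match, and an ID earlier in the device's list outranks a later one.
    """
    # Rank key is (tier, position): tier 0 = hardware ID, tier 1 = compatible ID.
    ranked = [((0, pos), dev_id) for pos, dev_id in enumerate(hardware_ids)]
    ranked += [((1, pos), dev_id) for pos, dev_id in enumerate(compatible_ids)]

    best: Optional[Tuple[Tuple[int, int], str]] = None
    for inf_name, declared in sorted(inf_ids.items()):
        declared_upper = {d.upper() for d in declared}  # PnP IDs are case-insensitive
        for rank, dev_id in ranked:
            if dev_id.upper() in declared_upper:
                # `ranked` is ordered best-first, so the first hit is this INF's best.
                if best is None or rank < best[0]:
                    best = (rank, inf_name)
                break
    return None if best is None else best[1]


# Hypothetical IDs for QEMU's processor container (_HID ACPI0010, _CID PNP0A05).
CONTAINER_HW_IDS = ["ACPI\\ACPI0010", "*ACPI0010"]
CONTAINER_COMPAT_IDS = ["ACPI\\PNP0A05", "*PNP0A05"]
```

With only a generic-container INF claiming `*PNP0A05` (the Windows 8 situation), the container falls through to the compatible-ID match; as soon as any INF also lists ACPI0010, that hardware-ID match wins regardless of what the driver actually supports.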

Amnon,

Is there a way to escalate the bug to MS to get it fixed there, instead of trying to
work around it in QEMU?

Comment 12 Yan Vugenfirer 2016-10-05 14:05:24 UTC
(In reply to Igor Mammedov from comment #11)
> (In reply to Yan Vugenfirer from comment #10)
> > Using MS instructions, the device can be removed and the test passes.
> 
> Care to put the instructions here, just for the record?

According to MS the solution is to remove the device:

1. Uninstall the existing driver package for this device; if the option to delete the driver software is offered, select it too.

2. Download PsTools from
https://technet.microsoft.com/en-us/sysinternals/pstools.aspx

3. Unzip PsTools, change to the PsTools folder, and run the following in an elevated command prompt (Run as Administrator); this starts Registry Editor under the SYSTEM account:

psexec -s -i regedit.exe

4. Delete the subkey ACPI0010 under HKLM\SYSTEM\DriverDataBase\DriverPackage\hidinterrupt.inf_x86_<some instance ID>\Descriptors\ACPI0010

Also delete the subkey "hidinterrupt.inf" under HKLM\SYSTEM\DriverDataBase\DeviceIds\ACPI\ACPI0010

5. Scan for hardware changes in Device Manager to see the effect.

Comment 15 Michael 2016-11-17 10:05:39 UTC
I'm using QEMU/KVM in UNRAID; the release is here: http://lime-technology.com/forum/index.php?topic=53689.0

The issue described above exists in the latest release candidate (RC4) of this product. Since RC4 uses the latest Linux/QEMU builds available, is this still a known issue? Is there anything I can do in Windows 10 to get the driver loaded correctly?

Please advise.

Comment 17 Yan Vugenfirer 2016-11-17 14:32:14 UTC
(In reply to Michael from comment #15)
> I'm using QEMU/KVM in UNRAID - release here
> http://lime-technology.com/forum/index.php?topic=53689.0
> 
> The issue described above exists in the latest release candidate (RC4) of
> this product. Since RC4 is using latest linux/qemu builds available, is this
> still a 'known' issue? Is there anything I can do in Windows 10 to get the
> driver loaded correctly?
> 
> Please advise.

Check comment #12 for the solution.

Comment 18 Michael 2016-11-18 14:44:23 UTC
Thank you for the fix. The problem is now resolved.

Comment 19 Igor Mammedov 2016-12-22 13:14:29 UTC
Amnon,

Could you help escalate the issue to Microsoft so that they fix it?
The regression breaks CPU hotplug on QEMU 2.7 and later with
Windows 10/Windows Server 2016.


MS released WS2016 with this known issue, and
here is the first report from early adopters who have started hitting it:
https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg03049.html

It should affect us as well, since RHEL 7.3.

Reopening the BZ, as the issue hasn't been fixed.

Comment 20 Yan Vugenfirer 2016-12-27 10:57:43 UTC
(In reply to Igor Mammedov from comment #19)
> Amnon,
> 
> Could you help to escalate the issue to Microsoft and so they fix it
> as that regression breaks cpu-hotplug on qemu-2.7 and later with

But should it be used for hot plug?

https://patchwork.kernel.org/patch/8337331/

Can you please prepare a detailed description of how ACPI0010 is used for hotplug support in QEMU, so that we have material for the discussion with MS?

Comment 21 Igor Mammedov 2016-12-27 12:29:54 UTC
(In reply to Yan Vugenfirer from comment #20)
> But should it be used for hot plug?
> 
> https://patchwork.kernel.org/patch/8337331/
> 
> Can you please prepare detailed description how ACPI0010 is used for hot
> plug support in QEMU, so we will have the material for discussion with MS?

The ACPI0010 container isn't used for hotplug itself (it's static), but it contains the processor devices, which are present and/or can be hot(un)plugged.
The problem we are facing is that the incorrect ACPI0010 entry in the hidinterrupt driver description makes Windows ignore the devices contained by the ACPI0010 container (the CPUs). Later, ACPI0010 containers will be used more extensively for topology description, and we will have the same issue with Windows for ARM.

Manually removing the wrong driver binding as per comment 17 and rebooting works around the issue. But things should just work by default, as they do with earlier Windows releases, to avoid regression reports from customers who would see hotplug stop working for no apparent reason.

Comment 22 Yan Vugenfirer 2016-12-28 13:12:34 UTC
(In reply to Igor Mammedov from comment #21)
> ACPI0010 container isn't used for hotplug itself (it's static) but it
> contains processor devices, which are present or/and could hot(un)plugged.
> Problem we are facing is that incorrect ACPI0010 entry in hiinterrupt driver
> description makes Windows ignore devices contained by ACPI0010 container
> (cpus). Later ACPI0010 container(s) will be used more extensively for
> topology description purposes and we will have the same issue with Windows
> for ARM.

But if ACPI0010 is only a static description of the topology, are we sure that hotplug is not working because of it? Shouldn't this misbehaviour cause a wrong number of CPUs in the system in the first place?

Comment 23 Igor Mammedov 2016-12-29 09:11:22 UTC
(In reply to Yan Vugenfirer from comment #22)
> But if the ACPI0010 is only static description of the topology, are we sure
> that the hot plug is not working because of it?
Yep, I'm sure. If I remove the wrong HID driver binding, hotplug works as expected (i.e. Windows sees the CPUs in Device Manager and is able to hot(un)plug them).

Alternatively, moving the Processor devices out of the ACPI0010 container scope would also help, but then we won't be able to use ACPI0010 for layout description.
Hence I'm not willing to 'fix' QEMU, which behaves per spec, to work around a Win10 bug.

> Shouldn't this misbehaviour
> cause wrong amount of CPUs in the system in the first place?
There is the MADT ACPI table, which enumerates the CPUs present at boot, so Windows onlines
the boot CPUs using it. However, for hotplug to be operational, Processor devices with _STA and _EJ0 methods are required in the DSDT, and these devices are inside the ACPI0010 container. It looks to me like the wrong HID driver binding causes Windows not to see the processor devices and therefore breaks CPU hotplug.

Comment 29 lijin 2017-04-19 08:00:03 UTC
The following two RHEL 7.4 SVVP jobs fail if the device is not uninstalled as described in comment #12:
System - Sleep with IO Before and After (Reliability SysFund)
System - PNP (disable and enable) with IO Before and After (Reliability)

The error message is:
WDTF_SIMPLE_IO : - Open(HID Button over Interrupt Driver ACPI\ACPI0010\2&DABA3FF&1 ) Failed : Device is reporting a problem code (Status Flags=0x1802400 (DN_HAS_PROBLEM DN_DISABLEABLE DN_NT_ENUMERATOR DN_NT_DRIVER) Problem Code=1f (CM_PROB_FAILED_ADD)) HRESULT=0x80004005
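The numbers in this WDTF message can be decoded by hand. The sketch below uses a small subset of the `DN_*` devnode status flags and `CM_PROB_*` problem codes from the Windows SDK header `cfg.h`; it shows that 0x1802400 is exactly the four flags WDTF lists, and that problem code 0x1f is decimal 31, i.e. the "Code 31" from the description. The helper names `decode_status` and `decode_problem` are hypothetical.

```python
from typing import List, Tuple

# Subset of the DN_* devnode status flags and CM_PROB_* problem codes
# defined in the Windows SDK header cfg.h (only values relevant here).
DN_FLAGS = {
    0x00000002: "DN_DRIVER_LOADED",
    0x00000008: "DN_STARTED",
    0x00000400: "DN_HAS_PROBLEM",
    0x00002000: "DN_DISABLEABLE",
    0x00800000: "DN_NT_ENUMERATOR",
    0x01000000: "DN_NT_DRIVER",
}
CM_PROBLEMS = {
    0x0A: "CM_PROB_FAILED_START",  # Device Manager "Code 10"
    0x1F: "CM_PROB_FAILED_ADD",    # Device Manager "Code 31"
}


def decode_status(flags: int) -> List[str]:
    """Split a devnode status word into flag names, lowest bit first."""
    names = []
    for bit in range(32):
        mask = 1 << bit
        if flags & mask:
            # Bits outside the subset above are reported as raw hex.
            names.append(DN_FLAGS.get(mask, f"0x{mask:08X}"))
    return names


def decode_problem(code: int) -> Tuple[str, int]:
    """Return the CM_PROB_* name and the decimal code Device Manager shows."""
    return CM_PROBLEMS.get(code, "unknown"), code
```

Note that `DN_STARTED` is absent from 0x1802400: the ACPI0010 node never started, which matters for its child devices.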

Comment 31 Roman Kagan 2017-05-02 17:15:02 UTC
FWIW we (@virtuozzo.com) are also hitting this.

Some more data points:

1) hidinterrupt driver takes over (even though it fails) ACPI0010 node and thus hides its children -- the CPUs -- from the regular ACPI enumeration.  So there are no processors in the Windows device tree (see Device Manager, like on the original screenshot, or "!devnode 0 1" in the debugger)

2) hidinterrupt.inf as shipped with Windows 2016 has both ACPI0011 and ACPI0010; the record for the latter is preceded with a comment "This Id is not to be used. It will be removed once everyone has stopped using it."  So I guess the typo was not in the driver but in the ACPI tables of some device(s) which the driver wanted to support despite the bug.
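Point 1 can be illustrated with a toy device tree. In this simplified model (an assumption, not Windows' PnP implementation), a node whose driver failed to load is never started, so it is never asked for its children and everything below it stays invisible. The node names (`_SB`, `ACPI0010`, `CPU0`, ...) and the helpers are hypothetical stand-ins for QEMU's processor container and its processor devices.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class DevNode:
    name: str
    # False models a node whose driver failed to load (e.g. Code 31);
    # in this model such a node is never started.
    driver_loaded: bool = True
    children: List["DevNode"] = field(default_factory=list)


def enumerate_tree(node: DevNode) -> Iterator[str]:
    """Yield the names of the nodes PnP would see, depth first."""
    yield node.name
    if not node.driver_loaded:
        # A node that never starts is never queried for its children,
        # so everything below it stays invisible.
        return
    for child in node.children:
        yield from enumerate_tree(child)


def example_tree(container_ok: bool, n_cpus: int = 2) -> DevNode:
    """Hypothetical layout: processor devices inside an ACPI0010 container."""
    cpus = [DevNode(f"CPU{i}") for i in range(n_cpus)]
    container = DevNode("ACPI0010", driver_loaded=container_ok, children=cpus)
    return DevNode("_SB", children=[container])
```

In this model, `example_tree(False)` yields only `_SB` and `ACPI0010`, matching the missing processors in Device Manager, while any binding that lets the container start (roughly what the "Generic Bus" workaround in comment 33 achieves) exposes the CPUs again.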

Comment 33 Craig 2017-07-28 12:51:20 UTC
Device manager -> Right click HID Button over Interrupt Driver -> Click Update Driver Software...

Click "Browse my computer for driver software"

Click "Let me pick from a list of device drivers on my computer"

Select "Generic Bus" and click Next.

Click Close and the device should now be operating fine.

Comment 34 Craig 2017-07-28 12:58:08 UTC
Generic Bus driver: http://imgur.com/a/bpagt

Comment 38 Ladi Prosek 2017-11-21 14:43:09 UTC
(In reply to Igor Mammedov from comment #19)
> Amnon,
> 
> Could you help to escalate the issue to Microsoft and so they fix it
> as that regression breaks cpu-hotplug on qemu-2.7 and later with
> Windows 10/Windows Server 2016.

I haven't been able to get CPU hotplug to work on WS2016. I tried an earlier QEMU without the problematic ACPI device, as well as recent ones with the workaround from comment #12.

The VM reboots right after issuing the cpu-add command, no BSOD, no apparent error anywhere.

This is odd. If I remember correctly, Roman and Denis of Virtuozzo have indicated that hotplug was broken. On the other hand, it looks like people have been doing it [1].

I'm posting this in the hope that we can establish whether it really is broken and this "HID Button over Interrupt" issue has been masking another bug, or whether it works but requires some specific steps that I failed to follow.

My setup:
Windows Server 2016 (tried both Standard and Datacenter)
qemu with -smp 2,maxcpus=8,sockets=8,cores=1,threads=1

Then:
(qemu) cpu-add 2
and Windows crashes.
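For reference, the same hot-add step can be driven over QMP instead of the HMP monitor, for example through the `-qmp tcp:0:4445,server,nowait` socket in the reproducer from the description. The sketch below assumes a QEMU 2.x-era build where the `cpu-add` QMP command (argument `id`) is available; upstream QEMU has since deprecated `cpu-add` in favour of `device_add`. The host, port, and helper names are assumptions.

```python
import json
import socket
from typing import Any, Dict, Iterable, Optional


def qmp_command(execute: str, arguments: Optional[Dict[str, Any]] = None) -> bytes:
    """Encode one QMP command as a newline-terminated JSON object."""
    msg: Dict[str, Any] = {"execute": execute}
    if arguments is not None:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\n").encode()


def read_reply(lines: Iterable[bytes]) -> Dict[str, Any]:
    """Return the next QMP message that is not an asynchronous event."""
    for line in lines:
        msg = json.loads(line)
        if "event" in msg:
            continue  # events can arrive at any time; they are not replies
        return msg
    raise ConnectionError("QMP connection closed")


def cpu_add(host: str, port: int, cpu_id: int) -> Dict[str, Any]:
    """Negotiate QMP capabilities, then hot-add one vCPU with cpu-add."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with sock.makefile("rb") as lines:
            read_reply(lines)  # greeting banner: {"QMP": {...}}
            reply: Dict[str, Any] = {}
            for cmd in (qmp_command("qmp_capabilities"),
                        qmp_command("cpu-add", {"id": cpu_id})):
                sock.sendall(cmd)
                reply = read_reply(lines)
                if "error" in reply:
                    raise RuntimeError(reply["error"].get("desc", str(reply["error"])))
            return reply
```

Something like `cpu_add("127.0.0.1", 4445, 2)` would mirror `(qemu) cpu-add 2` above. The guest-side behaviour is unchanged by the transport, so the RAM requirement noted in comment 39 still applies.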


[1] https://github.com/virtio-win/kvm-guest-drivers-windows/issues/195

Comment 39 Ladi Prosek 2017-11-21 15:17:25 UTC
(In reply to Ladi Prosek from comment #38)
> I'm posting this in the hope that we establish whether it really is broken
> and this "HID Button over Interrupt" issue has been masking another bug, or
> it works but requires maybe a specific steps which I failed to follow.

Mystery solved, cpu-hotplug works only with >2GB of RAM. It is tracked in bug 1445603. Thank you Yan for the pointer!

Comment 40 Laszlo Ersek 2017-11-21 20:07:29 UTC
FWIW, also don't expect VCPU hotplug to work with OVMF (when the latter is built with -D SMM_REQUIRE, which is the only thing we do in RHEL7). See bug 1454803.

Comment 44 Robert Scheck 2018-05-07 10:17:13 UTC
Not sure if it helps somehow, but with Windows 10, Version 1703 and 1709
it still claimed that "HID Button over Interrupt Driver" is "not working
properly", but since updating to 1803, the issue disappeared by itself.

Comment 45 Yan Vugenfirer 2018-05-16 15:09:10 UTC
(In reply to Robert Scheck from comment #44)
> Not sure if it helps somehow, but with Windows 10, Version 1703 and 1709
> it still claimed that "HID Button over Interrupt Driver" is "not working
> properly", but since updating to 1803, the issue disappeared by itself.

Thanks!

Indeed, it looks like MS fixed the bug. The entry in machine.inf is now:

%ACPI0010_Desc% = NO_DRV_X_PNP, *ACPI0010     ; Generic ACPI Processor Container Device/
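The entry follows the INF Models-section layout `device-description = install-section-name[, hw-id[, compatible-id...]]`. The minimal parser sketch below (a hypothetical helper that ignores INF quoting rules) shows that `*ACPI0010` is now mapped to the `NO_DRV_X_PNP` install section in machine.inf rather than to hidinterrupt.inf.

```python
from typing import List, Tuple


def parse_models_line(line: str) -> Tuple[str, str, List[str]]:
    """Split an INF Models-section entry into (description, install section, IDs).

    Simplified: everything after ';' is treated as a comment and INF
    quoting rules are ignored, which is enough for the line quoted above.
    """
    body = line.split(";", 1)[0].strip()
    desc, rhs = (part.strip() for part in body.split("=", 1))
    install_section, *ids = (item.strip() for item in rhs.split(","))
    return desc, install_section, [i for i in ids if i]
```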

Comment 46 Yan Vugenfirer 2018-05-16 15:10:47 UTC
Created attachment 1437425
ACPI0010 is no longer wrongly identified as HID button

In Windows 10 version 1803 - ACPI0010 is no longer wrongly identified as HID button

Comment 47 Yan Vugenfirer 2018-05-16 15:42:26 UTC
Same in Windows Server 2019 preview (10.0.17623.0)