Bug 1441938
| Summary: | When booting a Windows guest with two NUMA nodes and a pc-dimm assigned to the second node, the DIMM is not recognized by the guest | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Yumei Huang <yuhuang> |
| Component: | qemu-kvm-rhev | Assignee: | Virtualization Maintenance <virt-maint> |
| Status: | CLOSED ERRATA | QA Contact: | Yumei Huang <yuhuang> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.4 | CC: | chayang, imammedo, jinzhao, juzhang, knoel, michen, mrezanin, virt-maint |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-rhev-2.10.0-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-04-11 00:16:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1473046 | | |
Description (Yumei Huang, 2017-04-13 06:55:39 UTC)
I can reproduce this with Windows Server 2016 as well. Reverting the following commit (thanks Igor for the pointer!) fixes it for 2016 but not for 2008R2 and 2012R2. In fact, it makes older Windows even more broken: assigning the DIMM to node 0 then doesn't work either.

```
commit cec65193d41099519f14fb744440eeabbfa6e4e3
Author: Igor Mammedov <imammedo>
Date:   Mon Jun 2 15:25:28 2014 +0200

    pc: ACPI BIOS: reserve SRAT entry for hotplug mem hole

    Needed for Windows to use hotplugged memory device, otherwise
    it complains that server is not configured for memory hotplug.
    Tests shows that aftewards it uses dynamically provided proximity
    value from _PXM() method if available.
```

So if nothing else, we could add a switch to disable the old workaround, because whatever was broken in Windows before is now fixed and the workaround is counterproductive. I'll see if I can debug the relevant Windows code and check what Hyper-V does. Maybe we can come up with something that would work for all Windows versions.

Thinking more about cec65193d, I recall that it's used in Linux too, if the guest has been started with less than 4G of present memory (kernel commit "x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN"), so we can't just remove it.

Thanks, that makes sense. Reading the ACPI spec, I wonder if the most correct thing to do isn't declaring multiple MEM_AFFINITY_HOTPLUGGABLE regions, one for each NUMA node, and then plugging DIMMs into the respective region at run time. I understand that it would require changes to the current model, though, and would have its drawbacks.

I'm having trouble understanding Windows ACPI internals due to indirections and asynchrony. Windows reads the acpi-mem-hotplug region in a very generic-looking routine running in a separate kernel thread.
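For context, a guest configuration of the shape described in the summary (two NUMA nodes, a pc-dimm assigned to the second node) can be expressed with a QEMU command line like the one below. This is illustrative only: the memory sizes, device IDs, and image path are placeholders, not the reporter's exact invocation.

```shell
# Two NUMA nodes; one pc-dimm cold-plugged into the second node (node=1).
qemu-kvm \
    -m 4G,slots=4,maxmem=32G \
    -smp 4 \
    -numa node,nodeid=0,cpus=0-1,mem=2G \
    -numa node,nodeid=1,cpus=2-3,mem=2G \
    -object memory-backend-ram,id=mem-dimm0,size=1G \
    -device pc-dimm,id=dimm0,memdev=mem-dimm0,node=1 \
    win2016.qcow2
```

Before the fix, a Windows guest booted this way would not recognize dimm0, because the SRAT hotplug hole was always attributed to node 0.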
```
ACPI!WriteSystemIO+0x1c842
ACPI!AccessBaseField+0x236
ACPI!WriteFieldObj+0x14e
ACPI!RunContext+0x1e0
ACPI!InsertReadyQueue+0x403
ACPI!RestartCtxtPassive+0x2f
ACPI!ACPIWorkerThread+0xed
nt_fffff80084285000!PspSystemThreadStartup+0x41
nt_fffff80084285000!KiStartSystemThread+0x16
```

However, I have confirmed that this trivial change:

```diff
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index afcadac..9e56d4e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2411,7 +2411,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
     if (hotplugabble_address_space_size) {
         numamem = acpi_data_push(table_data, sizeof *numamem);
         build_srat_memory(numamem, pcms->hotplug_memory.base,
-                          hotplugabble_address_space_size, 0,
+                          hotplugabble_address_space_size, pcms->numa_nodes - 1,
                           MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
     }
```

fixes this BZ on all OSes I've tested: 2008R2, 2012R2, 2016. It also fixes hot-plug on 2016 and 2012R2, no longer requiring the first DIMM to be plugged into node 0. 2008R2 is still sensitive to the hot-plug order, and this change is not making it any worse.

Igor, do you see any problem with declaring the MEM_AFFINITY_HOTPLUGGABLE region for the last node instead of the first?

If the above won't break the case in comment 4 and makes Windows happier, then I'm fine with it.

Fix posted: https://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg05467.html

I have verified that Linux enables SWIOTLB just like before.

The fix has been merged upstream as ede24a0 ("pc: ACPI BIOS: use highest NUMA node for hotplug mem hole SRAT entry").

Verified with: qemu-kvm-rhev-2.10.0-4.el7, kernel-3.10.0-765.el7.x86_64. Tested with win2016, win2012r2, and win2008r2 guests: whether the DIMM is assigned to the first node or the second node on the QEMU command line, it is recognized by the guest.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104
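For reference, the SRAT entry that `build_srat_memory()` emits is a fixed 40-byte Memory Affinity Structure (Type 1) defined by the ACPI specification, and the fix above only changes which proximity domain (NUMA node) that entry is attributed to. The following is a minimal, standalone Python sketch of the packing; it is not QEMU's code, though the field layout and flag bits follow the ACPI spec.

```python
import struct

# Flag bits of the SRAT Memory Affinity Structure (ACPI spec).
MEM_AFFINITY_ENABLED = 1 << 0
MEM_AFFINITY_HOTPLUGGABLE = 1 << 1

def build_srat_memory(base, length, node, flags):
    """Pack one 40-byte SRAT Memory Affinity Structure (Type 1)."""
    return struct.pack(
        "<BBIHIIIIIIQ",
        1,                    # Type: Memory Affinity Structure
        40,                   # Length of this structure in bytes
        node,                 # Proximity domain (NUMA node)
        0,                    # Reserved
        base & 0xFFFFFFFF,    # Base address, low 32 bits
        base >> 32,           # Base address, high 32 bits
        length & 0xFFFFFFFF,  # Length, low 32 bits
        length >> 32,         # Length, high 32 bits
        0,                    # Reserved
        flags,
        0,                    # Reserved
    )

# Hotplug-hole entry after the fix: attributed to the highest node.
# Base and size are placeholder example values, not QEMU's defaults.
numa_nodes = 2
entry = build_srat_memory(0x100000000, 6 << 30, numa_nodes - 1,
                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED)
```

Before the fix, the `node` argument for this entry was hardcoded to 0; the one-line change passes `numa_nodes - 1` instead, which is what makes Windows place the hotplug region on the last node.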