Hide Forgot
Description of problem: when memory hot-plugged kernel onlines only some last memory sections of hoplugged range if udev is configured to online them as movable. Version-Release number of selected component (if applicable): kernel-3.10.0-356 systemd-219-19 How reproducible: 100% Steps to Reproduce: 1. run RHEL7.2 guest and modify /usr/lib/udev/rules.d/40-redhat.rule as follows: from: SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online" to: SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online_movable" then install the latest kernel or update initrd image by running: dracut -f configure kernel to use serial as console and shutdown guest On host, install qemu-kvm-rhev-2.3.0-31.el7_2.8 or later 2. start guest with following CLI: /usr/libexec/qemu-kvm -enable-kvm -m 2G,slots=16,maxmem=16G -smp 2 -numa node -object memory-backend-ram,id=m1,size=1G -device pc-dimm,id=d1,memdev=m1 -nographic rhel72-image-file Actual results: in guest: #cat /sys/devices/system/memory/memory32/state offline and it's the same till #cat /sys/devices/system/memory/memory37/state offline and only sections 38, 39 are onlined It's possible to online sections as movable manually only if it's done in reverse order, like: echo online_movable > /sys/devices/system/memory/memory37/state echo online_movable > /sys/devices/system/memory/memory36/state ... if onlining is done out of that order is fails, however section could be onlined successfully in any order if it's onlined as not movable i.e.: echo online > /sys/devices/system/memory/memory32/state Expected results: memory sections should be onlined as movable successfully in any order, i.e. the same behaviour when 'online' is echoed into 'state'. Additional info: Upstream kernel also broken the same way.
FWIW, online_movable works in any order on memory blocks from memory0 to memory31. From memory32 up, the reported problem is exhibited.
Issue is not virt specific, baremetal also should be affected, to reproduce one need physically hotplug a memory module. (i.e. prereq to trigger issue is that memory module is not present at boot time). Here is what happens: echo online_movable > memoryXX/online triggers following code path: memory_subsys_online -> memory_block_change_state -> memory_block_action -> online_pages(): if (online_type == MMOP_ONLINE_MOVABLE && zone_idx(zone) == ZONE_MOVABLE - 1) { if (move_pfn_range_right(zone, zone + 1, pfn, pfn + nr_pages)) return -EINVAL; } where move_pfn_range_right() fails following check: /* the move out part mast at the right most of @z1 */ if (zone_end_pfn(z1) > end_pfn) goto out_fail; since we are trying to online as movable not the last section in ZONE_NORMAL. Here is what makes hotplugged memory end up in ZONE_NORMAL: acpi_memory_enable_device() -> add_memory -> add_memory_resource -> -> arch/x86/mm/init_64.c :w /* * Memory is added always to NORMAL zone. This means you will never get * additional DMA/DMA32 memory. */ int arch_add_memory(int nid, u64 start, u64 size, bool for_device) { struct pglist_data *pgdat = NODE_DATA(nid); struct zone *zone = pgdat->node_zones + zone_for_memory(nid, start, size, ZONE_NORMAL, for_device); i.e. all hot-plugged memory modules always go to ZONE_NORMAL.
Issue here is not in current movable zone design and that it design allows to move only memory section on border between zones. Issue is that if movable zone doesn't exists in hotplug time, zone_for_memory() defaults to ZONE_NORMAL. As result all all hotplugged memory sections go to ZONE_NORMAL except the last one for which valid_zones == "NORMAL MOVABLE". i.e. it is also in ZONE_NORMAL but could be moved to ZONE_MOVABLE. Idea I've have had to fix it is to allow caller of add_memory() to specify default zone. That way ACPI hotplug acpi_memory_enable_device() could pick to which zone hotplugged memory goes in. Like if corresponding ACPI node is removable /has _EJ0 method/, pass to ad_memory() as default ZONE_MOVABLE, if hotplugged memory is not removable set default to ZONE_NORMAL. That way removable memory goes by default to ZONE_MOVABLE and onlining with echo [online|online_movable] > memoryXX/online makes it go to ZONE_MOVABLE by default without failing onlining memory sections. Since I've been preemted and don't have time to work on this bug in near future, I'll attach WIP patch that does above and fixes this particular issue. Whoever picks this up post something upstream pls CC me on it so I could help with review.
Created attachment 1146332 [details] PATCH add removable memory sections to ZONE_MOVABLE by default. it's just a proof of concept [i.e. on t ready to post upstream as is]