Bug 1314306

Summary: memory hotplug: kernel fails to online some memory sections as movable
Product: Red Hat Enterprise Linux 7 Reporter: Igor Mammedov <imammedo>
Component: kernelAssignee: Andrea Arcangeli <aarcange>
kernel sub component: Memory Management QA Contact: Yumei Huang <yuhuang>
Status: CLOSED NEXTRELEASE Docs Contact:
Severity: high    
Priority: high CC: aarcange, chayang, dkelson, hhuang, imammedo, jniu, juzhang, knoel, liwan, lwoodman, mgoldboi, michal.skrivanek, msekleta, mtessun, mzamazal, pagupta, pbunyan, thuth, wdavis, xfu, ylavi, yuhuang
Version: 7.2   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-07 22:13:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1228543, 1320534    
Attachments:
Description Flags
PATCH add removable memory sections to ZONE_MOVABLE by default. none

Description Igor Mammedov 2016-03-03 11:41:04 UTC
Description of problem:

when memory hot-plugged kernel onlines only some last memory sections
of hoplugged range if udev is configured to online them as movable.

Version-Release number of selected component (if applicable):
kernel-3.10.0-356
systemd-219-19

How reproducible:
100%

Steps to Reproduce:
1. run RHEL7.2 guest and modify /usr/lib/udev/rules.d/40-redhat.rule as follows:
   from:
SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"
to:
SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online_movable"

  then install the latest kernel or update initrd image by running: dracut -f
  configure kernel to use serial as console and shutdown guest

On host, install qemu-kvm-rhev-2.3.0-31.el7_2.8 or later

2. start guest with following CLI:

/usr/libexec/qemu-kvm -enable-kvm -m 2G,slots=16,maxmem=16G -smp 2 -numa node -object memory-backend-ram,id=m1,size=1G -device pc-dimm,id=d1,memdev=m1 -nographic rhel72-image-file

Actual results:
in guest:
#cat /sys/devices/system/memory/memory32/state
offline

and it's the same till
#cat /sys/devices/system/memory/memory37/state
offline

and only sections 38, 39 are onlined

It's possible to online sections as movable manually only if it's done
in reverse order, like:
echo online_movable > /sys/devices/system/memory/memory37/state
echo online_movable > /sys/devices/system/memory/memory36/state
...

if onlining is done out of that order is fails, however section could be
onlined successfully in any order if it's onlined as not movable i.e.:

echo online > /sys/devices/system/memory/memory32/state

Expected results:

memory sections should be onlined as movable successfully in any order,
i.e. the same behaviour when 'online' is echoed into 'state'.

Additional info:

Upstream kernel also broken the same way.

Comment 2 Milan Zamazal 2016-03-07 16:11:08 UTC
FWIW, online_movable works in any order on memory blocks from memory0 to memory31. From memory32 up, the reported problem is exhibited.

Comment 7 Igor Mammedov 2016-03-30 14:35:36 UTC
Issue is not virt specific, baremetal also should be affected, to reproduce
one need physically hotplug a memory module. (i.e. prereq to trigger issue
is that memory module is not present at boot time).

Here is what happens:

  echo online_movable > memoryXX/online

triggers following code path:

memory_subsys_online -> memory_block_change_state -> memory_block_action ->
      online_pages():
        if (online_type == MMOP_ONLINE_MOVABLE &&
            zone_idx(zone) == ZONE_MOVABLE - 1) {
                if (move_pfn_range_right(zone, zone + 1, pfn, pfn + nr_pages))
                        return -EINVAL;
        }

where move_pfn_range_right() fails following check:

        /* the move out part mast at the right most of @z1 */
        if (zone_end_pfn(z1) >  end_pfn)
                goto out_fail;

since we are trying to online as movable not the last section in
ZONE_NORMAL.

Here is what makes hotplugged memory end up in ZONE_NORMAL:
 acpi_memory_enable_device() -> add_memory -> add_memory_resource ->
   -> arch/x86/mm/init_64.c  
     :w
/*
      * Memory is added always to NORMAL zone. This means you will never get
      * additional DMA/DMA32 memory.
      */

     int arch_add_memory(int nid, u64 start, u64 size, bool for_device)
     {
        struct pglist_data *pgdat = NODE_DATA(nid);
        struct zone *zone = pgdat->node_zones +
                zone_for_memory(nid, start, size, ZONE_NORMAL, for_device);

i.e. all hot-plugged memory modules always go to ZONE_NORMAL.

Comment 10 Igor Mammedov 2016-04-12 10:08:54 UTC
Issue here is not in current movable zone design and that it design allows to move only memory section on border between zones.

Issue is that if movable zone doesn't exists in hotplug time, zone_for_memory()
defaults to ZONE_NORMAL. As result all all hotplugged memory sections go to
ZONE_NORMAL except the last one for which valid_zones == "NORMAL MOVABLE". i.e.
it is also in ZONE_NORMAL but could be moved to ZONE_MOVABLE.

Idea I've have had to fix it is to allow caller of add_memory() to specify default zone. That way ACPI hotplug acpi_memory_enable_device() could pick
to which zone hotplugged memory goes in.
Like if corresponding ACPI node is removable /has _EJ0 method/, pass to ad_memory() as default ZONE_MOVABLE, if hotplugged memory is not removable
set default to ZONE_NORMAL. That way removable memory goes by default to
ZONE_MOVABLE and onlining with echo [online|online_movable] > memoryXX/online makes it go to ZONE_MOVABLE by default without failing onlining memory sections.

Since I've been preemted and don't have time to work on this bug in near future, I'll attach WIP patch that does above and fixes this particular issue.
Whoever picks this up post something upstream pls CC me on it so I could help with review.

Comment 11 Igor Mammedov 2016-04-12 10:11:36 UTC
Created attachment 1146332 [details]
PATCH add removable memory sections to ZONE_MOVABLE by default.

it's just a proof of concept [i.e. on t ready to post upstream as is]