Bug 1314306 - memory hotplug: kernel fails to online some memory sections as movable
memory hotplug: kernel fails to online some memory sections as movable
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel (Show other bugs)
7.2
Unspecified Unspecified
high Severity high
: rc
: ---
Assigned To: Andrea Arcangeli
Yumei Huang
:
Depends On:
Blocks: 1320534 1228543
  Show dependency treegraph
 
Reported: 2016-03-03 06:41 EST by Igor Mammedov
Modified: 2018-02-14 15:23 EST (History)
22 users (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
PATCH add removable memory sections to ZONE_MOVABLE by default. (4.25 KB, patch)
2016-04-12 06:11 EDT, Igor Mammedov
no flags Details | Diff

  None (edit)
Description Igor Mammedov 2016-03-03 06:41:04 EST
Description of problem:

when memory hot-plugged kernel onlines only some last memory sections
of hoplugged range if udev is configured to online them as movable.

Version-Release number of selected component (if applicable):
kernel-3.10.0-356
systemd-219-19

How reproducible:
100%

Steps to Reproduce:
1. run RHEL7.2 guest and modify /usr/lib/udev/rules.d/40-redhat.rule as follows:
   from:
SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"
to:
SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online_movable"

  then install the latest kernel or update initrd image by running: dracut -f
  configure kernel to use serial as console and shutdown guest

On host, install qemu-kvm-rhev-2.3.0-31.el7_2.8 or later

2. start guest with following CLI:

/usr/libexec/qemu-kvm -enable-kvm -m 2G,slots=16,maxmem=16G -smp 2 -numa node -object memory-backend-ram,id=m1,size=1G -device pc-dimm,id=d1,memdev=m1 -nographic rhel72-image-file

Actual results:
in guest:
#cat /sys/devices/system/memory/memory32/state
offline

and it's the same till
#cat /sys/devices/system/memory/memory37/state
offline

and only sections 38, 39 are onlined

It's possible to online sections as movable manually only if it's done
in reverse order, like:
echo online_movable > /sys/devices/system/memory/memory37/state
echo online_movable > /sys/devices/system/memory/memory36/state
...

if onlining is done out of that order is fails, however section could be
onlined successfully in any order if it's onlined as not movable i.e.:

echo online > /sys/devices/system/memory/memory32/state

Expected results:

memory sections should be onlined as movable successfully in any order,
i.e. the same behaviour when 'online' is echoed into 'state'.

Additional info:

Upstream kernel also broken the same way.
Comment 2 Milan Zamazal 2016-03-07 11:11:08 EST
FWIW, online_movable works in any order on memory blocks from memory0 to memory31. From memory32 up, the reported problem is exhibited.
Comment 7 Igor Mammedov 2016-03-30 10:35:36 EDT
Issue is not virt specific, baremetal also should be affected, to reproduce
one need physically hotplug a memory module. (i.e. prereq to trigger issue
is that memory module is not present at boot time).

Here is what happens:

  echo online_movable > memoryXX/online

triggers following code path:

memory_subsys_online -> memory_block_change_state -> memory_block_action ->
      online_pages():
        if (online_type == MMOP_ONLINE_MOVABLE &&
            zone_idx(zone) == ZONE_MOVABLE - 1) {
                if (move_pfn_range_right(zone, zone + 1, pfn, pfn + nr_pages))
                        return -EINVAL;
        }

where move_pfn_range_right() fails following check:

        /* the move out part mast at the right most of @z1 */
        if (zone_end_pfn(z1) >  end_pfn)
                goto out_fail;

since we are trying to online as movable not the last section in
ZONE_NORMAL.

Here is what makes hotplugged memory end up in ZONE_NORMAL:
 acpi_memory_enable_device() -> add_memory -> add_memory_resource ->
   -> arch/x86/mm/init_64.c  
     :w
/*
      * Memory is added always to NORMAL zone. This means you will never get
      * additional DMA/DMA32 memory.
      */

     int arch_add_memory(int nid, u64 start, u64 size, bool for_device)
     {
        struct pglist_data *pgdat = NODE_DATA(nid);
        struct zone *zone = pgdat->node_zones +
                zone_for_memory(nid, start, size, ZONE_NORMAL, for_device);

i.e. all hot-plugged memory modules always go to ZONE_NORMAL.
Comment 10 Igor Mammedov 2016-04-12 06:08:54 EDT
Issue here is not in current movable zone design and that it design allows to move only memory section on border between zones.

Issue is that if movable zone doesn't exists in hotplug time, zone_for_memory()
defaults to ZONE_NORMAL. As result all all hotplugged memory sections go to
ZONE_NORMAL except the last one for which valid_zones == "NORMAL MOVABLE". i.e.
it is also in ZONE_NORMAL but could be moved to ZONE_MOVABLE.

Idea I've have had to fix it is to allow caller of add_memory() to specify default zone. That way ACPI hotplug acpi_memory_enable_device() could pick
to which zone hotplugged memory goes in.
Like if corresponding ACPI node is removable /has _EJ0 method/, pass to ad_memory() as default ZONE_MOVABLE, if hotplugged memory is not removable
set default to ZONE_NORMAL. That way removable memory goes by default to
ZONE_MOVABLE and onlining with echo [online|online_movable] > memoryXX/online makes it go to ZONE_MOVABLE by default without failing onlining memory sections.

Since I've been preemted and don't have time to work on this bug in near future, I'll attach WIP patch that does above and fixes this particular issue.
Whoever picks this up post something upstream pls CC me on it so I could help with review.
Comment 11 Igor Mammedov 2016-04-12 06:11 EDT
Created attachment 1146332 [details]
PATCH add removable memory sections to ZONE_MOVABLE by default.

it's just a proof of concept [i.e. on t ready to post upstream as is]

Note You need to log in before you can comment on or make changes to this bug.