Bug 1198497

Summary: virsh numatune does not move the already allocated memory
Product: Red Hat Enterprise Linux 6 Reporter: Martin Tessun <mtessun>
Component: libvirt Assignee: Martin Kletzander <mkletzan>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 6.8 CC: dyuan, dzheng, honzhang, jdenemar, jsuchane, lhuang, mzhan, rbalakri, shyu
Target Milestone: rc Keywords: Upstream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-0.10.2-53.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1640869 (view as bug list) Environment:
Last Closed: 2015-07-22 05:48:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Martin Tessun 2015-03-04 09:41:41 UTC
Description of problem:
virsh numatune doesn't migrate memory in a running VM

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Start a VM with strict pinning to NUMA Node 0
2. Change the NUMA node via "virsh numatune <VM> --nodeset 1"

Actual results:
New memory is allocated on the new NUMA node, but memory that was already allocated stays on NUMA node 0.


Expected results:
As the allocation policy is strict, all memory should be moved to NUMA node 1.


Additional info:
This happens because the VM's cgroup has cpuset.memory_migrate set to 0.
If it is set to 1 manually, the memory gets migrated.
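
The manual workaround can be sketched as a small shell function. The function name and the cgroup directory layout are assumptions (for example /cgroup/cpuset/libvirt/qemu/<VM> on a RHEL 6 host using cgconfig); adjust the path to wherever the cpuset controller is mounted.

```shell
# Hypothetical helper: enable page migration for one libvirt guest cgroup.
# $1 = directory of the guest's cpuset cgroup (assumed layout, e.g.
#      /cgroup/cpuset/libvirt/qemu/<VM>).
enable_memory_migrate() {
    cgdir=$1
    knob="$cgdir/cpuset.memory_migrate"
    if [ ! -e "$knob" ]; then
        echo "no cpuset.memory_migrate in $cgdir" >&2
        return 1
    fi
    # With the flag set to 1, the kernel moves already allocated pages
    # when cpuset.mems is changed afterwards.
    echo 1 > "$knob" || return 1
    # Print the resulting value so the caller can confirm the change.
    cat "$knob"
}
```

It would be called as `enable_memory_migrate /cgroup/cpuset/libvirt/qemu/<VM>` before running virsh numatune. With the libcgroup tools installed, `cgset -r cpuset.memory_migrate=1 /libvirt/qemu/<VM>` should have the same effect.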

As the memory nodeset and the CPU set are typically "pinned" to the same NUMA node, memory left on another node slows down the VM whenever that memory is accessed.

Comment 2 Martin Kletzander 2015-03-09 08:03:56 UTC
I remember raising the memory_migrate question upstream, let me find the discussion (if I remember correctly that there was any).  In the meantime, setting it manually to 1 is a valid workaround that will not break libvirt.

Comment 3 Martin Kletzander 2015-03-11 12:50:32 UTC
The discussion mentioned in comment #2 was private and mostly irrelevant, so I just composed a patch and sent it upstream:

https://www.redhat.com/archives/libvir-list/2015-March/msg00586.html

Comment 4 Martin Kletzander 2015-03-20 12:45:24 UTC
Fixed upstream with v1.2.13-250-gba1dfc5..v1.2.13-251-g3a0e5b0: 

commit ba1dfc5b6a65914ec8ceadbcfbe16c17e83cc760
Author: Martin Kletzander <mkletzan>
Date:   Wed Mar 11 11:15:29 2015 +0100

    cgroup: Add accessors for cpuset.memory_migrate
    
commit 3a0e5b0c20815f986ac434e3df67f56d5d1aa44c
Author: Martin Kletzander <mkletzan>
Date:   Wed Mar 11 11:17:15 2015 +0100

    qemu: Migrate memory on numatune change

Comment 10 Jiri Denemark 2015-04-10 11:38:32 UTC
The backport introduces a memory leak.

Comment 13 Luyao Huang 2015-04-13 08:47:37 UTC
I can reproduce this issue with libvirt-0.10.2-51.el6.x86_64:

1. prepare a VM guest whose memory is bound to a host NUMA node in strict mode:
# virsh dumpxml r6
...
  <memory unit='KiB'>40240000</memory>
  <currentMemory unit='KiB'>30240000</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
...

2. start this vm

# virsh start r6
Domain r6 started

3. check cgroup and numa mem

# numastat -p `pidof qemu-kvm`

Per-node process memory usage (in MBs) for PID 31656 (qemu-kvm)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                       196.49            0.00          196.49
Stack                        0.04            0.00            0.04
Private                  30486.73            5.62        30492.35
----------------  --------------- --------------- ---------------
Total                    30683.27            5.62        30688.88

# cgget -g cpuset /libvirt/qemu/r6
/libvirt/qemu/r6:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 0
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 0
cpuset.cpus: 0-31

4. change the node the memory is bound to

# virsh numatune r6 0 1

5. recheck cgroup and numa:
# numastat -p `pidof qemu-kvm`

Per-node process memory usage (in MBs) for PID 31656 (qemu-kvm)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                       196.49            0.00          196.49
Stack                        0.04            0.00            0.04
Private                  30494.73            5.62        30500.35
----------------  --------------- --------------- ---------------
Total                    30691.27            5.62        30696.88

# cgget -g cpuset /libvirt/qemu/r6
/libvirt/qemu/r6:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 0
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 1
cpuset.cpus: 0-31
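
The cgget listing above shows the mismatch behind the bug: cpuset.mems already points at node 1 while cpuset.memory_migrate is still 0. A minimal sketch for pulling single values out of such a listing follows; the function name is hypothetical and the parsing assumes the "name: value" layout shown in this report.

```shell
# Hypothetical helper: read one parameter from `cgget` output on stdin.
# $1 = parameter name, e.g. cpuset.memory_migrate
cgget_value() {
    # Match the first field against "<name>:" and print the value after it.
    awk -v key="$1:" '$1 == key { print $2 }'
}
```

For example, `cgget -g cpuset /libvirt/qemu/r6 | cgget_value cpuset.memory_migrate` would print the current flag for the guest used in this comment.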



Verified this bug with libvirt-0.10.2-53.el6.x86_64:

1. prepare a VM guest whose memory is bound to a host NUMA node in strict mode:
# virsh dumpxml r6
...
  <memory unit='KiB'>40240000</memory>
  <currentMemory unit='KiB'>30240000</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
...

2. start this vm

# virsh start r6
Domain r6 started

3. check cgroup and numa mem:
# cgget -g cpuset /libvirt/qemu/r6
/libvirt/qemu/r6:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 0
cpuset.cpus: 0-31

# numastat -p `pidof qemu-kvm`

Per-node process memory usage (in MBs) for PID 15103 (qemu-kvm)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                       196.49            0.00          196.49
Stack                        0.04            0.00            0.04
Private                     48.64            5.31           53.95
----------------  --------------- --------------- ---------------
Total                      245.17            5.31          250.48

4. run a memory eater in the guest and recheck memory

# numastat -p `pidof qemu-kvm`

Per-node process memory usage (in MBs) for PID 15103 (qemu-kvm)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                       196.49            0.00          196.49
Stack                        0.04            0.00            0.04
Private                  29629.27            5.32        29634.59
----------------  --------------- --------------- ---------------
Total                    29825.80            5.32        29831.12

5. migrate memory via virsh numatune:

# time virsh numatune r6 0 1


real	0m38.960s
user	0m0.030s
sys	0m0.034s

6. recheck cgroup and numa memory

# cgget -g cpuset /libvirt/qemu/r6
/libvirt/qemu/r6:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 1
cpuset.cpus: 0-31

# numastat -p `pidof qemu-kvm`

Per-node process memory usage (in MBs) for PID 15103 (qemu-kvm)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.00          196.49          196.49
Stack                        0.00            0.04            0.04
Private                      0.04        29638.54        29638.58
----------------  --------------- --------------- ---------------
Total                        0.04        29835.07        29835.11
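
The check in step 6 can also be scripted. The following is only a sketch: the function names and the 1 MB threshold are assumptions, and the parsing assumes the two-node numastat layout shown in this comment.

```shell
# Hypothetical helper: print the "Total" row of `numastat -p` output
# as "node0_MB node1_MB" (two-node host assumed, as in this report).
numastat_totals() {
    awk '$1 == "Total" && NF >= 4 { print $2, $3 }'
}

# Succeeds when less than 1 MB is left on node 0, i.e. the guest memory
# has effectively been migrated away from node 0. Fails on empty input.
node0_is_empty() {
    numastat_totals | awk '{ ok = ($1 < 1.0) } END { exit(ok ? 0 : 1) }'
}
```

Usage would look like `numastat -p "$(pidof qemu-kvm)" | node0_is_empty && echo migrated`, assuming a single qemu-kvm process on the host.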


7. also tested with numad and placement='auto'; it works well

Comment 15 errata-xmlrpc 2015-07-22 05:48:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1252.html