Bug 1308743

Summary: [RFE] PAPR Hash Page Table (HPT) resizing (libvirt)
Product: Red Hat Enterprise Linux 7 Reporter: Karen Noel <knoel>
Component: libvirt Assignee: Andrea Bolognani <abologna>
Status: CLOSED ERRATA QA Contact: Dan Zheng <dzheng>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.3 CC: abologna, bugproxy, dgibson, dyuan, dzheng, gsun, hannsj_uhl, jsuchane, mdeng, michen, qzhang, rbalakri, virt-bugs, virt-maint, xuhan, xuma
Target Milestone: rc Keywords: FutureFeature
Target Release: 7.5   
Hardware: ppc64le   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-3.9.0-3.el7 Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: 1305398
: 1308744 1308746 (view as bug list) Environment:
Last Closed: 2018-04-10 10:36:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1305398    
Bug Blocks: 1248279, 1284775, 1308744, 1308746, 1399177, 1444027, 1469590    

Description Karen Noel 2016-02-15 22:26:47 UTC
This BZ tracks any changes libvirt requires to support HPT resizing. If no changes are required, set this BZ to TestOnly.

+++ This bug was initially created as a clone of Bug #1305398 +++

Description of problem:

Allow the hash page table (HPT) of PAPR guests to be resized at runtime.

This is important for practical memory hotplug.  Without this the HPT needs to be sized for the guest's maximum possible memory - since RHEV wants to set that to 4T, this can result in a much bigger than necessary HPT which wastes host resources and can cause allocation failures.  With HV KVM the HPT is unswappable, contiguous host memory.
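
As a rough illustration of the sizing concern above, the sketch below estimates the HPT size for a given amount of guest RAM. It assumes a sizing rule of about 1/128 of RAM, rounded up to a power of two and clamped to shifts 18..46. That rule and the function name hpt_shift_for_ram are assumptions made for illustration, not something stated in this bug.

```python
def hpt_shift_for_ram(ram_bytes: int) -> int:
    """Return log2 of the HPT size in bytes for a guest with ram_bytes of RAM.

    Assumed rule (not from this bug): HPT = RAM / 128, with RAM rounded up
    to a power of two, clamped to the range 2**18 .. 2**46 bytes.
    """
    if ram_bytes <= 0:
        raise ValueError("ram_bytes must be positive")
    ram_order = (ram_bytes - 1).bit_length()  # ceil(log2(ram_bytes))
    return min(max(ram_order - 7, 18), 46)


GIB = 1 << 30
TIB = 1 << 40

for label, ram in (("1 GiB", 1 * GIB), ("4 TiB", 4 * TIB)):
    shift = hpt_shift_for_ram(ram)
    print(f"{label}: HPT shift {shift}, size {(1 << shift) // (1 << 20)} MiB")
```

Under these assumptions a guest sized for 1 GiB gets an 8 MiB table, while sizing for a 4 TiB maximum would need a 32 GiB table.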

This BZ covers the QEMU parts, including the TCG and PR KVM implementations of the necessary hypercalls, feature negotiation with the guest, and enabling the necessary KVM host pieces.

--- Additional comment from David Gibson on 2016-02-07 19:52:00 EST ---

An RFC has been posted upstream:

https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg05852.html

Comment 2 David Gibson 2016-02-16 02:55:55 UTC
No changes should be needed in libvirt to support HPT resizing per se.

However, (small) changes might be necessary in order for layers above libvirt to control the "resize-hpt" qemu machine property via libvirt.  The property can be "disabled" (don't permit HPT resizing), "enabled" (permit HPT resizing) or "required" (permit HPT resizing and refuse to boot guests which don't support it).

Comment 3 Andrea Bolognani 2016-03-16 15:20:44 UTC
I'm trying to figure out exactly what changes are needed to
libvirt in order to support this.

I assume the interface won't change a great deal before being
merged, maybe just use slightly different names and stuff
like that.

Being a machine property, you'd use it along the lines of

  -machine pseries,resize-hpt=enabled

right? Can the value be altered at runtime, or is it going
to be decided by this command line option alone?

Comment 4 David Gibson 2016-03-21 04:59:10 UTC
> I assume the interface won't change a great deal before being merged, maybe just use slightly different names and stuff like that.

Yes, that's what I'm assuming as well.

> Being a machine property, you'd use it along the lines of
>
>  -machine pseries,resize-hpt=enabled
> 
> right? Can the value be altered at runtime, or is it going to be decided by this command line option alone?

That's right.  The value cannot be changed at runtime; it's set on the command line and then stays that way.
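
A minimal sketch of how a management layer might assemble the -machine value discussed above. The helper name machine_arg and its defaults are hypothetical; only the property name and its three values come from this thread, and this is not how libvirt itself builds the command line.

```python
from typing import Optional

# Values listed for the resize-hpt machine property in comment 2.
VALID_RESIZE_HPT = ("disabled", "enabled", "required")


def machine_arg(machine_type: str = "pseries",
                resize_hpt: Optional[str] = None) -> str:
    """Build the value passed to QEMU's -machine option.

    resize_hpt=None leaves the property off the command line, so QEMU
    applies its own default.
    """
    if resize_hpt is None:
        return machine_type
    if resize_hpt not in VALID_RESIZE_HPT:
        raise ValueError(f"invalid resize-hpt value: {resize_hpt!r}")
    return f"{machine_type},resize-hpt={resize_hpt}"


print("-machine", machine_arg(resize_hpt="enabled"))
```

Because the value is fixed at startup, validating it before launching the guest is the only point where a typo can be caught.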

Comment 7 David Gibson 2017-03-09 03:41:52 UTC
Required qemu and host kernel bugs have been deferred to 7.5, so deferring this one as well.

Comment 8 Andrea Bolognani 2017-11-06 16:09:36 UTC
Patches posted upstream.

  https://www.redhat.com/archives/libvir-list/2017-November/msg00190.html

Comment 9 Andrea Bolognani 2017-11-14 16:14:49 UTC
Fix merged upstream.

commit 85b2ae96dfcf7dc324d6782f64c848fa412443e4
Author: Andrea Bolognani <abologna>
Date:   Mon Nov 6 16:39:40 2017 +0100

    qemu: Enable configuration of HPT resizing for pSeries guests
    
    Most of the time it's okay to leave this up to negotiation between
    the guest and the host, but in some situations it can be useful to
    manually decide the behavior, especially to enforce its availability.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1308743
    
    Signed-off-by: Andrea Bolognani <abologna>
    Reviewed-by: John Ferlan <jferlan>

v3.9.0-130-g85b2ae96d

Comment 12 Dan Zheng 2017-12-29 09:02:03 UTC
Test packages:
libvirt-3.9.0-6.el7.ppc64le
qemu-kvm-rhev-2.10.0-13.el7.ppc64le
kernel-3.10.0-823.el7.ppc64le (host and guest)

Case 1: libvirt reports a reasonable error message when the QEMU version does not support the feature
# rpm -q qemu-kvm-rhev
qemu-kvm-rhev-2.9.0-16.el7_4.13.ppc64le
<features> 
   <hpt resizing='enabled'/>    
</features>

# virsh start vm1
error: Failed to start domain vm1
error: unsupported configuration: HTP resizing is not supported by this QEMU binary

Case 2: Set hpt resizing 'disabled'
Start the guest with the setting below; the guest starts successfully.
<features> 
   <hpt resizing='disabled'/>    
</features>

Qemu line: 
-machine pseries-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off,***resize-hpt=disabled*** 

Check in guest.
[root@localhost ~]# cat /sys/kernel/debug/powerpc/hpt_order
25
[root@localhost ~]# echo 26 > /sys/kernel/debug/powerpc/hpt_order
-bash: echo: write error: No such device

Case 3: Set hpt resizing 'required'
<features> 
   <hpt resizing='required'/>    
</features>

-machine pseries-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off,***resize-hpt=required*** -m size=1048576k,slots=16,maxmem=30670848k

Check in guest:
# cat /sys/kernel/debug/powerpc/hpt_order
23
# echo 22 > /sys/kernel/debug/powerpc/hpt_order
# echo 23 > /sys/kernel/debug/powerpc/hpt_order
# echo 24 > /sys/kernel/debug/powerpc/hpt_order
# dmesg
[    3.098663] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[    3.837938] random: crng init done
[  125.417972] lpar: Attempting to resize HPT to shift 22
[  125.558357] lpar: HPT resize to shift 22 complete (105 ms / 34 ms)
[  147.561932] lpar: Attempting to resize HPT to shift 23
[  147.694366] lpar: HPT resize to shift 23 complete (101 ms / 30 ms)
[  157.689604] lpar: Attempting to resize HPT to shift 24
[  157.807051] lpar: HPT resize to shift 24 complete (103 ms / 13 ms)
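
When repeating this check, the completed resizes can be pulled out of the guest dmesg automatically. The sketch below is one way to do it, assuming the message format shown above; the function name completed_shifts is hypothetical.

```python
import re
from typing import List

# Matches the kernel message printed when a resize finishes.
RESIZE_DONE = re.compile(r"lpar: HPT resize to shift (\d+) complete")


def completed_shifts(dmesg_text: str) -> List[int]:
    """Return the HPT shifts of all completed resizes, in log order."""
    return [int(m.group(1)) for m in RESIZE_DONE.finditer(dmesg_text)]


sample = """\
[  125.417972] lpar: Attempting to resize HPT to shift 22
[  125.558357] lpar: HPT resize to shift 22 complete (105 ms / 34 ms)
[  147.561932] lpar: Attempting to resize HPT to shift 23
[  147.694366] lpar: HPT resize to shift 23 complete (101 ms / 30 ms)
[  157.689604] lpar: Attempting to resize HPT to shift 24
[  157.807051] lpar: HPT resize to shift 24 complete (103 ms / 13 ms)
"""
print(completed_shifts(sample))
```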

Case 4: Without hpt resizing defined, HPT resizing should be enabled by default (needs Andrea's confirmation)

Qemu: -machine pseries-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off

Attach a memory device several times. Check dmesg:

[   40.283171] random: crng init done
[   98.334835] pseries-hotplug-mem: Attempting to hot-add 8 LMB(s) at index 80000004
[   98.334980] lpar: Attempting to resize HPT to shift 21
[   98.458011] lpar: Hash collision while resizing HPT
[   98.458015] Unable to resize hash page table to target order 21: -28
[   98.462263] lpar: Attempting to resize HPT to shift 21
[   98.586851] lpar: Hash collision while resizing HPT
[   98.586856] Unable to resize hash page table to target order 21: -28
[   98.591144] lpar: Attempting to resize HPT to shift 21
[   98.717476] lpar: Hash collision while resizing HPT
[   98.717479] Unable to resize hash page table to target order 21: -28
[   98.721934] lpar: Attempting to resize HPT to shift 21
[   98.848081] lpar: Hash collision while resizing HPT
[   98.848085] Unable to resize hash page table to target order 21: -28
[   98.870321] pseries-hotplug-mem: Memory at 40000000 (drc index 80000004) was hot-added

[  100.952201] pseries-hotplug-mem: Memory at 140000000 (drc index 80000014) was hot-added

[  101.660290] pseries-hotplug-mem: Attempting to hot-add 8 LMB(s) at index 8000001c
[  101.679231] lpar: Attempting to resize HPT to shift 24
[  101.838251] lpar: HPT resize to shift 24 complete (104 ms / 54 ms)
[  101.856865] pseries-hotplug-mem: Memory at 1c0000000 (drc index 8000001c) was ...
[  102.440778] pseries-hotplug-mem: Memory at 2a0000000 (drc index 8000002a) was hot-added

...
[  108.295732] lpar: Attempting to resize HPT to shift 25
[  108.517180] lpar: HPT resize to shift 25 complete (108 ms / 113 ms)
[  108.535865] pseries-hotplug-mem: Memory at 3c0000000 (drc index 8000003c) was hot-added
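
The failed attempts in the log above end with a numeric code. The sketch below extracts each failure and translates the code to its symbolic errno name; on Linux, 28 is ENOSPC. The function name failed_resizes is hypothetical and the message format is assumed to match the lines shown.

```python
import errno
import re
from typing import List, Tuple

RESIZE_FAIL = re.compile(
    r"Unable to resize hash page table to target order (\d+): (-?\d+)")


def failed_resizes(dmesg_text: str) -> List[Tuple[int, str]]:
    """Return (target_order, errno_name) for each failed resize attempt."""
    results = []
    for m in RESIZE_FAIL.finditer(dmesg_text):
        code = abs(int(m.group(2)))  # the kernel logs the negative errno
        results.append((int(m.group(1)), errno.errorcode.get(code, str(code))))
    return results


sample = """\
[   98.458011] lpar: Hash collision while resizing HPT
[   98.458015] Unable to resize hash page table to target order 21: -28
"""
print(failed_resizes(sample))
```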


Case 5: Set hpt resizing 'enabled' + memory hotplug/unhotplug
1. Start guest with setting:
  <maxMemory slots='16' unit='KiB'>30670848</maxMemory>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
..
  <features>
    <hpt resizing='enabled'/>
  </features>
  <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='524288' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='524288' unit='KiB'/>
    </numa>
  </cpu>

qemu: -machine pseries-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off,***resize-hpt=enabled*** -m size=1048576k,slots=16,maxmem=3145728k 


2. Check in guest:
[root@localhost ~]# cat /sys/kernel/debug/powerpc/hpt_order
23

[root@localhost ~]# cat /proc/meminfo |grep Mem
MemTotal:        1027200 kB
MemFree:          616640 kB
MemAvailable:     755328 kB

3. Attach the memory device 12 times
# cat mem.xml 
<memory model='dimm'>
<target>
<size unit='KiB'>2048000</size>
<node>0</node>
</target>
</memory>

# virsh attach-device vm1 mem.xml 
Device attached successfully

(repeat the attach action until it has run 12 times)
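
The repeated attach can be scripted. The sketch below builds the same virsh attach-device invocation shown above 12 times; the helper names attach_commands and run_all are hypothetical, and actually running the commands requires the vm1 guest and mem.xml from this test.

```python
import subprocess
from typing import List


def attach_commands(domain: str, xml_path: str, count: int) -> List[List[str]]:
    """Build `count` identical virsh attach-device command lines."""
    return [["virsh", "attach-device", domain, xml_path] for _ in range(count)]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for argv in attach_commands("vm1", "mem.xml", 12):
        print(" ".join(argv))
    # run_all(attach_commands("vm1", "mem.xml", 12))  # needs a running vm1
```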

# virsh dumpxml vm1|grep memory -A5
<domain type='kvm' id='13'>
  <name>vm1</name>
  <uuid>ff1b78ff-90b7-4f0a-ad91-754de3e29309</uuid>
  <maxMemory slots='16' unit='KiB'>30670848</maxMemory>
  <memory unit='KiB'>11534336</memory>
  <currentMemory unit='KiB'>11534336</currentMemory>
...
    <memory model='dimm'>
      <target>
        <size unit='KiB'>2097152</size>
        <node>0</node>
      </target>
      <alias name='dimm0'/>
...
    <memory model='dimm'>
      <target>
        <size unit='KiB'>2097152</size>
        <node>0</node>
      </target>
      <alias name='dimm11'/>
--
    </memory>
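
Note that mem.xml requests 2048000 KiB while dumpxml reports 2097152 KiB per DIMM, and the guest logs 8 LMBs per hot-add. One plausible reading, assumed here rather than confirmed in this bug, is that the size is rounded up to a multiple of a 256 MiB LMB. The sketch below reproduces that arithmetic; align_to_lmb is a hypothetical helper.

```python
from typing import Tuple

LMB_KIB = 256 * 1024  # assumed LMB size: 256 MiB, expressed in KiB


def align_to_lmb(size_kib: int) -> Tuple[int, int]:
    """Round size_kib up to whole LMBs; return (aligned_kib, lmb_count)."""
    lmbs = -(-size_kib // LMB_KIB)  # ceiling division
    return lmbs * LMB_KIB, lmbs


aligned, lmbs = align_to_lmb(2048000)
print(aligned, lmbs)
```

With the requested size this gives 2097152 KiB in 8 LMBs, matching both the XML and the "hot-add 8 LMB(s)" messages.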

Check dmesg in guest
[   38.083064] random: crng init done
[   69.907160] pseries-hotplug-mem: Attempting to hot-add 8 LMB(s) at index 80000004
[   69.907347] lpar: Attempting to resize HPT to shift 21
[   70.036748] lpar: Hash collision while resizing HPT
[   70.036752] Unable to resize hash page table to target order 21: -28
[   70.041112] lpar: Attempting to resize HPT to shift 21
[   70.155946] lpar: Hash collision while resizing HPT
[   70.155950] Unable to resize hash page table to target order 21: -28
[   70.160306] lpar: Attempting to resize HPT to shift 21
[   70.276591] lpar: Hash collision while resizing HPT
[   70.276595] Unable to resize hash page table to target order 21: -28
[   70.281015] lpar: Attempting to resize HPT to shift 21
[   70.396415] lpar: Hash collision while resizing HPT
[   70.396419] Unable to resize hash page table to target order 21: -28
[   70.418868] pseries-hotplug-mem: Memory at 40000000 (drc index 80000004) was hot-added
...
[   78.613840] pseries-hotplug-mem: Attempting to hot-add 8 LMB(s) at index 8000000c
...
[   99.696549] pseries-hotplug-mem: Attempting to hot-add 8 LMB(s) at index 8000001c
[   99.715087] lpar: Attempting to resize HPT to shift 24      ***
[   99.878489] lpar: HPT resize to shift 24 complete (107 ms / 55 ms)
...
[  118.859630] pseries-hotplug-mem: Attempting to hot-add 8 LMB(s) at index 80000024
...
[  118.896709] pseries-hotplug-mem: Memory at 2b0000000 (drc index 8000002b) was hot-added
[  126.588153] pseries-hotplug-mem: Attempting to hot-add 8 LMB(s) at index 8000002c
...
[  145.176816] pseries-hotplug-mem: Attempting to hot-add 8 LMB(s) at index 8000003c
[  145.197701] lpar: Attempting to resize HPT to shift 25      ***
[  145.404822] lpar: HPT resize to shift 25 complete (105 ms / 101 ms)
[  145.423477] pseries-hotplug-mem: Memory at 3c0000000 (drc index 8000003c) was hot-added

4. Detach the memory device 7 times
# virsh detach-device vm1 mem.xml 
Device detached successfully
...

# virsh dumpxml vm1|grep memory -A5
  <memory unit='KiB'>11534336</memory>
  <currentMemory unit='KiB'>11534336</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
--
      <cell id='0' cpus='0-1' memory='524288' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='524288' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
--
    <memory model='dimm'>
      <target>
        <size unit='KiB'>2097152</size>
        <node>0</node>
      </target>
      <alias name='dimm7'/>
...
    </memory>
...
    <memory model='dimm'>
      <target>
        <size unit='KiB'>2097152</size>
        <node>0</node>
      </target>
      <alias name='dimm11'/>
--
    </memory>

5. Check dmesg
[  659.358504] pseries-hotplug-mem: Attempting to hot-remove 8 LMB(s) at 80000004
[  659.366407] Offlined Pages 4096
[  659.373573] Offlined Pages 4096
[  659.380575] Offlined Pages 4096
[  659.385384] Offlined Pages 4096
[  659.390987] Offlined Pages 4096
[  659.395271] Offlined Pages 4096
[  659.399601] Offlined Pages 4096
[  659.404056] Offlined Pages 4096
[  659.406466] pseries-hotplug-mem: Memory at 40000000 (drc index 80000004) was hot-removed
...
[ 1048.868678] pseries-hotplug-mem: Attempting to hot-remove 8 LMB(s) at 80000014
...
[ 1048.905530] Offlined Pages 4096
[ 1048.911061] Offlined Pages 4096
[ 1048.916390] Offlined Pages 4096
[ 1048.919104] pseries-hotplug-mem: Memory at 140000000 (drc index 80000014) was hot-removed
......
[ 1056.392723] pseries-hotplug-mem: Memory at 2b0000000 (drc index 8000002b) was hot-removed


5. Detach once more. This time the command takes about 10 seconds to return successfully, whereas the previous detach commands returned immediately.
# virsh detach-device vm1 mem.xml 
Device detached successfully

6. Check guest dmesg
[ 1076.485733] pseries-hotplug-mem: Attempting to hot-remove 8 LMB(s) at 8000002c
[ 1076.489603] Offlined Pages 4096
[ 1076.496715] Offlined Pages 4096
[ 1076.501226] Offlined Pages 4096
[ 1076.505658] Offlined Pages 4096
[ 1076.510386] Offlined Pages 4096
[ 1076.514567] Offlined Pages 4096
[ 1076.518609] Offlined Pages 4096
[ 1076.520910] pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs     ----?

7. Check the guest XML: it is the same as above. The memory device below should no longer exist in the guest, but it does.

    <memory model='dimm'>
      <target>
        <size unit='KiB'>2097152</size>
        <node>0</node>
      </target>
      <alias name='dimm7'/>

8. Retest the steps above with hpt resizing disabled: detaching the memory device several times produces the same error:
[  669.111481] pseries-hotplug-mem: Attempting to hot-remove 8 LMB(s) at 8000002c
[  669.113902] Offlined Pages 4096
[  669.118115] Offlined Pages 4096
[  669.122837] Offlined Pages 4096
[  669.129005] Offlined Pages 4096
[  669.134376] Offlined Pages 4096
[  669.139373] Offlined Pages 4096
[  669.141693] pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs

I think this is a problem with memory hot-unplug, not HPT resizing. Maybe it is the problem David mentioned in bug 1305400 comment 7. Please confirm.

Andrea, could you help confirm the questions above in case 4 and case 5? Are there any other tests you would recommend?

Comment 13 Andrea Bolognani 2018-01-02 16:12:08 UTC
(In reply to Dan Zheng from comment #12)
> Case 4: Without hpt resizing defined, the hpt resizing should be enabled as
> default ??? Need Andrea's confirmation

If the user has not configured HPT resizing explicitly, the guest
will use QEMU's default. This happens to be "enabled" at the moment,
and will probably not change going forward, but as usual it's up to
the user to request a feature explicitly if they depend on it.

> Case 5: Set hpt resizing 'enabled' + memory hotplug/unhotplug
> [...]
> I think this is a problem with memory hot-unplug, not HPT resizing. Maybe it
> is the problem David mentioned in bug 1305400 comment 7. Please confirm.

Thanks to your detailed instructions, I managed to reproduce the
issue quite easily. I agree with your assessment that HPT resizing
is probably not the root cause here; moreover, it doesn't look like
libvirt is doing anything particularly wrong here[1], so the problem
is probably in either QEMU or the kernel.

David, have you run into anything similar already?


[1] Except for reporting "Device detached successfully" when the
    guest didn't actually release the device, but that might be
    a known limitation due to the way hot-unplugging works.

Comment 14 David Gibson 2018-01-02 23:17:09 UTC
I concur that HPT resizing isn't really relevant to this failure.

I don't think it's the thing mentioned in bug 1305400 comment 7 either though (I've since confirmed that it's usually not possible to shrink the HPT below the initial size, though for reasons a bit more complicated than stated there).

I think this is just the general problem that memory unplugs aren't guaranteed to work, because something unmovable might have occupied the memory region.  There are some things in the works to improve that, but I don't think we should wait on that to verify this bug.
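
Since an unplug can fail inside the guest even after virsh reports success, a test can check the guest dmesg for the outcome of each hot-remove attempt. The sketch below is one way to do that, assuming the message formats seen in comment 12; the function name hot_remove_outcomes is hypothetical.

```python
import re
from typing import List, Tuple

ATTEMPT = re.compile(r"Attempting to hot-remove \d+ LMB\(s\) at ([0-9a-f]+)")
FAILED = "indexed-count-remove failed"


def hot_remove_outcomes(dmesg_lines: List[str]) -> List[Tuple[str, str]]:
    """Return (drc_index, "ok" | "failed") for each hot-remove attempt.

    An attempt is treated as "ok" unless a failure message follows it
    before the next attempt, so a truncated log can look too optimistic.
    """
    outcomes: List[Tuple[str, str]] = []
    for line in dmesg_lines:
        m = ATTEMPT.search(line)
        if m:
            outcomes.append((m.group(1), "ok"))
        elif FAILED in line and outcomes:
            outcomes[-1] = (outcomes[-1][0], "failed")
    return outcomes


sample = [
    "[  659.358504] pseries-hotplug-mem: Attempting to hot-remove 8 LMB(s) at 80000004",
    "[  659.406466] pseries-hotplug-mem: Memory at 40000000 (drc index 80000004) was hot-removed",
    "[ 1076.485733] pseries-hotplug-mem: Attempting to hot-remove 8 LMB(s) at 8000002c",
    "[ 1076.520910] pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs",
]
print(hot_remove_outcomes(sample))
```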

Comment 15 Dan Zheng 2018-01-05 08:55:29 UTC
Based on comment 13 and comment 14, I am marking this bug as verified.

Comment 19 errata-xmlrpc 2018-04-10 10:36:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704