Bug 1476380 - Fedora 26 ppc64le guest with MEMORY_HOTPLUG_DEFAULT_ONLINE=y gets a "kernel BUG at mm/memory_hotplug.c:2185" when hotplugging LMBs with QEMU upstream
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 26
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: PPCTracker F-ExcludeArch-ppc64le, PPC64LETracker
Reported: 2017-07-28 18:36 UTC by Daniel Henrique Barboza
Modified: 2017-08-23 06:08 UTC (History)
Last Closed: 2017-08-23 06:08:45 UTC


External Trackers
Tracker                      ID      Priority  Status  Summary  Last Updated
IBM Linux Technology Center  157130  None      None    None     2017-07-30 15:33 UTC

Description Daniel Henrique Barboza 2017-07-28 18:36:05 UTC
- Host information: CentOS 7 running upstream QEMU:

$ cat /proc/cpuinfo
processor	: 0
cpu		: POWER8 (raw), altivec supported
clock		: 2061.000000MHz
revision	: 2.0 (pvr 004d 0200)
(...)
$ uname -a
Linux ltc-hab1.aus.stglabs.ibm.com 4.11.0-7.gitd255e14.el7.centos.ppc64le #1 SMP Wed Jul 26 11:46:31 BRT 2017 ppc64le ppc64le ppc64le GNU/Linux

- qemu command line that launched the F26 ppc64le guest:

[danielhb@ltc-hab1 ppc64-softmmu]$ sudo ./qemu-system-ppc64 -name migrate_qemu -boot strict=on --enable-kvm -device nec-usb-xhci,id=usb,bus=pci.0,addr=0xf -device spapr-vscsi,id=scsi0,reg=0x2000 -smp 1,maxcpus=4,sockets=4,cores=1,threads=1 --machine pseries,accel=kvm,usb=off,dump-guest-core=off -m 4G,slots=32,maxmem=32G -drive file=/home/danielhb/vm_imgs/f26.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -nographic 


- guest information: Fedora 26 ppc64le:

[danielhb@localhost ~]$ uname -a
Linux localhost.localdomain 4.11.11-300.fc26.ppc64le #1 SMP Mon Jul 17 16:14:56 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
[danielhb@localhost ~]$ 


- Problem: hotplugging an LMB triggers a guest kernel Oops:

[danielhb@localhost ~]$ QEMU 2.9.90 monitor - type 'help' for more information
(qemu) 
(qemu) object_add memory-backend-ram,id=ram1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=ram1
(qemu) [  220.681979] kernel BUG at mm/memory_hotplug.c:2185!
[  220.682289] Oops: Exception in kernel mode, sig: 5 [#1]
[  220.682508] SMP NR_CPUS=1024 
[  220.682509] NUMA 
[  220.682680] pSeries
[  220.682920] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc ghash_generic vmx_crypto xfs libcrc32c bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_blk ttm drm ibmveth ibmvscsi scsi_transport_srp crc32c_vpmsum virtio_pci virtio_ring virtio i2c_core
[  220.685274] CPU: 0 PID: 47 Comm: kworker/u8:1 Not tainted 4.11.11-300.fc26.ppc64le #1
[  220.685603] Workqueue: pseries hotplug workque pseries_hp_work_fn
[  220.685851] task: c0000000fed5ff00 task.stack: c0000000f2e08000
[  220.686097] NIP: c00000000038cd10 LR: c00000000038cc80 CTR: 0000000000000000
[  220.686362] REGS: c0000000f2e0b770 TRAP: 0700   Not tainted  (4.11.11-300.fc26.ppc64le)
[  220.686650] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[  220.686657]   CR: 42002422  XER: 20000000
[  220.687059] CFAR: c00000000038cc88 SOFTE: 1 
[  220.687059] GPR00: c00000000038cc80 c0000000f2e0b9f0 c0000000013a1900 0000000000000001 
[  220.687059] GPR04: c0000000f2042480 c0000000ffe1f3f0 000000000000003e 0000000000000003 
[  220.687059] GPR08: 0000000000000002 0000000000000003 0000000000000003 303078302d303030 
[  220.687059] GPR12: 0000000000002200 c00000000fdc0000 c0000000001218a8 c0000000fe18bcc0 
[  220.687059] GPR16: 0000000000000000 0000000000000001 0000000000000004 0000000000000001 
[  220.687059] GPR20: c0000000fffffc30 0000000000000010 c0000000fa314184 0000000000000010 
[  220.687059] GPR24: c0000000fffffea0 c0000000f3ba18e0 c0000000fffffc30 0000000000000000 
[  220.687059] GPR28: 0000000000000001 c0000000f2042438 0000000010000000 0000000100000000 
[  220.689625] NIP [c00000000038cd10] remove_memory+0x100/0x110
[  220.689848] LR [c00000000038cc80] remove_memory+0x70/0x110
[  220.690049] Call Trace:
[  220.690166] [c0000000f2e0b9f0] [c00000000038cc80] remove_memory+0x70/0x110 (unreliable)
[  220.690463] [c0000000f2e0ba40] [c0000000000b7e10] dlpar_add_lmb+0x280/0x480
[  220.690711] [c0000000f2e0bb20] [c0000000000b955c] dlpar_memory+0xa5c/0xe50
[  220.690959] [c0000000f2e0bbe0] [c0000000000b0958] handle_dlpar_errorlog+0xf8/0x160
[  220.691250] [c0000000f2e0bc50] [c0000000000b0a54] pseries_hp_work_fn+0x94/0xa0
[  220.691542] [c0000000f2e0bc80] [c000000000117824] process_one_work+0x234/0x570
[  220.691833] [c0000000f2e0bd20] [c000000000117bf8] worker_thread+0x98/0x650
[  220.692081] [c0000000f2e0bdc0] [c000000000121a4c] kthread+0x1ac/0x1c0
[  220.692329] [c0000000f2e0be30] [c00000000000bc60] ret_from_kernel_thread+0x5c/0x7c
[  220.692618] Instruction dump:
[  220.692777] 387d0060 487ac425 60000000 eba10038 38210050 e8010010 eb61ffd8 eb81ffe0 
[  220.693075] ebc1fff0 7c0803a6 ebe1fff8 4e800020 <0fe00000> 00000000 00000000 00000000 
[  220.693386] ---[ end trace 8bb8d70f889bfce2 ]---
[  220.698108] 

While investigating the cause, I found this kernel commit:

commit 943db62c316c578f8e2cc6fb81a5f641096b29bf
Author: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Date:   Wed Feb 15 13:45:30 2017 -0500

    powerpc/pseries: Revert 'Auto-online hotplugged memory'
    
    This reverts commit ec999072442a ("powerpc/pseries: Auto-online
    hotplugged memory"), and 9dc512819e4b ("powerpc: Fix unused function
    warning 'lmb_to_memblock'").
    
    Using the auto-online acpability does online added memory but does not
    update the associated device struct to indicate that the memory is
    online. This causes the pseries memory DLPAR code to fail when trying to
    remove a LMB that was previously removed and added back. This happens
    when validating that the LMB is removable.
    
    This patch reverts to the previous behavior of calling device_online()
    to online the LMB when it is DLPAR added and moves the lmb_to_memblock()
    routine out of CONFIG_MEMORY_HOTREMOVE now that we call it for add.


As part of the revert, this commit removed one option from the pseries defconfig:

--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -58,7 +58,6 @@ CONFIG_KEXEC_FILE=y
 CONFIG_IRQ_ALL_CPUS=y
 CONFIG_MEMORY_HOTPLUG=y
 CONFIG_MEMORY_HOTREMOVE=y
-CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
 CONFIG_KSM=y
 CONFIG_TRANSPARENT_HUGEPAGE=y


With both the vanilla kernel from Linus and the one from git://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git, I get the following default config for pseries:

[danielhb@arthas linux]$ ARCH=powerpc make pseries_defconfig
#
# configuration written to .config
#
[danielhb@arthas linux]$ grep -R 'HOTPLUG_DEFAULT' .
./mm/Kconfig:config MEMORY_HOTPLUG_DEFAULT_ONLINE
./mm/memory_hotplug.c:#ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
./.config:# CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
./Documentation/admin-guide/kernel-parameters.txt:			CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config
./Documentation/memory-hotplug.txt:The default depends on the CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config
[danielhb@arthas linux]$ 


As the grep result shows, the .config was generated without the CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE option, and nothing else in the tree sets it to Y.

In mm/Kconfig we have:

(...)
config MEMORY_HOTPLUG_DEFAULT_ONLINE
        bool "Online the newly added memory blocks by default"
        default n
        depends on MEMORY_HOTPLUG
        help
(...)  

This shows that the option defaults to N, which is consistent with the change made in the patch: when the option is absent from .config, the auto_online_blocks feature is disabled.

However, the F26 ppc64le kernel is setting this option to Y:

[danielhb@localhost ~]$ cat /boot/config-4.11.11-300.fc26.ppc64le | grep HOTPLUG_DEFAULT
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
[danielhb@localhost ~]$ 
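
The check above can be wrapped in a small helper that tells an explicit "=y" apart from "is not set" or a missing entry. This is only a sketch: the function name kconfig_state is made up for illustration and is not part of any kernel or Fedora tooling.

```shell
# Hypothetical helper (not an existing tool): print the state of a Kconfig
# option as recorded in a kernel config file.
# Prints the value after "=" (for example "y"), or "not set" when the option
# is commented out or absent from the file.
kconfig_state() {
    local file=$1 option=$2 line
    # Match either "CONFIG_FOO=..." or "# CONFIG_FOO is not set";
    # if the option appears more than once, the last entry wins.
    line=$(grep -E "^(# )?CONFIG_${option}(=| is not set)" "$file" | tail -n 1) || true
    case $line in
        "CONFIG_${option}="*) printf '%s\n' "${line#*=}" ;;
        *) printf 'not set\n' ;;
    esac
}
```

On the guest, it would be called as: kconfig_state /boot/config-4.11.11-300.fc26.ppc64le MEMORY_HOTPLUG_DEFAULT_ONLINE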


To find where this option is being set to Y, I searched the src.rpm of the guest kernel prior to the build and found the following:


[danielhb@localhost rpmbuild]$ grep -R 'MEMORY_HOTPLUG_DEFAULT_ONLINE' .
./SOURCES/kernel-x86_64.config:CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
./SOURCES/kernel-ppc64-debug.config:CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
./SOURCES/kernel-ppc64.config:CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
./SOURCES/kernel-ppc64le-debug.config:CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
./SOURCES/kernel-ppc64le.config:CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
./SOURCES/kernel-ppc64p7-debug.config:CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
./SOURCES/kernel-ppc64p7.config:CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
./SOURCES/kernel-s390x-debug.config:CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
./SOURCES/kernel-s390x.config:CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
./SOURCES/kernel-x86_64-debug.config:CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
./SPECS/kernel.spec:- Enable CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE (rhbz 1339281)
[danielhb@localhost rpmbuild]$ 


The bug https://bugzilla.redhat.com/show_bug.cgi?id=1339281 that is referenced in SPECS/kernel.spec asks for the option to be enabled:

"To make things work automagically CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE option was introduced. I'd like to have it enabled in Fedora."


I am not sure whether the intent was to enable it only for x86 or for all architectures but, as is, it breaks memory hotplug on pseries after the kernel commit 943db62c316c578f8e2cc6fb81a5f641096b29bf mentioned above. Given that the option defaults to N when it is not set, my suggestion is to change MEMORY_HOTPLUG_DEFAULT_ONLINE to 'not set' in every ppc64 config file in the Fedora build, following the defconfig we have in the vanilla kernel.
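
To illustrate the suggested change, here is a minimal sketch that rewrites the option to 'not set' in a list of config files. The function name kconfig_unset is hypothetical, GNU sed is assumed for the -i flag, and the actual Fedora packaging may apply the change somewhere other than these generated files.

```shell
# Hypothetical helper: turn "CONFIG_<option>=<value>" into
# "# CONFIG_<option> is not set" in every file given after the option name.
# Files that do not set the option are left unchanged.
# Assumes GNU sed (in-place editing with -i).
kconfig_unset() {
    local option=$1 file
    shift
    for file in "$@"; do
        sed -i "s/^CONFIG_${option}=.*/# CONFIG_${option} is not set/" "$file"
    done
}
```

Run from the rpmbuild directory, the ppc64 files listed by the grep above would be covered by: kconfig_unset MEMORY_HOTPLUG_DEFAULT_ONLINE SOURCES/kernel-ppc64*.config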


- Workarounds:

The most obvious one: if I recompile the F26 kernel without this option (or with it set to 'n'), LMB hotplug works.

Another possible workaround, further documented in the kernel commits that introduced the CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE option, is to manually set auto_online_blocks to 'offline'. Doing that prior to the hotplug avoids the kernel Oops:

[root@localhost danielhb]# 
[root@localhost danielhb]# cat /sys/devices/system/memory/auto_online_blocks
online
[root@localhost danielhb]# echo offline > /sys/devices/system/memory/auto_online_blocks
[root@localhost danielhb]# 
[root@localhost danielhb]# QEMU 2.9.90 monitor - type 'help' for more information
(qemu) 
(qemu) object_add memory-backend-ram,id=ram1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=ram1
(qemu) 

[root@localhost danielhb]# grep Mem /proc/meminfo
MemTotal:        5219136 kB
MemFree:         4604928 kB
MemAvailable:    4849856 kB

[root@localhost danielhb]# dmesg | tail -n 5
[   75.199061] Unable to resize hash page table to target order 23: -28
[   75.208096] pseries-hotplug-mem: Memory at 100000000 (drc index 80000010) was hot-added
[   75.208097] pseries-hotplug-mem: Memory at 110000000 (drc index 80000011) was hot-added
[   75.208099] pseries-hotplug-mem: Memory at 120000000 (drc index 80000012) was hot-added
[   75.208100] pseries-hotplug-mem: Memory at 130000000 (drc index 80000013) was hot-added
[root@localhost danielhb]# 

As we can see, memory hotplug works normally once auto_online_blocks is manually set to 'offline'.
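
The runtime workaround can be sketched as a small function that writes the policy and reads it back to confirm it took effect. The function name set_auto_online is hypothetical; the optional second argument exists only so that the function can be tried against a regular file, and on a real guest the default sysfs path requires root.

```shell
# Hypothetical helper: set the memory auto-online policy and verify it.
# $1: "online" or "offline" (the two values shown in this report)
# $2: optional path to the knob; defaults to the sysfs file used above
set_auto_online() {
    local policy=$1
    local knob=${2:-/sys/devices/system/memory/auto_online_blocks}
    case $policy in
        online|offline) ;;
        *) echo "usage: set_auto_online online|offline [path]" >&2; return 2 ;;
    esac
    printf '%s\n' "$policy" > "$knob" || return 1
    # Read the value back; succeed only if the policy was accepted.
    [ "$(cat "$knob")" = "$policy" ]
}
```

Calling set_auto_online offline as root before running device_add corresponds to the echo command in the transcript above. The setting is not persistent across reboots, so it would have to be reapplied on each boot.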


Let me know if you need any extra information about the issue or the tests.


Thanks,


Daniel

Comment 1 Josh Boyer 2017-07-30 13:47:59 UTC
We can turn this off, but if the function truly doesn't work on ppc64/ppc64le it should likely be prevented from being selected at all in the Kconfig values.  I don't see why upstream would want to even allow someone to select a broken config option.

Have you reported this upstream as well?

Comment 2 Josh Boyer 2017-07-30 13:58:39 UTC
I've disabled this on f25-rawhide.

Comment 3 IBM Bug Proxy 2017-07-30 16:00:24 UTC
------- Comment From hannsj_uhl@de.ibm.com 2017-07-30 11:52 EDT-------
*** Bug 157122 has been marked as a duplicate of this bug. ***

Comment 4 Daniel Henrique Barboza 2017-07-31 20:14:28 UTC
(In reply to Josh Boyer from comment #1)
> We can turn this off, but if the function truly doesn't work on
> ppc64/ppc64le it should likely be prevented from being selected at all in
> the Kconfig values.  I don't see why upstream would want to even allow
> someone to select a broken config option.
> 
> Have you reported this upstream as well?

Good point. I've sent an RFC patch to start the discussion on the linuxppc-dev kernel mailing list:

https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-August/161316.html

Meanwhile, I appreciate it being disabled in the next F26 kernel versions.

Comment 5 Fedora Update System 2017-08-08 21:09:08 UTC
kernel-4.12.5-200.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-f98cef571d

Comment 6 Laurent Vivier 2017-08-10 16:32:31 UTC
A patch to fix the broken part in ppc64 has been proposed upstream, see BZ1478057 comment 1.

https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-August/161429.html

Comment 7 Fedora Update System 2017-08-13 00:59:25 UTC
kernel-4.12.5-200.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-f98cef571d

Comment 8 IBM Bug Proxy 2017-08-14 09:30:24 UTC
------- Comment From hannsj_uhl@de.ibm.com 2017-08-14 05:22 EDT-------
(In reply to comment #10)
> A patch to fix the broken part in ppc64 has been proposed upstream, see
> BZ1478057 comment 1.
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-August/161429.html
.
... which is now upstream accepted as git commit
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=1a367063ca0c1c6f6f54b5abd7b4836b0866a07b
("powerpc/pseries: Check memory device state before onlining/offlining")
...

Comment 9 Dan Horák 2017-08-14 09:53:49 UTC
Do I read it right that a proper fix has been applied in the upstream kernel and thus our workaround [1] in the kernel config can be reverted?

[1] http://pkgs.fedoraproject.org/rpms/kernel/c/d1d51e73b5abeb2bf0aab53a8eb288300a9ee9c7?branch=master

Comment 10 Fedora Update System 2017-08-18 15:59:29 UTC
kernel-4.12.8-200.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-73f71456d7

Comment 11 Fedora Update System 2017-08-21 01:21:35 UTC
kernel-4.12.8-200.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-73f71456d7

Comment 12 Fedora Update System 2017-08-23 06:08:45 UTC
kernel-4.12.8-200.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

