Bug 1733106 - [ppc] While running LTP cpuset_hotplug (runtest), Processor 1 is stuck and task irqbalance:4305 is blocked for more than 120 seconds
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.1
Hardware: ppc64le
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: David Gibson
QA Contact: zhenyzha
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-07-25 08:41 UTC by zhenyzha
Modified: 2019-08-29 08:23 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-29 08:23:13 UTC
Type: Bug
Target Upstream Version:


Attachments
this is guest /var/log (18.09 MB, application/x-tar)
2019-07-25 08:43 UTC, zhenyzha

Description zhenyzha 2019-07-25 08:41:26 UTC
Description of problem:
Installed a RHEL 8.1 guest (kernel 4.18.0-119.el8.ppc64le) and built LTP from: git clone https://github.com/linux-test-project/ltp.git
While running the LTP cpuset_hotplug test (runtest), the guest kernel reports "Processor 1 is stuck." and "task irqbalance:4305 blocked for more than 120 seconds".

Version-Release number of selected component (if applicable):
host:
# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  4
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        2
Model:               2.3 (pvr 004e 1203)
Model name:          POWER9, altivec supported
CPU max MHz:         3800.0000
CPU min MHz:         2300.0000
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-63
NUMA node8 CPU(s):   64-127
# update_flash_nv -d

Firmware version:
 Product Version       : witherspoon-ibm-OP9-v2.2-3.5
 Product Extra         : bmc-firmware-version-2.03
 Product Extra         : buildroot-2019.02.1-16-ge01dcd0
 Product Extra         : capp-ucode-p9-dd2-v4
 Product Extra         : hcode-hw040319a.940
 Product Extra         : hostboot-e5622fb
 Product Extra         : hostboot-binaries-hw021419a.930
 Product Extra         : linux-5.0.5-openpower1-p4b42b5c
 Product Extra         : machine-xml-e3e9aef
 Product Extra         : occ-58e422d
 Product Extra         : petitboot-v1.10.3
 Product Extra         : sbe-1410677
 Product Extra         : skiboot-v6.3-rc1-p1ce8930

# uname -r
4.18.0-120.el8.ppc64le
# /usr/libexec/qemu-kvm -version
QEMU emulator version 4.0.0 (qemu-kvm-4.0.0-5.module+el8.1.0+3622+5812d9bf)

guest:
# lscpu
Architecture:         ppc64le
Byte Order:           Little Endian
CPU(s):               64
On-line CPU(s) list:  0,2-63
Off-line CPU(s) list: 1
Thread(s) per core:   1
Core(s) per socket:   1
Socket(s):            32
NUMA node(s):         1
Model:                2.3 (pvr 004e 1203)
Model name:           POWER9 (architected), altivec supported
Hypervisor vendor:    KVM
Virtualization type:  para
L1d cache:            32K
L1i cache:            32K
NUMA node0 CPU(s):    0,2-63

# uname -r
4.18.0-119.el8.ppc64le
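
The off-line CPU 1 in the guest lscpu output above can also be checked, or toggled by hand, through the kernel's sysfs CPU hotplug files. The helpers below are only a sketch for reproducing the offline/online step outside LTP; the function names and the CPU_SYSFS override are made up for illustration and are not part of LTP or of this report.

```shell
# Sketch: inspect and toggle a CPU's hotplug state through sysfs.
# CPU_SYSFS is a hypothetical override so the helpers can be tried on a
# scratch directory instead of the real /sys tree.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# Print "online", "offline" or "unknown" for CPU number $1.
cpu_state() {
    cpu_file="$CPU_SYSFS/cpu$1/online"
    if [ ! -r "$cpu_file" ]; then
        echo unknown
        return 0
    fi
    if [ "$(cat "$cpu_file")" = "1" ]; then
        echo online
    else
        echo offline
    fi
}

# Set CPU $1 online ($2 = 1) or offline ($2 = 0). Needs root on a real system.
cpu_set_online() {
    echo "$2" > "$CPU_SYSFS/cpu$1/online"
}
```

For example, `cpu_set_online 1 0; cpu_state 1; cpu_set_online 1 1` takes CPU 1 down and brings it back. On an affected guest the offline step would be expected to trigger the "Processor 1 is stuck." message, although this report only exercised it through the LTP test.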

How reproducible:
3/3



Steps to Reproduce:
1. Install a RHEL 8.1 (4.18.0-119.el8.ppc64le) guest.
2. Boot the guest:
# /usr/libexec/qemu-kvm \
> -name zhenyzha-RHEL-8.1 \
> -sandbox off \
> -machine pseries,cap-nested-hv=on \
> -m 120G \
> -nodefaults \
> -vga std \
> -device nec-usb-xhci,id=xhci \
> -device usb-tablet,id=usb-tablet0 \
> -device usb-kbd,id=usb-kbd0 \
> -smp 64,cores=2,threads=1,sockets=32 \
> -vnc :30 \
> -serial stdio \
> -rtc base=utc,clock=host \
> -boot order=cdn,menu=off,strict=off \
> -enable-kvm \
> -qmp unix:/var/tmp/qmp-monitor-zhenyzha,server,nowait \
> -qmp tcp:0:3001,server,nowait \
> -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup \
> -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=40:f2:e9:5d:9c:07 \
> -device virtio-scsi-pci,bus=pci.0,addr=0x06,id=scsi-pci-0 \
> -drive id=my0,format=qcow2,media=disk,if=none,file=os.qcow2 \
> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=my0,id=virtio0-0-1,bootindex=0

3. Build LTP on the guest:
# git clone https://github.com/linux-test-project/ltp.git
# cd ltp
# make autotools
# ./configure
# make
# make install
# cd /opt/ltp

4. Run the LTP controllers test suite:
# /opt/ltp/runltp -f controllers
The guest /var/log/messages shows:
Jul 25 10:42:30 localhost LTP: starting cpuset_hotplug (cpuset_hotplug_test.sh)
Jul 25 10:42:30 localhost kernel: cpu 1 (hwid 8) Ready to die...
Jul 25 10:42:30 localhost systemd[1]: Started /usr/lib/udev/kdump-udev-throttler.
Jul 25 10:42:31 localhost kernel: cpu 1 (hwid 8) Ready to die...
Jul 25 10:42:31 localhost kernel: Querying DEAD? cpu 1 (8) shows 2
Jul 25 10:42:32 localhost kdump-udev-throttler[41941]: kexec: unloaded kdump kernel
Jul 25 10:42:32 localhost kdump-udev-throttler[41941]: Stopping kdump: [OK]
Jul 25 10:42:32 localhost kdump-udev-throttler[41941]: Modified cmdline:BOOT_IMAGE=/vmlinuz-4.18.0-119.el8.ppc64le ro console=ttyS0,115200 biosdevname=0 net.ifnames=0 rhgb console=tty0 irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 rootflags=nofail kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd elfcorehdr=158272K
Jul 25 10:42:32 localhost kdump-udev-throttler[41941]: kexec: loaded kdump kernel
Jul 25 10:42:32 localhost kdump-udev-throttler[41941]: Starting kdump: [OK]
Jul 25 10:45:01 localhost kernel: Processor 1 is stuck.
Jul 25 10:47:31 localhost kernel: INFO: task irqbalance:4305 blocked for more than 120 seconds.    <-- blocked
Jul 25 10:47:31 localhost kernel:      Not tainted 4.18.0-119.el8.ppc64le #1
Jul 25 10:47:31 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 25 10:47:31 localhost kernel: irqbalance      D    0  4305      1 0x00040080
Jul 25 10:47:31 localhost kernel: Call Trace:
Jul 25 10:47:31 localhost kernel: [c000001d967df850] [c000001d75582f80] 0xc000001d75582f80 (unreliable)
Jul 25 10:47:31 localhost kernel: [c000001d967dfa20] [c00000000001fa10] __switch_to+0x2e0/0x4e0
Jul 25 10:47:31 localhost kernel: [c000001d967dfa80] [c000000000d411c4] __schedule+0x2c4/0xb20
Jul 25 10:47:31 localhost kernel: [c000001d967dfb50] [c000000000d41a68] schedule+0x48/0xb0
Jul 25 10:47:31 localhost kernel: [c000001d967dfb70] [c000000000d42060] schedule_preempt_disabled+0x20/0x30
Jul 25 10:47:31 localhost kernel: [c000001d967dfb90] [c000000000d43ac8] __mutex_lock.isra.1+0x3b8/0x6f0
Jul 25 10:47:31 localhost kernel: [c000001d967dfc20] [c0000000001e18f8] irq_lock_sparse+0x28/0x40
Jul 25 10:47:31 localhost kernel: [c000001d967dfc40] [c0000000001ee54c] show_interrupts+0x18c/0x550
Jul 25 10:47:31 localhost kernel: [c000001d967dfd00] [c00000000050b66c] seq_read+0x1bc/0x640
Jul 25 10:47:31 localhost kernel: [c000001d967dfda0] [c000000000594264] proc_reg_read+0x84/0x100
Jul 25 10:47:31 localhost kernel: [c000001d967dfdd0] [c0000000004c489c] sys_read+0x10c/0x310
Jul 25 10:47:31 localhost kernel: [c000001d967dfe30] [c00000000000b388] system_call+0x5c/0x70
Jul 25 10:47:31 localhost kernel: Processor 1 is stuck.
Jul 25 10:49:02 localhost rhsmd[42188]: In order for Subscription Manager to provide your system with updates, your system must be registered with the Customer Portal. Please enter your Red Hat login to ensure your system is up-to-date.
Jul 25 10:50:01 localhost kernel: Processor 1 is stuck.
Jul 25 10:50:02 localhost systemd[1]: Starting system activity accounting tool...
Jul 25 10:50:02 localhost systemd[1]: Started system activity accounting tool.
Jul 25 10:52:32 localhost kernel: Processor 1 is stuck.
Jul 25 10:55:03 localhost kernel: Processor 1 is stuck.
Jul 25 10:57:33 localhost kernel: Processor 1 is stuck.
Jul 25 10:59:48 localhost kernel: INFO: task irqbalance:4305 blocked for more than 120 seconds.
Jul 25 10:59:48 localhost kernel:      Not tainted 4.18.0-119.el8.ppc64le #1
Jul 25 10:59:48 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 25 10:59:48 localhost kernel: irqbalance      D    0  4305      1 0x00040080
Jul 25 10:59:48 localhost kernel: Call Trace:
Jul 25 10:59:48 localhost kernel: [c000001d967df850] [000000010005b820] 0x10005b820 (unreliable)
Jul 25 10:59:48 localhost kernel: [c000001d967dfa20] [c00000000001fa10] __switch_to+0x2e0/0x4e0
Jul 25 10:59:48 localhost kernel: [c000001d967dfa80] [c000000000d411c4] __schedule+0x2c4/0xb20
Jul 25 10:59:48 localhost kernel: [c000001d967dfb50] [c000000000d41a68] schedule+0x48/0xb0
Jul 25 10:59:48 localhost kernel: [c000001d967dfb70] [c000000000d42060] schedule_preempt_disabled+0x20/0x30
Jul 25 10:59:48 localhost kernel: [c000001d967dfb90] [c000000000d43ac8] __mutex_lock.isra.1+0x3b8/0x6f0
Jul 25 10:59:48 localhost kernel: [c000001d967dfc20] [c0000000001e18f8] irq_lock_sparse+0x28/0x40
Jul 25 10:59:48 localhost kernel: [c000001d967dfc40] [c0000000001ee54c] show_interrupts+0x18c/0x550
Jul 25 10:59:48 localhost kernel: [c000001d967dfd00] [c00000000050b66c] seq_read+0x1bc/0x640
Jul 25 10:59:48 localhost kernel: [c000001d967dfda0] [c000000000594264] proc_reg_read+0x84/0x100
Jul 25 10:59:48 localhost kernel: [c000001d967dfdd0] [c0000000004c489c] sys_read+0x10c/0x310
Jul 25 10:59:48 localhost kernel: [c000001d967dfe30] [c00000000000b388] system_call+0x5c/0x70
Jul 25 11:00:03 localhost kernel: Processor 1 is stuck.
Jul 25 11:00:04 localhost systemd[1]: Starting system activity accounting tool...
Jul 25 11:00:04 localhost LTP: starting cpuset_memory (cpuset_memory_testset.sh)
Jul 25 11:00:04 localhost systemd[1]: Started system activity accounting tool.
Jul 25 11:00:04 localhost LTP: starting cpuset_memory_pressure (cpuset_memory_pressure_testset.sh)
Jul 25 11:00:04 localhost LTP: starting cpuset_memory_spread (cpuset_memory_spread_testset.sh)
Jul 25 11:00:04 localhost LTP: starting cpuset_regression_test (cpuset_regression_test.sh)
Jul 25 11:00:04 localhost LTP: starting cgroup_xattr
Jul 25 11:00:04 localhost kernel: new mount options do not match the existing superblock, will be ignored
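
To gauge quickly how often the two symptoms appear in a log like the one above, the messages can be counted with grep. This is a minimal sketch; the function name is hypothetical and the patterns are taken from the log lines quoted in this report.

```shell
# Sketch: count "Processor N is stuck" and hung-task messages in a log file.
hotplug_hang_summary() {
    log_file="$1"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    stuck_count=$(grep -c 'kernel: Processor [0-9]* is stuck' "$log_file" || true)
    hung_count=$(grep -c 'blocked for more than [0-9]* seconds' "$log_file" || true)
    echo "stuck=$stuck_count hung_tasks=$hung_count"
}
```

Usage on the guest: `hotplug_hang_summary /var/log/messages`.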

# cat /opt/ltp/results/LTP_RUN_ON-2019_07_25-09h_57m_04s.log | grep FAIL
memcg_max_usage_in_bytes                           FAIL       2
memcg_stat                                         FAIL       1
memcg_use_hierarchy                                FAIL       1
cpuset_hotplug                                     FAIL       1
cpuset_regression_test                             FAIL       1
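
If only the names of the failed test cases are wanted, the results log can be filtered by column instead of with a plain grep. The sketch below assumes the three-column layout shown above (test case, result, exit value); the function name is hypothetical.

```shell
# Sketch: print the name of every test case whose result column is FAIL.
# Assumes the layout "name  result  exit-value" used in the LTP results log.
ltp_failed_tests() {
    awk '$2 == "FAIL" { print $1 }' "$1"
}
```

Usage: `ltp_failed_tests /opt/ltp/results/LTP_RUN_ON-2019_07_25-09h_57m_04s.log`. Matching on the second column avoids picking up lines that merely contain the string FAIL elsewhere.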

Actual results:
The guest kernel repeatedly logs "Processor 1 is stuck." and "task irqbalance:4305 blocked for more than 120 seconds"; the cpuset_hotplug test fails.

Expected results:
The guest runs the test without a stuck CPU or any call trace.

Additional info:

Comment 1 zhenyzha 2019-07-25 08:43:40 UTC
Created attachment 1593342 [details]
this is guest /var/log

Comment 3 Laurent Vivier 2019-07-26 04:59:18 UTC
(In reply to zhenyzha from comment #0)
...
> # cat /opt/ltp/results/LTP_RUN_ON-2019_07_25-09h_57m_04s.log | grep FAIL
> memcg_max_usage_in_bytes                           FAIL       2
> memcg_stat                                         FAIL       1
> memcg_use_hierarchy                                FAIL       1

These 3 failures are tracked in BZ 1732785 and are not related to the cpuset_hotplug error.

Comment 7 David Gibson 2019-08-26 06:29:58 UTC
zhenyzha,

Can you retest now that we have official builds based on qemu-4.1?

Comment 8 zhenyzha 2019-08-26 06:37:15 UTC
(In reply to David Gibson from comment #7)
> zhenyzha,
> 
> Can you retest now that we have official builds based on qemu-4.1?

OK, I will update with the results later.

Comment 9 zhenyzha 2019-08-26 10:18:36 UTC
Hit this issue again on qemu-4.1.

# cat results/LTP_RUN_ON-2019_08_26-16h_58m_15s.log | grep FAIL
...... 
cpuset_hotplug                                     FAIL       1    
cpuset_regression_test                             FAIL       1     

Guest /var/log/messages:
Aug 26 17:44:39 dhcp19-129-175 LTP: starting cpuset_hotplug (cpuset_hotplug_test.sh)
Aug 26 17:44:40 dhcp19-129-175 kernel: Querying DEAD? cpu 1 (8) shows 2
Aug 26 17:44:40 dhcp19-129-175 kernel: cpu 1 (hwid 8) Ready to die...
Aug 26 17:44:40 dhcp19-129-175 systemd[1]: Started /usr/lib/udev/kdump-udev-throttler.
Aug 26 17:44:40 dhcp19-129-175 kernel: Querying DEAD? cpu 1 (8) shows 2
Aug 26 17:44:41 dhcp19-129-175 kdump-udev-throttler[41319]: kexec: unloaded kdump kernel
Aug 26 17:44:41 dhcp19-129-175 kdump-udev-throttler[41319]: Stopping kdump: [OK]
Aug 26 17:44:42 dhcp19-129-175 kdump-udev-throttler[41319]: Modified cmdline:BOOT_IMAGE=/vmlinuz-4.18.0-136.el8.ppc64le ro console=ttyS0,115200 biosdevname=0 net.ifnames=0 rhgb console=tty0 irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 rootflags=nofail kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd elfcorehdr=158272K
Aug 26 17:44:42 dhcp19-129-175 kdump-udev-throttler[41319]: kexec: loaded kdump kernel
Aug 26 17:44:42 dhcp19-129-175 kdump-udev-throttler[41319]: Starting kdump: [OK]
Aug 26 17:47:10 dhcp19-129-175 kernel: WARNING: CPU: 1 PID: 0 at arch/powerpc/platforms/pseries/hotplug-cpu.c:159 pseries_mach_cpu_die+0xbc/0x2f0
Aug 26 17:47:10 dhcp19-129-175 kernel: Modules linked in: loop fuse nf_tables_set nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nft_chain_route_ipv6 nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nft_chain_route_ipv4 nf_conntrack ip6_tables ip_tables nft_compat ip_set nf_tables nfnetlink xfs libcrc32c bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm xts vmx_crypto virtio_net net_failover virtio_blk failover drm_panel_orientation_quirks dm_multipath dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
Aug 26 17:47:10 dhcp19-129-175 kernel: CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 4.18.0-136.el8.ppc64le #1
Aug 26 17:47:10 dhcp19-129-175 kernel: NIP:  c0000000000fc61c LR: c0000000000fc5d8 CTR: c000000007ffee00
Aug 26 17:47:10 dhcp19-129-175 kernel: REGS: c0000018f674fab0 TRAP: 0700   Not tainted  (4.18.0-136.el8.ppc64le)
Aug 26 17:47:10 dhcp19-129-175 kernel: MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24000444  XER: 20040000
Aug 26 17:47:10 dhcp19-129-175 kernel: CFAR: c0000000000b2bb8 IRQMASK: 1
GPR00: c0000000000fc5d8 c0000018f674fd30 c000000001662900 0000000000000001
GPR04: c0000018fe16b570 c0000018fe16b570 0000000000000001 0000000000000010
GPR08: c0000018fe16b570 0000000000000001 00000018fcfb0000 00000018fcfb0000
GPR12: c0000000000b5200 c000000007ffee00 c0000018f674ff90 0000000000000000
GPR16: c0000000016920e8 0000000000000000 0000000000000800 0000000000000001
GPR20: c000000001195608 0000000000000001 0000000000000000 00000000000000
0000008 c000000001691ee8
Aug 26 17:47:10 dhcp19-129-175 kernel: NIP [c0000000000fc61c] pseries_mach_cpu_die+0xbc/0x2f0
Aug 26 17:47:10 dhcp19-129-175 kernel: LR [c0000000000fc5d8] pseries_mach_cpu_die+0x78/0x2f0
Aug 26 17:47:10 dhcp19-129-175 kernel: Call Trace:
Aug 26 17:47:10 dhcp19-129-175 kernel: [c0000018f674fd30] [c0000000000fc5d8] pseries_mach_cpu_die+0x78/0x2f0 (unreliable)
Aug 26 17:47:10 dhcp19-129-175 kernel: [c0000018f674fde0] [c0000000000592e8] cpu_die+0x48/0x70
Aug 26 17:47:10 dhcp19-129-175 kernel: [c0000018f674fe00] [c0000000000210c0] arch_cpu_idle_dead+0x20/0x40
Aug 26 17:47:10 dhcp19-129-175 kernel: [c0000018f674fe20] [c000000000199154] do_idle+0x2d4/0x480
Aug 26 17:47:10 dhcp19-129-175 kernel: [c0000018f674fea0] [c000000000199538] cpu_startup_entry+0x38/0x40
Aug 26 17:47:10 dhcp19-129-175 kernel: [c0000018f674fed0] [c000000000058ea0] start_secondary+0x780/0x860
Aug 26 17:47:10 dhcp19-129-175 kernel: [c0000018f674ff90] [c00000000000ac70] start_secondary_prolog+0x10/0x14
Aug 26 17:47:10 dhcp19-129-175 kernel: Instruction dump:
Aug 26 17:47:10 dhcp19-129-175 kernel: 3b1842f0 7d5bf02a 3bb80004 7fa9eb78 7d29502e 2f890001 419e00c0 7d5bf02a
Aug 26 17:47:10 dhcp19-129-175 kernel: 7d3d502e 7d290034 5529d97e 69290001 <0b090000> 38800000 39200000 7f45d378
Aug 26 17:47:10 dhcp19-129-175 kernel: ---[ end trace 9b6249510dc45846 ]---
Aug 26 17:47:10 dhcp19-129-175 kernel: cpu 1 (hwid 8) Ready to die...
Aug 26 17:47:10 dhcp19-129-175 kernel: Processor 1 is stuck.    <-- stuck
Aug 26 17:49:23 dhcp19-129-175 kernel: INFO: task irqbalance:3711 blocked for more than 120 seconds.    <-- blocked
Aug 26 17:49:23 dhcp19-129-175 kernel:      Tainted: G        W        --------- -  - 4.18.0-136.el8.ppc64le #1
Aug 26 17:49:23 dhcp19-129-175 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 17:49:23 dhcp19-129-175 kernel: irqbalance      D    0  3711      1 0x00040080
Aug 26 17:49:23 dhcp19-129-175 kernel: Call Trace:
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000018efe13850] [c0000018efdc5e80] 0xc0000018efdc5e80 (unreliable)
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000018efe13a20] [c00000000001fa00] __switch_to+0x2e0/0x4e0
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000018efe13a80] [c000000000d43654] __schedule+0x2c4/0xb20
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000018efe13b50] [c000000000d43ef8] schedule+0x48/0xb0
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000018efe13b70] [c000000000d444f0] schedule_preempt_disabled+0x20/0x30
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000018efe13b90] [c000000000d45f58] __mutex_lock.isra.1+0x3b8/0x6f0
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000018efe13c20] [c0000000001e1758] irq_lock_sparse+0x28/0x40
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000018efe13c40] [c0000000001ee3ac] show_interrupts+0x18c/0x550
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000018efe13d00] [c00000000050d30c] seq_read+0x1bc/0x640
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000018efe13da0] [c000000000595e94] proc_reg_read+0x84/0x100
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000018efe13dd0] [c0000000004c668c] sys_read+0x10c/0x310
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000018efe13e30] [c00000000000b388] system_call+0x5c/0x70
Aug 26 17:49:23 dhcp19-129-175 kernel: INFO: task kworker/53:1:24184 blocked for more than 120 seconds.
Aug 26 17:49:23 dhcp19-129-175 kernel:      Tainted: G        W        --------- -  - 4.18.0-136.el8.ppc64le #1
Aug 26 17:49:23 dhcp19-129-175 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 17:49:23 dhcp19-129-175 kernel: kworker/53:1    D    0 24184      2 0x00000888
Aug 26 17:49:23 dhcp19-129-175 kernel: Workqueue: events slab_caches_to_rcu_destroy_workfn
Aug 26 17:49:23 dhcp19-129-175 kernel: Call Trace:
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000017bfc437d0] [c0000017c77ed900] 0xc0000017c77ed900 (unreliable)
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000017bfc439a0] [c00000000001fa00] __switch_to+0x2e0/0x4e0
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000017bfc43a00] [c000000000d43654] __schedule+0x2c4/0xb20
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000017bfc43ad0] [c000000000d43ef8] schedule+0x48/0xb0
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000017bfc43af0] [c000000000d47ca8] rwsem_down_read_failed+0x138/0x250
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000017bfc43b70] [c0000000001d1578] __percpu_down_read+0x128/0x130
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000017bfc43ba0] [c000000000144abc] cpus_read_lock+0x7c/0x90
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000017bfc43bc0] [c0000000001f7e18] rcu_barrier+0xc8/0x320
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000017bfc43c30] [c0000000003f96d8] slab_caches_to_rcu_destroy_workfn+0xa8/0x110
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000017bfc43c70] [c000000000171ef4] process_one_work+0x2f4/0x5c0
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000017bfc43d10] [c000000000172c50] worker_thread+0x360/0x760
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000017bfc43dc0] [c00000000017c4dc] kthread+0x1ac/0x1c0
Aug 26 17:49:23 dhcp19-129-175 kernel: [c0000017bfc43e30] [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80
Aug 26 17:49:41 dhcp19-129-175 kernel: Processor 1 is stuck.
Aug 26 17:52:11 dhcp19-129-175 kernel: Processor 1 is stuck.
Aug 26 17:52:11 dhcp19-129-175 systemd[1]: Starting system activity accounting tool...
Aug 26 17:52:11 dhcp19-129-175 systemd[1]: Started system activity accounting tool.
Aug 26 17:54:41 dhcp19-129-175 kernel: Processor 1 is stuck.
Aug 26 17:57:12 dhcp19-129-175 kernel: Processor 1 is stuck.
Aug 26 17:59:37 dhcp19-129-175 kernel: INFO: task kworker/0:3:402 blocked for more than 120 seconds.
Aug 26 17:59:37 dhcp19-129-175 kernel:      Tainted: G        W        --------- -  - 4.18.0-136.el8.ppc64le #1
Aug 26 17:59:37 dhcp19-129-175 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 17:59:37 dhcp19-129-175 kernel: kworker/0:3     D    0   402      2 0x00000808
Aug 26 17:59:37 dhcp19-129-175 kernel: Workqueue: events vmstat_shepherd
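
Several different tasks end up blocked in this log (irqbalance and two kworkers). A short sketch for listing each blocked task once is given below; the function name is hypothetical and the pattern is based on the hung-task lines quoted above.

```shell
# Sketch: list each "name:pid" reported by the hung-task detector, once.
blocked_tasks() {
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9]* seconds.*/\1/p' "$1" |
        LC_ALL=C sort -u
}
```

Usage on the guest: `blocked_tasks /var/log/messages`. The stacks above show irqbalance waiting in irq_lock_sparse and a kworker waiting in cpus_read_lock, which would fit the unfinished CPU 1 offline operation still holding the hotplug locks, though that is an inference from the traces rather than something confirmed in this report.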


Version-Release number of selected component (if applicable):
host:
# uname -r
4.18.0-137.el8.ppc64le
# /usr/libexec/qemu-kvm -version
QEMU emulator version 4.1.0 (qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93)

guest:
# uname -r
4.18.0-136.el8.ppc64le

Comment 10 zhenyzha 2019-08-26 10:30:07 UTC
Additional info:
The same steps were tested on qemu-kvm-4.0.0-6; the issue did not occur.

# /usr/libexec/qemu-kvm -version
QEMU emulator version 4.0.0 (qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3)


cpuset_hotplug                    PASS
cpuset_regression_test            PASS

Comment 11 zhenyzha 2019-08-29 06:59:07 UTC
Additional info:
The same steps were run directly on the 4.18.0-137.el8.ppc64le host; the hang did not occur there.

cpuset_hotplug                    FAIL (but no call trace)
cpuset_regression_test            PASS

[root@ibm-p9b-11 results]# cat LTP_RUN_ON-2019_08_29-02h_47m_11s.log 
Test Start Time: Thu Aug 29 02:47:12 2019
-----------------------------------------
Testcase                                           Result     Exit Value
--------                                           ------     ----------
cpuset_hotplug                                     FAIL       1    

-----------------------------------------------
Total Tests: 1
Total Skipped Tests: 0
Total Failures: 1
Kernel Version: 4.18.0-137.el8.ppc64le
Machine Architecture: ppc64le
Hostname: ibm-p9b-11.pnr.lab.eng.bos.redhat.com


Host /var/log/messages:
Aug 29 02:47:11 ibm-p9b-11 kernel: loop: module loaded
Aug 29 02:47:12 ibm-p9b-11 LTP: starting cpuset_hotplug (cpuset_hotplug_test.sh)
Aug 29 02:47:12 ibm-p9b-11 kernel: IRQ 31: no longer affine to CPU1
Aug 29 02:47:12 ibm-p9b-11 kernel: IRQ 110: no longer affine to CPU1
Aug 29 02:47:12 ibm-p9b-11 kernel: IRQ 187: no longer affine to CPU1
Aug 29 02:47:12 ibm-p9b-11 kernel: IRQ 244: no longer affine to CPU1
Aug 29 02:47:12 ibm-p9b-11 kernel: IRQ 263: no longer affine to CPU1
Aug 29 02:47:12 ibm-p9b-11 kernel: IRQ 439: no longer affine to CPU1
Aug 29 02:47:12 ibm-p9b-11 kernel: IRQ 460: no longer affine to CPU1
Aug 29 02:47:12 ibm-p9b-11 kernel: IRQ 536: no longer affine to CPU1
Aug 29 02:47:12 ibm-p9b-11 systemd[1]: Started /usr/lib/udev/kdump-udev-throttler.
Aug 29 02:47:13 ibm-p9b-11 systemd[1]: Started /usr/lib/udev/kdump-udev-throttler.
Aug 29 02:47:13 ibm-p9b-11 kdump-udev-throttler[66336]: Throttling kdump restart for concurrent udev event
Aug 29 02:47:13 ibm-p9b-11 kdump-udev-throttler[66196]: kexec: unloaded kdump kernel
Aug 29 02:47:13 ibm-p9b-11 kdump-udev-throttler[66196]: Stopping kdump: [OK]
Aug 29 02:47:14 ibm-p9b-11 kdump-udev-throttler[66196]: Modified cmdline:ro irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 rootflags=nofail kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd elfcorehdr=158272K
Aug 29 02:47:14 ibm-p9b-11 kdump-udev-throttler[66196]: kexec: loaded kdump kernel
Aug 29 02:47:14 ibm-p9b-11 kdump-udev-throttler[66196]: Starting kdump: [OK]
Aug 29 02:47:15 ibm-p9b-11 systemd[1]: Started /usr/lib/udev/kdump-udev-throttler.
Aug 29 02:47:16 ibm-p9b-11 kdump-udev-throttler[67035]: kexec: unloaded kdump kernel
Aug 29 02:47:16 ibm-p9b-11 kdump-udev-throttler[67035]: Stopping kdump: [OK]
Aug 29 02:47:16 ibm-p9b-11 kdump-udev-throttler[67035]: Modified cmdline:ro irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 rootflags=nofail kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd elfcorehdr=158272K
Aug 29 02:47:16 ibm-p9b-11 kdump-udev-throttler[67035]: kexec: loaded kdump kernel
Aug 29 02:47:16 ibm-p9b-11 kdump-udev-throttler[67035]: Starting kdump: [OK]

Comment 12 zhenyzha 2019-08-29 08:23:13 UTC
The same steps were tested on qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc; the issue did not occur.

host:
# uname -r
4.18.0-137.el8.ppc64le
# /usr/libexec/qemu-kvm -version
QEMU emulator version 4.1.0 (qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc)

guest:
# uname -r
4.18.0-139.el8.ppc64le

cpuset_hotplug                    PASS
cpuset_regression_test            PASS

Closing this bug.

