Description of problem:
The system hangs when I test KSM splitting of THP (transparent huge pages) on s390x. The hang occurs only on s390x; the other architectures are fine. More details are in BZ647334.

Version-Release number of selected component (if applicable):
[root@ibm-z10-11 ksm]# uname -ri
2.6.32-97.el6.test.s390x s390x

The kernel used can be obtained from:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3032335

How reproducible:
Always (100%)

Steps to Reproduce:
1. Download and build LTP:
   # git clone git://ltp.git.sourceforge.net/gitroot/ltp/ltp ltp-git; cd ltp-git; make autotools; ./configure; make
2. Run the ksm01 test:
   # cd /root/ltp-git/testcases/kernel/mem/ksm; ./ksm01 -i 30
   ...

Actual results:
A few minutes later, hung-task messages appear:

# dmesg
...
cpu: Processor 1 started, address 0, identification 32C5C2
INFO: task cpuplugd:1690 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cpuplugd D 00000000004ab5d2 0 1690 1 0x00000200
       00000000028437a0 0000000000fe1900 00000000028437a0 00000000028437c8
       0000000002510378 00000000007a4900 0000000000fe1900 0000000002510378
       0000000002510378 0000000000000001 0000000002843948 000000000070ee68
       00000000007a4900 0000000002609318 0000000002510340 0000000000fe1900
       00000000004b4b38 00000000004aa7be 0000000002843800 0000000002843910
Call Trace:
([<00000000004aa7be>] schedule+0x59a/0xf30)
 [<00000000004ab5d2>] schedule_timeout+0x242/0x340
 [<00000000004aa0e2>] wait_for_common+0x11a/0x194
 [<0000000000167be8>] synchronize_sched+0x74/0x7c
 [<0000000000137490>] free_rootdomain+0x28/0x44
 [<00000000001376ac>] rq_attach_root+0x200/0x23c
 [<0000000000137f2a>] cpu_attach_domain+0x1ce/0x250
 [<00000000001393e2>] partition_sched_domains+0x166/0x640
 [<00000000001a1c4e>] cpuset_track_online_cpus+0xc6/0xdc
 [<00000000004aea94>] notifier_call_chain+0x5c/0xa0
 [<00000000001711ec>] __raw_notifier_call_chain+0x24/0x30
 [<00000000004a1250>] _cpu_down+0xf8/0x35c
 [<00000000004a1510>] cpu_down+0x5c/0x70
 [<00000000004a2fac>] store_online+0x60/0xcc
 [<00000000002c63ac>] sysfs_write_file+0xe0/0x194
 [<0000000000248854>] vfs_write+0xa0/0x1a0
 [<0000000000248a56>] SyS_write+0x5a/0xac
 [<00000000001185a4>] sysc_tracego+0xe/0x14
 [<000000406d9dbef4>] 0x406d9dbef4
INFO: task cpuplugd:1693 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cpuplugd D 00000000004abbfe 0 1693 1 0x00000200
       00000000023a3aa0 00000000023a3aa0 00000000007a4900 00000000024c64d8
       0000000000000001 0000000000fe1900 00000000024c6040 000000000013d55e
       00000000023a3aa0 0000000000000000 00000000024c6040 000000000070ee68
       00000000007a4900 00000000024c64d8 0000000000710f30 0000000000fe1900
       00000000004b4b38 00000000004aa7be 00000000023a3af0 00000000023a3c00
Call Trace:
([<00000000004aa7be>] schedule+0x59a/0xf30)
 [<00000000004abbfe>] __mutex_lock_slowpath+0xa6/0x148
 [<00000000004abcfa>] mutex_lock+0x5a/0x60
 [<0000000000147672>] get_online_cpus+0x3a/0x60
 [<000000000020c774>] all_vm_events+0x28/0x234
 [<000000000020ca4a>] vmstat_start+0xca/0x160
 [<000000000026bba4>] seq_read+0x19c/0x510
 [<00000000002b3166>] proc_reg_read+0x9e/0xe4
 [<0000000000248b48>] vfs_read+0xa0/0x1a0
 [<0000000000248d4a>] SyS_read+0x5a/0xac
 [<00000000001185a4>] sysc_tracego+0xe/0x14
 [<000000406d9dbe64>] 0x406d9dbe64
INFO: task sadc:2740 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sadc D 00000000004abbfe 0 2740 2738 0x00000204
       000000001cc4faa0 000000001cc4faa0 00000000007a4900 000000001f629758
       0000000000000001 0000000000fe1900 000000001f6292c0 000000000013d55e
       000000001cc4faa0 0000000000000000 000000001f6292c0 000000000070ee68
       00000000007a4900 000000001f629758 0000000000710f30 0000000000fe1900
       00000000004b4b38 00000000004aa7be 000000001cc4faf0 000000001cc4fc00
Call Trace:
([<00000000004aa7be>] schedule+0x59a/0xf30)
 [<00000000004abbfe>] __mutex_lock_slowpath+0xa6/0x148
 [<00000000004abcfa>] mutex_lock+0x5a/0x60
 [<0000000000147672>] get_online_cpus+0x3a/0x60
 [<000000000020c774>] all_vm_events+0x28/0x234
 [<000000000020ca4a>] vmstat_start+0xca/0x160
 [<000000000026bba4>] seq_read+0x19c/0x510
 [<00000000002b3166>] proc_reg_read+0x9e/0xe4
 [<0000000000248b48>] vfs_read+0xa0/0x1a0
 [<0000000000248d4a>] SyS_read+0x5a/0xac
 [<00000000001185a4>] sysc_tracego+0xe/0x14
 [<000000406d9dbe90>] 0x406d9dbe90

Expected results:
No such hung-task messages.

Additional info:
I ran the same steps on the RHEL GA kernel and on 2.6.32-98.el6.s390x; both ran without these hung-task messages.
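The expected result is that no hung-task reports appear. A minimal sketch for checking a saved dmesg log for them; `find_hung_tasks` and `unique_hung_tasks` are hypothetical helpers (not part of LTP), and the pattern assumes the header line format shown in the log above.

```python
import re

# Header line printed by the hung-task detector, e.g.
# "INFO: task cpuplugd:1690 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(dmesg_text: str) -> list:
    """Return (command, pid, timeout_seconds) for every hung-task report, in log order."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(dmesg_text)
    ]


def unique_hung_tasks(dmesg_text: str) -> list:
    """Deduplicate (command, pid) pairs; the detector may report the same task repeatedly."""
    return list(dict.fromkeys((comm, pid) for comm, pid, _ in find_hung_tasks(dmesg_text)))
```

For example, after `dmesg > dmesg.txt`, `unique_hung_tasks(open("dmesg.txt").read())` should be empty for a passing run; for the log above it would list cpuplugd:1690, cpuplugd:1693 and sadc:2740.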
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Running the LTP (Linux Test Project) test suite on the s390x platform.
Consequence: CPU offlining could suffer high latencies.
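In the first backtrace, cpuplugd offlines a CPU by writing to its sysfs `online` file (the SyS_write, store_online and cpu_down frames), and that write does not return until the offline completes. A minimal sketch for timing such a write directly, assuming root access and a hot-pluggable CPU; `set_cpu_online` and its `sysfs_root` parameter are hypothetical, added so the helper can be exercised against a scratch directory.

```python
import os
import time


def set_cpu_online(cpu: int, online: bool,
                   sysfs_root: str = "/sys/devices/system/cpu") -> float:
    """Write 1/0 to <sysfs_root>/cpu<N>/online and return the elapsed seconds.

    On a real system this requires root; the call blocks for as long as the
    kernel takes to bring the CPU up or down.
    """
    path = os.path.join(sysfs_root, f"cpu{cpu}", "online")
    start = time.monotonic()
    with open(path, "w") as f:
        f.write("1" if online else "0")
    # The buffered write is flushed when the file is closed at the end of the
    # with-block, so the measured time includes the kernel-side handling.
    return time.monotonic() - start
```

On an affected kernel, `set_cpu_online(1, False)` would be expected to stall for minutes, matching the 120-second hung-task reports for cpuplugd.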
It is worth noting that s390x has no THP support, so THP splitting cannot have been involved in this bug. I am not sure why we have KSM enabled on s390x, because we do not support KSM there and I am not aware of anybody running any other KSM-using programs.
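Whether KSM is present and running on a given guest can be read from sysfs. A minimal sketch, assuming the upstream KSM interface `/sys/kernel/mm/ksm/run` exists when KSM is built in; `ksm_state` is a hypothetical helper, and the state names paraphrase the values described in the kernel's KSM documentation.

```python
# Values of /sys/kernel/mm/ksm/run as described in the kernel KSM documentation.
KSM_RUN_STATES = {
    "0": "stopped",
    "1": "running",
    "2": "stopped, merged pages unmerged",
}


def ksm_state(path: str = "/sys/kernel/mm/ksm/run") -> str:
    """Describe the KSM run state, or 'unavailable' if the sysfs file does not exist."""
    try:
        with open(path) as f:
            value = f.read().strip()
    except FileNotFoundError:
        return "unavailable"
    return KSM_RUN_STATES.get(value, f"unknown ({value})")
```

An "unavailable" result would indicate a kernel built without KSM; "running" while ksm01 executes shows the test is actually exercising ksmd.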
Oh, the cpuplugd backtraces look like the CPU hotplug issue that Larry Woodman was working on in another (scheduler-related) BZ. Do they still happen with the latest 6.2 development kernel?
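As we read the 2.6.32 kernel/cpu.c, `_cpu_down()` holds `cpu_hotplug.lock` (via `cpu_hotplug_begin()`/`cpu_hotplug_done()`) for the whole offline, including the notifier chain in which cpuplugd:1690 waits in `synchronize_sched()`. `get_online_cpus()` must take the same mutex, which is where cpuplugd:1693 and sadc:2740 are stuck. A toy user-space model of that blocking pattern using Python threads, not kernel code; the names `HotplugLock` and `reader_gets_through` are hypothetical, and the model says nothing about why the grace period stalls.

```python
import threading


class HotplugLock:
    """Toy stand-in for cpu_hotplug.lock (the real code also keeps a refcount)."""

    def __init__(self):
        self.mutex = threading.Lock()

    def get_online_cpus(self, timeout: float) -> bool:
        # Readers hold the mutex only briefly. False means the reader stayed
        # blocked, which the kernel would report as a hung task in D state.
        if not self.mutex.acquire(timeout=timeout):
            return False
        self.mutex.release()
        return True


def reader_gets_through(grace_period_ends: bool, reader_timeout: float = 0.2) -> bool:
    """Run one cpu_down-like writer and one get_online_cpus-like reader."""
    lock = HotplugLock()
    grace_period = threading.Event()      # completion of synchronize_sched()
    writer_has_lock = threading.Event()

    def cpu_down():
        with lock.mutex:                  # roughly cpu_hotplug_begin()
            writer_has_lock.set()
            grace_period.wait()           # notifier chain -> synchronize_sched()
        # leaving the with-block is roughly cpu_hotplug_done()

    writer = threading.Thread(target=cpu_down)
    writer.start()
    writer_has_lock.wait()
    if grace_period_ends:
        grace_period.set()
        writer.join()                     # offline finished, lock released
    ok = lock.get_online_cpus(reader_timeout)
    grace_period.set()                    # unblock the writer so the thread exits
    writer.join()
    return ok
```

`reader_gets_through(False)` returns False: as long as the writer's grace period does not end, every `/proc/vmstat` reader queues behind it, which is consistent with all three traces above.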
No such hung tasks appear with -155.el6; only the following messages are printed:
00: HCPGSP2627I The virtual machine is placed in CP mode due to a SIGP initial CPU reset from CPU 01.
cpu: Processor 1 started, address 0, identification 32C5C2
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
cpu: Processor 1 stopped
00: HCPGSP2627I The virtual machine is placed in CP mode due to a SIGP initial CPU reset from CPU 01.
cpu: Processor 1 started, address 0, identification 32C5C2
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
cpu: Processor 1 stopped
00: HCPGSP2627I The virtual machine is placed in CP mode due to a SIGP initial CPU reset from CPU 01.
cpu: Processor 1 started, address 0, identification 32C5C2
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
cpu: Processor 1 stopped
00: HCPGSP2627I The virtual machine is placed in CP mode due to a SIGP initial CPU reset from CPU 01.
cpu: Processor 1 started, address 0, identification 32C5C2
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
cpu: Processor 1 stopped
00: HCPGSP2627I The virtual machine is placed in CP mode due to a SIGP initial CPU reset from CPU 01.
cpu: Processor 1 started, address 0, identification 32C5C2
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
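These messages show processor 1 being cycled repeatedly by cpuplugd without any task getting stuck. A small sketch, assuming the console or dmesg output has been saved as text, that counts the start/stop messages per CPU so that hotplug progress across a run can be compared between kernels; `cpu_hotplug_events` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches "cpu: Processor 1 started, ..." and "cpu: Processor 1 stopped".
CPU_EVENT_RE = re.compile(r"cpu: Processor (\d+) (started|stopped)")


def cpu_hotplug_events(log_text: str) -> Counter:
    """Count start/stop messages per CPU, keyed by (cpu_number, event)."""
    return Counter((int(cpu), event) for cpu, event in CPU_EVENT_RE.findall(log_text))
```

In the -155.el6 output above, processor 1 is started five times and stopped four times.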