672753 – cpu offlining high latencies while testing KSM through ksm01(LTP) program on s390x

Summary: cpu offlining high latencies while testing KSM through ksm01(LTP) program on s390x

Keywords:
Status:	CLOSED DUPLICATE of bug 557364
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.0
Hardware:	s390x
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Andrea Arcangeli
QA Contact:	Caspar Zhang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	702988

Reported:	2011-01-26 09:00 UTC by Zhouping Liu
Modified:	2014-01-13 00:01 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: Running the LTP (Linux Testing Project) test suite on the s390x platform. Consequence: CPU offlining could suffer high latencies.
Clone Of:
Environment:
Last Closed:	2011-06-03 13:00:43 UTC
Target Upstream Version:
Embargoed:

Description Zhouping Liu 2011-01-26 09:00:08 UTC

Description of problem:
The system hangs when I test KSM splitting of THP on s390x. The hang occurs only on s390x; the other architectures are fine. More details are available in BZ647334.

Version-Release number of selected component (if applicable):
[root@ibm-z10-11 ksm]# uname -ri
2.6.32-97.el6.test.s390x s390x
The kernel can be obtained from the following link:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3032335

How reproducible:
Always (100%)

Steps to Reproduce:
1. Download and build LTP:
# git clone git://ltp.git.sourceforge.net/gitroot/ltp/ltp ltp-git; cd ltp-git
# make autotools; ./configure; make
2. Run the ksm01 program:
# cd /root/ltp-git/testcases/kernel/mem/ksm; ./ksm01 -i 30
...  

Actual results:
A few minutes later, the following hung-task messages appear:
# dmesg
...
cpu: Processor 1 started, address 0, identification 32C5C2
INFO: task cpuplugd:1690 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cpuplugd      D 00000000004ab5d2     0  1690      1 0x00000200
00000000028437a0 0000000000fe1900 00000000028437a0 00000000028437c8 
       0000000002510378 00000000007a4900 0000000000fe1900 0000000002510378 
       0000000002510378 0000000000000001 0000000002843948 000000000070ee68 
       00000000007a4900 0000000002609318 0000000002510340 0000000000fe1900 
       00000000004b4b38 00000000004aa7be 0000000002843800 0000000002843910 
Call Trace:
([<00000000004aa7be>] schedule+0x59a/0xf30)
 [<00000000004ab5d2>] schedule_timeout+0x242/0x340
 [<00000000004aa0e2>] wait_for_common+0x11a/0x194
 [<0000000000167be8>] synchronize_sched+0x74/0x7c
 [<0000000000137490>] free_rootdomain+0x28/0x44
 [<00000000001376ac>] rq_attach_root+0x200/0x23c
 [<0000000000137f2a>] cpu_attach_domain+0x1ce/0x250
 [<00000000001393e2>] partition_sched_domains+0x166/0x640
 [<00000000001a1c4e>] cpuset_track_online_cpus+0xc6/0xdc
 [<00000000004aea94>] notifier_call_chain+0x5c/0xa0
 [<00000000001711ec>] __raw_notifier_call_chain+0x24/0x30
 [<00000000004a1250>] _cpu_down+0xf8/0x35c
 [<00000000004a1510>] cpu_down+0x5c/0x70
 [<00000000004a2fac>] store_online+0x60/0xcc
 [<00000000002c63ac>] sysfs_write_file+0xe0/0x194
 [<0000000000248854>] vfs_write+0xa0/0x1a0
 [<0000000000248a56>] SyS_write+0x5a/0xac
 [<00000000001185a4>] sysc_tracego+0xe/0x14
 [<000000406d9dbef4>] 0x406d9dbef4
INFO: task cpuplugd:1693 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cpuplugd      D 00000000004abbfe     0  1693      1 0x00000200
00000000023a3aa0 00000000023a3aa0 00000000007a4900 00000000024c64d8 
       0000000000000001 0000000000fe1900 00000000024c6040 000000000013d55e 
       00000000023a3aa0 0000000000000000 00000000024c6040 000000000070ee68 
       00000000007a4900 00000000024c64d8 0000000000710f30 0000000000fe1900 
       00000000004b4b38 00000000004aa7be 00000000023a3af0 00000000023a3c00 
Call Trace:
([<00000000004aa7be>] schedule+0x59a/0xf30)
 [<00000000004abbfe>] __mutex_lock_slowpath+0xa6/0x148
 [<00000000004abcfa>] mutex_lock+0x5a/0x60
 [<0000000000147672>] get_online_cpus+0x3a/0x60
 [<000000000020c774>] all_vm_events+0x28/0x234
 [<000000000020ca4a>] vmstat_start+0xca/0x160
 [<000000000026bba4>] seq_read+0x19c/0x510
 [<00000000002b3166>] proc_reg_read+0x9e/0xe4
 [<0000000000248b48>] vfs_read+0xa0/0x1a0
 [<0000000000248d4a>] SyS_read+0x5a/0xac
 [<00000000001185a4>] sysc_tracego+0xe/0x14
 [<000000406d9dbe64>] 0x406d9dbe64
INFO: task sadc:2740 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sadc          D 00000000004abbfe     0  2740   2738 0x00000204
000000001cc4faa0 000000001cc4faa0 00000000007a4900 000000001f629758 
       0000000000000001 0000000000fe1900 000000001f6292c0 000000000013d55e 
       000000001cc4faa0 0000000000000000 000000001f6292c0 000000000070ee68 
       00000000007a4900 000000001f629758 0000000000710f30 0000000000fe1900 
       00000000004b4b38 00000000004aa7be 000000001cc4faf0 000000001cc4fc00 
Call Trace:
([<00000000004aa7be>] schedule+0x59a/0xf30)
 [<00000000004abbfe>] __mutex_lock_slowpath+0xa6/0x148
 [<00000000004abcfa>] mutex_lock+0x5a/0x60
 [<0000000000147672>] get_online_cpus+0x3a/0x60
 [<000000000020c774>] all_vm_events+0x28/0x234
 [<000000000020ca4a>] vmstat_start+0xca/0x160
 [<000000000026bba4>] seq_read+0x19c/0x510
 [<00000000002b3166>] proc_reg_read+0x9e/0xe4
 [<0000000000248b48>] vfs_read+0xa0/0x1a0
 [<0000000000248d4a>] SyS_read+0x5a/0xac
 [<00000000001185a4>] sysc_tracego+0xe/0x14
 [<000000406d9dbe90>] 0x406d9dbe90
INFO: task cpuplugd:1690 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cpuplugd      D 00000000004ab5d2     0  1690      1 0x00000200
00000000028437a0 0000000000fe1900 00000000028437a0 00000000028437c8 
       0000000002510378 00000000007a4900 0000000000fe1900 0000000002510378 
       0000000002510378 0000000000000001 0000000002843948 000000000070ee68 
       00000000007a4900 0000000002609318 0000000002510340 0000000000fe1900 
       00000000004b4b38 00000000004aa7be 0000000002843800 0000000002843910 
Call Trace:
([<00000000004aa7be>] schedule+0x59a/0xf30)
 [<00000000004ab5d2>] schedule_timeout+0x242/0x340
 [<00000000004aa0e2>] wait_for_common+0x11a/0x194
 [<0000000000167be8>] synchronize_sched+0x74/0x7c
 [<0000000000137490>] free_rootdomain+0x28/0x44
 [<00000000001376ac>] rq_attach_root+0x200/0x23c
 [<0000000000137f2a>] cpu_attach_domain+0x1ce/0x250
 [<00000000001393e2>] partition_sched_domains+0x166/0x640
 [<00000000001a1c4e>] cpuset_track_online_cpus+0xc6/0xdc
 [<00000000004aea94>] notifier_call_chain+0x5c/0xa0
 [<00000000001711ec>] __raw_notifier_call_chain+0x24/0x30
 [<00000000004a1250>] _cpu_down+0xf8/0x35c
 [<00000000004a1510>] cpu_down+0x5c/0x70
 [<00000000004a2fac>] store_online+0x60/0xcc
 [<00000000002c63ac>] sysfs_write_file+0xe0/0x194
 [<0000000000248854>] vfs_write+0xa0/0x1a0
 [<0000000000248a56>] SyS_write+0x5a/0xac
 [<00000000001185a4>] sysc_tracego+0xe/0x14
 [<000000406d9dbef4>] 0x406d9dbef4
INFO: task cpuplugd:1693 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cpuplugd      D 00000000004abbfe     0  1693      1 0x00000200
00000000023a3aa0 00000000023a3aa0 00000000007a4900 00000000024c64d8 
       0000000000000001 0000000000fe1900 00000000024c6040 000000000013d55e 
       00000000023a3aa0 0000000000000000 00000000024c6040 000000000070ee68 
       00000000007a4900 00000000024c64d8 0000000000710f30 0000000000fe1900 
       00000000004b4b38 00000000004aa7be 00000000023a3af0 00000000023a3c00 
Call Trace:
([<00000000004aa7be>] schedule+0x59a/0xf30)
 [<00000000004abbfe>] __mutex_lock_slowpath+0xa6/0x148
 [<00000000004abcfa>] mutex_lock+0x5a/0x60
 [<0000000000147672>] get_online_cpus+0x3a/0x60
 [<000000000020c774>] all_vm_events+0x28/0x234
 [<000000000020ca4a>] vmstat_start+0xca/0x160
 [<000000000026bba4>] seq_read+0x19c/0x510
 [<00000000002b3166>] proc_reg_read+0x9e/0xe4
 [<0000000000248b48>] vfs_read+0xa0/0x1a0
 [<0000000000248d4a>] SyS_read+0x5a/0xac
 [<00000000001185a4>] sysc_tracego+0xe/0x14
 [<000000406d9dbe64>] 0x406d9dbe64
INFO: task sadc:2740 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sadc          D 00000000004abbfe     0  2740   2738 0x00000204
000000001cc4faa0 000000001cc4faa0 00000000007a4900 000000001f629758 
       0000000000000001 0000000000fe1900 000000001f6292c0 000000000013d55e 
       000000001cc4faa0 0000000000000000 000000001f6292c0 000000000070ee68 
       00000000007a4900 000000001f629758 0000000000710f30 0000000000fe1900 
       00000000004b4b38 00000000004aa7be 000000001cc4faf0 000000001cc4fc00 
Call Trace:
([<00000000004aa7be>] schedule+0x59a/0xf30)
 [<00000000004abbfe>] __mutex_lock_slowpath+0xa6/0x148
 [<00000000004abcfa>] mutex_lock+0x5a/0x60
 [<0000000000147672>] get_online_cpus+0x3a/0x60
 [<000000000020c774>] all_vm_events+0x28/0x234
 [<000000000020ca4a>] vmstat_start+0xca/0x160
 [<000000000026bba4>] seq_read+0x19c/0x510
 [<00000000002b3166>] proc_reg_read+0x9e/0xe4
 [<0000000000248b48>] vfs_read+0xa0/0x1a0
 [<0000000000248d4a>] SyS_read+0x5a/0xac
 [<00000000001185a4>] sysc_tracego+0xe/0x14
 [<000000406d9dbe90>] 0x406d9dbe90
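The excerpt above reports the same three blocked tasks (cpuplugd:1690, cpuplugd:1693 and sadc:2740) twice. When comparing runs across kernels, a minimal bash sketch like the following can reduce a kernel log to the distinct tasks the hung-task detector flagged. It assumes the message format shown above ("INFO: task <comm>:<pid> blocked for more than N seconds.") and task names without spaces; the function name summarize_hung_tasks is hypothetical.

```shell
# Print each distinct "comm:pid" reported by the hung-task detector, in the
# order first seen. Reads kernel log text (e.g. dmesg output) on stdin.
# Assumes the message format seen in this report and comm names without spaces.
summarize_hung_tasks() {
    # Keep only the "comm:pid" field of lines such as
    # "INFO: task cpuplugd:1690 blocked for more than 120 seconds."
    sed -n 's/^.*INFO: task \([^ ]*\) blocked for more than [0-9]* seconds\..*$/\1/p' |
        awk '!seen[$0]++'   # drop repeats, keep first-seen order
}
```

Running `dmesg | summarize_hung_tasks` against the log above would print cpuplugd:1690, cpuplugd:1693 and sadc:2740, one per line; an empty result corresponds to the expected outcome.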

Expected results:
No such hung-task messages in dmesg.

Additional info:
I performed the same steps on the RHEL 6 GA kernel and on 2.6.32-98.el6.s390x; neither produced these hung-task messages.

Comment 10 Qian Cai 2011-05-17 06:39:26 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause
    Running the LTP (Linux Testing Project) test suite on the s390x
    platform.
Consequence
    CPU offlining could suffer high latencies.

Comment 11 Rik van Riel 2011-06-02 23:09:14 UTC

It is worth noting that s390x has no THP support, so no THP splitting can have been involved in this bug.

I am not sure why we have KSM enabled on s390x, because we do not support KSM there and I am not aware of anybody running any other KSM-using programs.

Comment 12 Rik van Riel 2011-06-02 23:10:36 UTC

Oh, the cpuplugd backtraces look like the CPU hotplug issue that Larry Woodman was working on in another (scheduler-related) BZ. Do they still happen with the latest 6.2 development kernel?
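Both cpuplugd:1690 traces above enter the kernel through a write to a CPU's sysfs online file (sysfs_write_file -> store_online -> cpu_down), and the offline path then waits in synchronize_sched while other readers block in get_online_cpus. To check whether offlining latency is still high on a newer kernel, a rough bash sketch like the following times a single offline request. It assumes root privileges, GNU date (for %N), and that cpuplugd is not toggling the same CPU at the same time; the function name time_cpu_offline is hypothetical.

```shell
# Time how long a single CPU offline request takes via a sysfs "online" file,
# then bring the CPU back online. Prints the offline time in milliseconds.
# Assumes bash, GNU date (for %N), and root privileges for real sysfs files.
time_cpu_offline() {
    local online_file="$1"
    local start end
    start=$(date +%s%N)                  # nanoseconds since the epoch
    echo 0 > "$online_file" || return 1  # request offline; returns when the attempt completes
    end=$(date +%s%N)
    echo 1 > "$online_file" || return 1  # request online again
    echo $(( (end - start) / 1000000 ))  # elapsed nanoseconds -> milliseconds
}
```

For example, `time_cpu_offline /sys/devices/system/cpu/cpu1/online` targets CPU 1, the processor being started and stopped in the logs of this report; repeating it while ksm01 runs gives a rough view of offline latency on a given kernel.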

Comment 13 Caspar Zhang 2011-06-03 04:27:27 UTC

No such hung tasks appear with -155.el6; only the following messages are printed:

00: HCPGSP2627I The virtual machine is placed in CP mode due to a SIGP initial CPU reset from CPU 01.
cpu: Processor 1 started, address 0, identification 32C5C2
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
cpu: Processor 1 stopped
00: HCPGSP2627I The virtual machine is placed in CP mode due to a SIGP initial CPU reset from CPU 01.
cpu: Processor 1 started, address 0, identification 32C5C2
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
cpu: Processor 1 stopped
00: HCPGSP2627I The virtual machine is placed in CP mode due to a SIGP initial CPU reset from CPU 01.
cpu: Processor 1 started, address 0, identification 32C5C2
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
cpu: Processor 1 stopped
00: HCPGSP2627I The virtual machine is placed in CP mode due to a SIGP initial CPU reset from CPU 01.
cpu: Processor 1 started, address 0, identification 32C5C2
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
cpu: Processor 1 stopped
00: HCPGSP2627I The virtual machine is placed in CP mode due to a SIGP initial CPU reset from CPU 01.
cpu: Processor 1 started, address 0, identification 32C5C2
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
