Bug 675294
| Summary: | [RHEL6.1] s/390x hang while running LTP test | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jeff Burke <jburke> |
| Component: | kernel | Assignee: | Larry Woodman <lwoodman> |
| Status: | CLOSED ERRATA | QA Contact: | Mike Gahagan <mgahagan> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.1 | CC: | arozansk, jstancek, mzywusko, pbunyan |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | s390x | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-2.6.32-117.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-05-23 20:39:14 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
I have narrowed the culprit down to the upstream backport commit below. I have reproduced the hang with two separate upstream kernels as well as RHEL6.1 2.6.32-96 and beyond.
------------------------------------------------------------------------
Subject: sched: Change nohz idle load balancing logic to push model
From: Larry Woodman <lwoodman>
Author: Venkatesh Pallipadi <venki>
Date: Fri May 21 17:09:41 2010 -0700
sched: Change nohz idle load balancing logic to push model
mainline commit 83cd4fe27ad8446619b2e030b171b858501de87d
In the new push model, all idle CPUs indeed go into nohz mode. There is
still the concept of idle load balancer (performing the load balancing
on behalf of all the idle cpu's in the system). Busy CPU kicks the nohz
balancer when any of the nohz CPUs need idle load balancing.
The kickee CPU does the idle load balancing on behalf of all idle CPUs
instead of the normal idle balance.
This addresses the below two problems with the current nohz ilb logic:
* the idle load balancer continued to have periodic ticks during idle and
wokeup frequently, even though it did not have any rebalancing to do on
behalf of any of the idle CPUs.
* On x86 and CPUs that have APIC timer stoppage on idle CPUs, this
periodic wakeup can result in a periodic additional interrupt on a CPU
doing the timer broadcast.
Also currently we are migrating the unpinned timers from an idle to the cpu
doing idle load balancing (when all the cpus in the system are idle,
there is no idle load balancing cpu and timers get added to the same idle cpu
where the request was made. So the existing optimization works only on semi idle
system).
And In semi idle system, we no longer have periodic ticks on the idle load
balancer CPU. Using that cpu will add more delays to the timers than intended
(as that cpu's timer base may not be uptodate wrt jiffies etc). This was
causing mysterious slowdowns during boot etc.
---------------------------------------------------------------------
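For context, the kick path introduced by that commit looks roughly like the sketch below. This is a paraphrase rather than the RHEL6 source: the idle-CPU selection helpers and fields are approximations, but the final call matches what the traces in this bug show, namely __smp_call_function_single() on the current CPU's per-cpu remote_sched_softirq_cb with wait == 0.

```c
/*
 * Simplified sketch of the push-model kick path (paraphrased from the
 * mainline commit above; helper and field names are approximate, not
 * the exact RHEL6 code).  A busy CPU picks an idle CPU to act as the
 * nohz balancer and pokes it with a pre-allocated per-cpu
 * call_single_data.
 */
static void nohz_balancer_kick(int cpu)
{
	struct call_single_data *cp;
	int ilb_cpu;

	/* Pick an idle CPU to do the balancing on behalf of all idle CPUs. */
	ilb_cpu = cpumask_first(nohz.idle_cpus_mask);
	if (ilb_cpu >= nr_cpu_ids)
		return;

	/*
	 * Kick it: send an IPI without waiting (wait == 0), reusing this
	 * CPU's per-cpu remote_sched_softirq_cb as the call_single_data.
	 * This is the call that ends up in csd_lock()/csd_lock_wait().
	 */
	cp = &per_cpu(remote_sched_softirq_cb, cpu);
	__smp_call_function_single(ilb_cpu, cp, 0);
}
```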
For some reason the problem occurs only on s390x: nohz_balancer_kick() calls __smp_call_function_single(), which calls csd_lock(), which calls csd_lock_wait().
For a reason that is still unknown, the system spins in this loop, yet every dump shows that data->flags is zero.
/*
 * csd_lock/csd_unlock used to serialize access to per-cpu csd resources
 *
 * For non-synchronous ipi calls the csd can still be in use by the
 * previous function call. For multi-cpu calls its even more interesting
 * as we'll have to ensure no other cpu is observing our csd.
 */
static void csd_lock_wait(struct call_single_data *data)
{
	while (data->flags & CSD_FLAG_LOCK)
		cpu_relax();
}
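For reference, the csd_lock()/csd_unlock() pair around that wait looked roughly like this in 2.6.32-era kernel/smp.c (quoted from memory, so treat it as approximate). The spin in csd_lock_wait() can only end when a previous user of the same call_single_data clears CSD_FLAG_LOCK in csd_unlock(), which normally happens after the IPI handler on the target CPU has run the function.

```c
/* Approximate 2.6.32-era helpers, shown for context only. */
static void csd_lock(struct call_single_data *data)
{
	/* Wait until any previous user of this csd has unlocked it. */
	csd_lock_wait(data);
	data->flags = CSD_FLAG_LOCK;

	/*
	 * Prevent the CPU from reordering the assignment to ->flags with
	 * any subsequent stores to other fields of the call_single_data.
	 */
	smp_mb();
}

static void csd_unlock(struct call_single_data *data)
{
	WARN_ON(!(data->flags & CSD_FLAG_LOCK));

	/* Ensure all updates are visible before releasing the csd. */
	smp_mb();

	data->flags &= ~CSD_FLAG_LOCK;
}
```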
Larry
This upstream patch was missing from RHEL6.1:
commit 27c379f7f89a4d558c685b5d89b5ba2fe79ae701
Author: Heiko Carstens <heiko.carstens.com>
Date: Fri Sep 10 13:47:29 2010 +0200
generic-ipi: Fix deadlock in __smp_call_function_single
Just got my 6 way machine to a state where cpu 0 is in an
endless loop within __smp_call_function_single.
All other cpus are idle.
The call trace on cpu 0 looks like this:
__smp_call_function_single
scheduler_tick
update_process_times
tick_sched_timer
__run_hrtimer
hrtimer_interrupt
clock_comparator_work
do_extint
ext_int_handler
----> timer irq
cpu_idle
__smp_call_function_single() got called from nohz_balancer_kick()
(inlined) with the remote cpu being 1, wait being 0 and the per
cpu variable remote_sched_softirq_cb (call_single_data) of the
current cpu (0).
Then it loops forever when it tries to grab the lock of the
call_single_data, since it is already locked and enqueued on cpu 0.
My theory how this could have happened: for some reason the
scheduler decided to call __smp_call_function_single() on it's own
cpu, and sends an IPI to itself. The interrupt stays pending
since IRQs are disabled. If then the hypervisor schedules the
cpu away it might happen that upon rescheduling both the IPI and
the timer IRQ are pending. If then interrupts are enabled again
it depends which one gets scheduled first.
If the timer interrupt gets delivered first we end up with the
local deadlock as seen in the calltrace above.
Let's make __smp_call_function_single() check if the target cpu is
the current cpu and execute the function immediately just like
smp_call_function_single does. That should prevent at least the
scenario described here.
It might also be that the scheduler is not supposed to call
__smp_call_function_single with the remote cpu being the current
cpu, but that is a different issue.
Signed-off-by: Heiko Carstens <heiko.carstens.com>
Acked-by: Peter Zijlstra <a.p.zijlstra>
Acked-by: Jens Axboe <jaxboe>
Cc: Venkatesh Pallipadi <venki>
Cc: Suresh Siddha <suresh.b.siddha>
LKML-Reference: <20100910114729.GB2827.de.ibm.com>
Signed-off-by: Ingo Molnar <mingo>
----------------------------------------------------------------------------------------------------------------
Fixes BZ675294
rhel6-ipi_deadlock.patch
diff --git a/kernel/smp.c b/kernel/smp.c
index 75c970c..ed6aacf 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -365,9 +365,10 @@ call:
 EXPORT_SYMBOL_GPL(smp_call_function_any);
 
 /**
- * __smp_call_function_single(): Run a function on another CPU
+ * __smp_call_function_single(): Run a function on a specific CPU
  * @cpu: The CPU to run on.
  * @data: Pre-allocated and setup data structure
+ * @wait: If true, wait until function has completed on specified CPU.
  *
  * Like smp_call_function_single(), but allow caller to pass in a
  * pre-allocated data structure. Useful for embedding @data inside
@@ -376,8 +377,10 @@ EXPORT_SYMBOL_GPL(smp_call_function_any);
 void __smp_call_function_single(int cpu, struct call_single_data *data,
 				int wait)
 {
-	csd_lock(data);
+	unsigned int this_cpu;
+	unsigned long flags;
 
+	this_cpu = get_cpu();
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
@@ -387,7 +390,15 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
 	WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
 		     && !oops_in_progress);
 
-	generic_exec_single(cpu, data, wait);
+	if (cpu == this_cpu) {
+		local_irq_save(flags);
+		data->func(data->info);
+		local_irq_restore(flags);
+	} else {
+		csd_lock(data);
+		generic_exec_single(cpu, data, wait);
+	}
+	put_cpu();
 }
 
 /**
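For comparison, the "execute it locally" behaviour that Heiko refers to was already present in smp_call_function_single() at the time; roughly (2.6.32-era code, quoted from memory and trimmed, so approximate):

```c
/* Approximate, trimmed excerpt of smp_call_function_single() for comparison. */
int smp_call_function_single(int cpu, void (*func)(void *info), void *info,
			     int wait)
{
	unsigned long flags;
	int this_cpu;
	int err = 0;

	/* Prevent preemption and reschedule onto another processor. */
	this_cpu = get_cpu();

	/* ... deadlock/online sanity checks elided ... */

	if (cpu == this_cpu) {
		/* Target is the current CPU: just run the function with IRQs off. */
		local_irq_save(flags);
		func(info);
		local_irq_restore(flags);
	} else {
		/*
		 * ... remote path elided: lock a call_single_data and hand it
		 * to generic_exec_single(), which raises the IPI ...
		 */
	}

	put_cpu();
	return err;
}
```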
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Patch(es) available on kernel-2.6.32-117.el6

Confirmed LTP has run to completion with both the -118 and -122 kernels, so this one can be verified.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
Description of problem:
While running kernel testing, the LTP test causes the system to hang.

Version-Release number of selected component (if applicable):
2.6.32-96.el6

How reproducible:
99% of the time

Steps to Reproduce:
1. Install RHEL6 GA s/390x
2. Install kernel 2.6.32-96.el6 or greater
3. Run the upstream LTP testsuite

Actual results:
[-- MARK -- Wed Feb 2 17:45:00 2011]
logger: 2011-02-02 17:46:36 /usr/bin/rhts-test-runner.sh 5992 1620 hearbeat...
logger: 2011-02-02 17:47:36 /usr/bin/rhts-test-runner.sh 5992 1680 hearbeat...
logger: 2011-02-02 17:48:36 /usr/bin/rhts-test-runner.sh 5992 1740 hearbeat...
00: HCPGSP2627I The virtual machine is placed in CP mode due to a SIGP initial CPU reset from CPU 01.
cpu: Processor 1 started, address 0, identification 32C5C2
logger: 2011-02-02 17:49:35 /usr/bin/rhts-test-runner.sh 5992 1800 hearbeat...
[-- MARK -- Wed Feb 2 17:50:00 2011]
logger: 2011-02-02 17:50:35 /usr/bin/rhts-test-runner.sh 5992 1860 hearbeat...
<000003c000985d3c> ext4_dirty_inode+0x38/0x74 ext4
<000000000027a40e> __mark_inode_dirty+0x46/0x198
<0000000000269ad0> touch_atime+0x138/0x170
<00000000001f022c> generic_file_aio_read+0x418/0x7ac
<000000000024f354> do_sync_read+0xf0/0x154
<0000000000250348> vfs_read+0xa0/0x1a0
<000000000025054a> SyS_read+0x5a/0xac
<000000000011860c> sysc_tracego+0xe/0x14
<000002000012fe90> 0x2000012fe90
[-- MARK -- Wed Feb 2 17:55:00 2011]
INFO: task plymouthd:175 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
plymouthd D 000003c0008636b4 0 175 1 0x00000000
00000000000005ff 0000000000000600 00000000000000a8 0000000000000000
0000000000ff4e00 0000000000fe4e00 0000000000000600 00000000007b3cf8
0000000000000000 0000000000000000 000000000236e140 000000000070ee98
00000000007a5e00 000000000236e5d8 000000001f992040 0000000000ff4e00
00000000004c4c78 00000000004bb1be 00000000023bf818 00000000023bf9d0
Call Trace:
(<00000000004bb1be> schedule+0x5aa/0xf84)
<000003c0008636b4> start_this_handle+0x308/0x5e0 jbd2
<000003c000863ba4> jbd2_journal_start+0xd8/0x118 jbd2
<000003c000985d3c> ext4_dirty_inode+0x38/0x74 ext4
<000000000027a40e> __mark_inode_dirty+0x46/0x198
<000000000026993c> file_update_time+0x110/0x16c
<00000000001ef812> __generic_file_aio_write+0x256/0x448
<00000000001efa72> generic_file_aio_write+0x6e/0xf4
<000003c000980286> ext4_file_write+0x7e/0x21c ext4
<000000000024f200> do_sync_write+0xf0/0x154
<0000000000250054> vfs_write+0xa0/0x1a0
<0000000000250256> SyS_write+0x5a/0xac
<00000000001184d4> sysc_noemu+0x10/0x16
<0000020000206f20> 0x20000206f20
INFO: task jbd2/dm-0-8:469 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/dm-0-8 D 000003c000864010 0 469 2 0x00000000
00000000000005ff 0000000000000600 00000000000000a8 0000000000000000
0000000000ff4e00 0000000000fe4e00 0000000000000600 00000000007b3cf8
0000000000000000 000000001cbdeb90 0000000000000000 000000000070ee98
00000000007a5e00 000000001cbdf028 000000001f992040 0000000000ff4e00
00000000004c4c78 00000000004bb1be 000000000240fab0 000000000240fc68
Call Trace:
(<00000000004bb1be> schedule+0x5aa/0xf84)
<000003c000864010> jbd2_journal_commit_transaction+0x1c8/0x1a94 jbd2
<000003c00086c47e> kjournald2+0xde/0x2c0 jbd2
<000000000016cbac> kthread+0xa4/0xac
<0000000000109dea> kernel_thread_starter+0x6/0xc
<0000000000109de4> kernel_thread_starter+0x0/0xc

Expected results:

Additional info: