Bug 675294
Summary: | [RHEL6.1] s/390x hang while running LTP test | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Jeff Burke <jburke> |
Component: | kernel | Assignee: | Larry Woodman <lwoodman> |
Status: | CLOSED ERRATA | QA Contact: | Mike Gahagan <mgahagan> |
Severity: | high | Docs Contact: | |
Priority: | high | | |
Version: | 6.1 | CC: | arozansk, jstancek, mzywusko, pbunyan |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | s390x | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | kernel-2.6.32-117.el6 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2011-05-23 20:39:14 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Jeff Burke
2011-02-04 19:50:06 UTC
I have narrowed the culprit down to this upstream backport commit. I have reproduced the hang with two separate upstream kernels as well as with RHEL6.1 kernels 2.6.32-96 and later.

------------------------------------------------------------------------
Subject: sched: Change nohz idle load balancing logic to push model
From: Larry Woodman <lwoodman>
Author: Venkatesh Pallipadi <venki>
Date: Fri May 21 17:09:41 2010 -0700

sched: Change nohz idle load balancing logic to push model

mainline commit 83cd4fe27ad8446619b2e030b171b858501de87d

In the new push model, all idle CPUs indeed go into nohz mode. There is still the concept of idle load balancer (performing the load balancing on behalf of all the idle cpu's in the system). Busy CPU kicks the nohz balancer when any of the nohz CPUs need idle load balancing. The kickee CPU does the idle load balancing on behalf of all idle CPUs instead of the normal idle balance.

This addresses the below two problems with the current nohz ilb logic:
* the idle load balancer continued to have periodic ticks during idle and wokeup frequently, even though it did not have any rebalancing to do on behalf of any of the idle CPUs.
* On x86 and CPUs that have APIC timer stoppage on idle CPUs, this periodic wakeup can result in a periodic additional interrupt on a CPU doing the timer broadcast.

Also currently we are migrating the unpinned timers from an idle to the cpu doing idle load balancing (when all the cpus in the system are idle, there is no idle load balancing cpu and timers get added to the same idle cpu where the request was made. So the existing optimization works only on semi idle system). And in semi idle system, we no longer have periodic ticks on the idle load balancer CPU. Using that cpu will add more delays to the timers than intended (as that cpu's timer base may not be uptodate wrt jiffies etc). This was causing mysterious slowdowns during boot etc.
---------------------------------------------------------------------

For a reason that is not yet understood, the problem shows up only on s390x: nohz_balancer_kick() calls __smp_call_function_single(), which calls csd_lock(), which calls csd_lock_wait(). The system is spinning in this loop, yet every dump shows that data->flags is zero.

/*
 * csd_lock/csd_unlock used to serialize access to per-cpu csd resources
 *
 * For non-synchronous ipi calls the csd can still be in use by the
 * previous function call. For multi-cpu calls its even more interesting
 * as we'll have to ensure no other cpu is observing our csd.
 */
static void csd_lock_wait(struct call_single_data *data)
{
        while (data->flags & CSD_FLAG_LOCK)
                cpu_relax();
}
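To make the hand-off above concrete: csd_lock() claims a call_single_data by setting CSD_FLAG_LOCK, the CPU that receives the IPI runs the queued function and clears the flag with csd_unlock(), and csd_lock_wait() spins until the flag is clear. Below is a minimal user-space sketch of that protocol, with a pthread standing in for the remote CPU's IPI handler and C11 atomics standing in for the kernel's barriers; the struct and helper names mirror kernel/smp.c, but nothing here is the kernel implementation.

```c
/*
 * Minimal user-space sketch of the CSD_FLAG_LOCK hand-off used by
 * csd_lock()/csd_lock_wait()/csd_unlock() in kernel/smp.c.
 * A pthread stands in for the remote CPU's IPI handler; illustrative only.
 */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

#define CSD_FLAG_LOCK 0x01u

struct call_single_data {
        void (*func)(void *info);
        void *info;
        atomic_uint flags;
};

static void csd_lock_wait(struct call_single_data *data)
{
        /* Spin until the previous user of this csd has released it. */
        while (atomic_load(&data->flags) & CSD_FLAG_LOCK)
                sched_yield();          /* cpu_relax() in the kernel */
}

static void csd_lock(struct call_single_data *data)
{
        /* Claim the csd for a new cross-CPU function call. */
        csd_lock_wait(data);
        atomic_store(&data->flags, CSD_FLAG_LOCK);
}

static void csd_unlock(struct call_single_data *data)
{
        /* Done: whoever is waiting in csd_lock_wait() may proceed. */
        atomic_fetch_and(&data->flags, ~CSD_FLAG_LOCK);
}

static void kick(void *info)
{
        printf("handled request: %s\n", (const char *)info);
}

/* Plays the role of the CPU that receives the IPI and runs the csd. */
static void *remote_cpu(void *arg)
{
        struct call_single_data *data = arg;

        data->func(data->info);
        csd_unlock(data);
        return NULL;
}

int main(void)
{
        struct call_single_data csd = { .func = kick, .info = "nohz balance kick" };
        pthread_t t;

        csd_lock(&csd);                 /* requester owns the csd */
        pthread_create(&t, NULL, remote_cpu, &csd);
        csd_lock_wait(&csd);            /* returns once the remote side unlocks */
        pthread_join(t, NULL);

        /* If remote_cpu() could never run -- e.g. a self-IPI left pending
         * while interrupts are off -- the wait above would spin forever. */
        printf("csd released, safe to reuse\n");
        return 0;
}
```

Built with `cc -pthread`, the sketch prints the handled request and then releases the csd; drop the remote thread and it spins in csd_lock_wait() forever, which mirrors the hang reported here.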
Larry

This upstream patch was missing from RHEL6.1:

commit 27c379f7f89a4d558c685b5d89b5ba2fe79ae701
Author: Heiko Carstens <heiko.carstens.com>
Date: Fri Sep 10 13:47:29 2010 +0200

generic-ipi: Fix deadlock in __smp_call_function_single

Just got my 6 way machine to a state where cpu 0 is in an endless loop within __smp_call_function_single. All other cpus are idle.

The call trace on cpu 0 looks like this:

 __smp_call_function_single
 scheduler_tick
 update_process_times
 tick_sched_timer
 __run_hrtimer
 hrtimer_interrupt
 clock_comparator_work
 do_extint
 ext_int_handler
 ----> timer irq
 cpu_idle

__smp_call_function_single() got called from nohz_balancer_kick() (inlined) with the remote cpu being 1, wait being 0 and the per cpu variable remote_sched_softirq_cb (call_single_data) of the current cpu (0). Then it loops forever when it tries to grab the lock of the call_single_data, since it is already locked and enqueued on cpu 0.

My theory how this could have happened: for some reason the scheduler decided to call __smp_call_function_single() on it's own cpu, and sends an IPI to itself. The interrupt stays pending since IRQs are disabled. If then the hypervisor schedules the cpu away it might happen that upon rescheduling both the IPI and the timer IRQ are pending. If then interrupts are enabled again it depends which one gets scheduled first. If the timer interrupt gets delivered first we end up with the local deadlock as seen in the calltrace above.

Let's make __smp_call_function_single() check if the target cpu is the current cpu and execute the function immediately just like smp_call_function_single does. That should prevent at least the scenario described here. It might also be that the scheduler is not supposed to call __smp_call_function_single with the remote cpu being the current cpu, but that is a different issue.

Signed-off-by: Heiko Carstens <heiko.carstens.com>
Acked-by: Peter Zijlstra <a.p.zijlstra>
Acked-by: Jens Axboe <jaxboe>
Cc: Venkatesh Pallipadi <venki>
Cc: Suresh Siddha <suresh.b.siddha>
LKML-Reference: <20100910114729.GB2827.de.ibm.com>
Signed-off-by: Ingo Molnar <mingo>
----------------------------------------------------------------------------------------------------------------

Fixes BZ675294

rhel6-ipi_deadlock.patch

diff --git a/kernel/smp.c b/kernel/smp.c
index 75c970c..ed6aacf 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -365,9 +365,10 @@ call:
 EXPORT_SYMBOL_GPL(smp_call_function_any);
 
 /**
- * __smp_call_function_single(): Run a function on another CPU
+ * __smp_call_function_single(): Run a function on a specific CPU
  * @cpu: The CPU to run on.
  * @data: Pre-allocated and setup data structure
+ * @wait: If true, wait until function has completed on specified CPU.
  *
  * Like smp_call_function_single(), but allow caller to pass in a
  * pre-allocated data structure. Useful for embedding @data inside
@@ -376,8 +377,10 @@ EXPORT_SYMBOL_GPL(smp_call_function_any);
 void __smp_call_function_single(int cpu, struct call_single_data *data,
 				int wait)
 {
-	csd_lock(data);
+	unsigned int this_cpu;
+	unsigned long flags;
 
+	this_cpu = get_cpu();
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
@@ -387,7 +390,15 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
 	WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
 		     && !oops_in_progress);
 
-	generic_exec_single(cpu, data, wait);
+	if (cpu == this_cpu) {
+		local_irq_save(flags);
+		data->func(data->info);
+		local_irq_restore(flags);
+	} else {
+		csd_lock(data);
+		generic_exec_single(cpu, data, wait);
+	}
+	put_cpu();
 }
 
 /**
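The heart of the backport above is a dispatch check: if the requested CPU is the one we are already running on, the function is executed inline with local interrupts disabled, instead of locking the csd and sending an IPI to ourselves, which can never be delivered while IRQs are off. The sketch below models only that control flow in user space; every helper is a simplified stand-in named after its kernel counterpart (smp_call_single_sketch mirrors the fixed __smp_call_function_single()), not the real primitive.

```c
/*
 * User-space sketch of the post-fix control flow: requests aimed at the
 * current CPU run inline; only genuinely remote requests lock the csd
 * and raise an IPI. All helpers are illustrative stand-ins.
 */
#include <stdio.h>

struct call_single_data {
        void (*func)(void *info);
        void *info;
        unsigned int flags;
};

static const int this_cpu = 0;          /* pretend we always run on CPU 0 */

static void csd_lock(struct call_single_data *data)
{
        data->flags |= 0x01u;           /* claim the csd (CSD_FLAG_LOCK) */
}

static void generic_exec_single(int cpu, struct call_single_data *data, int wait)
{
        /* The kernel enqueues the csd on the target CPU and raises an IPI;
         * this stand-in only reports what would happen. */
        (void)data;
        printf("CPU %d: queue csd and send IPI to CPU %d (wait=%d)\n",
               this_cpu, cpu, wait);
}

/* Post-fix behaviour: never send an IPI to ourselves. */
static void smp_call_single_sketch(int cpu, struct call_single_data *data, int wait)
{
        if (cpu == this_cpu) {
                /* The kernel runs this branch with local interrupts disabled. */
                data->func(data->info);
        } else {
                csd_lock(data);
                generic_exec_single(cpu, data, wait);
        }
}

static void nohz_kick(void *info)
{
        printf("CPU %d: ran \"%s\" inline, no self-IPI needed\n",
               this_cpu, (const char *)info);
}

int main(void)
{
        struct call_single_data csd = { .func = nohz_kick, .info = "nohz balancer kick" };

        smp_call_single_sketch(0, &csd, 0);     /* target is the current CPU */
        smp_call_single_sketch(1, &csd, 0);     /* target is a remote CPU */
        return 0;
}
```

Running it prints the inline path for CPU 0 and the queue-and-IPI path for CPU 1; before the fix every request took the csd_lock()/IPI path, which is where the s390x hang occurred.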
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Patch(es) available on kernel-2.6.32-117.el6

Confirmed LTP has run to completion with both the -118 and -122 kernels, so this one can be verified.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html