Red Hat Bugzilla – Bug 675294
[RHEL6.1] s/390x hang while running LTP test
Last modified: 2011-05-23 16:39:14 EDT
Description of problem:
While running kernel testing, the LTP test causes the system to hang.

Version-Release number of selected component (if applicable):
2.6.32-96.el6

How reproducible:
99% of the time

Steps to Reproduce:
1. Install RHEL6 GA s/390x
2. Install Kernel 2.6.32-96.el6 or greater
3. Run the upstream LTP testsuite

Actual results:
[-- MARK -- Wed Feb 2 17:45:00 2011]
logger: 2011-02-02 17:46:36 /usr/bin/rhts-test-runner.sh 5992 1620 hearbeat...
logger: 2011-02-02 17:47:36 /usr/bin/rhts-test-runner.sh 5992 1680 hearbeat...
logger: 2011-02-02 17:48:36 /usr/bin/rhts-test-runner.sh 5992 1740 hearbeat...
00: HCPGSP2627I The virtual machine is placed in CP mode due to a SIGP initial CPU reset from CPU 01.
cpu: Processor 1 started, address 0, identification 32C5C2
logger: 2011-02-02 17:49:35 /usr/bin/rhts-test-runner.sh 5992 1800 hearbeat...
[-- MARK -- Wed Feb 2 17:50:00 2011]
logger: 2011-02-02 17:50:35 /usr/bin/rhts-test-runner.sh 5992 1860 hearbeat...
<000003c000985d3c> ext4_dirty_inode+0x38/0x74 ext4
<000000000027a40e> __mark_inode_dirty+0x46/0x198
<0000000000269ad0> touch_atime+0x138/0x170
<00000000001f022c> generic_file_aio_read+0x418/0x7ac
<000000000024f354> do_sync_read+0xf0/0x154
<0000000000250348> vfs_read+0xa0/0x1a0
<000000000025054a> SyS_read+0x5a/0xac
<000000000011860c> sysc_tracego+0xe/0x14
<000002000012fe90> 0x2000012fe90
[-- MARK -- Wed Feb 2 17:55:00 2011]
INFO: task plymouthd:175 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
plymouthd D 000003c0008636b4 0 175 1 0x00000000
 00000000000005ff 0000000000000600 00000000000000a8 0000000000000000
 0000000000ff4e00 0000000000fe4e00 0000000000000600 00000000007b3cf8
 0000000000000000 0000000000000000 000000000236e140 000000000070ee98
 00000000007a5e00 000000000236e5d8 000000001f992040 0000000000ff4e00
 00000000004c4c78 00000000004bb1be 00000000023bf818 00000000023bf9d0
Call Trace:
(<00000000004bb1be> schedule+0x5aa/0xf84)
<000003c0008636b4> start_this_handle+0x308/0x5e0 jbd2
<000003c000863ba4> jbd2_journal_start+0xd8/0x118 jbd2
<000003c000985d3c> ext4_dirty_inode+0x38/0x74 ext4
<000000000027a40e> __mark_inode_dirty+0x46/0x198
<000000000026993c> file_update_time+0x110/0x16c
<00000000001ef812> __generic_file_aio_write+0x256/0x448
<00000000001efa72> generic_file_aio_write+0x6e/0xf4
<000003c000980286> ext4_file_write+0x7e/0x21c ext4
<000000000024f200> do_sync_write+0xf0/0x154
<0000000000250054> vfs_write+0xa0/0x1a0
<0000000000250256> SyS_write+0x5a/0xac
<00000000001184d4> sysc_noemu+0x10/0x16
<0000020000206f20> 0x20000206f20
INFO: task jbd2/dm-0-8:469 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/dm-0-8 D 000003c000864010 0 469 2 0x00000000
 00000000000005ff 0000000000000600 00000000000000a8 0000000000000000
 0000000000ff4e00 0000000000fe4e00 0000000000000600 00000000007b3cf8
 0000000000000000 000000001cbdeb90 0000000000000000 000000000070ee98
 00000000007a5e00 000000001cbdf028 000000001f992040 0000000000ff4e00
 00000000004c4c78 00000000004bb1be 000000000240fab0 000000000240fc68
Call Trace:
(<00000000004bb1be> schedule+0x5aa/0xf84)
<000003c000864010> jbd2_journal_commit_transaction+0x1c8/0x1a94 jbd2
<000003c00086c47e> kjournald2+0xde/0x2c0 jbd2
<000000000016cbac> kthread+0xa4/0xac
<0000000000109dea> kernel_thread_starter+0x6/0xc
<0000000000109de4> kernel_thread_starter+0x0/0xc

Expected results:

Additional info:
I have narrowed the culprit down to this upstream backport commit. I have reproduced the hang with 2 separate upstream kernels as well as RHEL6.1 2.6.32-96 and beyond.

------------------------------------------------------------------------
Subject: sched: Change nohz idle load balancing logic to push model
From: Larry Woodman <lwoodman@redhat.com>
Author: Venkatesh Pallipadi <venki@google.com>
Date: Fri May 21 17:09:41 2010 -0700

sched: Change nohz idle load balancing logic to push model

mainline commit 83cd4fe27ad8446619b2e030b171b858501de87d

In the new push model, all idle CPUs indeed go into nohz mode. There is
still the concept of idle load balancer (performing the load balancing
on behalf of all the idle cpu's in the system). Busy CPU kicks the nohz
balancer when any of the nohz CPUs need idle load balancing.
The kickee CPU does the idle load balancing on behalf of all idle CPUs
instead of the normal idle balance.

This addresses the below two problems with the current nohz ilb logic:
* the idle load balancer continued to have periodic ticks during idle and
  wokeup frequently, even though it did not have any rebalancing to do on
  behalf of any of the idle CPUs.
* On x86 and CPUs that have APIC timer stoppage on idle CPUs, this
  periodic wakeup can result in a periodic additional interrupt on a CPU
  doing the timer broadcast.

Also currently we are migrating the unpinned timers from an idle to the cpu
doing idle load balancing (when all the cpus in the system are idle,
there is no idle load balancing cpu and timers get added to the same idle cpu
where the request was made. So the existing optimization works only on semi idle
system).

And In semi idle system, we no longer have periodic ticks on the idle load
balancer CPU. Using that cpu will add more delays to the timers than intended
(as that cpu's timer base may not be uptodate wrt jiffies etc). This was
causing mysterious slowdowns during boot etc.
---------------------------------------------------------------------

The problem, for some reason, occurs only on s390x: nohz_balancer_kick() calls __smp_call_function_single(), which calls csd_lock(), which calls csd_lock_wait(). For some reason (unknown yet) the system is spinning in this loop, yet every dump shows that data->flags is zero.

/*
 * csd_lock/csd_unlock used to serialize access to per-cpu csd resources
 *
 * For non-synchronous ipi calls the csd can still be in use by the
 * previous function call. For multi-cpu calls its even more interesting
 * as we'll have to ensure no other cpu is observing our csd.
 */
static void csd_lock_wait(struct call_single_data *data)
{
	while (data->flags & CSD_FLAG_LOCK)
		cpu_relax();
}

Larry
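For reference, the lock handshake around that loop can be modeled in user space. This is only a minimal sketch, not the kernel code: the names model_csd, model_csd_lock_wait(), model_csd_lock(), model_csd_unlock() and MODEL_SPIN_LIMIT are hypothetical, there are no memory barriers, and the wait gives up after a fixed number of polls, whereas the kernel loop has no such limit. It shows that a waiter only gets past the loop once whoever holds the csd clears the lock bit.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_CSD_FLAG_LOCK 0x01u
#define MODEL_SPIN_LIMIT    1000000u /* hypothetical bound; the kernel loop has none */

/* Hypothetical stand-in for struct call_single_data: only the flags word. */
struct model_csd {
	unsigned int flags;
};

/*
 * Same shape as csd_lock_wait(), but bounded so the sketch terminates.
 * Returns true if the lock bit was seen clear, false if the poll budget
 * ran out (where the kernel would keep spinning).
 */
static bool model_csd_lock_wait(const volatile struct model_csd *csd)
{
	for (unsigned int i = 0; i < MODEL_SPIN_LIMIT; i++) {
		if (!(csd->flags & MODEL_CSD_FLAG_LOCK))
			return true;
	}
	return false;
}

/* Wait for the previous user to finish, then mark the csd as in use. */
static bool model_csd_lock(volatile struct model_csd *csd)
{
	if (!model_csd_lock_wait(csd))
		return false;
	csd->flags |= MODEL_CSD_FLAG_LOCK;
	return true;
}

/* Release path: in this model, the side that ran the call clears the bit. */
static void model_csd_unlock(volatile struct model_csd *csd)
{
	assert(csd->flags & MODEL_CSD_FLAG_LOCK);
	csd->flags &= ~MODEL_CSD_FLAG_LOCK;
}
```

In this model, calling model_csd_lock() twice on the same object without an intervening model_csd_unlock() makes the second call fail, which is the bounded equivalent of spinning forever.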
This upstream patch was missing from RHEL6.1:

commit 27c379f7f89a4d558c685b5d89b5ba2fe79ae701
Author: Heiko Carstens <heiko.carstens@de.ibm.com>
Date: Fri Sep 10 13:47:29 2010 +0200

generic-ipi: Fix deadlock in __smp_call_function_single

Just got my 6 way machine to a state where cpu 0 is in an
endless loop within __smp_call_function_single.
All other cpus are idle.

The call trace on cpu 0 looks like this:

__smp_call_function_single
scheduler_tick
update_process_times
tick_sched_timer
__run_hrtimer
hrtimer_interrupt
clock_comparator_work
do_extint
ext_int_handler
----> timer irq
cpu_idle

__smp_call_function_single() got called from nohz_balancer_kick()
(inlined) with the remote cpu being 1, wait being 0 and the per
cpu variable remote_sched_softirq_cb (call_single_data) of the
current cpu (0).

Then it loops forever when it tries to grab the lock of the
call_single_data, since it is already locked and enqueued on cpu 0.

My theory how this could have happened: for some reason the
scheduler decided to call __smp_call_function_single() on it's own
cpu, and sends an IPI to itself. The interrupt stays pending
since IRQs are disabled. If then the hypervisor schedules the
cpu away it might happen that upon rescheduling both the IPI and
the timer IRQ are pending. If then interrupts are enabled again
it depends which one gets scheduled first.
If the timer interrupt gets delivered first we end up with the
local deadlock as seen in the calltrace above.

Let's make __smp_call_function_single() check if the target cpu is
the current cpu and execute the function immediately just like
smp_call_function_single does. That should prevent at least the
scenario described here.

It might also be that the scheduler is not supposed to call
__smp_call_function_single with the remote cpu being the current
cpu, but that is a different issue.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100910114729.GB2827@osiris.boeblingen.de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
----------------------------------------------------------------------------------------------------------------

Fixes BZ675294 rhel6-ipi_deadlock.patch

diff --git a/kernel/smp.c b/kernel/smp.c
index 75c970c..ed6aacf 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -365,9 +365,10 @@ call:
 EXPORT_SYMBOL_GPL(smp_call_function_any);
 
 /**
- * __smp_call_function_single(): Run a function on another CPU
+ * __smp_call_function_single(): Run a function on a specific CPU
  * @cpu: The CPU to run on.
  * @data: Pre-allocated and setup data structure
+ * @wait: If true, wait until function has completed on specified CPU.
  *
  * Like smp_call_function_single(), but allow caller to pass in a
  * pre-allocated data structure. Useful for embedding @data inside
@@ -376,8 +377,10 @@ EXPORT_SYMBOL_GPL(smp_call_function_any);
 void __smp_call_function_single(int cpu, struct call_single_data *data,
 				int wait)
 {
-	csd_lock(data);
+	unsigned int this_cpu;
+	unsigned long flags;
 
+	this_cpu = get_cpu();
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
@@ -387,7 +390,15 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
 	WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
 		     && !oops_in_progress);
 
-	generic_exec_single(cpu, data, wait);
+	if (cpu == this_cpu) {
+		local_irq_save(flags);
+		data->func(data->info);
+		local_irq_restore(flags);
+	} else {
+		csd_lock(data);
+		generic_exec_single(cpu, data, wait);
+	}
+	put_cpu();
 }
 
 /**
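
The control flow the patch introduces can be tried out in a small user-space model. This is a sketch under stated assumptions rather than kernel code: model_call, model_pending, model_call_single(), model_handle_ipi(), model_count() and MODEL_NR_CPUS are all hypothetical names, each modeled CPU has a single pending slot instead of the kernel's per-CPU queue, interrupt masking is left out, and a still-locked csd is reported with a false return value where the kernel would spin in csd_lock_wait().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2 /* hypothetical CPU count for the model */

/* Hypothetical stand-in for struct call_single_data. */
struct model_call {
	void (*func)(void *info);
	void *info;
	bool locked; /* models CSD_FLAG_LOCK */
};

/* One pending-call slot per modeled CPU (the kernel keeps a per-CPU queue). */
static struct model_call *model_pending[MODEL_NR_CPUS];

/*
 * Models the patched dispatch: a call aimed at the current CPU runs
 * immediately and never touches the lock; a call aimed at another CPU
 * takes the lock and is queued. Returns false when the csd is still
 * locked, which is where the kernel would spin.
 */
static bool model_call_single(int this_cpu, int cpu, struct model_call *call)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (cpu == this_cpu) {
		call->func(call->info);
		return true;
	}
	if (call->locked)
		return false;
	call->locked = true;
	model_pending[cpu] = call;
	return true;
}

/* Models the IPI handler on the target CPU: run the call, then unlock. */
static void model_handle_ipi(int cpu)
{
	struct model_call *call = model_pending[cpu];

	if (call == NULL)
		return;
	model_pending[cpu] = NULL;
	call->func(call->info);
	call->locked = false;
}

/* Example callback: counts how often it was invoked. */
static void model_count(void *info)
{
	++*(int *)info;
}
```

With a counter passed as info, model_call_single(0, 0, &call) bumps the counter right away and leaves call.locked false, so repeated self-targeted calls cannot block on their own csd. A call with cpu set to 1 stays queued and locked until model_handle_ipi(1) runs.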
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch(es) available on kernel-2.6.32-117.el6
Confirmed ltp has run to completion with both -118 and -122 kernels so this one can be verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html