Bug 1303733
Summary: | backport of: "softirq: split timer softirqs out of ksoftirqd" | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Daniel Bristot de Oliveira <daolivei> | |
Component: | kernel-rt | Assignee: | Clark Williams <williams> | |
kernel-rt sub component: | Other | QA Contact: | Jiri Kastner <jkastner> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | medium | |||
Priority: | high | CC: | bhu, lgoncalv, mkolaja, williams | |
Version: | 7.3 | Keywords: | ZStream | |
Target Milestone: | rc | |||
Target Release: | 7.3 | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1306962 1313435 (view as bug list) | Environment: | ||
Last Closed: | 2016-11-03 19:41:42 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1203710, 1274397, 1306962, 1313435 |
Description
Daniel Bristot de Oliveira
2016-02-01 19:45:29 UTC
A customer faced a RCU stall because a the timer softirq was preempted by a spinning RT thread. As the ksoftirq currently runs network load, it is not a good idea to run it on a high RT priority. On the other hand, the delay on delivering a timer, or a long preemption of the timer softirq while it holds a lock can cause damage on system's operation. Being able to increase the priority of the timer softirq can be the best solution for these potential problems. Here is the RCA of a RCU stall caused by the preemption of the timer softirq, which can be avoided by increasing timer's softirq thread priority. ==== RCA of a customer's support case ==== The system complained about a RCU Stall, and an hung task - the rcu_preempt. The rcu_preempt task is sleeping on UN state for 00:02:26.544: crash> ps -m 14 [0 00:02:26.544] [UN] PID: 14 TASK: ffff880c151a98b0 CPU: 6 COMMAND: "rcu_preempt" rcu_preempt's stack is: crash> bt PID: 14 TASK: ffff880c151a98b0 CPU: 6 COMMAND: "rcu_preempt" #0 [ffff880c151b3c78] __schedule at ffffffff815bf324 #1 [ffff880c151b3d20] schedule at ffffffff815bf9c4 #2 [ffff880c151b3d40] del_timer_sync at ffffffff8106b5f5 #3 [ffff880c151b3dc0] schedule_timeout at ffffffff815be334 #4 [ffff880c151b3e30] rcu_gp_kthread at ffffffff810fe55f #5 [ffff880c151b3ec0] kthread at ffffffff8108389e #6 [ffff880c151b3f50] ret_from_fork at ffffffff815c966c The task is sleeping on del_timer_sync() function... del_timer_sync() definition: #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL) /** * del_timer_sync - deactivate a timer and wait for the handler to finish. * @timer: the timer to be deactivated * * This function only differs from del_timer() on SMP: besides deactivating * the timer it also makes sure the handler has finished executing on other * CPUs. * * Synchronization rules: Callers must prevent restarting of the timer, * otherwise this function is meaningless. It must not be called from * interrupt contexts unless the timer is an irqsafe one. The caller must * not hold locks which would prevent completion of the timer's * handler. The timer's handler must not call add_timer_on(). Upon exit the * timer is not queued and the handler is not running on any CPU. * * Note: For !irqsafe timers, you must not hold locks that are held in * interrupt context while calling this function. Even if the lock has * nothing to do with the timer in question. Here's why: * * CPU0 CPU1 * ---- ---- * <SOFTIRQ> * call_timer_fn(); * base->running_timer = mytimer; * spin_lock_irq(somelock); * <IRQ> * spin_lock(somelock); * del_timer_sync(mytimer); * while (base->running_timer == mytimer); * * Now del_timer_sync() will never return and never release somelock. * The interrupt on the other CPU is waiting to grab somelock but * it has interrupted the softirq that CPU0 is waiting to finish. * * The function returns whether it has deactivated a pending timer or not. */ int del_timer_sync(struct timer_list *timer) { #ifdef CONFIG_LOCKDEP unsigned long flags; /* * If lockdep gives a backtrace here, please reference * the synchronization rules above. */ local_irq_save(flags); lock_map_acquire(&timer->lockdep_map); lock_map_release(&timer->lockdep_map); local_irq_restore(flags); #endif /* * don't use it in hardirq context, because it * could lead to deadlock. */ WARN_ON(in_irq() && !tbase_get_irqsafe(timer->base)); for (;;) { int ret = try_to_del_timer_sync(timer); if (ret >= 0) return ret; wait_for_running_timer(timer); <--- It is waiting for a running timer. } } The wait_for_running_timer() waits for an active timer softirq to finish its work, it sleeps on a wait queue. So, there should be a ksoftirqd thread unable to run. Thus I checked runqueues and I found a ksoftirqd not able to run due to a high priority rt user-space thread on CPU 8: CPU 8 RUNQUEUE: ffff88187f8541c0 CURRENT: PID: 19306 TASK: ffff880bfefb18b0 COMMAND: "SPINNING_RT" RT PRIO_ARRAY: ffff88187f8542a0 [ 97] PID: 19306 TASK: ffff880bfefb18b0 COMMAND: "SPINNING_RT" [ 98] PID: 255 TASK: ffff8818156198b0 COMMAND: "ksoftirqd/8" CFS RB_ROOT: ffff88187f854260 [no tasks queued] These threads are able to run for more than 2 minutes: crash> ps -m 19306 [0 00:02:26.546] [RU] PID: 19306 TASK: ffff880bfefb18b0 CPU: 8 COMMAND: "SPINNING_RT" crash> ps -m 255 [0 00:02:26.546] [RU] PID: 255 TASK: ffff8818156198b0 CPU: 8 COMMAND: "ksoftirqd/8" But as SPINNING_RT has a higher priority, it is running since it was awakened. The ksoftirq was preempted by SPINNING_RT while running the timer softirq: crash> bt 255 PID: 255 TASK: ffff8818156198b0 CPU: 8 COMMAND: "ksoftirqd/8" #0 [ffff881815623b08] __schedule at ffffffff815bf324 #1 [ffff881815623bb0] preempt_schedule at ffffffff815bf77f #2 [ffff881815623bd0] _raw_spin_unlock_irqrestore at ffffffff815c1c3e #3 [ffff881815623c40] wake_up_process at ffffffff81097127 #4 [ffff881815623c60] process_timeout at ffffffff8106abae #5 [ffff881815623c70] call_timer_fn at ffffffff8106b8e9 #6 [ffff881815623cb0] run_timer_softirq at ffffffff8106bbb9 #7 [ffff881815623d50] handle_softirq at ffffffff81062ca5 #8 [ffff881815623da0] do_current_softirqs at ffffffff81062f51 #9 [ffff881815623e20] run_ksoftirqd at ffffffff81063064 #10 [ffff881815623e40] smpboot_thread_fn at ffffffff8108ccb9 #11 [ffff881815623ec0] kthread at ffffffff8108389e #12 [ffff881815623f50] ret_from_fork at ffffffff815c966c The timer softirq was on the way to wake up a thread, but it was preempted before be able to finish this operation, checking the stack to figure out which thread was about to be awakened: #1 [ffff881815623bb0] preempt_schedule at ffffffff815bf77f ffff881815623bb8: 0000000000000296 > ffff880c151a98b0 ffff881815623bc8: ffff881815623bd8 ffffffff815c1c3e #2 [ffff881815623bd0] _raw_spin_unlock_irqrestore at ffffffff815c1c3e ffff881815623bd8: ffff881815623c38 ffffffff81096f79 ffff881815623be8: ffff881815620010 00000000000141c0 ffff881815623bf8: ffff881815623d18 ffffffff00000008 ffff881815623c08: ffff881815620010 ffff880c151a98b0 ffff881815623c18: ffffffff8106aba0 ffff880c151b3dc8 ffff881815623c28: ffff881815620000 ---> ffff880c151a98b0 -> 3: that was pushed here. ffff881815623c38: ffff881815623c58 ffffffff81097127 -> 2: it was stored at rbx #3 [ffff881815623c40] wake_up_process at ffffffff81097127 -> 1: task_struct is the first argument ffff881815623c48: 0000000000000000 0000000000000000 ffff881815623c58: ffff881815623c68 ffffffff8106abae #4 [ffff881815623c60] process_timeout at ffffffff8106abae The process that was about to wake up was: crash> ps -m ffff880c151a98b0 [0 00:02:26.544] [UN] PID: 14 TASK: ffff880c151a98b0 CPU: 6 COMMAND: "rcu_preempt" The rcu_preempt task! Conclusion: the high priority SPINNING_RT thread was running for a long time, starving ksoftirqd thread, that was holding the rcu_preempt thread, that end up causing a hung task/rcu stall. For sake of completeness, the SPINNING_RT was not blocked by any other thread, as it was running on user-space. So, it is not the case of a kernel problem: crash> set 19306 PID: 19306 COMMAND: "SPINNING_RT" TASK: ffff880bfefb18b0 [THREAD_INFO: ffff880c1356c000] CPU: 8 STATE: TASK_RUNNING (ACTIVE) crash> bt PID: 19306 TASK: ffff880bfefb18b0 CPU: 8 COMMAND: "SPINNING_RT" #0 [ffff88187f847e30] crash_nmi_callback at ffffffff81030dd7 #1 [ffff88187f847e40] nmi_handle at ffffffff815c30ba #2 [ffff88187f847ea0] default_do_nmi at ffffffff815c3208 #3 [ffff88187f847ed0] do_nmi at ffffffff815c34c0 #4 [ffff88187f847ef0] end_repeat_nmi at ffffffff815c25b7 RIP: 0000000001179715 RSP: 00007f0d309694b0 RFLAGS: 00000293 RAX: 0000000000000007 RBX: 00000000034a4200 RCX: 0000000000000006 RDX: 0000000000000001 RSI: 000000000000000a RDI: 0000000000000002 RBP: 0000000000000000 R8: 0000000000000008 R9: 0000000000000003 R10: 0000000000000004 R11: 0000000000000005 R12: 00007f0d309694e0 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000005 ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b --- <NMI exception stack> --- Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2584.html |