Bug 1303733
| Summary: | backport of: "softirq: split timer softirqs out of ksoftirqd" | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Daniel Bristot de Oliveira <daolivei> |
| Component: | kernel-rt | Assignee: | Clark Williams <williams> |
| kernel-rt sub component: | Other | QA Contact: | Jiri Kastner <jkastner> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | high | CC: | bhu, lgoncalv, mkolaja, williams |
| Version: | 7.3 | Keywords: | ZStream |
| Target Milestone: | rc | | |
| Target Release: | 7.3 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1306962 1313435 (view as bug list) | Environment: | |
| Last Closed: | 2016-11-03 19:41:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1203710, 1274397, 1306962, 1313435 | | |
Description
Daniel Bristot de Oliveira
2016-02-01 19:45:29 UTC
A customer faced an RCU stall because the timer softirq was preempted
by a spinning RT thread. Since ksoftirqd currently also runs the network
load, it is not a good idea to run it at a high RT priority. On the other
hand, a delay in delivering a timer, or a long preemption of the timer
softirq while it holds a lock, can harm the system's operation. Being
able to raise the priority of the timer softirq on its own is the best
solution for these potential problems.
Here is the RCA of an RCU stall caused by the preemption of the timer
softirq, which could have been avoided by raising the timer softirq
thread's priority.
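Before the RCA, a note on the fix itself: once the timer softirqs run in their
own per-CPU threads, their priority can be raised independently of ksoftirqd,
for example with chrt. As a minimal sketch, assuming the split-out threads are
named ktimersoftd/<cpu> as in the upstream -rt patch, that their PIDs are passed
on the command line (e.g. from "pgrep ktimersoftd"), and that priority 3 is an
example value only, a small helper could look roughly like this:

/*
 * Illustration only, not shipped tooling: set a SCHED_FIFO priority on
 * the timer-softirq threads whose PIDs are given as arguments.
 */
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
	/* example value; it must be above any RT task that can busy-spin */
	struct sched_param sp = { .sched_priority = 3 };
	int i;

	for (i = 1; i < argc; i++) {
		pid_t pid = (pid_t)atoi(argv[i]);

		if (sched_setscheduler(pid, SCHED_FIFO, &sp) < 0)
			perror("sched_setscheduler");
		else
			printf("PID %d -> SCHED_FIFO %d\n",
			       (int)pid, sp.sched_priority);
	}
	return 0;
}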
==== RCA of a customer's support case ====
The system complained about an RCU stall and a hung task: rcu_preempt.
The rcu_preempt task has been sleeping in the UN (uninterruptible) state for 00:02:26.544:
crash> ps -m 14
[0 00:02:26.544] [UN] PID: 14 TASK: ffff880c151a98b0 CPU: 6 COMMAND: "rcu_preempt"
rcu_preempt's stack is:
crash> bt
PID: 14 TASK: ffff880c151a98b0 CPU: 6 COMMAND: "rcu_preempt"
#0 [ffff880c151b3c78] __schedule at ffffffff815bf324
#1 [ffff880c151b3d20] schedule at ffffffff815bf9c4
#2 [ffff880c151b3d40] del_timer_sync at ffffffff8106b5f5
#3 [ffff880c151b3dc0] schedule_timeout at ffffffff815be334
#4 [ffff880c151b3e30] rcu_gp_kthread at ffffffff810fe55f
#5 [ffff880c151b3ec0] kthread at ffffffff8108389e
#6 [ffff880c151b3f50] ret_from_fork at ffffffff815c966c
The task is sleeping in the del_timer_sync() function...
del_timer_sync() definition:
#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
/**
* del_timer_sync - deactivate a timer and wait for the handler to finish.
* @timer: the timer to be deactivated
*
* This function only differs from del_timer() on SMP: besides deactivating
* the timer it also makes sure the handler has finished executing on other
* CPUs.
*
* Synchronization rules: Callers must prevent restarting of the timer,
* otherwise this function is meaningless. It must not be called from
* interrupt contexts unless the timer is an irqsafe one. The caller must
* not hold locks which would prevent completion of the timer's
* handler. The timer's handler must not call add_timer_on(). Upon exit the
* timer is not queued and the handler is not running on any CPU.
*
* Note: For !irqsafe timers, you must not hold locks that are held in
* interrupt context while calling this function. Even if the lock has
* nothing to do with the timer in question. Here's why:
*
*    CPU0                                  CPU1
*    ----                                  ----
*                                          <SOFTIRQ>
*                                          call_timer_fn();
*                                            base->running_timer = mytimer;
*    spin_lock_irq(somelock);
*                                            <IRQ>
*                                               spin_lock(somelock);
*    del_timer_sync(mytimer);
*      while (base->running_timer == mytimer);
*
* Now del_timer_sync() will never return and never release somelock.
* The interrupt on the other CPU is waiting to grab somelock but
* it has interrupted the softirq that CPU0 is waiting to finish.
*
* The function returns whether it has deactivated a pending timer or not.
*/
int del_timer_sync(struct timer_list *timer)
{
#ifdef CONFIG_LOCKDEP
	unsigned long flags;

	/*
	 * If lockdep gives a backtrace here, please reference
	 * the synchronization rules above.
	 */
	local_irq_save(flags);
	lock_map_acquire(&timer->lockdep_map);
	lock_map_release(&timer->lockdep_map);
	local_irq_restore(flags);
#endif
	/*
	 * don't use it in hardirq context, because it
	 * could lead to deadlock.
	 */
	WARN_ON(in_irq() && !tbase_get_irqsafe(timer->base));
	for (;;) {
		int ret = try_to_del_timer_sync(timer);

		if (ret >= 0)
			return ret;
		wait_for_running_timer(timer);	<--- It is waiting for a running timer.
	}
}
The wait_for_running_timer() function waits for the timer softirq to finish
running the handler; it sleeps on a wait queue.
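A simplified sketch of the mechanism on PREEMPT_RT (structure and field names
are illustrative, not the exact RHEL-RT source): the timer base carries a wait
queue, the softirq thread wakes it after finishing each handler, and
del_timer_sync() sleeps there while the handler is still running.

static void wait_for_running_timer(struct timer_list *timer)
{
	/* illustrative; the real -rt code masks flag bits off timer->base */
	struct tvec_base *base = timer->base;

	if (base->running_timer == timer)
		wait_event(base->wait_for_running_timer,
			   base->running_timer != timer);
}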
So there should be a ksoftirqd thread that is unable to run. I therefore checked
the runqueues and found a ksoftirqd kept from running by a high-priority RT
user-space thread on CPU 8:
CPU 8 RUNQUEUE: ffff88187f8541c0
CURRENT: PID: 19306 TASK: ffff880bfefb18b0 COMMAND: "SPINNING_RT"
RT PRIO_ARRAY: ffff88187f8542a0
[ 97] PID: 19306 TASK: ffff880bfefb18b0 COMMAND: "SPINNING_RT"
[ 98] PID: 255 TASK: ffff8818156198b0 COMMAND: "ksoftirqd/8"
CFS RB_ROOT: ffff88187f854260
[no tasks queued]
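A note on the bracketed numbers, which are easy to misread: crash prints the
kernel-internal priority as the RT PRIO_ARRAY index, and a lower index means a
higher priority. Assuming the usual mapping (kernel prio = 99 - SCHED_FIFO
priority), a quick conversion:

/* Assumption for illustration: kernel-internal RT prio = 99 - sched_priority,
 * so the task at index 97 preempts the one at index 98. */
#include <stdio.h>

int main(void)
{
	int idx[] = { 97, 98 };
	const char *who[] = { "SPINNING_RT", "ksoftirqd/8" };

	for (int i = 0; i < 2; i++)
		printf("index %d -> SCHED_FIFO %d (%s)\n",
		       idx[i], 99 - idx[i], who[i]);
	return 0;
}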
Both threads have been runnable for more than 2 minutes:
crash> ps -m 19306
[0 00:02:26.546] [RU] PID: 19306 TASK: ffff880bfefb18b0 CPU: 8 COMMAND: "SPINNING_RT"
crash> ps -m 255
[0 00:02:26.546] [RU] PID: 255 TASK: ffff8818156198b0 CPU: 8 COMMAND: "ksoftirqd/8"
But since SPINNING_RT has the higher priority, it has been running ever since it was awakened.
The ksoftirqd was preempted by SPINNING_RT while running the timer softirq:
crash> bt 255
PID: 255 TASK: ffff8818156198b0 CPU: 8 COMMAND: "ksoftirqd/8"
#0 [ffff881815623b08] __schedule at ffffffff815bf324
#1 [ffff881815623bb0] preempt_schedule at ffffffff815bf77f
#2 [ffff881815623bd0] _raw_spin_unlock_irqrestore at ffffffff815c1c3e
#3 [ffff881815623c40] wake_up_process at ffffffff81097127
#4 [ffff881815623c60] process_timeout at ffffffff8106abae
#5 [ffff881815623c70] call_timer_fn at ffffffff8106b8e9
#6 [ffff881815623cb0] run_timer_softirq at ffffffff8106bbb9
#7 [ffff881815623d50] handle_softirq at ffffffff81062ca5
#8 [ffff881815623da0] do_current_softirqs at ffffffff81062f51
#9 [ffff881815623e20] run_ksoftirqd at ffffffff81063064
#10 [ffff881815623e40] smpboot_thread_fn at ffffffff8108ccb9
#11 [ffff881815623ec0] kthread at ffffffff8108389e
#12 [ffff881815623f50] ret_from_fork at ffffffff815c966c
The timer softirq was on its way to waking up a thread, but it was preempted
before it could finish that operation. Checking the stack to figure out
which thread was about to be awakened:
#1 [ffff881815623bb0] preempt_schedule at ffffffff815bf77f
ffff881815623bb8: 0000000000000296 > ffff880c151a98b0
ffff881815623bc8: ffff881815623bd8 ffffffff815c1c3e
#2 [ffff881815623bd0] _raw_spin_unlock_irqrestore at ffffffff815c1c3e
ffff881815623bd8: ffff881815623c38 ffffffff81096f79
ffff881815623be8: ffff881815620010 00000000000141c0
ffff881815623bf8: ffff881815623d18 ffffffff00000008
ffff881815623c08: ffff881815620010 ffff880c151a98b0
ffff881815623c18: ffffffff8106aba0 ffff880c151b3dc8
ffff881815623c28: ffff881815620000 ---> ffff880c151a98b0 -> 3: which was pushed here.
ffff881815623c38: ffff881815623c58 ffffffff81097127 -> 2: the pointer was saved in rbx
#3 [ffff881815623c40] wake_up_process at ffffffff81097127 -> 1: the task_struct is the first argument
ffff881815623c48: 0000000000000000 0000000000000000
ffff881815623c58: ffff881815623c68 ffffffff8106abae
#4 [ffff881815623c60] process_timeout at ffffffff8106abae
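For context, process_timeout() (frame #4) is the timer callback installed by
schedule_timeout(); it does nothing but wake the task whose task_struct pointer
was stored as the timer's data, which is why that pointer ends up on the stack
as the argument to wake_up_process():

/* kernel/timer.c of that era, shown here for reference: */
static void process_timeout(unsigned long __data)
{
	wake_up_process((struct task_struct *)__data);
}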
The process that was about to be woken up was:
crash> ps -m ffff880c151a98b0
[0 00:02:26.544] [UN] PID: 14 TASK: ffff880c151a98b0 CPU: 6 COMMAND: "rcu_preempt"
The rcu_preempt task!
Conclusion: the high-priority SPINNING_RT thread ran for a long time, starving
the ksoftirqd thread, which in turn blocked the rcu_preempt thread, ultimately
causing a hung task / RCU stall.
For the sake of completeness, SPINNING_RT was not blocked by any other thread,
as it was running in user space, so this is not a kernel problem:
crash> set 19306
PID: 19306
COMMAND: "SPINNING_RT"
TASK: ffff880bfefb18b0 [THREAD_INFO: ffff880c1356c000]
CPU: 8
STATE: TASK_RUNNING (ACTIVE)
crash> bt
PID: 19306 TASK: ffff880bfefb18b0 CPU: 8 COMMAND: "SPINNING_RT"
#0 [ffff88187f847e30] crash_nmi_callback at ffffffff81030dd7
#1 [ffff88187f847e40] nmi_handle at ffffffff815c30ba
#2 [ffff88187f847ea0] default_do_nmi at ffffffff815c3208
#3 [ffff88187f847ed0] do_nmi at ffffffff815c34c0
#4 [ffff88187f847ef0] end_repeat_nmi at ffffffff815c25b7
RIP: 0000000001179715 RSP: 00007f0d309694b0 RFLAGS: 00000293
RAX: 0000000000000007 RBX: 00000000034a4200 RCX: 0000000000000006
RDX: 0000000000000001 RSI: 000000000000000a RDI: 0000000000000002
RBP: 0000000000000000 R8: 0000000000000008 R9: 0000000000000003
R10: 0000000000000004 R11: 0000000000000005 R12: 00007f0d309694e0
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000005
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
--- <NMI exception stack> ---
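For reference, this failure mode needs nothing more exotic than a CPU-bound
SCHED_FIFO loop pinned to the same CPU as a ksoftirqd thread. A hypothetical
minimal reproducer (not the customer's actual application) would look roughly
like this:

/* Hypothetical reproducer sketch: pin a SCHED_FIFO busy loop to one CPU,
 * above ksoftirqd's priority, and let it spin in user space forever. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	struct sched_param sp = { .sched_priority = 2 };
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(8, &set);			/* the CPU running ksoftirqd/8 in this RCA */
	if (sched_setaffinity(0, sizeof(set), &set) < 0)
		perror("sched_setaffinity");
	if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0)
		perror("sched_setscheduler");

	for (;;)
		;				/* never sleeps, never yields */
}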
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2584.html