Bug 1509264 - [RHEL-RT] Possible regression with NOHZ_FULL & rt_mutexes in IRQ (BZ1250649)
Summary: [RHEL-RT] Possible regression with NOHZ_FULL & rt_mutexes in IRQ (BZ1250649)
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.5
Hardware: x86_64
OS: Linux
urgent
urgent
Target Milestone: rc
: ---
Assignee: Daniel Bristot de Oliveira
QA Contact: Jiri Kastner
URL:
Whiteboard:
Keywords: ZStream
Depends On:
Blocks: 1442258 1511382
TreeView+ depends on / blocked
 
Reported: 2017-11-03 12:39 UTC by Daniel Bristot de Oliveira
Modified: 2018-04-10 08:59 UTC (History)
5 users (show)

(edit)
Clone Of:
: 1511382 (view as bug list)
(edit)
Last Closed: 2018-04-10 08:57:57 UTC


Attachments (Terms of Use)
[PATCH] re-apply Revert "timers: do not raise softirq unconditionally" (5.06 KB, application/mbox)
2017-11-03 12:55 UTC, Daniel Bristot de Oliveira
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:0676 None None None 2018-04-10 08:59 UTC

Description Daniel Bristot de Oliveira 2017-11-03 12:39:08 UTC
[this is the rhel7.5-rt version of the BZ 1509021]

BNP faced a crash with our kernel involving trying to take a rt_spin_lock
in the IRQ context. More precisely, in an IRQ, preempting a threaded IRQ, the timer (hard) IRQ tried to take a rt_spin_lock in the NOHZ_FULL path.

Even more precisely, this tries to take a rt_spin_lock resembles an old
bug we had with the -rt kernel. The BZ is:

https://bugzilla.redhat.com/show_bug.cgi?id=1250649

But the fix for the BZ is not present in our kernel!!!

(the fix is a revert, but the revert is not present.

In details, our kernel has the current run_local_timers():
------------------------------- %< ---------------------------------------
/*
 * Called by the local, per-CPU timer interrupt on SMP.
 */
void run_local_timers(void)
{
        struct tvec_base *base = __this_cpu_read(tvec_bases);

        hrtimer_run_queues();
        /*
         * We can access this lockless as we are in the timer
         * interrupt. If there are no timers queued, nothing to do in
         * the timer softirq.
         */
#ifdef CONFIG_PREEMPT_RT_FULL

#ifndef CONFIG_SMP
        /*
         * The spin_do_trylock() later may fail as the lock may be hold before
         * the interrupt arrived. The spin-lock debugging code will raise a
         * warning if the try_lock fails on UP. Since this is only an
         * optimization for the FULL_NO_HZ case (not to run the timer softirq on
         * an nohz_full CPU) we don't really care and shedule the softirq.
         */
        raise_softirq(TIMER_SOFTIRQ);
        return;
#endif

        /* On RT, irq work runs from softirq */
        if (irq_work_needs_cpu()) {
                raise_softirq(TIMER_SOFTIRQ);
                return;
        }

        if (!spin_do_trylock(&base->lock)) {
                raise_softirq(TIMER_SOFTIRQ);
                return;
        }
#endif

        if (!base->active_timers)
                goto out;

        /* Check whether the next pending timer has expired */
        if (time_before_eq(base->next_timer, jiffies))
                raise_softirq(TIMER_SOFTIRQ);
out:
#ifdef CONFIG_PREEMPT_RT_FULL
        rt_spin_unlock_after_trylock_in_irq(&base->lock);
#endif
        /* The ; ensures that gcc won't complain in the !RT case */
        ;
}
------------------------------- >% --------------------------------------

But the upstream code is.....:
------------------------------- %< --------------------------------------
/*
 * Called by the local, per-CPU timer interrupt on SMP.
 */
void run_local_timers(void)
{
        struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);

        hrtimer_run_queues();
        /* Raise the softirq only if required. */
        if (time_before(jiffies, base->clk)) {
                if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
                        return;
                /* CPU is awake, so check the deferrable base. */
                base++;
                if (time_before(jiffies, base->clk))
                        return;
        }
        raise_softirq(TIMER_SOFTIRQ);
}
------------------------------- >% --------------------------------------

Actually, our code should look like the upstream one, because we reverted
the -rt changes to run_local_timers() in the commit:
------------------------------- %< --------------------------------------
commit a79d56f3fbe6504e4055620920a46dfcf2cb0b3e
Author: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
Date:   Mon Mar 16 15:54:29 2015 -0400

    Revert "timers: do not raise softirq unconditionally"
------------------------------- >% --------------------------------------

but the revert somehow disappeared from 227 to 327. For example, it is
only present only on these tags:

[bristot@t460s kernel-rt]$ git tag --contains a79d56f3fbe6504e4055620920a46dfcf2cb0b3e
kernel-rt-3.10.0-229.13.1.rt56.160.el6rt
kernel-rt-3.10.0-229.20.1.rt56.141.14.el7
kernel-rt-3.10.0-229.rt56.160.el6rt
kernel-rt-3.10.0-229.rt56.161.el6rt
kernel-rt-3.10.0-229.rt56.162.el6rt
kernel-rt-3.10.0-229rt56.160.el6rt

I think we need to review it.... BNP is facing the problem that we
faced in the BZ that caused the revert (1250649)...

BTW, this is critical because this might change NOHZ_FULL, even break,
on our kernel.

Comment 2 Daniel Bristot de Oliveira 2017-11-03 12:55 UTC
Created attachment 1347341 [details]
[PATCH] re-apply Revert "timers: do not raise softirq unconditionally"

Comment 9 errata-xmlrpc 2018-04-10 08:57:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0676


Note You need to log in before you can comment on or make changes to this bug.