Bug 1509264

Summary: [RHEL-RT] Possible regression with NOHZ_FULL & rt_mutexes in IRQ (BZ1250649)
Product: Red Hat Enterprise Linux 7 Reporter: Daniel Bristot de Oliveira <daolivei>
Component: kernel-rtAssignee: Daniel Bristot de Oliveira <daolivei>
kernel-rt sub component: Other QA Contact: Jiri Kastner <jkastner>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: bhu, dhoward, lgoncalv, stalexan, williams
Version: 7.5Keywords: ZStream
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: 3.10.0-771.rt56.703 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1511382 (view as bug list) Environment:
Last Closed: 2018-04-10 08:57:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1442258, 1511382    
Attachments:
Description Flags
[PATCH] re-apply Revert "timers: do not raise softirq unconditionally" none

Description Daniel Bristot de Oliveira 2017-11-03 12:39:08 UTC
[this is the rhel7.5-rt version of the BZ 1509021]

BNP faced a crash with our kernel involving trying to take a rt_spin_lock
in the IRQ context. More precisely, in an IRQ, preempting a threaded IRQ, the timer (hard) IRQ tried to take a rt_spin_lock in the NOHZ_FULL path.

Even more precisely, this tries to take a rt_spin_lock resembles an old
bug we had with the -rt kernel. The BZ is:

https://bugzilla.redhat.com/show_bug.cgi?id=1250649

But the fix for the BZ is not present in our kernel!!!

(the fix is a revert, but the revert is not present.

In details, our kernel has the current run_local_timers():
------------------------------- %< ---------------------------------------
/*
 * Called by the local, per-CPU timer interrupt on SMP.
 */
void run_local_timers(void)
{
        struct tvec_base *base = __this_cpu_read(tvec_bases);

        hrtimer_run_queues();
        /*
         * We can access this lockless as we are in the timer
         * interrupt. If there are no timers queued, nothing to do in
         * the timer softirq.
         */
#ifdef CONFIG_PREEMPT_RT_FULL

#ifndef CONFIG_SMP
        /*
         * The spin_do_trylock() later may fail as the lock may be hold before
         * the interrupt arrived. The spin-lock debugging code will raise a
         * warning if the try_lock fails on UP. Since this is only an
         * optimization for the FULL_NO_HZ case (not to run the timer softirq on
         * an nohz_full CPU) we don't really care and shedule the softirq.
         */
        raise_softirq(TIMER_SOFTIRQ);
        return;
#endif

        /* On RT, irq work runs from softirq */
        if (irq_work_needs_cpu()) {
                raise_softirq(TIMER_SOFTIRQ);
                return;
        }

        if (!spin_do_trylock(&base->lock)) {
                raise_softirq(TIMER_SOFTIRQ);
                return;
        }
#endif

        if (!base->active_timers)
                goto out;

        /* Check whether the next pending timer has expired */
        if (time_before_eq(base->next_timer, jiffies))
                raise_softirq(TIMER_SOFTIRQ);
out:
#ifdef CONFIG_PREEMPT_RT_FULL
        rt_spin_unlock_after_trylock_in_irq(&base->lock);
#endif
        /* The ; ensures that gcc won't complain in the !RT case */
        ;
}
------------------------------- >% --------------------------------------

But the upstream code is.....:
------------------------------- %< --------------------------------------
/*
 * Called by the local, per-CPU timer interrupt on SMP.
 */
void run_local_timers(void)
{
        struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);

        hrtimer_run_queues();
        /* Raise the softirq only if required. */
        if (time_before(jiffies, base->clk)) {
                if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
                        return;
                /* CPU is awake, so check the deferrable base. */
                base++;
                if (time_before(jiffies, base->clk))
                        return;
        }
        raise_softirq(TIMER_SOFTIRQ);
}
------------------------------- >% --------------------------------------

Actually, our code should look like the upstream one, because we reverted
the -rt changes to run_local_timers() in the commit:
------------------------------- %< --------------------------------------
commit a79d56f3fbe6504e4055620920a46dfcf2cb0b3e
Author: Steven Rostedt (Red Hat) <rostedt>
Date:   Mon Mar 16 15:54:29 2015 -0400

    Revert "timers: do not raise softirq unconditionally"
------------------------------- >% --------------------------------------

but the revert somehow disappeared from 227 to 327. For example, it is
only present only on these tags:

[bristot@t460s kernel-rt]$ git tag --contains a79d56f3fbe6504e4055620920a46dfcf2cb0b3e
kernel-rt-3.10.0-229.13.1.rt56.160.el6rt
kernel-rt-3.10.0-229.20.1.rt56.141.14.el7
kernel-rt-3.10.0-229.rt56.160.el6rt
kernel-rt-3.10.0-229.rt56.161.el6rt
kernel-rt-3.10.0-229.rt56.162.el6rt
kernel-rt-3.10.0-229rt56.160.el6rt

I think we need to review it.... BNP is facing the problem that we
faced in the BZ that caused the revert (1250649)...

BTW, this is critical because this might change NOHZ_FULL, even break,
on our kernel.

Comment 2 Daniel Bristot de Oliveira 2017-11-03 12:55:44 UTC
Created attachment 1347341 [details]
[PATCH] re-apply Revert "timers: do not raise softirq unconditionally"

Comment 9 errata-xmlrpc 2018-04-10 08:57:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0676