Description of problem:
On a system configured for KVM-RT, we can observe that a hog application running under fifo:1 is preempted at least once a second by the ktimersoftd thread. We expect the situation to be worse when running an RT guest, as the preemptions will happen in both the host and the guest (see test results below).
We've debugged this down to the following commit:
Author: Daniel Bristot de Oliveira <firstname.lastname@example.org>
Date: Thu Nov 2 18:33:51 2017 +0100
re-apply Revert "timers: do not raise softirq unconditionally"
This commit is dropping the code that avoids ktimersoftd spurious wake ups from the tick handler. The result is that now the ktimersoftd is woken up at every tick.
We measured the KVM-RT test case with and without the commit above; here are the results:
# Min Latencies: 00005 00010 00012 00010 00010 00010 00012 00010
# Avg Latencies: 00012 00012 00014 00012 00012 00012 00014 00012
# Max Latencies: 00026 00026 00033 00026 00026 00026 00033 00025
# Min Latencies: 00005 00012 00012 00012 00012 00012 00012 00012
# Avg Latencies: 00012 00013 00013 00013 00013 00013 00014 00013
# Max Latencies: 00019 00025 00021 00019 00019 00019 00026 00019
So, we observe a latency increase of around 20% in the worst-case scenario.
Version-Release number of selected component (if applicable): kernel-3.10.0-858.rt56.799.el7.x86_64
Steps to Reproduce:
1. Configure the system for KVM-RT or fully isolate a CPU
2. Run a hog application under fifo:1 pinned to the isolated CPU
3. Trace sched_switch events
A quick test with RHEL 7's kernel-3.10.0-858.el7.x86_64 and the cpu-partitioning profile did not reproduce the issue; that is, I don't see any preemptions at all. However, I ran the test case for only a few minutes.
Let me emphasize that this issue is extremely important. We have a confirmed 20% latency regression over 10-minute runs, and I'd expect that right now RHEL+cpu-partitioning offers better latency for DPDK than the RT kernel does.
Also, for regular RT usage, if the tick runs at 1000 times per second you'll get 1000 preemptions per second.
Here's a reproducer that doesn't require KVM-RT profiles, but still requires you to completely isolate a CPU:
1. Completely isolate a CPU, use nohz_full, etc
2. Run a hog application pinned to that CPU: taskset -c CPU ./hog
3. Trace sched_switch events
On an unfixed kernel, there will be one context switch to the ktimersoftd thread per second. On a fixed kernel, there should be no sched_switch events whatsoever.
I am already reproducing the issue in a local system. Currently checking upstream code/reproducer.
A question, although not related to this BZ: I saw rcuc threads being awakened. To avoid this we can use rcu_nocb_poll, but I do not see it enabled in the realtime-virtual-host profile. Is there a reason for not enabling it?
Can you open a BZ? You can assign it to me.
Because of another issue I'm debugging, I've just traced sched_switch on the host when running KVM-RT test-case and didn't see any context switches to rcuc threads. But in any case it's better to investigate.
Here is a summary of my findings so far.
I could not reproduce the rcuc wakeup anymore, and I do not know why. So I will not file the BZ until I see it again, if I ever do; maybe there was something wrong with my setup.
Using the kernel with the new timer wheel, I still saw ktimersoftd wakeups. But after applying the patches:
[PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick
[PATCH v4 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
And adding a kernel command line parameter to avoid the timer that checks the stability of the TSC.
The above-mentioned patches are not part of the PREEMPT_RT patch set yet, so there might be newer versions.
I can see very few ktimersoftd wake-ups. Those I do see seem to be legitimate; they rarely take place, and each one comes a long time after the previous one.
My next step is to try to backport the timer wheel to the RHEL-7.6 code.
Does the first patch depend on the new timer wheel?
Unfortunately, yes. These patches improve the timer wheel by forwarding it in interrupt context; the softirq is then raised only if a timer is found to have expired during the forwarding.
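To illustrate the mechanism, here is a toy model of the difference (not the actual kernel code; struct toy_base, tick_old, tick_new, and count_raises are made-up names) over 1000 ticks with a single timer armed at jiffy 500:

```c
#include <stdbool.h>

/* Toy model of a per-CPU timer base: next_expiry is the jiffy at which
 * the earliest armed timer fires (0 = no timer armed). */
struct toy_base {
    unsigned long clk;          /* jiffy the base was last forwarded to */
    unsigned long next_expiry;  /* 0 if no timer is pending */
};

/* Pre-patch behavior: the tick raises TIMER_SOFTIRQ unconditionally,
 * waking ktimersoftd on every tick. */
static bool tick_old(struct toy_base *b, unsigned long jiffies)
{
    b->clk = jiffies;
    return true;
}

/* Post-patch behavior: forward the base in hard-irq context and raise
 * the softirq only if a timer actually expired while forwarding. */
static bool tick_new(struct toy_base *b, unsigned long jiffies)
{
    bool expired = b->next_expiry && b->next_expiry <= jiffies;
    b->clk = jiffies;
    return expired;
}

/* Run 1000 ticks with one timer armed at jiffy 500 and count how often
 * the softirq (and hence a ktimersoftd wakeup) would be raised. */
static int count_raises(bool (*tick)(struct toy_base *, unsigned long))
{
    struct toy_base b = { .clk = 0, .next_expiry = 500 };
    int raises = 0;
    for (unsigned long j = 1; j <= 1000; j++) {
        if (tick(&b, j)) {
            raises++;
            b.next_expiry = 0;  /* the armed timer has now fired */
        }
    }
    return raises;
}
```

With the old handler this model raises the softirq 1000 times per simulated second; with the new one, a single raise when the armed timer expires.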
The dependencies for the timer wheel update, the first half of the solution for the problem described in this Bugzilla ticket, have already been added to kernel-rt-3.10.0-900.rt56.846.el7. These changes are enough to keep us in sync with the changes added to RHEL while respecting the kernel-rt differences.
bb2b1db2d575 timers: Reduce the CPU index space to 256k
34e2660305c9 timers: Use proper base migration in add_timer_on()
9a9ececb8d90 hlist: Add hlist_is_singular_node() helper
0fbdb2309b1a signals: Use hrtimer for sigtimedwait()
f68a3f9917ed timers: Remove the deprecated mod_timer_pinned() API
4da22cacb04c timers, driver/net/ethernet/tile: Initialize the egress timer as pinned
b1aa9f139e6b timers, cpufreq/powernv: Initialize the gpstate timer as pinned
c274cd526b70 timers, x86/apic/uv: Initialize the UV heartbeat timer as pinned
dee6b36a6cc9 timers: Make 'pinned' a timer property
da4f00fe9a87 timer: Minimize nohz off overhead
e79103755f45 timer: Reduce timer migration overhead if disabled (v2)
417bd5c3fdc1 Remove code redundancy while calling get_nohz_timer_target()
dde8ca171b2a timer: Stats: Simplify the flags handling
ce38ad8067ac timer: Replace timer base by a cpu index
6e878157ddce timer: Use timer->base for flag checks
563b84b3a535 tracing: timer: Add deferrable flag to timer_start
66743e988f03 timer: Use hlist for the timer wheel hash buckets
45665e16b954 timer: Remove FIFO "guarantee"
6056d7fc3054 timers: Sanitize catchup_timer_jiffies() usage
9b05865436d7 timer: Put usleep_range into the __sched section
2e80b77a26ca timer: Remove pointless return value of do_usleep_range()
c57ba680b598 timer: Further simplify the SMP and HOTPLUG logic
9960cc117fd1 timer: Don't initialize 'tvec_base' on hotplug
c0327baf7493 timer: Allocate per-cpu tvec_base's statically
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
We already have the main dependencies for this patch set in our kernel, but the problem is still present in the upstream kernel.
We are working with the upstream timer/RT kernel maintainer to find a solution, and this is a high-priority BZ for us.
However, this is a very complex problem, and it will take some time to find an upstream solution.
Doc text reviewed; it is fine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.