Improved RT throttling mechanism
The current real-time throttling mechanism prevents the starvation of non-real-time tasks by CPU-intensive real-time tasks. When a real-time run queue is throttled, non-real-time tasks are allowed to run; if there are none, the CPU goes idle. To safely maximize CPU usage by decreasing CPU idle time, the "RT_RUNTIME_GREED" scheduler feature has been implemented. When enabled, this feature checks whether non-real-time tasks are starving before throttling the real-time task. As a result, the "RT_RUNTIME_GREED" scheduler option guarantees some run time on all CPUs for non-real-time tasks, while keeping real-time tasks running as much as possible.
Description (Daniel Bristot de Oliveira, 2016-12-02 16:29:01 UTC)
Description of problem:
Currently, we have two throttling modes:
With RT_RUNTIME_SHARING (default):
before throttling, try to borrow some runtime from another CPU.
Without RT_RUNTIME_SHARING:
throttle the RT task, even if there is nothing else to do.
The problem with the first is that a CPU can easily borrow enough runtime to
let a spinning RT task run forever, starving the non-RT tasks and thus
defeating the mechanism.
The problem with the second is that, with the default values, the CPU will
sit idle 5% of the time, wasting CPU time.
So neither solution is perfect.
Daniel Bristot suggested a new option for the rt throttling, the RT_RUNTIME_GREED sched feature.
The description of the feature is:
------------------------%<-------------
The rt throttling mechanism prevents the starvation of non-real-time
tasks by CPU intensive real-time tasks. In terms of percentage,
the default behavior allows real-time tasks to run up to 95% of a
given period, leaving the other 5% of the period for non-real-time
tasks. In the absence of non-rt tasks, the system goes idle for 5%
of the period.
Although this behavior works fine for the purpose of avoiding
bad real-time tasks that can hang the system, some greedy users
want the real-time task to keep running when no non-real-time
task is being starved. In other words, they do not want to
see the system go idle.
This patch implements the RT_RUNTIME_GREED scheduler feature for greedy
users (TM). When enabled, this feature will check if non-rt tasks are
starving before throttling the real-time task. If the real-time task
becomes throttled, it will be unthrottled as soon as the system goes
idle, or when the next period starts, whichever comes first.
This feature is enabled with the following command:
# echo RT_RUNTIME_GREED > /sys/kernel/debug/sched_features
The user might also want to disable the RT_RUNTIME_SHARE logic,
to keep all CPUs with the same rt_runtime:
# echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
With these two options set, the user will guarantee some runtime
for non-rt-tasks on all CPUs, while keeping real-time tasks running
as much as possible.
------------------------>%-------------
Unfortunately, this option was rejected by Peter Zijlstra, who wants
a more complete solution using a deadline server, i.e.,
hierarchical scheduling of non-real-time tasks inside a deadline task.
Here is peterz's reply:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1266868.html
There were some discussions about the implementation of the deadline server,
but it will certainly take some time.
There is an internal consensus that Daniel's proposal is an acceptable
workaround for the problem for our customers while we wait for the
definitive solution.
So, the plan is: use the RT_RUNTIME_GREED sched feature in the real-time
kernel until the definitive upstream solution is available.
Comment 1 (Daniel Bristot de Oliveira, 2017-04-18 08:35:12 UTC)
Hey,
BNP complained about this thread being blocked because of a CPU with a spinning RT task:
crash> bt 6949
PID: 6949 TASK: ffff880418466300 CPU: 10 COMMAND: "force"
#0 [ffff8800b8793918] __schedule at ffffffff815f31dc
#1 [ffff8800b87939b0] schedule at ffffffff815f38f4
#2 [ffff8800b87939d0] wait_transaction_locked at ffffffffa0311a05 [jbd2]
#3 [ffff8800b8793a40] add_transaction_credits at ffffffffa0311e89 [jbd2]
#4 [ffff8800b8793ac0] start_this_handle at ffffffffa0312131 [jbd2]
#5 [ffff8800b8793b60] jbd2__journal_start at ffffffffa0312640 [jbd2]
#6 [ffff8800b8793bc0] __ext4_journal_start_sb at ffffffffa0370889 [ext4]
#7 [ffff8800b8793c10] ext4_dirty_inode at ffffffffa0341934 [ext4]
#8 [ffff8800b8793c30] __mark_inode_dirty at ffffffff811dbd9b
#9 [ffff8800b8793c60] update_time at ffffffff811c8d41
#10 [ffff8800b8793c90] file_update_time at ffffffff811c8e28
#11 [ffff8800b8793cf0] __generic_file_aio_write at ffffffff8114c028
#12 [ffff8800b8793d80] generic_file_aio_write at ffffffff8114c2b5
#13 [ffff8800b8793dd0] ext4_file_write at ffffffffa0339954 [ext4]
#14 [ffff8800b8793e10] do_sync_write at ffffffff811acdff
#15 [ffff8800b8793ef0] vfs_write at ffffffff811ad31f
#16 [ffff8800b8793f20] sys_write at ffffffff811addd0
#17 [ffff8800b8793f80] tracesys at ffffffff815fdca8 (via system_call)
RIP: 0000003eb900e6fd RSP: 00007fdf14f02d60 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: ffffffff815fdca8 RCX: ffffffffffffffff
RDX: 000000000000001f RSI: 00000000007b9a0c RDI: 0000000000000022
RBP: 0000000000000022 R8: 00000000007b99d0 R9: 00000000000001f0
R10: 00007fdf20909718 R11: 0000000000000293 R12: 00007fdeec1fbe10
R13: 00007fdf14f02db0 R14: 000000000000001f R15: 00000000007b9a0c
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
This is that old BZ about it not being possible to keep a jbd2 thread off an isolated CPU (BZ1306341).
One possible workaround for this problem is to add the patch suggested in this BZ.
The other would be to try to make the jbd2 per-CPU kworkers not per-CPU, but that would be really complex.
Comment 3 (Daniel Bristot de Oliveira, 2017-07-14 16:36:27 UTC)
How to reproduce the problem:
1) prepare a busy-loop task, like:
f.c:
------------- %< ------------------
int main(void)
{
	for (;;)
		; /* busy loop: burn CPU forever */
}
------------- >% -------------------
# gcc -o rt f.c
# gcc -o nonrt f.c
2) disable rt runtime sharing
# echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
3) run the "rt" busy loop task, in the FIFO policy, pinned to a CPU,
for instance, CPU 1:
# taskset -c 1 chrt -f 1 ./rt &
4) Check the CPU 1 usage: it should show about 95% busy with the "rt" task
and about 5% idle.
5) Then, enable the RT_RUNTIME_GREED feature:
# echo RT_RUNTIME_GREED > /sys/kernel/debug/sched_features
and check the CPU 1 usage; now the "rt" task should be taking about 100% of
the CPU time.
The system should be able to run for a long period without problems such as
hung tasks caused by the busy-loop task.
(That is the behavior implemented by this patch.)
6) Finally, run the "nonrt" task on CPU 1 as a non-RT task:
# taskset -c 1 ./nonrt &
Now, the "rt" task should be taking 95% and the "nonrt" 5%.
Comment 4 (Daniel Bristot de Oliveira, 2017-07-14 16:59:10 UTC)
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:0676