| Summary: | RFE: Improve RT throttling mechanism | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Daniel Bristot de Oliveira <daolivei> | ||||
| Component: | kernel-rt | Assignee: | Daniel Bristot de Oliveira <daolivei> | ||||
| kernel-rt sub component: | Memory Management | QA Contact: | Jiri Kastner <jkastner> | ||||
| Status: | CLOSED ERRATA | Docs Contact: | Jana Heves <jsvarova> | ||||
| Severity: | medium | ||||||
| Priority: | high | CC: | bhu, cww, daolivei, dhoward, mkolaja, salmy, stalexan, toneata, williams | ||||
| Version: | 7.4 | Keywords: | FutureFeature, ZStream | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Enhancement | |||||
| Doc Text: |
Improved RT throttling mechanism
The current real-time throttling mechanism prevents the starvation of non-real-time tasks by CPU intensive real-time tasks. When a real-time run queue is throttled, it allows non-real-time tasks to run or if there are none, the CPU goes idle. To safely maximize CPU usage by decreasing the CPU idle time, the "RT_RUNTIME_GREED" scheduler feature has been implemented. When enabled, this feature checks if non-real-time tasks are starving before throttling the real-time task. As a result, the "RT_RUNTIME_GREED" scheduler option guarantees some run time on all CPUs for the non-real-time tasks, while keeping the real-time tasks running as much as possible.
|
Story Points: | --- | ||||
| Clone Of: | |||||||
| : | 1505158 (view as bug list) | Environment: | |||||
| Last Closed: | 2018-04-10 09:07:09 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1420851, 1442258, 1505158 | ||||||
| Attachments: |
|
||||||
|
Description
Daniel Bristot de Oliveira
2016-12-02 16:29:01 UTC
Hey,
BNP complained about this thread being blocked because of a CPU with a spinning -rt task:
crash> bt 6949
PID: 6949 TASK: ffff880418466300 CPU: 10 COMMAND: "force"
#0 [ffff8800b8793918] __schedule at ffffffff815f31dc
#1 [ffff8800b87939b0] schedule at ffffffff815f38f4
#2 [ffff8800b87939d0] wait_transaction_locked at ffffffffa0311a05 [jbd2]
#3 [ffff8800b8793a40] add_transaction_credits at ffffffffa0311e89 [jbd2]
#4 [ffff8800b8793ac0] start_this_handle at ffffffffa0312131 [jbd2]
#5 [ffff8800b8793b60] jbd2__journal_start at ffffffffa0312640 [jbd2]
#6 [ffff8800b8793bc0] __ext4_journal_start_sb at ffffffffa0370889 [ext4]
#7 [ffff8800b8793c10] ext4_dirty_inode at ffffffffa0341934 [ext4]
#8 [ffff8800b8793c30] __mark_inode_dirty at ffffffff811dbd9b
#9 [ffff8800b8793c60] update_time at ffffffff811c8d41
#10 [ffff8800b8793c90] file_update_time at ffffffff811c8e28
#11 [ffff8800b8793cf0] __generic_file_aio_write at ffffffff8114c028
#12 [ffff8800b8793d80] generic_file_aio_write at ffffffff8114c2b5
#13 [ffff8800b8793dd0] ext4_file_write at ffffffffa0339954 [ext4]
#14 [ffff8800b8793e10] do_sync_write at ffffffff811acdff
#15 [ffff8800b8793ef0] vfs_write at ffffffff811ad31f
#16 [ffff8800b8793f20] sys_write at ffffffff811addd0
#17 [ffff8800b8793f80] tracesys at ffffffff815fdca8 (via system_call)
RIP: 0000003eb900e6fd RSP: 00007fdf14f02d60 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: ffffffff815fdca8 RCX: ffffffffffffffff
RDX: 000000000000001f RSI: 00000000007b9a0c RDI: 0000000000000022
RBP: 0000000000000022 R8: 00000000007b99d0 R9: 00000000000001f0
R10: 00007fdf20909718 R11: 0000000000000293 R12: 00007fdeec1fbe10
R13: 00007fdf14f02db0 R14: 000000000000001f R15: 00000000007b9a0c
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
This is the old BZ about it not being possible to keep a jbd2 thread off an isolated CPU: BZ1306341.
One possible workaround for this problem is to add the patch suggested in this BZ.
The other would be to try to make the jbd2 per-cpu kworkers not per-cpu, but that would be really complex.
How to reproduce the problem:
1) prepare a busy-loop task, like:
f.c:
------------- %< ------------------
int main(void)
{
    /* busy loop: spin forever so the task never yields the CPU */
    for (;;)
        ;
}
------------- >% -------------------
# gcc -o rt f.c
# gcc -o nonrt f.c
2) disable rt runtime sharing
# echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
3) run the "rt" busy loop task, in the FIFO policy, pinned to a CPU,
for instance, CPU 1:
# taskset -c 1 chrt -f 1 ./rt &
4) check the CPU 1 usage; it should show about 95% busy with the "rt" task
and about 5% idle.
5) Then, enable the RT_RUNTIME_GREED feature:
# echo RT_RUNTIME_GREED > /sys/kernel/debug/sched_features
and check the CPU 1 usage; now the "rt" task should be taking about 100% of
the CPU time.
The system should be able to run for a long period without causing
problems like hung tasks because of the busy-loop task.
(that is the feature implemented by this patch)
6) Finally, run the "nonrt" task on CPU 1 as non-rt:
# taskset -c 1 ./nonrt &
Now, the "rt" task should be taking 95% and the "nonrt" 5%.
Created attachment 1298514 [details]
[RT PATCH] sched/rt: RT_RUNTIME_GREED sched feature
Patch posted to the internal list: http://post-office.corp.redhat.com/archives/kernel-rt-team/2017-July/msg00005.html
Patch merged to the version 3.10.0-695.rt56.620.
Hello All, the 7.5 flag is not required, as kernel-rt is approved directly for zstream.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:0676