Bug 1503225

Summary: realtime-virtual-host,guest: enable RT_RUNTIME_GREED
Product: Red Hat Enterprise Linux 7 Reporter: Luiz Capitulino <lcapitulino>
Component: tuned Assignee: Luiz Capitulino <lcapitulino>
Status: CLOSED WONTFIX QA Contact: qe-baseos-daemons
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.5 CC: jeder, jskarvad, olysonek
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-13 18:16:26 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 1240765    

Description Luiz Capitulino 2017-10-17 15:29:29 UTC
Description of problem:

The RHEL 7.5 RT kernel has a downstream-only feature called RT_RUNTIME_GREED. This feature allows a SCHED_OTHER task to preempt a fifo:1 task for a limited amount of time. We've decided to enable it in the host and guest KVM-RT profiles as a workaround for issues like bug 1448770, where starvation of SCHED_OTHER kernel threads can lead to a system lockup.

NOTE: In its current version, RT_RUNTIME_GREED allows a SCHED_OTHER task to run for a maximum of 1ms. That is too long for real-time workloads. The RT team is trying to find a way to cap the preemption duration at 10us.

Version-Release number of selected component (if applicable): tuned-2.9.0-0.1.rc1.el7.noarch

Comment 2 Luiz Capitulino 2017-10-18 18:45:24 UTC
Marcelo has a better proposal for this issue, which is to change DPDK to allow yielding the CPU for short periods (say 10us). He's going to post a patch upstream.

I'll keep this open for now as Marcelo's idea depends on upstream acceptance.

Comment 3 Luiz Capitulino 2017-11-13 18:16:26 UTC
We have decided we won't be enabling this for KVM-RT. There are two reasons for this decision:

1. This still causes a spike, so even if we enabled it we wouldn't be done with our main issue (which is bug 1448770).

2. We've found the series that introduced bug 1448770 (see bug 1448770 comment 100), so we know how to get it fixed without this anyway.