Bug 1438512 - [rt] pull patchset that lifts single reader restriction on rwsems
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.4
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Luis Claudio R. Goncalves
QA Contact: Jiri Kastner
Docs Contact:
Depends On:
Blocks: 1353018
Reported: 2017-04-03 11:29 EDT by Clark Williams
Modified: 2017-08-01 20:25 EDT

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 15:02:59 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
[patch RT 1_4] rtmutex: Make lock_killable work (4.77 KB, patch)
2017-04-03 11:30 EDT, Clark Williams
[patch RT 2_4] rtmutex: Provide rt_mutex_lock_state() (6.70 KB, patch)
2017-04-03 11:31 EDT, Clark Williams
[patch RT 3_4] rtmutex: Provide locked slowpath (8.36 KB, patch)
2017-04-03 11:32 EDT, Clark Williams
[patch RT 4_4] rwsem_rt: Lift single reader restriction (23.63 KB, patch)
2017-04-03 11:32 EDT, Clark Williams


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2017:2077
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: kernel-rt security, bug fix, and enhancement update
Last Updated: 2017-08-01 14:13:37 EDT

Description Clark Williams 2017-04-03 11:29:20 EDT
R/W semaphores in RT do not allow multiple readers because a writer
blocking on the semaphore would have to deal with all the readers in terms of
priority or budget inheritance. While multi reader priority boosting would
be possible (it has been attempted before), multi reader budget inheritance
is impossible.

It's obvious that the single reader restriction has severe performance
problems for situations with heavy reader contention.

A typical issue is the contention of mmap_sem. The main issue with mmap_sem
vs. process shared futexes has been cured for some architectures by
switching to fast GUP, but it still persists for those architectures which
do not (yet) implement it.

Non-RT workloads also suffer from mmap_sem contention when they trigger a
massive amount of page faults on multiple threads.

There is another issue with R/W semaphores. 

The single reader restriction does not violate the !RT semantics of R/W
semaphores, because on !RT R/W semaphores are writer fair. That means that
when a writer blocks on a contended R/W semaphore, newly incoming readers
block behind the writer. This prevents writer starvation.

So the following scenario results in a deadlock independent of RT:

T1
down_read(sem);
wait_for_event();
  schedule()

T2
down_write(sem);
  blocks_on(sem);
    schedule();

T3
down_read(sem);		<- T3 cannot take sem for read and blocks behind T2
...			===> DEADLOCK!
wake_waiters();
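
The deadlock above can be replayed with a small user-space model. This is only a sketch under stated assumptions: fair_rwsem, fair_down_read(), fair_down_write() and replay_fair_scenario() are hypothetical names, not kernel APIs, and a false return stands in for a task that would block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of a writer-fair R/W semaphore.
 * It only tracks state; "blocking" is reported as a false return. */
struct fair_rwsem {
    int  readers;         /* tasks currently holding the lock for read */
    bool writer_holds;    /* a writer owns the lock */
    bool writer_waiting;  /* a writer is queued behind the readers */
};

/* Returns true if the reader gets the lock, false if it would block. */
static bool fair_down_read(struct fair_rwsem *sem)
{
    if (sem->writer_holds || sem->writer_waiting)
        return false;     /* writer fairness: queue behind the writer */
    sem->readers++;
    return true;
}

/* Returns true if the writer gets the lock, false if it would block. */
static bool fair_down_write(struct fair_rwsem *sem)
{
    if (sem->readers > 0 || sem->writer_holds) {
        sem->writer_waiting = true;
        return false;
    }
    sem->writer_holds = true;
    return true;
}

/* Replays T1, T2 and T3 from the scenario; returns true on deadlock. */
bool replay_fair_scenario(void)
{
    struct fair_rwsem sem = { 0, false, false };

    bool t1_reads  = fair_down_read(&sem);   /* T1 holds sem, then sleeps */
    bool t2_writes = fair_down_write(&sem);  /* T2 blocks behind T1 */
    bool t3_reads  = fair_down_read(&sem);   /* T3 blocks behind T2 */

    /* T3 never reaches wake_waiters(), so T1 never releases sem. */
    return t1_reads && !t2_writes && !t3_reads;
}
```

In this model the waiting writer is what stops T3, which is why the scenario deadlocks even without the RT single reader restriction.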

There is, however, a very subtle semantic difference on RT in the
following scenario:

T1
down_read(sem);
wait_for_event();
  schedule()

T2
if (down_write_trylock(sem))
  do_something()

T3
down_read(sem);
...
wake_waiters();

That works on mainline, but breaks on RT due to the single reader
restriction.
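
The difference can be illustrated with a user-space model in which the only knob is the number of readers admitted at once. This is a sketch, not the kernel implementation: model_rwsem, its max_readers field and t3_reaches_wake_waiters() are hypothetical, with max_readers == 1 standing in for the RT single reader restriction and INT_MAX for mainline.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model: max_readers == 1 mimics the RT single reader
 * restriction, INT_MAX mimics mainline's multi reader behaviour. */
struct model_rwsem {
    int  readers;       /* tasks currently holding the lock for read */
    int  max_readers;   /* how many readers are admitted at once */
    bool writer_holds;  /* a writer owns the lock */
};

/* Returns true if the reader gets the lock, false if it would block. */
static bool model_down_read(struct model_rwsem *sem)
{
    if (sem->writer_holds || sem->readers >= sem->max_readers)
        return false;
    sem->readers++;
    return true;
}

/* A trylock never queues, so a failure leaves no waiting writer behind. */
static bool model_down_write_trylock(struct model_rwsem *sem)
{
    if (sem->readers > 0 || sem->writer_holds)
        return false;
    sem->writer_holds = true;
    return true;
}

/* Replays T1, T2 and T3; returns true if T3 gets far enough to call
 * wake_waiters(), false if it blocks in down_read(). */
bool t3_reaches_wake_waiters(int max_readers)
{
    struct model_rwsem sem = { 0, max_readers, false };

    model_down_read(&sem);                 /* T1 holds sem, then sleeps */
    (void)model_down_write_trylock(&sem);  /* T2 fails, skips do_something() */
    return model_down_read(&sem);          /* T3 */
}
```

With INT_MAX the function returns true (T3 proceeds and can wake T1); with 1 it returns false, because T3 blocks behind T1 even though no writer is queued.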

Yes, that's ugly and should be forbidden, but there is code in the mainline
kernel which relies on that (e.g. Radeon driver).

Finding and fixing such constructs is not an easy task and, aside from that,
the single reader restriction is a performance bottleneck.

After analyzing the writer sides of R/W semaphores I came to the conclusion
that down_write() calls happen in expensive code paths which should not be
invoked in high priority tasks anyway. And if user space is stupid enough
to do so, then it's nothing we should worry about. Doing mmap() in your
high priority task is stupid to begin with.

The following patch series changes the RT implementation of R/W semaphores
to a multi reader model, which is not writer fair. That means writers have
to wait until the last reader has left the critical section, and readers are
allowed to take the semaphore for read even when a writer is blocked.

This means there is a risk of writer starvation, but the pathological
workloads which trigger it are not necessarily the typical RT workloads.
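
The starvation risk can be sketched with a user-space model of a reader-biased lock. This is an illustration only and not the code from the patch series: biased_rwsem, the biased_*() helpers and starved_write_attempts() are hypothetical names, and a false return stands in for a task that would block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a reader-biased (not writer fair) R/W semaphore. */
struct biased_rwsem {
    int  readers;        /* tasks inside the read-side critical section */
    bool writer_holds;   /* a writer owns the lock */
};

/* Readers only block while a writer actually holds the lock;
 * a merely waiting writer does not stop them. */
static bool biased_down_read(struct biased_rwsem *sem)
{
    if (sem->writer_holds)
        return false;
    sem->readers++;
    return true;
}

static void biased_up_read(struct biased_rwsem *sem)
{
    assert(sem->readers > 0);
    sem->readers--;
}

/* The writer succeeds only once the last reader has left. */
static bool biased_down_write(struct biased_rwsem *sem)
{
    if (sem->readers > 0 || sem->writer_holds)
        return false;
    sem->writer_holds = true;
    return true;
}

/* Two readers keep overlapping for `rounds` iterations while a writer
 * retries. Returns the number of failed write attempts, or -1 if the
 * writer still cannot get the lock after the last reader has left. */
int starved_write_attempts(int rounds)
{
    struct biased_rwsem sem = { 0, false };
    int failed = 0;

    biased_down_read(&sem);             /* first reader enters */
    for (int i = 0; i < rounds; i++) {
        if (!biased_down_write(&sem))
            failed++;                   /* writer keeps waiting */
        biased_down_read(&sem);         /* next reader overlaps */
        biased_up_read(&sem);           /* previous reader leaves */
    }
    biased_up_read(&sem);               /* last reader leaves */

    return biased_down_write(&sem) ? failed : -1;
}
```

As long as read-side critical sections overlap, every write attempt in the model fails; the writer gets in only after the reader count drops to zero.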

It cures the Radeon mess, lowers the contention on mmap_sem for certain
workloads and did not have any negative impact in our initial testing on RT
behaviour.

I think it's worth exposing it to a wider audience of users for testing,
so we can figure out whether there are dragons lurking.

Thanks,

	tglx
Comment 2 Clark Williams 2017-04-03 11:30 EDT
Created attachment 1268419
[patch RT 1_4] rtmutex: Make lock_killable work
Comment 3 Clark Williams 2017-04-03 11:31 EDT
Created attachment 1268420
[patch RT 2_4] rtmutex: Provide rt_mutex_lock_state()
Comment 4 Clark Williams 2017-04-03 11:32 EDT
Created attachment 1268421
[patch RT 3_4] rtmutex: Provide locked slowpath
Comment 5 Clark Williams 2017-04-03 11:32 EDT
Created attachment 1268422
[patch RT 4_4] rwsem_rt: Lift single reader restriction
Comment 9 errata-xmlrpc 2017-08-01 15:02:59 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2077
