Bug 1410052
| Summary: | glibc: pthread_rwlock_rdlock && POSIX 2008 compliance | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Pavel Raiskup <praiskup> | ||||
| Component: | glibc | Assignee: | glibc team <glibc-bugzilla> | ||||
| Status: | CLOSED UPSTREAM | QA Contact: | qe-baseos-tools-bugs | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 8.1 | CC: | ashankar, bruno, codonell, dan, dj, dominic, eblake, fweimer, hannsj_uhl, kdudka, mnewsome, mtasaka, pfrankli, triegel | ||||
| Target Milestone: | pre-dev-freeze | Keywords: | Triaged | ||||
| Target Release: | 8.1 | Flags: | pm-rhel:
mirror+
|
||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2020-03-02 15:27:26 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 200247, 1757557 | ||||||
| Bug Blocks: | 1406031 | ||||||
| Attachments: |
|
||||||
|
Description
Pavel Raiskup
2017-01-04 10:58:30 UTC
Relevant Fedora discussions: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/PQD576JZLERFY6ROI3GF7UYXKZIRI33G/ A simpler way to reproduce the issue is through a test case of the Linux test project: $ wget https://raw.githubusercontent.com/linux-test-project/ltp/master/testcases/open_posix_testsuite/conformance/interfaces/pthread_rwlock_rdlock/2-2.c $ wget https://raw.githubusercontent.com/linux-test-project/ltp/master/testcases/open_posix_testsuite/include/posixtest.h $ gcc -O -Wall 2-2.c -lpthread $ ./a.out main: attempt read lock main: acquired read lock main: create wr_thread, with priority: 2 wr_thread: attempt write lock main: create rd_thread, with priority: 2 rd_thread: attempt read lock rd_thread: acquired read lock rd_thread: unlock read lock Test FAILED: rd_thread did not block on read lock, when a reader owns the lock, and an equal priority writer is waiting for the lock In an 'ltrace' invocation, I see calls to pthread_rwlock_rdlock. In an 'strace' invocation, I only see calls to futex, rt_sigaction, rt_sigprocmask, which can be assumed to be correctly implemented. So, the blame is clearly on glibc. (In reply to Pavel Raiskup from comment #0) > Created attachment 1237090 [details] > Reproducer > > Per man pthread_rwlock_rdlock: > > DESCRIPTION > The calling thread acquires the read lock if a writer does not hold > the lock and there are no writers blocked on the lock. In the general case, it is implementation-defined whether readers or writers are more likely to acquire a rwlock. Please note that the sentence you cited only states that if no writers have locked or are blocked, then readers will acquire. This does not state that readers will acquire *only if* that is the case. http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_rwlock_rdlock.html contains further statements: "If the Thread Execution Scheduling option is not supported, it is implementation-defined whether the calling thread acquires the lock when a writer does not hold the lock and there are writers blocked on the lock. If a writer holds the lock, the calling thread shall not acquire the read lock. If the read lock is not acquired, the calling thread shall block until it can acquire the lock. The calling thread may deadlock if at the time the call is made it holds a write lock." glibc does not support Thread Execution Scheduling, at least in the context of rwlocks: https://sourceware.org/bugzilla/show_bug.cgi?id=13701 Irrespective of whether glibc should support Thread Execution Scheduling, the reproducer (and the gnulib test case) do not expect Thread Execution Scheduling, so the test case cannot expect anything but implementation-defined behavior regarding reader/writer preference. glibc provides the PTHREAD_RWLOCK_PREFER*_NP types to control this and specifies the default behavior (to prefer readers). Furthermore, the Thread Execution Scheduling semantics are either prone to deadlock when used or conflict with POSIX allowing threads to acquire an rwlock more than once as a reader. Thus, these semantics do not seem worthwhile. Also, this would make rwlocks much less efficient, at least in the process-shared case (tracking which priorities are blocked requires space, which we can't allocate anywhere, so everything would have to be done in the kernel). I'm inclined to close this as NOTABUG because of the gnulib context, but we could keep it open as an RHBZ tracker for sourceware BZ 13701. However, I'm inclined to close 13701 as WONTFIX too (or NOTABUG, depending on further investigation of whether glibc actually promises to support Thread Execution Scheduling). (In reply to Torvald Riegel from comment #3) > In the general case, it is implementation-defined whether readers or writers > are more likely to acquire a rwlock. > > Please note that the sentence you cited only states that if no writers > have locked or are blocked, then readers will acquire. > This does not state that readers will acquire *only if* that is the > case. I was thinking about it similarly like you (I always try to look for 'shall' or 'iff' wording), but I considered this is just unfortunately spelled (candidate for fix); because if I should interpret this like you, there's no need to talk about 'locked or blocked writers'. > In the general case, it is implementation-defined whether readers or writers > are more likely to acquire a rwlock. According to: https://lists.gnu.org/archive/html/bug-gnulib/2017-01/msg00028.html this is no longer true in POSIX:2008. (In reply to Torvald Riegel from comment #3) > the reproducer (and the gnulib test case) do not expect Thread Execution > Scheduling, so the test case cannot expect anything but > implementation-defined behavior regarding reader/writer preference. Not true. The test case mentioned in comment #2 explicitly verifies that the symbol _POSIX_THREAD_PRIORITY_SCHEDULING is defined. It is actually defined in <bits/posix_opt.h> with a comment: /* We provide priority scheduling for threads. */ #define _POSIX_THREAD_PRIORITY_SCHEDULING 200809L The value is a positive constant. Per http://pubs.opengroup.org/onlinepubs/007904875/basedefs/xbd_chap02.html and http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap02.html this means that glibc supports the Thread Priority Scheduling option of POSIX at compile time and at run time. The test activates the SCHED_FIFO on all of its threads. Therefore the premises of the sentence "If the Thread Execution Scheduling option is supported, and the threads involved in the lock are executing with the scheduling policies SCHED_FIFO or SCHED_RR, the calling thread shall not acquire the lock if a writer holds the lock or if writers of higher or equal priority are blocked on the lock; otherwise, the calling thread shall acquire the lock." from http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_rwlock_rdlock.html are fulfilled. And the writer and readers in this test case have equal priority. > glibc > provides the PTHREAD_RWLOCK_PREFER*_NP types to control this and specifies > the default behavior (to prefer readers). The default behaviour (without explicit use of pthread_rwlockattr_setkind_np) is what we are discussing here. Referring to non-portable functions does not help in resolving standard compliance issues. > Furthermore, the Thread Execution Scheduling semantics are either prone to > deadlock when used or conflict with POSIX allowing threads to acquire an > rwlock more than once as a reader. Thus, these semantics do not seem > worthwhile. There are applications which need recursive rwlocks, and there are applications which don't need recursive rwlocks, because they can guarantee that this part of their code does not have recursion. If there's a conflict in the wording of POSIX TPS that causes problems for the first kind of applications, it's not helpful to deny the usefulness of POSIX for the second kind of application. Instead it would be useful to make corrections to POSIX TPS so that the conflict would be resolved - for the benefit of both kinds of applications. (In reply to Kevin Kofler from comment #5) > > In the general case, it is implementation-defined whether readers or writers > > are more likely to acquire a rwlock. > > According to: > https://lists.gnu.org/archive/html/bug-gnulib/2017-01/msg00028.html > this is no longer true in POSIX:2008. What I cited in comment #3 is the 2013 edition. (In reply to Bruno Haible from comment #6) > (In reply to Torvald Riegel from comment #3) > > the reproducer (and the gnulib test case) do not expect Thread Execution > > Scheduling, so the test case cannot expect anything but > > implementation-defined behavior regarding reader/writer preference. > > Not true. The test case mentioned in comment #2 explicitly verifies that the > symbol _POSIX_THREAD_PRIORITY_SCHEDULING is defined. It is actually defined > in <bits/posix_opt.h> with a comment: But this bug report here is about the gnulib testcase, right? Does gnulib depend on having support for TPS? > /* We provide priority scheduling for threads. */ > #define _POSIX_THREAD_PRIORITY_SCHEDULING 200809L > > The value is a positive constant. > Per http://pubs.opengroup.org/onlinepubs/007904875/basedefs/xbd_chap02.html > and http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap02.html > this means that glibc supports the Thread Priority Scheduling option of > POSIX at compile time and at run time. > > The test activates the SCHED_FIFO on all of its threads. Therefore the > premises of the sentence "If the Thread Execution Scheduling option is > supported, and the threads involved in the lock are executing with the > scheduling policies SCHED_FIFO or SCHED_RR, the calling thread shall not > acquire the lock if a writer holds the lock or if writers of higher or equal > priority are blocked on the lock; otherwise, the calling thread shall > acquire the lock." from > http://pubs.opengroup.org/onlinepubs/9699919799/functions/ > pthread_rwlock_rdlock.html are fulfilled. And the writer and readers in this > test case have equal priority. In the LTP context, you are right. This is SWBZ 13701. So if there's a preference for that, we can keep this bug here open as a tracker for that. > > glibc > > provides the PTHREAD_RWLOCK_PREFER*_NP types to control this and specifies > > the default behavior (to prefer readers). > > The default behaviour (without explicit use of > pthread_rwlockattr_setkind_np) is what we are discussing here. Referring to > non-portable functions does not help in resolving standard compliance issues. Most of the time, I'd agree; however, the standard is arguably lacking in this case. > > Furthermore, the Thread Execution Scheduling semantics are either prone to > > deadlock when used or conflict with POSIX allowing threads to acquire an > > rwlock more than once as a reader. Thus, these semantics do not seem > > worthwhile. > > There are applications which need recursive rwlocks, and there are > applications which don't need recursive rwlocks, because they can guarantee > that this part of their code does not have recursion. That's why we have PREFER_READER_NP, PREFER_WRITER_NP, and PREFER_WRITER_NONRECURSIVE_NP. > If there's a conflict > in the wording of POSIX TPS that causes problems for the first kind of > applications, it's not helpful to deny the usefulness of POSIX for the > second kind of application. Instead it would be useful to make corrections > to POSIX TPS so that the conflict would be resolved - for the benefit of > both kinds of applications. IMO, the TPS requirements should just be removed. See http://austingroupbugs.net/view.php?id=1111 (In reply to Torvald Riegel from comment #8) > But this bug report here is about the gnulib testcase, right? No, this bug report is about 'Component: glibc'. The gnulib testcase is from today; in a couple of days I will have delivered a workaround along the lines suggested by Pavel Raiskup. > Does gnulib depend on having support for TPS? Comment #2 already makes it clear that it is irrelevant whether the gnulib testcase sets a scheduling policy or not. And when you look into the glibc code, you will also see that the pthread_rwlock_* functions do not even look at the scheduling policies of the threads. > In the LTP context, you are right. This is SWBZ 13701. So if there's a > preference for that, we can keep this bug here open as a tracker for that. Yes, that's what I think most people agree on: that this bug here and https://sourceware.org/bugzilla/show_bug.cgi?id=13701 are essentially the same thing. (In reply to Torvald Riegel from comment #8) > IMO, the TPS requirements should just be removed. See > http://austingroupbugs.net/view.php?id=1111 This issue has been resolved. The TPS requirements have been changed, but not been removed. The sentence in the pthread_rwlock_rdlock specification now reads: [TPS] If the Thread Execution Scheduling option is supported, and the threads involved in the lock are executing with the scheduling policies SCHED_FIFO or SCHED_RR, the calling thread shall not acquire the lock if a writer holds the lock or if the calling thread does not already hold a read lock and writers of higher or equal priority are blocked on the lock; otherwise, the calling thread shall acquire the lock. I have added a comment to SWBZ 13701. I still don't think glibc should try to implement the extra TPS behavior because that would penalize performance in the common case. I had the opportunity to review this issue upstream while working on a couple of new rwlock bugs. Even if we backport the new rwlock to RHEL7 it won't fix the key issue that we don't support TPS in the glibc implementation, but we *do* define _POSIX_THREAD_PRIORITY_SCHEDULING and that does impact the definition of these interfaces:
pthread_attr_getinheritsched(),
pthread_attr_getschedpolicy(),
pthread_attr_getscope(),
pthread_attr_setinheritsched(),
pthread_attr_setschedpolicy(),
pthread_attr_setscope(),
pthread_getschedparam(),
pthread_setschedparam(),
pthread_setschedprio().
For example in the case of pthread_setschedparam() we should be able to set SCHED_SPORADIC, but we don't define that. So in essence the entire set of interfaces is a big mess at this point because it doesn't match the Linux scheduling model and nobody has cleaned this up.
I am not going to change any of these interfaces in any stable RHEL release, all of these changes have to go upstream first. Therefore I'm moving this to RHEL 8 and will look to see what we can cleanup in upstream through documentation where we deviate from POSIX (like not supporting TPS in rwlocks or SCHED_SPORADIC either).
This is going to be a docs only change for Linux to specify non-compliance with POSIX regarding scheduling and thread priority scheduling.
We need to backport this fix too:
commit c70824b9a4645c0ecd049da8cfdb2c28ae7ada23
Author: Florian Weimer <fweimer>
Date: Sat Feb 2 14:13:33 2019 +0100
manual: Document lack of conformance of sched_* functions [BZ #14829]
On Linux, we define _POSIX_PRIORITY_SCHEDULING, but functions such
as sched_setparam and sched_setscheduler apply to individual threads,
not processes.
Reviewed-by: Carlos O'Donell <carlos>
(In reply to Carlos O'Donell from comment #15) > This is going to be a docs only change for Linux to specify non-compliance > with POSIX regarding scheduling and thread priority scheduling. > > We need to backport this fix too: > > commit c70824b9a4645c0ecd049da8cfdb2c28ae7ada23 > Author: Florian Weimer <fweimer> > Date: Sat Feb 2 14:13:33 2019 +0100 > > manual: Document lack of conformance of sched_* functions [BZ #14829] > > On Linux, we define _POSIX_PRIORITY_SCHEDULING, but functions such > as sched_setparam and sched_setscheduler apply to individual threads, > not processes. > > Reviewed-by: Carlos O'Donell <carlos> This came up again on the Austin Group call 2019-11-07, in reference to http://austingroupbugs.net/view.php?id=1111. Would it not be better to change _POSIX_PRIORITY_SCHEDULING to be defined as 0 to accurately reflect that Linux will NEVER implement this for processes, and then tweak the documentation of the various functions to mention that within individual threads, there IS support as if _POSIX_PRIORITY_SCHEDULING had a non-zero value? I see quite a bit of code like this:
// BOOST_HAS_SCHED_YIELD:
// This is predicated on _POSIX_PRIORITY_SCHEDULING or
// on _POSIX_THREAD_PRIORITY_SCHEDULING or on _XOPEN_REALTIME.
# if defined(_POSIX_PRIORITY_SCHEDULING) && (_POSIX_PRIORITY_SCHEDULING+0 > 0)\
|| (defined(_POSIX_THREAD_PRIORITY_SCHEDULING) && (_POSIX_THREAD_PRIORITY_SCHEDULING+0 > 0))\
|| (defined(_XOPEN_REALTIME) && (_XOPEN_REALTIME+0 >= 0))
# define BOOST_HAS_SCHED_YIELD
# endif
I'm that this would break if we changed the definition at this point.
Another possibility discussed in the Austin Group - it may be worth defining a specific constant value (maybe 2?), different from the usual '1' for always supported or 200809L currently used, and use that constant for the compile-time and/or sysconf() value of _POSIX_PRIORITY_SCHEDULING as a way to let applications be aware that the interfaces are present but not all of the thread scheduling guarantees can be met (at least not without utilizing further implementation-specific extensions). We'd have to open a new POSIX bug to propose wording along those lines, but it could be a nice compromise that could let us declare our implementation as POSIX conformant by using that constant value. Or, propose a new thread-scheduling policy (that matches what Linux actually has) for addition to the standard, optional just like TPS is optional, and advertise that policy instead of TPS. I'm not as well-versed in the specifics, so I would rely on your help in drafting any such proposals, but I'm happy to help champion any such proposal through the Austin Group if we can come up with wording. This bug needs to be handled upstream. Any interface changes we make need to be done upstream and then would be available in the next release of Red Hat Enterprise Linux. Making these changes immediately in any released version of RHEL would be quite problematic. We are tracking this bug upstream here in the upstream glibc bug tracker: https://sourceware.org/bugzilla/show_bug.cgi?id=25619 I'm am marking this bug CLOSED/UPSTREAM, and we can review this once the upstream bug is resolved. |