Bug 531552 - threads on pthread_mutex_lock wake in fifo order, but posix specifies by priority
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jon Thomas
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 533858
Reported: 2009-10-28 18:00 UTC by Jon Thomas
Modified: 2018-10-27 15:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 06:54:31 UTC
Target Upstream Version:
Embargoed:


Attachments
test case (2.11 KB, text/plain)
2009-10-28 18:00 UTC, Jon Thomas
patch against 5.4 (7.59 KB, patch)
2009-10-28 18:05 UTC, Jon Thomas
same patch with tabbage fixed (7.39 KB, patch)
2009-10-30 21:04 UTC, Jon Thomas
patch with tabbage fixed #2 (7.37 KB, patch)
2009-11-03 14:19 UTC, Jon Thomas


Links
System: Red Hat Product Errata
ID: RHSA-2010:0178
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update
Last Updated: 2010-03-29 12:18:21 UTC

Description Jon Thomas 2009-10-28 18:00:30 UTC
Created attachment 366475 [details]
test case

I'll attach a test case and a patch. The test case succeeds on F11 and fails on RHEL 5.4.

The issue is that threads waiting on the futex_q list acquire the mutex in the order they were queued rather than by priority.

From man pthread_attr_setschedpolicy:

"When threads executing with the scheduling policy SCHED_FIFO, SCHED_RR, or SCHED_SPORADIC are waiting on a mutex, they shall acquire the mutex in priority order when the mutex is unlocked."

From http://www.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_trylock.html:

"If there are threads blocked on the mutex object referenced by mutex when pthread_mutex_unlock() is called, resulting in the mutex becoming available, the scheduling policy shall determine which thread shall acquire the mutex." 

The problem is that if you set the policy to SCHED_FIFO, waiting threads still acquire the mutex in FIFO (queueing) order, regardless of what their priority is set to.
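The behaviour described above can be observed with a program along the following lines. This is only a minimal sketch, not the attached test case: the names fifo_attr_init, run_wake_order_check, wake_order and MAX_WAITERS are made up for illustration, the sleeps are a crude way to let each waiter block before the next one is created, and creating SCHED_FIFO threads normally needs root or an RLIMIT_RTPRIO allowance (otherwise pthread_create is expected to fail with EPERM).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <time.h>

#define MAX_WAITERS 8                /* illustrative limit */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int wake_order[MAX_WAITERS];  /* priorities, in acquisition order */
static int wake_count;               /* protected by lock */

/* Build attributes for a SCHED_FIFO thread at the given priority.
 * The policy is set before the priority so the priority is checked
 * against the SCHED_FIFO range. */
static int fifo_attr_init(pthread_attr_t *attr, int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    int err = pthread_attr_init(attr);

    if (err)
        return err;
    err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (!err)
        err = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (!err)
        err = pthread_attr_setschedparam(attr, &sp);
    if (err)
        pthread_attr_destroy(attr);
    return err;
}

/* Each waiter blocks on the mutex and records its priority on acquisition. */
static void *waiter(void *arg)
{
    int prio = *(int *)arg;

    pthread_mutex_lock(&lock);
    assert(wake_count < MAX_WAITERS);
    wake_order[wake_count++] = prio;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Queue nwaiters threads on the mutex, lowest priority first, then unlock.
 * Returns 0 on success or an errno value (e.g. EPERM). */
static int run_wake_order_check(int nwaiters)
{
    pthread_t tid[MAX_WAITERS];
    int prio[MAX_WAITERS];
    int created = 0, err = 0;
    int base = sched_get_priority_min(SCHED_FIFO);
    struct timespec pause = { 0, 100 * 1000 * 1000 };  /* 100 ms */

    if (nwaiters < 0 || nwaiters > MAX_WAITERS)
        return EINVAL;
    wake_count = 0;
    pthread_mutex_lock(&lock);
    for (int i = 0; i < nwaiters; i++) {
        pthread_attr_t attr;

        prio[i] = base + i;
        err = fifo_attr_init(&attr, prio[i]);
        if (err)
            break;
        err = pthread_create(&tid[i], &attr, waiter, &prio[i]);
        pthread_attr_destroy(&attr);
        if (err)
            break;
        created++;
        nanosleep(&pause, NULL);   /* let this waiter block on the mutex */
    }
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    return err;
}
```

After a successful run_wake_order_check(4), wake_order holds the priorities in the order the mutex was acquired. Since the waiters queue lowest priority first, a FIFO wake-up would list them in ascending order, while the priority order quoted above would list them in descending order.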

I think the reason is that in RHEL 5, robust mutexes are still FIFO in terms of the order in which threads acquire the lock once it is unlocked. plist is not used, so the queue is basically a normal Linux list.

in rhel 5:

struct futex_q {
       struct list_head list;
       wait_queue_head_t waiters;
.....


In later kernels:


struct futex_q {
       struct plist_node list;
       wait_queue_head_t waiters;
....
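
To illustrate why the type of the list member matters, here is a small user-space sketch of the two queueing disciplines. It is not kernel code: struct waiter, struct wait_queue, enqueue_fifo and enqueue_by_prio are hypothetical names, and an array stands in for the linked list. It assumes the kernel convention that a lower prio value means a higher priority, and that a priority-sorted list keeps entries of equal priority in arrival order.

```c
#include <stddef.h>

#define QUEUE_MAX 16              /* illustrative capacity */

/* One queued waiter; a lower prio value means a higher priority. */
struct waiter {
    int prio;
    int id;                       /* arrival number, for inspection only */
};

struct wait_queue {
    struct waiter entry[QUEUE_MAX];
    size_t len;
};

/* Plain list behaviour: always append at the tail, so the head of the
 * queue (the next thread to be woken) is simply the oldest waiter. */
static int enqueue_fifo(struct wait_queue *q, struct waiter w)
{
    if (q->len == QUEUE_MAX)
        return -1;
    q->entry[q->len++] = w;
    return 0;
}

/* Priority-sorted behaviour: insert behind every entry whose prio value
 * is lower or equal, so the head is the highest-priority waiter and
 * waiters of equal priority stay in arrival order. */
static int enqueue_by_prio(struct wait_queue *q, struct waiter w)
{
    size_t pos = 0;

    if (q->len == QUEUE_MAX)
        return -1;
    while (pos < q->len && q->entry[pos].prio <= w.prio)
        pos++;
    for (size_t i = q->len; i > pos; i--)
        q->entry[i] = q->entry[i - 1];
    q->entry[pos] = w;
    q->len++;
    return 0;
}
```

Queueing waiters with prio values 30, 10, 20 in that order leaves the FIFO queue as 30, 10, 20, whereas the priority-sorted queue becomes 10, 20, 30, so the wake-up side can take entries from the head in both cases and only the insertion differs.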

Comment 1 Jon Thomas 2009-10-28 18:05:07 UTC
Created attachment 366476 [details]
patch against 5.4

I found an upstream commit that addresses this and have attached a port to 5.4. It fixes the issue for realtime processes and for the test case. Apparently, normal-priority processes are still woken in FIFO order.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ec92d08292d3e9b0823eba138a4564d2d39f25c7
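
As far as I understand the upstream change, the reason normal-priority processes stay FIFO is that the priority used for queueing is clamped: realtime threads are queued with their own kernel priority, while every non-realtime thread is queued at MAX_RT_PRIO, so they all compare equal and keep arrival order. The sketch below only illustrates that arithmetic. MAX_RT_PRIO is the kernel constant, but rt_normal_prio, nice_normal_prio and futex_queue_prio are hypothetical helper names, and the priority formulas are my reading of the 2.6 scheduler rather than something stated in this report.

```c
#define MAX_RT_PRIO 100   /* kernel priorities 0..99 are realtime */

/* Kernel priority of a SCHED_FIFO/SCHED_RR thread with the given
 * user-visible rt_priority (1..99); a lower value is a higher priority. */
static int rt_normal_prio(int rt_priority)
{
    return MAX_RT_PRIO - 1 - rt_priority;
}

/* Kernel priority of a normal (SCHED_OTHER) thread with the given
 * nice value (-20..19): 100..139. */
static int nice_normal_prio(int nice)
{
    return MAX_RT_PRIO + 20 + nice;
}

/* Priority used when queueing a waiter: realtime threads keep their own
 * priority, everything else collapses to MAX_RT_PRIO. */
static int futex_queue_prio(int normal_prio)
{
    return normal_prio < MAX_RT_PRIO ? normal_prio : MAX_RT_PRIO;
}
```

With this clamping, a nice -20 thread and a nice 19 thread are both queued at the same value, which would explain why only realtime waiters are reordered.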

Comment 9 Dave Anderson 2009-10-30 18:06:16 UTC
Given the changes to the futex_q and futex_hash_bucket structures, did you
build your test kernel via brew in order to verify that there are no KABI
issues?

Comment 10 Jon Thomas 2009-10-30 18:13:47 UTC
We built a test kernel via brew and gave it to the customer to test. You can find it at:

https://brewweb.devel.redhat.com/taskinfo?taskID=2051141

Comment 11 Dave Anderson 2009-10-30 18:24:20 UTC
Do you want to post your patch to rhkernel-list?

Comment 12 Jon Thomas 2009-10-30 18:46:25 UTC
Hi Dave, I sent it out

Comment 13 Dave Anderson 2009-10-30 18:54:20 UTC
Thanks Jon -- setting POST:

http://post-office.corp.redhat.com/archives/rhkernel-list/2009-October/msg00852.html

Comment 14 Jon Thomas 2009-10-30 21:04:36 UTC
Created attachment 366869 [details]
same patch with tabbage fixed

Comment 17 RHEL Program Management 2009-11-03 04:11:32 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 18 Jon Thomas 2009-11-03 14:19:19 UTC
Created attachment 367305 [details]
patch with tabbage fixed #2

Comment 20 Don Zickus 2009-11-10 16:51:59 UTC
in kernel-2.6.18-173.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 24 errata-xmlrpc 2010-03-30 06:54:31 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

