Bug 531552

Summary: threads on pthread_mutex_lock wake in fifo order, but posix specifies by priority
Product: Red Hat Enterprise Linux 5
Reporter: Jon Thomas <jthomas>
Component: kernel
Assignee: Jon Thomas <jthomas>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent
Priority: urgent
Version: 5.4
CC: anderson, dhoward, dzickus, emcnabb, hjia, jarod, jpirko, kzhang, pzijlstr, sardella, tao
Target Milestone: rc
Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2010-03-30 06:54:31 UTC
Bug Blocks: 533858    
Attachments:
Description                       Flags
test case                         none
patch against 5.4                 none
same patch with tabbage fixed     none
patch with tabbage fixed #2       none

Description Jon Thomas 2009-10-28 18:00:30 UTC
Created attachment 366475 [details]
test case

I'll attach a test case and a patch. The test case succeeds on F11 but fails on RHEL 5.4.

The issue is that threads waiting on the futex_q list acquire the mutex in the order they were queued rather than in priority order.

From man pthread_attr_setschedpolicy

       "When  threads
      executing   with   the  scheduling  policy  SCHED_FIFO,  SCHED_RR,   or
      SCHED_SPORADIC  are waiting on a mutex, they shall acquire the mutex in
      priority order when the mutex is unlocked."

From http://www.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_trylock.html

"If there are threads blocked on the mutex object referenced by mutex when pthread_mutex_unlock() is called, resulting in the mutex becoming available, the scheduling policy shall determine which thread shall acquire the mutex." 

The problem is that if you set the policy to SCHED_FIFO, waiting threads still acquire the mutex in FIFO (queue) order regardless of what their priority is set to.
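For reference, here is a minimal sketch of how a test program might request SCHED_FIFO and an explicit priority for each thread before the threads contend on the mutex. This is not the attached test case; the helper name make_fifo_attr is hypothetical. Creating a thread with these attributes normally requires root or CAP_SYS_NICE.

```c
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper (not from the attached test case): prepare thread
 * attributes for SCHED_FIFO at the given priority.
 * Returns 0 on success or an errno-style error code.
 */
int make_fifo_attr(pthread_attr_t *attr, int priority)
{
    struct sched_param param = { .sched_priority = priority };
    int rc;

    rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    /* Without PTHREAD_EXPLICIT_SCHED the new thread inherits the
     * creator's policy and the settings below are ignored. */
    rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    /* Set the policy first so the priority can be checked against it. */
    if (rc == 0)
        rc = pthread_attr_setschedparam(attr, &param);

    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}
```

With attributes like these, threads created at (say) priorities 10, 20 and 30 that all block in pthread_mutex_lock() on one mutex would, per the POSIX text quoted above, be expected to acquire it highest priority first once it is unlocked.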

I think the reason is that in RHEL 5, robust mutexes still hand the lock to waiting threads in FIFO order once it is unlocked: plist is not used, so the queue is basically a normal Linux list.

in rhel 5:

struct futex_q {
       struct list_head list;
       wait_queue_head_t waiters;
.....


In later kernels:


struct futex_q {
       struct plist_node list;
       wait_queue_head_t waiters;
....
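To illustrate the difference between the two queue types, here is a small userspace model. It is not kernel code: struct waiter, enqueue_fifo, enqueue_prio and dequeue are names made up for this sketch, and the kernel's plist is more elaborate. It follows the kernel convention that a lower number means a higher priority.

```c
#include <stddef.h>

struct waiter {
    int id;
    int prio;               /* lower value = higher priority */
    struct waiter *next;
};

/* Plain list behaviour: always append at the tail (arrival order). */
void enqueue_fifo(struct waiter **head, struct waiter *w)
{
    while (*head != NULL)
        head = &(*head)->next;
    w->next = NULL;
    *head = w;
}

/* Priority-sorted behaviour: insert after every waiter whose priority
 * value is <= ours, so equal priorities keep their arrival order. */
void enqueue_prio(struct waiter **head, struct waiter *w)
{
    while (*head != NULL && (*head)->prio <= w->prio)
        head = &(*head)->next;
    w->next = *head;
    *head = w;
}

/* The next thread to be woken is always the head of the queue. */
struct waiter *dequeue(struct waiter **head)
{
    struct waiter *w = *head;

    if (w != NULL) {
        *head = w->next;
        w->next = NULL;
    }
    return w;
}
```

If waiters with priority values 50, 10 and 50 arrive in that order, the FIFO queue wakes them in arrival order, while the priority-sorted queue wakes the priority-10 waiter first and then the two priority-50 waiters in arrival order.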

Comment 1 Jon Thomas 2009-10-28 18:05:07 UTC
Created attachment 366476 [details]
patch against 5.4

I found an upstream commit that addresses this and have attached a port to 5.4. It fixes the issue for realtime processes and for the test case. Apparently normal-priority processes are still woken in FIFO order.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ec92d08292d3e9b0823eba138a4564d2d39f25c7
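A rough sketch of why normal-priority processes stay FIFO, as far as I can tell from the upstream commit: the priority used for the queue node is the task's normal_prio clamped to MAX_RT_PRIO, so every non-realtime task ends up with the same value and ties are resolved in arrival order. The function names below are hypothetical, and the value 100 for MAX_RT_PRIO is an assumption based on mainline kernels of that period.

```c
/* Assumed value, as in mainline kernels of that period. */
#define MAX_RT_PRIO 100

/* Hypothetical model of how the kernel derives normal_prio for a
 * realtime task: a higher rt_priority gives a lower (better) value. */
int rt_normal_prio(int rt_priority)
{
    return MAX_RT_PRIO - 1 - rt_priority;
}

/* Hypothetical model of the queueing priority: clamp to MAX_RT_PRIO so
 * that all non-realtime tasks (normal_prio >= 100) compare as equal. */
int futex_queue_prio(int normal_prio)
{
    return normal_prio < MAX_RT_PRIO ? normal_prio : MAX_RT_PRIO;
}
```

Under these assumptions a SCHED_FIFO thread at rt_priority 99 queues with value 0, while ordinary tasks with normal_prio anywhere from 100 to 139 all queue with value 100, so nice levels make no difference to wake order.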

Comment 9 Dave Anderson 2009-10-30 18:06:16 UTC
Given the changes to the futex_q and futex_hash_bucket structures, did you
build your test kernel via brew in order to verify that there are no KABI
issues?

Comment 10 Jon Thomas 2009-10-30 18:13:47 UTC
We built a test kernel via brew and gave it to the customer to test. You can find it at:

https://brewweb.devel.redhat.com/taskinfo?taskID=2051141

Comment 11 Dave Anderson 2009-10-30 18:24:20 UTC
Do you want to post your patch to rhkernel-list?

Comment 12 Jon Thomas 2009-10-30 18:46:25 UTC
Hi Dave, I sent it out

Comment 13 Dave Anderson 2009-10-30 18:54:20 UTC
Thanks Jon -- setting POST:

http://post-office.corp.redhat.com/archives/rhkernel-list/2009-October/msg00852.html

Comment 14 Jon Thomas 2009-10-30 21:04:36 UTC
Created attachment 366869 [details]
same patch with tabbage fixed

Comment 17 RHEL Program Management 2009-11-03 04:11:32 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 18 Jon Thomas 2009-11-03 14:19:19 UTC
Created attachment 367305 [details]
patch with tabbage fixed #2

Comment 20 Don Zickus 2009-11-10 16:51:59 UTC
in kernel-2.6.18-173.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 24 errata-xmlrpc 2010-03-30 06:54:31 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html