Bug 149933 - fix missing wakeup in ipc/sem
fix missing wakeup in ipc/sem
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel (Show other bugs)
4.0
All Linux
medium Severity medium
: ---
: ---
Assigned To: Alexander Viro
Brian Brock
:
Depends On:
Blocks: 181409
  Show dependency treegraph
 
Reported: 2005-02-28 16:01 EST by Tim Burke
Modified: 2010-10-21 22:49 EDT (History)
4 users (show)

See Also:
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2006-08-10 17:02:03 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Tim Burke 2005-02-28 16:01:10 EST
Alan bounced me this one. Any extra eyeballs appreciated.

		Dave




Subject:
[Fwd: [PATCH] fix missing wakeup in ipc/sem]
From:
Alan Cox <alan@lxorguk.ukuu.org.uk>
Date:
Sat, 11 Dec 2004 16:49:52 +0000
To:
davej@redhat.com

-----Forwarded Message-----

>> From: Manfred Spraul <manfred@colorfullife.com>
>> To: Andrew Morton <akpm@osdl.org>
>> Cc: Michael Kerrisk <mtk-lkml@gmx.net>, alan@redhat.com,
michael.kerrisk@gmx.net, linux-kernel@vger.kernel.org
>> Subject: [PATCH] fix missing wakeup in ipc/sem
>> Date: Thu, 09 Dec 2004 19:27:04 +0100
>> 
>> Hi Andrew,
>> 
>> My patch that removed the spin_lock calls from the tail of
>> sys_semtimedop introduced a bug:
>> Before my patch was merged, every operation that altered an array called
>> update_queue. That call woke up threads that were waiting until a
>> semaphore value becomes 0. I've accidentially removed that call.
>> 
>> The attached patch fixes that by modifying update_queue: the function
>> now loops internally and wakes up all threads. The patch also removes
>> update_queue calls from the error path of sys_semtimedop: failed
>> operations do not modify the array, no need to rescan the list of
>> waiting threads.
>> 
>> Signed-Off-By: Manfred Spraul <manfred@colorfullife.com>
>> 
>> 
>> 
>> 
>> ______________________________________________________________________
>> // $Header$
>> // Kernel Version:
>> //  VERSION = 2
>> //  PATCHLEVEL = 6
>> //  SUBLEVEL = 10
>> //  EXTRAVERSION =-rc2-mm4
>> --- 2.6/include/linux/sem.h	2004-12-05 16:20:31.000000000 +0100
>> +++ build-2.6/include/linux/sem.h	2004-12-09 18:39:58.000000000 +0100
>> @@ -109,6 +109,7 @@ struct sem_queue {
>>  	int			id;	 /* internal sem id */
>>  	struct sembuf *		sops;	 /* array of pending operations */
>>  	int			nsops;	 /* number of operations */
>> +	int			alter;   /* does the operation alter the array? */
>>  };
>>  
>>  /* Each task has a list of undo requests. They are executed automatically
>> --- 2.6/ipc/sem.c	2004-12-05 16:21:39.000000000 +0100
>> +++ build-2.6/ipc/sem.c	2004-12-09 19:13:19.000000000 +0100
>> @@ -358,8 +358,22 @@ static void update_queue (struct sem_arr
>>  		if (error <= 0) {
>>  			struct sem_queue *n;
>>  			remove_from_queue(sma,q);
>> -			n = q->next;
>>  			q->status = IN_WAKEUP;
>> +			/*
>> +			 * Continue scanning. The next operation
>> +			 * that must be checked depends on the type of the
>> +			 * completed operation:
>> +			 * - if the operation modified the array, then
>> +			 *   restart from the head of the queue and
>> +			 *   check for threads that might be waiting
>> +			 *   for semaphore values to become 0.
>> +			 * - if the operation didn't modify the array,
>> +			 *   then just continue.
>> +			 */
>> +			if (q->alter)
>> +				n = sma->sem_pending;
>> +			else
>> +				n = q->next;
>>  			wake_up_process(q->sleeper);
>>  			/* hands-off: q will disappear immediately after
>>  			 * writing q->status.
>> @@ -1119,8 +1133,11 @@ retry_undos:
>>  		goto out_unlock_free;
>>  
>>  	error = try_atomic_semop (sma, sops, nsops, un, current->tgid);
>> -	if (error <= 0)
>> -		goto update;
>> +	if (error <= 0) {
>> +		if (alter && error == 0)
>> +			update_queue (sma);
>> +		goto out_unlock_free;
>> +	}
>>  
>>  	/* We need to sleep on this operation, so we put the current
>>  	 * task into the pending queue and go to sleep.
>> @@ -1132,6 +1149,7 @@ retry_undos:
>>  	queue.undo = un;
>>  	queue.pid = current->tgid;
>>  	queue.id = semid;
>> +	queue.alter = alter;
>>  	if (alter)
>>  		append_to_queue(sma ,&queue);
>>  	else
>> @@ -1183,9 +1201,6 @@ retry_undos:
>>  	remove_from_queue(sma,&queue);
>>  	goto out_unlock_free;
>>  
>> -update:
>> -	if (alter)
>> -		update_queue (sma);
>>  out_unlock_free:
>>  	sem_unlock(sma);
>>  out_free:
Comment 1 Norm Murray 2006-01-30 20:53:36 EST
posting patch to rhkernel-list for review
Comment 5 Doug Chapman 2006-03-27 14:49:00 EST
FYI, 

We have now seen this inside HP on a new platform about to be shipped to a high
profile customer.  I have tested this on ia64 with the 2.6.9-34 source + this
patch and verified it does fix our issue and we have not seen any other problems
with it.

- Doug
Comment 6 Jeff Burke 2006-03-27 15:09:22 EST
Please note: 
This patch was applied to the RHEL4-U4 beta kernel 2.6.9-34.8 kernel.
Please note: The RHTS test "/kernel/drivers/modsym" failes with this patch
applied. It causes a regression with kabi. 

http://master.rhts.boston.redhat.com/cgi-bin/rhts/jobs.cgi?id=1201
Navigate to recipeid 6559, Then to /kernel/drivers/modsym test results.

Error: Kernel symbol changed: register_security from: ca.modlist
Error: Kernel symbol changed: security_ops from: ca.modlist
Error: Kernel symbol changed: unregister_security from: ca.modlist
Error: Kernel symbol changed: security_ops from: veritas.modlist
Comment 7 Doug Chapman 2006-03-27 15:16:52 EST
I have modified the patch with:

#ifndef __GENKSYMS__
       int                     alter;   /* does the operation alter the array? */
#endif

and I am building a test kernel based on the 2.6.9-34 source and I will have the
guys in HP that were blocked on this test it out.

Comment 8 Jason Baron 2006-03-30 17:53:14 EST
committed in stream u3 build 34.10. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 12 Mike Gahagan 2006-05-25 16:13:30 EDT
Seems to pass on power when run by hand on both 36.1 and -37 (log below).

[root@ibm-hv2-lp4 tmp]# rhts-run-package
/tmp/rhts-build-YQjeB643/install/ppc64pseries/rhts-149933-tests-1.0-1.ppc64pseries.rpm
/tmp/rhts-run-package-etTa2077 /tmp
30 blocks
/tmp/rhts-run-package-etTa2077/mnt/tests
/tmp/rhts-run-package-etTa2077/mnt/tests /tmp
/tmp/rhts-run-package-etTa2077/mnt/tests /tmp
/tmp/rhts-run-package-etTa2077/mnt/tests/kernel
/tmp/rhts-run-package-etTa2077/mnt/tests /tmp
/tmp/rhts-run-package-etTa2077/mnt/tests /tmp
/tmp/rhts-run-package-etTa2077/mnt/tests/kernel/syscalls
/tmp/rhts-run-package-etTa2077/mnt/tests /tmp
/tmp/rhts-run-package-etTa2077/mnt/tests /tmp
/tmp/rhts-run-package-etTa2077/mnt/tests/kernel/syscalls/149933
/tmp/rhts-run-package-etTa2077/mnt/tests /tmp
***** Start of runtest.sh *****
Checking log file
Count=1, Sleeping for 5s
Count=2, Sleeping for 5s
Slave 1: semaphore id 0
Slave 1: started and doing sleep for 10 secs.
Slave 1: woke up - upping count.
Slave 1: waiting for go.
Slave 1: got the go all done.
Controller: semaphore id 0
Controller: set up to tell slaves to wait
Controller: starting slave 0
Controller: starting slave 1
Controller:Waiting on slaves and then telling them to go.
Controller: Done deal - told them all slaves to go
Slave 0: semaphore id 0
Slave 0: started and doing sleep for 10 secs.
Slave 0: woke up - upping count.
Slave 0: waiting for go.
Slave 0: got the go all done.
All slaves done
Done checking log file
All slaves exited Passed:
Checking process list
No slave process running Passed:
Done checking process list
***** End of runtest.sh *****
Reporting test PASS as 0
test result submitted with the following arguments:
-t PASS -S -r 0 -v 0 -q developer -D /tmp/dmesg.log -T
storing results in /tmp/results.log
/tmp/rhts-run-package-etTa2077/mnt/tests /tmp
/tmp
[root@ibm-hv2-lp4 tmp]# cat results.log
***** Start of runtest.sh *****
Checking log file
Count=1, Sleeping for 5s
Count=2, Sleeping for 5s
Slave 1: semaphore id 0
Slave 1: started and doing sleep for 10 secs.
Slave 1: woke up - upping count.
Slave 1: waiting for go.
Slave 1: got the go all done.
Controller: semaphore id 0
Controller: set up to tell slaves to wait
Controller: starting slave 0
Controller: starting slave 1
Controller:Waiting on slaves and then telling them to go.
Controller: Done deal - told them all slaves to go
Slave 0: semaphore id 0
Slave 0: started and doing sleep for 10 secs.
Slave 0: woke up - upping count.
Slave 0: waiting for go.
Slave 0: got the go all done.
All slaves done
Done checking log file
All slaves exited Passed:
Checking process list
No slave process running Passed:
Done checking process list
***** End of runtest.sh *****
[root@ibm-hv2-lp4 tmp]# uname -a
Linux ibm-hv2-lp4.test.redhat.com 2.6.9-37.EL #1 SMP Fri May 19 18:05:11 EDT
2006 ppc64 ppc64 ppc64 GNU/Linux
[root@ibm-hv2-lp4 tmp]#
Comment 13 Jason Baron 2006-05-25 16:18:37 EDT
Mike, so you think the rhts failed ppc run, was something wrong with rhts?
Comment 16 Red Hat Bugzilla 2006-08-10 17:02:03 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html

Note You need to log in before you can comment on or make changes to this bug.