Alan bounced me this one. Any extra eyeballs appreciated. Dave Subject: [Fwd: [PATCH] fix missing wakeup in ipc/sem] From: Alan Cox <alan.org.uk> Date: Sat, 11 Dec 2004 16:49:52 +0000 To: davej -----Forwarded Message----- >> From: Manfred Spraul <manfred> >> To: Andrew Morton <akpm> >> Cc: Michael Kerrisk <mtk-lkml>, alan, michael.kerrisk, linux-kernel.org >> Subject: [PATCH] fix missing wakeup in ipc/sem >> Date: Thu, 09 Dec 2004 19:27:04 +0100 >> >> Hi Andrew, >> >> My patch that removed the spin_lock calls from the tail of >> sys_semtimedop introduced a bug: >> Before my patch was merged, every operation that altered an array called >> update_queue. That call woke up threads that were waiting until a >> semaphore value becomes 0. I've accidentially removed that call. >> >> The attached patch fixes that by modifying update_queue: the function >> now loops internally and wakes up all threads. The patch also removes >> update_queue calls from the error path of sys_semtimedop: failed >> operations do not modify the array, no need to rescan the list of >> waiting threads. >> >> Signed-Off-By: Manfred Spraul <manfred> >> >> >> >> >> ______________________________________________________________________ >> // $Header$ >> // Kernel Version: >> // VERSION = 2 >> // PATCHLEVEL = 6 >> // SUBLEVEL = 10 >> // EXTRAVERSION =-rc2-mm4 >> --- 2.6/include/linux/sem.h 2004-12-05 16:20:31.000000000 +0100 >> +++ build-2.6/include/linux/sem.h 2004-12-09 18:39:58.000000000 +0100 >> @@ -109,6 +109,7 @@ struct sem_queue { >> int id; /* internal sem id */ >> struct sembuf * sops; /* array of pending operations */ >> int nsops; /* number of operations */ >> + int alter; /* does the operation alter the array? */ >> }; >> >> /* Each task has a list of undo requests. They are executed automatically >> --- 2.6/ipc/sem.c 2004-12-05 16:21:39.000000000 +0100 >> +++ build-2.6/ipc/sem.c 2004-12-09 19:13:19.000000000 +0100 >> @@ -358,8 +358,22 @@ static void update_queue (struct sem_arr >> if (error <= 0) { >> struct sem_queue *n; >> remove_from_queue(sma,q); >> - n = q->next; >> q->status = IN_WAKEUP; >> + /* >> + * Continue scanning. The next operation >> + * that must be checked depends on the type of the >> + * completed operation: >> + * - if the operation modified the array, then >> + * restart from the head of the queue and >> + * check for threads that might be waiting >> + * for semaphore values to become 0. >> + * - if the operation didn't modify the array, >> + * then just continue. >> + */ >> + if (q->alter) >> + n = sma->sem_pending; >> + else >> + n = q->next; >> wake_up_process(q->sleeper); >> /* hands-off: q will disappear immediately after >> * writing q->status. >> @@ -1119,8 +1133,11 @@ retry_undos: >> goto out_unlock_free; >> >> error = try_atomic_semop (sma, sops, nsops, un, current->tgid); >> - if (error <= 0) >> - goto update; >> + if (error <= 0) { >> + if (alter && error == 0) >> + update_queue (sma); >> + goto out_unlock_free; >> + } >> >> /* We need to sleep on this operation, so we put the current >> * task into the pending queue and go to sleep. >> @@ -1132,6 +1149,7 @@ retry_undos: >> queue.undo = un; >> queue.pid = current->tgid; >> queue.id = semid; >> + queue.alter = alter; >> if (alter) >> append_to_queue(sma ,&queue); >> else >> @@ -1183,9 +1201,6 @@ retry_undos: >> remove_from_queue(sma,&queue); >> goto out_unlock_free; >> >> -update: >> - if (alter) >> - update_queue (sma); >> out_unlock_free: >> sem_unlock(sma); >> out_free:
posting patch to rhkernel-list for review
FYI, We have now seen this inside HP on a new platform about to be shipped to a high profile customer. I have tested this on ia64 with the 2.6.9-34 source + this patch and verified it does fix our issue and we have not seen any other problems with it. - Doug
Please note: This patch was applied to the RHEL4-U4 beta kernel 2.6.9-34.8 kernel. Please note: The RHTS test "/kernel/drivers/modsym" failes with this patch applied. It causes a regression with kabi. http://master.rhts.boston.redhat.com/cgi-bin/rhts/jobs.cgi?id=1201 Navigate to recipeid 6559, Then to /kernel/drivers/modsym test results. Error: Kernel symbol changed: register_security from: ca.modlist Error: Kernel symbol changed: security_ops from: ca.modlist Error: Kernel symbol changed: unregister_security from: ca.modlist Error: Kernel symbol changed: security_ops from: veritas.modlist
I have modified the patch with: #ifndef __GENKSYMS__ int alter; /* does the operation alter the array? */ #endif and I am building a test kernel based on the 2.6.9-34 source and I will have the guys in HP that were blocked on this test it out.
committed in stream u3 build 34.10. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
Seems to pass on power when run by hand on both 36.1 and -37 (log below). [root@ibm-hv2-lp4 tmp]# rhts-run-package /tmp/rhts-build-YQjeB643/install/ppc64pseries/rhts-149933-tests-1.0-1.ppc64pseries.rpm /tmp/rhts-run-package-etTa2077 /tmp 30 blocks /tmp/rhts-run-package-etTa2077/mnt/tests /tmp/rhts-run-package-etTa2077/mnt/tests /tmp /tmp/rhts-run-package-etTa2077/mnt/tests /tmp /tmp/rhts-run-package-etTa2077/mnt/tests/kernel /tmp/rhts-run-package-etTa2077/mnt/tests /tmp /tmp/rhts-run-package-etTa2077/mnt/tests /tmp /tmp/rhts-run-package-etTa2077/mnt/tests/kernel/syscalls /tmp/rhts-run-package-etTa2077/mnt/tests /tmp /tmp/rhts-run-package-etTa2077/mnt/tests /tmp /tmp/rhts-run-package-etTa2077/mnt/tests/kernel/syscalls/149933 /tmp/rhts-run-package-etTa2077/mnt/tests /tmp ***** Start of runtest.sh ***** Checking log file Count=1, Sleeping for 5s Count=2, Sleeping for 5s Slave 1: semaphore id 0 Slave 1: started and doing sleep for 10 secs. Slave 1: woke up - upping count. Slave 1: waiting for go. Slave 1: got the go all done. Controller: semaphore id 0 Controller: set up to tell slaves to wait Controller: starting slave 0 Controller: starting slave 1 Controller:Waiting on slaves and then telling them to go. Controller: Done deal - told them all slaves to go Slave 0: semaphore id 0 Slave 0: started and doing sleep for 10 secs. Slave 0: woke up - upping count. Slave 0: waiting for go. Slave 0: got the go all done. All slaves done Done checking log file All slaves exited Passed: Checking process list No slave process running Passed: Done checking process list ***** End of runtest.sh ***** Reporting test PASS as 0 test result submitted with the following arguments: -t PASS -S -r 0 -v 0 -q developer -D /tmp/dmesg.log -T storing results in /tmp/results.log /tmp/rhts-run-package-etTa2077/mnt/tests /tmp /tmp [root@ibm-hv2-lp4 tmp]# cat results.log ***** Start of runtest.sh ***** Checking log file Count=1, Sleeping for 5s Count=2, Sleeping for 5s Slave 1: semaphore id 0 Slave 1: started and doing sleep for 10 secs. Slave 1: woke up - upping count. Slave 1: waiting for go. Slave 1: got the go all done. Controller: semaphore id 0 Controller: set up to tell slaves to wait Controller: starting slave 0 Controller: starting slave 1 Controller:Waiting on slaves and then telling them to go. Controller: Done deal - told them all slaves to go Slave 0: semaphore id 0 Slave 0: started and doing sleep for 10 secs. Slave 0: woke up - upping count. Slave 0: waiting for go. Slave 0: got the go all done. All slaves done Done checking log file All slaves exited Passed: Checking process list No slave process running Passed: Done checking process list ***** End of runtest.sh ***** [root@ibm-hv2-lp4 tmp]# uname -a Linux ibm-hv2-lp4.test.redhat.com 2.6.9-37.EL #1 SMP Fri May 19 18:05:11 EDT 2006 ppc64 ppc64 ppc64 GNU/Linux [root@ibm-hv2-lp4 tmp]#
Mike, so you think the rhts failed ppc run, was something wrong with rhts?
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0575.html