Bug 121796 - [PATCH] wtd_down bugfix on IA64
Summary: [PATCH] wtd_down bugfix on IA64
Alias: None
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: ia64
OS: Linux
Target Milestone: ---
Assignee: Jason Baron
QA Contact: Brian Brock
Depends On:
Blocks: 116726
Reported: 2004-04-27 21:36 UTC by Van Okamura
Modified: 2013-03-06 05:56 UTC
2 users

Clone Of:
Last Closed: 2004-08-18 14:41:46 UTC

Attachments (Terms of Use)

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2004:327 normal SHIPPED_LIVE Important: kernel security update 2004-08-18 04:00:00 UTC

Description Van Okamura 2004-04-27 21:36:29 UTC
Description of problem from mark.fasheh@oracle.com:
A while back, a bug in __wtd_down_action was fixed in AS 2.1 x86,
but somehow it never made its way into the ia64 tree. Attached is a
patch to port the bug fix to ia64. Basically, in
arch/ia64/kernel/semaphore.c, __wtd_down_action() calls __wtd_down(),
which increments sem->sleepers, and under heavy load this resulted in
invalid sem counts. This shouldn't happen, so the solution was to call
a new function, __wtd_down_from_wakeup(), which doesn't increment
sem->sleepers.

This resulted in a P1 bug for us from a large customer. After giving
them a kernel with that fix, they've been running for days without any
problem.  We'd like to see this in the ia64 tree ASAP.

Below are some e-mail excerpts which further explain the issue and
provide a method for reproducing it.

>> Routine __wtd_down() in arch/i386/kernel/semaphore.c is the AIO
>> equivalent of routine __down(), basically handling down() failures
>> and blocking until someone up()s the semaphore.  The major
>> difference is that "sem->sleepers++" happens only once each time
>> __down() is called, when the process blocks; the for loop does not
>> re-execute the "sem->sleepers++".  __wtd_down(), on the other hand,
>> executes the "sem->sleepers++" multiple times.  __wtd_down() is
>> initially called from __wtd_down_failed() in the context of the
>> process calling the AIO syscall, and when it fails, again from
>> __wtd_down_action() when whoever owns the inode semaphore up()s it.
>> This causes the combination of sem->sleepers and sem->count to get
>> into an inconsistent state, thereby allowing multiple down()s on the
>> same semaphore to succeed at the same time, while the count never
>> gets dropped below zero!  Subsequent up()s to that semaphore result
>> in sem->count getting bumped to > 1.
>> The inode eventually gets freed and reallocated without resetting
>> the semaphore, because inodes have a constructor in the slab cache.
>> When the inode gets used for a pipe, multiple down()s succeed and
>> the pipe data gets screwed up, leading to the infamous BUG in
>> pipe.c.  We (and Oracle) also think this is causing the Oracle
>> listener process hangs, because pipes are used between their
>> processes.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Here's a way to reproduce the problem:
> Reproducing the problem:
> Load 10+ tables simultaneously into Oracle using sqlldr. Repeat for
> 500-1000 iterations. The database must be configured with ASYNC IO ON,
> archive logging ON, and logs should be large (50 meg+) each. Wait for
> 20+ gigs of logs to accumulate. While still loading data, "rm" 20+ gig
> of log files. Using "rm" produces the most consistent results. Wait and
> repeat until the sqlldr logs show ORA-3113 errors or other TNS errors or
> loader processes simply hang. Eventually, all loader processes will
> simply hang. We can reproduce the problems 100% of the time using this
> method.
Actual results:

Expected results:

Additional info:

Comment 1 Van Okamura 2004-04-27 21:38:05 UTC
--- linux-2.4.18-e.43/arch/ia64/kernel/semaphore.c.orig 2004-04-23
+++ linux-2.4.18-e.43/arch/ia64/kernel/semaphore.c      2004-04-23
@@ -46,6 +46,7 @@ __up (struct semaphore *sem)
 static spinlock_t semaphore_lock = SPIN_LOCK_UNLOCKED;
 
 void __wtd_down(struct semaphore * sem, struct worktodo *wtd);
+void __wtd_down_from_wakeup(struct semaphore * sem, struct worktodo *wtd);
 
 void __wtd_down_action(void *data)
@@ -55,7 +56,7 @@ void __wtd_down_action(void *data)
 
        sem = wtd->data;
 
-       __wtd_down(sem, wtd);
+       __wtd_down_from_wakeup(sem, wtd);
 }
 
 void __wtd_down_waiter(wait_queue_t *wait)
@@ -93,6 +94,33 @@ void __wtd_down(struct semaphore * sem, 
 }
 
+/*
+ *  Same as __wtd_down, but sem->sleepers is not incremented when
+ *  coming from a wakeup.
+ */
+void __wtd_down_from_wakeup(struct semaphore * sem, struct worktodo *wtd)
+{
+       int gotit;
+       int sleepers;
+
+       init_waitqueue_func_entry(&wtd->wait, __wtd_down_waiter);
+       wtd->data = sem;
+
+       spin_lock_irq(&semaphore_lock);
+       sleepers = sem->sleepers;
+       gotit = add_wait_queue_exclusive_cond(&sem->wait, &wtd->wait,
+                       atomic_add_negative(sleepers - 1, &sem->count));
+       if (gotit)
+               sem->sleepers = 0;
+       else
+               sem->sleepers = 1;
+       spin_unlock_irq(&semaphore_lock);
+       if (gotit) {
+               wake_up(&sem->wait);
+               wtd_queue(wtd);
+       }
+}
+
 /* Returns 0 if we acquired the semaphore, 1 if it was queued. */
 int wtd_down(struct worktodo *wtd, struct semaphore *sem)
Comment 5 Jason Baron 2004-07-28 20:22:15 UTC
This is in U5 for ia64; changing the status to MODIFIED.

Comment 6 John Flanagan 2004-08-18 14:41:47 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

