Bug 121796 - [PATCH] wtd_down bugfix on IA64
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: ia64 Linux
Priority: medium
Severity: high
Assigned To: Jason Baron
QA Contact: Brian Brock
Blocks: 116726

Reported: 2004-04-27 17:36 EDT by Van Okamura
Modified: 2013-03-06 00:56 EST (History)
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2004-08-18 10:41:46 EDT

Attachments: None
Description Van Okamura 2004-04-27 17:36:29 EDT
Description of problem from mark.fasheh@oracle.com:
A while back, a bug in __wtd_down_action() was fixed in AS 2.1 x86,
but the fix somehow never made its way into the ia64 tree. Attached is
a patch to port the fix to ia64. In arch/ia64/kernel/semaphore.c,
__wtd_down_action() calls __wtd_down(), which increments
sem->sleepers; under heavy load this produced invalid semaphore
counts. The increment shouldn't happen on the wakeup path, so the
solution was to call a new function, __wtd_down_from_wakeup(), which
does not increment sem->sleepers.

This resulted in a P1 bug for us from a large customer. After giving
them a kernel with that fix, they've been running for days without any
problem.  We'd like to see this in the ia64 tree ASAP.

Below are some e-mail excerpts which further explain the issue and
provide a method for reproducing it.
        --Mark

>> Routine __wtd_down() in arch/i386/kernel/semaphore.c is the AIO
>> equivalent of routine __down(), basically handling down() failures
>> and blocking until someone up()s the semaphore.  The major
>> difference is that "sem->sleepers++" happens only once each time
>> __down() is called; when the process blocks, the for loop does not
>> re-execute the "sem->sleepers++".
>>
>> __wtd_down() on the other hand executes the "sem->sleepers++"
>> multiple times.  __wtd_down() is initially called from
>> "__wtd_down_failed" in the context of the process calling the AIO
>> syscall, and when it fails, again from "__wtd_down_action()" when
>> whoever owns the inode semaphore up()s it.  This causes the
>> combination of sem->sleepers and sem->count to get into an
>> inconsistent state, thereby allowing multiple down()s on the same
>> semaphore to succeed at the same time, while the count never gets
>> dropped below zero!!!  Subsequent up()s to that semaphore result in
>> sem->count getting bumped to > 1.
>>
>> The inode eventually gets freed and reallocated without resetting
>> the semaphore, because inodes have a constructor in the slab cache.
>> When the inode gets used for a pipe, multiple down()s succeed and
>> the pipe data gets screwed up, leading to the infamous BUG in
>> pipe.c.  We (and Oracle) also think this is causing the Oracle
>> listener process hangs, because pipes are used between their
>> processes.



Version-Release number of selected component (if applicable):
2.4.18-e.43

How reproducible:
100% of the time using the method below.

Steps to Reproduce:
Here's a way to reproduce the problem:
> Reproducing the problem:
> Load 10+ tables simultaneously into Oracle using sqlldr. Repeat for
> 500-1000 iterations. The database must be configured with ASYNC IO ON,
> archive logging ON, and logs should be large (50 meg+) each. Wait for
> 20+ gigs of logs to accumulate. While still loading data, "rm" 20+ gig
> of log files. Using "rm" produces the most consistent results. Wait and
> repeat until the sqlldr logs show ORA-3113 errors or other TNS errors or
> loader processes simply hang. Eventually, all loader processes will
> simply hang. We can reproduce the problems 100% of the time using this
> method.
  
Actual results:
Under heavy AIO load, sem->count becomes inconsistent; loader
processes hang, and the BUG in pipe.c triggers.

Expected results:
Semaphore counts remain consistent; no hangs.

Additional info:
Comment 1 Van Okamura 2004-04-27 17:38:05 EDT
--- linux-2.4.18-e.43/arch/ia64/kernel/semaphore.c.orig  2004-04-23 17:04:49.000000000 -0700
+++ linux-2.4.18-e.43/arch/ia64/kernel/semaphore.c       2004-04-23 17:09:23.000000000 -0700
@@ -46,6 +46,7 @@ __up (struct semaphore *sem)
 static spinlock_t semaphore_lock = SPIN_LOCK_UNLOCKED;
 
 void __wtd_down(struct semaphore * sem, struct worktodo *wtd);
+void __wtd_down_from_wakeup(struct semaphore * sem, struct worktodo *wtd);
 
 void __wtd_down_action(void *data)
 {
@@ -55,7 +56,7 @@ void __wtd_down_action(void *data)
        wtd_pop(wtd);
        sem = wtd->data;
 
-       __wtd_down(sem, wtd);
+       __wtd_down_from_wakeup(sem, wtd);
 }
 
 void __wtd_down_waiter(wait_queue_t *wait)
@@ -93,6 +94,33 @@ void __wtd_down(struct semaphore * sem, 
        }
 }
 
+/*
+ *  Same as __wtd_down, but sem->sleepers is not incremented when coming from a wakeup.
+ */
+void __wtd_down_from_wakeup(struct semaphore * sem, struct worktodo *wtd)
+{
+       int gotit;
+       int sleepers;
+
+       init_waitqueue_func_entry(&wtd->wait, __wtd_down_waiter);
+       wtd->data = sem;
+
+       spin_lock_irq(&semaphore_lock);
+       sleepers = sem->sleepers;
+       gotit = add_wait_queue_exclusive_cond(&sem->wait, &wtd->wait,
+                       atomic_add_negative(sleepers - 1, &sem->count));
+       if (gotit)
+               sem->sleepers = 0;
+       else
+               sem->sleepers = 1;
+       spin_unlock_irq(&semaphore_lock);
+
+       if (gotit) {
+               wake_up(&sem->wait);
+               wtd_queue(wtd);
+       }
+}
+
 /* Returns 0 if we acquired the semaphore, 1 if it was queued. */
 int wtd_down(struct worktodo *wtd, struct semaphore *sem)
 {


Comment 5 Jason Baron 2004-07-28 16:22:15 EDT
This is in U5 for ia64; changing to MODIFIED.
Comment 6 John Flanagan 2004-08-18 10:41:47 EDT
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-327.html
