Bug 121796

Summary: [PATCH] wtd_down bugfix on IA64
Product: Red Hat Enterprise Linux 2.1
Component: kernel
Version: 2.1
Hardware: ia64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: medium
Reporter: Van Okamura <van.okamura>
Assignee: Jason Baron <jbaron>
QA Contact: Brian Brock <bbrock>
CC: knoel, riel
Doc Type: Bug Fix
Last Closed: 2004-08-18 14:41:46 UTC
Bug Blocks: 116726

Description Van Okamura 2004-04-27 21:36:29 UTC
Description of problem from mark.fasheh:
A while back, a bug in __wtd_down_action was fixed in AS 2.1 x86, but
it somehow never made its way into the ia64 tree. Attached is a patch
that ports the fix to ia64. Basically, in arch/ia64/kernel/semaphore.c,
__wtd_down_action() calls __wtd_down(), which increments sem->sleepers;
under heavy load this resulted in invalid semaphore counts. That
increment shouldn't happen on the wakeup path, so the solution was to
call a new function, __wtd_down_from_wakeup(), which does not increment
sem->sleepers.

This resulted in a P1 bug for us from a large customer. After we gave
them a kernel with that fix, they have been running for days without
any problem.  We'd like to see this in the ia64 tree ASAP.

Below are some e-mail excerpts which further explain the issue and
provide a method for reproducing it.
        --Mark

>> Routine __wtd_down() in arch/i386/kernel/semaphore.c is the AIO
>> equivalent of routine __down(), basically handling down() failures
>> and blocking until someone up()s the semaphore.  The major
>> difference is that "sem->sleepers++" happens only once each time
>> __down() is called, when the process blocks; the for loop does not
>> re-execute the "sem->sleepers++".
>>
>> __wtd_down(), on the other hand, executes the "sem->sleepers++"
>> multiple times.  __wtd_down() is initially called from
>> "__wtd_down_failed" in the context of the process calling the AIO
>> syscall, and when it fails, again from "__wtd_down_action()" when
>> whoever owns the inode semaphore up()s it.  This causes the
>> combination of sem->sleepers and sem->count to get into an
>> inconsistent state, thereby allowing multiple down()s on the same
>> semaphore to succeed at the same time, but the count never gets
>> dropped below zero!!!  Subsequent up()s to that semaphore result in
>> sem->count getting bumped to > 1.
>>
>> The inode eventually gets freed and reallocated without resetting
>> the semaphore, because inodes have a constructor in the slab cache.
>> When the inode gets used for a pipe, multiple down()s succeed and
>> the pipe data gets screwed up, leading to the infamous BUG in
>> pipe.c.  We (and Oracle) also think this is causing the Oracle
>> listener process hangs, because pipes are used between their
>> processes.
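
To make the accounting failure described above concrete, here is a small
userspace model (an illustrative sketch only, not kernel code: the
sem_model struct, the fold() helper, and the starting values are
assumptions chosen for the example). It mimics the "fold sleepers back
into the count" step that the semaphore code performs under
semaphore_lock via atomic_add_negative(sleepers - 1, &sem->count),
starting from the state "the holder has just called up(), one AIO
waiter is queued":

/* Illustrative userspace model only -- not kernel code.  The struct,
 * helper, and starting values are assumptions for the example. */
#include <stdio.h>

struct sem_model {
	int count;    /* 1 = free, 0 = held, negative = held with waiters */
	int sleepers; /* waiters already recorded by an earlier __wtd_down() */
};

/* Models the "fold sleepers back into count" step done under
 * semaphore_lock: count += sleepers - 1, then test the sign, as
 * atomic_add_negative(sleepers - 1, &sem->count) does.  Returns 1 if
 * the waiter acquires the semaphore. */
static int fold(struct sem_model *s, int bump_sleepers_again)
{
	if (bump_sleepers_again)
		s->sleepers++;   /* the extra increment on the buggy wakeup path */

	int sleepers = s->sleepers;
	s->count += sleepers - 1;
	if (s->count >= 0) {     /* result not negative -> we own the semaphore */
		s->sleepers = 0;
		return 1;
	}
	s->sleepers = 1;
	return 0;
}

int main(void)
{
	/* State right after the holder's up(): count went -1 -> 0 and the
	 * single queued AIO waiter is still recorded in sleepers. */
	const struct sem_model start = { .count = 0, .sleepers = 1 };

	struct sem_model fixed = start;  /* __wtd_down_from_wakeup(): no ++ */
	fold(&fixed, 0);
	printf("fixed path: count=%d (semaphore correctly shown as held)\n",
	       fixed.count);

	struct sem_model buggy = start;  /* calling __wtd_down() again: ++ */
	fold(&buggy, 1);
	printf("buggy path: count=%d (semaphore looks free while the AIO "
	       "waiter holds it)\n", buggy.count);

	/* A later down() on the buggy semaphore sees count > 0 and succeeds
	 * while the AIO waiter still holds it -- two owners at once. */
	buggy.count--;
	printf("second down(): count=%d (two owners)\n", buggy.count);
	return 0;
}

When both "owners" in the buggy case later up() the semaphore, the count
climbs above 1, matching the > 1 counts described in the excerpt.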



Version-Release number of selected component (if applicable):
2.4.18-e.43

How reproducible:


Steps to Reproduce:
From the same e-mail excerpts, here's a way to reproduce the problem:
> Reproducing the problem:
> Load 10+ tables simultaneously into Oracle using sqlldr. Repeat for
> 500-1000 iterations. The database must be configured with ASYNC IO ON,
> archive logging ON, and logs should be large (50 meg+) each. Wait for
> 20+ gigs of logs to accumulate. While still loading data, "rm" 20+ gig
> of log files. Using "rm" produces the most consistent results. Wait and
> repeat until the sqlldr logs show ORA-3113 errors or other TNS errors or
> loader processes simply hang. Eventually, all loader processes will
> simply hang. We can reproduce the problems 100% of the time using this
> method.
  
Actual results:


Expected results:


Additional info:

Comment 1 Van Okamura 2004-04-27 21:38:05 UTC
--- linux-2.4.18-e.43/arch/ia64/kernel/semaphore.c.orig 2004-04-23 17:04:49.000000000 -0700
+++ linux-2.4.18-e.43/arch/ia64/kernel/semaphore.c      2004-04-23 17:09:23.000000000 -0700
@@ -46,6 +46,7 @@ __up (struct semaphore *sem)
 static spinlock_t semaphore_lock = SPIN_LOCK_UNLOCKED;
 
 void __wtd_down(struct semaphore * sem, struct worktodo *wtd);
+void __wtd_down_from_wakeup(struct semaphore * sem, struct worktodo *wtd);
 
 void __wtd_down_action(void *data)
 {
@@ -55,7 +56,7 @@ void __wtd_down_action(void *data)
        wtd_pop(wtd);
        sem = wtd->data;
 
-       __wtd_down(sem, wtd);
+       __wtd_down_from_wakeup(sem, wtd);
 }
 
 void __wtd_down_waiter(wait_queue_t *wait)
@@ -93,6 +94,33 @@ void __wtd_down(struct semaphore * sem, 
        }
 }
 
+/*
+ *  Same as __wtd_down, but sem->sleepers is not incremented when coming from a wakeup.
+ */
+void __wtd_down_from_wakeup(struct semaphore * sem, struct worktodo *wtd)
+{
+       int gotit;
+       int sleepers;
+
+       init_waitqueue_func_entry(&wtd->wait, __wtd_down_waiter);
+       wtd->data = sem;
+
+       spin_lock_irq(&semaphore_lock);
+       sleepers = sem->sleepers;
+       gotit = add_wait_queue_exclusive_cond(&sem->wait, &wtd->wait,
+                       atomic_add_negative(sleepers - 1, &sem->count));
+       if (gotit)
+               sem->sleepers = 0;
+       else
+               sem->sleepers = 1;
+       spin_unlock_irq(&semaphore_lock);
+
+       if (gotit) {
+               wake_up(&sem->wait);
+               wtd_queue(wtd);
+       }
+}
+
 /* Returns 0 if we acquired the semaphore, 1 if it was queued. */
 int wtd_down(struct worktodo *wtd, struct semaphore *sem)
 {




Comment 5 Jason Baron 2004-07-28 20:22:15 UTC
This is in U5 for ia64; changing status to MODIFIED.

Comment 6 John Flanagan 2004-08-18 14:41:47 UTC
An erratum has been issued which should help with the problem described
in this bug report. This report is therefore being closed with a
resolution of ERRATA. For more information on the solution and/or where
to find the updated files, please follow the link below. You may reopen
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-327.html