Bug 363911

Summary: Oracle apps receive EAGAIN error when attempting to use async I/O
Product: Red Hat Enterprise Linux 4
Reporter: Chuck Mead <csm>
Component: libaio
Assignee: Jeff Moyer <jmoyer>
Status: CLOSED INSUFFICIENT_DATA
Severity: urgent
Priority: low
Version: 4.5
CC: evuraan, jwest, paul.hood
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-03-13 14:23:04 UTC

Description Chuck Mead 2007-11-02 14:46:04 UTC
Description of problem: ORA-27083: waiting for async I/Os failed
Linux-x86_64 Error: 7: Argument list too long
Wed Oct 17 16:37:03 2007
DBW0: terminating instance due to error 27083
oerr ORA 27083
27083, 00000, "waiting for async I/Os failed"
// *Cause:  The aio_waitn() library call returned an error.
// *Action: Check errno.



Version-Release number of selected component (if applicable):
libaio-0.3.105-2.x86_64


How reproducible:
Run Oracle using AIO.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info: Oracle asked us to open this bug report so they would have an
entry point into Red Hat for discussion about this error.

Comment 1 Anu Matthew 2007-11-02 14:49:27 UTC
Sadly, aio_waitn() does not seem to be a Linux call.

That seems like a generic error message used for Oracle's convenience.

Comment 2 Jeff Moyer 2007-11-02 16:32:13 UTC
(In reply to comment #0)
> Description of problem: ORA-27083: waiting for async I/Os failed
> Linux-x86_64 Error: 7: Argument list too long
> Wed Oct 17 16:37:03 2007
> DBW0: terminating instance due to error 27083
> oerr ORA 27083
> 27083, 00000, "waiting for async I/Os failed"
> // *Cause:  The aio_waitn() library call returned an error.
> // *Action: Check errno.

As Anu mentions in comment #1, aio_waitn is not a libaio library call, nor is it
a libc call.  I looked through libaio-oracle, and it does not provide this
abstraction, either.  My guess is there is a portability layer within Oracle
that implements this for Linux, but that's just a guess.

So, could you please provide some data that points to a problem within either
libaio or the kernel's aio infrastructure?

Comment 3 Chuck Mead 2007-11-02 17:29:25 UTC
Jeff,
     Thanks for responding. We are pushing Oracle for a response here. Hopefully
it will come soon.

Comment 4 Paul Hood 2007-11-06 19:58:13 UTC
---
the e299_dbw0_5701.trc shows:
*** 2007-10-25 13:42:43.109
ksedmp: internal or fatal error
ORA-27083: waiting for async I/Os failed
Linux-x86_64 Error: 7: Argument list too long

the error: 7 should map to 
#define E2BIG            7      /* Arg list too long */

I expected to find this in the dbwr traces at the os level, but I do not find
E2BIG.  
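
As a quick cross-check of that mapping, here is a minimal sketch using Python's standard errno and os modules. Python is used only for illustration, and the helper name describe_errno is hypothetical. On a Linux host it prints the symbolic name and the C library's message for error 7:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME (message)' for a numeric errno value."""
    # errno.errorcode maps numeric values to their symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the C library's message for the value.
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # 7 is the value reported alongside ORA-27083 in the trace file.
    print(describe_errno(7))
```

This only confirms what the number means; it says nothing about which system call returned it.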

the dbwr strace does show EAGAIN on semtimedop numerous times:

3995  semtimedop(851971, 0x7fbfffdea0, 1, {1, 730000000}) = -1 EAGAIN (Resource temporarily unavailable)

There was concern that this EAGAIN is a problem, possibly exhausting some
resource on the OS side and eventually resulting in the ORA-27083 ...
Linux-x86_64 Error: 7: Argument list too long.
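
For context, the semtimedop(2) man page lists EAGAIN as the result when the operation cannot proceed and the supplied time limit expires, so the strace line above is at least consistent with an ordinary timed wait running out. The sketch below is an assumption-laden illustration, not Oracle's code: it is Linux-only, calls libc through ctypes, assumes the 64-bit struct layouts, relies on Linux initializing new semaphores to 0, and uses a hypothetical helper name, timed_wait_errno. It waits on a private semaphore that nobody posts and returns the resulting errno:

```python
import ctypes
import errno

# Load the C library already linked into the interpreter (Linux).
libc = ctypes.CDLL(None, use_errno=True)

IPC_PRIVATE = 0  # key that always creates a new semaphore set
IPC_RMID = 0     # semctl() command: remove the set


class SemBuf(ctypes.Structure):
    # Mirrors struct sembuf from <sys/sem.h>.
    _fields_ = [("sem_num", ctypes.c_ushort),
                ("sem_op", ctypes.c_short),
                ("sem_flg", ctypes.c_short)]


class TimeSpec(ctypes.Structure):
    # Mirrors struct timespec on 64-bit Linux (an assumption).
    _fields_ = [("tv_sec", ctypes.c_long),
                ("tv_nsec", ctypes.c_long)]


libc.semtimedop.argtypes = [ctypes.c_int, ctypes.POINTER(SemBuf),
                            ctypes.c_size_t, ctypes.POINTER(TimeSpec)]
libc.semtimedop.restype = ctypes.c_int


def timed_wait_errno(timeout_ns: int = 50_000_000) -> int:
    """Wait on a semaphore nobody posts; return 0 or the errno value."""
    semid = libc.semget(IPC_PRIVATE, 1, 0o600)
    if semid < 0:
        raise OSError(ctypes.get_errno(), "semget failed")
    try:
        # sem_op = -1 blocks while the semaphore value is 0.
        op = SemBuf(0, -1, 0)
        ts = TimeSpec(0, timeout_ns)
        rc = libc.semtimedop(semid, ctypes.byref(op), 1, ctypes.byref(ts))
        return 0 if rc == 0 else ctypes.get_errno()
    finally:
        # Always remove the private set so it does not leak.
        libc.semctl(semid, 0, IPC_RMID)


if __name__ == "__main__":
    code = timed_wait_errno()
    print("timed out with EAGAIN" if code == errno.EAGAIN else f"errno {code}")
```

If the wait ends with EAGAIN here too, that would suggest the strace entries by themselves are not evidence of resource exhaustion; it would not explain the E2BIG reported with ORA-27083.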

Frustratingly, when trying to trace the problem further (i.e. errorstack in
Oracle) to get more information on the ORA-27083, the customer indicates the
problem does not occur.  The same is true when trying to use strace on sqlplus.

we would like to get more information about the nature of the Linux-x86_64
Error: 7: Argument list too long..

Comment 5 Jeff Moyer 2007-11-06 20:44:18 UTC
(In reply to comment #4)
> ---
> the e299_dbw0_5701.trc shows:
> *** 2007-10-25 13:42:43.109
> ksedmp: internal or fatal error
> ORA-27083: waiting for async I/Os failed
> Linux-x86_64 Error: 7: Argument list too long
> 

> we would like to get more information about the nature of the Linux-x86_64
> Error: 7: Argument list too long..

See the man pages for semop and semtimedop:

       E2BIG  The argument nsops is greater than SEMOPM, the maximum number of
              operations allowed per system call.

       The  semval, sempid, semzcnt, and semncnt values for a semaphore can all
       be retrieved using appropriate semctl(2) calls.

       The following limits on semaphore  set  resources  affect  the  semop()
       call:

       SEMOPM Maximum  number  of operations allowed for one semop() call (32)
              (on Linux, this limit can be read and  modified  via  the  third
              field of /proc/sys/kernel/sem).
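
To check the limit on an affected host, a minimal sketch along these lines could read that file. It assumes the four-field order documented in proc(5): SEMMSL, SEMMNS, SEMOPM, SEMMNI. The names SemLimits, parse_sem_limits and read_sem_limits are hypothetical, and Python is used only for illustration:

```python
from typing import NamedTuple


class SemLimits(NamedTuple):
    semmsl: int  # maximum semaphores per set
    semmns: int  # maximum semaphores system-wide
    semopm: int  # maximum operations per semop() call
    semmni: int  # maximum semaphore sets system-wide


def parse_sem_limits(text: str) -> SemLimits:
    """Parse the four whitespace-separated fields of /proc/sys/kernel/sem."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    return SemLimits(*(int(f) for f in fields))


def read_sem_limits(path: str = "/proc/sys/kernel/sem") -> SemLimits:
    """Read and parse the live limits from procfs (Linux only)."""
    with open(path) as fh:
        return parse_sem_limits(fh.read())


if __name__ == "__main__":
    limits = read_sem_limits()
    # A semop()/semtimedop() call with nsops above this value fails with E2BIG.
    print(f"SEMOPM = {limits.semopm}")
```

Comparing the printed SEMOPM with the nsops argument in the strace output (1 in the semtimedop line quoted in comment #4) would show whether any traced call could have exceeded the limit.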

I'm closing this as NOTABUG, as it seems clear to me that this is not a libaio
problem.

Cheers,

Jeff

Comment 6 Jeremy West 2007-11-08 16:29:45 UTC
Re-opening this bug so as to determine why the EAGAIN error is occurring.  If
this is not a libaio issue, then what is it?  

1.  What does Error: 7: Argument list too long mean?
2.  Why do we get the message "waiting for async I/Os failed"

--jwest

Comment 7 Chuck Mead 2007-11-08 18:11:46 UTC
I am sending a sysreport for one of the two hosts we're concerned with directly
to Jeremy. I cannot attach it as bugzilla says it is too big.

Comment 8 Jeff Moyer 2007-11-08 18:16:24 UTC
(In reply to comment #6)
> Re-opening this bug so as to determine why the EAGAIN error is occurring.  If
> this is not a libaio issue, then what is it?  
> 
> 1.  What does Error: 7: Argument list too long mean?

I already answered this in comment #5.

> 2.  Why do we get the message "waiting for async I/Os failed"

I cannot answer this as I don't have the source code for the software that
prints this error message.

Feel free to leave the bug open to track this, but I cannot be of further
assistance until you can show me that the libaio or the kernel aio subsystem is
responsible for the errors.

Comment 9 Jeremy West 2007-11-09 16:28:09 UTC
Chuck,

If you can tell me where to pull that sysreport from, I can grab it.  Jeff, I
appreciate your patience and assistance.

--jwest

Comment 10 Chuck Mead 2007-11-09 16:34:53 UTC
Doggone it... I sent it yesterday by email directly from my Bloomberg account,
but it appears I transposed the email address (dyslexic's UNTIE!). In any event,
it has now been sent to you from my on-site email account.

Comment 11 Jeff Moyer 2007-11-26 20:16:29 UTC
Please keep this bug in NEEDINFO state until you can provide information
implicating libaio or the kernel's AIO subsystem.

Thanks.

Comment 12 Jeff Moyer 2008-01-02 16:22:37 UTC
Is there any progress on this issue?

Comment 14 Jeremy West 2008-01-02 18:09:36 UTC
Jeff,

Can we leave this open a little longer?  The customer in this situation has been
given an aio stress test that runs outside of the oracle processes.  We're
waiting for those results.

Thanks
Jeremy West

Comment 15 Jeff Moyer 2008-01-02 19:40:13 UTC
I'm not convinced that aio-stress will reproduce the problem the customer is
experiencing.  In my opinion, it would be a better use of time to get more
debugging output from the application that produces the problem.