Bug 161362 - Oracle Hangs with directio and aio using NFS
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386 Linux
Priority: medium  Severity: medium
Assigned To: Steve Dickson
QA Contact: Brian Brock
Duplicates: 109096 169763 170271
Blocks: 168429
Reported: 2005-06-22 13:31 EDT by Joseph Salisbury
Modified: 2007-11-30 17:07 EST
CC: 10 users

See Also:
Fixed In Version: RHSA-2006-0132
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Last Closed: 2006-03-07 14:10:28 EST


Attachments
Proposed Patch (35.62 KB, patch)
2005-10-14 07:13 EDT, Steve Dickson
Supplemental Patch (333 bytes, patch)
2005-10-14 07:16 EDT, Steve Dickson
Patch proposed by NetApp (32.77 KB, patch)
2005-10-25 13:50 EDT, Steve Dickson
New RHEL4 patch (32.25 KB, patch)
2005-11-11 16:11 EST, Steve Dickson
ora10g_nfs_aiodio_hang.txt (52.99 KB, text/plain)
2005-11-21 11:39 EST, John Shakshober

Description Joseph Salisbury 2005-06-22 13:31:41 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem:
Oracle cannot open datafiles on an NFS device using aio and directio.  Oracle hangs and never starts.  Kernel version is: 2.6.9-11.ELsmp.  The hang occurs on the NFS client; the NFS server is x86_64 2.6.9-11.ELsmp.  Similar to the problem reported in bugzilla 154055.  The problem does not occur if only directio or only asyncio is used.  However, using both at the same time causes the Oracle hang.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Enable directio and asyncio in Oracle init.ora file: filesystemio_options=setall

2. Ensure Oracle datafiles are on an NFS device.

3. Try to start Oracle database:
sqlplus /nolog
conn / as sysdba
startup

Hang now occurs.
  

Actual Results:  Oracle Hangs during startup.

Expected Results:  Oracle starts without issue.

Additional info:
Comment 2 Jeffrey Moyer 2005-08-01 13:06:30 EDT
nfs_file_direct_read(struct kiocb *iocb, char __user *buf, size_t count, loff_t pos)
{
        ssize_t retval = -EINVAL;
        ...
        /* Every iocb submitted through io_submit() is asynchronous,
         * so this check rejects all AIO requests with -EINVAL. */
        if (!is_sync_kiocb(iocb))
                goto out;
        ...
out:
        return retval;
}

Clearly the NFS support for async direct I/O is not there.  It is also not
available in the upstream kernels.

I'm reassigning this one to SteveD.  Steve, do you have any insight, here?  Is
this something that's on the roadmap?
Comment 6 Steve Dickson 2005-10-13 10:26:56 EDT
I'm going to close this as NOTABUG since there
will be no AIO support added to the RHEL3 NFS
client.
Comment 9 Steve Dickson 2005-10-14 07:10:41 EDT
Oops.. this is a RHEL4 bug that was on the RHEL3 blocker list.
Moving it to the correct blocker list and reassigning to myself.
Comment 10 Steve Dickson 2005-10-14 07:13:49 EDT
Created attachment 119967 [details]
Proposed Patch

Here's Chuck's patch to get NFS aio + dio. He's got the full one
against mainline at
http://troy.citi.umich.edu/~cel/linux-2.6/2.6.13/release-notes.html

The patch, tweaked for RHEL4 U2, is attached.

Thanks!

Greg
Comment 11 Steve Dickson 2005-10-14 07:16:35 EDT
Created attachment 119968 [details]
Supplemental Patch
Comment 12 Larry Woodman 2005-10-17 11:35:47 EDT
*** Bug 169763 has been marked as a duplicate of this bug. ***
Comment 15 Steve Dickson 2005-10-20 16:15:13 EDT
*** Bug 170271 has been marked as a duplicate of this bug. ***
Comment 16 Steve Dickson 2005-10-25 13:50:25 EDT
Created attachment 120376 [details]
Patch proposed by NetApp
Comment 17 Steve Dickson 2005-10-25 13:57:25 EDT
Chuck,

Is the patch in Comment #16 the correct/latest one?
Comment 18 Chuck Lever 2005-10-25 14:23:30 EDT
i didn't diff it, but attachment 120376 [details] looks like it has all the most recent
changes.

this is against 2.6.9-22.EL, but there's a bugfix patch in the 22.EL spec file
that interferes with this patch when i put my patch where it belongs (around
patch 1228, in the NFS section).  so mine applies at the end of the patches
listed in the 22.EL spec file.
Comment 19 Steve Dickson 2005-11-08 14:56:12 EST
Should I used to gets this patch? 
Comment 20 Steve Dickson 2005-11-09 05:48:07 EST
Translation: What should I use to get this patch tested?
Comment 21 Steve Dickson 2005-11-09 05:51:34 EST
*** Bug 109096 has been marked as a duplicate of this bug. ***
Comment 22 Chuck Lever 2005-11-09 15:08:09 EST
what do you use to test NFS direct I/O now?  that should be a start, as we don't
want any regressions.

then i think Van or Deepak should pass along their OraSim and Oracle
configurations for using aio + dio on NFS files.
Comment 24 Steve Dickson 2005-11-11 16:11:32 EST
Created attachment 120965 [details]
New RHEL4 patch

Through a code review, a problem was found with
how the error path unwound itself....

Here is an interdiff of the changes:

diff -u linux-2.6.9/fs/nfs/inode.c linux-2.6.9/fs/nfs/inode.c
--- linux-2.6.9/fs/nfs/inode.c	2005-11-10 15:21:44.376056000 -0500
+++ linux-2.6.9/fs/nfs/inode.c	2005-11-11 15:57:11.718132000 -0500
@@ -1976,11 +1976,11 @@
 #ifdef CONFIG_PROC_FS
	rpc_proc_unregister("nfs");
 #endif
-	nfs_destroy_writepagecache();
 #ifdef CONFIG_NFS_DIRECTIO
-out0:
	nfs_destroy_directcache();
+out0:
 #endif
+	nfs_destroy_writepagecache();
 out1:
	nfs_destroy_readpagecache();
 out2:
Comment 25 Chuck Lever 2005-11-11 16:58:14 EST
confirmed, this is a bug.  i've created a patch in my series to correct it in
mainline.

however, this doesn't seem to be the result of any of the aio+dio patches i have --
it looks like the problem was in an earlier direct I/O patch.
Comment 27 Steve Dickson 2005-11-14 12:00:00 EST
In http://people.redhat.com/steved/bz161362 are kernel
rpms, along with a src rpm, that contain the patch from
Comment #24.

I've done some basic regression and DIO testing, but the
AIO-isms of the patch still need to be tested. Internally we
will be attempting to test the AIO part of this patch, but any
help would be appreciated....
Comment 28 Chuck Lever 2005-11-14 12:11:24 EST
i've used this in the past:

  iozone -I -k 8 -a

where "8" is any arbitrary integer between 1 and 16 ;^)

one thing you need to be careful about is that iozone must generate the I/O via
io_submit(), not emulate it using user-level threads.  this works correctly on
my RHEL 3.0 system running 2.6 test kernels, but i've found my RHEL 4 system
runs iozone in emulation mode (possibly because iozone was built a very long
time ago and links against the wrong shared libraries).
Comment 31 John Shakshober 2005-11-21 09:51:57 EST
aio_stress passes, but oracle tpc-c continues to HANG at startup. 

[root@perf2 aio]# ./aio-stress -s 10m -r 64 -d 64  -t 1  /oraclenfs/t1
file size 10MB, record size 64KB, depth 64, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
threads 1 files 1 contexts 1 context offset 2MB verification off
adding file /oraclenfs/t1 thread 0
write on /oraclenfs/t1 (490.94 MB/s) 10.00 MB in 0.02s
thread 0 write totals (10.25 MB/s) 10.00 MB in 0.98s
read on /oraclenfs/t1 (581.60 MB/s) 10.00 MB in 0.02s
thread 0 read totals (579.24 MB/s) 10.00 MB in 0.02s
random write on /oraclenfs/t1 (385.15 MB/s) 10.00 MB in 0.03s
thread 0 random write totals (17.47 MB/s) 10.00 MB in 0.57s
random read on /oraclenfs/t1 (592.14 MB/s) 10.00 MB in 0.02s
thread 0 random read totals (589.48 MB/s) 10.00 MB in 0.02s
[root@perf2 aio]# ./aio-stress -s 10m -r 64 -d 64  -t 1  /oraclenfs/t1
file size 10MB, record size 64KB, depth 64, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
threads 1 files 1 contexts 1 context offset 2MB verification off
adding file /oraclenfs/t1 thread 0
write on /oraclenfs/t1 (466.77 MB/s) 10.00 MB in 0.02s
thread 0 write totals (10.24 MB/s) 10.00 MB in 0.98s
read on /oraclenfs/t1 (579.61 MB/s) 10.00 MB in 0.02s
thread 0 read totals (577.10 MB/s) 10.00 MB in 0.02s
random write on /oraclenfs/t1 (385.86 MB/s) 10.00 MB in 0.03s
thread 0 random write totals (17.44 MB/s) 10.00 MB in 0.57s
random read on /oraclenfs/t1 (592.94 MB/s) 10.00 MB in 0.02s
thread 0 random read totals (590.56 MB/s) 10.00 MB in 0.02s

Direct IO starts the database
filesystemio_options=directio #Needed for filesystem directio

SQL> startup pfile=inittpcc.ora;
ORACLE instance started.

Total System Global Area  914358272 bytes
Fixed Size                   781784 bytes
Variable Size             220467752 bytes
Database Buffers          692060160 bytes
Redo Buffers                1048576 bytes
Database mounted.
Database opened.
SQL>

filesystemio_options=setall #Needed for filesystem directio and aio
-bash-3.00$ sqlplus / as sysdba

SQL*Plus: Release 10.1.0.3.0 - Production on Mon Nov 21 10:01:48 2005

Copyright (c) 1982, 2004, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup pfile=inittpcc.ora;
ORACLE instance started.

Total System Global Area  914358272 bytes
Fixed Size                   781784 bytes
Variable Size             220467752 bytes
Database Buffers          692060160 bytes
Redo Buffers                1048576 bytes
Database mounted.

HANGS HERE, never returns to the SQL*Plus prompt ... strace is attached as
ora10g_nfs_aiodio_hang.txt

oracle   32383     1  0 09:59 ?        00:00:00 ora_pmon_tpcc
oracle   32385     1  0 09:59 ?        00:00:00 ora_mman_tpcc
oracle   32387     1  0 09:59 ?        00:00:00 ora_dbw0_tpcc
oracle   32389     1  0 09:59 ?        00:00:00 ora_lgwr_tpcc
oracle   32391     1  0 09:59 ?        00:00:00 ora_ckpt_tpcc
oracle   32393     1  0 09:59 ?        00:00:00 ora_smon_tpcc
oracle   32395     1  0 09:59 ?        00:00:00 ora_reco_tpcc
oracle   32399     1  0 09:59 ?        00:00:00 oracletpcc
(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

Comment 32 John Shakshober 2005-11-21 11:39:22 EST
Created attachment 121301 [details]
ora10g_nfs_aiodio_hang.txt
Comment 33 John Shakshober 2005-11-21 14:46:06 EST
User error - we needed Steve D's kernel on the client, not just the server.

Indeed, Oracle over NFS works with AIO+DIO... perf results later.

-bash-3.00$ uname -a
Linux perf2.lab.boston.redhat.com 2.6.9-22.23.EL.stevedsmp #1 SMP Mon Nov 21
06:43:23 EST 2005 i686 i686 i386 GNU/Linux
-bash-3.00$ pwd
/oracle
-bash-3.00$ cd $ORACLE_HOME/dba
-bash: cd: /oracle/libaio/dba: No such file or directory
-bash-3.00$ cd $ORACLE_HOME/dbs
-bash-3.00$ !sql
sqlplus / as sysdba

SQL*Plus: Release 10.1.0.3.0 - Production on Mon Nov 21 14:54:22 2005

Copyright (c) 1982, 2004, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup pfile=inittpcc.ora;
ORACLE instance started.

Total System Global Area  914358272 bytes
Fixed Size                   781784 bytes
Variable Size             220467752 bytes
Database Buffers          692060160 bytes
Redo Buffers                1048576 bytes
Database mounted.

Database opened.
Comment 34 John Shakshober 2005-11-21 16:28:02 EST
We have run 4 30-minute TPC-C runs against the database on NFS mounted with:

filesystemio_options=setall #Needed for filesystem directio and aio

No problems found.
Comment 35 Linda Wang 2005-11-23 12:16:12 EST
committed in -22.25
Comment 37 John Shakshober 2005-11-30 20:17:14 EST
If you are running Oracle 9i, you need to relink the oracle binary
to turn on async I/O first, using:

make -f $ORACLE_HOME/rdbms/lib/ins_rdbms.mk async_on

But for Oracle 10g, filesystemio_options=SETALL in the init.ora should be all
that is needed.

 
Comment 41 Andrius Benokraitis 2006-01-09 10:21:52 EST
Action for NetApp: Please test and provide feedback ASAP.
Comment 42 Andrius Benokraitis 2006-02-02 10:30:14 EST
Chuck Lever (netapp) stated, "i've asked Sanjay Gulabani to look at testing
this.  However, Oracle has told me they are satisfied with this change." [NetApp
closes issue.]
Comment 43 Van Okamura 2006-02-02 13:20:16 EST
Testing on RHEL4 U3 beta shows that aio+dio over nfs looks ok:

aio+dio over nfs seems to be working properly on x86 (smp and hugemem) 
with 10.2.0.1

In some of the stress tests aio-nr reaches
hugemem - 2022K (2000 users)
smp     - 1308K (1000 users)
aio-max-nr being 3200K for both

Below are some statistics for tests done over NFS.
For the sake of homogeneity -
1. parameters:
   sga=1G,db_size=72G and db_block_size=8192
   time and number of connections have been varied as per the memory.

2. mount options:
   rw,rsize=32768,wsize=32768,hard,nointr,bg,nfsvers=3,tcp,timeo=600


I. tpm numbers obtained for U3 on NFS
---------------------------------------------------------------

1. smp (6G Memory)

   time=4hr users=1000

   a) aio + dio - 4968
   b) aio alone - 2774
   c) dio alone - 4389

2. hugemem (16G memory)

   time=4hr users=2000

   a) aio + dio - 4314
   b) aio alone - 4362
   c) dio alone - 4856

Comments:
In case of smp, aio alone gives less tpm.
In case of hugemem, just a slight difference in tpms.

---------------------------------------------------------------


II. tpm numbers for U2/U3 comparison on smp for aio, dio on nfs
---------------------------------------------------------------

time=1hr users=1000

1. U3

   a) aio alone - 5184
   b) dio alone - 4839

2. U2

   a) aio alone - 4672
   b) dio alone - 4758

There seems to be a performance improvement in U3.
Comment 44 Van Okamura 2006-02-02 13:27:25 EST
Testing on x86_64 looks ok as well.  More updates from Anurag:

Further to my updates on aio+dio over nfs, here are some updates 
on x86_64 (single instance).  On x86_64 also the combination seems 
to be working fine.

Some tests have been done over nfs:
1. parameters:
   sga=1G, db_size=72G, db_block_size=8192, users=700, time=4hrs
2. mount options:
   (same as in the x86 tests above)

tpm numbers obtained for U3 on nfs
-----------------------------------------
1. smp kernel (8G memory)

   a) aio + dio - 4094
   b) aio alone - 4118
   c) dio alone - 3998

tpm looks more or less similar.
==========
Note: these tests aren't really trying to tune for max performance.  They are
mainly stressing the system to see if there are problems or huge performance issues.
Comment 46 Red Hat Bugzilla 2006-03-07 14:10:28 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html
