Bug 161362 - Oracle Hangs with directio and aio using NFS
Summary: Oracle Hangs with directio and aio using NFS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Steve Dickson
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 109096 169763 170271
Depends On:
Blocks: 168429
 
Reported: 2005-06-22 17:31 UTC by Joseph Salisbury
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2006-0132
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-07 19:10:28 UTC
Target Upstream Version:
Embargoed:


Attachments
Purposed Patch (35.62 KB, patch), 2005-10-14 11:13 UTC, Steve Dickson
Supplemental Patch (333 bytes, patch), 2005-10-14 11:16 UTC, Steve Dickson
Patch proposed by Netapps (32.77 KB, patch), 2005-10-25 17:50 UTC, Steve Dickson
New RHEL4 patch (32.25 KB, patch), 2005-11-11 21:11 UTC, Steve Dickson
ora10g_nfs_aiodio_hang.txt (52.99 KB, text/plain), 2005-11-21 16:39 UTC, John Shakshober


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2005:808 | 0 | normal | SHIPPED_LIVE | Important: kernel security update | 2005-10-27 04:00:00 UTC
Red Hat Product Errata | RHSA-2006:0132 | 0 | qe-ready | SHIPPED_LIVE | Moderate: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 3 | 2006-03-09 16:31:00 UTC

Description Joseph Salisbury 2005-06-22 17:31:41 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem:
Oracle cannot open datafiles on an NFS device using aio and directio.  Oracle will hang and never start.  Kernel version is: 2.6.9-11.ELsmp.  Hang occurs on NFS client.  NFS server is x86_64 2.6.9-11.ELsmp.  Similar to problem reported in bugzilla 154055.  The problem does not occur if only directio or only asyncio is used.  However, using both at the same time causes the Oracle hang.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Enable directio and asyncio in Oracle init.ora file: filesystemio_options=setall

2. Ensure Oracle datafiles are on an NFS device.

3. Try to start Oracle database:
sqlplus /nolog
conn / as sysdba
startup

Hang now occurs.
  

Actual Results:  Oracle Hangs during startup.

Expected Results:  Oracle starts without issue.

Additional info:

Comment 2 Jeff Moyer 2005-08-01 17:06:30 UTC
ssize_t
nfs_file_direct_read(struct kiocb *iocb, char __user *buf, size_t count, loff_t pos)
{
        ssize_t retval = -EINVAL;
        ...

        /* Only synchronous kiocbs are handled; an async request bails out with -EINVAL. */
        if (!is_sync_kiocb(iocb))
                goto out;
        ...
out:
        return retval;
}

Clearly the NFS support for async direct I/O is not there.  It is also not
available in the upstream kernels.

I'm reassigning this one to SteveD.  Steve, do you have any insight, here?  Is
this something that's on the roadmap?

Comment 6 Steve Dickson 2005-10-13 14:26:56 UTC
I'm going to close this as NOTABUG since there
will be no AIO support added to the RHEL3 NFS
client.

Comment 9 Steve Dickson 2005-10-14 11:10:41 UTC
Oops.. this is a RHEL4 bug, that was on the RHEL3 blocker list.
Moving to the correct Blocker list and reassigning to myself

Comment 10 Steve Dickson 2005-10-14 11:13:49 UTC
Created attachment 119967 [details]
Purposed Patch

Here's Chuck's patch to get NFS aio + dio. He's got the full one
against mainline at
http://troy.citi.umich.edu/~cel/linux-2.6/2.6.13/release-notes.html

The patch, tweaked for RHEL4 U2, is attached.

Thanks!

Greg

Comment 11 Steve Dickson 2005-10-14 11:16:35 UTC
Created attachment 119968 [details]
Supplemental Patch

Comment 12 Larry Woodman 2005-10-17 15:35:47 UTC
*** Bug 169763 has been marked as a duplicate of this bug. ***

Comment 15 Steve Dickson 2005-10-20 20:15:13 UTC
*** Bug 170271 has been marked as a duplicate of this bug. ***

Comment 16 Steve Dickson 2005-10-25 17:50:25 UTC
Created attachment 120376 [details]
Patch proposed by Netapps

Comment 17 Steve Dickson 2005-10-25 17:57:25 UTC
Chuck,

Is the patch in Comment #16 the correct/latest one?

Comment 18 Chuck Lever 2005-10-25 18:23:30 UTC
i didn't diff it, but attachment 120376 [details] looks like it has all the most recent
changes.

this is against 2.6.9-22.EL, but there's a bugfix patch in the 22.EL spec file
that interferes with this patch when i put my patch where it belongs (around
patch 1228, in the NFS section).  so mine applies at the end of the patches
listed in the 22.EL spec file.

Comment 19 Steve Dickson 2005-11-08 19:56:12 UTC
Should I used to gets this patch? 

Comment 20 Steve Dickson 2005-11-09 10:48:07 UTC
Translation: What should I use to get this patch tested?

Comment 21 Steve Dickson 2005-11-09 10:51:34 UTC
*** Bug 109096 has been marked as a duplicate of this bug. ***

Comment 22 Chuck Lever 2005-11-09 20:08:09 UTC
what do you use to test NFS direct I/O now?  that should be a start, as we don't
want any regressions.

then i think Van or Deepak should pass along their OraSim and Oracle
configurations for using aio + dio on NFS files.

Comment 24 Steve Dickson 2005-11-11 21:11:32 UTC
Created attachment 120965 [details]
New RHEL4 patch

During code review, a problem was found with
how the error-handling code unwound itself....

Here is an interdiff of the changes...

diff -u linux-2.6.9/fs/nfs/inode.c linux-2.6.9/fs/nfs/inode.c
--- linux-2.6.9/fs/nfs/inode.c	2005-11-10 15:21:44.376056000 -0500
+++ linux-2.6.9/fs/nfs/inode.c	2005-11-11 15:57:11.718132000 -0500
@@ -1976,11 +1976,11 @@
 #ifdef CONFIG_PROC_FS
	rpc_proc_unregister("nfs");
 #endif
-	nfs_destroy_writepagecache();
 #ifdef CONFIG_NFS_DIRECTIO
-out0:
	nfs_destroy_directcache();
+out0:
 #endif
+	nfs_destroy_writepagecache();
 out1:
	nfs_destroy_readpagecache();
 out2:
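
For readers following the interdiff, the rule it restores is that the error labels of an init function must tear things down in the reverse of the setup order, so that a failure at step N undoes only steps 1..N-1. The following is a minimal user-space sketch of that pattern, not the kernel code: setup(), teardown() and init_caches() are hypothetical stand-ins, and the setup order (read cache, write cache, direct cache, then a later registration step) is an assumption inferred from the label order in the diff.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: step `index` fails when it equals fail_at (0 = no failure). */
static int setup(int index, int fail_at)
{
    return index == fail_at ? -1 : 0;
}

/* Record a teardown call by appending one letter to the trace string. */
static void teardown(char id, char *trace)
{
    size_t n = strlen(trace);

    trace[n] = id;
    trace[n + 1] = '\0';
}

/*
 * Returns 0 on success, -1 on failure. `trace` (at least 4 bytes) receives
 * one letter per teardown call, in the order the calls were made.
 */
int init_caches(int fail_at, char *trace)
{
    int err;

    trace[0] = '\0';
    err = setup(1, fail_at);    /* 'r': read page cache (assumed first) */
    if (err)
        goto out2;
    err = setup(2, fail_at);    /* 'w': write page cache */
    if (err)
        goto out1;
    err = setup(3, fail_at);    /* 'd': direct I/O cache */
    if (err)
        goto out0;
    err = setup(4, fail_at);    /* a later step, e.g. registration */
    if (err)
        goto out;
    return 0;

out:
    teardown('d', trace);       /* undo step 3 */
out0:
    teardown('w', trace);       /* undo step 2 */
out1:
    teardown('r', trace);       /* undo step 1 */
out2:
    return err;
}
```

With this layout, a failure while setting up the direct I/O cache (fail_at == 3) jumps to out0 and records "wr": the write cache and then the read cache are destroyed, and the direct cache, which was never created, is left alone.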

Comment 25 Chuck Lever 2005-11-11 21:58:14 UTC
confirmed, this is a bug.  i've created a patch in my series to correct it in
mainline.

however, this doesn't seem to be the result of any of the aio+dio patches i have --
it looks like the problem was in an earlier direct I/O patch.

Comment 27 Steve Dickson 2005-11-14 17:00:00 UTC
In http://people.redhat.com/steved/bz161362 are kernel
rpms, along with a src rpm, that contain the patch from
Comment #24.

I've done some basic regression and DIO testing but the
AIO-izms of the patch still need to be tested. Internally we
will be attempting to test the AIO part of this patch, but any
help would be appreciated....

Comment 28 Chuck Lever 2005-11-14 17:11:24 UTC
i've used this in the past:

  iozone -I -k 8 -a

where "8" is any arbitrary integer between 1 and 16 ;^)

one thing you need to be careful about is that iozone must generate the I/O via
io_submit(), not emulate it using user-level threads.  this works correctly on
my RHEL 3.0 system running 2.6 test kernels, but i've found my RHEL 4 system
runs iozone in emulation mode (could be because iozone was built a very long
while ago and calls for the wrong shared libraries).
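
To make the io_submit() caveat concrete, here is a minimal sketch of what a read issued through the kernel AIO interface looks like when the raw system calls are used directly. It is only an illustration under stated assumptions: aio_pread_once() is a hypothetical helper, not part of iozone or of any patch in this bug, and it expects the caller to have opened fd with O_DIRECT and to pass a buffer, length and offset aligned as O_DIRECT requires.

```c
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Submit one read through the kernel AIO interface and wait for it.
 * Hypothetical helper for illustration only.
 * Returns the number of bytes read, or -1 on any failure.
 */
long aio_pread_once(int fd, void *buf, size_t len, long long offset)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long result = -1;

    /* Raw syscalls are used so that no user-level AIO emulation can step in. */
    if (syscall(SYS_io_setup, 1, &ctx) < 0)
        return -1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;

    /* Queue the request, then block until its completion event arrives. */
    if (syscall(SYS_io_submit, ctx, 1L, list) == 1 &&
        syscall(SYS_io_getevents, ctx, 1L, 1L, &ev, NULL) == 1 &&
        (long long)ev.res >= 0)
        result = (long)ev.res;

    syscall(SYS_io_destroy, ctx);
    return result;
}
```

A benchmark that really goes through this path should show io_submit and io_getevents calls when run under strace; if only ordinary read/write calls from several threads show up, it is probably running in emulation mode.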

Comment 31 John Shakshober 2005-11-21 14:51:57 UTC
aio-stress passes, but Oracle TPC-C continues to HANG at startup.

[root@perf2 aio]# ./aio-stress -s 10m -r 64 -d 64  -t 1  /oraclenfs/t1
file size 10MB, record size 64KB, depth 64, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
threads 1 files 1 contexts 1 context offset 2MB verification off
adding file /oraclenfs/t1 thread 0
write on /oraclenfs/t1 (490.94 MB/s) 10.00 MB in 0.02s
thread 0 write totals (10.25 MB/s) 10.00 MB in 0.98s
read on /oraclenfs/t1 (581.60 MB/s) 10.00 MB in 0.02s
thread 0 read totals (579.24 MB/s) 10.00 MB in 0.02s
random write on /oraclenfs/t1 (385.15 MB/s) 10.00 MB in 0.03s
thread 0 random write totals (17.47 MB/s) 10.00 MB in 0.57s
random read on /oraclenfs/t1 (592.14 MB/s) 10.00 MB in 0.02s
thread 0 random read totals (589.48 MB/s) 10.00 MB in 0.02s
[root@perf2 aio]# ./aio-stress -s 10m -r 64 -d 64  -t 1  /oraclenfs/t1
file size 10MB, record size 64KB, depth 64, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
threads 1 files 1 contexts 1 context offset 2MB verification off
adding file /oraclenfs/t1 thread 0
write on /oraclenfs/t1 (466.77 MB/s) 10.00 MB in 0.02s
thread 0 write totals (10.24 MB/s) 10.00 MB in 0.98s
read on /oraclenfs/t1 (579.61 MB/s) 10.00 MB in 0.02s
thread 0 read totals (577.10 MB/s) 10.00 MB in 0.02s
random write on /oraclenfs/t1 (385.86 MB/s) 10.00 MB in 0.03s
thread 0 random write totals (17.44 MB/s) 10.00 MB in 0.57s
random read on /oraclenfs/t1 (592.94 MB/s) 10.00 MB in 0.02s
thread 0 random read totals (590.56 MB/s) 10.00 MB in 0.02s

Direct IO starts the database
filesystemio_options=directio #Needed for filesystem directio

SQL> startup pfile=inittpcc.ora;
ORACLE instance started.

Total System Global Area  914358272 bytes
Fixed Size                   781784 bytes
Variable Size             220467752 bytes
Database Buffers          692060160 bytes
Redo Buffers                1048576 bytes
Database mounted.
Database opened.
SQL>

filesystemio_options=setall #Needed for filesystem directio and aio
-bash-3.00$ sqlplus / as sysdba

SQL*Plus: Release 10.1.0.3.0 - Production on Mon Nov 21 10:01:48 2005

Copyright (c) 1982, 2004, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup pfile=inittpcc.ora;
ORACLE instance started.

Total System Global Area  914358272 bytes
Fixed Size                   781784 bytes
Variable Size             220467752 bytes
Database Buffers          692060160 bytes
Redo Buffers                1048576 bytes
Database mounted.

HANGS HERE; it never returns to the SQL*Plus prompt ... strace is attached as
ora10g_nfs_aiodio_hang.txt

oracle   32383     1  0 09:59 ?        00:00:00 ora_pmon_tpcc
oracle   32385     1  0 09:59 ?        00:00:00 ora_mman_tpcc
oracle   32387     1  0 09:59 ?        00:00:00 ora_dbw0_tpcc
oracle   32389     1  0 09:59 ?        00:00:00 ora_lgwr_tpcc
oracle   32391     1  0 09:59 ?        00:00:00 ora_ckpt_tpcc
oracle   32393     1  0 09:59 ?        00:00:00 ora_smon_tpcc
oracle   32395     1  0 09:59 ?        00:00:00 ora_reco_tpcc
oracle   32399     1  0 09:59 ?        00:00:00 oracletpcc (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))



Comment 32 John Shakshober 2005-11-21 16:39:22 UTC
Created attachment 121301 [details]
ora10g_nfs_aiodio_hang.txt

Comment 33 John Shakshober 2005-11-21 19:46:06 UTC
User Error - We needed Steve D's kernel on the client, not just the Server

Indeed Oracle over NFS works with AIO+DIO... perf results later.

-bash-3.00$ uname -a
Linux perf2.lab.boston.redhat.com 2.6.9-22.23.EL.stevedsmp #1 SMP Mon Nov 21 06:43:23 EST 2005 i686 i686 i386 GNU/Linux
-bash-3.00$ pwd
/oracle
-bash-3.00$ cd $ORACLE_HOME/dba
-bash: cd: /oracle/libaio/dba: No such file or directory
-bash-3.00$ cd $ORACLE_HOME/dbs
-bash-3.00$ !sql
sqlplus / as sysdba

SQL*Plus: Release 10.1.0.3.0 - Production on Mon Nov 21 14:54:22 2005

Copyright (c) 1982, 2004, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup pfile=inittpcc.ora;
ORACLE instance started.

Total System Global Area  914358272 bytes
Fixed Size                   781784 bytes
Variable Size             220467752 bytes
Database Buffers          692060160 bytes
Redo Buffers                1048576 bytes
Database mounted.

Database opened.


Comment 34 John Shakshober 2005-11-21 21:28:02 UTC
We have completed four 30-minute TPC-C runs against the database on the NFS
mount with the following option set:

filesystemio_options=setall #Needed for filesystem directio and aio

No problems found.

Comment 35 Linda Wang 2005-11-23 17:16:12 UTC
committed in -22.25

Comment 37 John Shakshober 2005-12-01 01:17:14 UTC
If you are running Oracle 9i, you first need to relink the oracle binary
with async I/O turned on, using

make -f $ORACLE_HOME/rdbms/lib/ins_rdbms.mk async_on

But for Oracle 10g, filesystemio_options=SETALL in the init.ora should be all
that is needed.

 

Comment 41 Andrius Benokraitis 2006-01-09 15:21:52 UTC
Action for NetApp: Please test and provide feedback ASAP.

Comment 42 Andrius Benokraitis 2006-02-02 15:30:14 UTC
Chuck Lever (netapp) stated, "i've asked Sanjay Gulabani to look at testing
this.  However, Oracle has told me they are satisfied with this change." [NetApp
closes issue.]

Comment 43 Van Okamura 2006-02-02 18:20:16 UTC
Testing on RHEL4 U3 beta shows that aio+dio over nfs looks ok:

aio+dio over nfs seems to be working properly on x86 (smp and hugemem) 
with 10.2.0.1

In some of the stress tests aio-nr reaches
hugemem - 2022K (2000 users)
smp     - 1308K (1000 users)
aio-max-nr being 3200K for both
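
As a rough way to read these figures, the peak aio-nr values correspond to about 63% (hugemem) and about 41% (smp) of the aio-max-nr limit, so neither test came close to exhausting it. The sketch below shows one way to compute that headroom on a live system. It assumes the counters are read from /proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr; read_proc_value() and aio_usage_permille() are hypothetical helper names, and the result is in tenths of a percent, truncated.

```c
#include <stdio.h>

/* Read a single integer from a /proc/sys style file; returns -1 on failure. */
long read_proc_value(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Share of the limit in use, in tenths of a percent (integer math, truncated). */
long aio_usage_permille(long aio_nr, long aio_max_nr)
{
    if (aio_nr < 0 || aio_max_nr <= 0)
        return -1;
    return aio_nr * 1000 / aio_max_nr;
}
```

For example, aio_usage_permille(read_proc_value("/proc/sys/fs/aio-nr"), read_proc_value("/proc/sys/fs/aio-max-nr")) gives the current usage. Since only the ratio matters, the "K" suffix in the numbers above can be dropped when plugging them in.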

Below are some statistics for tests done over NFS.
For the sake of homogeneity -
1. parameters:
   sga=1G,db_size=72G and db_block_size=8192
   time and number of connections have been varied as per the memory.

2. mount options:
   rw,rsize=32768,wsize=32768,hard,nointr,bg,nfsvers=3,tcp,timeo=600


I. tpm numbers obtained for U3 on NFS
---------------------------------------------------------------

1. smp (6G Memory)

   time=4hr users=1000

   a) aio + dio - 4968
   b) aio alone - 2774
   c) dio alone - 4389

2. hugemem (16G memory)

   time=4hr users=2000

   a) aio + dio - 4314
   b) aio alone - 4362
   c) dio alone - 4856

Comments:
In case of smp, aio alone gives less tpm.
In case of hugemem, just a slight difference in tpms.

---------------------------------------------------------------


II. tpm numbers for U2/U3 comparison on smp for aio, dio on nfs
---------------------------------------------------------------

time=1hr users=1000

1. U3

   a) aio alone - 5184
   b) dio alone - 4839

2. U2

   a) aio alone - 4672
   b) dio alone - 4758

There seems to be a performance improvement in U3
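
Worked out from the figures above, the U2 to U3 change is about +11% for aio alone (4672 to 5184) and about +1.7% for dio alone (4758 to 4839). These are single one-hour runs, so the smaller difference may be within run-to-run noise. A minimal sketch of the arithmetic, where pct_change() is a hypothetical helper name:

```c
#include <assert.h>

/* Relative change from baseline to value, in percent. */
double pct_change(double baseline, double value)
{
    assert(baseline > 0.0);    /* tpm baselines are positive */
    return (value - baseline) * 100.0 / baseline;
}
```

The same helper applied to the smp numbers in section I, pct_change(4968, 2774), puts aio alone roughly 44% below aio + dio.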

Comment 44 Van Okamura 2006-02-02 18:27:25 UTC
Testing on x86_64 looks ok as well.  More updates from Anurag:

Further to my updates on aio+dio over nfs, here are some updates 
on x86_64 (single instance).  On x86_64 also the combination seems 
to be working fine.

Some tests have been done over nfs:
1. parameters:
   sga=1G, db_size=72G, db_block_size=8192, users=700, time=4hrs
2. mount options:
   (same as in Comment 43)

tpm numbers obtained for U3 on nfs
-----------------------------------------
1. smp kernel (8G memory)

   a) aio + dio - 4094
   b) aio alone - 4118
   c) dio alone - 3998

tpm looks more or less similar.
==========
Note: these tests aren't really trying to tune for max performance.  They are
mainly stressing the system to see if there are problems or huge performance issues.

Comment 46 Red Hat Bugzilla 2006-03-07 19:10:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html


