From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem:
Oracle cannot open datafiles on an NFS device using aio and directio; Oracle hangs and never starts. The kernel version is 2.6.9-11.ELsmp. The hang occurs on the NFS client; the NFS server is x86_64, 2.6.9-11.ELsmp. Similar to the problem reported in bugzilla 154055. The problem does not occur if only directio or only asyncio is used; using both at the same time causes the Oracle hang.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Enable directio and asyncio in the Oracle init.ora file:
   filesystemio_options=setall
2. Ensure the Oracle datafiles are on an NFS device.
3. Try to start the Oracle database:
   sqlplus /nolog
   conn / as sysdba
   startup
The hang now occurs.

Actual Results: Oracle hangs during startup.

Expected Results: Oracle starts without issue.

Additional info:
From nfs_file_direct_read():

nfs_file_direct_read(struct kiocb *iocb, char __user *buf, size_t count, loff_t pos)
{
        ssize_t retval = -EINVAL;
        ...
        if (!is_sync_kiocb(iocb))
                goto out;
        ...
out:
        return retval;
}

Clearly, the NFS support for async direct I/O is not there: any kiocb that is not synchronous gets -EINVAL before a read is even attempted. It is also not available in the upstream kernels. I'm reassigning this one to SteveD. Steve, do you have any insight here? Is this something that's on the roadmap?
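For context, here is roughly how an application reaches that check. Below is a minimal user-space probe (a hypothetical program, not from this bug; the file path, the 4KB alignment, and all names are illustrative, and it assumes libaio-devel is installed) that opens a file with O_DIRECT and submits one read through io_submit() -- exactly the async-kiocb case rejected above:

/* nfs_aio_probe.c -- minimal sketch of how an application reaches the
 * AIO+DIO path that nfs_file_direct_read() rejects.
 * Build: gcc -o nfs_aio_probe nfs_aio_probe.c -laio
 * Run:   ./nfs_aio_probe /oraclenfs/t1   (any file on the NFS mount)
 */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        io_context_t ctx = 0;
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;
        void *buf;
        int fd, ret;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <file-on-nfs>\n", argv[0]);
                return 1;
        }

        /* O_DIRECT is what sends the request down the direct I/O path. */
        fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* Direct I/O requires an aligned buffer. */
        if (posix_memalign(&buf, 4096, 4096)) {
                fprintf(stderr, "posix_memalign failed\n");
                return 1;
        }

        ret = io_setup(1, &ctx);
        if (ret < 0) {
                fprintf(stderr, "io_setup: %s\n", strerror(-ret));
                return 1;
        }

        /* io_submit() hands the kernel an *async* kiocb, so
         * is_sync_kiocb() is false and the code above bails out. */
        io_prep_pread(&cb, fd, buf, 4096, 0);
        ret = io_submit(ctx, 1, cbs);
        if (ret < 0)
                fprintf(stderr, "io_submit: %s\n", strerror(-ret));
        else if (io_getevents(ctx, 1, 1, &ev, NULL) == 1)
                /* A negative res is -errno, e.g. -22 (-EINVAL) when
                 * async direct I/O is unsupported. */
                printf("read result: %ld\n", (long)ev.res);

        io_destroy(ctx);
        close(fd);
        free(buf);
        return 0;
}

On an unpatched kernel this should surface as an immediate -EINVAL completion rather than an actual read.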
I'm going to close this as NOTABUG since there will be no AIO support added to the RHEL3 NFS client.
Oops.. this is a RHEL4 bug that was on the RHEL3 blocker list. Moving it to the correct blocker list and reassigning to myself.
Created attachment 119967 [details] Proposed Patch

Here's Chuck's patch to get NFS aio + dio. He's got the full one against mainline at http://troy.citi.umich.edu/~cel/linux-2.6/2.6.13/release-notes.html

The patch, tweaked for RHEL4 U2, is attached.

Thanks!
Greg
Created attachment 119968 [details] Supplemental Patch
*** Bug 169763 has been marked as a duplicate of this bug. ***
*** Bug 170271 has been marked as a duplicate of this bug. ***
Created attachment 120376 [details] Patch proposed by NetApp
Chuck,

Is the patch in Comment #16 the correct/latest one?
I didn't diff it, but attachment 120376 [details] looks like it has all the most recent changes. This is against 2.6.9-22.EL, but there's a bugfix patch in the 22.EL spec file that interferes with this patch when I put my patch where it belongs (around patch 1228, in the NFS section). So mine applies at the end of the patches listed in the 22.EL spec file.
Should I used to gets this patch?
Translation: What should I use to get this patch tested?
*** Bug 109096 has been marked as a duplicate of this bug. ***
What do you use to test NFS direct I/O now? That should be a start, as we don't want any regressions. Then I think Van or Deepak should pass along their OraSim and Oracle configurations for using aio + dio on NFS files.
Created attachment 120965 [details] New RHEL4 patch

Through a code review, a problem was found with how the error code unwound itself. Here is an interdiff of the changes:

diff -u linux-2.6.9/fs/nfs/inode.c linux-2.6.9/fs/nfs/inode.c
--- linux-2.6.9/fs/nfs/inode.c  2005-11-10 15:21:44.376056000 -0500
+++ linux-2.6.9/fs/nfs/inode.c  2005-11-11 15:57:11.718132000 -0500
@@ -1976,11 +1976,11 @@
 #ifdef CONFIG_PROC_FS
        rpc_proc_unregister("nfs");
 #endif
-       nfs_destroy_writepagecache();
 #ifdef CONFIG_NFS_DIRECTIO
-out0:
        nfs_destroy_directcache();
+out0:
 #endif
+       nfs_destroy_writepagecache();
 out1:
        nfs_destroy_readpagecache();
 out2:
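For anyone following along, the reason label placement matters here: kernel init paths unwind errors by falling through a ladder of goto labels, each label undoing only the steps that had already succeeded, so a label on the wrong side of a call (or a call on the wrong side of an #ifdef) either skips a cleanup or tears down something that was never set up. A minimal, self-contained sketch of the pattern -- made-up names, not the actual inode.c code:

#include <stdio.h>

/* Placeholder setup/teardown steps standing in for the nfs cache
 * create/destroy calls; each create returns 0 on success. */
static int create_a(void) { return 0; }
static int create_b(void) { return 0; }
static int create_c(void) { return -1; }   /* pretend step 3 fails */
static void destroy_a(void) { puts("undo a"); }
static void destroy_b(void) { puts("undo b"); }

static int init_example(void)
{
        int err;

        err = create_a();
        if (err)
                goto out2;      /* nothing to undo */
        err = create_b();
        if (err)
                goto out1;      /* undo a only */
        err = create_c();
        if (err)
                goto out0;      /* undo b, then a */
        return 0;
out0:
        destroy_b();            /* must mirror creation order exactly */
out1:
        destroy_a();
out2:
        return err;
}

int main(void)
{
        return init_example() ? 1 : 0;
}

The fix above amounts to keeping the destroy calls in the exact reverse order of the creates, with each label falling between them.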
Confirmed, this is a bug. I've created a patch in my series to correct it in mainline. However, this doesn't seem to be the result of any of the aio+dio patches I have -- it looks like the problem was in an earlier direct I/O patch.
In http://people.redhat.com/steved/bz161362 are kernel rpms, along with a src rpm, that contain the patch from Comment #24. I've done some basic regression and DIO testing, but the AIO side of the patch still needs to be tested. Internally we will be attempting to test the AIO part of this patch, but any help would be appreciated....
I've used this in the past:

    iozone -I -k 8 -a

where "8" is any arbitrary integer between 1 and 16 ;^)

One thing you need to be careful about is that iozone must generate the I/O via io_submit(), not emulate it using user-level threads. This works correctly on my RHEL 3.0 system running 2.6 test kernels, but I've found my RHEL 4 system runs iozone in emulation mode (could be because my iozone was built a long while ago and calls for the wrong shared libraries).
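Independent of how the iozone binary was built, one quick way to spot emulation is to watch fs.aio-nr while the test runs: the counter (the same one quoted in the U3 test results later in this bug) only moves when real io_setup()/io_submit() contexts are in use, while glibc's thread-based POSIX AIO emulation never touches it. A trivial sketch of such a check -- the /proc path is standard, but the program itself is hypothetical:

/* aio_nr_check.c -- print the kernel's count of allocated AIO request
 * slots.  If this stays at 0 while the benchmark runs, the benchmark
 * is emulating AIO with user-level threads instead of calling
 * io_submit().
 * Build: gcc -o aio_nr_check aio_nr_check.c
 */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/fs/aio-nr", "r");
        long nr;

        if (!f || fscanf(f, "%ld", &nr) != 1) {
                perror("/proc/sys/fs/aio-nr");
                return 1;
        }
        printf("fs.aio-nr = %ld\n", nr);
        fclose(f);
        return 0;
}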
aio-stress passes, but Oracle TPC-C continues to HANG at startup.

[root@perf2 aio]# ./aio-stress -s 10m -r 64 -d 64 -t 1 /oraclenfs/t1
file size 10MB, record size 64KB, depth 64, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
threads 1 files 1 contexts 1 context offset 2MB verification off
adding file /oraclenfs/t1 thread 0
write on /oraclenfs/t1 (490.94 MB/s) 10.00 MB in 0.02s
thread 0 write totals (10.25 MB/s) 10.00 MB in 0.98s
read on /oraclenfs/t1 (581.60 MB/s) 10.00 MB in 0.02s
thread 0 read totals (579.24 MB/s) 10.00 MB in 0.02s
random write on /oraclenfs/t1 (385.15 MB/s) 10.00 MB in 0.03s
thread 0 random write totals (17.47 MB/s) 10.00 MB in 0.57s
random read on /oraclenfs/t1 (592.14 MB/s) 10.00 MB in 0.02s
thread 0 random read totals (589.48 MB/s) 10.00 MB in 0.02s

[root@perf2 aio]# ./aio-stress -s 10m -r 64 -d 64 -t 1 /oraclenfs/t1
file size 10MB, record size 64KB, depth 64, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
threads 1 files 1 contexts 1 context offset 2MB verification off
adding file /oraclenfs/t1 thread 0
write on /oraclenfs/t1 (466.77 MB/s) 10.00 MB in 0.02s
thread 0 write totals (10.24 MB/s) 10.00 MB in 0.98s
read on /oraclenfs/t1 (579.61 MB/s) 10.00 MB in 0.02s
thread 0 read totals (577.10 MB/s) 10.00 MB in 0.02s
random write on /oraclenfs/t1 (385.86 MB/s) 10.00 MB in 0.03s
thread 0 random write totals (17.44 MB/s) 10.00 MB in 0.57s
random read on /oraclenfs/t1 (592.94 MB/s) 10.00 MB in 0.02s
thread 0 random read totals (590.56 MB/s) 10.00 MB in 0.02s

Direct I/O alone starts the database:

filesystemio_options=directio   # Needed for filesystem directio

SQL> startup pfile=inittpcc.ora;
ORACLE instance started.

Total System Global Area  914358272 bytes
Fixed Size                  781784 bytes
Variable Size            220467752 bytes
Database Buffers         692060160 bytes
Redo Buffers               1048576 bytes
Database mounted.
Database opened.
SQL>

With aio + dio:

filesystemio_options=setall   # Needed for filesystem directio and aio

-bash-3.00$ sqlplus / as sysdba

SQL*Plus: Release 10.1.0.3.0 - Production on Mon Nov 21 10:01:48 2005
Copyright (c) 1982, 2004, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup pfile=inittpcc.ora;
ORACLE instance started.

Total System Global Area  914358272 bytes
Fixed Size                  781784 bytes
Variable Size            220467752 bytes
Database Buffers         692060160 bytes
Redo Buffers               1048576 bytes
Database mounted.

HANGS HERE -- never returns to the SQL*Plus prompt...

The strace is attached as ora10g_nfs_aiodio_hang.txt.

oracle   32383     1  0 09:59 ?        00:00:00 ora_pmon_tpcc
oracle   32385     1  0 09:59 ?        00:00:00 ora_mman_tpcc
oracle   32387     1  0 09:59 ?        00:00:00 ora_dbw0_tpcc
oracle   32389     1  0 09:59 ?        00:00:00 ora_lgwr_tpcc
oracle   32391     1  0 09:59 ?        00:00:00 ora_ckpt_tpcc
oracle   32393     1  0 09:59 ?        00:00:00 ora_smon_tpcc
oracle   32395     1  0 09:59 ?        00:00:00 ora_reco_tpcc
oracle   32399     1  0 09:59 ?        00:00:00 oracletpcc (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
Created attachment 121301 [details] ora10g_nfs_aiodio_hang.txt
User Error - we needed Steve D's kernel on the client, not just the server. Indeed, Oracle over NFS works with AIO+DIO... perf results later.

-bash-3.00$ uname -a
Linux perf2.lab.boston.redhat.com 2.6.9-22.23.EL.stevedsmp #1 SMP Mon Nov 21 06:43:23 EST 2005 i686 i686 i386 GNU/Linux
-bash-3.00$ pwd
/oracle
-bash-3.00$ cd $ORACLE_HOME/dba
-bash: cd: /oracle/libaio/dba: No such file or directory
-bash-3.00$ cd $ORACLE_HOME/dbs
-bash-3.00$ !sql
sqlplus / as sysdba

SQL*Plus: Release 10.1.0.3.0 - Production on Mon Nov 21 14:54:22 2005
Copyright (c) 1982, 2004, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup pfile=inittpcc.ora;
ORACLE instance started.

Total System Global Area  914358272 bytes
Fixed Size                  781784 bytes
Variable Size            220467752 bytes
Database Buffers         692060160 bytes
Redo Buffers               1048576 bytes
Database mounted.
Database opened.
We have run four 30-minute TPC-C runs against the database over NFS with:

filesystemio_options=setall   # Needed for filesystem directio and aio options

No problems found.
committed in -22.25
If you are running Oracle 9i, you need to relink the oracle binary first to turn on async I/O, using:

make -f $ORACLE_HOME/rdbms/lib/ins_rdbms.mk async_on

For Oracle 10g, filesystemio_options=SETALL in the init.ora should be all that is needed.
Action for NetApp: Please test and provide feedback ASAP.
Chuck Lever (NetApp) stated, "I've asked Sanjay Gulabani to look at testing this. However, Oracle has told me they are satisfied with this change." [NetApp closes issue.]
Testing on RHEL4 U3 beta shows that aio+dio over nfs looks ok: aio+dio over nfs seems to be working properly on x86 (smp and hugemem) with 10.2.0.1.

In some of the stress tests, aio-nr reaches:

hugemem - 2022K (2000 users)
smp     - 1308K (1000 users)

aio-max-nr being 3200K for both.

Below are some statistics for tests done over NFS. For the sake of homogeneity:

1. parameters: sga=1G, db_size=72G and db_block_size=8192; time and number of connections have been varied as per the memory.
2. mount options: rw,rsize=32768,wsize=32768,hard,nointr,bg,nfsvers=3,tcp,timeo=600

I. tpm numbers obtained for U3 on NFS
---------------------------------------------------------------
1. smp (6G memory), time=4hr, users=1000
   a) aio + dio - 4968
   b) aio alone - 2774
   c) dio alone - 4389

2. hugemem (16G memory), time=4hr, users=2000
   a) aio + dio - 4314
   b) aio alone - 4362
   c) dio alone - 4856

Comments: In the case of smp, aio alone gives a lower tpm. In the case of hugemem, there is just a slight difference in tpm.
---------------------------------------------------------------

II. tpm numbers for U2/U3 comparison on smp for aio, dio on NFS
---------------------------------------------------------------
time=1hr, users=1000

1. U3
   a) aio alone - 5184
   b) dio alone - 4839

2. U2
   a) aio alone - 4672
   b) dio alone - 4758

There seems to be a performance improvement in U3.
Testing on x86_64 looks ok as well. More updates from Anurag:

Further to my updates on aio+dio over nfs, here are some updates on x86_64 (single instance). On x86_64 also, the combination seems to be working fine. Some tests have been done over nfs:

1. parameters: sga=1G, db_size=72G, db_block_size=8192, users=700, time=4hrs
2. mount options: (same as above)

tpm numbers obtained for U3 on nfs
-----------------------------------------
1. smp kernel (8G memory)
   a) aio + dio - 4094
   b) aio alone - 4118
   c) dio alone - 3998

tpm looks more or less similar.

==========
Note: these tests aren't really trying to tune for max performance. They are mainly stressing the system to see if there are problems or huge performance issues.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0132.html