Bug 160568 - aio to gfs filesystem fails with aio-stress and oracle10g tpcc
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 4
Hardware: x86_64 Linux
Priority: high
Severity: medium
Assigned To: Wendy Cheng
QA Contact: GFS Bugs
Keywords: FutureFeature
Depends On:
Blocks: 164915
Reported: 2005-06-15 16:24 EDT by John Shakshober
Modified: 2010-01-11 22:05 EST
Fixed In Version: RHBA-2006-0234
Doc Type: Enhancement
Last Closed: 2006-03-09 14:45:34 EST

Attachments
Draft patch. (9.81 KB, patch)
2005-11-19 01:08 EST, Wendy Cheng
gfs_aio.patch.v1 (13.00 KB, patch)
2005-11-20 01:03 EST, Wendy Cheng
gfs_aio.patch.v2 (12.95 KB, patch)
2005-11-27 00:37 EST, Wendy Cheng

Description John Shakshober 2005-06-15 16:24:59 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041215 Firefox/1.0 Red Hat/1.0-12.EL4

Description of problem:
I recently tested RHEL4 U1 with GFS 6.1 and am getting a process hang when attempting to use AIO and DIO with GFS-mounted file systems.

strace shows Oracle simply calling io_getevents every 30 seconds.

Linux bigbaddell2.lab.boston.redhat.com 2.6.9-11.ELsmp #1 SMP Fri May 20 18:25:30 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux

We could then reproduce the problem by running aio-stress against the GFS filesystem.

Version-Release number of selected component (if applicable):
 2.6.9-11.ELsmp, GFS-kernel-smp-2.6.9-35.4

How reproducible:
Always

Steps to Reproduce:
1. Run aio-stress with -O (the O_DIRECT option) against a GFS filesystem.
2. Or try to start up an Oracle database with the DIO and AIO options.
  

Actual Results:  

[root@bigbaddell2 aio]#  ./aio-stress -s 1024 -r 64 -t 1 /oraclegfs/t1
file size 1024MB, record size 64KB, depth 64, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
threads 1 files 1 contexts 1 context offset 2MB verification off
adding file /oraclenfs/t1 thread 0
ret -22 (Invalid argument) on io_submit
error -1 on run_built

Alternatively, you can observe the Oracle hang. Oracle should check the return value from io_submit; instead, it just hangs in io_getevents.

ps -ef | grep ora

oracle   13977 13956  0 09:35 pts/3    00:00:00 sqlplus   as sysdba
oracle   13983     1  0 09:35 ?        00:00:00 ora_pmon_tpcc
oracle   13985     1  0 09:35 ?        00:00:00 ora_mman_tpcc
oracle   13987     1  0 09:35 ?        00:00:00 ora_dbw0_tpcc
oracle   13989     1  0 09:35 ?        00:00:00 ora_lgwr_tpcc
oracle   13991     1  0 09:35 ?        00:00:00 ora_ckpt_tpcc
oracle   13993     1  0 09:35 ?        00:00:00 ora_smon_tpcc
oracle   13995     1  0 09:35 ?        00:00:00 ora_reco_tpcc
oracle   13999     1  0 09:35 ?        00:00:00 oracletpcc (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root     14010  4997  0 09:41 pts/2    00:00:00 ps -ef

[root@bigbaddell2 oracle]# strace -p 13999
Process 13999 attached - interrupt to quit
io_getevents(0x2a96b9d000, 0x1, 0x400, 0x7fbffeee70, 0x7fbfff6f50) = 0
io_getevents(0x2a96b9d000, 0x1, 0x400, 0x7fbffeee70, 0x7fbfff6f50 <unfinished ...>
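
The point above about checking the return value from io_submit can be sketched as follows. This is only a minimal illustration, not what Oracle or aio-stress actually does: checked_aio_pread() and the sys_io_* wrappers are hypothetical names, and the sketch assumes a Linux system where the raw AIO system calls and <linux/aio_abi.h> are available. It waits in io_getevents only when io_submit reports that the request was queued.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrappers over the raw Linux AIO system calls (no libaio needed). */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{
	return syscall(SYS_io_setup, nr, ctx);
}

static long sys_io_submit(aio_context_t ctx, long n, struct iocb **iocbs)
{
	return syscall(SYS_io_submit, ctx, n, iocbs);
}

static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
			     struct io_event *events, struct timespec *timeout)
{
	return syscall(SYS_io_getevents, ctx, min_nr, nr, events, timeout);
}

static long sys_io_destroy(aio_context_t ctx)
{
	return syscall(SYS_io_destroy, ctx);
}

/*
 * Hypothetical helper: submit one asynchronous read and wait for it only
 * if the submit succeeded. Returns the byte count or a negative errno.
 */
static long checked_aio_pread(int fd, void *buf, size_t len, long long offset)
{
	aio_context_t ctx = 0;
	struct iocb cb;
	struct iocb *list[1] = { &cb };
	struct io_event ev;
	long ret;

	assert(buf != NULL);

	if (sys_io_setup(1, &ctx) < 0)
		return -errno;

	memset(&cb, 0, sizeof(cb));
	cb.aio_lio_opcode = IOCB_CMD_PREAD;
	cb.aio_fildes = fd;
	cb.aio_buf = (uint64_t)(uintptr_t)buf;
	cb.aio_nbytes = len;
	cb.aio_offset = offset;

	ret = sys_io_submit(ctx, 1, list);
	if (ret != 1) {
		/* Nothing was queued, so io_getevents would never see a completion. */
		ret = (ret < 0) ? -errno : -EAGAIN;
		sys_io_destroy(ctx);
		return ret;
	}

	/* The request is in flight; block until its completion event arrives. */
	do {
		ret = sys_io_getevents(ctx, 1, 1, &ev, NULL);
	} while (ret < 0 && errno == EINTR);

	if (ret == 1)
		ret = (long)ev.res;	/* bytes read, or a negative errno from the I/O */
	else
		ret = (ret < 0) ? -errno : -EIO;

	sys_io_destroy(ctx);
	return ret;
}
```

With a check like this, a submit that fails with -EINVAL (the "ret -22" that aio-stress printed) is reported to the caller immediately instead of turning into an indefinite wait.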

Additional info:
Comment 1 Jeffrey Moyer 2005-06-15 16:35:23 EDT
From a brief look at the gfs source code, it seems GFS doesn't support AIO.  The
AIO subsystem is failing the submit operation here:

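ssize_t aio_setup_iocb(struct kiocb *kiocb)
{
	struct file *file = kiocb->ki_filp;
	ssize_t ret = 0;

	switch (kiocb->ki_opcode) {
	case IOCB_CMD_PREAD:
		...
		/* Assume failure; only a filesystem that provides ->aio_read gets a retry method. */
		ret = -EINVAL;
		if (file->f_op->aio_read)
			kiocb->ki_retry = aio_pread;
		break;
	...
	/* No retry method was set, so the submit fails with ret (-EINVAL here). */
	if (!kiocb->ki_retry)
		return ret;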
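
To make that failure path concrete, here is a small user-space model of the check in the excerpt above. It is a sketch only: the model_* types and functions are hypothetical stand-ins, not kernel code, and the checks elided by "..." in the excerpt are assumed to pass.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for struct file_operations; only the field the check touches. */
struct model_file_ops {
	int (*aio_read)(void);	/* NULL when the filesystem has no AIO read method */
};

/* Stand-in for struct kiocb. */
struct model_kiocb {
	const struct model_file_ops *f_op;
	int (*ki_retry)(void);
};

/* Placeholder for the kernel's aio_pread retry method. */
static int model_aio_pread(void)
{
	return 0;
}

/* Mirrors the IOCB_CMD_PREAD branch: without ->aio_read the setup fails. */
static int model_setup_pread(struct model_kiocb *kiocb)
{
	int ret;

	assert(kiocb != NULL && kiocb->f_op != NULL);

	kiocb->ki_retry = NULL;
	ret = -EINVAL;
	if (kiocb->f_op->aio_read)
		kiocb->ki_retry = model_aio_pread;

	/* No retry method means the submit is rejected with -EINVAL. */
	if (!kiocb->ki_retry)
		return ret;
	return 0;
}
```

A filesystem whose operations table leaves aio_read unset therefore produces the -EINVAL (-22) that aio-stress reported on io_submit.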
Comment 8 Wendy Cheng 2005-11-19 01:08:07 EST
Created attachment 121258 [details]
Draft patch. 

Draft patch - target completion date: Dec. 15, 2005.
Comment 9 Wendy Cheng 2005-11-20 01:03:15 EST
Created attachment 121272 [details]
gfs_aio.patch.v1

Draft version 1: this version runs successfully through the default settings of the
aiocp.c test program (by Daniel McNeil, daniel@osdl.org).
Comment 10 Wendy Cheng 2005-11-27 00:37:51 EST
Created attachment 121516 [details]
gfs_aio.patch.v2

Runs successfully with various options of aiocp.c
and aio-stress.c on a single node, except for
./aio-stress -S testfile (i.e. a 1024MB data file
with the O_SYNC option), which eventually finishes but
is *very* slow.

So two test items for next week:

1. Figure out why O_SYNC is so slow.
2. Run the tests on >= 2 nodes (the test cases need
   to be rewritten first, though).
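
As a starting point for the first item, a rough timing comparison could look like the sketch below. It is not part of the patch or of aio-stress: time_writes() is a hypothetical helper, it uses plain synchronous write() rather than AIO, and the 64KB record size is borrowed from the aio-stress run in the description. It only shows how to compare the same write pattern with and without O_SYNC on a given mount.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `count` 64KB records to `path`, opened with
 * `extra_flags` (for example 0 or O_SYNC). Returns the elapsed time in
 * seconds, or -1.0 if the file cannot be opened or a write falls short.
 */
static double time_writes(const char *path, int extra_flags, int count)
{
	static char record[64 * 1024];
	struct timespec start, end;
	int fd, i;

	assert(path != NULL && count > 0);
	memset(record, 'x', sizeof(record));

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0600);
	if (fd < 0)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < count; i++) {
		if (write(fd, record, sizeof(record)) != (ssize_t)sizeof(record)) {
			close(fd);
			return -1.0;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	close(fd);

	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling it once with extra_flags set to 0 and once with O_SYNC, on a file under the GFS mount, gives a first indication of how much of the slowdown comes from the synchronous write path itself.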
Comment 17 Red Hat Bugzilla 2006-03-09 14:45:34 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0234.html
