Bug 165687 - Async IO generates high iowait and poor performance in 8 way SMP config
Summary: Async IO generates high iowait and poor performance in 8 way SMP config
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jeff Moyer
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 170417
 
Reported: 2005-08-11 12:07 UTC by Steve Russell
Modified: 2007-11-30 22:07 UTC (History)
4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-01-17 20:09:58 UTC
Target Upstream Version:
Embargoed:


Attachments
Preliminary per-cpu aio thread implementation (4.12 KB, patch)
2005-08-31 15:31 UTC, Jeff Moyer

Description Steve Russell 2005-08-11 12:07:36 UTC
Description of problem:
Under heavy IO load, any async IO process spins while waiting on the kvec 
functions in the kernel aio subsystem.  IOwait increases to 70-80% of total CPU 
time and the performance of aio is worse than sync io.  During this test low 
memory is not constrained (the Oracle SGA is allocated from hugeTLB pages, so the 
16G config isn't nearly as tight as is typical on the smp kernel).

Version-Release number of selected component (if applicable):
HP ProLiant DL740 G1     Linux RedHat_3     2.4.21-27.0.2.ELsmp
8x2998MHz (Intel(R)_Xe)     15.7Gb memory    4.9Tb filesystem (1.2Tb free)
EMC PowerPath 4.3.1-036
Qlogic 23xx 7.03 driver

How reproducible:
Any aio workload will create this, but the easiest way to reproduce is:
Oracle 9.2.0.6
async and directio enabled
2 dbwr processes
create 100GB tablespace

Actual results:
Time to complete is greater than with sync io.
IOwait is 70-80% of cpu (basically 7 cpus waiting for a kernel process on 1 cpu)

Expected results:
aio should be faster
iowait should be minimal.

Comment 1 Jeff Moyer 2005-08-11 14:25:58 UTC
In RHEL 3, the O_DIRECT flag is not supported with AIO.  So, when doing AIO to
file systems, it is not uncommon to get worse performance than SYNC I/O.  One
reason for this is that the AIO code path does not do readahead.

It would be more useful to get information on how AIO performs on an 8-way
system as compared to a 4-way.  If the performance degrades when adding CPUs,
then we can flag this as a scalability issue.

Do you have profiling data for this?  Can you point to specific code paths or
locks that are under contention?
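
For reference, the AIO submission pattern under discussion looks roughly like the
following.  This is only a minimal libaio sketch, not the Oracle code path; the
file name, block size, and error handling are placeholders.  Per the above, on
RHEL 3 the O_DIRECT flag on this path is ignored and the I/O goes through the
page cache anyway.

/* Build with: gcc -O2 -o aio_odirect aio_odirect.c -laio
 * Illustrative sketch only -- file name and sizes are placeholders. */
#define _GNU_SOURCE           /* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
        io_context_t ctx = 0;
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;
        void *buf;

        /* The application asks for unbuffered I/O; on RHEL 3 the flag is
           ignored on the AIO path and the write still uses the page cache. */
        int fd = open("/tmp/aio-test.dat", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* O_DIRECT needs sector-aligned buffers, lengths, and offsets. */
        if (posix_memalign(&buf, 512, 65536)) return 1;
        memset(buf, 0, 65536);

        if (io_setup(128, &ctx) < 0) { fprintf(stderr, "io_setup failed\n"); return 1; }

        io_prep_pwrite(&cb, fd, buf, 65536, 0);       /* one 64KB write at offset 0 */
        if (io_submit(ctx, 1, cbs) != 1) { fprintf(stderr, "io_submit failed\n"); return 1; }

        /* Reap the single completion. */
        if (io_getevents(ctx, 1, 1, &ev, NULL) != 1) { fprintf(stderr, "io_getevents failed\n"); return 1; }
        printf("completed, res=%ld\n", (long)ev.res);

        io_destroy(ctx);
        return 0;
}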

Comment 2 Steve Russell 2005-08-11 14:53:01 UTC
Hi,

the directio/asyncio config reference was for Oracle (which puts different 
processes in different modes, i.e. lgwr=direct and dbwr=async, though it is 
unclear whether dbwr is able to switch between modes).  While the general case 
you're stating is true (async reads after a write in an O_DIRECT model will be 
slower because the data is not in the page cache), it should not affect this 
particular test case, as Oracle tends not to issue reads for information it 
already has cached (since it's in the SGA, the disk io request never gets 
initiated).  There are other more 'real world' workloads that trigger this 
behavior, but this seems to be the easiest to duplicate.

I'll try to dig up the strace data we got, but the scalability comment was 
coming from RvR when we shared the trace with him several months ago.  Iain 
Grey just asked me to open this as a tracking bug so they could test in 
Westford. 

tx

Steve

Comment 3 Jeff Moyer 2005-08-11 15:23:39 UTC
I think you may have misinterpreted what I said above.

In the event that a process doing AIO specifies the O_DIRECT flag, the O_DIRECT
flag is ignored.  Therefore, all reads and writes go through the page cache,
incurring all of the extra overhead associated with that.  In the AIO code path
for file systems, we do not perform readahead, which is something that you would
get without AIO.

In our experience with RHEL 3, AIO on file systems is slower than SYNC I/O to
file systems.  I cannot speak to how the Oracle processes perform their I/O for
the above configuration (i.e. whether the dbwriter or logwriter does O_DIRECT +
aio).  I will look into this further to see if I can verify what you said above.


We currently have an 8-way system setup in our lab for testing.  I would like
clarification on one thing.  You mention that you created a 100GB table space. 
Is it the operation of creating the table space that took longer than the sync
io case?  So to reproduce, we need only create the table space?

Also, one last quick question.  Does "RvR" refer to Rik van Riel?

Thanks in advance.

Comment 4 John Shakshober 2005-08-11 16:07:45 UTC
FYI - we too have measured Oracle performance of TPC-C mini benchmarks 
on a 2-cpu/2-ht system when using aio+dio on rhel3 to ext3 filesystems:

http://people.redhat.com/dshaks/.perf/results_oracle/tpcc_ora10g_rhel4_u2.sxc

Thus we should NOT need the 8-cpu system to reproduce.  Our suggestion is to
use DIO but not AIO for RHEL3 Oracle EXT3 databases where the database size
exceeds the size of memory.  In the case above, we had a 36 GB database on our
4-cpu system with 4.0 GB of memory.


Comment 10 Steve Russell 2005-08-12 15:39:47 UTC
In a quick test of my 'create tablespace' suggestion, the results are:
aio+dio: 37:28.31
aio only: 22:10.02
dio only: 40:17.36
none: 25:11.49

which would seem to provide a contrary opinion....

Having said that, the block readahead features of rhel4 might turn this all on 
its head.

RvR is indeed Rik.  Creating the tablespace is sufficient to create the problem 
condition.  There are more realistic and complex ways of creating the 
situation, but that's a good start.  I'll try to create another case without 
Oracle.

Also, since it is relevant:
/proc/sys/fs/aio-max-size 2147483648

tx

Steve

Comment 11 Steve Russell 2005-08-12 15:50:10 UTC
though in this specific test, async did beat none and trounced o_dsync and thus 
is a slightly atypical result ;-).  


Comment 13 John Shakshober 2005-08-17 21:20:49 UTC
I rechecked the oracle image and found we needed to link it with the aio libs.

Once that was done, the data is roughly the same as Joe's earlier data:

 1) SIO (sync) performing best, 21m8s
 2) AIO (no dio) 2nd            26m5s
 3) DIO only (no aio) 3rd       27m10s

 * AIO+DIO (effectively shuts off DIO) same as AIO
IO is still happening at 70 MB/sec, but for a 200 MB/sec FC link that can do
150-180 MB/sec with aio, this is a hard performance hit.

Looks like on the 8-cpu box we are out of gas, with keventd being the only
process taking care of aio requests (see the top output: keventd is pegged on cpu7).

Also check out the profile data if you want to see whether we can speed up
memcpy_from_kvec_dst.

Shak

[root@xeon2 aio]# more orabuild_top.txt
95 processes: 93 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    2.3%    0.0%   11.7%   0.0%     0.6%    8.0%   77.2%
           cpu00    8.5%    0.0%    0.3%   0.0%     0.0%    0.0%   91.0%
           cpu01    0.0%    0.0%    0.0%   0.0%     0.0%    2.1%   97.8%
           cpu02    9.1%    0.0%    3.5%   0.0%     5.1%    0.0%   82.0%
           cpu03    0.7%    0.0%    6.5%   0.0%     0.0%   39.2%   53.3%
           cpu04    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
           cpu05    0.0%    0.0%    4.9%   0.0%     0.0%    0.0%   95.0%
           cpu06    0.1%    0.0%    0.7%   0.0%     0.0%    0.1%   98.8%
           cpu07    0.0%    0.0%   77.2%   0.0%     0.0%   22.7%    0.0%
Mem:  8205784k av, 8190420k used,   15364k free,       0k shrd,   15552k buff
                   6095448k actv, 1168128k in_d,  132980k in_c
Swap: 2040244k av,       0k used, 2040244k free                 7555784k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
   10 root      25   0     0    0     0 RW    9.6  0.0   4:09   7 keventd
30223 oracle    16   0 21584  14M  9068 S     2.3  0.1   1:19   2 oracle
   19 root      18   0     0    0     0 SW    1.0  0.0   4:56   5 kswapd
 3497 root      16   0     0    0     0 SW    0.7  0.0   4:04   3 kjournald
    1 root      15   0   508  508   444 S     0.0  0.0   0:13   0 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0

profile:

0000000000000000 total                                     45258   0.0263
0000000002109100 default_idle                              38813 485.1625
0000000002144c60 memcpy_from_kvec_dst                       3540   5.8224
0000000002155020 launder_page                                 160   0.0524
00000000022a4220 __br_write_lock                             136   2.1250
000000000214cc90 __iodesc_write_page                         131   0.1300
00000000021685a0 get_unused_buffer_head                      113   0.4708
0000000002148ce0 unlock_page                                  87   0.7768
0000000002126a23 .text.lock.sched                             82   0.1320
0000000002147bf0 __remove_inode_page                          71   0.4931
0000000002156680 rebalance_laundry_zone                       68   0.0264
00000000022a4260 __br_write_unlock                            66   1.7838
00000000021548ab .text.lock.swap                              66   0.5641
0000000002140780 follow_page                                  66   0.0897
0000000002159440 rmqueue                                      63   0.0729
Wed Aug 17 11:58:51 EDT 2005

procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
 2  3      0  19156 409088 7157528    0    0     1     9    5     5  0  0  0  4
 1  1      0  16444 409088 7160508    0    0     0 76010  882   523  4 15  6 75
 3  1      0  15508 409088 7161228    0    0     2 70134  837   489  3 15  6 76

Linux 2.4.21-34.ELhugemem (xeon2.lab.boston.redhat.com)         08/17/2005



avg-cpu:  %user   %nice    %sys %iowait   %idle
           3.45    0.00   15.46    5.52   75.57

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util

ida/c0d1     0.50   2.60  0.10  0.90    4.80   28.00     2.40    14.00    32.80
    0.07   73.00  71.00   7.10
ida/c0d1p1   0.50   2.60  0.10  0.90    4.80   28.00     2.40    14.00    32.80
    0.07   73.00  71.00   7.10
sdf          0.00 16401.20  0.00 1128.50    0.00 140240.00     0.00 70120.00  
124.27   248.67  220.67   0.50  55.90
sdf1         0.00 16401.20  0.00 1128.50    0.00 140240.00     0.00 70120.00  
124.27   248.67  220.67   0.50  55.90

Comment 14 Steve Russell 2005-08-23 20:24:54 UTC
Any update?

Comment 15 Jeff Moyer 2005-08-23 20:48:22 UTC
While the data does show a lot of CPU time taken up by keventd, it does not show
a decrease in I/O throughput for the system.

I would like to know precisely why this bugzilla has been filed.  The summary
mentions high I/O wait time during heavy I/O, and this is exactly what you get
under heavy I/O: the O/S has to wait for I/Os to complete.  Now, what precisely
is meant by poor performance?

From further comments in the bug, it sounds like the real concern is that AIO
doesn't buy you much, if anything, under RHEL 3, when used with file systems. 
We have corroborated that data, and provided some advice on tuning.

So, what is the problem being reported?

Comment 16 Steve Russell 2005-08-24 11:17:06 UTC
Hi,

Fundamentally, we're seeing two behaviors: iowait is going up long before 
it should, and absolute IO performance is going down.  From comment 13, your 
testing shows roughly a 50% decrease in throughput (compared to sync) and 
net performance that is 30-40% slower (which I would categorize as a problem).

I can see the behavior with as little as 5-10MB/sec in throughput (to a 10-disk 
stripe with 4 200M channels to a dedicated array with 4G of cache).  Basically it 
looks like the 7 processors are waiting on keventd, which seems to write out less 
efficiently than it should and thus makes kio slower than standard sync IO (which 
it shouldn't be).

If you've provided any advice on tuning (other than to suggest turning on dio 
which is the slowest of all the tests that either of us have performed) it must 
be in a private comment.  Happy to test any tuning suggestions you have once I 
can see them.

tx

Steve

Comment 17 Jeff Moyer 2005-08-24 13:15:44 UTC
Steve,

OK, thanks for clarification.  I'll work on a patch to remove some of the
bottleneck from keventd.  No timeframes yet on availability, but I'll let you
know when I have something that works.

Thanks,

Jeff

Comment 18 John Shakshober 2005-08-30 19:20:08 UTC
Output from the new kernel:
2.4.21-34.ELaio.percpu.1hugemem #1 SMP Thu Aug 25 16:30:16 EDT

A few changes since the last runs: using 2 db writers and hugemem kernels
for all runs (the prior report had sync IO using 1 db writer).

bash-2.05b$ grep real *3.txt

cr_jeffm_sync3.txt:real         25m11.812s  = 1511.81 s
cr_jeffm_async3.txt:real        26m31.007s  = 1591.01 s

Thus Jeff's patch has effectively moved the time spent in keventd
out to the aio threads, which are no longer saturating a single cpu.

Without faster IO underneath the lun, the overall performance of the tablespace
create remains within 5.2% of the baseline result (1591s/1511s).

(Please correct the previous runtime baseline for 2 dbwriters to 25m39.046s)

In both cases, using more dbwriters and/or asynchronous multi-threaded IO
does not help improve the performance of a single-file database create.
Other loads that write to multiple files and file systems, or that run on systems
with greater IO capabilities, may benefit more from AIO.

Shak


Using Oracle aio_only;

92 processes: 89 sleeping, 3 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
          total    2.8%    0.0%   11.6%   0.0%     0.4%    8.9%   76.1%
          cpu00    0.7%    0.0%    2.5%   0.0%     0.0%    0.3%   96.2%
          cpu01    0.0%    0.0%    7.5%   0.0%     0.0%    4.5%   87.8%
          cpu02    0.0%    0.0%    7.7%   0.0%     1.9%   50.6%   39.5%
          cpu03    0.0%    0.0%    1.7%   0.0%     1.3%    0.0%   96.8%
          cpu04   17.7%    0.0%   19.9%   0.0%     0.0%    5.9%   56.3%
          cpu05    0.9%    0.0%   19.5%   0.0%     0.0%    4.1%   75.2%
          cpu06    2.9%    0.0%   23.3%   0.0%     0.0%    2.5%   71.1%
          cpu07    0.0%    0.0%   10.7%   0.0%     0.0%    3.3%   85.8%
Mem:  8205772k av,  946812k used, 7258960k free,       0k shrd,   12888k buff
      841928k active,              10856k inactive
Swap: 2040244k av,       0k used, 2040244k free                  800172k cached

 PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
  29 root      16   0     0    0     0 SW    2.8  0.0   0:01   6 aio/6
1861 oracle    24   0 29904  22M 17396 R     2.8  0.2   0:02   4 oracle
  28 root      16   0     0    0     0 RW    2.4  0.0   0:02   5 aio/5
  27 root      16   0     0    0     0 SW    2.3  0.0   0:02   4 aio/4
  30 root      16   0     0    0     0 SW    1.2  0.0   0:00   7 aio/7
  24 root      16   0     0    0     0 SW    0.9  0.0   0:00   1 aio/1
1791 root      15   0     0    0     0 SW    0.9  0.0   0:00   2 kjournald
  10 root      15   0     0    0     0 SW    0.3  0.0   0:00   3 keventd
   1 root      15   0   508  508   448 S     0.0  0.0   0:08   2 init
   2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0

-bash-2.05b$ iostat -x 10 | grep sdf1
sdf1         0.03 767.93  1.04 27.10   14.33 6368.66     7.17  3184.33   226.83
   12.06  425.79   1.39   3.91
sdf1         0.00 15211.20 14.70 544.00  116.80 126147.20    58.40 63073.60  
226.00   220.51  393.81   1.03  57.50
sdf1         0.00 16784.50 13.50 701.50  108.80 139723.20    54.40 69861.60  
195.57   134.84  198.43   1.03  74.00
sdf1         0.00 11036.60 15.60 597.60  124.00 93309.60    62.00 46654.80  
152.37   165.26  263.86   0.66  40.60
sdf1         0.00 13632.20 13.80 705.40  110.40 114704.80    55.20 57352.40  
159.64   259.45  360.74   0.90  64.40
sdf1         0.00 12817.10 14.00 771.70  112.00 108712.80    56.00 54356.40  
138.51   213.49  272.43   0.67  52.90
sdf1         0.00 13701.60 14.30 925.20  114.40 117057.60    57.20 58528.80  
124.72   227.56  243.78   0.58  54.10
sdf1         0.00 14114.90 14.70 892.40  117.60 120079.20    58.80 60039.60  
132.51   229.90  251.84   0.61  54.90
sdf1         0.00 13628.97 13.79 743.66  110.29 114965.03    55.14 57482.52  
151.93   219.74  289.27   0.71  53.85
sdf1         0.00 13229.60 14.30 670.60  114.40 111203.20    57.20 55601.60  
162.53   213.42  311.70   0.78  53.1


With SYNC IO:



92 processes: 90 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
          total    1.6%    0.0%   10.7%   0.0%     5.0%    1.1%   81.2%
          cpu00    0.0%    0.0%    5.7%   0.0%     0.0%    0.0%   94.2%
          cpu01    0.9%    0.0%    1.9%   0.0%     0.0%    0.9%   96.1%
          cpu02    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
          cpu03    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
          cpu04   12.5%    0.0%   78.8%   0.0%     0.0%    8.6%    0.0%
          cpu05    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
          cpu06    0.0%    0.0%    0.0%   0.0%    40.3%    0.0%   59.6%
          cpu07    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
Mem:  8205772k av, 8190496k used,   15276k free,       0k shrd,   49944k buff
                   258108k actv, 5866336k in_d,  133168k in_c
Swap: 2040244k av,       0k used, 2040244k free                 7791572k cached

 PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
2052 oracle    25   0 29692  22M 17180 R    11.3  0.2   5:05   4 oracle
  19 root      16   0     0    0     0 SW    0.7  0.0   2:06   0 kswapd
2085 oracle    20   0  1124 1124   892 R     0.2  0.0   0:00   1 top
   1 root      15   0   304  304   244 S     0.0  0.0   0:08   1 init
   2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0 

Comment 19 Steve Russell 2005-08-31 07:18:07 UTC
Thanks for this; it looks very promising.  Can I get a link to download an SMP 
version of this kernel (we don't use hugemem when other 3rd-party software is 
involved)?  We'll stick it in the more 'real world' workloads and let you know 
how it does when aio is unleashed.  Related question: I assume both of our dio 
tests were so bad (compared to sync) because the tablespace create does lots of 
re-reads (I haven't traced the process, so I don't know this to be fact)?  It's a 
logical assumption, but I wondered if any of your tracing showed that behavior?

Lastly, we're fairly late in the beta cycle here, but is the intent to roll 
this into u6?

tx

Steve


Comment 20 Jeff Moyer 2005-08-31 14:10:12 UTC
I'll kick off an SMP build, but please note that this change has NOT been checked
into CVS and is NOT a supported build.  I'll attach a copy of the patch to this
bugzilla, too, so there is some record of it.  Note that the patch, in its
current state, is not acceptable for merging.  There are some known issues with
UP builds that will have to be addressed.

In order to put a patch like this into the next update we would need a
substantial amount of testing under varied workloads to ensure it does not
introduce regressions.

Do you have a baseline to compare this kernel to?  How are you going to quantify
the performance impact?

Comment 21 Jeff Moyer 2005-08-31 15:31:39 UTC
Created attachment 118304 [details]
Preliminary per-cpu aio thread implementation

Here is the patch that was used during the db create runs that shak
benchmarked.

Comment 22 Jeff Moyer 2005-08-31 15:35:22 UTC
You can find the smp kernel rpm for i686 at:

http://people.redhat.com/.bz165687/kernel-smp-2.4.21-34.ELaio.percpu.1.i686.rpm

Please let us know how your testing goes.

Thanks.

Comment 23 Ernie Petrides 2005-08-31 18:45:04 UTC
Steve, in answer to the last question in comment #19, U6 is already closed.

Comment 24 Jeff Moyer 2005-08-31 21:57:50 UTC
(In reply to comment #22)
> You can find the smp kernel rpm for i686 at:
> 
> http://people.redhat.com/.bz165687/kernel-smp-2.4.21-34.ELaio.percpu.1.i686.rpm

Of course, I meant:

http://people.redhat.com/jmoyer/.bz165687/kernel-smp-2.4.21-34.ELaio.percpu.1.i686.rpm

Comment 25 Steve Russell 2005-09-02 08:18:04 UTC
Hi,

We'll download it today, stick it together with powerpath and potentially get 
some results this afternoon/early next week.  The tests that we'll use are:
- parallel insert test, sql load of 2, 4, 8, 16, 32 clients doing 10,000 4k 
inserts each (though as this gates mostly on lgwr don't expect a huge 
improvement here).
- hammerora (models tpc-c)
- potentially parallel index rebuilds on some big tables, should stress the 
dbwr processes without gating on lgwr (which always does sync io).

tx

Steve

Comment 26 Steve Russell 2005-09-02 09:30:56 UTC
Looking at this a bit more, the directio stuff looks a bit dodgy too: dio is 
slower than sync io, and both sets show almost no read activity (and thus 
presumably no re-read activity).  Any thoughts on why?  It certainly seems to 
diverge from Shak's testing of rhel4 (which is obviously different in more ways 
than it is useful to count).

tx

Steve

Comment 27 Steve Russell 2005-09-14 18:19:18 UTC
Sorry for the delay on this.  I was out for vacation and the guys didn't have 
enough free time to test in my absence.  Will try to get something out this 
week.  Any thoughts on the dio vs sync discrepancy?

Tx

Steve

Comment 28 Steve Russell 2005-09-21 14:00:00 UTC
Hi,

Here are some of the long-awaited results.  Summary: better, but still odd.  It looks
like in our test (unlike yours) the aio threads are added, but only 2 of them
get dispatched to (thus shrinking aio time by half, but still leaving it much
higher than everything else).  Did you integrate your kernel with PowerPath?  We
suspect that's making both the original situation and the testing of your fix
less than optimal.  Here's the data (sorry for the long post); we used a 10GB
tablespace rather than a 100GB one to save time:

Summary
-------

Neither Direct IO or Async	Elapsed: 00:05:28.39
Direct IO Only			Elapsed: 00:06:30.73
Async IO only			Elapsed: 00:20:01.98


Regards

Jeremy



--------------------------------------------------------------------------------------------

Direct IO Only

Elapsed: 00:06:30.73


xldn1336por(parkerjb)::/$ iostat -x 30|grep emcpowerm1
emcpowerm1   1.34 1828.08  0.01  0.00   61.12 16300.61    30.56  
8150.31 1700766.53     1.79    0.25 103271.67  99.35
emcpowerm1   0.00   1.77  0.00  0.00    0.53   54.93     0.27    27.47     
0.00   831.94    0.00   0.00 100.00
emcpowerm1   0.00   3.10  0.00  0.00    0.00  117.07     0.00    58.53     
0.00   929.08    0.00   0.00 100.00
emcpowerm1   0.00   1.20  0.00  0.00    0.00   26.40     0.00    13.20     
0.00   375.04    0.00   0.00 100.00
emcpowerm1   0.00   1.23  0.00  0.00 1785.13  127.33   892.57    63.67     
0.00   336.05    0.00   0.00 100.00
emcpowerm1   0.00 3055.27  0.00  0.00  468.00 45179.60   234.00 
22589.80     0.00     5.16    0.00   0.00 100.00
emcpowerm1   0.00 3937.70  0.00  0.00   24.80 63556.20    12.40 
31778.10     0.00    78.67    0.00   0.00 100.00
emcpowerm1   0.00 1577.30  0.00  0.00    4.27 33154.80     2.13 
16577.40     0.00    18.88    0.00   0.00 100.00
emcpowerm1   0.00 1268.37  0.00  0.00   21.33 27344.17    10.67 
13672.08     0.00   408.74    0.00   0.00 100.00
emcpowerm1   0.00 632.17  0.00  0.00    5.87 13291.37     2.93  6645.68     
0.00  1129.82    0.00   0.00 100.00
emcpowerm1   0.00 3134.77  0.00  0.00   17.07 29010.20     8.53 
14505.10     0.00  1037.75    0.00   0.00 100.00
emcpowerm1   0.00 2713.47  0.00  0.00    3.73 22705.67     1.87 
11352.83     0.00    74.73    0.00   0.00 100.00
emcpowerm1   0.00 3118.13  0.00  0.00   20.27 26067.93    10.13 
13033.97     0.00   476.41    0.00   0.00 100.00
emcpowerm1   0.00 2960.83  0.00  0.00    4.27 24739.27     2.13 
12369.63     0.00   605.07    0.00   0.00 100.00
emcpowerm1   0.00 10782.07  0.00  0.00   57.33 89999.27    28.67 
44999.63     0.00   400.00    0.00   0.00 100.00
emcpowerm1   0.00 12510.30  0.00  0.00  248.80 104398.47   124.40 
52199.23     0.00   204.41    0.00   0.00 100.00
emcpowerm1   0.00 12513.27  0.00  0.00    8.00 104446.47     4.00 
52223.23     0.00   617.52    0.00   0.00 100.00
emcpowerm1   0.00 12538.67  0.00  0.00   19.20 104641.80     9.60 
52320.90     0.00  1272.01    0.00   0.00 100.00
emcpowerm1   0.00 1866.74  0.00  0.00   72.78 15659.11    36.39  
7829.56     0.00  1055.58    0.00   0.00  99.97
emcpowerm1   0.00   1.40  0.00  0.00   16.00   74.93     8.00    37.47     
0.00  1051.46    0.00   0.00 100.00
emcpowerm1   0.00   1.40  0.00  0.00    4.27   49.73     2.13    24.87     
0.00   206.68    0.00   0.00 100.00
emcpowerm1   0.00   1.40  0.00  0.00   16.53   51.93     8.27    25.97     
0.00    14.35    0.00   0.00 100.00
emcpowerm1   0.00   1.40  0.00  0.00    4.27   49.67     2.13    24.83     
0.00   146.39    0.00   0.00 100.00
emcpowerm1   0.00   1.40  0.00  0.00   16.00   49.73     8.00    24.87     
0.00  1100.26    0.00   0.00 100.00





--------------------------------------------------------------------------------------------

Using async IO only

Elapsed: 00:20:01.98

xldn1336por(parkerjb)::/$ iostat -x 30|grep emcpowerm1
emcpowerm1   1.45 1913.43  0.01  0.00   66.11 16588.80    33.05  
8294.40 1573317.49     0.81    0.25 93789.69  99.28
emcpowerm1  13.14 5422.64  0.00  0.00  422.01 71277.09   211.00 
35638.55     0.00   310.72    0.00   0.00 100.03
emcpowerm1   0.03 416.93  0.00  0.00    5.33 18970.67     2.67  9485.33     
0.00   128.49    0.00   0.00 100.00
emcpowerm1   0.00 179.14  0.00  0.00    1.60 6596.47     0.80  3298.23     
0.00   156.17    0.00   0.00  99.97
emcpowerm1   0.00  87.67  0.00  0.00    0.80 5608.27     0.40  2804.13     
0.00   622.60    0.00   0.00 100.00
emcpowerm1   0.00  88.73  0.00  0.00    0.80 5514.13     0.40  2757.07     
0.00   160.67    0.00   0.00 100.00
emcpowerm1   0.00  86.93  0.00  0.00    0.80 5068.80     0.40  2534.40     
0.00    77.42    0.00   0.00 100.00
emcpowerm1   0.00  98.20  0.00  0.00    0.53 5133.33     0.27  2566.67     
0.00   814.05    0.00   0.00 100.00
emcpowerm1   0.00 125.53  0.00  0.00    0.80 5850.93     0.40  2925.47     
0.00   358.20    0.00   0.00 100.00
emcpowerm1   0.00 147.23  0.00  0.00    0.80 5952.53     0.40  2976.27     
0.00   123.99    0.00   0.00 100.00
emcpowerm1   0.00 194.37  0.00  0.00    1.87 7236.00     0.93  3618.00     
0.00   242.19    0.00   0.00 100.00
emcpowerm1   0.03 174.33  0.00  0.00    4.53 6754.93     2.27  3377.47     
0.00   167.82    0.00   0.00 100.00
emcpowerm1   0.00 147.87  0.00  0.00    1.33 6210.40     0.67  3105.20     
0.00   373.88    0.00   0.00 100.00
emcpowerm1   0.00 138.23  0.00  0.00    0.53 6503.73     0.27  3251.87     
0.00   401.56    0.00   0.00 100.00
emcpowerm1   0.00 171.87  0.00  0.00    0.80 7093.60     0.40  3546.80     
0.00   145.92    0.00   0.00 100.00
emcpowerm1   0.00 239.27  0.00  0.00    0.53 6769.07     0.27  3384.53     
0.00   661.91    0.00   0.00 100.00
emcpowerm1   0.00 637.53  0.00  0.00    2.40 10381.33     1.20  5190.67     
0.00   405.76    0.00   0.00 100.00
emcpowerm1   0.00 1239.13  0.00  0.00    2.67 12441.07     1.33  
6220.53     0.00   954.73    0.00   0.00 100.00
emcpowerm1   0.00  85.23  0.00  0.00    1.33 6705.87     0.67  3352.93     
0.00   414.92    0.00   0.00 100.00
emcpowerm1   0.00  97.87  0.00  0.00    1.33 7205.60     0.67  3602.80     
0.00   172.06    0.00   0.00 100.00
emcpowerm1   0.00 400.37  0.00  0.00   11.20 11632.92     5.60  5816.46     
0.00   838.73    0.00   0.00  99.97
emcpowerm1   0.17 849.63  0.00  0.00   54.13 20847.73    27.07 10423.87     
0.00   349.42    0.00   0.00 100.00
emcpowerm1   0.00 371.30  0.00  0.00    9.87 7137.87     4.93  3568.93     
0.00   410.53    0.00   0.00 100.00
emcpowerm1   0.00 316.50  0.00  0.00    8.27 7054.13     4.13  3527.07     
0.00    62.71    0.00   0.00 100.00
emcpowerm1   0.00 273.70  0.00  0.00    7.47 7039.47     3.73  3519.73     
0.00   211.97    0.00   0.00 100.00
emcpowerm1   0.10 322.97  0.00  0.00   15.73 7799.47     7.87  3899.73     
0.00   672.85    0.00   0.00 100.00
emcpowerm1   0.00 577.83  0.00  0.00    7.73 9243.47     3.87  4621.73     
0.00    29.68    0.00   0.00 100.00
emcpowerm1   0.00 1051.40  0.00  0.00    8.00 11900.27     4.00  
5950.13     0.00    31.37    0.00   0.00 100.00
emcpowerm1   0.00 1157.30  0.00  0.00    2.40 14036.80     1.20  
7018.40     0.00   818.47    0.00   0.00 100.00
emcpowerm1   0.00 894.33  0.00  0.00    2.67 12834.40     1.33  6417.20     
0.00   160.16    0.00   0.00 100.00
emcpowerm1   0.00 801.63  0.00  0.00    0.53 11448.80     0.27  5724.40     
0.00   157.34    0.00   0.00 100.00
emcpowerm1   0.00 738.40  0.00  0.00    1.07 11672.53     0.53  5836.27     
0.00   460.79    0.00   0.00 100.00
emcpowerm1   0.00 1159.67  0.00  0.00    0.53 18200.80     0.27  
9100.40     0.00   140.92    0.00   0.00 100.00
emcpowerm1   0.00 2565.10  0.00  0.00    0.00 43179.47     0.00 
21589.73     0.00   133.86    0.00   0.00 100.00
emcpowerm1   0.00 1500.93  0.00  0.00    0.00 33383.47     0.00 
16691.73     0.00   422.59    0.00   0.00 100.00
emcpowerm1   0.00 1196.77  0.00  0.00    0.00 26195.47     0.00 
13097.73     0.00   243.73    0.00   0.00 100.00
emcpowerm1   0.00 654.92  0.00  0.00    0.27 13652.52     0.13  6826.26     
0.00   591.95    0.00   0.00  99.97
emcpowerm1   0.00 3958.87  0.00  0.00    0.00 34173.07     0.00 
17086.53     0.00   504.45    0.00   0.00 100.00
emcpowerm1   0.00 2770.70  0.00  0.00    0.00 22968.53     0.00 
11484.27     0.00   353.74    0.00   0.00 100.00
emcpowerm1   0.00 3341.63  0.00  0.00    0.00 27723.73     0.00 
13861.87     0.00   674.83    0.00   0.00 100.00
emcpowerm1   0.00 3463.19  0.00  0.00    0.00 28706.73     0.00 
14353.36     0.00   562.95    0.00   0.00 100.03
emcpowerm1   0.00 11889.89  0.00  0.00    0.00 98695.93     0.00 
49347.97     0.00   611.48    0.00   0.00  99.97
emcpowerm1   0.00   1.77  0.00  0.00    0.27   57.87     0.13    28.93     
0.00   216.47    0.00   0.00 100.00
emcpowerm1   0.00   1.73  0.00  0.00    0.00   53.07     0.00    26.53     
0.00    58.16    0.00   0.00 100.00




Three sample top outputs:


 14:13:54  up  4:33,  5 users,  load average: 3.42, 3.54, 2.68
219 processes: 218 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    
idle
           total    0.9%    0.0%    2.1%   0.0%     0.0%   43.9%   
52.8%
           cpu00    0.3%    0.0%    0.7%   0.0%     0.0%   98.8%    
0.0%
           cpu01    0.1%    0.0%    1.1%   0.0%     0.0%   96.4%    
2.1%
           cpu02    0.1%    0.0%   13.7%   0.0%     0.1%   85.8%    
0.0%
           cpu03    0.1%    0.0%    0.1%   0.0%     0.0%   70.4%   
29.1%
           cpu04    6.3%    0.0%    0.5%   0.0%     0.1%    0.0%   
92.8%
           cpu05    0.1%    0.0%    0.1%   0.0%     0.1%    0.0%   
99.4%
           cpu06    0.1%    0.0%    0.0%   0.0%     0.0%    0.0%   
99.8%
           cpu07    0.0%    0.0%    0.3%   0.0%     0.0%    0.0%   
99.6%
Mem:  16423520k av, 13852980k used, 2570540k free,       0k shrd,   
38520k buff
                   8128736k actv,  333468k in_d,  359212k in_c
Swap: 8388600k av,       0k used, 8388600k free                 
9047436k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU 
COMMAND
   25 root      15   0     0    0     0 DW    1.5  0.0   1:31   2 aio/2
24929 oracle    15   0 1155M 1.1G 1143M S     0.8  7.1   0:39   4 
oracle
   21 root      15   0     0    0     0 DW    0.1  0.0   1:42   1 
bdflush
    1 root      15   0   524  524   460 S     0.0  0.0   0:06   6 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 
migration/0
    3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1 
migration/1
    4 root      RT   0     0    0     0 SW    0.0  0.0   0:00   2 
migration/2
    5 root      RT   0     0    0     0 SW    0.0  0.0   0:00   3 
migration/3
    6 root      RT   0     0    0     0 SW    0.0  0.0   0:00   4 
migration/4
    7 root      RT   0     0    0     0 SW    0.0  0.0   0:00   5 
migration/5
    8 root      RT   0     0    0     0 SW    0.0  0.0   0:00   6 
migration/6
    9 root      RT   0     0    0     0 SW    0.0  0.0   0:00   7 
migration/7
   10 root      15   0     0    0     0 SW    0.0  0.0   0:00   5 
keventd
   11 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 
ksoftirqd/0
   12 root      34  19     0    0     0 SWN   0.0  0.0   0:00   1 
ksoftirqd/1
   13 root      34  19     0    0     0 SWN   0.0  0.0   0:00   2 
ksoftirqd/2
   14 root      39  19     0    0     0 SWN   0.0  0.0   0:00   3 
ksoftirqd/3
   15 root      34  19     0    0     0 SWN   0.0  0.0   0:00   4 
ksoftirqd/4
   16 root      39  19     0    0     0 SWN   0.0  0.0   0:00   5 
ksoftirqd/5
   17 root      39  19     0    0     0 SWN   0.0  0.0   0:00   6 
ksoftirqd/6
   18 root      39  19     0    0     0 SWN   0.0  0.0   0:00   7 
ksoftirqd/7
   19 root      15   0     0    0     0 SW    0.0  0.0   2:22   7 
kswapd
   20 root      15   0     0    0     0 SW    0.0  0.0   0:01   7 
kscand
   22 root      15   0     0    0     0 SW    0.0  0.0   0:15   5 
kupdated
   23 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 aio/0
   24 root      15   0     0    0     0 SW    0.0  0.0   0:00   1 aio/1
   26 root      15   0     0    0     0 SW    0.0  0.0   0:00   3 aio/3
   27 root      15   0     0    0     0 SW    0.0  0.0   0:00   4 aio/4
   28 root      15   0     0    0     0 SW    0.0  0.0   0:00   5 aio/5
   29 root      15   0     0    0     0 SW    0.0  0.0   0:00   6 aio/6
   30 root      15   0     0    0     0 SW    0.0  0.0   0:00   7 aio/7
   31 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 
mdrecoveryd




 14:07:28  up  4:27,  5 users,  load average: 3.64, 3.14, 2.13
219 processes: 218 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    
idle
           total    0.1%    0.0%    0.7%   0.0%     0.1%   40.9%   
57.8%
           cpu00    0.5%    0.0%    0.0%   0.0%     0.0%    0.0%   
99.4%
           cpu01    0.0%    0.0%    0.0%   0.0%     0.0%   38.2%   
61.7%
           cpu02    0.0%    0.0%    3.1%   0.0%     0.1%   96.6%    
0.0%
           cpu03    0.1%    0.0%    1.3%   0.0%     0.1%   90.2%    
7.9%
           cpu04    0.5%    0.1%    0.9%   0.0%     0.0%    0.0%   
98.2%
           cpu05    0.1%    0.3%    0.0%   0.0%     0.5%    0.0%   
98.8%
           cpu06    0.0%    0.0%    0.1%   0.0%     0.1%    3.1%   
96.4%
           cpu07    0.0%    0.0%    0.5%   0.0%     0.0%   99.4%    
0.0%
Mem:  16423520k av, 10143028k used, 6280492k free,       0k shrd,   
32344k buff
                   4423732k actv,  333508k in_d,  356492k in_c
Swap: 8388600k av,       0k used, 8388600k free                 
5346464k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU 
COMMAND
   25 root      15   0     0    0     0 DW    0.3  0.0   0:47   2 aio/2
    1 root      19   0   524  524   460 S     0.0  0.0   0:06   3 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 
migration/0
    3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1 
migration/1
    4 root      RT   0     0    0     0 SW    0.0  0.0   0:00   2 
migration/2
    5 root      RT   0     0    0     0 SW    0.0  0.0   0:00   3 
migration/3
    6 root      RT   0     0    0     0 SW    0.0  0.0   0:00   4 
migration/4
    7 root      RT   0     0    0     0 SW    0.0  0.0   0:00   5 
migration/5
    8 root      RT   0     0    0     0 SW    0.0  0.0   0:00   6 
migration/6
    9 root      RT   0     0    0     0 SW    0.0  0.0   0:00   7 
migration/7
   10 root      15   0     0    0     0 SW    0.0  0.0   0:00   5 
keventd
   11 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 
ksoftirqd/0
   12 root      34  19     0    0     0 SWN   0.0  0.0   0:00   1 
ksoftirqd/1
   13 root      34  19     0    0     0 SWN   0.0  0.0   0:00   2 
ksoftirqd/2
   14 root      39  19     0    0     0 SWN   0.0  0.0   0:00   3 
ksoftirqd/3
   15 root      34  19     0    0     0 SWN   0.0  0.0   0:00   4 
ksoftirqd/4
   16 root      39  19     0    0     0 SWN   0.0  0.0   0:00   5 
ksoftirqd/5
   17 root      39  19     0    0     0 SWN   0.0  0.0   0:00   6 
ksoftirqd/6
   18 root      39  19     0    0     0 SWN   0.0  0.0   0:00   7 
ksoftirqd/7
   21 root      15   0     0    0     0 DW    0.0  0.0   1:36   3 
bdflush
   19 root      15   0     0    0     0 SW    0.0  0.0   2:22   5 
kswapd
   20 root      15   0     0    0     0 SW    0.0  0.0   0:01   1 
kscand
   22 root      15   0     0    0     0 SW    0.0  0.0   0:15   6 
kupdated
   23 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 aio/0
   24 root      15   0     0    0     0 SW    0.0  0.0   0:00   1 aio/1
   26 root      15   0     0    0     0 SW    0.0  0.0   0:00   3 aio/3
   27 root      15   0     0    0     0 SW    0.0  0.0   0:00   4 aio/4
   28 root      15   0     0    0     0 SW    0.0  0.0   0:00   5 aio/5
   29 root      15   0     0    0     0 SW    0.0  0.0   0:00   6 aio/6
   30 root      15   0     0    0     0 SW    0.0  0.0   0:00   7 aio/7






 14:05:37  up  4:25,  5 users,  load average: 3.02, 2.92, 1.95
220 processes: 219 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    
idle
           total    0.3%    0.0%    2.1%   0.0%     0.7%   33.2%   
63.4%
           cpu00    2.1%    0.0%    0.5%   0.0%     0.0%    0.0%   
97.2%
           cpu01    0.3%    0.3%    3.1%   0.0%     0.0%   68.9%   
27.0%
           cpu02    0.1%    0.0%   10.7%   0.0%     1.3%   87.6%    
0.0%
           cpu03    0.0%    0.0%    0.1%   0.0%     0.7%    0.0%   
99.0%
           cpu04    0.1%    0.0%    0.1%   0.0%     1.3%    0.0%   
98.2%
           cpu05    0.0%    0.0%    0.1%   0.0%     2.3%   24.3%   
73.1%
           cpu06    0.0%    0.0%    1.9%   0.0%     0.0%   84.0%   
13.9%
           cpu07    0.0%    0.0%    0.1%   0.0%     0.0%    0.7%   
99.0%
Mem:  16423520k av, 9631480k used, 6792040k free,       0k shrd,   
30528k buff
                   3903120k actv,  333056k in_d,  365384k in_c
Swap: 8388600k av,       0k used, 8388600k free                 
4836372k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU 
COMMAND
   25 root      15   0     0    0     0 DW    1.3  0.0   0:41   2 aio/2
   21 root      15   0     0    0     0 SW    0.3  0.0   1:35   1 
bdflush
24929 oracle    15   0 1155M 1.1G 1143M S     0.3  7.1   0:16   0 
oracle
32492 root      15   0     0    0     0 DW    0.2  0.0   0:51   6 
kjournald
    1 root      15   0   524  524   460 S     0.0  0.0   0:06   3 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 
migration/0
    3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1 
migration/1
    4 root      RT   0     0    0     0 SW    0.0  0.0   0:00   2 
migration/2
    5 root      RT   0     0    0     0 SW    0.0  0.0   0:00   3 
migration/3
    6 root      RT   0     0    0     0 SW    0.0  0.0   0:00   4 
migration/4
    7 root      RT   0     0    0     0 SW    0.0  0.0   0:00   5 
migration/5
    8 root      RT   0     0    0     0 SW    0.0  0.0   0:00   6 
migration/6
    9 root      RT   0     0    0     0 SW    0.0  0.0   0:00   7 
migration/7
   10 root      15   0     0    0     0 SW    0.0  0.0   0:00   5 
keventd
   11 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 
ksoftirqd/0
   12 root      34  19     0    0     0 SWN   0.0  0.0   0:00   1 
ksoftirqd/1
   13 root      34  19     0    0     0 SWN   0.0  0.0   0:00   2 
ksoftirqd/2
   14 root      39  19     0    0     0 SWN   0.0  0.0   0:00   3 
ksoftirqd/3
   15 root      34  19     0    0     0 SWN   0.0  0.0   0:00   4 
ksoftirqd/4
   16 root      39  19     0    0     0 SWN   0.0  0.0   0:00   5 
ksoftirqd/5
   17 root      39  19     0    0     0 SWN   0.0  0.0   0:00   6 
ksoftirqd/6
   18 root      39  19     0    0     0 SWN   0.0  0.0   0:00   7 
ksoftirqd/7
   19 root      15   0     0    0     0 SW    0.0  0.0   2:22   5 
kswapd
   20 root      15   0     0    0     0 SW    0.0  0.0   0:01   7 
kscand
   22 root      15   0     0    0     0 SW    0.0  0.0   0:15   6 
kupdated
   23 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 aio/0
   24 root      15   0     0    0     0 SW    0.0  0.0   0:00   1 aio/1
   26 root      15   0     0    0     0 SW    0.0  0.0   0:00   3 aio/3
   27 root      15   0     0    0     0 SW    0.0  0.0   0:00   4 aio/4
   28 root      15   0     0    0     0 SW    0.0  0.0   0:00   5 aio/5
   29 root      15   0     0    0     0 SW    0.0  0.0   0:00   6 aio/6
   30 root      16   0     0    0     0 SW    0.0  0.0   0:00   7 aio/7
   31 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 
mdrecoveryd


--------------------------------------------------------------------------------------------


Neither Direct IO or Async

Elapsed: 00:05:28.39


xldn1336por(parkerjb)::/$ iostat -x 30|grep emcpowerm1
emcpowerm1   1.44 1870.89  0.01  0.00   64.16 16072.76    32.08  
8036.38 1445633.40     1.45    0.25 88909.57  99.25
emcpowerm1   0.00   1.73  0.00  0.00    0.00   53.62     0.00    26.81     
0.00   456.93    0.00   0.00 100.03
emcpowerm1   0.00   1.73  0.00  0.00    0.00   53.33     0.00    26.67     
0.00  1121.24    0.00   0.00 100.00
emcpowerm1   0.00   1.77  0.00  0.00    0.00   53.07     0.00    26.53     
0.00   162.69    0.00   0.00 100.00
emcpowerm1   0.00 5981.03  0.00  0.00    0.00 91080.00     0.00 
45540.00     0.00   535.18    0.00   0.00 100.00
emcpowerm1   0.00 1850.80  0.00  0.00    0.00 36775.47     0.00 
18387.73     0.00   172.98    0.00   0.00 100.00
emcpowerm1   0.00 1374.57  0.00  0.00    0.00 30032.00     0.00 
15016.00     0.00   513.38    0.00   0.00 100.00
emcpowerm1   0.00 725.53  0.00  0.00    0.00 16170.93     0.00  8085.47     
0.00   410.84    0.00   0.00 100.00
emcpowerm1   0.00 2715.60  0.00  0.00    0.00 27600.80     0.00 
13800.40     0.00   450.69    0.00   0.00 100.00
emcpowerm1   0.00 3088.17  0.00  0.00    0.00 25614.13     0.00 
12807.07     0.00  1032.48    0.00   0.00 100.00
emcpowerm1   0.00 2933.53  0.00  0.00    0.00 24372.00     0.00 
12186.00     0.00   501.49    0.00   0.00 100.00
emcpowerm1   0.00 3226.83  0.00  0.00    0.00 26796.53     0.00 
13398.27     0.00    83.28    0.00   0.00 100.00
emcpowerm1   0.00 12181.53  0.00  0.00    0.53 100927.20     0.27 
50463.60     0.00   444.34    0.00   0.00 100.00
emcpowerm1   0.00 18399.07  0.00  0.00    0.00 152347.35     0.00 
76173.68     0.00    16.72    0.00   0.00  99.97
emcpowerm1   0.00 18037.50  0.00  0.00    4.27 149355.73     2.13 
74677.87     0.00   849.80    0.00   0.00 100.00
emcpowerm1   0.00 413.70  0.00  0.00    2.13 3550.67     1.07  1775.33     
0.00   761.65    0.00   0.00 100.00


--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------

Comment 29 John Shakshober 2005-09-21 14:14:55 UTC
We did not have EMC storage, so we did not/could not use powerpath in our testing.

Are we basically running at the speed of the disk/IO subsystem?  In this case, 
multithreading the I/O using AIO will only add overhead, preventing any 
improvements that AIO might otherwise be able to sustain.

Our experience with our partners has been that AIO only has a substantial payoff 
on large IO subsystems.


Comment 30 Steve Russell 2005-09-21 14:31:32 UTC
Interestingly, the LUN shows as 100% loaded either way (from the above iostat), 
but the throughput rates in the sync/dio case are a multiple of the aio case.  
Overall we're not pushing this too hard; it's a dedicated array with 3G of write 
cache and 10 spinning bits of copper behind the LUN, with 2 200MB/sec channels to 
the array.  In the sync IO case, there are bursts in the 150MB/sec range, where the 
aio case struggles to get past 30-40MB/sec and is generally at 10-12MB/sec.

tx

steve

Comment 31 Jeff Moyer 2005-09-22 13:57:33 UTC
How many threads does oracle use to submit I/O?  The patch I posted queues
completion processing to the aio thread on the same CPU on which the I/O was
submitted.  Thus, if the app is only doing AIO from two threads, and they are
always scheduled on the same CPUs, that would explain the behaviour you are seeing.

Actually, looking at your top output above, it appears that only one aio thread
is getting any CPU time.

Hmm, do you really have 16GB of memory?  If so, we should probably be testing
with -hugemem.  The smp kernel performs quite poorly with 16GB.  We'll continue
with the smp kernel, so as not to invalidate all of the numbers in this bz. 
However, please note that you should deploy hugemem in production.  Also, could
you get the output from sysrq-m while the tests are running?  This will help to
see if we are running into memory pressure issues.


So, I'm putting together another kernel that will schedule I/O round-robin
between the aio threads.  I'll post a pointer to the kernel when it is ready.

Thanks.
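
To make the distinction concrete, here is a rough user-space sketch of the two
dispatch policies: the posted patch keeps completion work on the aio thread of
the submitting CPU, while the follow-up kernel will spread it round-robin across
all aio threads.  This is illustrative pseudocode only, not the kernel patch;
names like pick_cpu_local and pick_cpu_rr are hypothetical.

/* Illustrative only -- not the RHEL 3 kernel patch.  Compile as C11. */
#include <stdio.h>
#include <stdatomic.h>

#define NR_CPUS 8

/* Policy in the current patch: completion work follows the submitting CPU.
   If only two Oracle processes submit AIO and they stay on two CPUs, only
   two aio/N threads ever receive work (as seen in the top output above). */
static int pick_cpu_local(int submitting_cpu)
{
        return submitting_cpu;
}

/* Proposed alternative: rotate round-robin over all aio threads,
   regardless of which CPU submitted the I/O. */
static int pick_cpu_rr(void)
{
        static atomic_int next;
        return atomic_fetch_add(&next, 1) % NR_CPUS;
}

int main(void)
{
        /* Two submitters bouncing between the same two CPUs, as in the dbwr case. */
        int submitter_cpu[] = { 2, 4, 2, 4, 2, 4 };
        for (int i = 0; i < 6; i++)
                printf("request %d: per-cpu -> aio/%d, round-robin -> aio/%d\n",
                       i, pick_cpu_local(submitter_cpu[i]), pick_cpu_rr());
        return 0;
}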

Comment 32 John Shakshober 2005-09-22 15:07:22 UTC
As a correction to comment #29, the current testbed we are using to test the AIO 
kernels is an 8-cpu Xeon with 8GB of memory, using a qlogic QLA2300 and an EMC 
CLARiiON ... but we have not been using powerpath (yet) as we have only been 
using 1 FC adapter.

Sorry for any confusion as I am out of town on a business trip this week. 

Comment 34 Jeff Moyer 2005-10-21 15:32:55 UTC
I'm changing this back to ASSIGNED as it's really waiting on me.

Comment 37 Jeff Moyer 2007-01-17 20:09:58 UTC
This issue is being closed as it does not meet the RHEL 3.9 release criteria.

