Bug 104633

Summary: The synchronous write() system call of RHEL3.0 is slower than that of RHEL2.1.
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: L3support <linux-sid>
Assignee: Doug Ledford <dledford>
CC: baldessari, bnocera, chrismcc, dg, equus, gabor.kondorosi, jkeating, joe, johnstul, jon, link, linux-sid, mario.lorenz, mkunjal, pamadio, pdemauro, redhat, richard.cunningham, sct, suman, tao, t.h.amundsen, tom.obrien, Winfrid.Tschiedel
Doc Type: Bug Fix
Last Closed: 2004-05-11 21:07:35 EDT
Bug Blocks: 103278, 106771
Attachments:
  write.c (flags: none)

Description L3support 2003-09-18 05:22:37 EDT
Description of problem:
The synchronous write() system call in RHEL3.0 is slower than in RHEL2.1
when the write size exceeds 4 KB.
We measured the performance with the attached test program,
which writes to an ext3 file system on a SCSI disk (sym53c8xx).
We tested write sizes of 2 KB, 4 KB, 8 KB, and 16 KB.
With 16 KB writes, the write() system call on RHEL3.0 takes about twice
as long as on RHEL2.1.
The results are summarized below.

[RHEL2.1]
 size    Processing time
 2048    77.801379sec
 4096    40.136581sec
 8192    20.301038sec
16384    10.209447sec

[RHEL3.0]
 size    Processing time
 2048    77.334858sec
 4096    40.356672sec
 8192    27.636047sec
16384    21.505170sec

Version-Release number of selected component (if applicable):

How reproducible:
  Every Time

Steps to Reproduce:
1. Compile the attached test program as follows:
   $ cc -o write write.c
2. Run the program, specifying a write size.
  (write.c creates a 10 MB file.)
   $ ./write 2048
   $ ./write 4096
   $ ./write 8192
   $ ./write 16384
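
For readers without access to the attachment, the following is a minimal sketch of what such a test could look like. It is not the attached write.c. The function name timed_sync_write, the use of O_SYNC, and the gettimeofday() timing are all assumptions made for illustration.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-in for the attached write.c (which is not shown here).
 * Writes at least `total` bytes to `path` in `block`-sized write() calls
 * and returns the elapsed wall-clock seconds, or -1.0 on error. */
double timed_sync_write(const char *path, size_t block, size_t total)
{
    if (block == 0)
        return -1.0;

    char *buf = malloc(block);
    if (buf == NULL)
        return -1.0;
    memset(buf, 'a', block);

    /* O_SYNC is an assumption: each write() returns only after the data
     * has been pushed to the device, i.e. a "synchronous write". */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    struct timeval start, end;
    int failed = 0;

    gettimeofday(&start, NULL);
    for (size_t done = 0; done < total; done += block) {
        if (write(fd, buf, block) != (ssize_t)block) {
            failed = 1;   /* short write or error: give up */
            break;
        }
    }
    gettimeofday(&end, NULL);

    close(fd);
    free(buf);

    if (failed)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}
```

A small main() that parses the write size from argv[1] and calls timed_sync_write("testfile", size, 10485760) would roughly correspond to the ./write invocations above; the output file name is likewise an assumption.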

Actual results:
The synchronous write() system call is much slower.

Expected results:
The synchronous write() system call of RHEL3.0 has (at least) the same 
performance as RHEL2.1.

Additional info:
Comment 1 Fuchi Hideshi 2003-09-18 05:53:42 EDT
Created attachment 94578

from fujitsu.
Comment 2 Bill Nottingham 2003-09-18 11:44:11 EDT
*** Bug 104634 has been marked as a duplicate of this bug. ***
Comment 3 Bill Nottingham 2003-09-18 11:46:03 EDT
*** Bug 104636 has been marked as a duplicate of this bug. ***
Comment 4 Jeffrey Moyer 2003-10-13 15:05:27 EDT
The performance issue at hand was introduced with
linux-2.4.21-scsi-affine.patch.  Essentially, there is an unconditional
unplug_device at the end of __make_request in ll_rw_block.c.  This prevents
write aggregation on a non-busy system doing sequential writes through a filesystem.
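
To illustrate why an unplug after every request hurts sequential writes, here is a toy userspace model. It is not kernel code and does not reproduce __make_request; the function count_dispatches and its parameters are hypothetical. It only counts how many separate requests reach the device when contiguous blocks either are or are not allowed to merge.

```c
#include <stddef.h>

/* Toy model only: NOT kernel code.
 * Counts how many separate requests reach the "disk" when `nblocks`
 * contiguous blocks are queued one at a time.
 *
 * unplug_every_request != 0 models an unconditional unplug after each
 * submission; 0 models a queue that stays plugged so contiguous blocks
 * can merge, up to `max_merge` blocks per request. */
size_t count_dispatches(size_t nblocks, size_t max_merge,
                        int unplug_every_request)
{
    size_t dispatched = 0;  /* requests sent to the device */
    size_t pending = 0;     /* blocks merged into the queued request */

    if (max_merge == 0)
        max_merge = 1;

    for (size_t i = 0; i < nblocks; i++) {
        pending++;          /* contiguous block joins the pending request */
        if (unplug_every_request || pending == max_merge) {
            dispatched++;   /* queue is unplugged: request goes out */
            pending = 0;
        }
    }
    if (pending > 0)
        dispatched++;       /* final unplug flushes what is left */
    return dispatched;
}
```

Assuming 4 KB filesystem blocks (the report does not state the block size), a 16 KB write is four blocks: count_dispatches(4, 4, 1) gives 4 device requests, while count_dispatches(4, 4, 0) gives 1. That would be consistent with the timings above, where RHEL3.0 stops improving once the write size exceeds 4 KB.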

At this time it is unclear whether this "bug" is triggered by real-world system
Comment 6 Doug Ledford 2004-01-22 11:21:10 EST
*** Bug 113171 has been marked as a duplicate of this bug. ***
Comment 8 Tom Coughlan 2004-01-23 10:13:54 EST
*** Bug 109618 has been marked as a duplicate of this bug. ***
Comment 9 Doug Ledford 2004-01-26 11:41:37 EST
*** Bug 114052 has been marked as a duplicate of this bug. ***
Comment 10 Doug Ledford 2004-01-26 11:55:03 EST
*** Bug 114135 has been marked as a duplicate of this bug. ***
Comment 11 Jesse Keating 2004-01-26 12:12:35 EST
It is very much triggered by real-world loads.  A customer of ours has
some systems set up for database operations.  An operation on the
RHEL2.1 system takes 50% less time than the same operation on RHEL3.
Of course, this percentage is not fixed: as the operation gets larger,
the time gap between 2.1 and 3 gets larger as well.  This is a
pretty critical bug that needs to be addressed.
Comment 12 Doug Ledford 2004-01-26 18:48:30 EST
I've built a new set of test rpms for this issue.  The test rpms can
be found at http://people.redhat.com/dledford/.qohsdf/

If people will test this and let me know if the performance issues go
away I would greatly appreciate it (please note that these files are
only available for the next 48 hours, after that they will be removed
Comment 13 Jesse Keating 2004-01-26 19:18:34 EST
Initial tests with tiobench show significant improvement in the
sequential and random read rates.

dledford kernel:
         File   Block  Num  Seq Read    Rand Read   Seq Write  Rand Write
  Dir    Size   Size   Thr Rate (CPU%) Rate (CPU%) Rate (CPU%) Rate (CPU%)
------- ------ ------- --- ----------- ----------- ----------- -----------
   .     1792   4096    1  705.3 99.9% 686.8 87.9% 37.62 32.5% 2.439 0.78%
   .     1792   4096    2  847.2 139.% 857.3 109.% 38.03 47.5% 2.382 0.99%
   .     1792   4096    4  979.4 199.% 1006. 171.% 38.16 69.3% 2.343 1.74%
   .     1792   4096    8  1056. 235.% 1059. 203.% 26.28 54.6% 2.286 2.01%

RHEL3 Q3 update kernel:
         File   Block  Num  Seq Read    Rand Read   Seq Write  Rand Write
  Dir    Size   Size   Thr Rate (CPU%) Rate (CPU%) Rate (CPU%) Rate (CPU%)
------- ------ ------- --- ----------- ----------- ----------- -----------
   .     1792   4096    1  357.5 56.4% 70.83 9.06% 38.68 33.0% 2.144 0.68%
   .     1792   4096    2  436.0 75.0% 82.55 13.2% 37.48 45.8% 2.191 1.05%
   .     1792   4096    4  561.1 119.% 120.4 23.1% 36.94 64.2% 2.176 1.76%
   .     1792   4096    8  658.1 152.% 155.6 32.3% 25.53 51.4% 2.216 2.09%
Comment 14 Jesse Keating 2004-01-27 16:17:49 EST
As a friend kindly pointed out, my file size wasn't nearly big enough.
Here are the tests again, with a 10 GB file:

dledford kernel:
         File   Block  Num  Seq Read    Rand Read   Seq Write  Rand Write
  Dir    Size   Size   Thr Rate (CPU%) Rate (CPU%) Rate (CPU%) Rate (CPU%)
------- ------ ------- --- ----------- ----------- ----------- -----------
   .    10240   4096    1  57.94 19.5% 0.926 0.71% 36.45 33.0% 1.995 1.02%
   .    10240   4096    2  28.34 13.9% 1.029 0.82% 36.14 44.3% 1.983 1.26%
   .    10240   4096    4  22.36 10.7% 1.159 0.96% 35.81 58.4% 1.879 1.84%
   .    10240   4096    8  19.54 9.19% 1.269 1.19% 35.59 69.0% 1.837 2.11%

RHEL3 Q3 Update kernel:
         File   Block  Num  Seq Read    Rand Read   Seq Write  Rand Write
  Dir    Size   Size   Thr Rate (CPU%) Rate (CPU%) Rate (CPU%) Rate (CPU%)
------- ------ ------- --- ----------- ----------- ----------- -----------
   .    10240   4096    1  21.04 14.2% 0.653 1.08% 36.86 32.6% 1.740 0.89%
   .    10240   4096    2  18.11 12.1% 0.807 1.39% 36.29 43.7% 1.795 1.20%
   .    10240   4096    4  16.81 9.98% 0.911 1.53% 35.77 59.6% 1.763 1.76%
   .    10240   4096    8  16.03 8.88% 1.004 1.63% 34.30 67.8% 1.772 2.04%

The results aren't quite "shocking", but there is still a decent speed
improvement.  What are the chances of seeing this pushed out into
production updates?
Comment 15 L3support 2004-01-28 01:27:13 EST
performance problem was solved as a result of the test in
The test used the program of Bugzilla ID:#104633.
The following is the summary of the result.

[RHEL v2.1]
file size:10485760

block size    Processing time
      2048    77.897604sec
      4096    40.514986sec
      8192    20.426124sec
     16384    10.270638sec
     32768     5.371835sec
     65536     2.794967sec
    131072     1.556833sec

[RHEL v3(kernel-smp-2.4.21-9.EL.noaffine2)]
file size:10485760

block size    Processing time
      2048    77.532978sec
      4096    40.567836sec
      8192    20.330292sec
     16384     10.42086sec
     32768     5.347417sec
     65536     2.793838sec
    131072     1.543779sec
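
One way to read these tables is to convert each row into an average time per write() call. The helper below is a hedged sketch: the name ms_per_write is hypothetical, and it assumes each write() call transfers exactly one block, so the number of calls is file size divided by block size.

```c
/* Average milliseconds per write() call for one table row.
 * Assumes the file is written in whole blocks (calls = file / block).
 * Returns 0.0 for non-positive inputs. */
double ms_per_write(double file_bytes, double block_bytes, double seconds)
{
    if (file_bytes <= 0.0 || block_bytes <= 0.0)
        return 0.0;
    double calls = file_bytes / block_bytes;
    return seconds / calls * 1000.0;
}
```

For the RHEL v2.1 table, ms_per_write(10485760, 2048, 77.897604) and ms_per_write(10485760, 16384, 10.270638) both come out at roughly 15 to 16 ms. In other words, each synchronous write costs about the same regardless of its size, which is why the total time halves as the block size doubles.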
Comment 16 Mahesh Kunjal 2004-01-28 14:24:59 EST
Comments from Lans Carstensen of Dreamworks - 
Great news, this worked just fine for us.  Good work!
Now we just need this in a formal upcoming release.  Will this make U2?

Comment 17 Doug Ledford 2004-01-28 15:11:47 EST
Discussions are going on internally about what to do on this issue. 
Removing the scsi-affine patch solved the problem, but will create
other problems under different workloads.  Whether to just yank the
patch or to try and write a new patch that solves the performance
problems under all workloads is the discussion.

Note: Since this test kernel has solved people's problems, for now it
is *not* being removed from the people.redhat.com web site.  It will
remain available until an official update kernel with the problem
solved is released.
Comment 18 Wendy Cheng 2004-02-05 13:17:02 EST
Two ia64 customers would like to try out the patch too. Could you add
it?
Comment 20 Doug Ledford 2004-02-09 09:46:41 EST
A complete set of kernel RPMs for all arches has been put in place of
the original set of RPMs on my web site.
Comment 21 Christopher McCrory 2004-02-18 10:30:16 EST
Will this fix be included in the RHEL3 errata for CAN-2004-0075?


Comment 22 Doug Ledford 2004-02-18 16:44:10 EST
*** Bug 115273 has been marked as a duplicate of this bug. ***
Comment 27 Heikki Simperi 2004-03-16 16:32:49 EST
I am running a MySQL database server on an Intel SRCS14L SATA RAID 
controller (gdth module; btw, there is a typo in the certified hardware 
list), and this update helped completely.

Before, when the system was running backups, all of the system's 
resources were in the IOWAIT state. Now I can take any kind of backup 
and the service keeps working fine.

I am wondering why this update has not been published in RHN, because 
now other kernel updates cannot be installed by up2date.
Comment 28 Doug Ledford 2004-03-18 15:33:04 EST
It's slated for the next update kernel (which is different than an
errata update).
Comment 29 Chris Wilkinson 2004-03-22 14:51:10 EST
Might I inquire as to when we'll see the next kernel update in the
mainstream production product?

I've made similar observations to the problems noted in previous
comments. Our RH 7.3 server is at 2.4.20-20.7smp and the new RHEL 3 AS
system is at 2.4.21-9.0.1.ELsmp. Both have largely identical hardware
(2 x 2.8GHz Xeon, 1 GB memory, Adaptec 39160, U160 disks).

The test used the program of Bugzilla ID:#104633.
File size: 10485760. Test iterations: >= 3.

                 RHEL 3 AS     RH 7.3
block size       time (sec)    time (sec)
      2048       5.32139       2.62641
      4096       4.95804       1.00350
      8192       2.84885       1.73938
     16384       1.75112       0.51394
     32768       1.18536       0.32445
     65536       0.91370       0.23469
    131072       0.77962       0.19020

Any information on the ETA would be appreciated. I would like to solve
the throughput issue before the new RHEL system is put into service.
Comment 30 Michele Baldessari 2004-03-24 05:58:29 EST
I'll just add a "me too" comment. This bug bites us too on three 
PowerEdge 2550s with kernel 2.4.21-9.0.1EL, making various apps that
do lots of write() calls slow down considerably.
Looking forward to the update ;)
Comment 31 Tom O'Brien 2004-03-25 11:44:51 EST
The problem also affects writing small files. Testing an SMTP server 
accepting 10K messages (avg. size 20K; in the test set of 50 emails, 
all but 2 are 3-5K in size) showed the following results:

Version     Kernel                        Time to accept 10k messages
RH 7.3      kernel-2.4.20-28.7            470 sec
RHEL 3.0ES  kernel-2.4.21-4.EL            504 sec
"           kernel-2.4.21-9.EL            504 sec
"           kernel-2.4.21-9.01.EL         500 sec
"           kernel-2.4.21-9.EL.noaffine2  430 sec
Comment 32 Terje Bless 2004-03-29 09:00:20 EST
Doug: cf. Comment #17, any chance you could build "noaffine" RPMs of the current update 
kernel (2.4.21-9.0.1.EL)? Or detail briefly how to do the rebuild (just removing the patch 
from the spec file causes the build to fail on a RHEL 3.0 AS box)?

This issue appears to be the culprit -- the noaffine2 kernels fixed it -- in absolutely killing 
our read performance towards a Dell|EMC CX500 (SAN); but unfortunately I can't use the 
2.4.21-9.EL.noaffine2 kernel since the EMC software (binary kernel modules *sigh*) won't 
work with anything but the 2.4.21-9.0.1.EL kernel.

With a looming deadline and the threat of having to revert this project to use `doze 
instead, (the boss is getting skittish) you might say I'm rather anxious to find a 
workaround fairly soon. IOW I'd be happy to do any testing (or providing tiobench results 
or whatever) you would find helpful. :-|
Comment 33 Joe Goyette 2004-03-29 14:34:37 EST
Count me in too as a member of the "I need a fix for this now" list. 
Management is ready to toss the whole Red Hat project altogether 
since our NFS performance is so poor. 
Comment 34 Eric Swenson 2004-03-29 19:42:03 EST
Count us in as well.  Write performance on our external SCSI-to-IDE
array with 12 7200rpm spindles went from 55-65 megabytes/second under
Red Hat 8.0 down to 14-24 megabytes/second under Red Hat 3.0 ES.  I'm
having a hard time trying to convince the boss that we spent over a
thousand dollars on this Red Hat Enterprise license for "increased
performance and stability", when in fact we've experienced just the
opposite.  We'd be happy to test a patched kernel as well.
Comment 35 Doug Ledford 2004-03-30 15:41:40 EST
The kernel that includes this fix is already available in the Beta
channel of RHN.  So, for instance, if you go into your RHN account,
select the machine, select channels, go into the Beta channel, then go
to the third page of k packages, you should see the
kernel-2.4.21-12.EL packages.  Those include this fix.  You should be
able to schedule them for installation then via RHN.
Comment 36 Terje Bless 2004-03-30 23:01:13 EST
Initial -- and I emphasise *initial* :-) -- testing with the 2.4.21-12.EL kernel suggests a 
tenfold improvement in sequential read rates in a single-thread test.

I'm currently spinning through my test matrix (with tiobench) and will post the results here 
when done. Thanks Doug!
Comment 37 Need Real Name 2004-03-31 03:03:35 EST
 The 2.4.21-12.EL kernel (similarly to the noaffine kernel) yields 
about 45-50% (23-25 MB/s) of the read performance I used to have with 
RHEL 2.1 AS/2.4.9-e.27smp (50 MB/s) on a set of Dell PE1750s 
(PERC4/Di; RAID1; Maxtor Atlas 36G 10K U320 disks). This is with the 
default max-readahead=31 and min-readahead=3 settings.

 Using the QU1 production kernels (2.4.21-9 or 9.0.1) but setting 
max-readahead to something like 2048 gives ~70% of 
the original RHEL2.1 read rates. Without this, with the default 
values, the read performance is a mere ~10-12% (5-6 MB/s).

 The combination of the two (i.e. noaffine/beta kernel plus 
increasing max-readahead) brings me to up to ~60% of the RHEL 2.1 

 On another set of PE1750s (same ROMB and RAID setting but Seagate 
disks), the read performance with the RHEL3 QU1 kernels and the 
default settings is more or less acceptable (~30 MB/s), and tweaking 
max-readahead brings it up to 40 MB/s. On RHEL 2.1 systems, increasing 
the default max-readahead has no effect or has an adverse effect on 
the generally good read performance.

 Any chance to see the same (at least) read performance as with RHEL 
Comment 38 Doug Ledford 2004-03-31 07:15:32 EST
I suspect the answer to that is no.  There is a huge difference
between the AS2.1 and RHEL3 virtual memory subsystems.  I
suspect that the difference in read rates is that the RHEL3 VM is
slower at allocating pages for reading, and is therefore slowing down
the read rate, but better at swap related activities.
Comment 39 Chris Wilkinson 2004-03-31 15:21:35 EST
Just FYI, these are my initial test results on the new kernel.

Running the original test code from Bugzilla ID:#104633 gives the
following comparison:

block size      2.4.21-9.0.1.ELsmp    2.4.21-12.EL.smp      %-diff
      2048      5.2999 sec            2.5625 sec            210 %
      4096      4.9348 sec            1.5760 sec            313 %
      8192      2.8311 sec            0.8484 sec            333 %
     16384      1.7320 sec            0.4826 sec            358 %
     32768      1.1782 sec            0.3053 sec            385 %
     65536      0.9138 sec            0.2164 sec            422 %
    131072      0.7780 sec            0.1728 sec            450 %

We mainly deal with large data files, which are continuously written and
rewritten directly over NFS. This means a more accurate indicator for
us is the test results from programs like bonnie++ v1.03. Here they are,
with a file size of 2G:

                 2.4.21-9.0.1.ELsmp    2.4.21-12.EL.smp      %-diff
Seq write:       57940 K/sec           54174 K/sec            94 %
Seq rewrite:     13321 K/sec           20372 K/sec           153 %
Seq read:        19975 K/sec           47501 K/sec           238 %
Random Seeks:    237.0 seeks/sec       370.35 seeks/sec      156 %
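
The %-diff columns in both tables appear to be speed ratios expressed as percentages. This is an inference from the numbers, not something stated in the comment. A sketch of the two calculations, with hypothetical helper names:

```c
/* For timings (lower is better): old time as a percentage of new time.
 * Returns 0.0 if new_sec is not positive. */
double pct_from_times(double old_sec, double new_sec)
{
    if (new_sec <= 0.0)
        return 0.0;
    return old_sec / new_sec * 100.0;
}

/* For rates (higher is better): new rate as a percentage of old rate.
 * Returns 0.0 if old_rate is not positive. */
double pct_from_rates(double old_rate, double new_rate)
{
    if (old_rate <= 0.0)
        return 0.0;
    return new_rate / old_rate * 100.0;
}
```

For example, pct_from_times(4.9348, 1.5760) is about 313, matching the 4096 row, and pct_from_rates(13321, 20372) is about 153, matching the "Seq rewrite" row. In both cases, a value above 100 means the 2.4.21-12.EL.smp kernel is faster.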

Obviously, as is visible in both tests, the 2.4.21-12.EL.smp kernel
gives much improved throughput, placing us back into the
"acceptable" range across the board for our implementation.

Thanks, Doug, for getting this much needed fix to us! It is much
Comment 40 Doug Ledford 2004-04-22 16:47:48 EDT
*** Bug 102194 has been marked as a duplicate of this bug. ***
Comment 41 john stultz 2004-05-03 18:05:44 EDT
Any chance the patch that resolved this issue could be attached to the
Comment 42 Doug Ledford 2004-05-06 19:35:07 EDT
It wasn't a simple patch.  It was: drop the affine queuing patch from
the spec file, remove the last patch hunk from the end of the affine
queuing patch and tack it onto the selected-acbits patch, then rebuild.
Comment 43 Bastien Nocera 2004-05-07 06:35:06 EDT
Note that a kernel with that patch is already available in the beta
channel for RHEL 3.
Comment 44 John Flanagan 2004-05-11 21:07:36 EDT
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

Comment 45 Ernie Petrides 2004-12-21 18:05:33 EST
*** Bug 114054 has been marked as a duplicate of this bug. ***