Description of problem:
Disk I/O slows dramatically after continual heavy disk usage. When transferring a large file (>3GB), Linux appears to introduce a huge amount of CPU overhead after transferring part of the file and slows the disk I/O to about one fourth of the prior transfer rate. On one system a 100GB file transfers at full speed for about 80GB (on a 160GB disk) and then the rest of the file is transferred at approximately one quarter of the original rate. On another system (an 80GB disk) a 20GB file is transferred at full speed for about 3GB and then the ENTIRE remainder is transferred at the much lower rate (in this case about one eighth the original rate). The application uses DIRECT_IO and this occurs on both SMP and non-SMP kernels. Other users have reported a similar problem when continually writing files of only 100-200MB. Occurs on RAIDs and single disks.

How reproducible:
The best way to replicate the slowing disk I/O is to compile the attached program with "gcc -o disk_test disk_test.c -D_GNU_SOURCE". The program transfers 512KB blocks and every 100 blocks (50MB) outputs a status message showing the time (in milliseconds) that the last 100 blocks took to transfer. If you do not see an increase in these times, edit disk_test.c (attached) and change the line "block_count = 5000;" to a much higher number (80% of the available disk space is likely sufficient).

Steps to Reproduce:
1. download disk_test.c
2. gcc -o disk_test disk_test.c -D_GNU_SOURCE
3. run disk_test; edit the number of blocks and recompile if the issue is not seen

Actual results:
On a single SATA drive I see a sustained transfer rate of about 33MBps for up to 80GB (disk size is 160GB, disk cache is 8MB, system has 3GB RAM, using DIRECT_IO, ext3, default scheduler) when transferring a 100GB file. The remaining 20GB is transferred at an unchanging rate of about 8MBps.
Created attachment 135412 [details] shows incremental and total transfer rate of large files
I am having a hard time reproducing this problem as described. I do see some variation in the transfer rate, but it does not drop like a step function as described. When the application starts running I see transfer rates of ~45MB/second, yet by the time the application finishes writing ~60GB that rate drops to ~30MB/second. This gradual drop over time is probably a disk geometry issue: the outermost disk cylinders contain more data than the inner cylinders, so at a constant rotational speed the transfer rate drops accordingly. Also, I see an occasional drop from ~40MB/second down to ~10MB/second when kjournald runs, but that is periodic throughout the run and has very little consequence for the speed of the application as a whole. Finally, I am guessing that the reporter's system is running into a fragmentation issue. When the disk becomes very full, the last chunks of data written to disk are likely much more fragmented than the early ones. This could cause the serious step-function performance degradation that you are seeing.

Larry Woodman
OK, here are the outputs of "vmstat 60" during the entire run of the application creating a 60GB file. As you can see, the system started writing at ~40MB/sec initially and degraded down to ~26MB/sec. The reason was determined to be disk geometry: since the amount of data differs between cylinders on the disk and it rotates at a constant rate, the actual transfer rate varies accordingly.

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1      0 621456 181356  76432    0    0     0 40211 1179  181  0  4 50 46
 1  0      0 616336 186344  76476    0    0     0 39991 1180  182  0  4 50 46
 0  1      0 611152 191432  76488    0    0     0 39996 1180  182  0  4 50 46
 0  1      0 606160 196308  76440    0    0     0 39062 1173  177  0  4 50 46
 0  1      0 601424 200924  76448    0    0     0 36190 1168  168  0  3 50 47
 0  1      0 596624 205484  76444    0    0     0 34969 1161  162  0  3 50 47
 0  1      0 592144 209960  76388    0    0     0 33803 1157  158  0  3 50 47
 0  1      0 587728 214236  76464    0    0     0 33852 1153  157  0  3 50 47
 0  1      0 583440 218356  76492    0    0     0 31791 1148  149  0  3 50 47
 0  1      0 579088 222324  76468    0    0     0 30752 1143  145  0  3 50 47
 0  1      0 574608 226212  76456    0    0     0 29131 1137  139  0  3 50 47
 0  1      0 570384 229856  76484    0    0     0 28145 1130  135  0  3 49 48
 0  1      0 566352 233352  76456    0    0     0 26227 1125  127  0  3 50 48
 0  1      0 564304 235084  76424    0    0     0 26419 1123  127  0  3 50 48
 0  0      0 564184 235264  76448    0    0     0   933 1040   26  0  0 93  7
--------------------------------------------------------------------------------
Also, you can see that the iteration time measured by the application correlates with the MB/sec reported by vmstat.
--------------------------------------------------------------------------------
[root@lovok tmp]# ./disk_test
cnt=100 +1133ms
cnt=200 +1186ms
cnt=300 +1130ms
cnt=400 +1183ms
cnt=500 +1172ms
...
...
...
cnt=106300 +1863ms
cnt=106400 +1902ms
cnt=106500 +1836ms
cnt=106600 +1964ms
cnt=106700 +1826ms
------------------------------------------------------------------------------
Since I am having trouble reproducing the "step-function" performance degradation reported in this bug, can I have Brendan Illingworth gather the same statistics on his machine and post the results here? Please start with the output of the disk_test program as well as "vmstat 60" output while it is running.

Thanks, Larry Woodman
Created attachment 135702 [details] top vmstat disk_test output
Created attachment 135703 [details] 100GB version
Created attachment 135704 [details] 50GB version
The attached disk_test_data.txt file contains output obtained while running the disk_test_100000 code (a 50GB file on a disk with 65GB free; 77% usage). In addition to the requested information contained in disk_test_data.txt, I have attached two additional files (disk_test_100000.c and disk_test.c). Both have been modified to provide incremental output every 1000 blocks; additionally, disk_test.c has been modified to create a 200,000-block (100GB) file. If the experimenter still does not see the issue, increase the file size as much as the disk will allow. I suspect you will see the issue at some file size; if not, please post the results of running df, vmstat 60, disk_test, and then df again, as I would like to see that you are indeed creating a sufficiently large file (try to use 90% or more of the free space).
OK, this should be debuggable. As you can see, the system time jumps from 4% to 83% when the disk IO rate drops from 33MB/s to 8MB/s.

--------------------------------------------------------------------------------
 0  1      0 5061676  59020 179640    0    0    1 32265 1166  363  2  4  0 94
 0  1      0 5058132  60944 179796    0    0    4 30077 1218  577  7  4  0 89
 0  1      0 5054204  62856 179840    0    0    1 30452 1218  458  2  4  0 94
 1  1      0 5047996  65540 179864    0    0   20 25045 1201  472  3 19  0 78
<><><><><><><>
 1  1      0 5047076  66148 179968    0    0    0  8283 1110  197  4 82  0 14
 1  1      0 5046372  66724 179968    0    0    0  8476 1056  142  2 83  0 15
 1  1      0 5045732  67300 179968    0    0    0  8487 1053  141  2 82  0 16
 0  1      0 5045092  67880 179968    0    0    0  8517 1055  142  2 82  0 16
 1  1      0 5044388  68476 179968    0    0    0  8697 1056  143  2 84  0 15
--------------------------------------------------------------------------------

I still can't get the same behavior myself. Can you get me several AltSysrq-W outputs when this happens so I can see what the system is doing?
It sounds like you are seeing a less severe version of what we are seeing. The indication that your transfer rate drops to 10MBps for some of the blocks is very significant, as it shows the disk throughput decreasing by 75% for at least 5 seconds (a 50MB block at 10MBps), where the throughput goes to a crawl. What activity is going on in the system at that point that would deprive the user program of 75% of the throughput? Additionally, the transfer rates in your disk_test output show the throughput dropping from (50MB/~12sec) to (50MB/~18sec), which is a 33% decrease in throughput from the beginning to the end of the file. While the linear velocity does decrease as the disk head slowly seeks inward toward the spindle, on most disks this is compensated by increasing bit density, resulting in a nearly constant data rate. Also, would it be possible to run the test with several larger files (~80% of the disk) and post the same data as you did for your first results?
I don't think I am seeing the same problem you are. When you hit this problem, the system time goes from 4% to 83% as the disk IO degradation occurs. Mine shows no such behavior. Either several AltSysrq-W outputs or a kernel profile are needed to figure out what is happening. Can you provide them?

Thanks, Larry Woodman
How large a file have you tried? How large is the disk? Have you tried this on more than one system? Have you tried different disks? The "system time" may not be highly correlated with the true cause of the problem; we should not focus on it as the only metric. It sounds like you are seeing a ~33% or greater drop in performance, which is very significant. I will post your requested information shortly.
Created attachment 135802 [details] Brendan Illingworth - Profile Dump
Can you also get me several AltSysrq-W outputs so I can see the actual callbacks?

Larry
Brendan, are you sure the kernel profile you posted is right? When you are getting 33MB/s your system is using only 4% system time, and when it drops off to 5MB/s the system time spikes up to 83%, yet the profile shows ~80% system time in both cases. Can you try it again and get me more data, including AltSysrq-W output? Thanks for your help.

Larry Woodman
The posted profiles were obtained before, during, and after running the application. I am unfamiliar with the output format; how do you deduce that the system time is ~80% from the profiles? By AltSysrq-W do you mean a screen shot saved to a file? (If not, please be more specific.) As soon as you have time, please answer the questions I previously posted, as this might help us get on the same page.
The profile showed almost no differences between the 33MB/s and 5MB/s states, yet the vmstat output showed almost all time being spent in the kernel when the performance dropped to 5MB/s. The profiles *should* be very different, especially WRT idle time, but they are not. Yes, please get the system into the 5MB/s, 83% system time state and get me several AltSysrq-W outputs when that happens. All of the data will be written to /var/log/messages.

Thanks, Larry Woodman
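If the AltSysrq-W keyboard chord is awkward on a server, the same dump can be requested through /proc. This is a sketch of the usual procedure, not a script from this bug; it assumes the magic SysRq interface is enabled and the script runs as root:

```shell
#!/bin/sh
# Writing 'w' to /proc/sysrq-trigger asks the kernel to print backtraces
# of blocked tasks to the kernel log, which syslog records in
# /var/log/messages.
SYSRQ=${SYSRQ:-/proc/sysrq-trigger}
COUNT=3        # take several snapshots around the slowdown
DELAY=1        # seconds between snapshots; ~10 is more useful in practice
i=0
while [ "$i" -lt "$COUNT" ]; do
    { echo w > "$SYSRQ"; } 2>/dev/null ||
        echo "cannot write $SYSRQ (need root?)" >&2
    i=$((i + 1))
    sleep "$DELAY"
done
echo "requested $i dump(s); check /var/log/messages"
```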
Please just run this in a shell script:

#!/bin/sh
while /bin/true; do
	echo;date
	/usr/sbin/readprofile -v | sort -nr +2 | head -15
	/usr/sbin/readprofile -r
	sleep 5
done
One last question: are there any error messages being written to the console or /var/log/messages when the slowdown occurs?

Larry
Created attachment 135886 [details] alt+sysrq+p output while in the 5MBps state
Created attachment 135887 [details] Profile Info

This was perfect timing: the first block of output in this file is from while we were cruising at 33MBps; the rest are from when the system crawls at about 5MBps.
VMSTAT AT SLOWDOWN BOUNDARY

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  1      0 4786044 225396 248496   0    0    15  6977 1051  196  1 15 65 19
 1  1      0 4785268 225904 248532   0    0     0  3999 1047  110  1 91  0  8

MAKEPROFILE SCRIPT

cat /dev/null > readprofile.txt
while /bin/true; do
	echo; date >> readprofile.txt
	/usr/sbin/readprofile -v | sort -nr +2 | head -15 >> readprofile.txt
	/usr/sbin/readprofile -r >> readprofile.txt
	sleep 10
done

Let me know what you make of this.

-Brendan
Can you attach the /var/log/messages files after the slowdown occurs so I can look for any error messages? Thanks, Larry
I have run the application many times and never see any output in /var/log/messages. Have you had any success recreating the problem on your end using significantly larger files?
As of right now we have not been able to reproduce the behavior you are seeing, where there is a distinct 400% drop-off in disk IO performance. We are only seeing a gradual degradation like the one I showed in comment #4. We do, however, see a serious performance degradation with random writes and various block sizes. That is BZ 203634.

Larry Woodman
FYI, we added a 250GB SATA drive formatted as a single partition and used disk_test to fill it completely. While I do see the performance degradation I talked about in comments #3 and #4, I don't see the radical IO performance drop-off and the associated system time spike. Still trying, but so far we have not been able to reproduce this problem internally.

Larry
*** Bug 204989 has been marked as a duplicate of this bug. ***
Larry, as mentioned previously, it seems like you are reproducing the problem to some degree. The 30% slowdown is significant and could be only a fraction of the 400% slowdown that we see, due to system differences. Have you made any progress on identifying the cause of the slowdown? Also, I am surprised that your team hasn't been able to create a setup that elicits the problem to a much higher degree (e.g. a 400% slowdown), as we see this performance drop on six of six different machines running RHEL WS4. Please post the output of disk_test (with a 100GB and a 200GB file) on your two systems with 250GB drives. This bug does not involve random writes or varying block sizes. This performance degradation could limit the ability to gain market share, as it disqualifies RHEL for use in sustained data acquisition applications. If we need to request allocation of more engineers on this specific bug, perhaps we should attempt to escalate its severity. Once again, could you please run the 100GB-200GB version of disk_test on several machines and post the output.
Can we get the complete system info: server brand, CPU type, memory, and the specific I/O controller? I know you're using SATA, but what system/driver are you using? I could not reproduce this on my Ultra SCSI or Fibre luns.
Brendan, sorry for the delay. We have not been able to reproduce the behavior you are seeing. I do see a very smooth degradation from ~40MB/sec to ~25MB/sec as we fill up a 250GB drive. This was determined to be a disk geometry issue, simply because there is less data in the inner tracks than the outer tracks and the disk rotates at a constant rate.

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1      0 621456 181356  76432    0    0     0 40211 1179  181  0  4 50 46
*****************************************************************************
 0  1      0 564304 235084  76424    0    0     0 26419 1123  127  0  3 50 48
------------------------------------------------------------------------------

Can you tell us exactly what system you are using so we can try to reproduce the problem on that hardware?

Thanks, Larry Woodman
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/ If you would like Red Hat to reconsider your feature request for an active release, please re-open the request via the appropriate support channels and provide additional supporting details about the importance of this issue.