Bug 115273 - bad disk I/O performance with the 2.4.21-4.ELsmp kernel
bad disk I/O performance with the 2.4.21-4.ELsmp kernel
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel (Show other bugs)
i686 Linux
medium Severity medium
: ---
: ---
Assigned To: Doug Ledford
Depends On:
  Show dependency treegraph
Reported: 2004-02-09 17:30 EST by Suman Puthana
Modified: 2007-11-30 17:07 EST (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2004-05-11 21:08:32 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)

  None (edit)
Description Suman Puthana 2004-02-09 17:30:41 EST
Description of problem:

We have an application in which there are multiple processes writing 
to disk simultaneously, each process writing anywhere from 16 - 64 K 
Bytes at a time to its own file and sleeping for about 50 ms 
depending on how long the write() call takes to finish. It is 
important that the write()finishes within 50 ms for this application. 
The file's are opened with the open() call with the standard O_RDWR 
and O_CREAT flags.

On a system with the 2.4.18 kernel ( RedHat 8.0 distribution ), 25 of 
these processes take up about 15% of CPU on a dual Xeon Pentium 4 2.4 
GHz processor with 1 MB of memory for a total throughput of about 7 
MBps which is very good.  The write() calls typically take less than 
5 ms to complete and we sleep for the remaining 45 ms or so.

But on the same system when we installed the 2.4.21 kernel ( RedHat 
Enterprise Linux 3 distribution ), these 25 processes take anywhere 
between 40 - 70 % of the CPU depending on how  much CPU the iowait is 
taking up. The iowait part of it varies all the way from 10 - 60 %. 
(The iowait CPU is within 1% on the 2.4.18 kernel.).  What is even 
worse - sometimes the write() calls take anywhere between 1 - 2 
seconds   (yes seconds!) to return, which degrades the performance of 
our application server pretty badly. 

We have observed this problem with all the popular journal’ed file 
systems ( XFS, JFS, Reiserfs). The above numbers are froma reiserfs 
filesystem. The people at Reiserfs have confirmed that the problem 
exists on the 2.4.21 kernel. This problem is most probably happening 
due to bad I/O scheduling in the RHEL 3 kernel.

Something else of interest here is that when the iowait part of the 
CPUincreases, iostat shows a significantly  higher throughput than 
what we are actually trying to write to disk. (When sampled over 30 
seconds, when we are trying to write 7 MBps, iostat actually shows 11-
12 MBps when iowait goes upto 30-40 %). It's almost like there's some 
internal copies going on.

On a large EMC clariion CX400) we could write about 25 MBps with the 
2.4.18 kernel before we start seeing significant loss of data ( which 
means more and more writes take longer than 50 ms), but with the ES 
3.0 kernel, we could write only  about 6-7 MBps before the writes 
start taking too long. An interesting thing here is that by enabling 
caching on the EMC systems we could write about 15-17 MBps before we 
start losing data again because of the write performance degrading.

Version-Release number of selected component (if applicable):2.4.21-

How reproducible: Every Time.

Steps to Reproduce:

1.Run about 10 instances of the program obtained by compiling the 
following code on a standard 7200 rpm disk. The program writes to a 
file on disk taking the filename, frequency(eg. 20 for 50 ms) and the 
write size (eg. 64000 Bytes) as input parameters. All writes taking 
more than 50 ms are logged into /tmp/error.log.


#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>

#define LOG_FILENAME "/tmp/error.log"
#define MAX_FILE_NAME	32
#define MILLISEC_DIFF(new, old, diff)   
{                               \
        (diff) = (((new).tv_sec  - (old).tv_sec)  * 1000L ) 
+           \
                 (((new).tv_usec - (old).tv_usec) / 
1000L );            \

int main(int argc, char* argv[]) {

	char* data, log_str[1024], filename[MAX_FILE_NAME];
	int frequency, chunk_len, flags, elapsed, interval, 
		fd = -1, fd2 = -1; 
	struct timeval t1, t2;
	mode_t          mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP 

	if(argc != 4) {
		printf("usage: test_writes filename frequency(eg. 20 
for 50 ms) write_size(Bytes)\n");
	bzero(filename, MAX_FILE_NAME);
	strncpy(filename, argv[1], MAX_FILE_NAME -1); 
	frequency = atoi(argv[2]);
	chunk_len = atoi(argv[3]);

	printf("filename <%s> frequency <%d> chunk_len <%d>\n", 
filename, frequency, chunk_len);

	flags = O_CREAT | O_WRONLY | O_TRUNC;
	fd2 = open(LOG_FILENAME, O_WRONLY | O_CREAT | O_APPEND, mode);
	fd = open(filename, flags);

	if(fd < 0) {
		perror("Unable to open the input file");
	if(fd2 < 0) {
		perror("Unable to open the log file");

	data = (char*) malloc(sizeof(char)*chunk_len);
	if(data == NULL) {
		perror("Unable to allocate the input buffer");

	interval = (int)(((double)1000/frequency));
	while(1) {
		gettimeofday(&t1, NULL);
		write(fd, data, chunk_len);
		gettimeofday(&t2, NULL);
		MILLISEC_DIFF(t2, t1, elapsed);
		if(elapsed >= interval)  {
			sprintf(log_str,"<%ld><%ld> write to file <%
s><%d>\n", t1.tv_sec, t1.tv_usec/1000L, filename, elapsed);
			write(fd2, log_str, strlen(log_str));
		sleep_time = (interval - elapsed)*1000;
		if(sleep_time > 0) 	


Additional info:


./ver_linux output:

If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux petey 2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:52:56 EDT 2003 i686 
i686 i386

Gnu C                  3.2.3
Gnu make               3.79.1
util-linux             2.11y
mount                  2.11y
modutils               2.4.25
e2fsprogs              1.32
jfsutils               1.1.2
reiserfsprogs          3.6.4
pcmcia-cs              3.1.31
PPP                    2.4.1
isdn4k-utils           3.1pre4
Linux C Library        2.3.2
Dynamic linker (ldd)   2.3.2
Procps                 2.0.13
Net-tools              1.60
Kbd                    1.08
Sh-utils               4.5.3
Modules Loaded         nfs lockd sunrpc ide-cd cdrom sg parport_pc lp
parport autofs e1000 floppy microcode keybdev mousedev hid input usb-
Comment 1 Arjan van de Ven 2004-02-09 17:33:07 EST
fwiw iowait time is actually cpu idle time
Comment 2 Suman Puthana 2004-02-09 17:37:50 EST
Yes Arjan, you had explained that to me in the past and thank you for 
that. I would really not have a problem if all I was seeing was the 
iowait cpu. But we can clearly see a massive degradation in 
performance when compared with 2.4.18. The reason I gave the iowait 
information, was may be it  could give a clue as to what might have 
been going on with the actual I/O. And that is the only obvious thing 
that is different from 2.4.18.
Comment 4 Doug Ledford 2004-02-18 16:44:05 EST

*** This bug has been marked as a duplicate of 104633 ***
Comment 5 John Flanagan 2004-05-11 21:08:32 EDT
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.


Note You need to log in before you can comment on or make changes to this bug.