Red Hat Bugzilla – Bug 115273
bad disk I/O performance with the 2.4.21-4.ELsmp kernel
Last modified: 2007-11-30 17:07:00 EST
Description of problem: We have an application in which there are multiple processes writing to disk simultaneously, each process writing anywhere from 16 - 64 K Bytes at a time to its own file and sleeping for about 50 ms depending on how long the write() call takes to finish. It is important that the write()finishes within 50 ms for this application. The file's are opened with the open() call with the standard O_RDWR and O_CREAT flags. On a system with the 2.4.18 kernel ( RedHat 8.0 distribution ), 25 of these processes take up about 15% of CPU on a dual Xeon Pentium 4 2.4 GHz processor with 1 MB of memory for a total throughput of about 7 MBps which is very good. The write() calls typically take less than 5 ms to complete and we sleep for the remaining 45 ms or so. But on the same system when we installed the 2.4.21 kernel ( RedHat Enterprise Linux 3 distribution ), these 25 processes take anywhere between 40 - 70 % of the CPU depending on how much CPU the iowait is taking up. The iowait part of it varies all the way from 10 - 60 %. (The iowait CPU is within 1% on the 2.4.18 kernel.). What is even worse - sometimes the write() calls take anywhere between 1 - 2 seconds (yes seconds!) to return, which degrades the performance of our application server pretty badly. We have observed this problem with all the popular journalâed file systems ( XFS, JFS, Reiserfs). The above numbers are froma reiserfs filesystem. The people at Reiserfs have confirmed that the problem exists on the 2.4.21 kernel. This problem is most probably happening due to bad I/O scheduling in the RHEL 3 kernel. Something else of interest here is that when the iowait part of the CPUincreases, iostat shows a significantly higher throughput than what we are actually trying to write to disk. (When sampled over 30 seconds, when we are trying to write 7 MBps, iostat actually shows 11- 12 MBps when iowait goes upto 30-40 %). It's almost like there's some internal copies going on. On a large EMC clariion CX400) we could write about 25 MBps with the 2.4.18 kernel before we start seeing significant loss of data ( which means more and more writes take longer than 50 ms), but with the ES 3.0 kernel, we could write only about 6-7 MBps before the writes start taking too long. An interesting thing here is that by enabling caching on the EMC systems we could write about 15-17 MBps before we start losing data again because of the write performance degrading. Version-Release number of selected component (if applicable):2.4.21- 4.ELsmp How reproducible: Every Time. Steps to Reproduce: 1.Run about 10 instances of the program obtained by compiling the following code on a standard 7200 rpm disk. The program writes to a file on disk taking the filename, frequency(eg. 20 for 50 ms) and the write size (eg. 64000 Bytes) as input parameters. All writes taking more than 50 ms are logged into /tmp/error.log. ***************************************************************** #include <stdio.h> #include <unistd.h> #include <errno.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <sys/time.h> #define LOG_FILENAME "/tmp/error.log" #define MAX_FILE_NAME 32 #define MILLISEC_DIFF(new, old, diff) { \ (diff) = (((new).tv_sec - (old).tv_sec) * 1000L ) + \ (((new).tv_usec - (old).tv_usec) / 1000L ); \ } int main(int argc, char* argv[]) { char* data, log_str[1024], filename[MAX_FILE_NAME]; int frequency, chunk_len, flags, elapsed, interval, sleep_time, fd = -1, fd2 = -1; struct timeval t1, t2; mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH; if(argc != 4) { printf("usage: test_writes filename frequency(eg. 20 for 50 ms) write_size(Bytes)\n"); _exit(1); } bzero(filename, MAX_FILE_NAME); strncpy(filename, argv[1], MAX_FILE_NAME -1); frequency = atoi(argv[2]); chunk_len = atoi(argv[3]); printf("filename <%s> frequency <%d> chunk_len <%d>\n", filename, frequency, chunk_len); flags = O_CREAT | O_WRONLY | O_TRUNC; fd2 = open(LOG_FILENAME, O_WRONLY | O_CREAT | O_APPEND, mode); fd = open(filename, flags); if(fd < 0) { perror("Unable to open the input file"); _exit(2); } if(fd2 < 0) { perror("Unable to open the log file"); _exit(2); } data = (char*) malloc(sizeof(char)*chunk_len); if(data == NULL) { perror("Unable to allocate the input buffer"); _exit(3); } interval = (int)(((double)1000/frequency)); while(1) { gettimeofday(&t1, NULL); write(fd, data, chunk_len); gettimeofday(&t2, NULL); MILLISEC_DIFF(t2, t1, elapsed); if(elapsed >= interval) { sprintf(log_str,"<%ld><%ld> write to file <% s><%d>\n", t1.tv_sec, t1.tv_usec/1000L, filename, elapsed); write(fd2, log_str, strlen(log_str)); } sleep_time = (interval - elapsed)*1000; if(sleep_time > 0) usleep(sleep_time); } } ********************************************************************* Additional info: ********************************************************************** ./ver_linux output: If some fields are empty or look unusual you may have an old version. Compare to the current minimal requirements in Documentation/Changes. Linux petey 2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:52:56 EDT 2003 i686 i686 i386 GNU/Linux Gnu C 3.2.3 Gnu make 3.79.1 util-linux 2.11y mount 2.11y modutils 2.4.25 e2fsprogs 1.32 jfsutils 1.1.2 reiserfsprogs 3.6.4 pcmcia-cs 3.1.31 PPP 2.4.1 isdn4k-utils 3.1pre4 Linux C Library 2.3.2 Dynamic linker (ldd) 2.3.2 Procps 2.0.13 Net-tools 1.60 Kbd 1.08 Sh-utils 4.5.3 Modules Loaded nfs lockd sunrpc ide-cd cdrom sg parport_pc lp parport autofs e1000 floppy microcode keybdev mousedev hid input usb- uhci
fwiw iowait time is actually cpu idle time
Yes Arjan, you had explained that to me in the past and thank you for that. I would really not have a problem if all I was seeing was the iowait cpu. But we can clearly see a massive degradation in performance when compared with 2.4.18. The reason I gave the iowait information, was may be it could give a clue as to what might have been going on with the actual I/O. And that is the only obvious thing that is different from 2.4.18.
*** This bug has been marked as a duplicate of 104633 ***
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2004-188.html