Dear Redhat support person,

We would like to report a problem regarding missing glibc buffering in Redhat 6.0. The problem has been analyzed by one of my colleagues here at Adobe. The full report is attached here:

===> begin

I've attached both a performance analysis comparing Linux 2.0.X with 2.2.X and with NT 4.0, all on the same system, and the program used to collect the data.

The problem statement is: Contrary to the C language specification, glibc does not buffer file input when the input file is opened with standard/default options. The standard says that I/O is buffered by default, that the buffer size is given by BUFSIZ and must be at least 256 bytes. File buffer size may be altered by use of setbuf or setvbuf.

This results in performance that is slower for 2.2.X when compared with 2.0.X. 2.0.X is faster than NT 4.0. 2.2.X is slower than NT 4.0.

2.2.5   NT 4.0   2.0.36
========================
Time in seconds
  52       6        5      16MB file, read size 1 byte
  63      <1       <1      24KB file (NFS), read size 1 byte

The performance degradation is more significant for small read operations and disastrous when the target of the read is an NFS mounted file.

First, behaviour before the new libc. I used an old system, linked with the static library so that the performance could be tested on the new kernel, but more importantly, on the same hardware.

Test - using static library compile on 2.0.36 running on 2.2.5-15smp
Target file is /disks/flex/apd2/dowling/test/buf1 (NFS)
Read Size 8192, Bytes read 24576, Time 0
Read Size 4096, Bytes read 24576, Time 0
Read Size 2048, Bytes read 24576, Time 0
Read Size 1024, Bytes read 24576, Time 0
Read Size 512, Bytes read 24576, Time 0
Read Size 256, Bytes read 24576, Time 0
Read Size 128, Bytes read 24576, Time 0
Read Size 64, Bytes read 24576, Time 1
Read Size 32, Bytes read 24576, Time 0
Read Size 16, Bytes read 24576, Time 0
Read Size 8, Bytes read 24576, Time 0
Read Size 4, Bytes read 24576, Time 0
Read Size 2, Bytes read 24576, Time 0
Read Size 1, Bytes read 24576, Time 0

=================
New libc - static to show that it isn't a static/shared problem

Test - using static library compile on 2.2.5-15smp running on 2.2.5-15smp
Target file is /disks/flex/apd2/dowling/test/buf1 (NFS)
Read Size 8192, Bytes read 24576, Time 0
Read Size 4096, Bytes read 24576, Time 0
Read Size 2048, Bytes read 24576, Time 0
Read Size 1024, Bytes read 24576, Time 0
Read Size 512, Bytes read 24576, Time 0
Read Size 256, Bytes read 24576, Time 0
Read Size 128, Bytes read 24576, Time 0
Read Size 64, Bytes read 24576, Time 1
Read Size 32, Bytes read 24576, Time 2
Read Size 16, Bytes read 24576, Time 4
Read Size 8, Bytes read 24576, Time 7
Read Size 4, Bytes read 24576, Time 15
Read Size 2, Bytes read 24576, Time 32
Read Size 1, Bytes read 24576, Time 63

==================
New libc - shared - performance is essentially identical to static case - the expected result.

Test - using shared library compile on 2.2.5-15smp running on 2.2.5-15smp
Target file is /disks/flex/apd2/dowling/test/buf1 (NFS)
Read Size 8192, Bytes read 24576, Time 1
Read Size 4096, Bytes read 24576, Time 0
Read Size 2048, Bytes read 24576, Time 0
Read Size 1024, Bytes read 24576, Time 0
Read Size 512, Bytes read 24576, Time 0
Read Size 256, Bytes read 24576, Time 0
Read Size 128, Bytes read 24576, Time 1
Read Size 64, Bytes read 24576, Time 0
Read Size 32, Bytes read 24576, Time 2
Read Size 16, Bytes read 24576, Time 4
Read Size 8, Bytes read 24576, Time 7
Read Size 4, Bytes read 24576, Time 15
Read Size 2, Bytes read 24576, Time 31
Read Size 1, Bytes read 24576, Time 61

==========================================================
==========================================================
Using local (/tmp) file with much larger file. This shows that the penalty imposed by the new libc is significant (about 10X) even for local files.

======= Old libc
Test - using static library compile on 2.0.36 running on 2.2.5-15smp
Target file is /tmp/buf1
Read Size 8192, Bytes read 16777216, Time 1
Read Size 4096, Bytes read 16777216, Time 0
Read Size 2048, Bytes read 16777216, Time 1
Read Size 1024, Bytes read 16777216, Time 0
Read Size 512, Bytes read 16777216, Time 0
Read Size 256, Bytes read 16777216, Time 0
Read Size 128, Bytes read 16777216, Time 1
Read Size 64, Bytes read 16777216, Time 0
Read Size 32, Bytes read 16777216, Time 1
Read Size 16, Bytes read 16777216, Time 0
Read Size 8, Bytes read 16777216, Time 1
Read Size 4, Bytes read 16777216, Time 2
Read Size 2, Bytes read 16777216, Time 3
Read Size 1, Bytes read 16777216, Time 5

========
Test - using shared library compile on 2.2.5-15smp running on 2.2.5-15smp
Target file is /tmp/buf1
Read Size 8192, Bytes read 16777216, Time 1
Read Size 4096, Bytes read 16777216, Time 0
Read Size 2048, Bytes read 16777216, Time 0
Read Size 1024, Bytes read 16777216, Time 0
Read Size 512, Bytes read 16777216, Time 1
Read Size 256, Bytes read 16777216, Time 0
Read Size 128, Bytes read 16777216, Time 1
Read Size 64, Bytes read 16777216, Time 1
Read Size 32, Bytes read 16777216, Time 2
Read Size 16, Bytes read 16777216, Time 4
Read Size 8, Bytes read 16777216, Time 7
Read Size 4, Bytes read 16777216, Time 13
Read Size 2, Bytes read 16777216, Time 26
Read Size 1, Bytes read 16777216, Time 52

=========================================================================
=========================================================================
Test run on NT 4.0 Server - this is the same hardware as above; this machine is dual boot.
=========================================================================
Test - using VC 5.0
target is /disks/flex/apd2/dowling/test/buf1
Read Size 512, Bytes read 24576, Time 0
Read Size 256, Bytes read 24576, Time 0
Read Size 128, Bytes read 24576, Time 0
Read Size 64, Bytes read 24576, Time 0
Read Size 32, Bytes read 24576, Time 0
Read Size 16, Bytes read 24576, Time 0
Read Size 8, Bytes read 24576, Time 0
Read Size 4, Bytes read 24576, Time 0
Read Size 2, Bytes read 24576, Time 0
Read Size 1, Bytes read 24576, Time 0

============
Test - using VC 5.0
target is c:\buf1
Read Size 512, Bytes read 16777216, Time 2
Read Size 256, Bytes read 16777216, Time 1
Read Size 128, Bytes read 16777216, Time 1
Read Size 64, Bytes read 16777216, Time 1
Read Size 32, Bytes read 16777216, Time 1
Read Size 16, Bytes read 16777216, Time 2
Read Size 8, Bytes read 16777216, Time 1
Read Size 4, Bytes read 16777216, Time 3
Read Size 2, Bytes read 16777216, Time 3
Read Size 1, Bytes read 16777216, Time 6

Note that BUFSIZ for VC 5.0 is 512. This should actually be a disadvantage for NT.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

char buf[BUFSIZ];

int main(int argc, char **argv){
    FILE *in;
    int i, k, l, n;
    time_t begin, end;
    double difference;

    n = BUFSIZ;
    if(argc != 2){
        fprintf(stderr,"Usage: buff_test tempfilename\n");
        return 1;
    }

    /* Refuse to overwrite an existing file. */
    in = fopen(argv[1],"r");
    if(in != NULL){
        fprintf(stderr,"tempfile: %s, already exists\n",argv[1]);
        fclose(in);
        return 1;
    }

    /* Create the test file: 48 blocks of BUFSIZ bytes. */
    in = fopen(argv[1],"w");
    if(in == NULL){
        fprintf(stderr,"Unable to open file: %s, for write\n",argv[1]);
        return 1;
    }
    for(i=0;i<48;i++){
        k = fwrite(buf,n,1,in);
        if(k != 1){
            fprintf(stderr,"Error writing to: %s\n",argv[1]);
            return 1;
        }
    }
    fclose(in);

    /* Re-read the whole file, halving the fread size on each pass. */
    for(;;){
        in = fopen(argv[1],"r");
        if(in == NULL) return 1;
        l = 0;
        begin = time(NULL);
        for(;;){
            k = fread(buf,n,1,in);
            if(k != 1) break;
            l += n;
        }
        end = time(NULL);
        difference = end - begin;
        printf("Read Size %5d, Bytes read %9d, Time %8g\n", n,l,difference);
        fclose(in);   /* close on every pass, including the last */
        if(n == 1) break;
        n /= 2;
    }
    remove(argv[1]);
    return 0;
}

--
Freddy Jensen, Sr. Computer Scientist, Adobe Systems Incorporated
345 Park Avenue, San Jose, CA 95110-2704, Phone 408 536-2869 / 536-6000
Email: jensen, URL: http://www.adobe.com
--
For additional supporting evidence one can strace the test program. On a 6.1-based system here this shows that the program is actually issuing tiny read() syscalls; on a 5.1-based system the smallest the read() gets is 4096 bytes.
The test is completely bogus. libc5 (old libc) is using a buffer size of 1024, whereas glibc is using a buffer size of 8192. There is no comparison between the amount of work that the kernel needs to do to read 1024 versus 8192 bytes at a time. Change your test program to use 1024 instead of BUFSIZ for read sizes, and immediately after the fopen() call do a setbuffer(in, buf, 1024).

Secondly, the fread in glibc is a thread-safe function, which means that it has to go through a lot of stuff to ensure proper locking et al. Use fread_unlocked if you don't need locking on the IO operations in this test.

Once you do that you will see performance approaching what you have obtained with the old-libc statically compiled binary. It will still be slower (about 25-30%), but that is a far cry from the tenfold slowdown you have got as a result of your test. And considering the added functionality, thread safety and increased complexity of handling a syscall in newer kernels, the performance penalty is not bad at all.
gafton completely misses the point. The current implementation of glibc is in violation of the ANSI C standard since it does not provide at least BUFSIZ buffering by default. Note that a "workaround" that requires adding code is not acceptable. The program that caused us to notice the problem was gcc! The assertion that "thread-safe" has to be that much slower is just "bogus" and lazy.
The point is that the ANSI C standard requires at least BUFSIZ buffering by default. A "workaround" does not resolve the issue at all. We discovered the problem by observing the behavior of gcc and cp! "Thread-safe" performance issues as an excuse is just that, an excuse. Output is buffered by default and if the workaround is used (setvbuf) performance is OK. Is gafton asserting: 1) buffered output is not thread-safe? 2) input with the use of setvbuf is not thread-safe? There are really quite a few problems with the new glibc, not the least of which is its unnecessary intrusions into variable name space, again in violation of the ANSI C standard.