Description of problem: On RHEL 5, when the included program is compiled as a 64 bit binary and executed on a system equipped with Intel processors, performance is considerably worse than when the same program is compiled as a 32 bit binary and executed on the same machine. Execution times on 64 bit AMD hardware are virtually the same for the 64 and 32 bit binaries. How reproducible: Customer is noting a much larger impact on his specific hardware platform (roughly 4x as slow for 64 bit than 32 bit). I am able to reproduce a significant performance difference on hardware I have tested, but it is in the scope of 2x as slow. Steps to Reproduce: (I have two lab machines reserved to reproduce this, if you need the addresses to confirm) # gcc -O3 sorttest.c -lm -o sorttest64 # gcc -O3 sorttest.c -lm -m32 -o sorttest32 # time ./sorttest64 20 # time ./sorttest32 20 Compare resulting execution times on Intel and AMD hardware. Actual results: on Intel hardware: # time ./sorttest64 20 real 3m30.056s user 3m29.295s sys 0m0.724s # time ./sorttest32 20 real 1m30.943s user 1m30.164s sys 0m0.762s on AMD hardware: # time ./sorttest64 20 real 2m20.643s user 2m19.949s sys 0m0.644s # time ./sorttest32 20 real 2m0.964s user 2m0.070s sys 0m0.850s Since these are different boxes with different specs, the actual execution time differences from one box to the other are not important. Rather, the important thing to note is how much better the performance of the 32 bit code is compared to the 64 bit code on the Intel hardware, while the AMD hardware shows very little difference. Expected results: Not such a huge performance hit for the 64 bit code on Intel hardware. Additional info: Customer has tested this on the same Intel hardware with RHEL 4, 5, two versions of SuSe, F8, and a few other distributions. The poor performance of the 64 bit code is only apparent in RHEL. To try to further flush this out, I copied: /lib64/ld-linux-x86-64.so.2 /lib64/libc.so.6 /lib64/libm.so.6 from a 64 bit F8 box to /root/f8 on the same Intel box that I ran the above tests on. I then ran: # time /root/f8/ld-linux-x86-64.so.2 --library-path /root/f8 /root/sorttest64 20 real 1m15.522s user 1m14.777s sys 0m0.732s With the above, it looks like I can also reproduce the customer's statement that this is not a problem on F8. ----------------------------------------------------------- /* compile: cc sorttest.c -O3 -lm -o sorttest */ #include <stdio.h> #include <stdlib.h> #include <math.h> #define N 10000000 /* vector of 10-M */ int scr[N], i, cnt, n, posx[N], srt_f(const void *, const void *); double a, aaa, posy[N]; int srt_f(const void *a, const void *b) { aaa = posy[*((int *)a)] - posy[*((int *)b)]; if ( aaa < 0. ) return(-1); return( aaa > 0. ); } int main(int argc, char *argv[]) { cnt = ( argc == 1 ) ? 1 : atoi(argv[1]); for ( n = 0; n < cnt; ++n ) { for ( i = 0; i < N; ++i ) { posx[i] = i; a += .001; posy[i] = sin(a); } qsort((void *) posx, i, sizeof(i), srt_f); for ( i = 0; i < N; ++i ) scr[posx[i]] = (1000*i)/N; } }
OK. Here's the short answer. The performance problem that they are seeing with the RHEL, x86_64 combination and not with Fedora is quite simple. There was a change in libc for Fedora Core 8 for msort and qsort. http://sourceware.org/cgi-bin/cvsweb.cgi/libc/ChangeLog.diff?cvsroot=glibc&only_with_tag=fedora-branch&r1=1.8782.2.274&r2=1.8782.2.275 +2007-10-04 Jakub Jelinek <jakub> + + * stdlib/msort.c: Include stdint.h. + (struct msort_param): New type. + (msort_with_tmp): Use struct msort_param pointer for unchanging + parameters. Add optimized handling for several common sizes + and indirect sorting mode. + (qsort): Adjust msort_with_tmp callers. For big S use indirect + sorting. + Suggested by Belazougui Djamel . + + * stdlib/Makefile (tests): Add tst-qsort2. + * stdlib/tst-qsort2.c: New test. http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c.diff?cvsroot=glibc&only_with_tag=fedora-branch&r1=1.21&r2=1.21.2.1 see : http://www.cygwin.com/ml/libc-alpha/2007-05/msg00022.html for the rationale Doing a bit of hacking and code substitution with the FC 8 version of msort and qsort gcc -O3 -I. -g -o sorttestnew sorttest.c msortnew.c qsortnew.c -lm msortnew.c: In function ‘msort_with_tmp’: msortnew.c:142: warning: cast to pointer from integer of different size msortnew.c:148: warning: cast to pointer from integer of different size [alanm@dt1 TEST]$ time ./sorttestnew 20 66.331u 0.637s 1:07.10 99.7% 0+0k 0+0io 0pf+0w [alanm@dt1 TEST]$ gcc -m32 -O3 -I. -g -o sorttestnew.m32 sorttest.c msortnew.c qsortnew.c -lm [alanm@dt1 TEST]$ time ./sorttestnew.m32 20 69.026u 0.659s 1:09.71 99.9% 0+0k 0+0io 2pf+0w ## This is the RHEL5 libc [alanm@dt1 TEST]$ time ./sorttest 20 187.127u 0.642s 3:07.80 99.9% 0+0k 0+0io 0pf+0w [alanm@dt1 TEST]$ time ./sorttest.m32 20 71.298u 0.683s 1:12.12 99.7% 0+0k 0+0io 4pf+0w Note that there is a 3X improvement in 64 bit mode for this particular testcase. This would be a better result than you would get because qsort and msort are not compiled with -fPIC in the hacked together case. I'll post the longer answer (how I arrived at this conclusion later today).
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0080.html
Do you think that this problem applys on RHEL4, too? If it is so, do you have any plan to fix it on RHEL4? We are having the same experience on RHEL4.
Yes, this problem does apply to RHEL4 as well. The BZ filed against RHEL4 can be found here: https://bugzilla.redhat.com/show_bug.cgi?id=436115 For RHEL4, this bug is CLOSED WONTFIX, as it is very late in the RHEL4 lifecycle.