From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624

Description of problem:
A system with an AthlonMP 2800 runs the linpack benchmark, as well as commercial numerical apps from Synopsys, 50% slower than a P4 2.4GHz. Both machines have single-channel DDR266 memory. Under Win32, machines with matched CPU performance and matched memory architecture show no difference between P4 and Athlon linpack performance (i.e. a P4 2.0GHz vs. an Athlon 2000+, both running DDR266). Please see:
http://www.tech-report.com/reviews/2002q1/northwood-vs-2000/index.x?pg=3
(look at the rightmost part of the graph)

The numbers I have gotten are:
P4 2.4GHz     - 180 MFlops (2.4.20-8)
AthlonMP 2800 - 100 MFlops (2.4.20-20.9smp)
AthlonMP 2200 -  78 MFlops (2.4.2-2smp)

I will be happy to provide you with the linpack source I used in this test. I compiled it with gcc using the following command:
cc -DDP -DUNROLL -O1 clinpack1000.c -lm -o clinpack1000.exe

Version-Release number of selected component (if applicable):
kernel-2.4.20-20.9smp and earlier

How reproducible:
Always

Steps to Reproduce:
1. Compile the clinpack1000.c source for a matrix order of 1000 (n=1000)
2. Run on the P4
3. Run on the Athlon
4. Compare the performance

Actual Results:
The Athlon delivers roughly 50% of the P4's performance.

Expected Results:
Somewhat better performance from the Athlon 2800 than from the P4 2.4GHz.

Additional info:
My email address is ognjen.milic. The person at Synopsys Inc. who can confirm these findings is andrey.kucherov.
Created attachment 95780 [details]
Linpack bench converted to C for Athlon vs. P4 speed test

Please compile with the following command:
cc -DDP -DUNROLL -O1 clinpack1000.c -lm -o clinpack1000.exe
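For reference, nearly all of the runtime in this benchmark is spent in the daxpy kernel of the factorization. A minimal sketch of that loop (illustration only, not the attached clinpack1000.c; if I recall the clinpack source correctly, -DUNROLL merely selects a hand-unrolled variant of the same loop):

/* Sketch of the linpack hot loop: dy[] += da * dx[].
 * With n=1000 the ~8 MB working set streams through main memory,
 * so the score tracks memory bandwidth more than FPU speed. */
static void daxpy(int n, double da, const double *dx, double *dy)
{
    int i;
    for (i = 0; i < n; i++)
        dy[i] += da * dx[i];
}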
On an AthlonMP 2000+ here (RH9, Tyan Tiger S2466, one CPU only) I get 120 MFlops with your options. If I optimize with:
-O3 -march=athlon -msse -mfpmath=sse -malign-double -mpreferred-stack-boundary=4 -falign-loops=4
it goes up to 130 MFlops. For the record, on a P4 2GHz with DDRAM I get 170 MFlops, and on a dual 2GHz with RDRAM I get 250 MFlops with a linpack binary compiled with ifc (version 7, if I remember correctly). Linpack performance is directly correlated with memory bandwidth; since you are getting bad performance, I suspect something is wrong with your RAM/motherboard.
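To separate an FPU problem from a memory problem, you could measure raw bandwidth directly. A minimal sketch (my own quick test, not the official STREAM benchmark; the array size and timing method are arbitrary choices), compiled with e.g. cc -O2 bw.c -o bw:

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N (8 * 1024 * 1024)   /* 64 MB per array: far larger than any cache */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    struct timeval t0, t1;
    double secs, mb;
    long i;

    if (!a || !b)
        return 1;
    for (i = 0; i < N; i++) {       /* touch the pages first */
        a[i] = 1.0;
        b[i] = 2.0;
    }

    gettimeofday(&t0, NULL);
    for (i = 0; i < N; i++)         /* one read stream + one read/write stream */
        a[i] = a[i] + 3.0 * b[i];
    gettimeofday(&t1, NULL);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    mb   = 3.0 * N * sizeof(double) / 1e6;  /* read a, read b, write a */
    printf("approx. bandwidth: %.0f MB/s\n", mb / secs);
    return 0;
}

If that number is far below what DDR266 should deliver, the problem is in the memory path, not in linpack itself.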
It is nice to see that one can achieve better performance by tweaking compiler options; however, that is beside the point here. The issue is the huge difference between P4 and 2P Athlon performance when running the same executable, even though the P4 and the 2P Athlon use the same type and speed of memory, in this case DDR266. This is contrary to the Win32 linpack performance seen in the tech-report review. I have tested the problem on about 12 machines, all with the AMD760MPX chipset, and they all have the same problem. The problem is that the memory controller driver for the AMD762 north bridge does not work, or is non-existent, in Linux distributions installed as-is. Did you do any kernel recompilation?

One thing, though: I never tested it on an AthlonMP platform with only one CPU! It should not matter, as linpack is a single-threaded app, but who knows. I am 100% sure this is not an isolated case, as I tested it on a number of other AMD760MPX platforms running various kernel versions. All had 2 processors installed. Also, the same machine with the 2800+ processors was tested for memory performance under Win32 and passed with flying colors. That rules out a single-machine hardware issue.

Maybe plugging in the second processor causes the memory controller to divide the bandwidth between the processors by default, and the driver that should rectify this is missing or malfunctioning. In your case it is obvious that the single CPU is getting all the bandwidth and thus performs well.
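If it helps isolate the 2P hypothesis, one could run two copies of the streaming loop at once and compare the per-process numbers against a single-copy run. A rough sketch (my own test, assuming the 2.4 SMP scheduler spreads the two children across the CPUs):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/wait.h>

#define N (8 * 1024 * 1024)   /* 64 MB per array, well beyond cache */

/* Run the streaming loop in one process and report its bandwidth. */
static void stream_once(int id)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    struct timeval t0, t1;
    double secs;
    long i;

    if (!a || !b)
        exit(1);
    for (i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    gettimeofday(&t0, NULL);
    for (i = 0; i < N; i++)
        a[i] = a[i] + 3.0 * b[i];
    gettimeofday(&t1, NULL);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("proc %d: %.0f MB/s\n", id, 3.0 * N * sizeof(double) / 1e6 / secs);
}

int main(void)
{
    /* Two children, ideally one per CPU. If per-process bandwidth stays
     * close to the single-process figure, the controller is not statically
     * splitting bandwidth between CPUs; if it roughly halves, it may be. */
    int c;
    for (c = 0; c < 2; c++) {
        if (fork() == 0) {
            stream_once(c);
            exit(0);
        }
    }
    wait(NULL);
    wait(NULL);
    return 0;
}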
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/