* reported by customer
* test program (c++, see attached code)
* Taroon on s390

Using a c++ program (see attachments), performance measurements indicate a decrease in performance from 7.1-s390x and 7.2-s390 to RHEL 3 (Taroon).
Created attachment 95401 [details] customer reported problem
Created attachment 95402 [details] Taroon test executable provided by customer
Created attachment 95403 [details] source for test executable (provided by customer) (.tar format)
Created attachment 95404 [details] run on local system (Taroon 31-bit test tree) this run on a local system (jake.z900.redhat.com 512 Mb of storage and 7 Gb of dasd) running s390 (31-bit) Taroon RHEL 3 test tree (Taroon-re1007.RC1.0) ...
Created attachment 95409 [details] run on local system (Taroon 64-bit test tree) run on local system (Taroon 64-bit test tree) this run on a local system (jake.z900.redhat.com 512 Mb of storage and 7 Gb of dasd) running s390x (64-bit) Taroon RHEL 3 test tree (Taroon-re1007.RC1.0) ...
This test doesn't seem to be much of a compiler benchmark; rather, it tests how long 4 million calls to malloc(27) take. That's where at least 60% of the total time seems to be spent. I've tried the test on a 2.4.21-3.EL s390x kernel running 32-bit userland. The timings below are the smallest real time from 5 iterations.

The test compiled with -O2 with compat-gcc-295-7.2-2.95.3.78:
real 0m3.067s
user 0m2.560s
sys 0m0.510s

-O2 with gcc 3.2.3-26:
real 0m3.256s
user 0m2.740s
sys 0m0.520s

Both against glibc-2.3.2-95.6. When the program was compiled with compat-gcc-295 and linked against glibc-2.2.4-24.2s.1 (I've hacked up libstdc++-libc6.2-2.so.3 so that it references sys_nerr instead of sys_nerr, unpacked the 7.2-s390 glibc into a subdir and linked against that, also with -Wl,-dynamic-linker and -Wl,-rpath) I get:
real 0m2.785s
user 0m2.260s
sys 0m0.530s

Still, I don't see any 50% performance drop. gcc-3.2.3-RH seems to be 5% slower than gcc-2.95.2, but the comparison includes two completely different sets of standard C++ headers and libraries, where 2.95.2 is very far from conforming. Then there is a 10% drop in the speed of the 4 million malloc(27) calls. glibc 2.3.* has a rewritten malloc implementation, and at least from benchmarks done on other architectures it seems to be an overall win.
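To isolate the allocator cost described above from the rest of the test program, the malloc(27) loop can be timed on its own. The following is a minimal sketch, not the customer's t01prmc source: the function name `time_malloc_calls` is hypothetical, and it uses C++11 `std::chrono`, so it needs a newer compiler than the gcc 2.95/3.2 toolchains discussed here.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not from the attached test program):
// times `count` calls to malloc(size) and returns the elapsed seconds.
// Returns -1.0 if any allocation fails.
double time_malloc_calls(std::size_t count, std::size_t size) {
    std::vector<void*> blocks;
    blocks.reserve(count);  // keep vector growth out of the timed region

    bool failed = false;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        void* p = std::malloc(size);
        if (p == nullptr) {
            failed = true;
            break;
        }
        blocks.push_back(p);  // keep the pointer so the call is not optimized away
    }
    auto stop = std::chrono::steady_clock::now();

    // Release everything outside the timed region.
    for (void* p : blocks) {
        std::free(p);
    }
    if (failed) {
        return -1.0;
    }
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_malloc_calls(4000000, 27)` on each system, linked against that system's own glibc, would give a direct comparison of the two malloc implementations under this one allocation pattern.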
email response from customer: Was this actually run under RH72? We compiled and ran it on RH72 and re-compiled and ran on RHEL3. That is where we saw the difference. We also ran CPU SPEC2000 against RH72 and RHEL3, re-compiled on each platform, and saw a large difference between RH72 and RHEL3. With a single engine RH72 performs 17% better than does RHEL3. When a second engine is added, the performance difference is less, but still there.
No, as stated above, the test was done on RHEL3 only, but using both RHEL3 and RH7.2 compilers. That way both are using the same kernel which can make a difference as well.
Created attachment 96951 [details] Summary page from SPEC CPU2000 run on RH72 and EL3
I don't have access to SPEC, so I cannot look at it.
The program above, where you noted that the response times were the same, was compiled and run on RH72 here and compiled and run on RHEL3. The difference appeared when it was compiled and run on the same platform. Even in your case it shows a 6% degradation in performance. EL3 is supposed to be better, not worse. This is a big issue at Fidelity and is holding up any implementation here. RedHat should conduct exhaustive benchmarks to ensure that performance is better on each release or upgrade.
IBM LTC now following this defect
----- Additional Comments From khoa.com 2004-02-04 23:18 ------- Sachin - please take a look at this bug and see if you or your team can help. This is hot for Red Hat. Thanks.
----- Additional Comments From smolinski.com 2004-02-05 08:55 ------- Hi Khoa, for detailed problem determination, please, compile the test programs on both platforms with option '-pg' in order to enable profiling. Then execute the program, which creates file 'gmon.out' in the current working directory. Invoking 'gprof <binary_name>' and thus comparing the profile samples will unveil the differences of the two programs at runtime. After determining candidates for the excessive time consumption, we'd be interested in examining the compiled binaries obtained by compiling with the '-g' option enabled. This will allow us to determine code patterns that are inequivalently translated on RHEL3 and RH7.2. If you need any further assistance in problem determination, just drop me a note, or update this bugzilla. Regards, Holger btw: Is the customer comparing RH7.2 (31 bit) against RHEL3 on 64bit? From the architecture field of this bugzilla and attachment t01prmc-fromfidelity.s390ons390x.txt I get the impression that the problem occurs on RHEL3 64bit (with 31bit EMU-layer?) while 31bit RedHat 7.2 is okay.
----- Additional Comments From khoa.com 2004-02-05 12:17 ------- Per our discussion this morning, I'd like to re-assign this bug back to you and then ask Glen Johnson to send your latest comment to Red Hat. I also ask Glen to set up BugMail to mirror updates between this bug and RH bug 107773. Glen - please set up BugMail to mirror updates between this bug and RH 107773. Thanks.
----- Additional Comments From khoa.com 2004-02-05 13:24 ------- I expect Red Hat to provide the IBM Boblingen team with the information request above from Holger Smolinski.
The comparison was EL3.0 to RH7.2 not EL3.0x to RH7.2. The caveat is the fact that EL3.0 was the beta2 release. We are running benchmarks at this time on the ga EL3.0. I was told that there was very little difference between b2 and ga for EL3.0.
----- Additional Comments From khoa.com 2004-02-11 22:29 ------- Can Red Hat confirm the observation from Mike Reeves above (i.e., using the GA level RHEL3) ?
----- Additional Comments From smolinski.com 2004-02-12 03:36 ------- Besides this side discussion about 390x and 390 distributions, we really need the profiling data and debug enabled versions of the testcase for an analysis. We expect RedHat to work with the customer on providing more detailed information.
Mike Reeves, Can you provide what IBM is asking us for ?
----- Additional Comments From smolinski.com 2004-02-18 08:23 ------- Hi Mark, Glen, we are still waiting for the debug data to be provided by the customer via RedHat. Is there any progress in data collection? Since we have been waiting for data for two weeks now, setting status to deferred. Will close the bug after 30 additional days of inactivity. Regards, Holger
Mike Reeves, IBM is still looking for feedback here on the testcases.
----- Additional Comments From smolinski.com 2004-03-05 15:21 ------- Sent email to Mike Waite, (cc: Jay Barrows, Jim Burke) requesting a call with him to collect the facts on this issue and prepare an action plan. Received contact information to Mike Waite from Jay Barrows at. Tried to contact Mike Waite, not in, left message on voice mail. addt'ly sent another email to Mike requesting to urgently schedule a call with me today.
----- Additional Comments From smolinski.com 2004-03-05 16:03 ------- Tried to call Mike Waite, his phone is temporarily unavailable. deferring further action to Monday. Have a great weekend. Holger.
I will ping Mike Reeves again on the questions from Holger posted in #15 above.
I have been pinged, but I am an MVS sysprog and don't do much c. I will have to find someone who can do this, or you could provide the process for me and I will be happy to do it.
----- Additional Comments From smolinski.com 2004-03-10 03:25 ------- Received cc: of a note from Mike Reeves to Michael Waite, providing source code of another testcase, exposing a performance issue as well (LMBench):

>Mike -
>
>I am having difficulty getting into bugzilla and I have problems getting
>into it to update it.
>
>Here is the situation. We are running RedHat EL3.0 (31 bit not 64 bit). The
>indication is 7.1x, but we never ran that level of code. The difference is
>between EL3 (31 bit) and RH72 (31bit).
>
>We are trying to re-run the CPU SPEC2000 on EL3.0 update 1, but are
>receiving compile errors and have not been able to get that to run.
>
>Initially, we noticed the problem with SPEC2000. However, a customer was
>doing some testing with a small c program and found that it took much longer
>to run on EL3 than it did on RH72.
>
>I sent the program to RedHat for testing and they tested the program on EL3,
>but did not run it on RH72.
>
>If you would like to run LMbench2 on both, you will see that context
>switching is noticeably faster on RH72 versus EL3. But I am running on the
>native OS versions not trying to emulate RH72 under EL3.
>
>I have attached a copy of LMbench. In general, it seems that RH72 performs
>better than EL3. Perhaps there are some kernel parameters that need to be
>tweaked and I am certainly open to that. However, I feel that this is
>something that should be checked before a new OS is released.
>
>I attached the source and executables for the test to the bugzilla.
>
>Thanks -
>Mike Reeves

Summarizing: we have got three incidents of performance degradation from RedHat 7.2 to RHEL3:
1. SPEC2000
2. customer provided c++ test program
3. LMBench

Right now we have source code available for all of these, but there are no runtime profiling data for any of them, which would allow us to deductively identify the origin of the performance degradation. Hence we cannot prove that the three issues come from the same origin.

However, it's really unlikely that the timer tick patch, which is currently being discussed as a candidate to resolve the problem, will help here, since neither RedHat 7.2 nor RHEL3 currently contains a version of the patch. Additionally, this patch is supposed to reduce CPU utilization when guests are idling, so that they do not consume CPU cycles just for incrementing the timer tick value. This is an issue that gets more important the more idle guests you have in your z/VM system.

Michael Waite, Jay Barrows, please prioritize the three issues with the customer, then let's start collecting the runtime profiling data for the most important issue first, nail that problem down and resolve it, and finally double check whether the solution also resolves the other issues. What do you think of that plan? Holger
Holger - you are correct, the timer patch will not correct this issue unless the performance degradation is due to the number of idle guests running in the system. I would like to start with the simple c program that performed well on a RedHat 7.2 guest in the same environment as the EL3 guest which did not perform as well. It appears that compat-gcc-295-7.2-2.95.3.78 was put out there after we found this problem. Is it possible that could be the cause? thx Mike
----- Additional Comments From smolinski.com 2004-03-10 11:14 ------- Mike Reeves, in order to isolate the origin of the performance issues, we need the profiling data requested in my comment of 2004-02-05 08:55. Can you provide these data obtained 'in-situ'? Usually effects like the ones you observed vanish as soon as you try to recreate them in a lab environment. Holger.
Created attachment 98429 [details] make file used to compile programs
The "-pg" did not produce a "gmon.out" file. Consequently, I have created an attachment with the make file for your review.
Here are the current timings, which are much better, but still about a 16% difference:

[root@ldal9001 atng0]# uname -a
Linux ldal9001.fmr.com 2.4.21-9.EL #1 SMP Thu Jan 8 16:59:07 EST 2004 s390 s390 s390 GNU/Linux
[root@ldal9001 atng0]# time ./t01prmc -r 10 -nodbg
t01prmc.cpp - v.m. 01.002 - 2003.10.14 - bgn
t01prmc.cpp - v.m. 01.002 - 2003.10.14 - end
real 0m3.736s
user 0m2.960s
sys 0m0.770s

[root@ldal9007 atng0]# uname -a
Linux ldal9007.eserver.fmr.com 2.4.9-37 #1 SMP Wed Apr 24 17:46:32 CEST 2002 s390 unknown
[root@ldal9007 atng0]# time ./t01prmc -r 10 -nodbg
t01prmc.cpp - v.m. 01.002 - 2003.10.14 - bgn
t01prmc.cpp - v.m. 01.002 - 2003.10.14 - end
real 0m3.153s
user 0m2.550s
sys 0m0.600s

This is pretty interesting because that is about the percentage difference we saw on SPEC CPU2000 with one IFL. thx Mike
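The percentages quoted in this thread depend on which timing is taken as the baseline. As a small sketch of the arithmetic (the function name `percent_slower` is hypothetical, and this is only one way to read the figures, not necessarily how the reporter computed them):

```cpp
#include <cassert>

// Hypothetical helper: how much slower `measured` is than `baseline`,
// as a percentage of the baseline time. Both arguments are in seconds.
double percent_slower(double baseline, double measured) {
    assert(baseline > 0.0);  // a zero or negative baseline makes no sense here
    return (measured - baseline) / baseline * 100.0;
}
```

With the timings above and the 2.4.9-37 system as the baseline, the user times (2.550s vs 2.960s) come out at roughly 16% slower, while the real times (3.153s vs 3.736s) come out at roughly 18%.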
All - I may have found some things here that could explain the differences. First, the compiles were not done with the -O2 option on RHEL3. Second, the assumed kernel version (LD_ASSUME_KERNEL) was set to 2.4.19 (probably for IBM MQSeries). After recompiling with -O2 I received the following times:

RedHat EL3.0 running with old thread model -
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
[root@ldal9001 atng0]# time ./t01prmc -r 10 -nodbg
t01prmc.cpp - v.m. 01.002 - 2003.10.14 - bgn
t01prmc.cpp - v.m. 01.002 - 2003.10.14 - end
real 0m5.399s
user 0m2.960s
sys 0m2.430s

RedHat EL3.0 running with new nptl thread model -
#LD_ASSUME_KERNEL=2.4.19
#export LD_ASSUME_KERNEL
[root@ldal9001 atng0]# time ./t01prmc -r 10 -nodbg
t01prmc.cpp - v.m. 01.002 - 2003.10.14 - bgn
t01prmc.cpp - v.m. 01.002 - 2003.10.14 - end
real 0m4.176s
user 0m3.390s
sys 0m0.780s

RedHat 7.3 -
[root@ldal9007 atng0]# time ./t01prmc -r 10 -nodbg
t01prmc.cpp - v.m. 01.002 - 2003.10.14 - bgn
t01prmc.cpp - v.m. 01.002 - 2003.10.14 - end
real 0m6.902s
user 0m2.650s
sys 0m4.160s

We are recompiling the SPEC2000 at this time and will rerun to see if running with the kernel set to 2.4.19 had an effect on the test. thx MR
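Since the results above depend on which thread library the process ends up using, it can help to have the test program report that itself. A minimal sketch, assuming a glibc that supports the `_CS_GNU_LIBPTHREAD_VERSION` name for `confstr` (this is glibc-specific; the function name `thread_library_version` is hypothetical, and the code falls back to "unknown" elsewhere):

```cpp
#include <cstddef>
#include <string>
#include <vector>
#include <unistd.h>

// Hypothetical helper: returns the thread library identification string
// reported by glibc, or "unknown" if it is not available.
std::string thread_library_version() {
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    // First call with a null buffer to learn the required size
    // (including the terminating null byte).
    std::size_t needed = confstr(_CS_GNU_LIBPTHREAD_VERSION, nullptr, 0);
    if (needed == 0) {
        return "unknown";
    }
    std::vector<char> buf(needed);
    confstr(_CS_GNU_LIBPTHREAD_VERSION, buf.data(), buf.size());
    return std::string(buf.data());
#else
    return "unknown";  // not glibc, or a glibc without this confstr name
#endif
}
```

Printing this string at startup, once with LD_ASSUME_KERNEL set and once without, would show whether the two runs really used different thread implementations.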
Mike Reeves, Any update on the results of your recompile ?
Sorry I missed this, Bob. We have not been able to get SPEC2000 to recompile on EL3. I believe that Brock Organ opened a bugzilla on this. So I do not have results on this. It could be related to the compat libs. thx MR
changed:

What       |Removed |Added
----------------------------------------------------------------------------
Status     |OPEN    |REJECTED
Resolution |        |WILL_NOT_FIX

------- Additional Comments From hannsj_uhl.com 2006-02-22 06:56 EDT -------
problem will not be pursued for RHEL3 ...
... closing bugzilla as 'will not fix' for RHEL3 ...