107773 – LTC6124-test program performance affected

Summary: LTC6124-test program performance affected

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	gcc
Sub Component:
Version:	3.0
Hardware:	s390
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	185486

Reported:	2003-10-22 20:20 UTC by Brock Organ
Modified:	2007-11-30 22:06 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-10-09 19:23:03 UTC
Target Upstream Version:
Embargoed:

Attachments
customer reported problem (1.24 KB, text/plain) 2003-10-22 20:25 UTC, Brock Organ
Taroon test executable provided by customer (70.00 KB, application/octet-stream) 2003-10-22 20:26 UTC, Brock Organ
source for test executable (provided by customer) (.tar format) (10.00 KB, application/octet-stream) 2003-10-22 20:27 UTC, Brock Organ
run on local system (Taroon 31-bit test tree) (6.14 KB, text/plain) 2003-10-22 20:32 UTC, Brock Organ
run on local system (Taroon 64-bit test tree) (6.14 KB, text/plain) 2003-10-22 21:24 UTC, Brock Organ
Summary page from SPEC CPU2000 run on RH72 and EL3 (134.60 KB, image/gif) 2004-01-13 21:51 UTC, Mike Reeves
make file used to compile programs (5.30 KB, text/plain) 2004-03-10 16:39 UTC, Mike Reeves

Description Brock Organ 2003-10-22 20:20:59 UTC

* reported by customer 
* test program (c++, see attached code)
* Taroon on s390

Using a C++ program (see attachments), measurements show a decrease in
performance from 7.1-s390x and 7.2-s390 to RHEL 3 (Taroon).

Comment 1 Brock Organ 2003-10-22 20:25:28 UTC

Created attachment 95401 [details]
customer reported problem

Comment 2 Brock Organ 2003-10-22 20:26:14 UTC

Created attachment 95402 [details]
Taroon  test executable provided by customer

Comment 3 Brock Organ 2003-10-22 20:27:04 UTC

Created attachment 95403 [details]
source for test executable (provided by customer) (.tar format)

Comment 4 Brock Organ 2003-10-22 20:32:27 UTC

Created attachment 95404 [details]
run on local system (Taroon 31-bit test tree)

This was run on a local system (jake.z900.redhat.com, 512 MB of storage and 7 GB of
DASD) running the s390 (31-bit) Taroon RHEL 3 test tree (Taroon-re1007.RC1.0) ...

Comment 5 Brock Organ 2003-10-22 21:24:14 UTC

Created attachment 95409 [details]
run on local system (Taroon 64-bit test tree)

This was run on a local system (jake.z900.redhat.com, 512 MB of storage and 7 GB of
DASD) running the s390x (64-bit) Taroon RHEL 3 test tree (Taroon-re1007.RC1.0) ...

Comment 6 Jakub Jelinek 2004-01-05 21:30:30 UTC

This test doesn't seem to be much of a compiler benchmark; rather, it
tests how long 4 million calls to malloc(27) take.
That's where at least 60% of the total time seems to be spent.
I've tried the test on a 2.4.21-3.EL s390x kernel running a 32-bit userland.
The test compiled with -O2 with compat-gcc-295-7.2-2.95.3.78 (smallest real time from
5 iterations):
real    0m3.067s
user    0m2.560s
sys     0m0.510s
-O2 with gcc 3.2.3-26:
real    0m3.256s
user    0m2.740s
sys     0m0.520s
Both against glibc-2.3.2-95.6.
When the program was compiled against compat-gcc-295 and linked against
glibc-2.2.4-24.2s.1 (I've hacked up libstdc++-libc6.2-2.so.3 so that it
references sys_nerr instead of sys_nerr, unpacked
the 7.2-s390 glibc into a subdir and linked against that also with
-Wl,-dynamic-linker and -Wl,-rpath) I get:
real    0m2.785s
user    0m2.260s
sys     0m0.530s
Still I don't see any 50% performance drop. gcc-3.2.3-RH seems to be
5% slower than gcc-2.95.2, but the comparison includes two completely
different standard C++ headers and libraries, where 2.95.2 is very far
from conforming.
Then there is a 10% performance drop in speed of 4mil malloc(27) calls.
glibc 2.3.* has a rewritten malloc implementation, and at least from
benchmarks done on other architectures it seems to be an overall win.
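
The "smallest real time from 5 iterations" measurement used above can be reproduced with a small shell helper. This is only a minimal sketch, not the script that produced the numbers in this comment: the function name `best_of` is made up for illustration, and it assumes GNU `date` (for `%N`) and a POSIX `awk`.

```shell
#!/bin/sh
# best_of N CMD [ARGS...]: run CMD N times and print the smallest
# wall-clock time in seconds. Hypothetical helper; assumes GNU date and awk.
best_of() {
    runs=$1; shift
    best=
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1          # output and exit status are ignored here
        end=$(date +%s.%N)
        elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
        # Keep the minimum; awk does the floating-point comparison.
        if [ -z "$best" ] || awk -v a="$elapsed" -v b="$best" 'BEGIN { exit !(a < b) }'; then
            best=$elapsed
        fi
        i=$((i + 1))
    done
    echo "$best"
}

# Example (binary name and arguments as used in the customer's runs in this bug):
#   best_of 5 ./t01prmc -r 10 -nodbg
```

Taking the minimum rather than the mean reduces the influence of unrelated system activity, which matters when the differences under discussion are only a few percent.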

Comment 7 Brock Organ 2004-01-13 21:09:56 UTC

email response from customer:

Was this actually run under RH72? We compiled and ran it on RH72 and
re-compiled and ran on RHEL3. That is where we saw the difference. We
also ran CPU SPEC2000 against RH72 and RHEL3, re-compiled on each
platform, and saw a large difference between RH72 and RHEL3. With a
single engine RH72 performs 17% better than does RHEL3. When a second
engine is added, the performance difference is less, but still there.

Comment 8 Jakub Jelinek 2004-01-13 21:15:47 UTC

No, as stated above, the test was done on RHEL3 only, but using both
RHEL3 and RH7.2 compilers.
That way both are using the same kernel which can make a difference
as well.

Comment 9 Mike Reeves 2004-01-13 21:51:54 UTC

Created attachment 96951 [details]
Summary page from SPEC CPU2000 run on RH72 and EL3

Comment 11 Jakub Jelinek 2004-01-29 17:31:52 UTC

I don't have access to SPEC, so I cannot look at it.

Comment 12 Mike Reeves 2004-02-02 16:25:10 UTC

The program above where you noted that the response times were the 
same was compiled and run on RH72 here and compiled and run on RHEL3. 
The difference was when it was compiled and run on the same platform. 
Even in your case it shows a 6% degradation in performance. EL3 is 
supposed to be better, not worse.

This is a big issue at Fidelity and is holding up any implementation 
here. RedHat should conduct exhaustive benchmarks to ensure that 
performance is better on each release or upgrade.

Comment 13 IBM Bug Proxy 2004-02-04 18:02:50 UTC

IBM LTC now following this defect

Comment 14 IBM Bug Proxy 2004-02-05 04:19:03 UTC

----- Additional Comments From khoa.com  2004-02-04 23:18 -------
Sachin - please take a look at this bug and see if you or your team can help.
This is hot for Red Hat.  Thanks.

Comment 15 IBM Bug Proxy 2004-02-05 13:59:49 UTC

----- Additional Comments From smolinski.com  2004-02-05 08:55 -------
Hi Khoa,
for detailed problem determination, please, compile the test programs on both 
platforms with  option '-pg' in order to enable profiling. Then execute the 
program, which creates file 'gmon.out' in the current working directory. 
Invoking 'gprof <binary_name>' and thus comparing the profile samples will 
unveil the differences of the two programs at runtime. 
After determining candidates for the excessive time consumption, we'd be 
interested in examining the compiled binaries obtained by compiling with the 
'-g' option enabled. This will allow us to determine code patterns that are 
inequivalently translated on RHEL3 and RH7.2.
If you need any further assistance in problem determination, just drop me a 
note, or update this bugzilla.

Regards,
 Holger

btw: Is the customer comparing RH7.2 (31 bit) against RHEL3 on 64bit? From the 
architecture field of this bugzilla and attachment 
t01prmc-fromfidelity.s390ons390x.txt I get the impression that the problem 
occurs on RHEL3 64bit (with 31bit EMU-layer?) while 31bit RedHat 7.2 is okay.
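
The profiling steps requested in this comment can be sketched as a short shell function. The source and binary names in the example (`t01prmc.cpp`, `t01prmc`) and the run arguments come from elsewhere in this bug; the flags in the customer's makefile are not known, so `-O2` is an assumption, and the function name `profile_run` is hypothetical.

```shell
#!/bin/sh
# profile_run SRC BIN [ARGS...]: build with profiling enabled, run the
# program, and save the gprof report next to the binary.
profile_run() {
    src=$1; bin=$2; shift 2
    # -pg is needed for compiling AND linking; -g adds debug information.
    g++ -O2 -g -pg -o "$bin" "$src" || return 1
    # gmon.out is written to the current directory when the program exits normally.
    rm -f gmon.out
    "./$bin" "$@" || return 1
    [ -f gmon.out ] || { echo "no gmon.out produced" >&2; return 1; }
    # Flat profile and call graph go to a text file for comparison between systems.
    gprof "./$bin" gmon.out > "$bin.gprof.txt"
}

# Example: profile_run t01prmc.cpp t01prmc -r 10 -nodbg
```

Running the same function on the RH 7.2 and the RHEL3 guest yields two reports that can be compared side by side, for instance with `diff`.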

Comment 16 IBM Bug Proxy 2004-02-05 17:19:30 UTC

----- Additional Comments From khoa.com  2004-02-05 12:17 -------
Per our discussion this morning, I'd like to re-assign this bug back to you
and then ask Glen Johnson to send your latest comment to Red Hat.  I also
ask Glen to set up BugMail to mirror updates between this bug and RH bug
107773.

Glen - please set up BugMail to mirror updates between this bug and RH
107773.  Thanks.

Comment 17 IBM Bug Proxy 2004-02-05 18:23:55 UTC

----- Additional Comments From khoa.com  2004-02-05 13:24 -------
I expect Red Hat to provide the IBM Boeblingen team with the information requested
above by Holger Smolinski.

Comment 18 Mike Reeves 2004-02-05 22:35:43 UTC

The comparison was EL3.0 to RH7.2 not EL3.0x to RH7.2. The caveat is 
the fact that EL3.0 was the beta2 release. We are running benchmarks 
at this time on the ga EL3.0. I was told that there was very little 
difference between b2 and ga for EL3.0.

Comment 20 IBM Bug Proxy 2004-02-12 03:29:17 UTC

----- Additional Comments From khoa.com  2004-02-11 22:29 -------
Can Red Hat confirm the observation from Mike Reeves above (i.e., using the
GA level RHEL3) ?

Comment 21 IBM Bug Proxy 2004-02-12 08:36:20 UTC

----- Additional Comments From smolinski.com  2004-02-12 03:36 -------
Besides this side discussion about 390x and 390 distributions, we really
need the profiling data and debug enabled versions of the testcase for an 
analysis. We expect RedHat to work with the customer on providing more
detailed information.

Comment 22 Bob Johnson 2004-02-13 13:57:38 UTC

Mike Reeves,
Can you provide what IBM is asking us for ?

Comment 23 IBM Bug Proxy 2004-02-18 13:24:12 UTC

----- Additional Comments From smolinski.com  2004-02-18 08:23 -------
Hi Mark, Glen,

we are still waiting for the debug data to be provided by the customer via RedHat.
Is there any progress in data collection? 
Since we have been waiting for data for two weeks now, setting status to deferred.
Will close the bug after 30 additional days of inactivity.

Regards,
    Holger

Comment 24 Bob Johnson 2004-03-03 16:50:29 UTC

Mike Reeves,

IBM is still looking for feedback here on the testcases.

Comment 25 IBM Bug Proxy 2004-03-05 20:22:34 UTC

----- Additional Comments From smolinski.com  2004-03-05 15:21 -------
Sent email to Mike Waite, (cc: Jay Barrows, Jim Burke) requesting a call with
him to collect the facts on this issue and prepare an action plan.
Received contact information to Mike Waite from Jay Barrows at. Tried to contact
Mike Waite, not in, left message on voice mail. addt'ly sent another email to
Mike requesting to urgently schedule a call with me today.

Comment 26 IBM Bug Proxy 2004-03-05 21:02:06 UTC

----- Additional Comments From smolinski.com  2004-03-05 16:03 -------
Tried to call Mike Waite, his phone is temporarily unavailable. deferring
further action to Monday. Have a great weekend. Holger.

Comment 27 Bob Johnson 2004-03-09 18:02:21 UTC

I will ping Mike Reeves again on the questions from Holger posted in
#15 above.

Comment 28 Mike Reeves 2004-03-09 19:09:51 UTC

I have been pinged, but I am an MVS sysprog and don't do much C. I 
will have to find someone who can do this, or you could provide the 
process for me and I will be happy to do it.

Comment 29 IBM Bug Proxy 2004-03-10 08:30:14 UTC

----- Additional Comments From smolinski.com  2004-03-10 03:25 -------
Received cc: of a note from Mike Reeves to Michael Waite, providing source code
of another testcase that exposes a performance issue as well (LMBench):

>Mike -
>
>I am having difficulty getting into bugzilla and I have problems getting
>into it to update it.
>
>Here is the situation. We are running RedHat EL3.0 (31 bit not 64 bit). The
>indication is 7.1x, but we never ran that level of code. The difference is
>between EL3 (31 bit) and RH72 (31bit).
>
>We are trying to re-run the CPU SPEC2000 on EL3.0 update 1, but are
>receiving compile errors and have not been able to get that to run.
>
>Initially, we noticed the problem with SPEC2000. However, a customer was
>doing some testing with a small c program and found that it took much longer
>to run on EL3 than it did on RH72.
>
>I sent the program to RedHat for testing and they tested the program on EL3,
>but did not run it on RH72.
>
>If you would like to run LMbench2 on both, you will see that context
>switching is noticeably faster on RH72 versus EL3. But I am running on the
>native OS versions, not trying to emulate RH72 under EL3.
>
>I have attached a copy of LMbench. In general, it seems that RH72 performs
>better than EL3. Perhaps there are some kernel parameters that need to be
>tweaked and I am certainly open to that. However, I feel that this is
>something that should be checked before a new OS is released.
>
>I attached the source and executables for the test to the bugzilla. 
>
>Thanks - 
>Mike Reeves 


Summarizing:
we have got three incidents of performance degradation from RedHat 7.2 to RHEL3
1. SPEC2000
2. customer provided c++ test program
3. LMBench

Right now we have source code available for all of these, but there are no
runtime profiling data for any of them, which would allow us to deductively
identify the origin of the performance degradation. Hence we cannot prove that
the three issues have the same origin.
However, it's really unlikely, that the timer tick patch, which is currently
discussed as a candidate to resolve the problem, will help here, since neither
RedHat 7.2 nor RHEL3 currently contain a version of the patch. Additionally this
patch is supposed to reduce CPU utilization, when guests are idling, not
consuming CPU cycles just for incrementing the timer tick value. This is an
issue that gets more important, the more idle guests you have in your z/VM system.

Michael Waite, Jay Barrows,
please prioritize the three issues with the customer, then let's start
collecting the runtime profiling data for the most important issue first, nail
that problem down and resolve it, and finally double check whether the
solution also resolves the other issues.

What do you think of that plan?
Holger

Comment 30 Mike Reeves 2004-03-10 10:57:50 UTC

Holger - you are correct, the timer patch will not correct this issue 
unless the performance degradation is due to the number of idle 
guests running in the system.

I would like to start with the simple c program that performed well 
on a RedHat 7.2 guest in the same environment as the EL3 guest which 
did not perform as well.

It appears that compat-gcc-295-7.2-2.95.3.78 was put out there after 
we found this problem. Is it possible that could be the cause?
 thx Mike

Comment 31 IBM Bug Proxy 2004-03-10 16:13:05 UTC

----- Additional Comments From smolinski.com  2004-03-10 11:14 -------
Mike Reeves,
In order to isolate the origin of the performance issues, we need the profiling
data requested in my comment of 2004-02-05 08:55.
Can you provide these data, obtained 'in-situ'? Usually effects like the ones you
observed vanish as soon as you try to recreate them in a lab environment.
Holger.

Comment 32 Mike Reeves 2004-03-10 16:39:19 UTC

Created attachment 98429 [details]
make file used to compile programs

Comment 33 Mike Reeves 2004-03-10 16:45:42 UTC

The "-pg" did not produce a "gmon.out" file. Consequently, I have 
created an attachment with the make file for your review.
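
Common reasons for a missing `gmon.out` include `-pg` being passed at compile time but not at link time, the program not exiting normally, or the working directory not being writable. One quick check is whether the binary references the profiling hooks at all. The sketch below assumes a glibc-based system where a `-pg` build normally references `mcount`/`_mcount` and `__monstartup`; the function names are invented for illustration.

```shell
#!/bin/sh
# has_profiling_symbols: read `nm` output on stdin and succeed if it mentions
# the profiling hooks a -pg build normally references (assumption: glibc).
has_profiling_symbols() {
    grep -Eq '(mcount|__monstartup|__fentry__)'
}

# is_profiled BIN: check both the regular and the dynamic symbol table,
# so the check also works on a stripped, dynamically linked binary.
is_profiled() {
    { nm "$1" 2>/dev/null; nm -D "$1" 2>/dev/null; } | has_profiling_symbols
}

# Example:
#   is_profiled ./t01prmc && echo "built with -pg" || echo "no profiling hooks found"
```

If the hooks are absent, the link step in the makefile is the first place to look.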

Comment 34 Mike Reeves 2004-03-10 16:54:56 UTC

Here are the current timings, which are much better, but still show about a 
16% difference:

[root@ldal9001 atng0]# uname -a
Linux ldal9001.fmr.com 2.4.21-9.EL #1 SMP Thu Jan 8 16:59:07 EST 2004 
s390 s390 s390 GNU/Linux
[root@ldal9001 atng0]# time ./t01prmc -r 10 -nodbg
t01prmc.cpp  - v.m. 01.002 - 2003.10.14 - bgn
t01prmc.cpp  - v.m. 01.002 - 2003.10.14 - end
real    0m3.736s
user    0m2.960s
sys     0m0.770s

[root@ldal9007 atng0]# uname -a
Linux ldal9007.eserver.fmr.com 2.4.9-37 #1 SMP Wed Apr 24 17:46:32 
CEST 2002 s390 unknown
[root@ldal9007 atng0]# time ./t01prmc -r 10 -nodbg
t01prmc.cpp  - v.m. 01.002 - 2003.10.14 - bgn
t01prmc.cpp  - v.m. 01.002 - 2003.10.14 - end

real    0m3.153s
user    0m2.550s
sys     0m0.600s

This is pretty interesting because that is about the percentage 
difference we saw on SPEC CPU2000 with one IFL.

thx Mike
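
How large the gap looks depends on which run is taken as the baseline. A minimal sketch of the arithmetic, using the two `real` times quoted in this comment; the function name `pct_diff` is made up for illustration and only a POSIX `awk` is assumed.

```shell
#!/bin/sh
# pct_diff BASE OTHER: print how much OTHER differs from BASE, in percent of BASE.
pct_diff() {
    awk -v base="$1" -v other="$2" \
        'BEGIN { printf "%.1f\n", (other - base) / base * 100 }'
}

# real times from the runs above: RH 7.2 guest 3.153 s, EL3 guest 3.736 s
pct_diff 3.153 3.736   # EL3 relative to RH 7.2
pct_diff 3.736 3.153   # RH 7.2 relative to EL3
```

With these inputs the two calls print 18.5 and -15.6: the EL3 run takes 18.5% longer than the RH 7.2 run, or equivalently the RH 7.2 run takes 15.6% less time than the EL3 run, which matches the "about 16%" quoted above.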

Comment 35 Mike Reeves 2004-03-12 21:07:46 UTC

All - I may have found some things here that could explain the 
differences. First, the compiles were not done with the -O2 option on 
RHEL3. Second, the assumed kernel level (LD_ASSUME_KERNEL) was set to 
2.4.19 (probably for IBM MQSeries). 

After recompiling with -O2 I received the following times:

RedHat EL3.0 running with old thread model -

LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL

[root@ldal9001 atng0]# time ./t01prmc -r 10 -nodbg
t01prmc.cpp  - v.m. 01.002 - 2003.10.14 - bgn
t01prmc.cpp  - v.m. 01.002 - 2003.10.14 - end

real    0m5.399s
user    0m2.960s
sys     0m2.430s

RedHat EL3.0 running with new nptl thread model -

#LD_ASSUME_KERNEL=2.4.19
#export LD_ASSUME_KERNEL

[root@ldal9001 atng0]# time ./t01prmc -r 10 -nodbg
t01prmc.cpp  - v.m. 01.002 - 2003.10.14 - bgn
t01prmc.cpp  - v.m. 01.002 - 2003.10.14 - end

real    0m4.176s
user    0m3.390s
sys     0m0.780s

RedHat 7.3  -

[root@ldal9007 atng0]# time ./t01prmc -r 10 -nodbg
t01prmc.cpp  - v.m. 01.002 - 2003.10.14 - bgn
t01prmc.cpp  - v.m. 01.002 - 2003.10.14 - end

real    0m6.902s
user    0m2.650s
sys     0m4.160s

We are recompiling the SPEC2000 at this time and will rerun to see if 
running the kernel at 2.4.19 had an effect on the test.

thx MR
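
The two EL3 runs above differ only in whether `LD_ASSUME_KERNEL` is exported. A minimal sketch that sets the variable per command instead of editing the login environment, so the two cases cannot get mixed up; both function names are hypothetical.

```shell
#!/bin/sh
# with_assume_kernel VERSION CMD [ARGS...]: run CMD with LD_ASSUME_KERNEL set
# only for that command (the subshell keeps the caller's environment clean).
with_assume_kernel() {
    ( LD_ASSUME_KERNEL=$1; export LD_ASSUME_KERNEL; shift; "$@" )
}

# without_assume_kernel CMD [ARGS...]: run CMD with the variable removed.
without_assume_kernel() {
    ( unset LD_ASSUME_KERNEL; "$@" )
}

# Example, using the command line from the runs above
# (bash's `time` keyword accepts shell functions):
#   time with_assume_kernel 2.4.19 ./t01prmc -r 10 -nodbg   # old thread model
#   time without_assume_kernel ./t01prmc -r 10 -nodbg        # NPTL
```

Because each variant runs in its own subshell, the setting used for one measurement never leaks into the next.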

Comment 36 Bob Johnson 2004-03-24 16:17:01 UTC

Mike Reeves,
Any update on the results of your recompile ?

Comment 37 Mike Reeves 2004-06-17 13:56:42 UTC

Sorry I missed this, Bob. We have not been able to get SPEC2000 to 
recompile on EL3. I believe that Brock Organ opened a Bugzilla on 
this. So I do not have results on this. 

It could be related to the compat libs.

thx MR

Comment 38 IBM Bug Proxy 2006-02-22 11:55:31 UTC

changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|OPEN                        |REJECTED
         Resolution|                            |WILL_NOT_FIX




------- Additional Comments From hannsj_uhl.com  2006-02-22 06:56 EDT -------
problem will not be pursued for RHEL3 ... 
... closing bugzilla as 'will not fix' for RHEL3 ...
