Red Hat Bugzilla – Bug 103121
LTC4108-[perf][SPECweb99] RHEL 3 Beta 1 4-way performs better than 8-way
Last modified: 2007-11-30 17:06:57 EST
The following has been reported by IBM LTC:
[perf][SPECweb99] RHEL 3 Beta 1 4-way performs better than 8-way
Server: Netfinity 8500r (8 x 900 MHz)
28 GB RAM
4 Intel e1000 dual-port gigabit adapters
Clients: 8 xSeries x330 (2 x 1GHz)
1.5 GB RAM
1 Intel e1000 gigabit adapter
RHEL 3 Beta 1 plus all errata as of Aug 11 (via up2date)
Apache 2.0.47 w/mod_specweb
Steps to Reproduce:
1. Run the SPECweb99 benchmark.
Actual results:
8-way performs poorly; 4-way performs better than 8-way.
Expected results:
Better 8-way performance.
The metric is "Simultaneous Connections" in SPECweb99.
RHEL 3 Beta 1 (CPU time in %):
8-way = 1500 connections, us 16 / sy 84 / id 0
4-way = 1600 connections, us 29 / sy 71 / id 0
A 2.5.73 kernel.org kernel built on the same RHEL 3 Beta 1 install reaches 3600 connections on the 8-way and 2400 on the 4-way.
I suspect TCP locking problems, as seen in the pre-Beta 1 release of RHEL 3:
8-way @ 1500 conns
0.000% 1 .text.lock.tcp_input
0.003% 5 .text.lock.tcp_minisocks
0.221% 339 .text.lock.tcp_timer
0.793% 1213 .text.lock.tcp_ipv4
2.027% 3098 .text.lock.tcp
4-way @ 1600 conns
0.001% 1 .text.lock.tcp_input
0.003% 2 .text.lock.tcp_minisocks
0.004% 3 .text.lock.tcp_timer
0.188% 117 .text.lock.tcp_ipv4
0.300% 186 .text.lock.tcp
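The two profiles can be compared directly by summing their lock-section lines. A small Python sketch using only the figures quoted above (the variable and function names are illustrative, not part of any tool):

```python
# Lock-section samples (.text.lock.*) from the two readprofile runs above:
# symbol suffix -> (percent of total profile ticks, tick count).
PROFILE_8WAY = {
    "tcp_input": (0.000, 1),
    "tcp_minisocks": (0.003, 5),
    "tcp_timer": (0.221, 339),
    "tcp_ipv4": (0.793, 1213),
    "tcp": (2.027, 3098),
}
PROFILE_4WAY = {
    "tcp_input": (0.001, 1),
    "tcp_minisocks": (0.003, 2),
    "tcp_timer": (0.004, 3),
    "tcp_ipv4": (0.188, 117),
    "tcp": (0.300, 186),
}


def lock_totals(profile):
    """Return (total percent, total ticks) spent in the listed lock sections."""
    pct = sum(p for p, _ in profile.values())
    ticks = sum(t for _, t in profile.values())
    return pct, ticks


for name, prof in (("8-way", PROFILE_8WAY), ("4-way", PROFILE_4WAY)):
    pct, ticks = lock_totals(prof)
    print(f"{name}: {pct:.3f}% of ticks, {ticks} ticks in TCP lock sections")
```

With these numbers the 8-way spends about 3.04% of its profile ticks (4656 ticks) spinning in TCP lock sections, against about 0.50% (309 ticks) on the 4-way: roughly six times the share, while serving fewer connections.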
What additional information would be helpful in debugging this issue?
* SPEC(tm) and the benchmark name SPECweb(tm) are registered
trademarks of the Standard Performance Evaluation Corporation.
This benchmarking was performed for research purposes only,
and is non-compliant, with the following deviations from the run rules:
1 - Runs were shorter than 1200 seconds.
2 - Access_log wasn't kept for full accounting. It was
written, but deleted every 200 seconds.
Not enough information to analyze this. Please obtain more detailed
locking profiles, or standard kernel profiles during an 8-way run.
------ Additional Comments From email@example.com 2003-27-08 11:37 -------
/usr/sbin/readprofile does not work for me on Beta 1 (all totals are 0).
1- I can send raw captures of /proc/profile if that is helpful.
2- Can Red Hat supply a working readprofile?
You need to enable the NMI watchdog as well:
------ Additional Comments From firstname.lastname@example.org 2003-28-08 18:36 -------
Can't complete a run with those kernel args: double fault.
The box then locks up. How do I get at the stack values for debugging?
If this is one of those IBM machines where the BIOS corrupts registers when you
use the NMI, then I think we're out of luck on this one.
------ Additional Comments From email@example.com 2003-02-09 12:08 -------
Would captures of /proc/profile be useful?
Only when decoded by readprofile; the addresses at which modules get loaded vary
per machine and even between reboots.
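For reference, this is roughly what readprofile does with /proc/profile, and why the raw buffer alone is not enough: each counter is a bucket index relative to _stext, and only the System.map of the exact running kernel turns those indices into symbol names. A minimal Python sketch, assuming the 2.4 layout (a leading unsigned int holding the bucket size, 1 << prof_shift, followed by one unsigned int counter per bucket, little-endian on x86) and a matching System.map; the function names are hypothetical and module addresses are not handled.

```python
import struct
from bisect import bisect_right


def parse_system_map(lines):
    """Return sorted [(address, name)] for text symbols (type t/T) in System.map."""
    syms = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        addr, kind, name = parts
        if kind in ("t", "T"):
            syms.append((int(addr, 16), name))
    syms.sort()
    return syms


def decode_profile(raw, syms, stext_name="_stext"):
    """Attribute /proc/profile bucket counts to kernel text symbols.

    Assumed layout: word 0 is the bucket size in bytes, and bucket i
    (word i + 1) covers addresses starting at _stext + i * step.
    """
    nwords = len(raw) // 4
    words = struct.unpack(f"<{nwords}I", raw[: nwords * 4])
    step, counts = words[0], words[1:]
    by_name = {name: addr for addr, name in syms}
    if stext_name not in by_name:
        raise KeyError(f"{stext_name} not found in System.map")
    stext = by_name[stext_name]
    addrs = [addr for addr, _ in syms]
    totals = {}
    for i, hits in enumerate(counts):
        if not hits:
            continue
        pc = stext + i * step
        # The owning symbol is the last one starting at or below this address.
        idx = bisect_right(addrs, pc) - 1
        name = syms[idx][1] if idx >= 0 else "?"
        totals[name] = totals.get(name, 0) + hits
    return totals
```

On the test machine this might be called as decode_profile(open("/proc/profile", "rb").read(), parse_system_map(open(map_path))), where map_path is the System.map installed with the running kernel.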
------ Additional Comments From firstname.lastname@example.org 2003-02-09 19:44 -------
I've removed the system management card from the SPECweb system (Netfinity
8500r). In theory, that'll make the race condition under which the register
corruption happens much less likely to occur. I'm producing profiles now. With
luck, I'll have data for you tomorrow AM.
------ Additional Comments From email@example.com 2003-04-09 18:31 -------
Even with the system management card removed, I am unable to get far enough into
a run to capture profiles.
In apic.c, can I just add a call to x86_do_profile under the CONFIG_SMP case in
It would seem that then I'd be able to profile without having to enable NMIs and
do it through nmi_watchdog_tick.
Yes, that change should work. The resulting profiling info should be taken with
a grain of salt: irq-disabled overhead (irq handler overhead, etc.) won't show
up, and that overhead might be attributed to some unrelated code. But it's better than
Also, please try a newer kernel.
On a related note, why can't the BIOS do a proper return from the SMI handler
if it interrupts an NMI handler? How does the BIOS handle the case where it is
itself handling an NMI and gets interrupted by an SMI?
------ Additional Comments From firstname.lastname@example.org 2003-10-09 19:27 -------
The profiling worked, but the profiles look strange.
When trying a new kernel, up2date fails on mkinitrd when updating
The source installed OK, but when I try to build it, it errors out in the e1000
driver, which I need. I may be able to use other e1000 drivers, but that
would defeat the point of trying to demonstrate a networking problem.
I did an rpm -e of the kernel stuff, and then an up2date -p to re-sync with RHN,
but up2date still refuses to try the update again.
I'll attach the errors from up2date and the kernel build.
Created attachment 94406
"All of your loopback devices are in use." most likely means that you were
running a kernel without loop support compiled in. Loop support is mandatory
for being able to install kernels.
------ Additional Comments From email@example.com 2003-15-09 11:56 -------
Very slight improvement with 2.4.21-1.1931.2.399.entsmp. It picked up ~200
additional SPECweb simultaneous connections, but the system is still very idle:
nearly 50%. Other 2.4.21 kernels go to 0% idle and ~2700 simultaneous
connections on this hardware.
I'll up2date all errata and re-test, then modify again for profiling on local
------ Additional Comments From firstname.lastname@example.org 2003-15-09 14:38 -------
up2date fails when run on stock Beta 2 as installed from the ISO CDs.
149:wvdial ########################################### [100%]
New Up2date available
Traceback (most recent call last):
File "/usr/sbin/up2date", line 1148, in ?
File "/usr/sbin/up2date", line 747, in main
File "/usr/sbin/up2date", line 1014, in batchRun
# quiet mode for rhn_check
File "/usr/share/rhn/up2date_client/up2dateBatch.py", line 76, in run
File "/usr/share/rhn/up2date_client/up2dateBatch.py", line 145, in
self.kernelsToInstall = up2date.installPackages(self.packagesToInstall,
File "/usr/share/rhn/up2date_client/up2date.py", line 719, in installPackages
if "kernel" in hdr['Providename']:
File "/usr/share/rhn/up2date_client/up2date.py", line 769, in runPkgSpecialCases
TypeError: installBootLoader() takes exactly 1 argument (3 given)
re-running up2date produces:
[root@x4408way1 root]# up2date -u
Fetching package list for channel: rhel3-beta1-as-i386...
Fetching Obsoletes list for channel: rhel3-beta1-as-i386...
Fetching rpm headers...
The following Packages were marked to be skipped by your configuration:
Name Version Rel Reason
initscripts 7.31.1.EL 1 Config modified
All packages are currently up to date
Is my system OK to re-boot, given the installBootLoader() error, and that my
initscripts may be out of sync?
Are things in an OK state to proceed with testing?
Created attachment 94504
Could you please run the attached script and attach the result? I suspect some
of the TCP settings are causing problems, but I'm not sure.
Also, could you run 'top -b d 10 > top.log' during the test and attach top.log?
Similarly, please run 'vmstat 10 > vmstat.log' too during the test and attach
the resulting vmstat.log.
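Since the idle percentage is the key symptom in this report (the 8-way sitting near 50% idle while other kernels reach 0%), a vmstat.log collected as requested can be reduced to average us/sy/id figures. A minimal Python sketch, assuming the usual procps layout (a header row naming the us, sy and id columns, followed by all-numeric sample rows); it drops the first sample because vmstat's first line reports averages since boot rather than for the interval. The function name is hypothetical.

```python
def cpu_averages(lines, skip_first=True):
    """Average the us/sy/id columns of a vmstat log; returns (us, sy, id)."""
    cols = None
    samples = []
    for line in lines:
        tokens = line.split()
        # Column header row: remember where us/sy/id sit (it may repeat).
        if "us" in tokens and "id" in tokens:
            cols = {name: tokens.index(name) for name in ("us", "sy", "id")}
            continue
        # Skip group headers, blank lines, and anything before the first header.
        if cols is None or not tokens or not all(t.isdigit() for t in tokens):
            continue
        samples.append(tuple(int(tokens[cols[k]]) for k in ("us", "sy", "id")))
    if skip_first:
        samples = samples[1:]  # first row is the since-boot average
    if not samples:
        raise ValueError("no vmstat samples found")
    n = len(samples)
    return tuple(sum(s[i] for s in samples) / n for i in range(3))
```

For example, cpu_averages(open("vmstat.log")) would return the mean (us, sy, id) over the run, which makes the idle figures from different kernels easy to compare.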
------ Additional Comments From email@example.com 2003-26-09 11:24 -------
Why did it take until the 25th for Ingo's reply to show up in this defect?
I made the comment on the 15th, and got the email acknowledgement from bugzilla
a couple of minutes later, so the bugzilla side seems to be OK. Are you at IBM
running some other bug tracking system that feeds into bugzilla?
------ Additional Comments From firstname.lastname@example.org 2003-26-09 14:25 -------
> Are you at IBM running some other bug tracking system that feeds into bugzilla?
Yes, that's it exactly. And, it appears the "Internal only" flag fails to
work as well. :-)
I re-ran up2date this morning and have the benchmark running now. I'll run
your data-collection script once it reaches maximum load. Thanks, and I apologize
for the delay.
------ Additional Comments From email@example.com 2003-26-09 19:13 -------
A change after the beta 2 ISOs has greatly helped networking. SPECweb is
currently still running from this morning and is well beyond the point at which
I expected the benchmark conformance to drop off.
I'll send benchmark results and the output of your 'getconfig' script once
SPECweb reaches its maximum conformance point.
------ Additional Comments From firstname.lastname@example.org 2003-29-09 17:04 -------
It still falls off a cliff, only later now. There is an associated huge drop
in interrupt rate when this happens, probably NAPI kicking in (since it is
enabled by default in your kernel). e1000 NAPI has never worked for me. I'm
rebuilding to try running without NAPI (unless there is a module option to turn
it off? NAPI_HOWTO.txt is not helpful here).
------ Additional Comments From email@example.com 2003-29-09 18:06 -------
Using the Red Hat-supplied /boot/config-2.4.21-3.ELsmp with the only change being
'CONFIG_E1000_NAPI=y' switched to '# CONFIG_E1000_NAPI is not set', the kernel
compiles fine, but the modules do not. Is there another way to turn off NAPI
without my having to fight this build process again? Every time I go through
this I have to make so many changes to get things to compile that the
resulting kernel is in no way similar to your released kernel, which is what
I'm trying to help you test.
Our source and configs are identical to what we ship.
You have to issue a make mrproper in the /usr/src/linux-2.4 directory before
doing ANYTHING because that directory comes preconfigured (to allow external
modules to build) and the 2.4 kernel makefiles don't have complete dependencies
to wipe these :(
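The rebuild procedure described above might be sketched as follows. One caveat: make mrproper also deletes .config, so the shipped /boot/config-2.4.21-3.ELsmp has to be copied to .config after it, then edited, and running make oldconfig before building is the usual next step. The edit itself is shown below as a minimal Python sketch; the helper name disable_option is hypothetical.

```python
import re


def disable_option(config_text, option):
    """Return config_text with `option` switched to the '# ... is not set' form.

    Leaves the text unchanged if the option is already disabled; raises
    KeyError if the option does not appear at all.
    """
    disabled = f"# {option} is not set"
    # Match only a line that sets exactly this symbol (e.g. CONFIG_E1000_NAPI=y),
    # not longer or shorter names that share a prefix.
    pattern = re.compile(rf"^{re.escape(option)}=.*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda _m: disabled, config_text)
    if count == 0 and disabled not in config_text:
        raise KeyError(f"{option} not found in config")
    return new_text
```

Applied to .config, disable_option(text, "CONFIG_E1000_NAPI") produces exactly the one-line change described in the comment above, without touching neighbouring e1000 options.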
------ Additional Comments From firstname.lastname@example.org 2003-01-10 11:13 -------
OK, turning off NAPI was the answer. The system now loads up and runs as it
------- Additional Comments From email@example.com 2005-03-27 12:50 EST -------
Bug clean-up time. I'd like to close this bug report based on Comment #28