Bug 175180
| Field | Value |
|---|---|
| Summary | SPECWeb99_SSL on RHEL 4 Performance -30% versus upstream |
| Product | Red Hat Enterprise Linux 4 |
| Component | kernel |
| Version | 4.0 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | high |
| Keywords | FutureFeature |
| Fixed In Version | rhel4.4 |
| Doc Type | Enhancement |
| Reporter | John Shakshober <dshaks> |
| Assignee | David Miller <davem> |
| QA Contact | Brian Brock <bbrock> |
| CC | davej, jbaron, kajtzu, mingo |
| Last Closed | 2007-05-23 13:45:10 UTC |
| Attachments | Lockmeter Results (121968), Oprofile Results (121969) |
Description (John Shakshober, 2005-12-07 13:14:56 UTC)
Created attachment 121968 [details]: Lockmeter Results
Created attachment 121969 [details]: Oprofile Results
Also, the most recent kernel-perf test results show TCP/IP performance to be up to 15-23% higher for the 2.6.13 and 2.6.15 kernels compared to 2.6.9. Any idea what the changes to the upstream kernel are? http://kernel-perf.sourceforge.net/results.machine_id=1.html

It's very likely the massive TSO TCP rewrite that was done upstream. We can never merge that in; it's way too invasive.

We are already disabling TSO via ethtool for the benchmark, so perhaps this does NOT correlate with what we are seeing in SPECweb99_SSL. I will recheck with AMD (until we get our setup running by the end of Jan 06).

Yes, please recheck that just to make sure. It could be something as simple as a driver update, but asking "what changed" in networking from 2.6.9 to 2.6.{13,15} covers tens of megabytes of networking changes. This performance report sounds totally different from the SSL issue this particular bug report is about. I think these extra data points will end up being distractions from the core issue, which appears to be simply SMP lock cache-line ping-pong in the AF_UNIX code. I recommended that they put unix_table_lock in a separate data section or similar, then retest. I never heard back after making that recommendation.

The rwlock unix_table_lock has high utilization, hold, and wait times for the write phase (WAIT (WW) = wait on writer):

```
RWLOCK WRITES          HOLD                WAIT (ALL)                WAIT (WW)
 UTIL    CON      MEAN(  MAX  )      MEAN(  MAX  )( %CPU)      MEAN(  MAX  )    TOTAL   NOWAIT   SPIN( WW )    NAME
0.01%   32.9%    0.5us(1795us)     2071us(9063us)( 1.5%)      197us(3760us)    59294    67.1%   32.7%(0.20%)  unix_table_lock
0.00%    4.8%    0.3us( 622us)     1789us(8666us)(0.10%)      1.5us(  46us)    29960    95.2%    4.4%(0.31%)  unix_create1+0x15b
0.01%   61.7%    0.8us(1795us)     2093us(9063us)( 1.4%)      988us(3760us)    29334    38.3%   61.6%(0.08%)  unix_release_sock+0x2c
```

As you can see, there is a ~2 ms mean and ~9 ms max wait time overall, and a ~1 ms mean, ~3.7 ms max wait time on writers. Even the max hold times are high, ~1.7 ms, and unix_release_sock() just clears out a link; it should not hold the lock that long. rwlocks favor readers, so writers could get starved! Also, this rwlock does _not_ disable interrupts, which could be causing the long hold times and holding off other CPUs. Why there is Unix-domain socket activity at all, I do not know. Is there some daemon running that causes this activity which SuSE does not run? I have looked at SuSE SP2 2.6.5-7.191 and I don't see any difference from RHEL 4 U2 in the Unix-domain code (net/unix/af_unix.c and net/unix/garbage.c). In kernel.org as of 15 Dec 05 there is a patch that makes unix_table_lock a spinlock for performance reasons.

Yes, upstream changes it to a spinlock, but that alone shouldn't produce the kind of performance hit you see here. Any number of daemons on the machine could be using AF_UNIX; why not use tools such as "netstat" and "lsof" to find out who has them open? :-) I've asked you to place unix_table_lock in a separate cache line and test what that does to your results. Has that been done yet? I recommended it a few weeks ago, so you should have had a chance to run that quick check for us by now; it's a one-line change.

Yup, I still do not have a system to reproduce this, which is why I am poring through the data again; this load requires 16 clients to drive a single server. Will try to pad the lock out if the compiler is doing the wrong thing. Do you have the one-liner you want to try? I understand there is a ton of changes upstream; thanks for the comments. I still do not have a setup at Red Hat to attempt to reproduce.
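For reference, here is a minimal sketch of the two lock changes discussed above. The RHEL4 source is not quoted in this bug, so the stock declaration shown is an assumption based on the 2.6.9-era net/unix/af_unix.c; the __cacheline_aligned annotation is one plausible form of the one-line padding change davem asks for, and the DEFINE_SPINLOCK line reflects the upstream conversion mentioned in the comments.

```c
/* net/unix/af_unix.c -- sketch only, not the shipped patch.
 * Assumed 2.6.9-era stock declaration:
 *
 *     rwlock_t unix_table_lock = RW_LOCK_UNLOCKED;
 *
 * Option 1, the "one-line change" requested above: annotate the lock so
 * it lands in its own cache line (the macro also moves it into the
 * .data.cacheline_aligned section), preventing unrelated hot data from
 * sharing the line and ping-ponging it between CPUs under SMP load.
 */
#include <linux/spinlock.h>
#include <linux/cache.h>        /* for __cacheline_aligned */

rwlock_t unix_table_lock __cacheline_aligned = RW_LOCK_UNLOCKED;

/* Option 2, what kernel.org did per the 15 Dec 05 comment (this replaces
 * the declaration above rather than adding to it): make it a plain
 * spinlock, since the hot paths here are mostly writers and rwlocks
 * favor readers:
 *
 *     DEFINE_SPINLOCK(unix_table_lock);
 *
 * ...with the read_lock()/write_lock() call sites switched to
 * spin_lock()/spin_unlock() accordingly.
 */
```

The padding variant is the quick check requested in the thread: rerun the benchmark under lockmeter and compare the HOLD and WAIT columns against the table above.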
The upstream kernels contain significant network performance optimizations. We recommend using the FC5 2.6.15 kernels for future performance benchmarking.

Found this in a cleanup. Not sure if this is a feature or a bug. The SuSE kernel that was compared is not newer than our kernel, is it? Putting this on 4.6 for a review.

Shak notes the following; based on this, closing as CURRENTRELEASE: this bottleneck was not observed in SPECweb2005 with either the 4.4 or 4.5 kernels. We have since published seven of the top ten SPECweb2005 results with various partners. This ticket was for AMD64, which, together with HP, holds the #1, #9, and #10 spots on that list. Want me to enter this into the BZ? Perhaps Dave can comment on why this went away with SPECweb2005?

Shak