Bug 149502

Summary: performance drop in SMP
Product: Red Hat Enterprise Linux 4
Reporter: Dmitri A. Sergatskov <dasergatskov>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 4.0
CC: davej, dshaks, k.georgiou, mingo, riel
Hardware: i386
OS: Linux
Fixed In Version: RHSA-2006:0132
Doc Type: Bug Fix
Last Closed: 2006-07-14 19:43:42 UTC
Attachments:
Description | Flags
plot of times vs size for SMP and UP | none
NFS locking fixes that release the kernel_lock in do_unlk() | none
The correct patch | none

Description Dmitri A. Sergatskov 2005-02-23 17:42:09 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041215 Firefox/1.0 Red Hat/1.0-12.EL4

Description of problem:
Some memory-intensive, single-threaded number-crunching programs run
significantly slower with the SMP kernel than with the UP kernel on the same
(dual-CPU) hardware. The problem seems to pertain to the most recent RH
kernels: I can reproduce it on FC3 with the most recent kernel
(kernel-smp-2.6.10-1.766_FC3.i686.rpm).
My last recorded benchmark that does not show this problem is from June 2004
(whatever kernels were current for RHEL3 and FC at the time).


Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-5.0.3.EL

How reproducible:
Always

Steps to Reproduce:
This is the simplest example I can come up with:

1. boot SMP kernel
2. run octave and execute the following at the prompt:
   octave:1> s=zeros(3000);
 (this will create 3000x3000 matrix filled with zeros)  
   octave:2> tic; w=s'; toc
 (this will transpose it and time the procedure)
     
3. boot UP kernel and repeat the procedure. 
  

Actual Results:  On my computer (2x Athlon MP / 1 GB RAM) I get approximately 2 seconds with the SMP kernel and 0.8 seconds in UP mode.


Expected Results:  Since the tic/toc timer measures wall-clock time, I expect (and used to observe) slightly shorter timings in SMP mode.

Additional info:

2x Athlon MP (2000 MHz) on Tyan Tiger MP S2460 (Bios 1.05, the latest). 1 GB RAM.

The problem may be hardware dependent. Around 2000 (kernel 2.0.30 or thereabouts) I had a very similar problem on an Intel 440LX motherboard, but not on an Intel 440BX...
Unfortunately there are not many Athlon MP chipsets around.

Comment 1 Dmitri A. Sergatskov 2005-02-25 19:52:38 UTC
Created attachment 111441 [details]
plot of times vs size for SMP and UP

Time to transpose a DxD matrix as a
function of the size D, comparing SMP and UP modes.

Comment 2 Dmitri A. Sergatskov 2005-02-25 19:53:18 UTC
I did some additional testing. The figure
ftp://coffee.phys.unm.edu/pub/dima/octave/cpuscale.png
shows times obtained in SMP mode and in uniprocessor
(UP) mode on the same hardware (2x Athlon 2000 MHz / 1 GB RAM).
Swap was turned off for the test.
There is a curious region around D=2000 to 4000 where
UP outperforms SMP by almost a factor of 3.
The reason for the deviation from the O(D^2) law at high D is not
clear to me either.
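
For anyone who wants to repeat the size sweep without Octave, here is a rough C sketch of the same kind of measurement. It is not the code behind the plot: the function names (wall_seconds, transpose, time_transpose, sweep) are made up for illustration, the transpose is a naive row-major loop that may behave differently from Octave's own implementation, and the timing deliberately includes the first-touch page faults on the freshly allocated result, on the assumption that Octave also allocates a new matrix for w=s'.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock (the role tic/toc plays in Octave). */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Naive out-of-place transpose of a d x d matrix: dst[j][i] = src[i][j]. */
void transpose(const double *src, double *dst, size_t d)
{
    for (size_t i = 0; i < d; i++)
        for (size_t j = 0; j < d; j++)
            dst[j * d + i] = src[i * d + j];
}

/* Time one transpose of a zero-filled d x d matrix.
 * Returns elapsed seconds, or -1.0 if allocation fails. */
double time_transpose(size_t d)
{
    assert(d > 0);
    double *src = calloc(d * d, sizeof *src);   /* like zeros(d) */
    double *dst = malloc(d * d * sizeof *dst);  /* fresh result matrix */
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    double t0 = wall_seconds();
    transpose(src, dst, d);
    double t1 = wall_seconds();
    volatile double sink = dst[d * d - 1];      /* keep the work observable */
    (void)sink;
    free(src);
    free(dst);
    return t1 - t0;
}

/* Print "D seconds" for each size so SMP and UP runs can be plotted together. */
void sweep(size_t from, size_t to, size_t step)
{
    assert(step > 0);
    for (size_t d = from; d <= to; d += step)
        printf("%zu %.6f\n", d, time_transpose(d));
}
```

Calling sweep(500, 5000, 500) from main() once under each kernel gives two columns of numbers that can be plotted against each other. On older glibc versions the program may need to be linked with -lrt for clock_gettime().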

Comment 7 Larry Woodman 2005-10-28 10:32:11 UTC
Created attachment 120499 [details]
NFS locking fixes that release the kernel_lock in do_unlk()


Ingo, the cause of this appears to be the missing unlock_kernel() in
do_unlk(). It's part of the NFS locking changes targeted for RHEL4-U3.

Larry Woodman

Comment 8 Steve Dickson 2005-10-28 18:03:04 UTC
Created attachment 120512 [details]
The correct patch

Note that the patch Larry posted does not have the needed fix
in it and also breaks lock tests (i.e. F_TEST) by passing
back the wrong value when a lock does not exist.

Please try the one I just attached.
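
As a reference for what a lock test is expected to return, here is a small user-space sketch in C. It is not taken from either patch or from the test suite mentioned above; the helper names probe_lock and probe_with_child_lock are hypothetical. It relies only on the documented behaviour of lockf(): with F_TEST the call returns 0 when the region is unlocked (or locked by the caller) and fails with EACCES or EAGAIN when another process holds the lock.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Probe the whole file with F_TEST.
 * Returns 0 if no other process holds a lock, 1 if one does, -1 on error. */
int probe_lock(int fd)
{
    assert(fd >= 0);
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    if (lockf(fd, F_TEST, 0) == 0)
        return 0;
    return (errno == EACCES || errno == EAGAIN) ? 1 : -1;
}

/* Fork a child that locks the file, probe from the parent while the
 * child still holds the lock, then let the child exit (which drops it). */
int probe_with_child_lock(int fd)
{
    int ready[2], done[2];
    char c = 'L';
    int result = -1;

    if (pipe(ready) != 0)
        return -1;
    if (pipe(done) != 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(ready[0]); close(ready[1]);
        close(done[0]);  close(done[1]);
        return -1;
    }
    if (pid == 0) {                               /* child: take the lock */
        close(ready[0]);
        close(done[1]);
        if (lseek(fd, 0, SEEK_SET) == (off_t)-1 || lockf(fd, F_LOCK, 0) != 0)
            _exit(1);
        if (write(ready[1], &c, 1) != 1)          /* tell parent we hold it */
            _exit(1);
        if (read(done[0], &c, 1) < 0)             /* wait until parent is done */
            _exit(1);
        _exit(0);
    }
    close(ready[1]);
    close(done[0]);
    if (read(ready[0], &c, 1) == 1)               /* child holds the lock now */
        result = probe_lock(fd);
    close(done[1]);                               /* release the child */
    close(ready[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```

On a file with no locks, probe_lock() should return 0; while the child holds its lock, probe_with_child_lock() should return 1. Running the same checks against a file on an NFS mount is one way to see whether a lock test reports the wrong value when no lock exists.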