Bug 183364
| Summary: | CONFIG_DEBUG_SPINLOCK causes massive increase in kernel buildtimes | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Prarit Bhargava <prarit> |
| Component: | kernel | Assignee: | Ingo Molnar <mingo> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Brian Brock <bbrock> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5 | CC: | davej, dchapman, kmurray, konradr, maurizio.antillon, miyer, wtogami |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2006-08-08 14:12:05 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 163350 | | |
Description
Prarit Bhargava
2006-02-28 17:11:16 UTC
Some data from my HP 64p (ia64) Integrity 1TB system. The command in both cases was: time make -j 70

Compiling a kernel under RHEL4U3:
real 1m3.786s
user 26m38.878s
sys 2m46.416s

Compiling under upstream 2.6.16-rc5 w/CONFIG_DEBUG_SPINLOCK:
real 1m12.617s
user 26m3.748s
sys 9m18.908s

It is slower, but not as bad as you were seeing...
- Doug

After a discussion with Kimball, he pointed out that the issue is probably not
any sort of cacheline ping-pong issue, but rather with the __delay(1) used
in lib/spinlock_debug.c: __spin_lock_debug():

    for (i = 0; i < loops_per_jiffy * HZ; i++) {
            if (__raw_spin_trylock(&lock->raw_lock))
                    return;
            __delay(1);
    }
Removing this __delay cuts down the time drastically. Ingo, I'm probably
missing something basic here -- but is it really necessary to do a __delay(1)?
Why not make the "loops_per_jiffy * HZ" value tunable at boot time and allow
users to set a value -- wouldn't that be the same as attempting to
get close to one second?
Just some thoughts,
P.
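To illustrate the suggestion above, here is a minimal sketch of what a boot-time tunable iteration count could look like in lib/spinlock_debug.c. This is not the actual upstream change; the variable spin_debug_loops, the "spin_debug_loops=" boot option, and the elided warning path are all hypothetical.

    /* Hypothetical tunable: 0 means fall back to loops_per_jiffy * HZ. */
    static unsigned long spin_debug_loops;

    static int __init spin_debug_loops_setup(char *str)
    {
            spin_debug_loops = simple_strtoul(str, NULL, 0);
            return 1;
    }
    __setup("spin_debug_loops=", spin_debug_loops_setup);

    static void __spin_lock_debug(spinlock_t *lock)
    {
            u64 loops = spin_debug_loops ?
                        spin_debug_loops : (u64)loops_per_jiffy * HZ;
            u64 i;

            for (i = 0; i < loops; i++) {
                    if (__raw_spin_trylock(&lock->raw_lock))
                            return;
                    /* no __delay(1) here: just retry the trylock */
            }
            /* ... "lockup suspected" printk path elided ... */
    }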
With the __delay(1) removed, and building a kernel:
real 2m44.371s
user 38m0.008s
sys 51m19.492s

We're still seeing an impact; however, that may be due to other CONFIG options, etc.
P.

The more I think about this, the more I believe the correct solution is to
remove the __delay(1).
The loop

    for (i = 0; i < loops_per_jiffy * HZ; i++)

runs for approximately a second. I do not see any advantage to delaying
for a single clock tick on ia64.
Manoj -- what is the ramification of removing the __delay on ppc & ppc64?
P.
Prarit, have you seen this LKML discussion? http://lkml.org/lkml/2006/2/6/381
Hmmm ... interesting. Ingo is saying that loops_per_jiffy is the wrong
metric to use -- and I agree.
I think the correct metric to use here is the TSC. IIRC most architectures have a
value for TSC ticks per second that can be determined by querying the CPU ...
P.
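As a rough illustration of that idea (my own assumption, not a change that went into any build), the debug spin could be bounded by elapsed cycles from get_cycles() instead of an iteration count. The cycles_per_sec symbol below stands in for a per-arch "ticks per second" value and does not exist in the kernel.

    /*
     * Hypothetical sketch: spin for roughly one second of wall time as
     * measured by the cycle counter, then fall through to the warning path.
     */
    static void __spin_lock_debug(spinlock_t *lock)
    {
            cycles_t start = get_cycles();

            while (get_cycles() - start < cycles_per_sec) {
                    if (__raw_spin_trylock(&lock->raw_lock))
                            return;
                    cpu_relax();
            }
            /* ... report the suspected lockup here ... */
    }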
Changed the __delay(1) to a cpu_relax() in the latest builds. Let me know how it works out.

The most recent rawhide kernel with cpu_relax() instead of __delay(1) on the 64p ... is actually worse, not better :/
real 21m32.134s
user 56m47.696s
sys 1016m2.616s
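For reference, my reading of that rawhide change (a reconstruction, not the exact patch) is simply the loop quoted earlier with the delay swapped for a relax hint:

    for (i = 0; i < loops_per_jiffy * HZ; i++) {
            if (__raw_spin_trylock(&lock->raw_lock))
                    return;
            cpu_relax();    /* was: __delay(1) */
    }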