Bug 107644

Summary: futex lock implementation doesn't scale.
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Reporter: Randy Pafford <rpafford>
Assignee: Arjan van de Ven <arjanv>
QA Contact: Brian Brock <bbrock>
CC: drepper, petrides
Severity: medium
Priority: medium
Status: CLOSED WORKSFORME
Doc Type: Bug Fix
Last Closed: 2003-12-16 20:02:34 UTC

Description Randy Pafford 2003-10-21 17:32:49 UTC
Testing a workload with 20 processes, each of which is multi-threaded. We
find that with all 20 processes running, CPU consumption skyrockets and we
get extremely poor scaling (far worse than on SuSE 8.0). The culprit
appears to be futex calls. Oprofile shows the following:

vma      samples  %              image_name                        symbol name
c013b828 404241  54.671 vmlinux-2.4.21-1.1931.2.399.entsmp  .text.lock.futex
c013b0f0 102563  13.871 vmlinux-2.4.21-1.1931.2.399.entsmp  futex_wake
c013b480  30613  4.1402 vmlinux-2.4.21-1.1931.2.399.entsmp  futex_wait
c013e370  20063  2.7134 vmlinux-2.4.21-1.1931.2.399.entsmp  follow_page
c01251a0  14941  2.0207 vmlinux-2.4.21-1.1931.2.399.entsmp  context_switch
c013b240  13886  1.8780 vmlinux-2.4.21-1.1931.2.399.entsmp  futex_requeue

We do not believe the lock calls made by the processes are typically
contended, so we are surprised that futex enters the kernel this
often at all.
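
For reference, a minimal sketch of the usual three-state futex mutex design
(0 = unlocked, 1 = locked, 2 = locked with waiters) that glibc-style mutexes
are built on; the names and exact encoding here are illustrative, not glibc's
actual code. The point is that an uncontended lock/unlock pair completes with
a couple of atomic operations purely in userspace and never issues a futex
syscall, which is why this much kernel futex time is surprising if the locks
really are uncontended.

/* Illustrative three-state futex mutex; not glibc's implementation. */
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>

static long futex_op(atomic_int *uaddr, int op, int val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void lock(atomic_int *f)
{
	int c = 0;
	/* Fast path: an uncontended acquire is one CAS, entirely in userspace. */
	if (atomic_compare_exchange_strong(f, &c, 1))
		return;
	/* Slow path: mark the word contended (2) and sleep in the kernel. */
	if (c != 2)
		c = atomic_exchange(f, 2);
	while (c != 0) {
		futex_op(f, FUTEX_WAIT, 2);
		c = atomic_exchange(f, 2);
	}
}

static void unlock(atomic_int *f)
{
	/* Fast path: if the word was only 1 (no waiters), no syscall at all. */
	if (atomic_exchange(f, 0) == 2)
		futex_op(f, FUTEX_WAKE, 1);
}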

Comment 1 Randy Pafford 2003-10-21 17:34:52 UTC
We are running on 4-way IBM servers with Xeon processors.

The exact workload is a program that is run internally by our
company and cannot be released. Our investigation shows that there
have been (and may still be) known bugs in the futex implementation that
appear to map directly to this problem, although we could not find
a specific bug report. Here is part of one discussion of what
appears to be the same problem:

-----------------------------------------------------

From: "Hu, Boris" <boris hu intel com> 
To: "Bill Soudan" <bsoudan brass com>, "Perez-Gonzalez, Inaky" <inaky perez-
gonzalez intel com> 
Cc: <phil-list redhat com>, "Ramanujam, Ram" <ram ramanujam intel com>, "Ingo 
Molnar" <mingo elte hu>, "Jakub Jelinek" <jakub redhat com>, "John Levon" 
<levon movementarian org> 
Subject: RE: Poor thread performance on Linux vs. Solaris 
Date: Tue, 9 Sep 2003 15:38:57 +0800 

--------------------------------------------------------------------------------

Try the futex_q_lock-0.2 patch. It is also against linux-2.6.0-test4. 

It does the following things:
* Remove the global futex_lock, as the previous futex_q_lock patch did.
* Add the recursive bucket spinlock check that Jakub mentioned.
* Move vcache_lock out of lock/unlock_futex_mm() so that it only protects the
actual vcache operations.
* Shrink the scope of some lock/unlock_futex_mm() critical sections.

boris

--- linux-2.6.0-test4.orig/kernel/futex.c	2003-08-23 07:53:39.000000000 +0800
+++ linux-2.6.0-test4/kernel/futex.c	2003-09-09 14:15:02.000000000 +0800
@@ -57,9 +57,16 @@
 	struct file *filp;
 };
 
-----------------------------------------------------
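
As a rough illustration of the per-bucket locking idea described in the quoted
mail (a hypothetical sketch, not the actual futex_q_lock patch): instead of one
global futex lock serializing every futex operation in the system, each hash
chain of queued waiters gets its own spinlock, so operations on unrelated
futexes stop contending on a single lock. All names, the hash function, and the
structure layout below are made up for illustration.

/* Hypothetical per-bucket futex queue locking; not the quoted patch. */
#include <linux/spinlock.h>
#include <linux/list.h>
#include <linux/init.h>

#define FUTEX_HASHBITS 8

struct futex_bucket {
	spinlock_t lock;		/* protects only this chain */
	struct list_head chain;		/* waiters whose futex hashes here */
};

static struct futex_bucket futex_queues[1 << FUTEX_HASHBITS];

/* Toy hash: pick a bucket from an address-derived futex key. */
static struct futex_bucket *hash_bucket(unsigned long key)
{
	return &futex_queues[(key >> 4) & ((1 << FUTEX_HASHBITS) - 1)];
}

/* Queueing a waiter now takes only its own bucket's lock rather than a
 * global lock shared by every futex operation on every CPU. */
static void queue_waiter(struct list_head *waiter, unsigned long key)
{
	struct futex_bucket *fb = hash_bucket(key);

	spin_lock(&fb->lock);
	list_add_tail(waiter, &fb->chain);
	spin_unlock(&fb->lock);
}

static void __init futex_buckets_init(void)
{
	int i;

	for (i = 0; i < (1 << FUTEX_HASHBITS); i++) {
		spin_lock_init(&futex_queues[i].lock);
		INIT_LIST_HEAD(&futex_queues[i].chain);
	}
}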

Comment 3 Ingo Molnar 2003-10-27 18:19:16 UTC
There was a glibc fix that avoids a livelock bug in the mutex code. This fix was
not in taroon-beta2; it is only in taroon-final. If this is a vanilla -beta2
system, could you please upgrade to taroon-final (or just to the latest
taroon glibc) and re-test? Do you still get the same problem?
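
If it helps when retesting, the glibc actually linked into the test processes
can be confirmed programmatically with the standard gnu_get_libc_version() and
gnu_get_libc_release() calls; this is only an illustrative check, and which
version string corresponds to taroon-final is outside the scope of the example.

/* Print the glibc version the process is linked against before retesting. */
#include <gnu/libc-version.h>
#include <stdio.h>

int main(void)
{
	printf("glibc version: %s (%s)\n",
	       gnu_get_libc_version(), gnu_get_libc_release());
	return 0;
}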

Comment 4 Suzanne Hillman 2003-12-16 20:02:34 UTC
This has been open with a request to retest with newer code for over a
month and a half. Closing. If this is still a problem, please reopen and
include information about the retest.