Bug 469186

Summary: [FOCUS] Lockdep fixes cause latency regression
Product: Red Hat Enterprise MRG
Component: realtime-kernel
Version: 1.0
Hardware: x86_64
OS: All
Status: CLOSED ERRATA
Severity: high
Priority: medium
Target Milestone: 1.1
Target Release: ---
Assignee: Peter Zijlstra <pzijlstr>
Reporter: IBM Bug Proxy <bugproxy>
CC: bhu, davids, lgoncalv, lwang, pzijlstr, williams
Doc Type: Bug Fix
Last Closed: 2009-01-22 10:44:41 UTC
Attachments:
Patch to make double_unlock_balance an inline function

Description IBM Bug Proxy 2008-10-30 14:22:53 UTC
=Comment: #0=================================================
Sripathi Kodi <sripathik.com> - 

LTC bug 46204 (RH 459478) reported very long boot times with the debug kernel. Lockdep also did not
work. Eventually Peter Zijlstra suggested a few patches from mainline which, when backported, solved
this problem. However, these patches caused a regression in one of our proprietary latency
benchmarks, so we removed them from our builds. They are still present in the MRG kernels, though.
We need to find the root cause of this regression and fix it.

Here are the commits in question:
* 0001-lockdep-fix-combinatorial-explosion-in-lock-subgrap.patch
mrg-rt.git commit 7fdba90ff2a2dfd693fab7b76736334b76d8563e
linux-2.6  commit 419ca3f13532793b81aff09f80c60af3eacbb43d

* 0001-lockdep-build-fix.patch
mrg-rt.git commit 2e3b018b87616c9d622625889d3270b6bf93870c
linux-2.6  commit d6672c501852d577097f6757c311d937aca0b04b

* 0002-lockdep-change-scheduler-annotation.patch
mrg-rt.git commit 2768deaf8c7a1e8bc929303ad897da05fc943389

* 0004-lockdep-re-annotate-scheduler-runqueues.patch
mrg-rt.git commit b0a890b69d2a807adab797a0936f54a980bb8988

Bisection indicates that the final patch
(lockdep-re-annotate-scheduler-runqueues.patch) is the one that introduces the regression.

Comment 1 IBM Bug Proxy 2008-11-03 05:40:27 UTC
We have seen this regression on HS21 hardware.

Comment 2 IBM Bug Proxy 2008-11-03 09:40:38 UTC
The patch we are talking about, lockdep-re-annotate-scheduler-runqueues.patch, basically makes this change in a few places:

-       spin_unlock(&target_rq->lock);
+       double_unlock_balance(busiest_rq, target_rq);

double_unlock_balance does this:
static void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
	__releases(busiest->lock)
{
	spin_unlock(&busiest->lock);
	lock_set_subclass(&this_rq->lock.dep_map, 0, _RET_IP_);
}

So in effect we make one extra call to lock_set_subclass(). However, lock_set_subclass() gets compiled out when CONFIG_LOCKDEP is not enabled, and we do not enable LOCKDEP in our kernel, so that call does not exist in our build. The following is the disassembly of double_unlock_balance:

ffffffff81030e52 <double_unlock_balance>:
ffffffff81030e52:       55                      push   %rbp
ffffffff81030e53:       48 89 f7                mov    %rsi,%rdi
ffffffff81030e56:       48 89 e5                mov    %rsp,%rbp
ffffffff81030e59:       e8 fe 6a 25 00          callq  ffffffff8128795c <__spin_unlock>
ffffffff81030e5e:       c9                      leaveq
ffffffff81030e5f:       c3                      retq

Hence the only difference this patch makes should be one extra function call. I will re-verify that this patch is indeed the one causing the problem. If so, I will make double_unlock_balance() an inline function and see if that makes a difference.
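
For reference, this is roughly why that extra call ends up empty: when CONFIG_LOCKDEP is not set, lock_set_subclass() is stubbed out to nothing in include/linux/lockdep.h. The snippet below is a paraphrased sketch of that header, not the verbatim kernel source:

/*
 * Paraphrased sketch of include/linux/lockdep.h (not verbatim): with
 * CONFIG_LOCKDEP=y, lock_set_subclass() re-annotates an already-held
 * lock; with CONFIG_LOCKDEP=n it expands to an empty statement, so all
 * that remains in double_unlock_balance() is the spin_unlock() plus
 * the call/return overhead of the out-of-line wrapper itself.
 */
#ifdef CONFIG_LOCKDEP
extern void lock_set_subclass(struct lockdep_map *lock, unsigned int subclass,
			      unsigned long ip);
#else
# define lock_set_subclass(lock, subclass, ip)	do { } while (0)
#endif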

Comment 3 IBM Bug Proxy 2008-11-03 14:21:04 UTC
I have reconfirmed that the patch causing the regression is indeed lockdep-re-annotate-scheduler-runqueues.patch. As I mentioned in my previous comment, I have built a kernel in which double_unlock_balance() is inlined. Let me see how this one fares.

Comment 4 IBM Bug Proxy 2008-11-03 15:53:21 UTC
Okay, it is confirmed. Inlining double_unlock_balance() 'solves' the problem we are seeing. While I think it is perfectly safe to inline this function, I am surprised to see the impact of a function call.

Anyone got thoughts on this? John Stultz? Peter Zijlstra?

Comment 5 Peter Zijlstra 2008-11-03 18:52:07 UTC
I blame gcc for doing a poor job. Anyway, could you send this patch upstream with some numbers showing the improvement?

Much appreciated.

Comment 6 Clark Williams 2008-11-03 19:01:09 UTC
Would someone post a patch with a Signed-off-by: so we can add this to the MRG RT build?

Comment 7 IBM Bug Proxy 2008-11-04 10:21:12 UTC
I have attached the patch to this bug. I will also put it inline here, just in case. I made this patch against the 2.6.24.7-88.el5rt kernel.

The benchmark that exposed this problem is proprietary, so it doesn't help push this patch on LKML. I will try to find some other test case that shows the problem. If not, maybe Peter's Ack will do? :-)

Patch:
**************

Make double_unlock_balance an inline function. This solves some latency
problems.

Signed-off-by: Sripathi Kodi <sripathik.com>

Index: linux-2.6.24.7.x86_64/kernel/sched.c
===================================================================
--- linux-2.6.24.7.x86_64.orig/kernel/sched.c
+++ linux-2.6.24.7.x86_64/kernel/sched.c
@@ -2365,7 +2365,7 @@ static int double_lock_balance(struct rq
 	return ret;
 }
 
-static void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
+static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
 	__releases(busiest->lock)
 {
 	spin_unlock(&busiest->lock);

Comment 8 IBM Bug Proxy 2008-11-04 10:21:16 UTC
Created attachment 322404 [details]
Patch to make double_unlock_balance an inline function

Comment 9 Peter Zijlstra 2008-11-04 15:06:29 UTC
The workload doesn't matter; as long as the latency is measurable, include the measurements and send the patch to LKML.
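
The usual tool for numbers like this on an MRG-RT kernel is cyclictest from the rt-tests suite, but any reproducible measurement works. Purely as an illustration of the kind of self-contained measurement one could attach (my own sketch, not part of this bug or the LKML posting), something along these lines reports clock_nanosleep() wakeup latency:

/*
 * Illustrative wakeup-latency sketch (an assumption about what sort of
 * number would be acceptable; cyclictest does this far more thoroughly).
 * Sleep on an absolute 1 ms timer and report how late the wakeups are.
 * Build with: gcc -O2 -o wakelat wakelat.c -lrt
 */
#include <stdio.h>
#include <time.h>

#define LOOPS    1000
#define INTERVAL 1000000L		/* 1 ms in nanoseconds */

static long ns_diff(const struct timespec *a, const struct timespec *b)
{
	return (a->tv_sec - b->tv_sec) * 1000000000L +
	       (a->tv_nsec - b->tv_nsec);
}

int main(void)
{
	struct timespec next, now;
	long lat, max = 0, sum = 0;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &next);
	for (i = 0; i < LOOPS; i++) {
		next.tv_nsec += INTERVAL;
		while (next.tv_nsec >= 1000000000L) {
			next.tv_nsec -= 1000000000L;
			next.tv_sec++;
		}
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
		clock_gettime(CLOCK_MONOTONIC, &now);
		lat = ns_diff(&now, &next);	/* how late we woke up */
		if (lat > max)
			max = lat;
		sum += lat;
	}
	printf("avg %ld ns, max %ld ns over %d loops\n",
	       sum / LOOPS, max, LOOPS);
	return 0;
}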

Comment 10 Luis Claudio R. Goncalves 2008-11-04 17:32:54 UTC
Added to the -91 queue for testing.

Comment 11 IBM Bug Proxy 2008-11-05 13:50:30 UTC
(In reply to comment #15)
> ------- Comment From pzijlstr 2008-11-04 10:06:29 EDT-------
> The workload doesn't matter, as long as the latency is measurable include the
> measurements and send the patch to LKML.
>

Sent to LKML: http://lkml.org/lkml/2008/11/5/98

Comment 13 David Sommerseth 2008-11-21 10:13:06 UTC
Verified by code review, as we are missing a reproducer.

The attached patch is in mrg-rt.git as commit cc768400cea58c24a19488b33106c74b742e2e6e. It is included in mrg-rt-2.6.24.7-93.

Comment 15 errata-xmlrpc 2009-01-22 10:44:41 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0009.html