=Comment: #0================================================= Sripathi Kodi <sripathik.com> - LTC bug 46204 (RH 459478) reported very long boot up times with the debug kernel. Also, lockdep would not work. Eventually Peter Zijlstra suggested a few patches from mainline, which when backported, solved this problem. However, these patches caused regression in one of our proprietary latency benchmarks. Hence we removed these patches from our builds. However, these patches continue to exist in MRG kernels. We need to find the root cause of this regression and fix it. Here are the commits in question: * 0001-lockdep-fix-combinatorial-explosion-in-lock-subgrap.patch mrg-rt.git commit 7fdba90ff2a2dfd693fab7b76736334b76d8563e linux-2.6 commit 419ca3f13532793b81aff09f80c60af3eacbb43d * 0001-lockdep-build-fix.patch mrg-rt.git commit 2e3b018b87616c9d622625889d3270b6bf93870c linux-2.6 commit d6672c501852d577097f6757c311d937aca0b04b * 0002-lockdep-change-scheduler-annotation.patch mrg-rt.git commit 2768deaf8c7a1e8bc929303ad897da05fc943389 * 0004-lockdep-re-annotate-scheduler-runqueues.patch mrg-rt.git commit b0a890b69d2a807adab797a0936f54a980bb8988 From a bisection it seems the final patch (lockdep-re-annotate-scheduler-runqueues.patch) is where the regression appears.
We have seen this regression on HS21 hardware.
The patch we are talking about, lockdep-re-annotate-scheduler-runqueues.patch, basically makes this change in a few places: - spin_unlock(&target_rq->lock); + double_unlock_balance(busiest_rq, target_rq); double_unlock_balance does this: static void double_unlock_balance(struct rq *this_rq, struct rq *busiest) __releases(busiest->lock) { spin_unlock(&busiest->lock); lock_set_subclass(&this_rq->lock.dep_map, 0, _RET_IP_); } So in effect we make an extra call to lock_set_subclass. However, lock_set_subclass gets compiled out if CONFIG_LOCKDEP is not enabled. We don't enable LOCKDEP in our kernel, so lock_set_subclass() doesn't exist in our kernel. The following is the code of double_unlock_banalce: ffffffff81030e52 <double_unlock_balance>: ffffffff81030e52: 55 push %rbp ffffffff81030e53: 48 89 f7 mov %rsi,%rdi ffffffff81030e56: 48 89 e5 mov %rsp,%rbp ffffffff81030e59: e8 fe 6a 25 00 callq ffffffff8128795c <__spin_unlock> ffffffff81030e5e: c9 leaveq ffffffff81030e5f: c3 retq Hence the only difference this patch is making should be 1 extra function call. I will re-verify that this patch indeed is the one causing the problem. If yes, I will make double_unlock_balance() an inlined function and see if that makes a difference.
I have reconfirmed that the patch that causes is indeed lockdep-re-annotate-scheduler-runqueues.patch. As I mentioned in my previous comment, I have compiled a kernel in which I have inlined double_unlock_balance function. Lemme see how this one fares.
Okay, it is confirmed. Inlining double_unlock_balance() 'solves' the problem we are seeing. While I think it is perfectly safe to inline this function, I am surprised to see the impact of a function call. Anyone got thoughts on this? John Stultz? Peter Zijlstra?
I blame gcc for doing a poor job. Anyway could you send this patch upstream with some numbers showing the improvement? Much appreciated.
Would someone post at patch with a Signed-off-by: so we can add this to the MRG RT build?
I have attached the patch to this bug. I will also put it inline here, just in case. I made this patch on 2.6.24.7-88.el5rt kernel. The benchmark that has exposed this problem is proprietary, hence it doesn't help push this patch on LKML. I will try to find some other test case that can show this problem. If not, may be Peter's Ack will do? :-) Patch: ************** Make double_lock_balance an inline function. This solves some latency problems. Signed-off-by: Sripathi Kodi <sripathik.com> Index: linux-2.6.24.7.x86_64/kernel/sched.c =================================================================== --- linux-2.6.24.7.x86_64.orig/kernel/sched.c +++ linux-2.6.24.7.x86_64/kernel/sched.c @@ -2365,7 +2365,7 @@ static int double_lock_balance(struct rq return ret; } -static void double_unlock_balance(struct rq *this_rq, struct rq *busiest) +static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest) __releases(busiest->lock) { spin_unlock(&busiest->lock);
Created attachment 322404 [details] Patch to make double_unlock_balance an inline function
The workload doesn't matter, as long as the latency is measurable include the measurements and send the patch to LKML.
Added to the queue of -91 for testing
(In reply to comment #15) > ------- Comment From pzijlstr 2008-11-04 10:06:29 EDT------- > The workload doesn't matter, as long as the latency is measurable include the > measurements and send the patch to LKML. > Sent to LKML: http://lkml.org/lkml/2008/11/5/98
Verified by code review, as we are missing reproducer. Attached patch is in mrg-rt.git as commit cc768400cea58c24a19488b33106c74b742e2e6e. The patch is found implemented in mrg-rt-2.6.24.7-93.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0009.html