Bug 497181

Summary: [LTC 6.0 FEAT] Nominate power efficient idle loadbalancer [201925]
Product: Red Hat Enterprise Linux 6
Reporter: IBM Bug Proxy <bugproxy>
Component: kernel
Assignee: Kevin W Monroe <kmonroe>
Status: CLOSED CURRENTRELEASE
QA Contact: Martin Jenner <mjenner>
Severity: high
Docs Contact:
Priority: low
Version: 6.0
CC: jjarvis, jlarrew, peterm
Target Milestone: rc
Keywords: FutureFeature
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-16 18:44:58 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 356741, 525727    

Description IBM Bug Proxy 2009-04-22 17:30:41 UTC
=Comment: #0=================================================
Emily J. Ratliff <ratliff.com> - 
1. Feature Overview:
Feature Id:	[201925]
a. Name of Feature:	Nominate power efficient idle loadbalancer
b. Feature Description
This patchset improves the idle load balancer nomination logic by taking the system topology into
consideration.
 
An idle load balancer is an idle CPU that does not turn off its scheduler tick and performs
load balancing on behalf of the other idle CPUs. Currently, this idle load balancer is nominated as
first_cpu(nohz.cpu_mask), i.e. the lowest-numbered CPU among those that are idle with their tick stopped.

2. Feature Details:
Sponsor:	Linux Systems Tech
Architectures:
x86_64
ppc64
s390x

Arch Specificity: Purely Common Code
Affects Core Kernel: Yes
Affects Kernel Modules: Yes
Delivery Mechanism: Direct from community
Category:	Kernel
Request Type:	Kernel - Enhancement from Upstream
d. Upstream Acceptance:	Pending
Sponsor Priority:	1
f. Severity: High
IBM Confidential:	no
Code Contribution:	IBM code
g. Component Version Target:	Patch and discussions: (Version 2)
 http://lkml.org/lkml/2009/4/2/246
 
 Patch series in Ingo Molnar's sched-tip tree as of 14 April 2009.

3. Business Case
The drawback of the current method is that CPU numbering across cores and packages is not
necessarily sequential. Meanwhile, the other power-savings settings, such as
sched_mc_power_savings/sched_smt_power_savings and the power-aware IRQ balancer, balance tasks and
IRQs with the system topology in mind, aiming to keep as many 'power domains' (cores/packages) as
possible in a low-power state.
 
The current idle load balancer nomination does not necessarily follow this policy. For example,
tasks and interrupts could be running largely on the first package, with the intention of keeping
the second package idle, so CPU0 may be busy. The first CPU in nohz.cpu_mask could then be CPU1,
which in turn is nominated as the idle load balancer. If CPU1 belongs to the second package, keeping
its tick running prevents that package from entering a deeper sleep state.
 
Instead, the role of the idle load balancer could have been taken by an idle CPU from the first
package, helping the second package go completely idle.

4. Primary contact at Red Hat: 
John Jarvis
jjarvis

5. Primary contacts at Partner:
Project Management Contact:
Stephanie Glass, sglass.com, 512-838-9284

Technical contact(s):

Vaidyanathan Srinivasan, svaidyan.com

IBM Manager:
Jeffrey Heroux, heroux.com

Comment 2 IBM Bug Proxy 2010-05-10 17:21:03 UTC
------- Comment From arunbharadwaj.com 2010-05-10 13:18 EDT-------
Hi,

I have verified that this feature is present in the RHEL6 snap 1 kernel and that it works correctly.

Here is how I have tested this feature:

On a fully idle SMP system, start a small workload pinned to any one CPU. Observe that the interrupt rate on this CPU is high. Also observe that another CPU from the *same* package has a high interrupt rate; this is the ILB (idle load balancer) CPU. Since the ILB CPU is chosen from the same package as the busy CPU, the feature is working correctly.

thanks
arun