742414 – serious SPECjbb regression in KVM guest due to cpu cgroups


Bug 742414 - serious SPECjbb regression in KVM guest due to cpu cgroups


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.2
Hardware:	Unspecified
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	6.2
Assignee:	Larry Woodman
QA Contact:	Mike Gahagan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	741979 748554

Reported:	2011-09-30 03:24 UTC by Mark Wagner
Modified:	2013-01-10 00:24 UTC
CC List:	10 users
Fixed In Version:	kernel-2.6.32-211.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-12-06 14:15:40 UTC
Target Upstream Version:
Embargoed:

Attachments
guest perf top for the -141 kernel (4.10 KB, text/plain) 2011-09-30 19:21 UTC, Mark Wagner	no flags
guest perf top data for the -142 kernel (4.04 KB, text/plain) 2011-09-30 19:23 UTC, Mark Wagner	no flags
host perf top for the -141 kernel (4.04 KB, text/plain) 2011-09-30 19:24 UTC, Mark Wagner	no flags
host perf top for the -142 kernel (4.54 KB, text/plain) 2011-09-30 19:24 UTC, Mark Wagner	no flags
guest vmstat for the -141 kernel (27.09 KB, text/plain) 2011-09-30 19:25 UTC, Mark Wagner	no flags
guest vmstat for the -142 kernel (27.09 KB, text/plain) 2011-09-30 19:36 UTC, Mark Wagner	no flags
host vmstat for the -141 kernel (32.81 KB, text/plain) 2011-09-30 19:36 UTC, Mark Wagner	no flags
host vmstat for the -142 kernel (43.32 KB, text/plain) 2011-09-30 19:38 UTC, Mark Wagner	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2011:1530	0	normal	SHIPPED_LIVE	Moderate: Red Hat Enterprise Linux 6 kernel security, bug fix and enhancement update	2011-12-06 01:45:35 UTC

Description Mark Wagner 2011-09-30 03:24:48 UTC

Description of problem:
When running a single KVM guest on a host from the RHEL6.2 stream, we discovered a serious regression in SPECjbb performance inside the guest.

Version-Release number of selected component (if applicable):
Initially found in the -192 kernel.
Traced back to the -142 kernel, where it originated.


How reproducible:
Every time

Steps to Reproduce:
1. Run SPECjbb in a KVM guest (RHEL6.0 guest) on a -142 or later host kernel
  
Actual results:
# grep through  maw_test25_rhel142.txt 
           throughput =     177317.29 SPECjbb2005 bops 
           throughput =     272970.10 SPECjbb2005 bops 
           throughput =     143913.56 SPECjbb2005 bops 
           throughput =     264087.82 SPECjbb2005 bops 


Expected results:

From a -141 kernel on the host
# grep through maw_test26_rhel141.txt
           throughput =     179524.72 SPECjbb2005 bops 
           throughput =     325709.59 SPECjbb2005 bops 
           throughput =     429984.34 SPECjbb2005 bops 
           throughput =     429432.45 SPECjbb2005 bops 

Additional info:

I have been working with Larry and we tracked this to cpu cgroups.

With cpu cgroups disabled (cgroup_disable=cpu) on the -142 kernel, we get better performance than with the -141 kernel (with cgroups on).
# grep through maw_test28_rhel142_cgroups_disabled.txt 
           throughput =     181539.05 SPECjbb2005 bops 
           throughput =     342462.70 SPECjbb2005 bops 
           throughput =     458423.38 SPECjbb2005 bops 
           throughput =     445874.35 SPECjbb2005 bops
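
The three result sets above can be compared numerically. What follows is a minimal Python sketch and not part of the original report: it assumes the grep output is pasted in as text, and the helper names `parse_bops` and `peak_drop_percent` are hypothetical. The report does not say what each of the four lines per run corresponds to, so the sketch only compares the best figure of each run.

```python
import re

# Throughput lines copied from the grep output in this report.
RUN_141 = """
throughput =     179524.72 SPECjbb2005 bops
throughput =     325709.59 SPECjbb2005 bops
throughput =     429984.34 SPECjbb2005 bops
throughput =     429432.45 SPECjbb2005 bops
"""
RUN_142 = """
throughput =     177317.29 SPECjbb2005 bops
throughput =     272970.10 SPECjbb2005 bops
throughput =     143913.56 SPECjbb2005 bops
throughput =     264087.82 SPECjbb2005 bops
"""
RUN_142_CGROUPS_DISABLED = """
throughput =     181539.05 SPECjbb2005 bops
throughput =     342462.70 SPECjbb2005 bops
throughput =     458423.38 SPECjbb2005 bops
throughput =     445874.35 SPECjbb2005 bops
"""


def parse_bops(text):
    """Return every 'throughput = N SPECjbb2005 bops' value found in text."""
    pattern = r"throughput\s*=\s*([0-9.]+)\s+SPECjbb2005 bops"
    return [float(value) for value in re.findall(pattern, text)]


def peak_drop_percent(baseline, candidate):
    """Percentage by which candidate's best score falls below baseline's.

    A negative result means the candidate's best score is higher.
    """
    return (1.0 - max(candidate) / max(baseline)) * 100.0


if __name__ == "__main__":
    base = parse_bops(RUN_141)
    for label, text in (("-142", RUN_142),
                        ("-142, cgroup_disable=cpu", RUN_142_CGROUPS_DISABLED)):
        drop = peak_drop_percent(base, parse_bops(text))
        print(f"{label}: peak is {drop:.1f}% below the -141 peak")
```

On these figures the -142 peak is roughly 36% below the -141 peak, while the run with cpu cgroups disabled peaks above it.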

Comment 2 Larry Woodman 2011-09-30 11:26:55 UTC

Mark, I assume this is CPU bound. Can you get profiles of the -141 and -142 kernels while running SPECjbb2005 so I can see where the time is going?

Larry

Comment 3 Larry Woodman 2011-09-30 11:34:43 UTC

Mark, your original email subject on this was "Huge performance difference when starting guest via cmdline vs libvirt".  Is this true and if yes, does one use a cpu cgroup and the other not use one?

Larry

Comment 4 Larry Woodman 2011-09-30 12:08:15 UTC

This appears to be a performance regression caused by the 12-part scheduler patch series that I backported from upstream to address BZ623712 and that Aris included in the -142 kernel.
-------------------------------------------------------------------------------
%changelog
* Fri Apr 29 2011 Aristeu Rozanski <arozansk> [2.6.32-142.el6]
...
- [kernel] sched: Drop rq->lock from idle_balance() (Larry Woodman) [623712]
- [kernel] sched: Fix unregister_fair_sched_group() (Larry Woodman) [623712]
- [kernel] sched: Allow update_cfs_load() to update global load (Larry Woodman) [623712]
- [kernel] sched: Implement demand based update_cfs_load() (Larry Woodman) [623712]
- [kernel] sched: Update shares on idle_balance (Larry Woodman) [623712]
- [kernel] sched: Add sysctl_sched_shares_window (Larry Woodman) [623712]
- [kernel] sched: Introduce hierarchal order on shares update list (Larry Woodman) [623712]
- [kernel] sched: Fix update_cfs_load() synchronization (Larry Woodman) [623712]
- [kernel] sched: Fix load corruption from update_cfs_shares() (Larry Woodman) [623712]
- [kernel] sched: Make tg_shares_up() walk on-demand (Larry Woodman) [623712]
- [kernel] sched: Implement on-demand (active) cfs_rq list (Larry Woodman) [623712]
- [kernel] sched: Rewrite tg_shares_up (Larry Woodman) [623712]
...
-------------------------------------------------------------------------------
I had Shak verify that these patches did fix BZ623712, see the attachment in comment #36.

BZ623712 is about the time it takes to create 130 KVM guests in 130 separate cgroups, whereas this BZ is about the performance of a single KVM guest running in a single cgroup and executing SPECjbb after the guest has been created.

Larry

Comment 5 Mark Wagner 2011-09-30 12:19:46 UTC

Larry
Guests started with libvirt use cgroups if they are enabled on the host.  When we start the guests with a script (command line) they do not use cgroups.
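
One way to confirm which case a given guest falls into is to look at the cgroup membership of its qemu process. The sketch below is an illustration rather than anything used in this report: it relies on the `/proc/<pid>/cgroup` line format (`hierarchy-ID:controller-list:cgroup-path`), the function names are hypothetical, and the two sample strings are made up for the example.

```python
def parse_proc_cgroup(text):
    """Map each controller name to its cgroup path.

    Each line of /proc/<pid>/cgroup has the form
    'hierarchy-ID:controller-list:cgroup-path'.
    """
    membership = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        for name in controllers.split(","):
            if name:
                membership[name] = path
    return membership


def uses_cpu_cgroup(text):
    """True if the process sits in a non-root cgroup of the cpu controller."""
    return parse_proc_cgroup(text).get("cpu", "/") != "/"


def read_proc_cgroup(pid):
    """Return the raw contents of /proc/<pid>/cgroup for a running process."""
    with open(f"/proc/{pid}/cgroup") as handle:
        return handle.read()


# Hypothetical sample contents, not taken from the report.
LIBVIRT_STYLE = "3:cpu:/libvirt/qemu/guest1\n2:cpuacct:/libvirt/qemu/guest1\n"
CMDLINE_STYLE = "3:cpu:/\n2:cpuacct:/\n"

if __name__ == "__main__":
    print("libvirt-style sample in a cpu cgroup:", uses_cpu_cgroup(LIBVIRT_STYLE))
    print("cmdline-style sample in a cpu cgroup:", uses_cpu_cgroup(CMDLINE_STYLE))
```

On a live host, `uses_cpu_cgroup(read_proc_cgroup(pid))` with the guest's qemu PID would answer the same question.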

Comment 6 Larry Woodman 2011-09-30 13:04:40 UTC

Mark, can you get me profiles for both the -141 and -142 kernels and, if possible, try to determine whether startup, runtime, or both are slower in the -142 kernel?

I will search the upstream commits to see if there were any recent changes that address this.

Larry

Comment 7 Mark Wagner 2011-09-30 19:21:29 UTC

Created attachment 525811
guest perf top for the -141 kernel

Comment 8 Mark Wagner 2011-09-30 19:23:17 UTC

Created attachment 525812
guest perf top data for the -142 kernel

Comment 9 Mark Wagner 2011-09-30 19:24:09 UTC

Created attachment 525813
host perf top for the -141 kernel

Comment 10 Mark Wagner 2011-09-30 19:24:45 UTC

Created attachment 525814
host perf top for the -142 kernel

Comment 11 Mark Wagner 2011-09-30 19:25:41 UTC

Created attachment 525815
guest vmstat for the -141 kernel

Comment 12 Mark Wagner 2011-09-30 19:36:02 UTC

Created attachment 525816
guest vmstat for the -142 kernel

Comment 13 Mark Wagner 2011-09-30 19:36:45 UTC

Created attachment 525817
host vmstat for the -141 kernel

Comment 14 Mark Wagner 2011-09-30 19:38:35 UTC

Created attachment 525818
host vmstat for the -142 kernel

Comment 15 Larry Woodman 2011-10-03 11:32:37 UTC

I don't know what's going on here; there are no big differences between the perf top or vmstat outputs for the -141 and -142 guests or hosts! Mark, are both the hosts and guests running -141/-142, or is one of them running something else (6.0 or 6.1)?

Larry

Comment 16 Mark Wagner 2011-10-03 16:30:02 UTC

The guest stays at RHEL6.0 (2.6.32-71).  I vary the host kernel only.

Comment 17 Larry Woodman 2011-10-03 18:25:42 UTC

Mark, can we run this again with -141 and -142 and compare the values in the cpuset, cpuacct, and cpu areas of the cgroup mount points? It almost seems like we are limiting the amount of CPU that the 12 guests are getting.

Larry
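
The comparison requested in this comment could be scripted along the following lines. This is a hedged sketch, not something that was run for this bug: the function names are hypothetical, and the directory you point it at (the guest's cgroup directory under each controller's mount point) depends on how cgroups are mounted on the host.

```python
import os


def snapshot_cgroup(directory, prefixes=("cpu.", "cpuacct.", "cpuset.")):
    """Read every matching control file in one cgroup directory into a dict."""
    values = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not name.startswith(prefixes) or not os.path.isfile(path):
            continue
        try:
            with open(path) as handle:
                values[name] = handle.read().strip()
        except OSError:
            # Some control files cannot be read; skip them.
            continue
    return values


def diff_snapshots(before, after):
    """Return {name: (before_value, after_value)} for every differing file."""
    names = set(before) | set(after)
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(names)
        if before.get(name) != after.get(name)
    }
```

Taking one snapshot while booted into -141 and another on -142 (saved to disk between reboots, for example as JSON) and passing both to `diff_snapshots` would show any setting, such as `cpu.shares`, that differs between the two kernels. Counters such as `cpuacct.usage` will always differ and can be ignored.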

Comment 20 Larry Woodman 2011-10-10 12:32:38 UTC

I verified this is a performance regression caused by the 12-part scheduler patch series I backported from upstream to address another problem in cgroups, where creation did not scale (BZ623712). I've tried to isolate exactly which of those patches causes this, but the system does not boot if I remove any of them. I also verified that the upstream kernel does not suffer from this problem. I have created a single patch, about 1300 lines long, that I am using to debug this problem. I am looking at what additional upstream changes have been made to the scheduler, specifically load-balancer-related changes, to address this problem. There are hundreds of them; the upstream scheduler has changed a lot since 2.6.32!

I understand the urgency of this issue and I am working as hard as possible and spending all of my time on it. The upstream patches that caused this regression went into the kernel on April 29th, yet this problem was discovered on September 29th, exactly 5 months later. I don't know what we can do to test for this sort of thing earlier, but it would have been much more comfortable for me if I had known there was a performance problem a month or two or three or four ago. Performance regressions are the most difficult problems to find and fix!

Also, I don't know whether other applications running in KVM guests within cgroups also suffer from this degradation or whether it's limited to SPECjbb. Can someone answer this?

Larry Woodman

Comment 21 Larry Woodman 2011-10-10 18:33:52 UTC


Status: early numbers say we found it with the latest upstream backports:

Stock RHEL6.1:
root@dhcp47-18 SPECjbb2005 # grep throu  131.el6.txt
throughput =     429432.45 SPECjbb2005 bops

Stock RHEL6.2:
root@dhcp47-18 SPECjbb2005 # grep throu  207.el6.txt
throughput =     321762.84 SPECjbb2005 bops

Current upstream kernel:
root@dhcp47-18 SPECjbb2005 # grep through upstream_01.txt
throughput =     452214.41 SPECjbb2005 bops

Current 6.2 with latest upstream sched changes:
root@dhcp47-18 SPECjbb2005 # grep throu  207.el6.207sched.txt
throughput =     464825.91 SPECjbb2005 bops


We are still testing and, as you know, there is always some weird problem that shows up! I just fixed the usual kABI breakers associated with backporting anything into RHEL and kicked the build off in brew. I'll make this kernel available as soon as it's done and update the BZ as we get more data.

Larry Woodman
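
To put the four figures in this comment on a common scale, each can be expressed relative to the stock RHEL6.1 result. The snippet below is a small sketch added for illustration; the function name `percent_vs_baseline` is hypothetical, and since each kernel has only one quoted figure here, the percentages carry no information about run-to-run variance.

```python
# Single throughput figure (SPECjbb2005 bops) quoted for each kernel above.
RESULTS = {
    "Stock RHEL6.1": 429432.45,
    "Stock RHEL6.2": 321762.84,
    "Current upstream kernel": 452214.41,
    "Current 6.2 with latest upstream sched changes": 464825.91,
}


def percent_vs_baseline(results, baseline):
    """Return each result as a percentage change relative to the baseline."""
    base = results[baseline]
    return {label: (bops / base - 1.0) * 100.0
            for label, bops in results.items()}


if __name__ == "__main__":
    deltas = percent_vs_baseline(RESULTS, "Stock RHEL6.1")
    for label, delta in deltas.items():
        print(f"{label:<48} {delta:+6.1f}%")
```

On these numbers, stock RHEL6.2 is roughly 25% below the RHEL6.1 figure, while the 6.2 kernel with the scheduler backports is about 8% above it.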

Comment 22 Larry Woodman 2011-10-11 02:24:28 UTC

The kernel built in brew which I think fixes this problem is located here:

barstool.build:/mnt/redhat/brewroot/packages/kernel/2.6.32/207.el6.SPECjbb

I'll post the patches once Jeff Burke gets a chance to test it on Beaker.

Larry

Comment 23 Larry Woodman 2011-10-11 18:41:38 UTC

Posted patches to rhkernel-list.

Larry

Comment 25 Aristeu Rozanski 2011-10-19 15:28:31 UTC

Patch(es) available on kernel-2.6.32-211.el6

Comment 29 Larry Woodman 2011-11-07 18:39:56 UTC

Hi Chris, I'll ping the performance group (Mark Wagner specifically). They already verified that the fix that went into the kernel does in fact fix the regression. Since they are the only ones with the hardware and benchmark to reproduce this problem, they will have to do the QA for us and move the BZ to VERIFIED.

Larry

Comment 34 errata-xmlrpc 2011-12-06 14:15:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html
