Bug 742414
| Summary: | serious SPECjbb regression in KVM guest due to cpu cgroups | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Mark Wagner <mwagner> |
| Component: | kernel | Assignee: | Larry Woodman <lwoodman> |
| Status: | CLOSED ERRATA | QA Contact: | Mike Gahagan <mgahagan> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 6.2 | CC: | abaron, cpelland, ddumas, dshaks, kzhang, mjenner, moli, perfbz, syeghiay, tburke |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | 6.2 | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-2.6.32-211.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-12-06 14:15:40 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 741979, 748554 | | |
| Attachments: | | | |
Mark, I assume this is CPU bound; can you get profiles of the -141 and -142 kernels while running SPECjbb2005 so I can see where the time is going? Larry

Mark, your original email subject on this was "Huge performance difference when starting guest via cmdline vs libvirt". Is this true, and if so, does one use a cpu cgroup and the other not? Larry

This appears to be a performance regression caused by the 12-part scheduler patch series I backported from upstream to address BZ623712, which was included by Aris in the -142 kernel.

-------------------------------------------------------------------------------
%changelog
* Fri Apr 29 2011 Aristeu Rozanski <arozansk> [2.6.32-142.el6]
...
- [kernel] sched: Drop rq->lock from idle_balance() (Larry Woodman) [623712]
- [kernel] sched: Fix unregister_fair_sched_group() (Larry Woodman) [623712]
- [kernel] sched: Allow update_cfs_load() to update global load (Larry Woodman) [623712]
- [kernel] sched: Implement demand based update_cfs_load() (Larry Woodman) [623712]
- [kernel] sched: Update shares on idle_balance (Larry Woodman) [623712]
- [kernel] sched: Add sysctl_sched_shares_window (Larry Woodman) [623712]
- [kernel] sched: Introduce hierarchal order on shares update list (Larry Woodman) [623712]
- [kernel] sched: Fix update_cfs_load() synchronization (Larry Woodman) [623712]
- [kernel] sched: Fix load corruption from update_cfs_shares() (Larry Woodman) [623712]
- [kernel] sched: Make tg_shares_up() walk on-demand (Larry Woodman) [623712]
- [kernel] sched: Implement on-demand (active) cfs_rq list (Larry Woodman) [623712]
- [kernel] sched: Rewrite tg_shares_up (Larry Woodman) [623712]
...
-------------------------------------------------------------------------------

I had Shak verify that these patches did fix BZ623712; see the attachment in comment #36.
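The profiling request above can be carried out with perf on both host and guest. A minimal sketch of the procedure follows; it only prints the commands rather than running them, and the kernel versions, sampling duration, and output file names are illustrative assumptions, not taken from the report:

```shell
#!/bin/sh
# Sketch: collect a system-wide profile for each host kernel while
# SPECjbb2005 is running, so the two can be diffed afterwards.
collect_profile() {
    kernel="$1"   # e.g. 2.6.32-141.el6 (placeholder value)
    # Sample all CPUs with call graphs for 60 seconds, then emit a
    # flat text report that can be attached to the BZ.
    echo "perf record -a -g -o perf-${kernel}.data -- sleep 60"
    echo "perf report -i perf-${kernel}.data --stdio > perf-top-${kernel}.txt"
}

# Print the commands for both kernels under comparison.
collect_profile 2.6.32-141.el6
collect_profile 2.6.32-142.el6
```

Running the printed commands on each kernel while the benchmark is in its measurement interval gives directly comparable profiles.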
BZ623712 is about the time it takes to create 130 KVM guests in 130 separate cgroups, whereas this BZ is about the performance of a single KVM guest running in a single cgroup and executing SPECjbb after the guest has been created. Larry

Guests started with libvirt use cgroups if they are enabled on the host. When we start the guests with a script (command line) they do not use cgroups.

Mark, can you get me profiles for both -141 & -142 kernels, and if possible try to determine whether the startup, the runtime, or both is slower in the -142 kernel. I will search the upstream commits to see if there were any recent changes that address this. Larry

Created attachment 525811 [details]
guest perf top for the -141 kernel
Created attachment 525812 [details]
guest perf top data for the -142 kernel
Created attachment 525813 [details]
host perf top for the -141 kernel
Created attachment 525814 [details]
host perf top for the -142 kernel
Created attachment 525815 [details]
guest vmstat for the -141 kernel
Created attachment 525816 [details]
guest vmstat for the -142 kernel
Created attachment 525817 [details]
host vmstat for the -141 kernel
Created attachment 525818 [details]
host vmstat for the -142 kernel
I don't know what's going on here; there are no big differences between the perf top or vmstat outputs between the -141 and -142 guests or hosts! Mark, are both the hosts and guests running -141/-142, or are they running something else (6.0 or 6.1)? Larry

The guest stays at RHEL6.0 (2.6.32-71). I vary the host kernel only.

Mark, can we run this again with -141 & -142 and compare what values are in the cpuset & cpuacct & cpu areas of the cgroup mount points? It almost seems like we are limiting the amount of CPU that the 12 guests are getting. Larry

I verified this is a performance regression caused by the 12-part scheduler patch series I backported from upstream to address another problem, where cgroup creation did not scale (BZ623712). I've tried to isolate exactly which of those patches causes this, but the system does not boot if I remove any of them. I also verified that the upstream kernel does not suffer from this problem. I have created a single patch, about 1300 lines long, that I am using to debug this problem. I am looking at what additional upstream changes have been made to the scheduler, specifically load-balancer related changes, to address this problem. There are hundreds of them; the upstream scheduler has changed a lot since 2.6.32!

I understand the urgency of this issue and I am working as hard as possible and spending all of my time on it. The upstream patches that caused this regression went into the kernel on April 29th, yet this problem was discovered on September 29th, exactly 5 months later. I don't know what we can do to test for this sort of thing earlier, but it would have been much more comfortable for me if I had known there was a performance problem a month or two or three or four ago; performance regressions are the most difficult problems to find and fix! Also, I don't know whether there are other applications running in KVM guests within cgroups that also suffer from this degradation, or whether it's limited to SPECjbb. Can someone answer this?
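Comparing the cpu, cpuacct, and cpuset values asked about above comes down to reading the controller files for the guest's group. A minimal sketch follows; the cgroup v1 mount point and the libvirt group path are assumptions (adjust for the actual host layout), and missing files are reported rather than treated as errors:

```shell
#!/bin/sh
# Dump the scheduler-related cgroup knobs for one libvirt guest.
# CGROOT and GUEST are assumed defaults, overridable via environment.
CGROOT=${CGROOT:-/cgroup}
GUEST=${GUEST:-libvirt/qemu/myguest}

dump_cgroup_values() {
    for f in cpu/$GUEST/cpu.shares \
             cpuacct/$GUEST/cpuacct.usage \
             cpuset/$GUEST/cpuset.cpus; do
        if [ -r "$CGROOT/$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$CGROOT/$f")"
        else
            printf '%s: not present\n' "$f"
        fi
    done
}

dump_cgroup_values
```

Running this on both the -141 and -142 host kernels would show whether the guest is being handed a smaller CPU share under the newer kernel.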
Larry Woodman

Status: early numbers say we found it with the latest upstream backports:

Stock RHEL6.1:
root@dhcp47-18 SPECjbb2005 # grep throu 131.el6.txt
throughput = 429432.45 SPECjbb2005 bops

Stock RHEL6.2:
root@dhcp47-18 SPECjbb2005 # grep throu 207.el6.txt
throughput = 321762.84 SPECjbb2005 bops

Current upstream kernel:
root@dhcp47-18 SPECjbb2005 # grep through upstream_01.txt
throughput = 452214.41 SPECjbb2005 bops

Current 6.2 with latest upstream sched changes:
root@dhcp47-18 SPECjbb2005 # grep throu 207.el6.207sched.txt
throughput = 464825.91 SPECjbb2005 bops

We are still testing, and as you know there is always some weird problem that shows up! I just fixed the usual kABI breakers associated with backporting anything into RHEL and kicked the build off in brew. I'll make this kernel available as soon as it's done and update the BZ as we get more data. Larry Woodman

The kernel built in brew which I think fixes this problem is located here: barstool.build:/mnt/redhat/brewroot/packages/kernel/2.6.32/207.el6.SPECjbb I'll post the patches once Jeff Burke gets a chance to test it on Beaker. Larry

Posted patches to rhkernel-list. Larry

Patch(es) available on kernel-2.6.32-211.el6

Hi Chris, I'll ping the performance group (Mark Wagner specifically). They already verified that the fix that went into the kernel does in fact fix the regression. Since they are the only ones with the hardware and the benchmark to reproduce this problem, they will have to do the QA for us and move the BZ to VERIFIED. Larry

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2011-1530.html
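For reference, the drop from stock RHEL6.1 to stock RHEL6.2 in the numbers above works out to roughly a 25% regression. A quick check of the arithmetic with awk:

```shell
#!/bin/sh
# Percent regression between a baseline and a current throughput figure.
regression_pct() {
    awk -v base="$1" -v cur="$2" \
        'BEGIN { printf "%.1f%%\n", (base - cur) / base * 100 }'
}

# Stock 6.1 (429432.45 bops) vs stock 6.2 (321762.84 bops), as quoted above.
regression_pct 429432.45 321762.84   # prints 25.1%
```

By the same arithmetic, the backported scheduler fix (464825.91 bops) not only recovers the loss but slightly exceeds the 6.1 baseline.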
Description of problem:
When running a single KVM guest on the RHEL6.2 stream, we have discovered a serious regression in the performance of SPECjbb when run in a KVM guest.

Version-Release number of selected component (if applicable):
Initially found in the -192 kernel; traced it back to the -142 kernel, where it originated.

How reproducible:
Every time

Steps to Reproduce:
1. Run SPECjbb in a KVM guest (RHEL6.0 guest) on a -142 or later host kernel

Actual results:
# grep through maw_test25_rhel142.txt
throughput = 177317.29 SPECjbb2005 bops
throughput = 272970.10 SPECjbb2005 bops
throughput = 143913.56 SPECjbb2005 bops
throughput = 264087.82 SPECjbb2005 bops

Expected results:
From a -141 kernel on the host:
# grep through maw_test26_rhel141.txt
throughput = 179524.72 SPECjbb2005 bops
throughput = 325709.59 SPECjbb2005 bops
throughput = 429984.34 SPECjbb2005 bops
throughput = 429432.45 SPECjbb2005 bops

Additional info:
Have been working with Larry and tracked this to cpu cgroups. With cpu cgroups disabled (cgroup_disable=cpu) on the -142 kernel, we get better performance than with the -141 kernel (with cgroups on):
# grep through maw_test28_rhel142_cgroups_disabled.txt
throughput = 181539.05 SPECjbb2005 bops
throughput = 342462.70 SPECjbb2005 bops
throughput = 458423.38 SPECjbb2005 bops
throughput = 445874.35 SPECjbb2005 bops
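The cgroup_disable=cpu workaround used above is a kernel boot parameter, so on RHEL 6 it is applied by appending it to the kernel line in GRUB's config. A sketch of the resulting entry; the device, root path, and kernel version shown are placeholders, not taken from the report:

```
# /boot/grub/grub.conf -- append cgroup_disable=cpu to the kernel line
title Red Hat Enterprise Linux (2.6.32-142.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-142.el6.x86_64 ro root=/dev/mapper/vg_root-lv_root cgroup_disable=cpu
        initrd /initramfs-2.6.32-142.el6.x86_64.img
```

After rebooting, `cat /proc/cmdline` should show cgroup_disable=cpu, confirming the cpu controller is off for the run.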