Bug 694696 - High KVM CPU utilization on multiple-processor system with 120 idle VMs caused by cgroup
Keywords:
Status: CLOSED DUPLICATE of bug 623712
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-04-08 03:35 UTC by Zhang, Jian
Modified: 2013-01-09 23:46 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-27 20:00:28 UTC
Target Upstream Version:


Attachments (Terms of Use)
Perf results of bug 694696 (73.87 KB, image/png)
2011-04-08 03:39 UTC, Zhang, Jian

Description Zhang, Jian 2011-04-08 03:35:45 UTC
Description of problem:
We created 120 VMs on an 80-core Intel Xeon E7-4870 quad-socket system and observed 25% CPU utilization for the entire system, even though those VMs were idle.
Disabling cgconfig or applying the patches for cgroup solves this problem.

Version-Release number of selected component (if applicable):
RHEL6.1 (2.6.32-125.el6.x86_64 as native)
KVM: 0.12.1
Guest RHEL6.1(2.6.32-71.el6.x86_64)

How reproducible:
Create 120 VMs on the RHEL6.1 snapshot 1 system.

Steps to Reproduce:
1. Install RHEL6.1 snapshot 1 (2.6.32-125.el6.x86_64) on an Intel Xeon E7-4870 system (WSM-EX).
2. Create 120 RHEL6.1 VMs.
3. Use top to collect system CPU utilization; it reaches 25% while the VMs are doing nothing.
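For step 3, system-wide utilization can also be computed directly from /proc/stat (a sketch, not the commands from the report; busy_pct is a hypothetical helper that diffs two samples of the aggregate "cpu" line):

```shell
# busy_pct: percentage of non-idle CPU time between two /proc/stat "cpu" samples.
# Fields after "cpu" are: user nice system idle iowait irq softirq steal ...
busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR==1 { for (i=2; i<=NF; i++) t1+=$i; i1=$5+$6 }
    NR==2 { for (i=2; i<=NF; i++) t2+=$i; i2=$5+$6;
            printf "%.1f\n", 100*((t2-t1)-(i2-i1))/(t2-t1) }'
}

# Example with two made-up samples taken a few seconds apart:
busy_pct "cpu 100 0 100 9800 0 0 0 0" "cpu 150 0 250 19500 100 0 0 0"   # -> 2.0
```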
  
Actual results:
1. System CPU utilization is 25% for 120 idle VMs. 


Expected results:
1. CPU utilization with idle VMs should be less than 1%


Additional info:
1. VM configuration
One tile contains 6 types of VMs: two of them have 2 vCPUs, the others have 1 vCPU each, and 5 of those VMs were assigned one or two 10G SR-IOV VFs.
The VMs were created with the following command, using x2apic and with the usb, sound, and video devices removed.
/usr/libexec/qemu-kvm -S -M rhel6.0.0 -cpu qemu64,+x2apic -enable-kvm -m 2560 -smp 2,sockets=2,cores=1,threads=1 -name app1 -uuid b7a2140e-ede1-319d-0001-8eefd210ba04 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/app1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -boot c -drive file=/root/mnt/tile1/os/appserver1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:2 -vga std -device pci-assign,host=07:10.1,id=hostdev0,configfd=29,bus=pci.0,addr=0x3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -no-kvm-pit-reinjection
2. CPU utilization with different tile # from top
Top_CPU%	us	sy	ni	id	wa	hi	si	st	Total
5tile	0.42%	1.23%	0.00%	98.10%	0.02%	0.00%	0.24%	0.00%	1.90%
10tile	0.28%	1.67%	0.00%	95.05%	0.01%	0.00%	2.98%	0.00%	4.95%
15tile	0.55%	4.97%	0.00%	85.12%	0.06%	0.00%	9.30%	0.00%	14.88%
20tile	0.88%	9.75%	0.00%	75.54%	0.24%	0.00%	13.59%	0.00%	24.46%
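As a sanity check on the table above, the Total column is simply 100 minus the id (idle) column:

```shell
# Recompute the Total column from the reported idle percentages.
for id in 98.10 95.05 85.12 75.54; do
  awk -v idle="$id" 'BEGIN { printf "100 - %s = %.2f%%\n", idle, 100 - idle }'
done
# prints 1.90%, 4.95%, 14.88%, 24.46% -- matching the Total column
```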

3. perf showed that most of the overhead comes from tg_shares_up.
oprofile showed the same results:
CPU: Intel Architectural Perfmon, speed 2393.88 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 6000
Counted INST_RETIRED events (number of instructions retired) with a unit mask of 0x00 (No unit mask) count 6000
samples  %        samples  %        image name  app name  symbol name
2533563  80.9886  742681   64.5997  vmlinux   vmlinux      tg_shares_up
104855    3.3518  86379     7.5134  vmlinux   vmlinux      rb_get_reader_page
88494     2.8288  78282     6.8091  vmlinux   vmlinux      ring_buffer_consume
78932     2.5232  26442     2.3000  vmlinux   vmlinux      generic_exec_single
54653     1.7471  40614     3.5327  vmlinux   vmlinux      find_next_bit
52626     1.6823  50637     4.4045  oprofile   oprofile      /oprofile
26927     0.8608  9068      0.7888  vmlinux     vmlinux     __set_se_shares
19265     0.6158  7033      0.6117  kvm_intel    kvm_intel   /kvm_intel
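A profile like the one above can typically be gathered with perf's system-wide mode (a sketch; the report does not give the exact commands used):

```shell
# Sample all CPUs with call graphs for 10 seconds, then show the hottest symbols.
perf record -a -g -- sleep 10
perf report --sort symbol | head -20
```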

4. Patches for this issue were already posted upstream: https://lkml.org/lkml/2010/10/16/17
5. We tried the 2.6.38.2 upstream Linux kernel; system CPU utilization dropped to 0.6% immediately. Disabling the cgconfig service also solves this problem.
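On RHEL 6, the cgconfig workaround mentioned above amounts to stopping the service and keeping it disabled across reboots (a workaround sketch, not a fix for the scheduler issue):

```shell
service cgconfig stop      # stop the cgroup config service now
chkconfig cgconfig off     # keep it disabled after reboot
```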

Comment 1 Zhang, Jian 2011-04-08 03:39:16 UTC
Created attachment 490682 [details]
Perf results of bug 694696

Perf results

Comment 3 RHEL Program Management 2011-04-08 03:57:29 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 5 Dor Laor 2011-04-10 11:50:15 UTC
Seems it is a duplicate of 'Bug 623712 - Scalability problems with 'cpu' cgroup controller on large SMP systems (96 cpus)'

Comment 8 Linda Wang 2011-07-27 20:00:28 UTC

*** This bug has been marked as a duplicate of bug 623712 ***

