Description of problem:
We created 120 VMs on an 80-core Intel Xeon E7-4870 quad-socket system and observed 25% CPU utilization across the entire system even though the VMs were idle. Disabling the cgconfig service or applying the upstream cgroup patches resolves the problem.

Version-Release number of selected component (if applicable):
Host: RHEL6.1 (2.6.32-125.el6.x86_64)
KVM: 0.12.1
Guest: RHEL6.1 (2.6.32-71.el6.x86_64)

How reproducible:
Always: create 120 VMs on a RHEL6.1 snapshot 1 system.

Steps to Reproduce:
1. Install RHEL6.1 snapshot 1 (2.6.32-125.el6.x86_64) on an Intel Xeon E7-4870 (WSM-EX) system.
2. Create 120 RHEL6.1 VMs.
3. Use top to collect system CPU utilization; it reaches 25% while the VMs do nothing.

Actual results:
1. System CPU utilization is 25% with 120 idle VMs.

Expected results:
1. CPU utilization with idle VMs should be less than 1%.

Additional info:

1. VM configuration
One tile contains 6 types of VMs: two of them have 2 vcpus, the others have 1 vcpu, and 5 of the VMs were assigned 1 or 2 10G SR-IOV VFs. The VMs were created with the following command, using x2apic and with the usb, sound, and video devices removed:

/usr/libexec/qemu-kvm -S -M rhel6.0.0 -cpu qemu64,+x2apic -enable-kvm -m 2560 \
    -smp 2,sockets=2,cores=1,threads=1 -name app1 \
    -uuid b7a2140e-ede1-319d-0001-8eefd210ba04 -nodefconfig -nodefaults \
    -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/app1.monitor,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -boot c \
    -drive file=/root/mnt/tile1/os/appserver1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none \
    -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 \
    -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
    -usb -vnc 127.0.0.1:2 -vga std \
    -device pci-assign,host=07:10.1,id=hostdev0,configfd=29,bus=pci.0,addr=0x3 \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -no-kvm-pit-reinjection

2. CPU utilization at different tile counts (from top; Total = 100% - id, i.e. the non-idle share):

Tiles   us      sy      ni      id       wa      hi      si       st      Total
5       0.42%   1.23%   0.00%   98.10%   0.02%   0.00%   0.24%    0.00%   1.90%
10      0.28%   1.67%   0.00%   95.05%   0.01%   0.00%   2.98%    0.00%   4.95%
15      0.55%   4.97%   0.00%   85.12%   0.06%   0.00%   9.30%    0.00%   14.88%
20      0.88%   9.75%   0.00%   75.54%   0.24%   0.00%   13.59%   0.00%   24.46%

3. perf showed that most of the overhead comes from tg_shares_up; oprofile showed the same results:

CPU: Intel Architectural Perfmon, speed 2393.88 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 6000
Counted INST_RETIRED events (number of instructions retired) with a unit mask of 0x00 (No unit mask) count 6000

samples   %        samples  %        image name  app name   symbol name
2533563   80.9886  742681   64.5997  vmlinux     vmlinux    tg_shares_up
104855    3.3518   86379    7.5134   vmlinux     vmlinux    rb_get_reader_page
88494     2.8288   78282    6.8091   vmlinux     vmlinux    ring_buffer_consume
78932     2.5232   26442    2.3000   vmlinux     vmlinux    generic_exec_single
54653     1.7471   40614    3.5327   vmlinux     vmlinux    find_next_bit
52626     1.6823   50637    4.4045   oprofile    oprofile   /oprofile
26927     0.8608   9068     0.7888   vmlinux     vmlinux    __set_se_shares
19265     0.6158   7033     0.6117   kvm_intel   kvm_intel  /kvm_intel

4. Patches for this issue were already posted upstream: https://lkml.org/lkml/2010/10/16/17

5. With the upstream 2.6.38.2 kernel, system CPU utilization dropped to 0.6% immediately. Disabling the cgconfig service also solves the problem (a sketch of that workaround follows below).
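Since the cgconfig workaround is mentioned twice above, here is a minimal sketch of it, assuming a stock RHEL6 host with the standard cgconfig init script (the exact controller mounts depend on /etc/cgconfig.conf):

    # Stop the cgconfig service on the running host:
    service cgconfig stop
    # Keep it from starting again at boot:
    chkconfig cgconfig off
    # Confirm the cpu cgroup controller is no longer mounted:
    grep cgroup /proc/mounts

With the cpu controller unmounted there is only the root task group, so the per-group weight recalculation in tg_shares_up should no longer be triggered for each qemu-kvm process.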
Created attachment 490682 [details]
Perf results of bug 694696
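For reference, a system-wide profile like the attached one can be collected with something along these lines (a sketch: the 10-second sampling window and output file name are arbitrary choices, not taken from the report):

    # Sample all CPUs for 10 seconds, recording call graphs:
    perf record -a -g -o perf.data sleep 10
    # Summarize the hottest symbols; tg_shares_up dominates on the affected kernel:
    perf report -i perf.data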
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.
This seems to be a duplicate of 'Bug 623712 - Scalability problems with 'cpu' cgroup controller on large SMP systems (96 cpus)'.
*** This bug has been marked as a duplicate of bug 623712 ***