Bug 705082
| Summary: | qemu-kvm takes lots of CPU resources due to _spin_lock_irqsave on a 64 cpu machine | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Moran Goldboim <mgoldboi> |
| Component: | kernel | Assignee: | Karen Noel <knoel> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.1 | CC: | juzhang, knoel, kzhang, mathieu-acct, tburke, toracat, wnefal+redhatbugzilla, xfu, xnaveira, ykaul, zliu |
| Target Milestone: | rc | Keywords: | TestBlocker |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-2.6.32-153.el6 | Doc Type: | Bug Fix |
| Doc Text: | Cause: The current posix-timers code serializes timer lookups on a single spin lock (idr_lock). Effect: KVM/QEMU workloads hit this single spinlock very hard; on a 48-core machine, and even on a 16-CPU (2x4x2) machine, a single test can show 98% of CPU time spent in ticket_spin_lock via lock_timer. Fix: The lookup is switched to RCU, which is straightforward because the IDR code is already RCU-ready. Result: Benchmark on a 2x4x2 machine with 16 processes calling timer_gettime(): before, real 1m18.669s; after, real 0m3.296s. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-12-06 13:31:41 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Upstream patch: https://lkml.org/lkml/2011/4/5/235

Command line:
qemu 81662 1 32 06:17 ? 01:43:03 /usr/libexec/qemu-kvm -S -M rhel6.0.0 -cpu Conroe -enable-kvm -m 256 -smp 1,sockets=1,cores=1,threads=1 -name R6A-113 -uuid dde328fe-d6ae-437a-b570-c50acba21953 -smbios type=1,manufacturer=Red Hat,product=RHEL,version=6Server-6.1.0.1.el6,serial=4C4C4544-004A-5A10-805A-C2C04F514E31_,uuid=dde328fe-d6ae-437a-b570-c50acba21953 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/R6A-113.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2011-05-16T10:17:25 -boot c -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive file=/rhev/data-center/06b29e0f-8c22-431e-a6f1-75989a582c52/2b27b725-9063-4a27-899a-3ec8d02c1ade/images/3e858fde-f85b-4b0e-8f60-3e523e7ffb52/f5fdc830-9e42-49e2-8680-73fbd92097c9,if=none,id=drive-virtio-disk0,format=qcow2,serial=0e-8f60-3e523e7ffb52,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=143 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1b:4a:16:9f:7a,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/R6A-113.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -usb -device usb-tablet,id=input0 -vnc 0:118,password -k en-us -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Cause: The current posix-timers code serializes timer lookups on a single spin lock (idr_lock).
Effect: KVM/QEMU workloads hit this single spinlock very hard; on a 48-core machine, and even on a 16-CPU (2x4x2) machine, a single test can show 98% of CPU time spent in ticket_spin_lock via lock_timer.
Fix: The lookup is switched to RCU, which is straightforward because the IDR code is already RCU-ready.
Result: Benchmark on a 2x4x2 machine with 16 processes calling timer_gettime():
Before: real 1m18.669s
After: real 0m3.296s
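The upstream change linked above replaces the global idr_lock spinlock in the timer lookup path with an RCU read-side lookup plus the existing per-timer lock. The following is a minimal sketch of that idea only; the function name and IDR variable are illustrative and not copied from the upstream patch:

/* Sketch: timer-id lookup with RCU instead of a global spinlock.
 * lock_timer_sketch() and posix_timers_id_sketch are illustrative names. */
static struct k_itimer *lock_timer_sketch(timer_t timer_id, unsigned long *flags)
{
        struct k_itimer *timr;

        rcu_read_lock();                /* replaces spin_lock_irqsave(&idr_lock, ...) */
        timr = idr_find(&posix_timers_id_sketch, (int)timer_id);
        if (timr) {
                /* Only the per-timer lock is taken; there is no global lock. */
                spin_lock_irqsave(&timr->it_lock, *flags);
                if (timr->it_signal == current->signal) {
                        rcu_read_unlock();
                        return timr;
                }
                spin_unlock_irqrestore(&timr->it_lock, *flags);
        }
        rcu_read_unlock();
        return NULL;                    /* caller treats NULL as an invalid timer id */
}

For such a lookup to be safe, freeing of the timer structure has to be deferred past the RCU read-side critical sections (e.g. via call_rcu), which the upstream patch also takes care of.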
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

(In reply to comment #4)
> Fix: Switching to RCU is quite easy, IDR being already RCU ready.
> Result: Benchmark on a 2x4x2 machine, 16 processes calling timer_gettime().
> Before : real 1m18.669s
> After : real 0m3.296s

This doesn't seem to belong in a technical note.

Ran 6 VMs on a host with 64 CPUs and tested several times; _spin_lock_irqsave stays at around 0.27%.
Hi, Avi/Moran,
Would you please tell me which part I missed?
Host kernel version
#uname -r
2.6.32-156.el6.x86_64
Kvm version
qemu-kvm-0.12.1.2-2.163.el6.x86_64
#perf record -a -g -f
#perf report -g
- 0.27% qemu-kvm [kernel.kallsyms] [k] _spin_lock_irqsave
- _spin_lock_irqsave
+ 15.82% add_wait_queue
+ 15.60% remove_wait_queue
+ 13.50% lock_timer_base
+ 8.54% lock_hrtimer_base
................
+ 6.01% eventfd_poll
- 3.50% lock_timer
+ 59.73% sys_timer_gettime
+ 40.27% sys_timer_settime
Command lines:
#/usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -smp 2,sockets=2,cores=1,threads=1 -m 4G -boot c -drive file=/root/zhangjunyi/RHEL-Server-6.1-64-virtio.qcow2,if=none,id=drive-ide0-0-0,format=qcow2,boot=on,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -net none -vnc :9 -qmp tcp:0:4444,server,nowait -monitor stdio -rtc base=localtime -no-acpi
#/usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -smp 2,sockets=2,cores=1,threads=1 -m 4G -boot c -drive file=/root/zhangjunyi/RHEL-Server-6.1-64-virtio.qcow2,if=none,id=drive-ide0-0-0,format=qcow2,boot=on,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -net none -vnc :9 -qmp tcp:0:4444,server,nowait -monitor stdio -rtc base=localtime -no-acpi
# /usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -smp 1,sockets=1,cores=1,threads=1 -m 2G -uuid 9e6f04cf-2ad7-45aa-9333-2d2ee26570c6 -boot c -drive file=/root/zhang/RHEL-Server-6.1-64-virtio.qcow2,if=none,id=drive-ide0-0-0,format=qcow2,boot=on,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -net none -vnc :11 -qmp tcp:0:5556,server,nowait -monitor stdio -rtc base=localtime -no-acpi
..........................
Write a test program that executes timer_gettime() in a loop. Run multiple instances and observe high spinlock contention.
With the patch, the same number of instances does not cause contention.
Example program:
#include <signal.h>
#include <stdio.h>      /* perror() */
#include <time.h>
#include <stdlib.h>

int main(int ac, char **av)
{
        timer_t timerid;
        struct itimerspec its;
        int r;

        /* Create one POSIX timer; its id is looked up through lock_timer()
           on every subsequent timer syscall. */
        r = timer_create(CLOCK_REALTIME, NULL, &timerid);
        if (r == -1) {
                perror("timer_create");
                exit(1);
        }

        /* Hammer timer_gettime() forever to stress the timer lookup path. */
        while (1) {
                r = timer_gettime(timerid, &its);
                if (r == -1) {
                        perror("timer_gettime");
                        exit(1);
                }
        }
}
Note: will probably need a large number of cpus and instances.
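To start many instances at once, a small forking launcher along these lines can be used. This is a sketch only: it assumes the program above has been compiled as ./test (hypothetical path), and the default instance count of 400 is arbitrary.

/* launcher.c - sketch: fork N copies of ./test so contention on the
 * timer lookup path can be observed with perf on the host. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
        int i;
        int n = (argc > 1) ? atoi(argv[1]) : 400;   /* number of instances */

        for (i = 0; i < n; i++) {
                pid_t pid = fork();
                if (pid == 0) {
                        execl("./test", "test", (char *)NULL);
                        perror("execl");            /* only reached if exec fails */
                        _exit(1);
                } else if (pid < 0) {
                        perror("fork");
                        exit(1);
                }
        }
        pause();        /* keep the parent alive while perf samples the system */
        return 0;
}

While the instances are running, sample the host with perf top or perf record -a -g / perf report -g, as in the comments below.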
> [test program from the previous comment quoted in full]
> Note: will probably need a large number of cpus and instances.
#gcc -o test -lrt test.c
Launched 400 instances on an AMD host with 48 CPUs; tried 4 times.
_spin_lock_irqsave is around 4.7%.
#ps -aux | grep test |wc -l
401
Hi, Avi,
Would you please tell me which part I missed?
Best Regards,
Junyi
Launched 500 instances on an Intel host with 64 CPUs. _spin_lock_irqsave is around 6.5%.

(In reply to comment #18)
> Hi, Avi
>
> Would you please tell me which part I missed?

I don't think you missed anything. It probably happens on specific hardware combinations. Try the reporter's machine please.

Many thanks to mgoldboi for lending me his host. Reproduced with qemu-kvm-0.12.1.2-2.164.el6.x86_64; marked qa_ack+.

#gcc -o test -lrt test.c
Launched 500 instances on an Intel (L7555) host with 64 CPUs.
Results: _spin_lock_irqsave is around 96.4%.
Please note:
1. This issue can only be reproduced on specific hardware. Intel(R) Xeon(R) CPU L7555 @ 1.87GHz can reproduce it.
model name : Intel(R) Xeon(R) CPU L7555 @ 1.87GHz
2. I also tried the X7550, Intel(R) W3520 and AMD 6172; these platforms cannot reproduce this issue.

Patch(es) available on kernel-2.6.32-153.el6

Thanks to mgoldboi for lending the Intel (L7555) host to me.
Reproduced on kernel-2.6.32-128.el6.x86_64 after the 3 steps below.
Actual result: about 47.7% _spin_lock_irqsave [kernel.kallsyms]
Verified on kernel-2.6.32-202.el6.x86_64 after the 3 steps below.
Actual result: 3.9% _spin_lock_irqsave [kernel.kallsyms]
Steps to reproduce:
1. Boot 500 guests
# ps aux|grep qemu|wc -l
500
2. gcc -o test -lrt test.c
./test
3. perf top
Based on the above testing results, this bug has been fixed.

According to comment 30, setting this issue as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html
Description of problem:
when running vms on a 64 cpu machine each vm takes ~35% cpu when idle, consuming all cpu resources.

running perf report gives the following output:

# Overhead  Command   Shared Object        Symbol
# ........  ........  ...................  .........................................
#
    78.70%  qemu-kvm  [kernel.kallsyms]    [k] _spin_lock_irqsave
            |
            --- _spin_lock_irqsave
               |
               |--99.30%-- lock_timer
               |          |
               |          |--64.52%-- sys_timer_gettime
               |          |          system_call_fastpath
               |          |          __timer_gettime_new
               |          |
               |           --35.48%-- sys_timer_settime
               |                      system_call_fastpath
               |                      __timer_settime_new
                --0.70%-- [...]

     1.69%  qemu-kvm  [kvm_intel]          [k] vmx_vcpu_run
            |
            --- vmx_vcpu_run
               |
               |--97.48%-- ftrace_raw_output_kvm_inj_exception
               |          ftrace_raw_event_kvm_mmio
               |          vfs_ioctl
               |          do_vfs_ioctl
               |          sys_ioctl
               |          system_call_fastpath
               |          __GI___ioctl
               |
               |--2.31%-- __GI___ioctl
                --0.20%-- [...]

     0.66%  qemu-kvm  [kernel.kallsyms]    [k] fget_light
            |
            --- fget_light
               |
               |--76.00%-- do_select
               |          core_sys_select
               |          sys_select
               |          system_call_fastpath
               |          __GI_select
               |          |
               |           --100.00%-- (nil)
               |
               |--10.80%-- sys_read
               |          system_call_fastpath
               |          __read
               |
               |--5.16%-- sys_write
               |          system_call_fastpath
               |          0x3f35a0e50d

Version-Release number of selected component (if applicable):
2.6.32-131.0.1.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info: