Red Hat Bugzilla – Full Text Bug Listing
|Summary:||KVM multi-VCPU clock skew? Massive stalls without I/O or CPU usage|
|Product:||[Fedora] Fedora||Reporter:||Warren Togami <wtogami>|
|Component:||qemu||Assignee:||Justin M. Forbes <jforbes>|
|Status:||CLOSED WONTFIX||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Version:||13||CC:||amit.shah, berrange, dwmw2, ehabkost, gcosta, itamar, jaswinder, jforbes, jlayton, knoel, markmc, mjw, ondrejj, quintela, riel, scottt.tw, tburke, virt-maint, virt-maint, walters|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2011-06-27 11:02:13 EDT||Type:||---|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Bug Depends On:|
|Bug Blocks:||562808, 580954|
Description Warren Togami 2010-02-25 13:16:23 EST
kernel-2.6.32-17.el6.x86_64 KVM guest running on F-12 host.

rhel6-nightly.lab.bos.redhat.com is running RHEL6. It runs livecd-creator to generate RHEL6 nightly LiveCDs. When running with multiple VCPUs, it very quickly starts to misbehave. For example:

Installing: coreutils-libs ##################### [243/987]
Installing: gzip ##################### [244/987]
Installing: cracklib ##################### [245/987]
Installing: cracklib-dicts ##################### [246/987]
Installing: coreutils ##################### [247/987]

While installing packages it slows to a near standstill, sometimes inching along with another package after 10-20 minutes. During the stall other ssh sessions fail to respond, but existing TCP connections do not break and pings respond normally. vmstat indicates no I/O or swapping, and the host reports very little CPU usage by this guest. After several hours the task completes, but responsiveness remains erratic thereafter.

For these reasons riel suspects this might be another multi-VCPU clock skew issue, similar to the one we experienced with RHEL-5.4 kvm guests.

Workarounds:
1) Limit the guest to 1 VCPU.
2) Downgrade to F-12's kernel.
Comment 2 RHEL Product and Program Management 2010-02-25 13:49:49 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.
Comment 3 Glauber Costa 2010-02-25 15:47:21 EST
The first thing to do to confirm that is to boot your kernel with the option "no-kvmclock".
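After rebooting the guest, it can help to verify that the option took effect and to see which clocksource the kernel actually picked. The following is a minimal sketch, not something posted in this bug: it assumes the standard Linux sysfs clocksource files and /proc/cmdline, and the function names are made up for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Standard sysfs directory for the first clocksource device on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base: Path = CLOCKSOURCE_DIR) -> tuple[str, list[str]]:
    """Return (current clocksource, list of available clocksources)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def kvmclock_disabled(cmdline: str) -> bool:
    """True if the kernel command line contains the no-kvmclock option."""
    return "no-kvmclock" in cmdline.split()


def report() -> str:
    """Build a short report for the running guest (reads /proc and /sys)."""
    cmdline = Path("/proc/cmdline").read_text()
    current, available = read_clocksources()
    return (
        f"no-kvmclock on command line: {kvmclock_disabled(cmdline)}\n"
        f"current clocksource: {current}\n"
        f"available clocksources: {', '.join(available)}"
    )
```

Calling print(report()) inside the guest shows whether no-kvmclock is on the command line and which clocksource is in use. With the option in effect, kvm-clock should not appear as the current clocksource.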
Comment 4 Warren Togami 2010-03-05 16:22:44 EST
kernel-22.214.171.124-174.2.22.fc12.x86_64
It seems that this F-12 kernel is unaffected.

kernel-2.6.33-1.fc13.x86_64
This F-13 kernel seems to exhibit identical bad behavior as described in Comment #1 by the EL-6 kernel.

no-kvmclock seems to allow this F-13 kernel to survive a bit more, but it inevitably fails with:

Clocksource tsc unstable (delta = 116429542 ns)
Switching to clocksource acpi_pm
INFO: task kswapd0:50 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kswapd0 D 0000000000000001 2560 50 2 0x00000000
ffff880037743720 0000000000000046 0000000000000000 ffffffff81071258
ffff8800377436d0 0000000000000001 ffff88007d358000 ffff880037743fd8
ffff880037743fd8 000000000000fb80 00000000001d5e80 ffff88007d3583f8
Call Trace:
[<ffffffff81071258>] ? cpu_clock+0x43/0x5e
[<ffffffffa0146967>] ? nfs_wait_bit_uninterruptible+0x0/0x12 [nfs]
[<ffffffff814766ce>] io_schedule+0x73/0xb5
[<ffffffffa0146975>] nfs_wait_bit_uninterruptible+0xe/0x12 [nfs]
[<ffffffff81476c78>] __wait_on_bit+0x48/0x7b
[<ffffffff8107120a>] ? sched_clock_cpu+0xc3/0xce
[<ffffffff81476d19>] out_of_line_wait_on_bit+0x6e/0x79
[<ffffffffa0146967>] ? nfs_wait_bit_uninterruptible+0x0/0x12 [nfs]
[<ffffffff8106ba83>] ? wake_bit_function+0x0/0x33
[<ffffffffa0146965>] nfs_wait_on_request+0x28/0x2a [nfs]
[<ffffffffa014b130>] nfs_sync_mapping_wait+0xfa/0x22d [nfs]
[<ffffffffa014b2fd>] nfs_wb_page+0x9a/0xca [nfs]
[<ffffffffa013d1cc>] nfs_release_page+0x41/0x5b [nfs]
[<ffffffff810d9b33>] try_to_release_page+0x37/0x40
[<ffffffff810e62b1>] shrink_page_list+0x2e2/0x48f
[<ffffffff8107baae>] ? lock_release_holdtime+0x34/0xe3
[<ffffffff81478d66>] ? _raw_spin_unlock_irq+0x30/0x3c
[<ffffffff810e67f2>] shrink_inactive_list+0x394/0x659
[<ffffffff810710e1>] ? sched_clock_local+0x1c/0x82
[<ffffffff8122ce59>] ? __up_read+0x83/0x8c
[<ffffffff8107120a>] ? sched_clock_cpu+0xc3/0xce
[<ffffffff8107ba78>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff81071258>] ? cpu_clock+0x43/0x5e
[<ffffffff8107baae>] ? lock_release_holdtime+0x34/0xe3
[<ffffffff810e6e2a>] shrink_zone+0x373/0x424
[<ffffffff810e7c55>] balance_pgdat+0x38e/0x5e8
[<ffffffff810e527e>] ? isolate_pages_global+0x0/0x203
[<ffffffff810e810a>] kswapd+0x25b/0x279
[<ffffffff8106ba4a>] ? autoremove_wake_function+0x0/0x39
[<ffffffff810e7eaf>] ? kswapd+0x0/0x279
[<ffffffff8106b5a8>] kthread+0x9a/0xa2
[<ffffffff8107d1c8>] ? trace_hardirqs_on_caller+0x111/0x135
[<ffffffff8100aae4>] kernel_thread_helper+0x4/0x10
[<ffffffff814790d0>] ? restore_args+0x0/0x30
[<ffffffff8106b50e>] ? kthread+0x0/0xa2
[<ffffffff8100aae0>] ? kernel_thread_helper+0x0/0x10
no locks held by kswapd0/50.
INFO: task livecd-creator:31488 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
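For anyone trying to reproduce this, the two signals in the log above (the "Clocksource ... unstable" line and the hung-task reports) can be pulled out of saved dmesg output automatically. This is only a rough sketch: the regular expressions are written against the exact message wording quoted in this comment, and the function name scan_log is hypothetical. Other kernel versions may word these messages differently, and task names containing spaces are not handled.

```python
import re

# Patterns modelled on the messages quoted in this comment.
UNSTABLE_RE = re.compile(r"Clocksource (\S+) unstable \(delta = (-?\d+) ns\)")
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def scan_log(text):
    """Return (unstable, hung) findings from kernel log text.

    unstable: list of (clocksource name, delta in milliseconds)
    hung:     list of (task name, pid, timeout in seconds)
    """
    unstable = [
        (m.group(1), int(m.group(2)) / 1e6) for m in UNSTABLE_RE.finditer(text)
    ]
    hung = [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in HUNG_RE.finditer(text)
    ]
    return unstable, hung
```

Fed the log above, this reports the tsc clocksource with a delta of roughly 116 ms, plus hung-task entries for kswapd0 (pid 50) and livecd-creator (pid 31488).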
Comment 5 Warren Togami 2010-03-05 16:25:16 EST
riel says this info might be important. /proc/cpuinfo of the F-12 host:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5140 @ 2.33GHz
stepping : 6
cpu MHz : 1999.998
cache size : 4096 KB
physical id : 3
siblings : 2
core id : 1
cpu cores : 2
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 4666.69
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
Comment 6 Warren Togami 2010-03-05 16:37:09 EST
<riel> I see constant_tsc but not nonstop_tsc
<riel> that is a slightly problematic CPU
<riel> the TSC is stopped whenever a CPU goes into a power saving mode
<warren> riel: could this CPU be exposing this kvm bug, and it is generally not a problem elsewhere?
<riel> well, the kernel may be able to compensate for it - I don't know
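The check riel did by eye can be repeated on other hosts with a few lines of Python. This is a sketch under the assumption that /proc/cpuinfo has the same "flags : ..." layout as in comment 5; the helper names cpu_flags and tsc_report are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def tsc_report(flags):
    """Report which TSC-related flags are present in a set of CPU flags."""
    return {
        "constant_tsc": "constant_tsc" in flags,
        "nonstop_tsc": "nonstop_tsc" in flags,
    }
```

On a host, something like tsc_report(cpu_flags(open("/proc/cpuinfo").read())) gives the answer directly. For the Xeon 5140 above it would show constant_tsc present and nonstop_tsc absent, matching what riel observed.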
Comment 8 Glauber Costa 2010-06-22 15:44:44 EDT
Can you retest this with the latest kernel?
Comment 9 Colin Walters 2010-06-29 09:25:25 EDT
(In reply to comment #8)
> Can you retest this with latest kernel ?

I can't at the moment, sorry, and I plan to have the machine stop using NFS. Feel free to deprioritize or close this bug as needed.
Comment 10 Bug Zapper 2011-06-02 12:24:54 EDT
This message is a reminder that Fedora 13 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 13 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 11 Bug Zapper 2011-06-27 11:02:13 EDT
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.