Bug 999296
Summary: | Win2012.64 guest hang on Intel(R) Xeon(R) CPU 5130 host | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | xhan
Component: | qemu-kvm | Assignee: | Yvugenfi <yvugenfi>
Status: | CLOSED WORKSFORME | QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 7.0 | CC: | acathrow, bcao, bsarathy, dfleytma, drjones, juzhang, michen, mkenneth, qzhang, virt-maint, xhan, yvugenfi
Target Milestone: | rc | |
Target Release: | 7.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-01-15 13:22:24 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1000882 | |
Attachments: | | |
Description
xhan
2013-08-21 06:12:24 UTC
Is the guest still running? i.e. are the vcpu threads still consuming cycles (check top), and/or is the rip changing if you check 'info registers' a few times? If so, then it might help to get a Windows DMP.

top command output:
-----------------------------------------------------------------------
Tasks: 144 total, 2 running, 142 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.6%us, 9.4%sy, 0.0%ni, 76.8%id, 0.8%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 16329084k total, 16119504k used, 209580k free, 48464k buffers
Swap: 58720240k total, 19308k used, 58700932k free, 7101752k cached

  PID USER PR NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
32619 root 20  0 8647m 8.1g 4948 R 256.4 51.8 42:23.31 qemu-kvm
-----------------------------------------------------------------------

rip only changes from "RIP=fffff8027f21d5e2" to "RIP=fffff801433715dc" in info registers output.

(qemu) info registers
RAX=000000000000000c RBX=000000000000000c RCX=000000000000000c RDX=0000000000000070
RSI=000000000000000c RDI=fffff80144841698 RBP=000000000000000c RSP=fffff801448415d8
R8 =fffff80144841698 R9 =0000000000000001 R10=0000000000000024 R11=fffffa8006e79f10
R12=0000000000000300 R13=fffffa8006e78300 R14=0000000000000000 R15=fffff8014339e490
RIP=fffff801433715dc RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
CS =0010 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00000000
DS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
FS =0053 00000000745f0000 00003c00 0040f300 DPL=3 DS [-WA]
GS =002b fffff80142f0e000 ffffffff 00c0f300 DPL=3 DS [-WA]
LDT=0000 0000000000000000 ffffffff 00000000
TR =0040 fffff80144835080 00000067 00008b00 DPL=0 TSS64-busy
GDT= fffff80144834000 0000007f
IDT= fffff80144834080 00000fff
CR0=80050031 CR2=000007ff482ea0c0 CR3=0000000000187000 CR4=000006f8
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
FCW=027f FSW=3800 [ST=7] FTW=80 MXCSR=00001f80
FPR0=9fc0000000000000 4008 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=ea9835d47887d91d97b73beca6df947e XMM01=0000000000000011fffffa8006fffe10
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
XMM08=00000000000000000000000000000000 XMM09=00000000000000000000000000000000
XMM10=00000000000000000000000000000000 XMM11=00000000000000000000000000000000
XMM12=00000000000000000000000000000000 XMM13=00000000000000000000000000000000
XMM14=00000000000000000000000000000000 XMM15=00000000000000000000000000000000

When the guest turns into hang status, occasionally qemu-kvm would output error messages
virtio_ioport_write: unexpected address 0x13 value 0x0.

1. guest works well on Sandybridge host with -cpu SandyBridge, -cpu Westmere, -cpu Conroe cpu models

host cpuinfo:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping : 9
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips : 6784.34
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual

2. cannot reproduce with Win7.64

3. top after guest hang

Tasks: 144 total, 2 running, 142 sleeping, 0 stopped, 0 zombie
Cpu0 : 42.9%us, 57.1%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 20.0%us, 80.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 16.7%us, 50.0%sy, 0.0%ni, 33.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16329084k total, 11878968k used, 4450116k free, 53088k buffers
Swap: 58720240k total, 19308k used, 58700932k free, 7093656k cached

PID USER PR NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ WCHAN COMMAND
499 root 20  0 4525m 4.0g 4920 R 246.3 25.9 11:17.89 -     qemu-kvm

4. kvm_stat

kvm statistics

efer_reload                 0      0
exits                 4499473   6374
fpu_reload             194411     86
halt_exits              31250     25
halt_wakeup             30558     27
host_state_reload     1802632   6057
hypercalls                  0      0
insn_emulation         526907    131
insn_emulation_fail        65      0
invlpg                 138937      0
io_exits              1690371   6001
irq_exits              308306     50
irq_injections         110721     50
irq_window              11007     25
largepages                 29      0
mmio_exits              61049     31
mmu_cache_miss          12140      0
mmu_flooded              1500      0
mmu_pde_zapped          18627      0
mmu_pte_updated         12350      0
mmu_pte_write           85467      0
mmu_recycled                0      0
mmu_shadow_zapped        8560      0
mmu_unsync                853      0
nmi_injections              0      0
nmi_window                  0      0
pf_fixed              1254739      0
pf_guest               219912      0
remote_tlb_flush       151012      0
request_irq                 0      0
signal_exits                1      0
tlb_flush              296338      0

5. cmd:

/usr/libexec/qemu-kvm -name vm1 -nodefaults -monitor stdio -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x4 \
-drive file=/win2012-64-virtio.raw,index=0,if=none,id=drive-ide0-0-0,media=disk,cache=none,snapshot=off,format=raw,aio=native \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0 \
-device e1000,netdev=idLd6elX,mac=9a:3a:3b:3c:3d:3e,id=id2MXw9O \
-netdev tap,id=idLd6elX,script=/scripts/qemu-ifup-switch \
-m 4G -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 -cpu Conroe -M rhel6.5.0 -vnc :0 -vga std \
-rtc base=localtime,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off -enable-kvm

6. guest: Win2012.64

7. result: guest hang, cpu 100% used, cannot ping guest from host

8. host that hangs

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
stepping : 11
cpu MHz : 249.999
cache size : 4096 KB
physical id : 3
siblings : 2
core id : 1
cpu cores : 2
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm dts tpr_shadow vnmi flexpriority
bogomips : 3990.05
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual

[root@intel-5130-16-1 windows]# free -m
                   total  used  free shared buffers cached
Mem:               15946 11652  4293      0      53   6980
-/+ buffers/cache:        4619 11327
Swap:              57343    18 57325

Created attachment 789044
register info
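
The check requested in this report (is the rip changing across a few 'info registers' samples?) can be scripted once the dumps are saved as text. The following is a minimal Python sketch, not part of the original report: `extract_rip` and `distinct_rips` are hypothetical helper names, the sample strings are shortened to the RIP field only, and the third sample is an invented repeat added for illustration. Only the two RIP values come from this report.

```python
import re

# Matches the RIP field of an x86_64 'info registers' dump, e.g. "RIP=fffff801433715dc".
RIP_RE = re.compile(r"\bRIP=([0-9a-fA-F]{16})\b")


def extract_rip(dump):
    """Return the RIP value found in one register dump as an integer."""
    match = RIP_RE.search(dump)
    if match is None:
        raise ValueError("no RIP field found in dump")
    return int(match.group(1), 16)


def distinct_rips(dumps):
    """Return the distinct RIP values across several dumps, in first-seen order."""
    seen = []
    for dump in dumps:
        rip = extract_rip(dump)
        if rip not in seen:
            seen.append(rip)
    return seen


# The first two values are the ones quoted in this report; the third
# sample is a hypothetical repeat, added only to show de-duplication.
samples = [
    "RIP=fffff8027f21d5e2",
    "RIP=fffff801433715dc",
    "RIP=fffff801433715dc",
]

rips = distinct_rips(samples)
print([hex(rip) for rip in rips])
```

If many samples yield only one or two distinct values, the sampled vCPU is probably looping in a small code region rather than making progress. Note that 'info registers' reports only the CPU currently selected in the monitor, so each vCPU may need to be sampled separately.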
(In reply to xhan from comment #5)
> kvm statistics
>
> efer_reload 0 0
> exits 4499473 6374
> fpu_reload 194411 86
> halt_exits 31250 25
> halt_wakeup 30558 27
> host_state_reload 1802632 6057
> hypercalls 0 0
> insn_emulation 526907 131
> insn_emulation_fail 65 0
> invlpg 138937 0
> io_exits 1690371 6001
> irq_exits 308306 50
> irq_injections 110721 50
> irq_window 11007 25
> largepages 29 0
> mmio_exits 61049 31
> mmu_cache_miss 12140 0
> mmu_flooded 1500 0
> mmu_pde_zapped 18627 0
> mmu_pte_updated 12350 0
> mmu_pte_write 85467 0
> mmu_recycled 0 0
> mmu_shadow_zapped 8560 0
> mmu_unsync 853 0
> nmi_injections 0 0
> nmi_window 0 0
> pf_fixed 1254739 0
> pf_guest 219912 0
> remote_tlb_flush 151012 0
> request_irq 0 0
> signal_exits 1 0
> tlb_flush 296338 0
>

Are any of these counts climbing quickly? Such as the interrupts? If so, then the symptoms are quite similar to a problem we had with the e1000 model and win2012, but this time there's no e1000. Or wait, is there? The cmdline in comment 0 doesn't have one configured, but the command line in comment 5 does.

kvm statistics

efer_reload                 0      0
exits                12472328   8336
fpu_reload             400786    111
halt_exits              32808      0
halt_wakeup             33796      0
host_state_reload     7120683   6389
hypercalls                  0      0
insn_emulation         862899    241
insn_emulation_fail       137      0
invlpg                 176507     31
io_exits              6947633   6372
irq_exits             1017898   1397
irq_injections         255049    103
irq_window              69279     45
largepages                 75      0
mmio_exits              49857      0
mmu_cache_miss          15704      0
mmu_flooded              1686      0
mmu_pde_zapped          24986      0
mmu_pte_updated         66547      0
mmu_pte_write          102439      0
mmu_recycled                0      0
mmu_shadow_zapped        9908      0
mmu_unsync               1153      1
nmi_injections              0      0
nmi_window                  0      0
pf_fixed              2367442     41
pf_guest               292476      8
remote_tlb_flush       320097     25
request_irq                 0      0
signal_exits               95      0
tlb_flush              426751     52

1) io_exits, irq_exits climb quickly.
2) comment 5 uses e1000 to exclude the virtio problems

(In reply to xhan from comment #4)
> When the guest turns into hang status, occasionally qemu-kvm would output
> error messages
> virtio_ioport_write: unexpected address 0x13 value 0x0.

Tried the guest with the latest prewhql driver for virtio: the error message no longer appears. The hang problem still exists.

Please remove the e1000 from the config. If the problem is hit again, then check kvm stats again for the io and irq_exits. Thanks.

Hmm, it's looking like win2012's interrupt handling is sensitive to running as a KVM guest in general - not necessarily just with the e1000 driver.

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

Is this bug still reproducible?

Thanks,
Yan.

Started up the win2012 guest with the command in the description. The guest works well. This bug could not be reproduced.

tested packages version:
qemu-kvm-1.5.3-10.el7.x86_64
kernel-3.10.0-35.el7.x86_64

Closing according to comment #25
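
For reference, the question asked in this thread about which kvm_stat counters are climbing quickly can be answered mechanically from a pasted snapshot. The following is a minimal Python sketch, not something used in the original investigation. It assumes each row has the form "name total current" and that the last column is the change over the most recent sampling interval; `parse_kvm_stat` and `busiest` are hypothetical helper names. The rows are a subset of the second snapshot posted above.

```python
def parse_kvm_stat(text):
    """Parse 'name total current' rows into {name: (total, current)}."""
    stats = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip the "kvm statistics" header and blank lines
        name, total, current = fields
        stats[name] = (int(total), int(current))
    return stats


def busiest(stats, top=4):
    """Return up to `top` (name, current) pairs with the largest non-zero current value."""
    ranked = sorted(stats.items(), key=lambda item: item[1][1], reverse=True)
    return [(name, current) for name, (_total, current) in ranked[:top] if current > 0]


# Subset of the second kvm_stat snapshot quoted in this thread.
snapshot = """
kvm statistics
exits 12472328 8336
halt_exits 32808 0
host_state_reload 7120683 6389
insn_emulation 862899 241
io_exits 6947633 6372
irq_exits 1017898 1397
irq_injections 255049 103
mmio_exits 49857 0
"""

print(busiest(parse_kvm_stat(snapshot)))
# [('exits', 8336), ('host_state_reload', 6389), ('io_exits', 6372), ('irq_exits', 1397)]
```

The ranking agrees with the observation in the thread that io_exits and irq_exits were climbing quickly, and it also shows that nearly all exits in that interval were I/O exits.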