Bug 607510
Summary: | Windows7 guest cannot resume after suspended to disk after plenty of pause:resume iterations - e1000 | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Cao, Chen <kcao>
Component: | qemu-kvm | Assignee: | jason wang <jasowang>
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | medium | Docs Contact: |
Priority: | high | |
Version: | 6.0 | CC: | areis, bcao, bsarathy, flang, gcosta, juzhang, llim, michen, mkenneth, qzhang, rhod, tburke, virt-maint, yhalperi, yvugenfi
Target Milestone: | beta | |
Target Release: | 6.2 | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | qemu-kvm-0.12.1.2-2.306.el6 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 847241 884998 | Environment: |
Last Closed: | 2013-02-21 07:29:58 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 580953, 720669, 753024, 761491, 847241, 884998 | |
Attachments: | | |
Description
Cao, Chen
2010-06-24 09:47:31 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

Is it reproducible without spice?

(In reply to comment #3)
> Is it reproducible without spice?

I can only reproduce this bug without spice; the Windows7 guest can suspend/resume with the -spice option.

I have also tried it on:
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.90.el6.x86_64
# uname -r
2.6.32-39.el6.x86_64

This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release. It has been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **

Is this still reproducible?

(In reply to comment #7)
> Is this still reproducible?

Yes, it can still be reproduced, about once out of 6 times,
with command:

/usr/libexec/qemu-kvm -name 'vm1' \
-chardev socket,id=human_monitor_eqFd,path=/tmp/monitor-humanmonitor1-20101124-114524-dMO8,server,nowait \
-mon chardev=human_monitor_eqFd,mode=readline \
-chardev socket,id=serial_i2zu,path=/tmp/serial-20101124-114524-dMO8,server,nowait \
-device isa-serial,chardev=serial_i2zu \
-drive file='./win7-32.qcow2',index=0,if=none,id=drive-ide0-0-0,media=disk,cache=writethrough,format=qcow2,aio=native \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
-device rtl8139,mac=9a:3e:52:90:38:8e,netdev=idtaTtKT,id=ndev00idtaTtKT,bus=pci.0,addr=0x3 \
-netdev tap,id=idtaTtKT,ifname='t0-114524-dMO8',script='./qemu-ifup-switch',downscript='no' -m 2048 -smp 1 \
-cpu cpu64-rhel6,+sse2,+x2apic \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=none \
-M rhel6.0.0 -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm

and,
# uname -r
2.6.32-71.9.1.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.113.el6_0.4.x86_64

(qemu) info pci
  Bus 0, device 0, function 0:
    Host bridge: PCI device 8086:1237
      id ""
  Bus 0, device 1, function 0:
    ISA bridge: PCI device 8086:7000
      id ""
  Bus 0, device 1, function 1:
    IDE controller: PCI device 8086:7010
      BAR4: I/O at 0xc000 [0xc00f].
      id ""
  Bus 0, device 1, function 2:
    USB controller: PCI device 8086:7020
      IRQ 11.
      BAR4: I/O at 0xc020 [0xc03f].
      id ""
  Bus 0, device 1, function 3:
    Bridge: PCI device 8086:7113
      IRQ 9.
      id ""
  Bus 0, device 2, function 0:
    VGA controller: PCI device 1013:00b8
      BAR0: 32 bit prefetchable memory at 0xf0000000 [0xf1ffffff].
      BAR1: 32 bit memory at 0xf2000000 [0xf2000fff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
      id ""
  Bus 0, device 3, function 0:
    Ethernet controller: PCI device 10ec:8139
      IRQ 10.
      BAR0: I/O at 0xc100 [0xc1ff].
      BAR1: 32 bit memory at 0xf2020000 [0xf20200ff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
      id "ndev00idtaTtKT"

Cannot reproduce. At what point does it hang? What do "info cpus" and "info registers" in the qemu monitor show when it is stuck?

(In reply to comment #9)
> Cannot reproduce. At what point does it hang? What do "info cpus" and "info registers"
> in the qemu monitor show when it is stuck?

1. Reproduced once out of 10 on the Intel host, and got a very high reproduction rate on the AMD host specified below.

2. The Windows7 guest is stuck when preparing the login (unlock) screen. The screenshot is attached.
on
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.113.el6_0.5.x86_64
# uname -r
2.6.32-71.13.1.el6.x86_64

3. info got on the intel host:
---
(qemu) info cpus
* CPU #0: pc=0x0000000082a1fcac thread_id=9440
(qemu) info registers
EAX=00000009 EBX=8078ad6c ECX=82732c09 EDX=000000a2
ESI=854bfd38 EDI=854b1c80 EBP=8078ace8 ESP=8078ace0
EIP=82a1fcac EFL=00010046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0023 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0023 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
FS =0030 82732c00 00003748 00409300 DPL=0 DS [-WA]
GS =0000 00000000 ffffffff 00000000
LDT=0000 00000000 ffffffff 00000000
TR =0028 801da000 000020ab 00008b00 DPL=0 TSS32-busy
GDT= 80b95000 000003ff
IDT= 80b95400 000007ff
CR0=80010031 CR2=00540000 CR3=00185000 CR4=000006f8
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
FCW=027f FSW=0000 [ST=0] FTW=00 MXCSR=00000000
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000

info got on the amd host:
---
(qemu) info cpus
* CPU #0: pc=0x000000008260dcac thread_id=7402
(qemu) info registers
EAX=00000009 EBX=8078ad6c ECX=82767c09 EDX=000000a2
ESI=8543bd38 EDI=85428c80 EBP=8078ace8 ESP=8078ace0
EIP=8260dcac EFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0023 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0023 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
FS =0030 82767c00 00003748 00409300 DPL=0 DS [-WA]
GS =0000 00023de0 0000ffff 00000000
LDT=0000 00000000 0000ffff 00000000
TR =0028 801da000 000020ab 00008b00 DPL=0 TSS32-busy
GDT= 80b95000 000003ff
IDT= 80b95400 000007ff
CR0=8001003b CR2=02f90ffc CR3=7f1231a0 CR4=000006f8
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
FCW=027f FSW=0120 [ST=0] FTW=00 MXCSR=00000000
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=8000000000000000 3fff FPR5=c0fd200000000000 4002
FPR6=f000000000000000 4002 FPR7=8000000000000000 3fff
XMM00=00000000000000000000000000000000 XMM01=00430034003400310034003600420035
XMM02=002e0031005f00460044003100460043 XMM03=0031002e0030003000360037002e0031
XMM04=004e004f004e005f0035003800330036 XMM05=004300370043004600320037005f0045
XMM06=00350032003200310036003800460042 XMM07=004c0050004900440047005c00410043

4.
reproduced on the following hosts:

intel:
---
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
stepping : 10
cpu MHz : 2660.132
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips : 5319.73
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

amd:
---
processor : 3
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : AMD Phenom(tm) 9600B Quad-Core Processor
stepping : 3
cpu MHz : 1150.000
cache size : 512 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
bogomips : 4587.43
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

There is no screenshot attached. Can you attach it please? Also, when it is stuck, issue "x/50i $pc-50" + "info registers".

(In reply to comment #11)
> There is no screenshot attached. Can you attach it please? Also, when it is stuck,
> issue "x/50i $pc-50" + "info registers".

(qemu) x/50i $pc-50
0x0000000082a1fc7a: or $0xc1,%al
0x0000000082a1fc7c: call 0x7ae8583
0x0000000082a1fc81: add %al,(%eax)
0x0000000082a1fc83: (bad)
0x0000000082a1fc84: lcall *-0x3e(%ebp)
0x0000000082a1fc87: or %al,(%eax)
0x0000000082a1fc89: int3
0x0000000082a1fc8a: int3
0x0000000082a1fc8b: int3
0x0000000082a1fc8c: int3
0x0000000082a1fc8d: int3
0x0000000082a1fc8e: mov %edi,%edi
0x0000000082a1fc90: mov 0xfffe0300,%eax
0x0000000082a1fc95: test $0x1000,%eax
0x0000000082a1fc9a: jne 0x82a1fc90
0x0000000082a1fc9c: ret
0x0000000082a1fc9d: int3
0x0000000082a1fc9e: int3
0x0000000082a1fc9f: int3
0x0000000082a1fca0: int3
0x0000000082a1fca1: int3
0x0000000082a1fca2: movl $0x0,0xfffe00b0
0x0000000082a1fcac: ret
0x0000000082a1fcad: int3
0x0000000082a1fcae: int3
0x0000000082a1fcaf: int3
0x0000000082a1fcb0: int3
0x0000000082a1fcb1: int3
0x0000000082a1fca2: movl $0x0,0xfffe00b0
0x0000000082a1fcac: ret
0x0000000082a1fcad: int3
0x0000000082a1fcae: int3
0x0000000082a1fcaf: int3
0x0000000082a1fcb0: int3
0x0000000082a1fcb1: int3
0x0000000082a1fcb2: mov %edi,%edi
0x0000000082a1fcb4: push %ebp
0x0000000082a1fcb5: mov %esp,%ebp
0x0000000082a1fcb7: cmpb $0x0,0x82a36182
0x0000000082a1fcbe: mov 0x8(%ebp),%eax
0x0000000082a1fcc1: jne 0x82a1fcc6
0x0000000082a1fcc3: shl $0x18,%eax
0x0000000082a1fcc6: mov 0xc(%ebp),%edx
0x0000000082a1fcc9: push %esi
0x0000000082a1fcca: xor %esi,%esi
0x0000000082a1fccc: xor %ecx,%ecx
0x0000000082a1fcce: or %esi,%eax
0x0000000082a1fcd0: or %edx,%ecx
0x0000000082a1fcd2: push %eax
0x0000000082a1fcd3: push %ecx
0x0000000082a1fcd4: call *0x82a361a8
0x0000000082a1fcda: pop %esi
0x0000000082a1fcdb: pop %ebp
0x0000000082a1fcdc: ret $0x8
0x0000000082a1fcdf: int3
0x0000000082a1fce0: int3
0x0000000082a1fce1: int3
(qemu) info registers
EAX=00000009 EBX=8078ad6c ECX=82732c09 EDX=000000a2
ESI=854bfd38 EDI=854b1c80 EBP=8078ace8 ESP=8078ace4
EIP=82a1ec86 EFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0023 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0023 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
FS =0030 82732c00 00003748 00409300 DPL=0 DS [-WA]
GS =0000 00000000 ffffffff 00000000
LDT=0000 00000000 ffffffff 00000000
TR =0028 801da000 000020ab 00008b00 DPL=0 TSS32-busy
GDT= 80b95000 000003ff
IDT= 80b95400 000007ff
CR0=80010031 CR2=00540000 CR3=00185000 CR4=000006f8
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
FCW=027f FSW=0000 [ST=0] FTW=00 MXCSR=00000000
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000

Created attachment 471787 [details]
stuck on the intel host
Created attachment 471788 [details]
stuck on the amd host
I noticed that in comment #0 you use vhost=on. vhost is not supported on rhel6.0. Is this reproducible without vhost? If yes does qemu take 100% cpu when it stuck?

Run ftrace after it stuck like this:

# echo kvm > /sys/kernel/debug/tracing/set_event
# cat /sys/kernel/debug/tracing/trace > /tmp/trace

Attach /tmp/trace here.

Created attachment 471976 [details]
ftrace for kvm when Windows7 guest stuck while resuming from S4

(In reply to comment #15)
> I noticed that in comment #0 you use vhost=on. vhost is not supported on
> rhel6.0. Is this reproducible without vhost?

yes, this is reproducible without vhost, as the command line in comment #8.

> If yes does qemu take 100% cpu
> when it stuck?

top -p `pidof qemu-kvm` with "show threads on"
---
top - 09:37:36 up 1 day, 18:05, 20 users, load average: 0.82, 0.93, 0.87
Tasks: 2 total, 1 running, 1 sleeping, 0 stopped, 0 zombie
Cpu0 : 2.0%us, 1.7%sy, 0.0%ni, 95.6%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 56.2%us, 43.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 2.6%us, 1.5%sy, 0.0%ni, 95.4%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 1.3%us, 0.4%sy, 0.0%ni, 98.0%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 7994548k total, 5606560k used, 2387988k free, 239796k buffers
Swap: 10239992k total, 67496k used, 10172496k free, 2715104k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9440 root 20 0 2314m 2.0g 2720 R 100.0 26.1 2484:08 qemu-kvm
9422 root 20 0 2314m 2.0g 2720 S 3.1 26.1 77:33.64 qemu-kvm

> Run ftrace after it stuck like this:
>
> # echo kvm > /sys/kernel/debug/tracing/set_event
> # cat /sys/kernel/debug/tracing/trace > /tmp/trace
>
> Attach /tmp/trace here.

Try to reproduce without network ("-net none" option).

(In reply to comment #17)
> Try to reproduce without network ("-net none" option).

tried about 50+ times with -net none, cannot reproduce.
also cannot reproduce this problem when the net options are not provided in the cmd line at all (user mode).

/usr/libexec/qemu-kvm -name vm1 \
-chardev socket,id=human_monitor_eqFd,path=/tmp/monitor-humanmonitor1-20101924-114524-dMO8,server,nowait \
-mon chardev=human_monitor_eqFd,mode=readline \
-chardev socket,id=serial_i2zu,path=/tmp/serial-20101924-114524-dMO8,server,nowait \
-device isa-serial,chardev=serial_i2zu \
-drive file=./win7-32-20110104.qcow2,index=0,if=none,id=drive-ide0-0-0,media=disk,cache=writethrough,format=qcow2,aio=native \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
-m 2048 -smp 1 -cpu cpu64-rhel6,+sse2,+x2apic -vnc :12 \
-rtc base=utc,clock=host,driftfix=none -M rhel6.0.0 \
-usbdevice tablet -no-kvm-pit-reinjection -enable-kvm -net none

I also tried qemu-kvm-0.12.1.2-2.113.el6_0.6 with the command line provided in comment 8, and can still easily reproduce this problem.

Looks like a Windows 7 bug to me. I saw this already with e1000. During resume, Windows 7 enables receive on the NIC too early (before it is ready to receive interrupts). If there is already a packet in the NIC's queue, the NIC generates an interrupt immediately and Windows hangs. It is possible that our rtl8139 emulation starts sending interrupts too early. I haven't checked against rtl8139's spec, but when I investigated the same problem with e1000 I checked what Windows 7 does against e1000's spec, and it looked like Windows does the wrong thing, so I assume that the rtl8139 problem is the same. It would be interesting to check with virtio, where we control the driver too. On real HW such a problem may not be visible, since the NIC does not start to receive packets immediately after the receiver is enabled: it takes a couple of msecs to do link discovery and autonegotiation.

Can you try with virtio-net?

(In reply to comment #20)
> Can you try with virtio-net?

I have tried more than 20 times, cannot reproduce using virtio-net.
cmd:
/usr/libexec/qemu-kvm -name 'vm1' \
-chardev socket,id=human_monitor_TJFq,path=/tmp/monitor-humanmonitor1-20110130-104015-Ghji,server,nowait -mon chardev=human_monitor_TJFq,mode=readline \
-chardev socket,id=serial_2Lar,path=/tmp/serial-20110130-104015-Ghji,server,nowait -device isa-serial,chardev=serial_2Lar \
-drive file='./win7-32-virtio.qcow2',index=0,if=none,id=drive-ide0-0-0,media=disk,cache=writethrough,format=qcow2,aio=native \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
-device virtio-net-pci,netdev=idP499NY,mac=9a:b4:c8:bf:da:c3,netdev=idP499NY,id=ndev00idP499NY,bus=pci.0,addr=0x3 -netdev tap,id=idP499NY,ifname='t0-104015-Ghji',script='qemu-ifup-switch',downscript='no' -m 512 -smp 1,cores=1,threads=1,sockets=1 \
-cpu cpu64-rhel6,+sse2,+x2apic -vnc :0 -rtc base=localtime,clock=host,driftfix=none -M rhel6.0.0 -boot order=cdn,once=c,menu=off -usbdevice tablet -enable-kvm

Hi all,

I hit this issue sometimes during WHQL tests, especially with win2003 guests. After hitting it, QE needs to reboot the guest manually and choose "resume from disk" again; this delays our test schedules if QE does not notice that the guest is hung.
What's more, Bug 769163's root cause is this one. I think this bug should be fixed ASAP.

(In reply to comment #27)
> What's more ,Bug 769163 's root cause is this one .I think this bug should be
> fixed ASAP .

I mean Bug 769163's root cause *might* be this bug.

Yan,
What do you think?
Ronen.

I think we should review the e1000 spec and the device implementation, looking at the actions taken during device reset. Checking just the RCTL register in e1000_receive looks naive to me.

Also, do I understand correctly that this issue cannot be reproduced with spice?

Another question: are you using some special BIOS?

Yet another option: try to connect WinDbg to the guest under test and break when it is stuck. If that succeeds, it might give us some additional info on where Windows is stuck.

For the e1000 reset, I've tried Michale's fixes for resetting; they do not work (they may have some defect, but I haven't checked).

One interesting thing is that this bz is only reproduced during resuming (not booting). From the debug log, the Windows guest does something different for booting and resuming:

For booting, it enables the interrupt before letting the card receive packets.
For resuming, it enables the interrupt after letting the card receive packets. If some packets arrive before the interrupt is enabled, then when the guest tries to enable the interrupt and an irq is injected into the guest, the guest hangs.

Since the Windows driver behaves differently in the two cases, there is good reason to suspect that something is wrong with the order of irq enabling and irq handler registration.

I see tons of unhandled e1000/8139 irqs injected during the EOI broadcast, and it seems the guest has no time to do anything except try to handle those irqs without handlers.

It seems our IOAPIC EOI broadcast emulation re-delivers the irq immediately if it finds the irq is still active (which is common when there is no irq handler registered in the guest). So each time the guest tries to leave the irq handler and re-enable irqs, the irq-window handler injects that irq into the guest again. As this repeats again and again, the guest stays busy and never has time to move forward.

In conclusion, if a level-triggered irq is unhandled, it is injected into the guest endlessly; the guest cannot perform the following steps, such as registering its handler, and hangs forever. I am not sure this is exactly how real hw behaves.

This can also be reproduced with a Linux guest when an unhandled level irq is raised (see bz787959, which is a driver bug). During my examination, if we let the guest move forward a little before re-injecting the irq, everything is fine. So there is a high possibility that the Windows driver has a bug (enabling the irq before its handler is registered).

*** Bug 716804 has been marked as a duplicate of this bug.
***

Jason has a solution, sending upstream.

We prefer to wait for the next Z-stream / 6.4.

Reproduced this bug with the following versions:
host:
# uname -r
2.6.32-279.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.295.el6.x86_64

guest:
win7-32

steps:
1.boot guest with rtl8139 NIC
/usr/libexec/qemu-kvm -name 'vm1' -chardev socket,id=human_monitor_eqFd,pp/monitor-humanmonitor1-20101124-114524-dMO8,server,nowait -mon chardev=human_monitor_eqFd,mode=readline -chardev socket,id=serial_i2zu,path=/tmp/serial-20101124-114524-dMO8,server,nowait -device isa-serial,chardev=serial_i2zu -drive file=/home/win7-32.qcow2,index=0,if=none,id=drive-ide0-0-0,media=disk,cache=writethrough,format=qcow2,aio=native -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -device rtl8139,mac=9a:3e:52:90:38:8e,netdev=idtaTtKT,id=ndev00idtaTtKT,bus=pci.0,addr=0x3 -netdev tap,id=idtaTtKT,ifname='t0-114524-dMO8',script=/etc/qemu-ifup,downscript='no' -m 2048 -smp 1 -cpu Penryn -rtc base=utc,clock=host,driftfix=none -M rhel6.3.0 -usb -device usb-tablet -no-kvm-pit-reinjection -enable-kvm -spice port=5931,disable-ticketing -vga qxl -global qxl-vga.vram_size=67108864 -monitor stdio -bios /usr/share/seabios/bios-pm.bin
2.do S3/S4

ctrl+alt+del---->choose sleep/Hibernate

result:
after S4, resume the guest ---> guest becomes unresponsive while resuming.

Tested this bug with the following versions:

host:
# uname -r
2.6.32-279.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.321.el6.x86_64

guest:
win7-32

steps:
1.boot guest with e1000 NIC
2.do S3/S4

results: tried more than 5 times; after S3/S4 the guest resumes successfully and works well.

addinfo:
1) If the guest is booted with an rtl8139 NIC on the new qemu version, this issue still occurs: the guest becomes unresponsive while resuming.

(In reply to comment #50)
> reproduce this bug as follow version:
> host:
> # uname -r
> 2.6.32-279.el6.x86_64
> # rpm -q qemu-kvm
> qemu-kvm-0.12.1.2-2.295.el6.x86_64
>
> guest:
> win7-32
>
> steps:
> 1.boot guest with rtl8139 NIC
> /usr/libexec/qemu-kvm -name 'vm1' -chardev
> socket,id=human_monitor_eqFd,pp/monitor-humanmonitor1-20101124-114524-dMO8,
> server,nowait -mon chardev=human_monitor_eqFd,mode=readline -chardev
> socket,id=serial_i2zu,path=/tmp/serial-20101124-114524-dMO8,server,nowait
> -device isa-serial,chardev=serial_i2zu -drive
> file=/home/win7-32.qcow2,index=0,if=none,id=drive-ide0-0-0,media=disk,
> cache=writethrough,format=qcow2,aio=native -device
> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -device
> rtl8139,mac=9a:3e:52:90:38:8e,netdev=idtaTtKT,id=ndev00idtaTtKT,bus=pci.0,
> addr=0x3 -netdev
> tap,id=idtaTtKT,ifname='t0-114524-dMO8',script=/etc/qemu-ifup,
> downscript='no' -m 2048 -smp 1 -cpu Penryn -rtc
> base=utc,clock=host,driftfix=none -M rhel6.3.0 -usb -device usb-tablet
> -no-kvm-pit-reinjection -enable-kvm -spice port=5931,disable-ticketing -vga
> qxl -global qxl-vga.vram_size=67108864 -monitor stdio -bios
> /usr/share/seabios/bios-pm.bin
> 2.do S3/S4
>
> ctrl+alt+del---->choose sleep/Hibernate
>
> result:
> after do S4,then resume guest --->guest becomes unresponsive while resuming.
>
> test this bug as follow version:
>
> host:
> # uname -r
> 2.6.32-279.el6.x86_64
> # rpm -q qemu-kvm
> qemu-kvm-0.12.1.2-2.321.el6.x86_64
>
> guest:
> win7-32
>
> steps:
> 1.boot guest with e1000 NIC
> 2.do S3/S4
>
>
> results:tried more than 5 times after do S3/S4 -->guest resume
> successfully,guest work well.
>

Could you please try more times, e.g. 1000 times of s3/s4 through autotest?

> addinfo:
> 1)if boot guest with rtl8139 NIC on the new qemu version,this issue still
> have.Guest becomes unresponsive while resuming

FYI, the rtl8139 issue is tracked in another bug, https://bugzilla.redhat.com/show_bug.cgi?id=847241, which is closed as WON'TFIX since we would not put any effort into the 8139 issue.

Reproduced and verified this bug with about 200 test iterations; the problem was not hit on the fixed qemu-kvm version, so this bug has been fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0527.html
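
Illustration of the re-injection loop described in the analysis earlier in this thread (an unhandled level-triggered irq being re-delivered on every EOI). This is only a toy model under stated assumptions, not QEMU or KVM code: the names `Guest`, `resume`, `steps_to_register` and `progress_between_injections` are hypothetical and chosen for this sketch. It assumes the NIC line is already asserted when the guest resumes, and that only a registered handler can deassert it.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    """Toy guest (hypothetical model): it needs `steps_to_register` units
    of ordinary work before its NIC interrupt handler is installed."""
    steps_to_register: int
    progress: int = 0
    irq_asserted: bool = True  # assumption: a packet is already queued
    injected: int = 0

    @property
    def handler_registered(self) -> bool:
        return self.progress >= self.steps_to_register

    def take_interrupt(self) -> None:
        """Service one injected interrupt, ending with an EOI."""
        self.injected += 1
        if self.handler_registered:
            # Only a real handler acknowledges the device and drops the line.
            self.irq_asserted = False

    def run(self, steps: int) -> None:
        """Ordinary (non-interrupt) forward progress."""
        self.progress += steps


def resume(guest: Guest, progress_between_injections: int,
           max_injections: int = 10_000) -> bool:
    """Return True if the guest escapes the interrupt loop.

    progress_between_injections == 0 models immediate re-delivery on EOI;
    a positive value models letting the guest move a little first.
    """
    while guest.irq_asserted:
        if guest.injected >= max_injections:
            return False  # treated as a hang in this model
        guest.take_interrupt()
        guest.run(progress_between_injections)
    return True


if __name__ == "__main__":
    stuck = Guest(steps_to_register=5)
    print("immediate re-delivery:", resume(stuck, 0), stuck.injected)
    ok = Guest(steps_to_register=5)
    print("delayed re-delivery:", resume(ok, 1), ok.injected)
```

In this model the guest with immediate re-delivery never registers its handler and is cut off at `max_injections`, while the guest that is allowed one unit of progress between injections registers the handler and clears the line. That matches the observation in the thread that letting the guest move a little before re-injecting avoids the hang; it says nothing about how the shipped fix is implemented.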