Bug 994431
| Summary: | Win8-32 guest BSOD 0x1a after migration (with shadow paging) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | CongLi <coli> |
| Component: | kernel | Assignee: | Marcelo Tosatti <mtosatti> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.0 | CC: | acathrow, coli, hhuang, huding, juzhang, knoel, michen, mjenner, mtosatti, quintela, qzhang, rhod, shuang, svenkatr, virt-maint, vrozenfe, xwei |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-3.10.0-105.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1070247 | Environment: | |
| Last Closed: | 2014-06-13 11:17:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1070247 | | |
| Attachments: | | | |

Description
CongLi
2013-08-07 09:30:02 UTC
Created attachment 783778
BSOD screenshot

Hi,
Can you please upload a zipped memory dump (preferably a full or kernel memory dump)?
Thanks,
Yan.

Hi,
Do you see this crash on RHEL6.5 as well?
Thanks,
Yan.

(In reply to Yan Vugenfirer from comment #11)
Hi Yan,
I have tested it on a RHEL.6.6 host many times and didn't hit this bug on the same machine.
kernel-2.6.32-438.el6.x86_64
qemu-kvm-0.12.1.2-2.419.el6.x86_64
Thanks,
Cong

CongLi,

The crash dump is not very helpful. According to
http://msdn.microsoft.com/en-us/library/windows/hardware/ff557391%28v=vs.85%29.aspx

0x00003453 is

Please try the following:

1) Enable driver verifier. See instructions at
http://msdn.microsoft.com/en-us/library/windows/hardware/ff545448%28v=vs.85%29.aspx#how_to_start_dv. Use the default settings.

If that fails to report new information, or if drivers now cause crashes even without migration, please report and continue to 2 below.

2) Confirm whether the problem is reproducible with rtl8139 or e1000 and IDE disk.

If we can't isolate the corruption to a particular device, then it might be the core migration data copy, and in that case it could be related to the bug which triggers

https://bugzilla.redhat.com/show_bug.cgi?id=1046870

Debugging 1046870 in the meantime.

(In reply to Marcelo Tosatti from comment #13)
> CongLi,
>
> The crash dump is not very helpful. According to
> http://msdn.microsoft.com/en-us/library/windows/hardware/ff557391%28v=vs.85%29.aspx
>
> 0x00003453 is

"unknown memory management error". It could be "PTEs of kernel thread are corrupt" AND 0x1, but let's try the tests listed in the previous comment.

(In reply to Marcelo Tosatti from comment #13)
> Please try the following:
>
> 1) Enable driver verifier. See instructions at
> http://msdn.microsoft.com/en-us/library/windows/hardware/ff545448%28v=vs.85%29.aspx#how_to_start_dv. Use the default settings.
>
> If that fails to report new information, or if drivers now cause crashes
> even without migration, please report and continue to 2 below.

With the default settings, it prompts 'No unsigned drivers have been found'.

> 2) Confirm whether problem is reproducible with rtl8139 or e1000 and IDE
> disk.

Yes, it can be reproduced with rtl8139 or e1000 and IDE disk.

> If we can't isolate corruption a particular device, then it might be core
> migration data copy, and in that case could be related to bug which triggers
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1046870
>
> Debugging 1046870 in the meantime.

(In reply to CongLi from comment #16)
Tested on version:
kernel-3.10.0-89.el7.x86_64
qemu-kvm-rhev-1.5.3-49.el7.x86_64

(In reply to CongLi from comment #17)
> (In reply to CongLi from comment #16)
>
> Tested on version:
> kernel-3.10.0-89.el7.x86_64
> qemu-kvm-rhev-1.5.3-49.el7.x86_64

CongLi,

Can you provide the full command line for the RTL8139/IDE case, and the migration parameters, so that I can try to reproduce please?

Did you attempt to reproduce on an Intel host?

(In reply to CongLi from comment #16)
> (In reply to Marcelo Tosatti from comment #13)
>
> > Please try the following:
> >
> > 1) Enable driver verifier. See instructions at
> > http://msdn.microsoft.com/en-us/library/windows/hardware/ff545448%28v=vs.85%29.aspx#how_to_start_dv. Use the default settings.
> >
> > If that fails to report new information, or if drivers now cause crashes
> > even without migration, please report and continue to 2 below.
>
> Use the default settings, it prompts 'No unsigned drivers have been found'.
>
> > 2) Confirm whether problem is reproducible with rtl8139 or e1000 and IDE
> > disk.
>
> Yes, it can be reproduced with rtl8139 or e1000 and IDE disk.

That's correct. Then you should be able to reboot the guest, and Driver Verifier should be enabled. Did you do that?

I'll try to reproduce locally in the meantime (once I have the QEMU command line and migration command details).

(In reply to Marcelo Tosatti from comment #18)
> (In reply to CongLi from comment #17)
> > (In reply to CongLi from comment #16)
> >
> > Tested on version:
> > kernel-3.10.0-89.el7.x86_64
> > qemu-kvm-rhev-1.5.3-49.el7.x86_64
>
> CongLi,
>
> Can you provide full command line, for RTL8139/IDE case, and migration
> parameters
> so that i can try to reproduce please?
>
> Did you attempt to reproduce on Intel host?

Or AMD hardware with NPT?

(In reply to Marcelo Tosatti from comment #18)
> Did you attempt to reproduce on Intel host?

No, I have not met this issue on an Intel host.

> Thats correct. Then you should be able to reboot the guest and driver verified should be enabled. Did you do that ?

Yes, I have rebooted it, but there is nothing special when hitting the BSOD with verifier enabled.

> Or AMD hardware with NPT ?

1. I met this bug on the AMD machine in comment 0, which has no NPT.
# cat /proc/cpuinfo | grep -i npt
#

2. I haven't hit this problem on another AMD machine which has NPT.
processor : 23
vendor_id : AuthenticAMD
cpu family : 21
model : 1
model name : AMD Opteron(TM) Processor 6234
stepping : 2
microcode : 0x6000626
cpu MHz : 2400.038
cache size : 2048 KB
physical id : 1
siblings : 12
core id : 5
cpu cores : 6
apicid : 75
initial apicid : 43
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bogomips : 4799.73
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb

CML (rtl8139 & ide):

1. boot src guest:
/root/staf-kvm-devel/autotest-devel/client/tests/virt/qemu/qemu \
-S \
-name 'virt-tests-vm1' \
-sandbox off \
-M pc \
-nodefaults \
-vga std \
-chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20140224-142410-xmzetVMD,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140224-142410-xmzetVMD,server,nowait \
-device isa-serial,chardev=serial_id_serial0 \
-chardev socket,id=seabioslog_id_20140224-142410-xmzetVMD,path=/tmp/seabios-20140224-142410-xmzetVMD,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20140224-142410-xmzetVMD,iobase=0x402 \
-device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/root/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/images/win8-32.qcow2 \
-device ide-hd,id=image1,drive=drive_image1,bus=ide.0,unit=0 \
-device rtl8139,mac=9a:db:dc:dd:de:df,id=idQqJS0p,netdev=idVNvQ0L,bus=pci.0,addr=04 \
-netdev tap,id=idVNvQ0L,fd=22 \
-m 2048 \
-smp 1,maxcpus=1,cores=1,threads=1,sockets=2 \
-cpu 'Opteron_G2',+sep,+kvm_pv_unhalt,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic \
-drive id=drive_cd1,if=none,snapshot=off,aio=native,media=cdrom,file=/root/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/isos/windows/winutils.iso \
-device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=1 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-no-hpet \
-rtc base=localtime,clock=host,driftfix=slew \
-boot order=cdn,once=c,menu=off \
-rtc-td-hack \
-enable-kvm \

2. query time in src vm:
w32tm /stripchart /samples:1 /computer:clock.redhat.com

3. boot dst vm:
same CML as src, except:
-vnc :1 \
-incoming tcp:0:5200 \

4. migrate src to dst in local host:
(qemu) migrate -d tcp:0:5200

5. do ping-pong migration 3 times w/ the above steps.

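The `grep -i npt` check on `/proc/cpuinfo` used above can also be scripted when several hosts need to be compared. The following is a minimal sketch, not part of the original report: the function names `cpu_flags` and `nested_paging_flag` are hypothetical, and the code only inspects the CPU's advertised flags. It does not tell whether KVM actually uses hardware paging, since that can be disabled through a module parameter.

```python
def cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo text.

    Returns an empty set when no 'flags' line is present.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the exact key so lines such as 'vmx flags' are ignored.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def nested_paging_flag(cpuinfo_text):
    """Return 'npt' (AMD) or 'ept' (Intel) if the CPU advertises it, else None.

    None suggests the host has to fall back to shadow paging.
    """
    flags = cpu_flags(cpuinfo_text)
    for name in ("npt", "ept"):
        if name in flags:
            return name
    return None
```

On a Linux host, the text to pass in would be the contents of `/proc/cpuinfo`. A result of `None` corresponds to the empty `grep` output shown for the machine that reproduces the bug.
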
(In reply to CongLi from comment #22)

Thanks CongLi.

While I set up the environment, can you confirm it's reproducible on Intel with the kvm_intel module parameter ept=0?

(In reply to Marcelo Tosatti from comment #23)
> While i setup the environment, can you confirm its reproducible on Intel
> with kvm_intel module parameter ept=0 ?

Yes, it can be reproduced on an Intel host with the kvm_intel module parameter ept=0.

# modprobe kvm_intel "ept=0"

cpuinfo:
processor : 23
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
stepping : 7
microcode : 0x710
cpu MHz : 2318.203
cache size : 15360 KB
physical id : 1
siblings : 12
core id : 5
cpu cores : 6
apicid : 43
initial apicid : 43
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips : 4004.09
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

Thanks,
Cong

A kernel patch was submitted against this bug, so this bug needs to be against the kernel component. You'll have to reacquire flags now, unfortunately.

Patch(es) available on kernel-3.10.0-105.el7

When verifying this bug, I hit BZ 1036478: after migrating a Win8 32-bit guest, the destination UI does not respond, but the mouse can move and the guest can be pinged from the host.

The kernel and qemu-kvm versions are:
kernel-3.10.0-107.el7.x86_64
qemu-kvm-1.5.3-52.el7.x86_64

Steps to Reproduce
1. boot guest on src and dst host:
src:
# /usr/libexec/qemu-kvm -M pc -cpu SandyBridge -enable-kvm -m 4G -smp 4,sockets=2,cores=2,threads=1 -name win8-32 -uuid 6afa5f93-2d4f-420f-81c6-e5fdddbd1c83 -drive file=/home/win8-32-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=40c061dd-5d60-4fc5-865f-55db700407f0,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0a:00,bus=pci.0,addr=0x3 -vnc :2 -monitor stdio -qmp tcp:0:4445,server,nowait
dst:
# /usr/libexec/qemu-kvm -M pc -cpu SandyBridge -enable-kvm -m 4G -smp 4,sockets=2,cores=2,threads=1 -name win8-32 -uuid 6afa5f93-2d4f-420f-81c6-e5fdddbd1c83 -drive file=/home/win8-32-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=40c061dd-5d60-4fc5-865f-55db700407f0,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0a:00,bus=pci.0,addr=0x3 -vnc :3 -monitor stdio -qmp tcp:0:4446,server,nowait -incoming tcp:0:5200

2. do three rounds of migration
(qemu) migrate -d tcp:0:5200

Actual results:
After migrating the Win8 32-bit guest, the destination UI does not respond, but the mouse can move and the guest can be pinged from the host.

QE would like to set this as verified and continue to track BZ 1036478. When QE verifies BZ 1036478, QE will try this scenario as well.

This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
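
The thread mentions three RHEL 7 kernel builds: 3.10.0-89 (bug reproduced), 3.10.0-105 (listed under "Fixed In Version"), and 3.10.0-107 (used for verification). As a rough sketch of how a host kernel could be checked against the fixed build, the snippet below compares the release number of an `.el7` kernel string with 105. The helper names `parse_el7_kernel` and `has_fix` are hypothetical, and the comparison assumes that later el7 builds keep the patch, which the report does not state explicitly.

```python
import re

# Taken from the "Fixed In Version" field: kernel-3.10.0-105.el7
FIXED_RELEASE = (3, 10, 0, 105)

# Accepts e.g. "kernel-3.10.0-89.el7.x86_64" or "3.10.0-123.4.2.el7".
_EL7_KERNEL = re.compile(r"(?:kernel-)?(\d+)\.(\d+)\.(\d+)-(\d+)(?:\.\d+)*\.el7")


def parse_el7_kernel(version):
    """Return (major, minor, patch, release) for an el7 kernel version string.

    Raises ValueError for strings that are not el7 kernel versions,
    so kernels from other RHEL major releases are never compared.
    """
    match = _EL7_KERNEL.match(version)
    if match is None:
        raise ValueError("not an el7 kernel version: %r" % (version,))
    return tuple(int(group) for group in match.groups())


def has_fix(version):
    """Return True if the el7 kernel is at or after the fixed build."""
    return parse_el7_kernel(version) >= FIXED_RELEASE
```

With the versions quoted in this report, `has_fix("kernel-3.10.0-89.el7.x86_64")` is false and `has_fix("kernel-3.10.0-107.el7.x86_64")` is true. The string to check on a running host would be the output of `uname -r`.
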