Bug 843000
Summary: | [balloon] Guest BSOD during 10000 times balloon device hotplug/unplug | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Mike Cao <bcao>
Component: | virtio-win | Assignee: | Vadim Rozenfeld <vrozenfe>
Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | high | Docs Contact: |
Priority: | medium | |
Version: | 7.0 | CC: | areis, bcao, bsarathy, juzhang, lijin, michen, mkenneth, qzhang, rbalakri, rhod, shuyu, virt-bugs, virt-maint, vrozenfe
Target Milestone: | rc | |
Target Release: | 7.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-11-09 12:30:15 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |

Description
Mike Cao
2012-07-25 09:17:07 UTC
The context is partially valid. Only x86 user-mode context is available.
The wow64exts extension must be loaded to access 32-bit state.
.load wow64exts will do this if you haven't loaded it already.
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 9F, {4, 258, fffffa800af61680, fffff800013da3d0}

Implicit thread is now fffffa80`0af61680
Probably caused by : Unknown_Image ( ANALYSIS_INCONCLUSIVE )

Followup: MachineOwner
---------

16.0: kd:x86> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time (usually 10 minutes).
Arguments:
Arg1: 0000000000000004, The power transition timed out waiting to synchronize with the Pnp subsystem.
Arg2: 0000000000000258, Timeout in seconds.
Arg3: fffffa800af61680, The thread currently holding on to the Pnp lock.
Arg4: fffff800013da3d0, nt!TRIAGE_9F_PNP on Win7

Debugging Details:
------------------

Implicit thread is now fffffa80`0af61680

DRVPOWERSTATE_SUBCODE: 4
FAULTING_THREAD: fffffa800af61680
DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT
BUGCHECK_STR: 0x9F
CURRENT_IRQL: 0
LAST_CONTROL_TRANSFER: from 0000000000000000 to 0000000000000000

STACK_TEXT:
00000000 00000000 00000000 00000000 00000000 0x0

STACK_COMMAND: kb
SYMBOL_NAME: ANALYSIS_INCONCLUSIVE
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: Unknown_Module
IMAGE_NAME: Unknown_Image
DEBUG_FLR_IMAGE_TIMESTAMP: 0
BUCKET_ID: INVALID_KERNEL_CONTEXT

Followup: MachineOwner
---------

Hit this one more time when shutting down the guest after 10000 balloon hotplug/unplug cycles.

Hi Mike,
Do you have the balloon service running during this test?
Thank you,
Vadim.

(In reply to comment #5)
> Hi Mike,
> Do you have the balloon service running during this test?

No, I only do hotplug and hot-unplug in a loop.

>
> Thank you,
> Vadim.

Reproduced this issue on RHEL7.0 (qemu-kvm-1.4.0-1.el7.x86_64 and kernel-3.8.0-0.40.el7.x86_64); a similar issue happened.

Steps:
1. Boot the guest:
/usr/libexec/qemu-kvm \
  -drive file=/home/whql-test/win7-32-virtio.qcow2,if=none,cache=writethrough,media=disk,format=qcow2,id=disk1 -device ide-drive,id=ide0-0-0,drive=disk1,bootindex=0 \
  -netdev tap,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:7f:f9:56,bus=pci.0 \
  -monitor unix:/tmp/tt,server,nowait \
  -boot menu=on \
  -spice port=5900,disable-ticketing -vga qxl \
  -chardev file,path=/root/console.log,id=serial1 \
  -device isa-serial,chardev=serial1,id=s1 \
  -usb -device usb-tablet,id=tablet1 \
  -M pc-i440fx-1.4 -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 -m 2G \
  -enable-kvm \
  -fda /usr/share/virtio-win/virtio-win-1.6.3_x86.vfd \
  -cdrom /usr/share/virtio-win/virtio-win-1.6.3.iso

2. Do PCI hotplug/unplug 10000 times:
for ((i=1;i<=10000;i++))
do
  echo device_del balloon0 |nc -U /tmp/tt
  sleep 1 ;
  echo device_add virtio-balloon-pci,id=balloon,addr=0x5 |nc -U /tmp/tt
  sleep 1 ;
done

3. Do S3; the guest is still alive.
4. Shut down the guest; the win7-32 guest hits a BSOD.

The attachment "bsod.png" is the BSOD screenshot.
The attachment "memory dump file&analyze" contains the dump file and the WinDbg analysis file.

Created attachment 731085
BSOD screenshot

Created attachment 731089
memory dump file and windbg analyze file

This can still be reproduced with the following versions when rebooting the guest after the hotplug/unplug loop:
kernel-2.6.32-358.6.1.el6.x86_64
qemu-kvm-0.12.1.2-2.362.el6.x86_64
virtio-win-prewhql-59

Created attachment 745534
Memory dump file on rhel6.5 host

QE, can you please check again with the latest drivers and QEMU?

Retested this issue on the latest RHEL 6.6 host with Windows 2008R2. During 10000 balloon device hotplug/unplug cycles the guest works well, and after the 1000 hotplug/unplug cycles the guest can reboot and shut down without any error.

qemu-kvm-rhev-0.12.1.2-2.434.el6.x86_64
kernel-2.6.32-495.el6.x86_64
seabios-0.6.1.2-28.el6.x86_64
virtio-win-prewhql-86

Steps:
1. Boot the guest:
/usr/libexec/qemu-kvm -m 2G -smp 2,maxcpus=2,cores=2,threads=1,scokets=1 \
  -netdev tap,id=hostnet1,script=/etc/qemu-ifup \
  -device e1000,netdev=hostnet1,id=net1,mac=00:52:00:00:11:22 \
  -usb -device usb-tablet,id=tablet1 \
  -drive file=win2008r2.raw,format=raw,if=none,id=drive1 \
  -device ide-drive,drive=drive1,id=disk1 \
  -cdrom en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_with_sp1_x64_dvd_617601.iso \
  -uuid 6adb29a6-4e36-46df-84eb-c463ecfdc2ba -name win2008R2 \
  -device virtio-balloon-pci,id=balloon,addr=0x9 \
  -boot menu=on -spice port=5900,disable-ticketing -vga qxl \
  -monitor unix:/tmp/tt,server,nowait

2. Do 10000 times PCI hotplug/unplug:
for((i=1;i<=1000;i++)); do echo device_del | nc -U /tmp/tt; sleep 5; echo device_add virtio-balloon-pci,id=balloon,addr=0x9 | nc -U /tmp/tt; sleep 5; done

3. Reboot the guest: successful.
4. Shut down the guest: successful.

Based on the above, the issue has already been fixed.

(In reply to shuyu from comment #17)
> retest this issue on latest rhel6.6 host w/ windows 2008R2,during 10000

s/10000/1000/

Retested this issue on a RHEL 6.6 host with Windows 2008R2 and virtio-win-prewhql-89. During 1000 balloon device hotplug/unplug cycles the guest works well, and after the 1000 hotplug/unplug cycles the guest can reboot and shut down without any error.

qemu-kvm-rhev-0.12.1.2-2.434.el6.x86_64
kernel-2.6.32-495.el6.x86_64
seabios-0.6.1.2-28.el6.x86_64
virtio-win-prewhql-89

The steps are the same as in comment 17.

Retested this issue with virtio-win-prewhql-89 on RHEL7.0; the guest eventually hit a BSOD.

Packages:
3.10.0-121.el7.x86_64
qemu-kvm-1.5.3-62.el7.x86_64

Steps:
1. Start the VM:
/usr/libexec/qemu-kvm \
  -drive file=089BLNWIN732EBK,if=none,id=drive-ide0-0-0,format=raw,serial=mike_cao,cache=writethrough,media=disk \
  -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
  -monitor unix:/tmp/tt,server,nowait \
  -boot menu=on \
  -spice port=5900,disable-ticketing -vga qxl \
  -chardev file,path=/root/console.log,id=serial1 \
  -device isa-serial,chardev=serial1,id=s1 \
  -usb -device usb-tablet,id=tablet1 \
  -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 -m 2G \
  -enable-kvm

2. Hotplug/unplug in a loop:
for ((i=1;i<=10000;i++)); do echo device_del balloon0 |nc -U /tmp/tt; sleep 5; echo device_add virtio-balloon-pci,id=balloon0,addr=0x5 |nc -U /tmp/tt; sleep 5; done

Actual Results:
Guest BSOD occurs.

CRITICAL_OBJECT_TERMINATION (f4)
A process or thread crucial to system operation has unexpectedly exited or been terminated.
Several processes and threads are necessary for the operation of the system; when they are terminated (for any reason), the system can no longer function.
Arguments:
Arg1: 00000003, Process
Arg2: 8545bb08, Terminating object
Arg3: 8545bc74, Process image file name
Arg4: 82866cf0, Explanatory message (ascii)

Debugging Details:
------------------

Page 13102 not present in the dump file. Type ".hh dbgerr004" for details

KERNEL_LOG_FAILING_PROCESS:
PROCESS_OBJECT: 8545bb08
IMAGE_NAME: csrss.exe
DEBUG_FLR_IMAGE_TIMESTAMP: 0
MODULE_NAME: csrss
FAULTING_MODULE: 00000000
PROCESS_NAME: csrss.exe
EXCEPTION_CODE: (NTSTATUS) 0xc0000006 - The instruction at 0x%p referenced memory at 0x%p. The required data was not placed into memory because of an I/O error status of 0x%x.
BUGCHECK_STR: 0xF4_IOERR
DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT
CURRENT_IRQL: 0
ANALYSIS_VERSION: 6.3.9600.16384 (debuggers(dbg).130821-1623) amd64fre

STACK_TEXT:
8e9d5c9c 8292c067 000000f4 00000003 8545bb08 nt!KeBugCheckEx+0x1e
8e9d5cc0 828a9c1e 82866cf0 8545bc74 8545bd78 nt!PspCatchCriticalBreak+0x71
8e9d5cf0 828a9b61 8545bb08 85efe5f8 c0000006 nt!PspTerminateAllThreads+0x2d
8e9d5d24 8268b1ea ffffffff c0000006 0170f5c4 nt!NtTerminateProcess+0x1a2
8e9d5d24 779470b4 ffffffff c0000006 0170f5c4 nt!KiFastCallEntry+0x12a
WARNING: Frame IP not in any known module. Following frames may be wrong.
0170f5c4 00000000 00000000 00000000 00000000 0x779470b4

STACK_COMMAND: kb
FOLLOWUP_NAME: MachineOwner
IMAGE_VERSION:
FAILURE_BUCKET_ID: 0xF4_IOERR_IMAGE_csrss.exe
BUCKET_ID: 0xF4_IOERR_IMAGE_csrss.exe
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:0xf4_ioerr_image_csrss.exe
FAILURE_ID_HASH: {2b68738d-6c37-fd75-d711-1229511b3eea}

Followup: MachineOwner
---------

Retested this issue on a RHEL 6.6 host with win7-32 and virtio-win-prewhql86. During 10000 balloon device hotplug/unplug cycles the guest works well, and after the 10000 hotplug/unplug cycles the guest can reboot and shut down without any error.

kernel-2.6.32-495.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.434.el6.x86_64
seabios-0.6.1.2-28.el6.x86_64
virtio-win-prewhql-86

Steps:
1. Boot the guest:
/usr/libexec/qemu-kvm -m 2G -smp 2,maxcpus=2,cores=2,threads=1,scokets=1 \
  -netdev tap,id=hostnet1,script=/etc/qemu-ifup \
  -device e1000,netdev=hostnet1,id=net1,mac=00:52:00:00:11:22 \
  -usb -device usb-tablet,id=tablet1 \
  -drive file=win7-32-balloon.raw,format=raw,if=none,id=drive1 \
  -device ide-drive,drive=drive1,id=disk1 \
  -cdrom en_windows_7_ultimate_x86_dvd_x15-65921.iso \
  -uuid 56200569-1761-4a09-94ff-383cfd9e2e01 -name win7-32-balloon \
  -spice port=5900,disable-ticketing -vga qxl \
  -monitor unix:/tmp/tt,server,nowait \
  -device virtio-balloon-pci,id=balloon,addr=0x9

2. Hotplug/unplug in a loop:
for((i=1;i<=10000;i++)); do echo device_del balloon | nc -U /tmp/tt; sleep 7; echo device_add virtio-balloon-pci,id=balloon,addr=0x9 | nc -U /tmp/tt; sleep 7; done

3. Reboot the guest: successful.
4. Shut down the guest: successful.

Based on comment 19 and comment 10, it looks like RHEL7 might still have this issue.
QE, since it does not seem to reproduce on RHEL6.6, can you please also verify it on RHEL7?
Thanks.

Retested this issue on RHEL7.1.

Packages:
3.10.0-186.el7.x86_64
qemu-kvm-rhev-2.1.2-1.el7.x86_64
seabios-1.7.5-4.el7.x86_64

Steps:
With a 2-second sleep between every hot-unplug/hot-plug cycle:
Actual Results: the guest responds slowly and fails to shut down (shutdown -t 0 -s -f does not work).

With a 7-second sleep between each hotplug/unplug:
Actual Results: the guest works fine after 18 hours.

Based on the above, Vadim, can you provide QE with a standard latency for each round of hot-unplug/plug operations?

Thanks,
Mike

(In reply to Mike Cao from comment #24)
> Retest this issue on RHEL7.1
>
> Packages
> 3.10.0-186.el7.x86_64
> qemu-kvm-rhev-2.1.2-1.el7.x86_64
> seabios-1.7.5-4.el7.x86_64
>
> Steps :
> if sleep 2 sec between every cycle of hot-unplug/hot-plug
>
> Actual Results: guest will respond slowly and fail to shut down (shutdown -t 0 -s -f does not work)
>
> if sleep 7 sec between each hotplug/unplug
> Actual Results: Guest works fine after 18 hours
>
> Based on above, Vadim, can you provide QE a standard latency for each round of hot-unplug/plug operation?

Hi Mike,

I don't think I can give any exact numbers. PCI device plug/unplug is a very complicated process involving several sides: HW (emulated by the host), the OS, and the device driver itself. Add more load to the host and the latency will change. I think we can close this bug, but let's run this test from time to time as an addition to the HCK PnP tests.

Best regards,
Vadim.

(In reply to Vadim Rozenfeld from comment #25)
> I don't think I can give any exact numbers. PCI device plug/unplug is a very complicated process from sides - HW (emulated by host), OS, and device driver itself. Add more load to host and latency will be changed. I think we can close this bug, but lets run this test from time to time as addition to HCK PnP tests.

I agree to closing the bug.
Regarding the HCK PnP job, I think it is similar to clicking "eject" in the task bar. Is that the same as running device_del in the QEMU monitor?

Thanks,
Mike

(In reply to Mike Cao from comment #26)
> I agree to closing the bug.
> Regarding to the HCK pnp job ,I think it is similar as the operation click "eject" in the task bar .Is it same as we run device_del in qemu monitor ?

No, they are not the same. Eject is a gentle way to ask the system to tear the device stack down and remove the device, while device_del is a brute-force action similar to pulling the device out of its PCI slot, which activates the surprise-removal path.

Cheers,
Vadim.
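
For anyone rerunning this test periodically, as suggested in the thread, the loop used in the retests can be sketched as a small script with the delay exposed as a parameter, since the 2-second and 7-second runs behaved differently. This is only a sketch: the variable names (MONITOR_SOCK, DEVICE_ID, PCI_ADDR, DELAY, CYCLES) and helper functions are made up for illustration, the defaults are copied from the commands in the comments above, and the 7-second default merely mirrors the RHEL7.1 retest rather than being a known-safe value. It assumes the guest already has a virtio-balloon-pci device with the chosen id and that the HMP monitor listens on the given UNIX socket.

```shell
#!/bin/sh
# Sketch only: names and defaults are illustrative assumptions, not part of the bug report.
MONITOR_SOCK="${MONITOR_SOCK:-/tmp/tt}"   # HMP monitor socket, as in -monitor unix:/tmp/tt
DEVICE_ID="${DEVICE_ID:-balloon0}"        # the same id is used for device_del and device_add
PCI_ADDR="${PCI_ADDR:-0x5}"
DELAY="${DELAY:-7}"                       # seconds to wait after each operation
CYCLES="${CYCLES:-10000}"

# Print the HMP command that removes the device with id $1.
hmp_del_cmd() {
    printf 'device_del %s\n' "$1"
}

# Print the HMP command that adds a balloon device with id $1 at PCI address $2.
hmp_add_cmd() {
    printf 'device_add virtio-balloon-pci,id=%s,addr=%s\n' "$1" "$2"
}

# Forward stdin to the monitor socket.
send_hmp() {
    nc -U "$MONITOR_SOCK"
}

# Unplug and re-plug the device CYCLES times, sleeping DELAY seconds after each step.
run_cycles() {
    i=1
    while [ "$i" -le "$CYCLES" ]; do
        hmp_del_cmd "$DEVICE_ID" | send_hmp
        sleep "$DELAY"
        hmp_add_cmd "$DEVICE_ID" "$PCI_ADDR" | send_hmp
        sleep "$DELAY"
        i=$((i + 1))
    done
}

# Only talk to a monitor when explicitly asked to.
if [ "${1:-}" = "run" ]; then
    run_cycles
fi
```

Saved under a hypothetical name such as hotplug-loop.sh, it could be started with `DELAY=2 CYCLES=1000 sh hotplug-loop.sh run`. A fixed sleep remains a guess at how long the guest needs to finish the surprise-removal path, which is why no single value is recommended here.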