Bug 1495070
| Summary: | [virtio-win][viostor]windows 2016 stuck/bsod when run iometer on AMD host | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | lijin <lijin> |
| Component: | virtio-win | Assignee: | Vadim Rozenfeld <vrozenfe> |
| virtio-win sub component: | virtio-win-prewhql | QA Contact: | Virtualization Bugs <virt-bugs> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | lijin, vrozenfe, yvugenfi |
| Version: | 7.4 | ||
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | NO_DOCS | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-30 16:21:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1558351 | ||
windbg info:
kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000001, The system cumulatively spent an extended period of time at
DISPATCH_LEVEL or above. The offending component can usually be
identified with a stack trace.
Arg2: 0000000000001e00, The watchdog period.
Arg3: 0000000000000000
Arg4: 0000000000000000
Debugging Details:
------------------
Page 4251e1 not present in the dump file. Type ".hh dbgerr004" for details
DPC_TIMEOUT_TYPE: DPC_QUEUE_EXECUTION_TIMEOUT_EXCEEDED
DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT
BUGCHECK_STR: 0x133
PROCESS_NAME: Dynamo.exe
CURRENT_IRQL: d
ANALYSIS_VERSION: 6.3.9600.16384 (debuggers(dbg).130821-1623) amd64fre
LAST_CONTROL_TRANSFER: from fffff8022f823102 to fffff8022f7cd510
STACK_TEXT:
ffff8681`27794d88 fffff802`2f823102 : 00000000`00000133 00000000`00000001 00000000`00001e00 00000000`00000000 : nt!KeBugCheckEx
ffff8681`27794d90 fffff802`2f74e608 : 000000b3`c5b45d34 000000b3`c5b45957 fffff780`00000320 fffff802`2f7c9ae0 : nt! ?? ::FNODOBFM::`string'+0x46762
ffff8681`27794df0 fffff802`2f6114e5 : ffffc68a`9028c700 ffff8681`27719180 ffffc68a`92d43300 ffff8681`27719180 : nt!KeClockInterruptNotify+0xb8
ffff8681`27794f40 fffff802`2f6da357 : ffff8681`2754cc80 00000000`00000000 fffff898`1ea0f6a6 fffff802`2f7ceb65 : hal!HalpTimerClockIpiRoutine+0x15
ffff8681`27794f70 fffff802`2f7ceb8a : ffffc68a`9028c700 ffffc68a`92d43300 00000000`00000000 ffffc68a`92c01ef0 : nt!KiCallInterruptServiceRoutine+0x87
ffff8681`27794fb0 fffff802`2f7cefd7 : ffff8203`2f91ac60 00000000`0000018a ffff8500`00000000 ffffc68a`911cbc00 : nt!KiInterruptSubDispatchNoLockNoEtw+0xea
ffff8681`2778d060 fffff802`2f6e20fb : ffff8203`2f91ac60 00000000`a000000c 00000000`a0000003 ffff8681`2778d2a9 : nt!KiInterruptDispatchNoLockNoEtw+0x37
ffff8681`2778d1f0 fffff802`2fd83f65 : ffff8203`2f91ac60 ffff8681`2778d301 ffffffff`ffffffd2 fffff802`2f6122f5 : nt!IopfCompleteRequest+0x84b
ffff8681`2778d310 fffff80c`908045cb : 00000000`00000000 00000000`00000800 ffff8203`2f91ac60 ffffc68a`93147b60 : nt!IovCompleteRequest+0x1c1
ffff8681`2778d3f0 fffff802`2fd84593 : ffff8681`2778d818 ffff8203`2f370e50 ffff8681`2778d818 ffff8203`2f370f68 : CLASSPNP!TransferPktComplete+0x4ab
ffff8681`2778d640 fffff802`2f6e19c2 : ffff8203`2f370e50 ffffc68a`00000001 ffff8681`2778d759 ffffc68a`92fc32b8 : nt!IovpLocalCompletionRoutine+0x16f
ffff8681`2778d6a0 fffff802`2fd83f65 : ffff8203`2f370e50 ffff8681`2778d839 ffffc68a`931c9cf0 ffffc68a`92fe2698 : nt!IopfCompleteRequest+0x112
ffff8681`2778d7c0 fffff80c`900c695e : ffff8681`28e10010 fffff80c`900c4917 ffff8203`2f370e50 00000000`00000000 : nt!IovCompleteRequest+0x1c1
ffff8681`2778d8a0 fffff80c`900c613b : ffff8681`28dd4010 ffff8203`2f370e50 ffff8681`2778da70 00000000`00000000 : storport!RaidCompleteRequestEx+0x8e
ffff8681`2778d970 fffff80c`900c5a6a : 00000000`00000000 ffffc68a`915f01a0 ffff8681`28dd4010 00000000`00000001 : storport!RaidUnitCompleteRequest+0x59b
ffff8681`2778db00 fffff802`2f6e70a1 : ffff8681`2778dce0 00000000`00030000 ffff8681`27719180 ffff8681`2778df40 : storport!RaidpAdapterDpcRoutine+0x10a
ffff8681`2778dbe0 fffff802`2f6e649f : ffffc68a`00000000 ffffc68a`921215c0 ffff8681`2778de30 00000000`00000002 : nt!KiExecuteAllDpcs+0x2b1
ffff8681`2778dd30 fffff802`2f7d25c5 : 00000000`00000000 ffff8681`27719180 00000000`00000000 00000000`00000014 : nt!KiRetireDpcList+0x5df
ffff8681`2778dfb0 fffff802`2f7d23d0 : 00000000`00000008 fffff80c`9045ffdc 00000000`00000000 fffff802`2f7c9ae0 : nt!KxRetireDpcList+0x5
ffff8681`2b7d3b80 fffff802`2f7d0cca : ffffc68a`918ba850 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiDispatchInterruptContinue
ffff8681`2b7d3bb0 fffff802`2f6f6fd2 : ffffc68a`9260d240 00000000`00000014 ffffc68a`9299e8e0 ffff8681`2b7d3e00 : nt!KiDpcInterrupt+0xca
ffff8681`2b7d3d40 fffff80c`9041bfee : 00000000`00000001 00000000`00000001 00000001`ffff0002 ffffc68a`9260d240 : nt!KeReleaseSpinLock+0x22
ffff8681`2b7d3d70 fffff80c`904195d5 : 00000000`00000000 ffffc68a`9299e8e0 ffffc68a`9299e8e0 ffffc68a`926e4160 : tcpip!TcpTcbSend+0x5be
ffff8681`2b7d4130 fffff80c`9041929a : 00000000`00035b35 00000000`00369e99 00000000`00000003 00000000`00000000 : tcpip!TcpEnqueueTcbSendOlmNotifySendComplete+0xa5
ffff8681`2b7d4160 fffff80c`90418ddb : fffff880`005782e0 ffff8541`0197a730 ffff8681`2b7d4b01 fffff802`2f7113a1 : tcpip!TcpEnqueueTcbSend+0x30a
ffff8681`2b7d4260 fffff802`2f711325 : ffff8681`2b7d4b01 ffff8681`2b7d4360 ffff8681`2b7d47a0 fffff80c`90418db0 : tcpip!TcpTlConnectionSendCalloutRoutine+0x2b
ffff8681`2b7d42e0 fffff80c`90461aa6 : ffffc68a`921b2260 00000000`00000000 00000000`00000000 ffffc68a`918552d0 : nt!KeExpandKernelStackAndCalloutInternal+0x85
ffff8681`2b7d4330 fffff80c`90cfa4c1 : ffffc68a`921b2260 ffff8681`2b7d4b80 00000000`00000008 00000000`00000008 : tcpip!TcpTlConnectionSend+0x76
ffff8681`2b7d43a0 fffff80c`90ce1ebd : ffff8203`2f9c2e50 ffff8203`00000000 ffffc68a`912a2080 ffff8681`2b7d4600 : afd!AfdFastConnectionSend+0x3a1
ffff8681`2b7d4560 fffff802`2fb108c3 : 00000000`00000000 ffffc68a`92c21970 00000000`0001201f fffff802`2f610c25 : afd!AfdFastIoDeviceControl+0x40d
ffff8681`2b7d48e0 fffff802`2fb10536 : 00000000`00000000 00000000`0000020c 00000000`00000001 00000000`00000000 : nt!IopXxxControlFile+0x383
ffff8681`2b7d4a20 fffff802`2f7d8193 : 00000000`00000001 00000000`76e145d0 00000000`00172268 00000000`00000000 : nt!NtDeviceIoControlFile+0x56
ffff8681`2b7d4a90 00000000`76e1222c : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`0009ef68 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x76e1222c
STACK_COMMAND: kb
FOLLOWUP_IP:
CLASSPNP!TransferPktComplete+4ab
fffff80c`908045cb 4183fc02 cmp r12d,2
SYMBOL_STACK_INDEX: 9
SYMBOL_NAME: CLASSPNP!TransferPktComplete+4ab
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: CLASSPNP
IMAGE_NAME: CLASSPNP.SYS
DEBUG_FLR_IMAGE_TIMESTAMP: 57cf989c
BUCKET_ID_FUNC_OFFSET: 4ab
FAILURE_BUCKET_ID: 0x133_VRF_ISR_CLASSPNP!TransferPktComplete
BUCKET_ID: 0x133_VRF_ISR_CLASSPNP!TransferPktComplete
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:0x133_vrf_isr_classpnp!transferpktcomplete
FAILURE_ID_HASH: {d26a3ea4-fbeb-e820-2562-ef4c6fa8fb78}
Followup: MachineOwner
---------
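As a rough reading of the bugcheck arguments above, Arg2 (the watchdog period) is a hexadecimal count of clock ticks. The sketch below converts it to seconds. The tick length is an assumption: 15.625 ms is a common Windows default, but the report does not state the guest's actual clock interval, so the seconds figure is only indicative.

```python
# Arg2 from the bugcheck output above: the watchdog period, in clock ticks (hex).
WATCHDOG_PERIOD_HEX = "0000000000001e00"

# ASSUMPTION: 15.625 ms per clock tick (64 Hz), a common Windows default.
# The report does not state the real clock interval of this guest.
ASSUMED_TICK_MS = 15.625


def watchdog_period_seconds(arg2_hex: str, tick_ms: float = ASSUMED_TICK_MS) -> float:
    """Convert the hexadecimal tick count in Arg2 to seconds."""
    ticks = int(arg2_hex, 16)
    return ticks * tick_ms / 1000.0


ticks = int(WATCHDOG_PERIOD_HEX, 16)
print(f"watchdog period: {ticks} ticks")
print(f"about {watchdog_period_seconds(WATCHDOG_PERIOD_HEX):.0f} s at the assumed tick length")
```

Under that assumption, the system spent on the order of two minutes cumulatively at DISPATCH_LEVEL or above before the watchdog fired.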
I will run more tests on an Intel host to check whether this is an AMD-host-only issue.

Tried 10 times with an IDE disk on the same AMD G4 host: did not hit this issue.
Tried 10 times on an AMD G3 host: hit it twice.
Tried 10 times on one Intel host: did not hit this issue.

Hi Li Jin,
Could you please repeat the above test without the hv_relaxed flag, but with the "-hypervisor" cpu flag specified?

Thanks,
Vadim.

(In reply to Vadim Rozenfeld from comment #6)
> Hi Li Jin,
> Could you please repeat the above test without hv_relaxed flag, but with
> "-hypervisor" cpu flag specified?
>
> Thanks,
> Vadim.

Ran 10 times, hit the DPC_WATCHDOG_VIOLATION BSOD once.

(In reply to lijin from comment #7)
> run 10 times,hit DPC_WATCHDOG_VIOLATION bsod once.

Thanks, can you post the relevant crash dump file?

Best regards,
Vadim.

Hi Li Jin.

Is it still the case with the latest drivers?

Thanks,
Vadim.

(In reply to Vadim Rozenfeld from comment #10)
> Hi Li Jin.
>
> Is it still the case with the latest drivers?
>
> Thanks,
> Vadim.

Tried with build 155, ran 20+ times, and did NOT hit this issue again.
qemu cli:
/usr/libexec/qemu-kvm \
    -S \
    -name 'avocado-vt-vm1' \
    -sandbox off \
    -machine pc \
    -nodefaults \
    -vga std \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_5esrpk/monitor-qmpmonitor1-20180705-051446-F4xN91QH,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_5esrpk/monitor-catch_monitor-20180705-051446-F4xN91QH,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idNuhVsB \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_5esrpk/serial-serial0-20180705-051446-F4xN91QH,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20180705-051446-F4xN91QH,path=/var/tmp/avocado_5esrpk/seabios-20180705-051446-F4xN91QH,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20180705-051446-F4xN91QH,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win2016-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -drive id=drive_disk1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/storage.qcow2 \
    -device scsi-hd,id=disk1,drive=drive_disk1 \
    -device virtio-net-pci,mac=9a:b0:b1:b2:b3:b4,id=idLVY2Lt,vectors=4,netdev=idOmtoIk,bus=pci.0,addr=0x5 \
    -netdev tap,id=idOmtoIk,vhost=on,vhostfd=11,fd=19 \
    -m 15360 \
    -smp 12,maxcpus=12,cores=6,threads=1,sockets=2 \
    -cpu 'Opteron_G5',+kvm_pv_unhalt,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time \
    -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso \
    -device scsi-cd,id=cd1,drive=drive_cd1 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm

Thanks a lot.
Can we move it to verified?

(In reply to Vadim Rozenfeld from comment #12)
> Thanks a lot.
> Can we move it to verified?

Sure

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3413
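For context on the verification, here is a back-of-the-envelope estimate of how likely 20 clean runs would be if the bug were still present. It is only a sketch. It assumes that runs are independent and that the per-run failure rates seen earlier in this report (about 30% in the description, 2 of 10 on the AMD G3 host, 1 of 10 with "-hypervisor") would still apply to the verification setup. Neither assumption is confirmed anywhere in the report.

```python
def prob_all_clean(per_run_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass,
    when each run fails with probability `per_run_rate`."""
    return (1.0 - per_run_rate) ** runs


# Rates taken from the report: 30% (description), 2/10 (AMD G3 host),
# 1/10 (test with "-hypervisor"). Treating them as fixed per-run
# probabilities is an ASSUMPTION made for this estimate.
for rate in (0.30, 0.20, 0.10):
    print(f"rate={rate:.2f}  P(20 clean runs)={prob_all_clean(rate, 20):.4f}")
```

At a 30% rate, 20 clean runs in a row would be very unlikely by chance. At a 10% rate they would still happen roughly one time in eight, so the lower the original reproduction rate, the more runs are needed before a clean streak is convincing.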
Description of problem:

Version-Release number of selected component (if applicable):
virtio-win-prewhql-142
qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64
kernel-3.10.0-693.el7.x86_64
seabios-bin-1.10.2-3.el7_4.1.noarch

How reproducible:
30%

Steps to Reproduce:
1. Boot win2016 ***with*** hv_relaxed:
/usr/libexec/qemu-kvm \
    -S \
    -name 'avocado-vt-vm1' \
    -sandbox off \
    -machine pc \
    -nodefaults \
    -vga std \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_zoixn5/monitor-qmpmonitor1-20170924-235741-z2TJ3Xkb,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_zoixn5/monitor-catch_monitor-20170924-235741-z2TJ3Xkb,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id1jiZ94 \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_zoixn5/serial-serial0-20170924-235741-z2TJ3Xkb,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20170924-235741-z2TJ3Xkb,path=/var/tmp/avocado_zoixn5/seabios-20170924-235741-z2TJ3Xkb,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20170924-235741-z2TJ3Xkb,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win2016-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x3 \
    -drive id=drive_disk1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/storage.qcow2 \
    -device virtio-blk-pci,id=disk1,drive=drive_disk1,bootindex=1,bus=pci.0,addr=0x4 \
    -device virtio-net-pci,mac=9a:dc:dd:de:df:e0,id=iddEKWN0,vectors=4,netdev=idXQjRsV,bus=pci.0,addr=0x5 \
    -netdev tap,id=idXQjRsV,vhost=on,vhostfd=21,fd=20 \
    -m 16384 \
    -smp 16,cores=8,threads=1,sockets=2 \
    -cpu 'Opteron_G4',+kvm_pv_unhalt,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time \
    -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso \
    -device ide-cd,id=cd1,drive=drive_cd1,bootindex=2,bus=ide.0,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm
2. Run iometer in the guest:
# cmd /c Iometer.exe /c iometer.icf /r C:\autotest_iometer_result.csv
3. Boot win2016 ***without*** hv_relaxed.
4. Run iometer in the guest:
# cmd /c Iometer.exe /c iometer.icf /r C:\autotest_iometer_result.csv

Actual results:
Step 2: the guest hangs during the iometer run. Sometimes the hang lasts only about ten minutes and iometer then finishes; sometimes it lasts more than an hour. During the hang, the guest still responds to ping.
Step 4: the guest hits a BSOD with "DPC_WATCHDOG_VIOLATION".

Expected results:
No hang, no BSOD.

Additional info:
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          4
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Model name:            AMD Opteron(TM) Processor 6274
Stepping:              2
CPU MHz:               2200.089
BogoMIPS:              4400.17
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14
NUMA node1 CPU(s):     16,18,20,22,24,26,28,30
NUMA node2 CPU(s):     1,3,5,7,9,11,13,15
NUMA node3 CPU(s):     17,19,21,23,25,27,29,31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
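
The report gives the full command line only for the run with hv_relaxed. As an illustration of how the other test variants differ, the sketch below derives their `-cpu` arguments from the one in step 1. The helper `cpu_arg_variant` is hypothetical, and the resulting strings are an assumption about what was run: the report does not show the exact command lines used for step 3 or for the later "-hypervisor" test.

```python
def cpu_arg_variant(cpu_arg: str, drop=(), add=()) -> str:
    """Return a -cpu argument with some flags dropped and others appended.

    Hypothetical helper: the first comma-separated field is treated as the
    CPU model and is always kept; the remaining fields are flags.
    """
    parts = cpu_arg.split(",")
    model, flags = parts[0], parts[1:]
    kept = [flag for flag in flags if flag not in drop]
    return ",".join([model, *kept, *add])


# -cpu argument from step 1 of the description.
BASE = "'Opteron_G4',+kvm_pv_unhalt,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time"

# Step 3: the same guest without hv_relaxed (assumed form).
without_relaxed = cpu_arg_variant(BASE, drop=("hv_relaxed",))

# Later request in the comments: no hv_relaxed, plus "-hypervisor" (assumed form).
without_relaxed_no_hypervisor = cpu_arg_variant(
    BASE, drop=("hv_relaxed",), add=("-hypervisor",)
)

print(without_relaxed)
print(without_relaxed_no_hypervisor)
```

Keeping every other option identical between runs makes it easier to attribute a change in behavior to the CPU flags alone.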