Bug 1453090
Summary: | [virtio-win]DPC_WATCHDOG_VIOLATION error when do balloon stress on Win8.1-x86_64 guest | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Suqin Huang <shuang> | ||||
Component: | virtio-win | Assignee: | Amnon Ilan <ailan> | ||||
virtio-win sub component: | virtio-win-prewhql | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
Status: | CLOSED ERRATA | Docs Contact: | |||||
Severity: | unspecified | ||||||
Priority: | medium | CC: | ailan, ddepaula, lijin, lmiksik, mtessun, phou, vrozenfe, xiagao | ||||
Version: | 7.4 | ||||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2018-04-10 06:28:08 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1473046 | ||||||
Attachments: |
|
following is the windbg info,looks not balloon issue 0: kd> !analyze -v ******************************************************************************* * * * Bugcheck Analysis * * * ******************************************************************************* DPC_WATCHDOG_VIOLATION (133) The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL or above. Arguments: Arg1: 0000000000000001, The system cumulatively spent an extended period of time at DISPATCH_LEVEL or above. The offending component can usually be identified with a stack trace. Arg2: 0000000000001e00, The watchdog period. Arg3: 0000000000000000 Arg4: 0000000000000000 Debugging Details: ------------------ DPC_TIMEOUT_TYPE: DPC_QUEUE_EXECUTION_TIMEOUT_EXCEEDED DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT BUGCHECK_STR: 0x133 PROCESS_NAME: System CURRENT_IRQL: d ANALYSIS_VERSION: 6.3.9600.16384 (debuggers(dbg).130821-1623) amd64fre LAST_CONTROL_TRANSFER: from fffff8009fc1bcf0 to fffff8009fbd59a0 STACK_TEXT: fffff800`a146fc88 fffff800`9fc1bcf0 : 00000000`00000133 00000000`00000001 00000000`00001e00 00000000`00000000 : nt!KeBugCheckEx fffff800`a146fc90 fffff800`9fb400e7 : 00000000`00000000 00000000`00000000 00000000`00000001 fffff800`9fbd77a8 : nt! ?? ::FNODOBFM::`string'+0x35e40 fffff800`a146fd20 fffff800`9fa1d67f : fffff800`6a28c020 00000000`00000000 fffff800`9fa699b0 00000000`00000000 : nt!KeClockInterruptNotify+0x787 fffff800`a146ff40 fffff800`9fb674f3 : fffff800`9fa69900 00000000`00000000 ffffd001`751823c0 fffff800`9fbd6a87 : hal!HalpTimerClockInterrupt+0x4f fffff800`a146ff70 fffff800`9fbd6e2a : fffff800`9fa69900 00000000`c00000bb 00000000`00000002 ffffe000`2c530440 : nt!KiCallInterruptServiceRoutine+0xa3 fffff800`a146ffb0 fffff800`9fbd779b : 00000000`00000000 00000000`000005b4 ffffe000`000005b4 00000000`00000000 : nt!KiInterruptSubDispatchNoLockNoEtw+0xea fffff800`a1467230 fffff800`6a120d88 : ffffe000`2c5c8ec0 fffff800`a14674c0 ffffe000`2c5c8ec0 ffffe000`2b3e7980 : nt!KiInterruptDispatchNoLockNoEtw+0xfb fffff800`a14673c0 fffff800`6a107571 : ffffe000`00000000 ffffe000`2b798010 fffff800`a1467559 fffff800`a14675f0 : tcpip!IpNlpSetPathInfo+0x616 fffff800`a14674a0 fffff800`6a106d58 : ffffe000`2c3550a0 00000000`7b0e6809 fffff800`a1467930 ffffe000`2c3550a0 : tcpip!TcpBwUpdateEre+0x59c fffff800`a14675c0 fffff800`6a10afa7 : fffff800`a1467740 fffff800`a1467930 00000004`ce7b1f2a 00000000`000005b4 : tcpip!TcpBwManageTrackedData+0x128 fffff800`a1467640 fffff800`6a10c787 : ffffe000`2bc88d08 ffffe000`2bb1d1a0 fffff800`00000000 00000000`00000001 : tcpip!TcpTcbFastDatagram+0x757 fffff800`a1467890 fffff800`6a10d370 : 00000000`00000000 fffff800`6a1453bf ffffe000`29ca6650 ffffe000`2b69dbb0 : tcpip!TcpTcbReceive+0x247 fffff800`a14679e0 fffff800`6a10c123 : ffffd001`77162022 ffffe000`2b73a140 00000000`00000000 ffffe000`2b673f40 : tcpip!TcpMatchReceive+0x1f0 fffff800`a1467b70 fffff800`6a141f9f : ffffe000`2b773470 fffff800`a1467e99 fffff800`a14620c0 00000000`00000000 : tcpip!TcpPreValidatedReceive+0x363 fffff800`a1467c70 fffff800`6a13e143 : fffff800`00000000 fffff800`00000000 ffffe000`ffffffff fffff800`00000000 : tcpip!IppDeliverListToProtocol+0x4f fffff800`a1467d30 fffff800`6a13c525 : ffffe000`2bc137b0 00000000`00000000 ffffe000`2b5bd700 fffff800`a1467e38 : tcpip!IppProcessDeliverList+0x63 fffff800`a1467dd0 fffff800`6a13a900 : ffffe000`2bb1d1a0 ffffe000`2b803c80 fffff800`6a28c180 ffffe000`2b73a000 : tcpip!IppReceiveHeaderBatch+0x235 fffff800`a1467f00 fffff800`6a138d22 : ffffe000`2b9c2510 00000000`00000000 fffff800`a1468301 00000000`00000000 : tcpip!IppFlcReceivePacketsCore+0x680 fffff800`a1468280 fffff800`6a139745 : ffffe000`2b9e0002 00000000`00000000 fffff800`6a139790 ffffe000`00000101 : tcpip!FlpReceiveNonPreValidatedNetBufferListChain+0x318 fffff800`a1468360 fffff800`9fac3873 : 00000000`0000000a 00000000`00000000 ffffe000`2b78c910 fffff800`a1463000 : tcpip!FlReceiveNetBufferListChainCalloutRoutine+0x155 fffff800`a1468490 fffff800`6a139936 : fffff800`6a1395f0 fffff800`a14685b0 00000000`00000010 00000000`00000001 : nt!KeExpandKernelStackAndCalloutInternal+0xf3 fffff800`a1468580 fffff800`69e45a53 : 00000000`00000000 fffff800`a1468661 00000000`00000006 ffffe000`2bbb3d38 : tcpip!FlReceiveNetBufferListChain+0xb6 fffff800`a1468600 fffff800`69e45e7f : 00000000`00000001 00000000`00000000 00000000`00000000 00000000`00000006 : ndis!ndisMIndicateNetBufferListsToOpen+0x123 fffff800`a14686c0 fffff800`69e466b2 : ffffe000`2bb1d1a0 ffffe563`aa3a9e01 fffff800`69e524e0 fffff800`6ab7db67 : ndis!ndisMTopReceiveNetBufferLists+0x22f fffff800`a1468750 fffff800`6ab75f81 : 00000000`0000000a ffffe000`2bbaf910 ffffe000`2bba9548 fffff800`6a13368c : ndis!NdisMIndicateReceiveNetBufferLists+0x732 fffff800`a1468940 fffff800`6ab73358 : fffff800`0000000a ffffe000`29ca6650 fffff800`00000000 ffffe000`2b803c80 : netkvm+0x4f81 fffff800`a14689b0 fffff800`6ab7f988 : ffffe000`2bba9000 fffff800`a1468ad0 00000000`00000000 00000000`00000007 : netkvm+0x2358 fffff800`a1468a10 fffff800`69e541b8 : ffffe000`2b7a4980 fffff800`a1468a78 fffff800`00000000 00001f80`00e5026a : netkvm+0xe988 fffff800`a1468a70 fffff800`69e544b3 : ffffe000`2bc12528 00000000`0000000a 00000000`00000001 00000000`00000000 : ndis!ndisMiniportDpc+0x110 fffff800`a1468b10 fffff800`69e6680e : 00000004`00000005 ffffe000`2b6073d0 fffff800`a1468ea0 ffffe000`2bc12e68 : ndis!ndisQueuedMiniportDpcWorkItem+0xfb fffff800`a1468bb0 fffff800`9fb4fcd0 : fffff800`9fd8af00 fffff800`a1468ea0 00000000`00000000 00000000`00000000 : ndis!ndisPeriodicReceivesTimer+0x12a fffff800`a1468be0 fffff800`9fb4ef87 : ffffe000`2c7d9080 fffff800`a1468e30 fffff800`9fd88180 fffff800`9fd88180 : nt!KiExecuteAllDpcs+0x1b0 fffff800`a1468d30 fffff800`9fbd8ad5 : 00000000`00000000 fffff800`9fd88180 fffff800`9fa69900 00000000`00000000 : nt!KiRetireDpcList+0xd7 fffff800`a1468fb0 fffff800`9fbd88d9 : fffff800`9fd88180 fffff800`9fbd88ef ffffe000`28d1d300 ffffd001`74ecddd0 : nt!KxRetireDpcList+0x5 ffffd001`74ecd8a0 fffff800`9fbdab45 : fffff800`9fa69900 fffff800`9fbd7813 ffffd001`74ecd8f0 00000000`00000018 : nt!KiDispatchInterruptContinue ffffd001`74ecd8d0 fffff800`9fbd7813 : ffffd001`74ecd8f0 00000000`00000018 ffffe000`2c31b880 fffff800`9fd88180 : nt!KiDpcInterruptBypass+0x25 ffffd001`74ecd8e0 fffff800`9fab6da9 : fffff800`9fd88180 ffffe000`28d1d300 ffffd001`fffffffe fffff800`fffffffe : nt!KiInterruptDispatchNoLockNoEtw+0x173 ffffd001`74ecda70 fffff800`9fab67f9 : 00000000`00000001 fffff800`9fd8af28 00000000`00000000 00000000`00000001 : nt!KiSwapThread+0x179 ffffd001`74ecdb10 fffff800`9faf667f : 00000000`00000000 00000000`00000005 00000000`00000000 ffffe000`28d1d300 : nt!KiCommitThreadWait+0x129 ffffd001`74ecdb90 fffff800`9fbb45b0 : fffff800`9fd88180 00000000`00000000 ffffe000`28d1d300 00bb0000`00000026 : nt!KeWaitForGate+0x10b ffffd001`74ecdbe0 fffff800`9fb86c70 : 00530059`0053000a ffffe000`28d1d300 00000000`00000080 00010000`0000b8c8 : nt!KiExecuteDpc+0x74 ffffd001`74ecdd40 fffff800`9fbdbfc6 : ffffd001`78bc7180 ffffe000`28d1d300 ffffd001`78bd33c0 00310047`004f004c : nt!PspSystemThreadStartup+0x58 ffffd001`74ecdda0 00000000`00000000 : ffffd001`74ece000 ffffd001`74ec8000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16 STACK_COMMAND: kb FOLLOWUP_IP: tcpip!IpNlpSetPathInfo+616 fffff800`6a120d88 488b0d91b21600 mov rcx,qword ptr [tcpip!WPP_GLOBAL_Control (fffff800`6a28c020)] SYMBOL_STACK_INDEX: 7 SYMBOL_NAME: tcpip!IpNlpSetPathInfo+616 FOLLOWUP_NAME: MachineOwner MODULE_NAME: tcpip IMAGE_NAME: tcpip.sys DEBUG_FLR_IMAGE_TIMESTAMP: 54505542 BUCKET_ID_FUNC_OFFSET: 616 FAILURE_BUCKET_ID: 0x133_VRF_ISR_tcpip!IpNlpSetPathInfo BUCKET_ID: 0x133_VRF_ISR_tcpip!IpNlpSetPathInfo ANALYSIS_SOURCE: KM FAILURE_ID_HASH_STRING: km:0x133_vrf_isr_tcpip!ipnlpsetpathinfo FAILURE_ID_HASH: {d0da75d6-6f5f-c4bc-a004-0c20d177c7c6} Followup: MachineOwner --------- It may still be caused by balloon, even if it's not on the stack when the bugcheck fired. Could you please try this a few more times to see if you can get dumps with balloon on the stack? One thing we may improve in the driver is make sure that we don't hold locks when notifying virtqueues. If for some reason the host takes longer to process the virtqueue, the driver is stuck in the kick I/O with a spinlock held -- and that means IRLQ = DISPATCH_LEVEL. virtqueue_kick can be split into virtqueue_kick_prepare (must run under lock) and virtqueue_notify (may run at PASSIVE_LEVEL). Commit db6a904 is an example of this. (In reply to Ladi Prosek from comment #6) > It may still be caused by balloon, even if it's not on the stack when the > bugcheck fired. Could you please try this a few more times to see if you can > get dumps with balloon on the stack? Could you give some suggestion how to reproduce it for getting dumps with balloon on the stack? Thanks, xiagao > > One thing we may improve in the driver is make sure that we don't hold locks > when notifying virtqueues. If for some reason the host takes longer to > process the virtqueue, the driver is stuck in the kick I/O with a spinlock > held -- and that means IRLQ = DISPATCH_LEVEL. > > virtqueue_kick can be split into virtqueue_kick_prepare (must run under > lock) and virtqueue_notify (may run at PASSIVE_LEVEL). Commit db6a904 is an > example of this. (In reply to xiagao from comment #7) > (In reply to Ladi Prosek from comment #6) > > It may still be caused by balloon, even if it's not on the stack when the > > bugcheck fired. Could you please try this a few more times to see if you can > > get dumps with balloon on the stack? > > Could you give some suggestion how to reproduce it for getting dumps with > balloon on the stack? We just need to keep trying, using the steps in comment 0, and hope balloon will be on the stack when the BSOD occurs. I don't think there's anything we can do to help it. Can you please get a few more dumps and share them with me? Thanks! (In reply to Ladi Prosek from comment #8) > (In reply to xiagao from comment #7) > > (In reply to Ladi Prosek from comment #6) > > > It may still be caused by balloon, even if it's not on the stack when the > > > bugcheck fired. Could you please try this a few more times to see if you can > > > get dumps with balloon on the stack? > > > > Could you give some suggestion how to reproduce it for getting dumps with > > balloon on the stack? > > We just need to keep trying, using the steps in comment 0, and hope balloon > will be on the stack when the BSOD occurs. I don't think there's anything we > can do to help it. > > Can you please get a few more dumps and share them with me? Thanks! I have tried 150 times and have not hit this issue. (In reply to xiagao from comment #9) > I have tried 150 times and have not hit this issue. Bummer. I'll do one more pass over the source code to see if there's anything else that we can improve. If nothing else, I'll at least split virtqueue_kick per comment 6. Thanks! package info: virtio-win-1.9.3-1.el7.iso qemu-kvm-rhev-2.9.0-16.el7_4.5.x86_64 kernel-3.10.0-693.el7.x86_64 seabios-bin-1.10.2-3.el7_4.1.noarch I file bz1495070 to track the iometer issue The virtqueue locking tweak outlined in comment 6 has been committed as: https://github.com/virtio-win/kvm-guest-drivers-windows/commit/75d2bd1d35aae39389b05b12191142ae0ed75906 Test Used Versions: kernel-3.10.0-776.el7.x86_64 qemu-kvm-rhev-2.10.0-6.el7.x86_64 seabios-bin-1.11.0-1.el7.noarch virtio-win-prewhql-143 Host info: [root@amd-9600b-8-1 ~]# lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 1 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: AuthenticAMD CPU family: 16 Model: 2 Model name: AMD Phenom(tm) 9600B Quad-Core Processor Stepping: 3 CPU MHz: 2300.000 CPU max MHz: 2300.0000 CPU min MHz: 1150.0000 BogoMIPS: 4609.93 Virtualization: AMD-V L1d cache: 64K L1i cache: 64K L2 cache: 512K L3 cache: 2048K NUMA node0 CPU(s): 0-3 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc art rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs hw_pstate vmmcall npt lbrv svm_lock Thanks~ change status to verified according to comment#23 and comment#24 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0657 |
Created attachment 1280908 [details] error info Description of problem: Version-Release number of selected component (if applicable): virtio-win-prewhql-0.1-137 qemu-kvm-rhev-2.9.0-4.el7.x86_64 How reproducible: some time Steps to Reproduce: 1. run video in background /usr/libexec/qemu-kvm \ -S \ -name 'avocado-vt-vm1' \ -sandbox off \ -machine pc \ -nodefaults \ -vga std \ -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_VSspUc/monitor-qmpmonitor1-20170522-013229-Q6Zfqcvc,server,nowait \ -mon chardev=qmp_id_qmpmonitor1,mode=control \ -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_VSspUc/monitor-catch_monitor-20170522-013229-Q6Zfqcvc,server,nowait \ -mon chardev=qmp_id_catch_monitor,mode=control \ -device pvpanic,ioport=0x505,id=idf4QrVI \ -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_VSspUc/serial-serial0-20170522-013229-Q6Zfqcvc,server,nowait \ -device isa-serial,chardev=serial_id_serial0 \ -chardev socket,id=seabioslog_id_20170522-013229-Q6Zfqcvc,path=/var/tmp/avocado_VSspUc/seabios-20170522-013229-Q6Zfqcvc,server,nowait \ -device isa-debugcon,chardev=seabioslog_id_20170522-013229-Q6Zfqcvc,iobase=0x402 \ -device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \ -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0,firstport=0,bus=pci.0 \ -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.2,firstport=2,bus=pci.0 \ -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.4,firstport=4,bus=pci.0 \ -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win8-64.1-virtio.qcow2 \ -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x3 \ -device virtio-net-pci,mac=9a:63:64:65:66:67,id=idcudggr,vectors=4,netdev=idRUhH89,bus=pci.0,addr=0x4 \ -netdev tap,id=idRUhH89,vhost=on,vhostfd=20,fd=19 \ -m 8192 \ -smp 8,cores=4,threads=1,sockets=2 \ -cpu 'Opteron_G3',+kvm_pv_unhalt,hv_spinlocks=0x1fff,hv_vapic,hv_time \ -drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso \ -device ide-cd,id=cd1,drive=drive_cd1,bootindex=1,bus=ide.0,unit=0 \ -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \ -vnc :0 \ -rtc base=localtime,clock=host,driftfix=slew \ -boot order=cdn,once=c,menu=off,strict=off \ -enable-kvm \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 2. repeat balloon test with random mem 500 times {'execute': 'balloon', 'arguments': {'value': 7796162560}, 'id': 'fR5eEZSM'} 3. Actual results: DPC_WATCHDOG_VIOLATION error Expected results: Additional info: Guest: Win8.1-x86_64 host: processor : 15 vendor_id : AuthenticAMD cpu family : 16 model : 9 model name : AMD Opteron(tm) Processor 6128 stepping : 1 microcode : 0x10000d9 cpu MHz : 2000.089 cache size : 512 KB physical id : 1 siblings : 8 core id : 3 cpu cores : 8 apicid : 39 initial apicid : 23 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate npt lbrv svm_lock nrip_save pausefilter