Bug 1495070 - [virtio-win][viostor]windows 2016 stuck/bsod when run iometer on AMD host
Summary: [virtio-win][viostor]windows 2016 stuck/bsod when run iometer on AMD host
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: virtio-win
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Vadim Rozenfeld
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1558351
Reported: 2017-09-25 06:11 UTC by lijin
Modified: 2018-10-30 16:23 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
NO_DOCS
Clone Of:
Environment:
Last Closed: 2018-10-30 16:21:49 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2018:3413 | 0 | None | None | None | 2018-10-30 16:23:53 UTC

Description lijin 2017-09-25 06:11:43 UTC
Description of problem:


Version-Release number of selected component (if applicable):
virtio-win-prewhql-142
qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64
kernel-3.10.0-693.el7.x86_64
seabios-bin-1.10.2-3.el7_4.1.noarch

How reproducible:
30%

Steps to Reproduce:
1. Boot win2016 ***with*** hv_relaxed:
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga std  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_zoixn5/monitor-qmpmonitor1-20170924-235741-z2TJ3Xkb,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_zoixn5/monitor-catch_monitor-20170924-235741-z2TJ3Xkb,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id1jiZ94  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_zoixn5/serial-serial0-20170924-235741-z2TJ3Xkb,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20170924-235741-z2TJ3Xkb,path=/var/tmp/avocado_zoixn5/seabios-20170924-235741-z2TJ3Xkb,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20170924-235741-z2TJ3Xkb,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win2016-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x3 \
    -drive id=drive_disk1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/storage.qcow2 \
    -device virtio-blk-pci,id=disk1,drive=drive_disk1,bootindex=1,bus=pci.0,addr=0x4 \
    -device virtio-net-pci,mac=9a:dc:dd:de:df:e0,id=iddEKWN0,vectors=4,netdev=idXQjRsV,bus=pci.0,addr=0x5  \
    -netdev tap,id=idXQjRsV,vhost=on,vhostfd=21,fd=20 \
    -m 16384  \
    -smp 16,cores=8,threads=1,sockets=2  \
    -cpu 'Opteron_G4',+kvm_pv_unhalt,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time \
    -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso \
    -device ide-cd,id=cd1,drive=drive_cd1,bootindex=2,bus=ide.0,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm


2. Run Iometer in the guest:
# cmd /c Iometer.exe /c iometer.icf /r C:\autotest_iometer_result.csv

3. Boot win2016 ***without*** hv_relaxed.

4. Run Iometer in the guest:
# cmd /c Iometer.exe /c iometer.icf /r C:\autotest_iometer_result.csv
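
Steps 1 and 3 differ only in whether hv_relaxed appears in the -cpu argument. A minimal Python sketch of toggling that one flag, assuming the rest of the command line stays unchanged; the helper name `cpu_arg` is hypothetical and is not part of QEMU or avocado-vt:

```python
# Hypothetical helper: build the value passed to qemu-kvm's -cpu option.
# The flag list is copied from the command line in step 1.
BASE_FLAGS = ["+kvm_pv_unhalt", "hv_relaxed", "hv_spinlocks=0x1fff", "hv_vapic", "hv_time"]

def cpu_arg(model, flags, drop=()):
    """Return 'model,flag1,flag2,...' with any flags named in `drop` removed."""
    kept = [f for f in flags if f not in drop]
    return ",".join([model] + kept)

with_relaxed = cpu_arg("Opteron_G4", BASE_FLAGS)                           # step 1
without_relaxed = cpu_arg("Opteron_G4", BASE_FLAGS, drop={"hv_relaxed"})   # step 3
print(with_relaxed)
print(without_relaxed)
```

Keeping every other option identical between the two boots makes it easier to attribute a change in behavior to hv_relaxed alone.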


Actual results:
Step 2: the guest hangs during the Iometer run. Sometimes the hang lasts only about ten minutes and the Iometer test then finishes; sometimes it lasts more than an hour. During the hang, the guest still responds to ping.
Step 4: the guest hits a BSOD with "DPC_WATCHDOG_VIOLATION".

Expected results:
No hang, no BSOD.

Additional info:
# lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          4
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Model name:            AMD Opteron(TM) Processor 6274
Stepping:              2
CPU MHz:               2200.089
BogoMIPS:              4400.17
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14
NUMA node1 CPU(s):     16,18,20,22,24,26,28,30
NUMA node2 CPU(s):     1,3,5,7,9,11,13,15
NUMA node3 CPU(s):     17,19,21,23,25,27,29,31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold

Comment 2 lijin 2017-09-25 06:12:59 UTC
windbg info:
kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000001, The system cumulatively spent an extended period of time at
	DISPATCH_LEVEL or above. The offending component can usually be
	identified with a stack trace.
Arg2: 0000000000001e00, The watchdog period.
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------

Page 4251e1 not present in the dump file. Type ".hh dbgerr004" for details

DPC_TIMEOUT_TYPE:  DPC_QUEUE_EXECUTION_TIMEOUT_EXCEEDED

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  0x133

PROCESS_NAME:  Dynamo.exe

CURRENT_IRQL:  d

ANALYSIS_VERSION: 6.3.9600.16384 (debuggers(dbg).130821-1623) amd64fre

LAST_CONTROL_TRANSFER:  from fffff8022f823102 to fffff8022f7cd510

STACK_TEXT:  
ffff8681`27794d88 fffff802`2f823102 : 00000000`00000133 00000000`00000001 00000000`00001e00 00000000`00000000 : nt!KeBugCheckEx
ffff8681`27794d90 fffff802`2f74e608 : 000000b3`c5b45d34 000000b3`c5b45957 fffff780`00000320 fffff802`2f7c9ae0 : nt! ?? ::FNODOBFM::`string'+0x46762
ffff8681`27794df0 fffff802`2f6114e5 : ffffc68a`9028c700 ffff8681`27719180 ffffc68a`92d43300 ffff8681`27719180 : nt!KeClockInterruptNotify+0xb8
ffff8681`27794f40 fffff802`2f6da357 : ffff8681`2754cc80 00000000`00000000 fffff898`1ea0f6a6 fffff802`2f7ceb65 : hal!HalpTimerClockIpiRoutine+0x15
ffff8681`27794f70 fffff802`2f7ceb8a : ffffc68a`9028c700 ffffc68a`92d43300 00000000`00000000 ffffc68a`92c01ef0 : nt!KiCallInterruptServiceRoutine+0x87
ffff8681`27794fb0 fffff802`2f7cefd7 : ffff8203`2f91ac60 00000000`0000018a ffff8500`00000000 ffffc68a`911cbc00 : nt!KiInterruptSubDispatchNoLockNoEtw+0xea
ffff8681`2778d060 fffff802`2f6e20fb : ffff8203`2f91ac60 00000000`a000000c 00000000`a0000003 ffff8681`2778d2a9 : nt!KiInterruptDispatchNoLockNoEtw+0x37
ffff8681`2778d1f0 fffff802`2fd83f65 : ffff8203`2f91ac60 ffff8681`2778d301 ffffffff`ffffffd2 fffff802`2f6122f5 : nt!IopfCompleteRequest+0x84b
ffff8681`2778d310 fffff80c`908045cb : 00000000`00000000 00000000`00000800 ffff8203`2f91ac60 ffffc68a`93147b60 : nt!IovCompleteRequest+0x1c1
ffff8681`2778d3f0 fffff802`2fd84593 : ffff8681`2778d818 ffff8203`2f370e50 ffff8681`2778d818 ffff8203`2f370f68 : CLASSPNP!TransferPktComplete+0x4ab
ffff8681`2778d640 fffff802`2f6e19c2 : ffff8203`2f370e50 ffffc68a`00000001 ffff8681`2778d759 ffffc68a`92fc32b8 : nt!IovpLocalCompletionRoutine+0x16f
ffff8681`2778d6a0 fffff802`2fd83f65 : ffff8203`2f370e50 ffff8681`2778d839 ffffc68a`931c9cf0 ffffc68a`92fe2698 : nt!IopfCompleteRequest+0x112
ffff8681`2778d7c0 fffff80c`900c695e : ffff8681`28e10010 fffff80c`900c4917 ffff8203`2f370e50 00000000`00000000 : nt!IovCompleteRequest+0x1c1
ffff8681`2778d8a0 fffff80c`900c613b : ffff8681`28dd4010 ffff8203`2f370e50 ffff8681`2778da70 00000000`00000000 : storport!RaidCompleteRequestEx+0x8e
ffff8681`2778d970 fffff80c`900c5a6a : 00000000`00000000 ffffc68a`915f01a0 ffff8681`28dd4010 00000000`00000001 : storport!RaidUnitCompleteRequest+0x59b
ffff8681`2778db00 fffff802`2f6e70a1 : ffff8681`2778dce0 00000000`00030000 ffff8681`27719180 ffff8681`2778df40 : storport!RaidpAdapterDpcRoutine+0x10a
ffff8681`2778dbe0 fffff802`2f6e649f : ffffc68a`00000000 ffffc68a`921215c0 ffff8681`2778de30 00000000`00000002 : nt!KiExecuteAllDpcs+0x2b1
ffff8681`2778dd30 fffff802`2f7d25c5 : 00000000`00000000 ffff8681`27719180 00000000`00000000 00000000`00000014 : nt!KiRetireDpcList+0x5df
ffff8681`2778dfb0 fffff802`2f7d23d0 : 00000000`00000008 fffff80c`9045ffdc 00000000`00000000 fffff802`2f7c9ae0 : nt!KxRetireDpcList+0x5
ffff8681`2b7d3b80 fffff802`2f7d0cca : ffffc68a`918ba850 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiDispatchInterruptContinue
ffff8681`2b7d3bb0 fffff802`2f6f6fd2 : ffffc68a`9260d240 00000000`00000014 ffffc68a`9299e8e0 ffff8681`2b7d3e00 : nt!KiDpcInterrupt+0xca
ffff8681`2b7d3d40 fffff80c`9041bfee : 00000000`00000001 00000000`00000001 00000001`ffff0002 ffffc68a`9260d240 : nt!KeReleaseSpinLock+0x22
ffff8681`2b7d3d70 fffff80c`904195d5 : 00000000`00000000 ffffc68a`9299e8e0 ffffc68a`9299e8e0 ffffc68a`926e4160 : tcpip!TcpTcbSend+0x5be
ffff8681`2b7d4130 fffff80c`9041929a : 00000000`00035b35 00000000`00369e99 00000000`00000003 00000000`00000000 : tcpip!TcpEnqueueTcbSendOlmNotifySendComplete+0xa5
ffff8681`2b7d4160 fffff80c`90418ddb : fffff880`005782e0 ffff8541`0197a730 ffff8681`2b7d4b01 fffff802`2f7113a1 : tcpip!TcpEnqueueTcbSend+0x30a
ffff8681`2b7d4260 fffff802`2f711325 : ffff8681`2b7d4b01 ffff8681`2b7d4360 ffff8681`2b7d47a0 fffff80c`90418db0 : tcpip!TcpTlConnectionSendCalloutRoutine+0x2b
ffff8681`2b7d42e0 fffff80c`90461aa6 : ffffc68a`921b2260 00000000`00000000 00000000`00000000 ffffc68a`918552d0 : nt!KeExpandKernelStackAndCalloutInternal+0x85
ffff8681`2b7d4330 fffff80c`90cfa4c1 : ffffc68a`921b2260 ffff8681`2b7d4b80 00000000`00000008 00000000`00000008 : tcpip!TcpTlConnectionSend+0x76
ffff8681`2b7d43a0 fffff80c`90ce1ebd : ffff8203`2f9c2e50 ffff8203`00000000 ffffc68a`912a2080 ffff8681`2b7d4600 : afd!AfdFastConnectionSend+0x3a1
ffff8681`2b7d4560 fffff802`2fb108c3 : 00000000`00000000 ffffc68a`92c21970 00000000`0001201f fffff802`2f610c25 : afd!AfdFastIoDeviceControl+0x40d
ffff8681`2b7d48e0 fffff802`2fb10536 : 00000000`00000000 00000000`0000020c 00000000`00000001 00000000`00000000 : nt!IopXxxControlFile+0x383
ffff8681`2b7d4a20 fffff802`2f7d8193 : 00000000`00000001 00000000`76e145d0 00000000`00172268 00000000`00000000 : nt!NtDeviceIoControlFile+0x56
ffff8681`2b7d4a90 00000000`76e1222c : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`0009ef68 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x76e1222c


STACK_COMMAND:  kb

FOLLOWUP_IP: 
CLASSPNP!TransferPktComplete+4ab
fffff80c`908045cb 4183fc02        cmp     r12d,2

SYMBOL_STACK_INDEX:  9

SYMBOL_NAME:  CLASSPNP!TransferPktComplete+4ab

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: CLASSPNP

IMAGE_NAME:  CLASSPNP.SYS

DEBUG_FLR_IMAGE_TIMESTAMP:  57cf989c

BUCKET_ID_FUNC_OFFSET:  4ab

FAILURE_BUCKET_ID:  0x133_VRF_ISR_CLASSPNP!TransferPktComplete

BUCKET_ID:  0x133_VRF_ISR_CLASSPNP!TransferPktComplete

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:0x133_vrf_isr_classpnp!transferpktcomplete

FAILURE_ID_HASH:  {d26a3ea4-fbeb-e820-2562-ef4c6fa8fb78}

Followup: MachineOwner
---------
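
To see at a glance which drivers appear in a trace like the one above, the module name can be pulled from each STACK_TEXT frame. A minimal Python sketch, assuming the `kb`-style layout shown here (fields separated by " : ", symbol last); the function name `frame_modules` is illustrative and the sample is a five-frame excerpt of the trace, not the full stack:

```python
from collections import Counter

def frame_modules(stack_text):
    """Return the module name of each symbolized frame, in stack order (innermost first)."""
    modules = []
    for line in stack_text.splitlines():
        if " : " not in line:
            continue  # skip blank lines and anything that is not a kb frame
        symbol = line.rsplit(" : ", 1)[1].strip()
        if "!" not in symbol:
            continue  # raw address without a symbol, e.g. the user-mode frame
        modules.append(symbol.split("!", 1)[0])
    return modules

# Excerpt of the STACK_TEXT above.
SAMPLE = """\
ffff8681`27794d88 fffff802`2f823102 : 00000000`00000133 00000000`00000001 00000000`00001e00 00000000`00000000 : nt!KeBugCheckEx
ffff8681`2778d3f0 fffff802`2fd84593 : ffff8681`2778d818 ffff8203`2f370e50 ffff8681`2778d818 ffff8203`2f370f68 : CLASSPNP!TransferPktComplete+0x4ab
ffff8681`2778d8a0 fffff80c`900c613b : ffff8681`28dd4010 ffff8203`2f370e50 ffff8681`2778da70 00000000`00000000 : storport!RaidCompleteRequestEx+0x8e
ffff8681`2b7d3d70 fffff80c`904195d5 : 00000000`00000000 ffffc68a`9299e8e0 ffffc68a`9299e8e0 ffffc68a`926e4160 : tcpip!TcpTcbSend+0x5be
00000000`0009ef68 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x76e1222c
"""

mods = frame_modules(SAMPLE)
print(mods)
print(Counter(mods))
```

Run over the full trace, this lists frames from nt, hal, CLASSPNP, storport, tcpip and afd; it only summarizes the stack and does not by itself identify the faulting driver.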

Comment 4 lijin 2017-09-25 06:18:50 UTC
I will run more attempts on an Intel host to check whether this is an AMD-host-only issue.

Comment 5 lijin 2017-09-25 09:34:55 UTC
Tried 10 times with an IDE disk on the same AMD G4 host: did not hit this issue.
Tried 10 times on an AMD G3 host: hit it twice.
Tried 10 times on an Intel host: did not hit this issue.
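
For context on how much weight ten clean runs carry: if each run independently hit the issue with probability p, the chance of seeing no hit in n runs is (1 - p)^n. A minimal Python sketch, assuming independent runs and taking p = 0.3 from the "How reproducible" field; the function name `prob_no_hit` is illustrative:

```python
def prob_no_hit(p, runs):
    """Probability that `runs` independent attempts all miss, given per-run hit rate p."""
    return (1.0 - p) ** runs

# 30% is the rate reported in the description;
# 20% corresponds to 2 hits in 10 runs on the AMD G3 host.
for p in (0.3, 0.2):
    print(f"p={p:.0%}: chance of 0 hits in 10 runs = {prob_no_hit(p, 10):.1%}")
```

Under these assumptions, ten clean runs would happen by chance roughly 3% of the time at a 30% rate and about 11% of the time at a 20% rate, so the clean IDE and Intel results are suggestive rather than conclusive.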

Comment 6 Vadim Rozenfeld 2017-09-27 11:23:58 UTC
Hi Li Jin,
Could you please repeat the above test without hv_relaxed flag, but with "-hypervisor" cpu flag specified?

Thanks,
Vadim.

Comment 7 lijin 2017-10-09 03:24:41 UTC
(In reply to Vadim Rozenfeld from comment #6)
> Hi Li Jin,
> Could you please repeat the above test without hv_relaxed flag, but with
> "-hypervisor" cpu flag specified?
> 
> Thanks,
> Vadim.

Ran 10 times, hit the DPC_WATCHDOG_VIOLATION BSOD once.

Comment 8 Vadim Rozenfeld 2017-10-09 17:32:26 UTC
(In reply to lijin from comment #7)
> (In reply to Vadim Rozenfeld from comment #6)
> > Hi Li Jin,
> > Could you please repeat the above test without hv_relaxed flag, but with
> > "-hypervisor" cpu flag specified?
> > 
> > Thanks,
> > Vadim.
> 
> Ran 10 times, hit the DPC_WATCHDOG_VIOLATION BSOD once.

Thanks.
Can you post the relevant crash dump file?

Best regards,
Vadim.

Comment 10 Vadim Rozenfeld 2018-07-05 07:46:16 UTC
Hi Li Jin.

Is it still the case with the latest drivers?

Thanks,
Vadim.

Comment 11 lijin 2018-07-06 09:21:34 UTC
(In reply to Vadim Rozenfeld from comment #10)
> Hi Li Jin.
> 
> Is it still the case with the latest drivers?
> 
> Thanks,
> Vadim.

Tried with build 155 and ran 20+ times; did NOT hit this issue again.

qemu cli:
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga std  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_5esrpk/monitor-qmpmonitor1-20180705-051446-F4xN91QH,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_5esrpk/monitor-catch_monitor-20180705-051446-F4xN91QH,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idNuhVsB  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_5esrpk/serial-serial0-20180705-051446-F4xN91QH,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20180705-051446-F4xN91QH,path=/var/tmp/avocado_5esrpk/seabios-20180705-051446-F4xN91QH,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20180705-051446-F4xN91QH,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win2016-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -drive id=drive_disk1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/storage.qcow2 \
    -device scsi-hd,id=disk1,drive=drive_disk1 \
    -device virtio-net-pci,mac=9a:b0:b1:b2:b3:b4,id=idLVY2Lt,vectors=4,netdev=idOmtoIk,bus=pci.0,addr=0x5  \
    -netdev tap,id=idOmtoIk,vhost=on,vhostfd=11,fd=19 \
    -m 15360  \
    -smp 12,maxcpus=12,cores=6,threads=1,sockets=2  \
    -cpu 'Opteron_G5',+kvm_pv_unhalt,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time \
    -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso \
    -device scsi-cd,id=cd1,drive=drive_cd1 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm
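
The retest command line above is not identical to the one in the description: it uses virtio-scsi-pci with scsi-hd disks and Opteron_G5, where the description used virtio-blk-pci and Opteron_G4. A minimal Python sketch for listing such differences, assuming each command line has been joined onto a single line; `device_models` is a hypothetical helper and the two strings are abbreviated excerpts (storage-related -device options only, with bus and address properties omitted):

```python
import shlex

def device_models(cmdline):
    """Return the model name (text before the first comma) of every -device option."""
    tokens = shlex.split(cmdline)
    return [tokens[i + 1].split(",", 1)[0]
            for i, tok in enumerate(tokens[:-1]) if tok == "-device"]

# Abbreviated excerpts of the two command lines in this report.
ORIGINAL = ("-device virtio-blk-pci,id=image1,drive=drive_image1 "
            "-device virtio-blk-pci,id=disk1,drive=drive_disk1 "
            "-device ide-cd,id=cd1,drive=drive_cd1")
RETEST = ("-device virtio-scsi-pci,id=virtio_scsi_pci0 "
          "-device scsi-hd,id=image1,drive=drive_image1 "
          "-device scsi-hd,id=disk1,drive=drive_disk1 "
          "-device scsi-cd,id=cd1,drive=drive_cd1")

before, after = set(device_models(ORIGINAL)), set(device_models(RETEST))
print("only in original:", sorted(before - after))
print("only in retest:  ", sorted(after - before))
```

A comparison like this can help when judging whether a retest exercised the same driver path as the original report.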

Comment 12 Vadim Rozenfeld 2018-07-07 11:27:09 UTC
Thanks a lot.
Can we move it to verified?

Comment 13 lijin 2018-07-09 01:34:40 UTC
(In reply to Vadim Rozenfeld from comment #12)
> Thanks a lot.
> Can we move it to verified?

Sure

Comment 16 errata-xmlrpc 2018-10-30 16:21:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3413

