Bug 1751017 - BSOD in Windows 2019 guest during long-running iozone stress testing
Summary: BSOD in Windows 2019 guest during long-running iozone stress testing
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.1
Hardware: x86_64
OS: Windows
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Vadim Rozenfeld
QA Contact: qing.wang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-09-11 03:42 UTC by qing.wang
Modified: 2020-06-13 15:07 UTC
13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:


Attachments

Description qing.wang 2019-09-11 03:42:26 UTC
Description of problem:

Hit a BSOD (kvm: 4.18.0-137.el8.x86_64, qemu: qemu-kvm-core-4.1.0-6.module+el8.1.0+4164+854d66f5.x86_64) while executing iozone.


Host:
4.18.0-137.el8.x86_64
qemu-kvm-core-4.1.0-6.module+el8.1.0+4164+854d66f5.x86_64

Guest:
Windows 2019 with virtio-win-prewhql-0.1-173


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Boot the guest with the command line below.
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_tvspuerp/monitor-qmpmonitor1-20190910-045030-6wAagViP,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_tvspuerp/monitor-catch_monitor-20190910-045030-6wAagViP,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idERmoET \
    -chardev socket,server,id=chardev_serial0,path=/var/tmp/avocado_tvspuerp/serial-serial0-20190910-045030-6wAagViP,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20190910-045030-6wAagViP,path=/var/tmp/avocado_tvspuerp/seabios-20190910-045030-6wAagViP,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190910-045030-6wAagViP,iobase=0x402 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=/home/kvm_autotest_root/images/win2019-64-virtio.raw \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
    -drive id=drive_stg0,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=/home/kvm_autotest_root/images/storage0.raw \
    -device virtio-blk-pci,id=stg0,drive=drive_stg0,bootindex=1,bus=pci.0,addr=0x5 \
    -drive id=drive_stg1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=/home/kvm_autotest_root/images/storage1.raw \
    -device virtio-blk-pci,id=stg1,drive=drive_stg1,bootindex=2,bus=pci.0,addr=0x6 \
    -device virtio-net-pci,mac=9a:f4:67:57:41:48,id=idynkpcz,netdev=idZri0rC,bus=pci.0,addr=0x7  \
    -netdev tap,id=idZri0rC,vhost=on,vhostfd=23,fd=17 \
    -m 30720  \
    -smp 12,maxcpus=12,cores=6,threads=1,sockets=2  \
    -cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt \
    -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso \
    -device ide-cd,id=cd1,drive=drive_cd1,bootindex=3,bus=ide.0,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm

2. Run the following commands on the guest for a long time:
D:\Iozone\iozone.exe -a -I -f I:\testfile
D:\Iozone\iozone.exe -a -I -f J:\testfile
D:\Iozone\iozone.exe -a -I -f C:\testfile

3. Wait 3-5 minutes, then repeat step 2.

Actual results:
Sometimes the guest hits a BSOD.

Expected results:
No BSOD.

Additional info:
7: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

CRITICAL_STRUCTURE_CORRUPTION (109)
This bugcheck is generated when the kernel detects that critical kernel code or
data have been corrupted. There are generally three causes for a corruption:
1) A driver has inadvertently or deliberately modified critical kernel code
 or data. See http://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx
2) A developer attempted to set a normal kernel breakpoint using a kernel
 debugger that was not attached when the system was booted. Normal breakpoints,
 "bp", can only be set if the debugger is attached at boot time. Hardware
 breakpoints, "ba", can be set at any time.
3) A hardware corruption occurred, e.g. failing RAM holding kernel code or data.
Arguments:
Arg1: a3a01f58f819188f, Reserved
Arg2: b3b72bdf4a9c67e5, Reserved
Arg3: 0007032000000000, Failure type dependent information
Arg4: 0000000000000017, Type of corrupted region, can be
	0   : A generic data region
	1   : Modification of a function or .pdata
	2   : A processor IDT
	3   : A processor GDT
	4   : Type 1 process list corruption
	5   : Type 2 process list corruption
	6   : Debug routine modification
	7   : Critical MSR modification
	8   : Object type
	9   : A processor IVT
	a   : Modification of a system service function
	b   : A generic session data region
	c   : Modification of a session function or .pdata
	d   : Modification of an import table
	e   : Modification of a session import table
	f   : Ps Win32 callout modification
	10  : Debug switch routine modification
	11  : IRP allocator modification
	12  : Driver call dispatcher modification
	13  : IRP completion dispatcher modification
	14  : IRP deallocator modification
	15  : A processor control register
	16  : Critical floating point control register modification
	17  : Local APIC modification
	18  : Kernel notification callout modification
	19  : Loaded module list modification
	1a  : Type 3 process list corruption
	1b  : Type 4 process list corruption
	1c  : Driver object corruption
	1d  : Executive callback object modification
	1e  : Modification of module padding
	1f  : Modification of a protected process
	20  : A generic data region
	21  : A page hash mismatch
	22  : A session page hash mismatch
	23  : Load config directory modification
	24  : Inverted function table modification
	25  : Session configuration modification
	26  : An extended processor control register
	27  : Type 1 pool corruption
	28  : Type 2 pool corruption
	29  : Type 3 pool corruption
	2a  : Type 4 pool corruption
	2b  : Modification of a function or .pdata
	2c  : Image integrity corruption
	101 : General pool corruption
	102 : Modification of win32k.sys

Debugging Details:
------------------


DUMP_CLASS: 1

DUMP_QUALIFIER: 401

BUILD_VERSION_STRING:  17763.1.amd64fre.rs5_release.180914-1434

SYSTEM_MANUFACTURER:  Red Hat

SYSTEM_PRODUCT_NAME:  KVM

SYSTEM_VERSION:  RHEL 7.6.0 PC (i440FX + PIIX, 1996)

BIOS_VENDOR:  SeaBIOS

BIOS_VERSION:  1.12.0-5.module+el8.1.0+4022+29a53beb

BIOS_DATE:  04/01/2014

DUMP_TYPE:  1

BUGCHECK_P1: a3a01f58f819188f

BUGCHECK_P2: b3b72bdf4a9c67e5

BUGCHECK_P3: 7032000000000

BUGCHECK_P4: 17

CPU_COUNT: c

CPU_MHZ: d40

CPU_VENDOR:  GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 55

CPU_STEPPING: 4

CPU_MICROCODE: 6,55,4,0 (F,M,S,R)  SIG: 1'00000000 (cache) 1'00000000 (init)

DEFAULT_BUCKET_ID:  BAD_STACK_0x109

BUGCHECK_STR:  0x109

PROCESS_NAME:  System

CURRENT_IRQL:  2

ANALYSIS_SESSION_HOST:  WIN-3IORRL4PE1F

ANALYSIS_SESSION_TIME:  09-11-2019 10:45:11.0117

ANALYSIS_VERSION: 10.0.16299.15 amd64fre

STACK_TEXT:  
ffff8185`126e62d8 00000000`00000000 : 00000000`00000109 a3a01f58`f819188f b3b72bdf`4a9c67e5 00070320`00000000 : nt!KeBugCheckEx


THREAD_SHA1_HASH_MOD_FUNC:  81a83ae0317433a47fcc36991983df3b6e638b71

THREAD_SHA1_HASH_MOD_FUNC_OFFSET:  6e16edd8c7dd677734fdbcd2397a2e35e9fae964

THREAD_SHA1_HASH_MOD:  76cd06466d098060a9eb26e5fd2a25cb1f3fe0a3

SYMBOL_NAME:  ANALYSIS_INCONCLUSIVE

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: Unknown_Module

IMAGE_NAME:  Unknown_Image

DEBUG_FLR_IMAGE_TIMESTAMP:  0

STACK_COMMAND:  .thread ; .cxr ; kb

BUCKET_ID:  BAD_STACK_0x109

PRIMARY_PROBLEM_CLASS:  BAD_STACK_0x109

FAILURE_BUCKET_ID:  BAD_STACK_0x109

TARGET_TIME:  2019-09-09T22:06:50.000Z

OSBUILD:  17763

OSSERVICEPACK:  0

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK:  400

PRODUCT_TYPE:  3

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

OSEDITION:  Windows 10 Server TerminalServer DataCenter SingleUserTS

OS_LOCALE:  

USER_LCID:  0

OSBUILD_TIMESTAMP:  unknown_date

BUILDDATESTAMP_STR:  180914-1434

BUILDLAB_STR:  rs5_release

BUILDOSVER_STR:  10.0.17763.1.amd64fre.rs5_release.180914-1434

ANALYSIS_SESSION_ELAPSED_TIME:  8ba

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:bad_stack_0x109

FAILURE_ID_HASH:  {b4d7023a-05c3-49b2-3ea4-6240fe57d90e}

Followup:     MachineOwner
---------

Comment 1 qing.wang 2019-09-11 05:12:25 UTC
(workspace) root@dell-per740xd-01 /sys/module/kvm_intel/parameters #  for f in $(ls);do echo "$f=$(cat $f)";done
emulate_invalid_guest_state=Y
enable_apicv=Y
enable_shadow_vmcs=Y
enlightened_vmcs=N
ept=Y
eptad=Y
fasteoi=Y
flexpriority=Y
nested=Y
nested_early_check=N
ple_gap=128
ple_window=4096
ple_window_grow=2
ple_window_max=4294967295
ple_window_shrink=0
pml=Y
preemption_timer=Y
pt_mode=0
unrestricted_guest=Y
vmentry_l1d_flush=cond
vnmi=Y
vpid=Y
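
Incidentally, the same parameter dump can be produced without an explicit loop: `grep -H .` prints each file path together with its contents. This is just an equivalent one-liner, guarded so it is a no-op on machines where kvm_intel is not loaded:

```shell
# Dump all kvm_intel module parameters as "path:value" lines.
# grep -H prints the file name before each matching line, and the
# pattern "." matches any non-empty line.
dir=/sys/module/kvm_intel/parameters
[ -d "$dir" ] && grep -H . "$dir"/* || true
```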

Comment 4 Ademar Reis 2020-02-05 23:05:55 UTC
QEMU has recently been split into sub-components. As a one-time operation, to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review the sub-component and change it if necessary the next time you look at this BZ. Thanks

Comment 5 Paolo Bonzini 2020-05-13 22:35:45 UTC
I think that bits 16-23 of arg3 are the offset that didn't match, i.e. 0x320 which is APIC_LVTT.  Is it possible to grab a trace of booting the affected OS, using "trace-cmd record -e kvm" in front of the QEMU invocation?  Just a few seconds until the login screen should be enough.  Thanks!
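
For reference, the 0x320 value can be recovered from the Arg3 reported in the dump above with a little shell arithmetic. This is only a sketch; in this particular dump the field sits in bits 32-47 of the 64-bit value:

```shell
# Arg3 from the bugcheck above; 0x320 is the local APIC LVT timer
# register offset (APIC_LVTT), extracted here from bits 32-47.
arg3=$(( 0x0007032000000000 ))
offset=$(( (arg3 >> 32) & 0xffff ))
printf 'offset=0x%x\n' "$offset"   # prints offset=0x320
```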

Comment 6 qing.wang 2020-05-26 12:27:50 UTC
I cannot reproduce this bug on:

qemu-kvm-4.2.0-21.module+el8.2.1+6586+8b7713b9

Guest:
Windows 2019 with virtio-win-prewhql-0.1-185.iso

Test steps:
1. Create images:
 qemu-img create -f raw /home/kvm_autotest_root/images/storage0.raw 10G
 qemu-img create -f raw /home/kvm_autotest_root/images/storage1.raw 11G

2. Boot the VM:

trace-cmd record -e kvm \
  /usr/libexec/qemu-kvm \
   \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win2019-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
    -drive id=drive_stg0,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=/home/kvm_autotest_root/images/storage0.raw \
    -device virtio-blk-pci,id=stg0,drive=drive_stg0,bootindex=1,bus=pci.0,addr=0x5 \
    -drive id=drive_stg1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=/home/kvm_autotest_root/images/storage1.raw \
    -device virtio-blk-pci,id=stg1,drive=drive_stg1,bootindex=2,bus=pci.0,addr=0x6 \
    -device virtio-net-pci,mac=9a:f4:67:57:41:48,id=idynkpcz,netdev=idZri0rC,bus=pci.0,addr=0x7  \
    -netdev tap,id=idZri0rC,vhost=on \
    -m 8096  \
    -smp 12,maxcpus=12,cores=6,threads=1,sockets=2  \
    -cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt \
    -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso \
    -device ide-cd,id=cd1,drive=drive_cd1,bootindex=3,bus=ide.0,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :6 \
        -rtc base=localtime,clock=host,driftfix=slew \
        -enable-kvm \
        -qmp tcp:0:5956,server,nowait \
        -monitor stdio


3. Online the disks in the guest.


4. Run the following commands on the guest for a long time:
D:\Iozone\iozone.exe -a -I -f E:\testfile
D:\Iozone\iozone.exe -a -I -f F:\testfile
D:\Iozone\iozone.exe -a -I -f C:\testfile

5. Wait 3-5 minutes, then repeat step 4.

Comment 7 Felipe Franciosi 2020-05-26 21:06:40 UTC
I can reproduce it, but it sometimes takes hours or days.

On one occasion, I reproduced it with a single Windows 2016 VM (8 vCPUs and 16 GB RAM) running on a host. I connected to it via RDP, opened Chrome, and watched a 10-hour fireplace YouTube video. Then I migrated the VM back and forth between two hosts about 1,000 times (which took a few hours). It crashed with exactly the same error as reported.
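
The migration ping-pong described above could be scripted roughly as follows. This is a hypothetical sketch: the domain name and host URIs are placeholders, not taken from this report, and `run` only echoes each command so the loop can be inspected without two hosts (drop the `run` prefix to migrate for real):

```shell
# Ping-pong live migration between two hosts via virsh (sketch).
run() { echo "+ $*"; }            # echo instead of executing
DOMAIN=win2016                    # placeholder domain name
SRC=qemu+ssh://hostA/system       # placeholder source URI
DST=qemu+ssh://hostB/system       # placeholder destination URI
for i in $(seq 1 3); do           # the reporter used ~1,000 iterations
    run virsh -c "$SRC" migrate --live "$DOMAIN" "$DST"
    run virsh -c "$DST" migrate --live "$DOMAIN" "$SRC"
done
```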

Comment 8 Vadim Rozenfeld 2020-05-27 01:26:37 UTC
(In reply to Felipe Franciosi from comment #7)
> I can reproduce it, but it takes hours or days sometimes.
> 
> In one occasion, I reproduced it by having a single Windows 2016 VM (8 vCPUs
> and 16GB RAM) running on a host. I connected to it via RDP, opened Chrome
> and watched a fireplace YouTube video which is 10 hours long. Then I
> migrated the VM back and forth between two hosts about 1,000 times (took a
> few hours). It crashed with exactly the same error as reported.

Hi Felipe,
What are the kernel and qemu versions in your case?

Thanks,
Vadim.

Comment 9 Felipe Franciosi 2020-05-27 11:16:41 UTC
Hi Vadim,

The issue is seen on kernel 4.19 and 5.4 with QEMU 2.12 (RH).
The issue was not seen on kernel 4.4 with QEMU 2.6 (also RH).

That, plus the look of the bug, suggests a KVM regression of sorts.

Thanks

Comment 10 Vadim Rozenfeld 2020-05-28 07:01:17 UTC
(In reply to Felipe Franciosi from comment #9)
> Hi Vadim,
> 
> Issue is seen on kernel 4.19 and 5.4 w/ QEMU 2.12 (RH).
> Issue was not seen in Kernel 4.4 w/ QEMU 2.6 (also RH).
> 
> That, plus the looks of the bug sounds like a KVM regression of sorts.
> 
> Thanks

Yep,

We saw this problem before, but it disappeared after:

commit b1394e745b9453dcb5b0671c205b770e87dedb87
Author: Radim Krčmář <rkrcmar@redhat.com>
Date:   Thu Nov 30 19:05:45 2017 +0100

    KVM: x86: fix APIC page invalidation
    
    Implementation of the unpinned APIC page didn't update the VMCS address
    cache when invalidation was done through range mmu notifiers.
    This became a problem when the page notifier was removed.

Best,
Vadim.

Comment 11 qing.wang 2020-05-28 08:04:36 UTC
This bug comes from an automation test. I ran the test for about 12 hours and did not find this issue on:

qemu-kvm-4.2.0-21.module+el8.2.1+6586+8b7713b9

Guest:
Windows 2016 with virtio-win-prewhql-0.1-185.iso

python3 ConfigTest.py --nrepeat=1 --platform=x86_64 --guestname=Win2016 --testcase=iozone_windows.long_time_stress.i440fx --driveformat=virtio_blk --machine=i440fx --imageformat=raw --customsparam="qemu_force_use_drive_expression = yes\ncdrom_virtio = isos/windows/virtio-win-prewhql-0.1-185.iso"

Comment 12 Felipe Franciosi 2020-06-02 15:35:57 UTC
We tried QEMU 4.2 (upstream) with Kernel 5.4 and the issue is still there.

Comment 13 Paolo Bonzini 2020-06-05 17:32:14 UTC
From chat with Felipe:

The race condition happens when the SPTE (memslot) and the APIC access page physical address are out of sync, possibly because of a race between kvm_vcpu_reload_apic_access_page and the creation of SPTEs.

I'm thinking of removing kvm_arch_mmu_notifier_invalidate_range, and instead triggering the update from set_spte.

Comment 14 Eiichi Tsukata 2020-06-07 23:51:39 UTC
Hello Paolo

I submitted RFC patch to fix the issue: https://lore.kernel.org/lkml/20200606042627.61070-1-eiichi.tsukata@nutanix.com/T/#u
Could you take a look at the patch?

Thanks

Comment 15 Jon Kohler 2020-06-13 15:07:59 UTC
The fix is now merged: https://github.com/torvalds/linux/commit/e649b3f0188f8

