Bug 1261797

Summary: contents of MSR_TSC_AUX are not migrated
Product: Red Hat Enterprise Linux 7
Reporter: Xiaoqing Wei <xwei>
Component: qemu-kvm-rhev
Assignee: Amit Shah <amit.shah>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact: Jiri Herrmann <jherrman>
Priority: medium
Version: 7.2
CC: ailan, amit.shah, areis, chayang, dgilbert, hhuang, jen, juzhang, knoel, lijin, pbonzini, virt-maint, zhguo
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Windows
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.5.0-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1265427 1265428
Environment:
Last Closed: 2016-11-07 20:37:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1265427, 1265428, 1287070, 1288337, 1305606, 1313485
Attachments:
Description     Flags
bsod minidump   none
windbg output   none

Description Xiaoqing Wei 2015-09-10 07:44:38 UTC
Description of problem:

Windows 10 x86_64 guest hits a BSOD during migration while formatting an emulated usb-storage device

Version-Release number of selected component (if applicable):
kernel-3.10.0-314.el7.x86_64
qemu-kvm-rhev-2.3.0-22.el7.x86_64
spice-server-0.12.4-13.el7.x86_64


How reproducible:
1/5

Happened once; 4 further attempts to reproduce it failed.

Steps to Reproduce:
1. boot a vm with emulated usb-storage

/usr/libexec/qemu-kvm -monitor stdio \
    -S  \
    -name 'virt-tests-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga qxl  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-virt-tests-vm1-qmpmonitor1-20150908-154232-mKWvp7CD,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/tmp/monitor-virt-tests-vm1-catch_monitor-20150908-154232-mKWvp7CD,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idHrfoDK  \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150908-154232-mKWvp7CD,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20150908-154232-mKWvp7CD,path=/tmp/seabios-20150908-154232-mKWvp7CD,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20150908-154232-mKWvp7CD,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=03 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file='/home/win10.qcow2' \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1,bus=pci.0,addr=04,disable-legacy=off,disable-modern=on \
    -device virtio-net-pci,mac=9a:3a:3b:3c:3d:3e,id=idNIRZOu,vectors=4,netdev=idZpUNpM,bus=pci.0,addr=05,disable-legacy=off,disable-modern=on  \
    -netdev tap,id=idZpUNpM,vhost=on  \
    -m 2048  \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
    -cpu 'SandyBridge',+sep,+kvm_pv_unhalt,hv_spinlocks=0x1fff,hv_vapic,hv_time \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,media=cdrom,file='/home/en_windows_10_enterprise_x64_dvd_6851151.iso' \
    -device ide-cd,id=cd1,drive=drive_cd1,bootindex=10,bus=ide.0,unit=0 \
    -drive id=drive_winutils,if=none,snapshot=off,aio=native,media=cdrom,file=/home/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/isos/windows/winutils.iso \
    -device ide-cd,id=winutils,drive=drive_winutils,bootindex=2,bus=ide.0,unit=1 \
    -drive id=drive_unattended,if=none,snapshot=off,aio=native,media=cdrom,file=/home/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/images/win8.1-64/autounattend.iso \
    -device ide-cd,id=unattended,drive=drive_unattended,bootindex=3,bus=ide.1,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -spice port=5900,disable-ticketing,addr=0,image-compression=auto_glz,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -drive id=drive_image2,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file='/home/1G.qcow2' \
    -enable-kvm \
    -device usb-storage,drive=drive_image2,id=virt0-0-1,bus=usb1.0,bootindex=20,serial=3ff69574-14e5-4139-8f5c-942ea8748ad7 \
    -boot order=cdn,once=d,menu=off,strict=off \
    -chardev spicevmc,id=charredir0,name=usbredir \
    -device usb-redir,chardev=charredir0,id=redir0



2. inside of guest:

diskpart
sel disk 1        -> this is the emulated USB disk
create part pri   -> create a primary partition
format fs=ntfs label=usb -> full NTFS format, sector by sector (no 'quick' option)

3. migrate this vm

/usr/libexec/qemu-kvm -monitor stdio \
    -name 'virt-tests-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga qxl  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-virt-tests-vm1-qmpmonitor1-20150908-154232-mKWvp7CD,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/tmp/monitor-virt-tests-vm1-catch_monitor-20150908-154232-mKWvp7CD,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idHrfoDK  \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150908-154232-mKWvp7CD,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20150908-154232-mKWvp7CD,path=/tmp/seabios-20150908-154232-mKWvp7CD,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20150908-154232-mKWvp7CD,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=03 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file='/home/win10.qcow2' \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1,bus=pci.0,addr=04,disable-legacy=off,disable-modern=on \
    -device virtio-net-pci,mac=9a:3a:3b:3c:3d:3e,id=idNIRZOu,vectors=4,netdev=idZpUNpM,bus=pci.0,addr=05,disable-legacy=off,disable-modern=on  \
    -netdev tap,id=idZpUNpM,vhost=on  \
    -m 2048  \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
    -cpu 'SandyBridge',+sep,+kvm_pv_unhalt,hv_spinlocks=0x1fff,hv_vapic,hv_time \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,media=cdrom,file='/home/en_windows_10_enterprise_x64_dvd_6851151.iso' \
    -device ide-cd,id=cd1,drive=drive_cd1,bootindex=10,bus=ide.0,unit=0 \
    -drive id=drive_winutils,if=none,snapshot=off,aio=native,media=cdrom,file=/home/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/isos/windows/winutils.iso \
    -device ide-cd,id=winutils,drive=drive_winutils,bootindex=2,bus=ide.0,unit=1 \
    -drive id=drive_unattended,if=none,snapshot=off,aio=native,media=cdrom,file=/home/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/images/win8.1-64/autounattend.iso \
    -device ide-cd,id=unattended,drive=drive_unattended,bootindex=3,bus=ide.1,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -spice port=5910,disable-ticketing,addr=0,image-compression=auto_glz,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -drive id=drive_image2,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file='/home/1G.qcow2' \
    -enable-kvm \
    -device usb-storage,drive=drive_image2,id=virt0-0-1,bus=usb1.0,bootindex=20,serial=3ff69574-14e5-4139-8f5c-942ea8748ad7 \
    -boot order=cdn,once=d,menu=off,strict=off \
    -chardev spicevmc,id=charredir0,name=usbredir \
    -device usb-redir,chardev=charredir0,id=redir0 \
    -incoming tcp:0:4000


on src qemu:
migrate -d tcp:xx:yy


Actual results:

Guest hits a BSOD within a few minutes.
The minidump should be available in C:\windows\minidump

Expected results:
both host and guest work well

Additional info:

Comment 1 Xiaoqing Wei 2015-09-10 07:49:31 UTC
Created attachment 1072052 [details]
bsod minidump

Comment 2 Xiaoqing Wei 2015-09-10 07:50:33 UTC
Created attachment 1072053 [details]
windbg output

Comment 3 Xiaoqing Wei 2015-09-10 07:51:24 UTC
1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

CRITICAL_STRUCTURE_CORRUPTION (109)
This bugcheck is generated when the kernel detects that critical kernel code or
data have been corrupted. There are generally three causes for a corruption:
1) A driver has inadvertently or deliberately modified critical kernel code
 or data. See http://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx
2) A developer attempted to set a normal kernel breakpoint using a kernel
 debugger that was not attached when the system was booted. Normal breakpoints,
 "bp", can only be set if the debugger is attached at boot time. Hardware
 breakpoints, "ba", can be set at any time.
3) A hardware corruption occurred, e.g. failing RAM holding kernel code or data.
Arguments:
Arg1: a3a01f58a88e3638, Reserved
Arg2: b3b72bdefb0f076f, Reserved
Arg3: 00000001c0000103, Failure type dependent information
Arg4: 0000000000000007, Type of corrupted region, can be
	0   : A generic data region
	1   : Modification of a function or .pdata
	2   : A processor IDT
	3   : A processor GDT
	4   : Type 1 process list corruption
	5   : Type 2 process list corruption
	6   : Debug routine modification
	7   : Critical MSR modification
	8   : Object type
	9   : A processor IVT
	a   : Modification of a system service function
	b   : A generic session data region
	c   : Modification of a session function or .pdata
	d   : Modification of an import table
	e   : Modification of a session import table
	f   : Ps Win32 callout modification
	10  : Debug switch routine modification
	11  : IRP allocator modification
	12  : Driver call dispatcher modification
	13  : IRP completion dispatcher modification
	14  : IRP deallocator modification
	15  : A processor control register
	16  : Critical floating point control register modification
	17  : Local APIC modification
	18  : Kernel notification callout modification
	19  : Loaded module list modification
	1a  : Type 3 process list corruption
	1b  : Type 4 process list corruption
	1c  : Driver object corruption
	1d  : Executive callback object modification
	1e  : Modification of module padding
	1f  : Modification of a protected process
	20  : A generic data region
	21  : A page hash mismatch
	22  : A session page hash mismatch
	23  : Load config directory modification
	24  : Inverted function table modification
	25  : Session configuration modification
	102 : Modification of win32k.sys

Comment 5 Gerd Hoffmann 2015-09-11 09:57:03 UTC
> CRITICAL_STRUCTURE_CORRUPTION (109)

> Arg4: 0000000000000007, Type of corrupted region, can be

> 	7   : Critical MSR modification

Hmm, that doesn't look usb-storage related at all.
Probably formatting usb-storage just creates some load
which increases the chance to hit this.

Cc'ing paolo.

Comment 7 Paolo Bonzini 2015-09-22 22:21:02 UTC
The first three arguments are "reserved", but they strongly look like old value, new value (or a hash of it) and MSR index.  In fact the third is definitely the MSR index and it is MSR_TSC_AUX.

Do you remember if the crash happened before migration finished, or afterwards?
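
The decoding described above can be sketched in a few lines of Python. This is only an illustration: the bugcheck arguments are documented as "Reserved", so treating the low 32 bits of Arg3 as an MSR index is the interpretation given in this comment, not a documented format, and the meaning of the upper 32 bits is not known here. The helper name `decode` is made up for the example.

```python
# Values copied from the !analyze -v output in comment 3.
ARG3 = 0x00000001C0000103
ARG4 = 0x0000000000000007

# Architectural MSR number of IA32_TSC_AUX (named MSR_TSC_AUX in this bug).
MSR_TSC_AUX = 0xC0000103

# Only the relevant entry of the "type of corrupted region" table.
REGION_TYPES = {0x7: "Critical MSR modification"}


def decode(arg3, arg4):
    """Split Arg3 into 32-bit halves and look up the region type for Arg4.

    Assumption: the low half of Arg3 is the MSR index. The upper half is
    returned as-is because its meaning is not documented.
    """
    msr_index = arg3 & 0xFFFFFFFF
    upper_half = arg3 >> 32
    region = REGION_TYPES.get(arg4, "unknown")
    return msr_index, upper_half, region


msr_index, upper_half, region = decode(ARG3, ARG4)
print(hex(msr_index), upper_half, region)
print("is MSR_TSC_AUX:", msr_index == MSR_TSC_AUX)
```

The low half of Arg3 matches the MSR_TSC_AUX index, and Arg4 selects the "Critical MSR modification" entry, which is consistent with the reading above.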

Comment 8 Paolo Bonzini 2015-09-22 22:22:15 UTC
It looks like QEMU is not saving and restoring MSR_TSC_AUX.
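
A toy model of why a missing MSR in the migration stream would produce this symptom is sketched below. It is not QEMU code: the `migrate` function, the register values, and the assumption that an unmigrated MSR reads back as 0 on the destination are all hypothetical and chosen only for illustration. MSR_STAR is included merely as an example of an MSR that does get transferred.

```python
MSR_STAR = 0xC0000081     # stands in for any MSR that is migrated
MSR_TSC_AUX = 0xC0000103  # the MSR this bug is about


def migrate(src_msrs, migrated_indices, reset_value=0):
    """Return the destination MSR state after a simulated migration.

    Only MSRs listed in migrated_indices keep their source value; every
    other MSR falls back to reset_value (an assumption of this model).
    """
    return {
        index: (value if index in migrated_indices else reset_value)
        for index, value in src_msrs.items()
    }


# Hypothetical guest state on the source host.
src = {MSR_STAR: 0x0023001000000000, MSR_TSC_AUX: 1}

# Behaviour suspected in this bug: MSR_TSC_AUX is not in the saved set.
broken = migrate(src, {MSR_STAR})
# Behaviour after a fix: MSR_TSC_AUX is saved and restored as well.
fixed = migrate(src, {MSR_STAR, MSR_TSC_AUX})

changed = [hex(index) for index in src if broken[index] != src[index]]
print("MSRs changed behind the guest's back:", changed)
print("state preserved with fix:", fixed == src)
```

In this model the guest sees MSR_TSC_AUX change without having written it, which is the kind of event the "Critical MSR modification" check in comment 3 reports.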

Comment 9 Xiaoqing Wei 2015-09-23 03:15:38 UTC
(In reply to Paolo Bonzini from comment #7)
> The first three arguments are "reserved", but they strongly look like old
> value, new value (or a hash of it) and MSR index.  In fact the third is
> definitely the MSR index and it is MSR_TSC_AUX.
> 
> Do you remember if the crash happened before migration finished, or
> afterwards?

not sure :(
I started the formatting in the guest, typed 'migrate -d ' in the QEMU monitor, and left for a while. When I came back, the cmd terminal in the guest was gone, so I checked whether there was a dump, and there was.

I then tried to reproduce it as in C#1, but 4 attempts with exactly identical steps on the original host all failed.

Comment 28 errata-xmlrpc 2016-11-07 20:37:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html

Comment 29 Paolo Bonzini 2017-12-21 11:17:21 UTC
This was upstream commit c9b8f6b6210847b4381c5b2ee172b1c7eb9985d6.