Bug 1794609

Summary: [virtio-win][vioser] Hit BSOD when transfer data between host and guest via virtio-serial port in win2012-64 guest
Product: Red Hat Enterprise Linux 8 Reporter: Peixiu Hou <phou>
Component: virtio-winAssignee: ybendito
virtio-win sub component: virtio-win-prewhql QA Contact: xiagao
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: ailan, lijin, lmiksik, vrozenfe, xiagao, ybendito
Version: 8.2Keywords: Regression
Target Milestone: rc   
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-28 16:05:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
BSOD picture
none
BSOD picture-2
none
host run script serial_host_send_receive.py
none
VirtioChannel_guest_send_reveive.py
none
Batch to run ETW logger
none
REG file to start serial driver log automatically
none
Fix candidate 1
none
REG file to start serial driver log automatically
none
REG file to start serial driver log automatically none

Description Peixiu Hou 2020-01-24 01:48:08 UTC
Created attachment 1654936 [details]
BSOD picture

Description of problem:
Hit BSOD when transfer data between host and guest via virtio-serial port in win2012-64 guest. 

Tested 3 jobs with automation, all hit BSOD when do data transfet through virtio-serail.
    1)-Host_RHEL.m8.u2.product_av.qcow2.virtio_blk.up.virtio_net.Guest.Win2012.x86_64.io-github-autotest-qemu.virtio_serial_file_transfer.unix_socket.q35/
    2)-Host_RHEL.m8.u2.product_av.qcow2.virtio_blk.up.virtio_net.Guest.Win2012.x86_64.io-github-autotest-qemu.virtio_serial_file_transfer.unix_socket.boot_with_multiple_virtserialports.1_N.q35/
    3)-Host_RHEL.m8.u2.product_av.qcow2.virtio_blk.up.virtio_net.Guest.Win2012.x86_64.io-github-autotest-qemu.virtio_serial_file_transfer.unix_socket.max_ports.q35/

BSOD pictures as attachment, job 1) report "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED(Wdf01000.sys)", job 2) report "PAGE_FAULT_IN_NONPAGED_AREA(Wdf01000.sys)".

Only hit this issue on win2012-64 guest, not hit this issue on win2012-r2, win8, win10, win8.1, win2016 and win2019.

Version-Release number of selected component (if applicable):
kernel-4.18.0-171.el8.x86_64
qemu-kvm-4.2.0-7.module+el8.2.0+5520+4e5817f3.x86_64
seabios-1.13.0-1.module+el8.2.0+5520+4e5817f3.x86_64
virtio-win-prewhql-178

How reproducible:
Not 100%, about 30%.

Steps to Reproduce:
1. Boot a win2012 guest up with virtio-serial-pci.
===============================================================================
MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 14336  \
    -smp 24,maxcpus=24,cores=12,threads=1,dies=1,sockets=2  \
    -cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_ng28eq6t/monitor-qmpmonitor1-20200123-195806-lh4Cw0Mi,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_ng28eq6t/monitor-catch_monitor-20200123-195806-lh4Cw0Mi,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idJY554y \
    -chardev socket,id=chardev_serial0,nowait,server,path=/var/tmp/avocado_ng28eq6t/serial-serial0-20200123-195806-lh4Cw0Mi \
    -device isa-serial,id=serial0,chardev=chardev_serial0 \
    -chardev socket,id=chardev_vs1,nowait,server,path=/var/tmp/avocado_ng28eq6t/serial-vs1-20200123-195806-lh4Cw0Mi \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pcie.0-root-port-2,addr=0x0 \
    -device virtserialport,id=vs1,name=vs1,chardev=chardev_vs1,bus=virtio_serial_pci0.0,nr=1  \
    -chardev socket,id=seabioslog_id_20200123-195806-lh4Cw0Mi,path=/var/tmp/avocado_ng28eq6t/seabios-20200123-195806-lh4Cw0Mi,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20200123-195806-lh4Cw0Mi,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-3,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/win2012-64-virtio.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pcie.0-root-port-4,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:94:16:d5:78:28,id=idif9SCd,netdev=idQP8R1B,bus=pcie.0-root-port-5,addr=0x0  \
    -netdev tap,id=idQP8R1B,vhost=on,vhostfd=21,fd=14 \
    -blockdev node-name=file_cd1,driver=file,read-only=on,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
    -device ide-cd,id=cd1,drive=drive_cd1,bootindex=1,write-cache=on,bus=ide.0,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=6,chassis=6,addr=0x6,bus=pcie.0
===============================================================================

2. Install the vioser driver for virtio-serial-pci device.

3. Copy test scripts to guest.
Copy file from /usr/share/avocado-plugins-vt/shared/deps/serial/VirtIoChannel_guest_send_receive.py to 10.73.73.202:C:\
Copy file from /usr/share/avocado-plugins-vt/shared/deps/serial/windows_support.py to 10.73.73.202:C:\

4. Creating 10MB file on host.
19:58:30 INFO | Running 'dd if=/dev/zero of=/var/tmp/avocado_ng28eq6t/tmp-bAd52vDY bs=1M count=10'
19:58:30 DEBUG| [stderr] 10+0 records in
19:58:30 INFO | Command 'dd if=/dev/zero of=/var/tmp/avocado_ng28eq6t/tmp-bAd52vDY bs=1M count=10' finished with 0 after 0.005354881286621094s
19:58:30 DEBUG| [stderr] 10+0 records out
19:58:30 DEBUG| [stderr] 10485760 bytes (10 MB, 10 MiB) copied, 0.00476057 s, 2.2 GB/s

5. Creating 10Mb file on guest.
19:58:30 INFO | Context: (run) --> (transfer_data) --> Creating 10MB file on guest
19:58:30 DEBUG| Sending command: dd if=/dev/zero of=%TEMP%/tmp-YlI84tAE bs=1M count=10
19:58:31 DEBUG| Sending command: echo %errorlevel%

6. Send command in host:
19:58:31 INFO | Context: (run) --> (transfer_data) --> Send host command: `command -v python python3 | head -1` /usr/share/avocado-plugins-vt/shared/deps/serial/serial_host_send_receive.py -t unix_socket -s /var/tmp/avocado_ng28eq6t/serial-vs1-20200123-195806-lh4Cw0Mi -f /var/tmp/avocado_ng28eq6t/tmp-bAd52vDY -a both

7. Send command in guest:
19:58:34 DEBUG| Sending command: python2.7 C:\/VirtIoChannel_guest_send_receive.py -d vs1 -f %TEMP%/tmp-YlI84tAE -a both

8. check the guest status.

Actual results:
BSOD

Expected results:
data transfer success.

Additional info:
1. Tried download the vioser driver to last release version virtio-win-prewhql-154, tried test 30 times, not hit this issue.

Comment 1 Peixiu Hou 2020-01-24 01:49:18 UTC
Created attachment 1654937 [details]
BSOD picture-2

Comment 2 Peixiu Hou 2020-01-24 01:50:24 UTC
Created attachment 1654938 [details]
host run script serial_host_send_receive.py

Comment 3 Peixiu Hou 2020-01-24 01:51:02 UTC
Created attachment 1654939 [details]
VirtioChannel_guest_send_reveive.py

Comment 5 xiagao 2020-02-03 06:45:56 UTC
didn't hit this issue with virtio-win-1.9.10-3.el8, so it's regression.

Comment 6 ybendito 2020-02-04 12:52:37 UTC
Created attachment 1657542 [details]
Batch to run ETW logger

Comment 8 xiagao 2020-02-05 06:14:58 UTC
This issue is found when run automation test, couldn't reproduce by manual.
And I checked the auto log again and found BSOD occurred during shutting down guest not data transferring.

the detailed steps:
1. Creating 10MB file on host
dd if=/dev/zero of=/var/tmp/avocado_bnkcjtzv/tmp-WZv6SM2v bs=1M count=10
2. Creating 10MB file on guest
dd if=/dev/zero of=%TEMP%/tmp-Twa1lzwM bs=1M count=10
3. run script in host.
`command -v python python3 | head -1` /usr/share/avocado-plugins-vt/shared/deps/serial/serial_host_send_receive.py -t unix_socket -s /var/tmp/avocado_bnkcjtzv/serial-channel1-20200112-221011-uY8xn3yq -f /var/tmp/avocado_bnkcjtzv/tmp-WZv6SM2v -a both
4. run script in guest.
python2.7 C:\/VirtIoChannel_guest_send_receive.py -d vs1 -f %TEMP%/tmp-Twa1lzwM -a both
error occurred
2020-01-12 22:11:17: Traceback (most recent call last):
2020-01-12 22:11:17:   File "C:\/VirtIoChannel_guest_send_receive.py", line 255, in <module>
2020-01-12 22:11:17:     md5_ori = receive(device, filename, p_size=p_size)
2020-01-12 22:11:17:   File "C:\/VirtIoChannel_guest_send_receive.py", line 175, in receive
2020-01-12 22:11:17:     size = vio.shake_hand(action="receive")
2020-01-12 22:11:17:   File "C:\/VirtIoChannel_guest_send_receive.py", line 105, in shake_hand
2020-01-12 22:11:17:     self.send(txt)
2020-01-12 22:11:17:   File "C:\/VirtIoChannel_guest_send_receive.py", line 78, in send
2020-01-12 22:11:17:     self._vport.write(message)
2020-01-12 22:11:17:   File "C:\windows_support.py", line 39, in write
2020-01-12 22:11:17:     win32file.GetOverlappedResult(self._hfile, self._write_ovrlpd, True)
2020-01-12 22:11:17: pywintypes.error: (995, 'GetOverlappedResult', 'The I/O operation has been aborted because of either a thread exit or an application request.')

5.shutdown guest
22:11:27 DEBUG| Destroying VM avocado-vt-vm1 (PID 33075)

after step 5,BSOD occurred.

Comment 9 ybendito 2020-02-05 07:19:34 UTC
Created attachment 1657759 [details]
REG file to start serial driver log automatically

Comment 10 ybendito 2020-02-05 07:20:15 UTC
Created attachment 1657760 [details]
Fix candidate 1

Comment 18 ybendito 2020-02-10 11:05:08 UTC
Created attachment 1662117 [details]
REG file to start serial driver log automatically

Comment 19 ybendito 2020-02-10 11:07:04 UTC
Please use updated REG file (comment #18) for automatic logging. With previous one the log works, but stops too early, before crash time.

Comment 21 ybendito 2020-02-10 17:01:00 UTC
Created attachment 1662206 [details]
REG file to start serial driver log automatically

Comment 22 ybendito 2020-02-10 17:04:16 UTC
Please use another updated REG file (comment #21) for automatic logging. With previous one I can see the logs, but the internal buffer is not so big, so the point of interest is lost.

Comment 26 Vadim Rozenfeld 2020-02-12 00:21:21 UTC
Please give a try to new build b179
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1083820

Comment 27 xiagao 2020-02-12 03:26:37 UTC
Test pass with 179, so change bug status to verify.

Comment 33 errata-xmlrpc 2020-04-28 16:05:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1757

Comment 34 ybendito 2020-05-01 03:17:14 UTC
In my opinion, no special documentation required as the build with the bug was never released.
The problem exists only in intermediate build