Bug 1794609 - [virtio-win][vioser] Hit BSOD when transfer data between host and guest via virtio-serial port in win2012-64 guest
Summary: [virtio-win][vioser] Hit BSOD when transfer data between host and guest via v...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: virtio-win
Version: 8.2
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: rc
: 8.0
Assignee: ybendito
QA Contact: xiagao
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-01-24 01:48 UTC by Peixiu Hou
Modified: 2020-05-01 03:17 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-28 16:05:16 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)
BSOD picture (22.31 KB, image/jpeg)
2020-01-24 01:48 UTC, Peixiu Hou
no flags Details
BSOD picture-2 (22.07 KB, image/jpeg)
2020-01-24 01:49 UTC, Peixiu Hou
no flags Details
host run script serial_host_send_receive.py (9.18 KB, text/x-python)
2020-01-24 01:50 UTC, Peixiu Hou
no flags Details
VirtioChannel_guest_send_reveive.py (8.03 KB, text/x-python)
2020-01-24 01:51 UTC, Peixiu Hou
no flags Details
Batch to run ETW logger (1.21 KB, text/plain)
2020-02-04 12:52 UTC, ybendito
no flags Details
REG file to start serial driver log automatically (425 bytes, text/x-ms-regedit)
2020-02-05 07:19 UTC, ybendito
no flags Details
Fix candidate 1 (2.08 MB, application/zip)
2020-02-05 07:20 UTC, ybendito
no flags Details
REG file to start serial driver log automatically (1.09 KB, text/x-ms-regedit)
2020-02-10 11:05 UTC, ybendito
no flags Details
REG file to start serial driver log automatically (1.15 KB, text/x-ms-regedit)
2020-02-10 17:01 UTC, ybendito
no flags Details


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2020:1757 None None None 2020-04-28 16:05:30 UTC

Description Peixiu Hou 2020-01-24 01:48:08 UTC
Created attachment 1654936 [details]
BSOD picture

Description of problem:
Hit BSOD when transfer data between host and guest via virtio-serial port in win2012-64 guest. 

Tested 3 jobs with automation, all hit BSOD when do data transfet through virtio-serail.
    1)-Host_RHEL.m8.u2.product_av.qcow2.virtio_blk.up.virtio_net.Guest.Win2012.x86_64.io-github-autotest-qemu.virtio_serial_file_transfer.unix_socket.q35/
    2)-Host_RHEL.m8.u2.product_av.qcow2.virtio_blk.up.virtio_net.Guest.Win2012.x86_64.io-github-autotest-qemu.virtio_serial_file_transfer.unix_socket.boot_with_multiple_virtserialports.1_N.q35/
    3)-Host_RHEL.m8.u2.product_av.qcow2.virtio_blk.up.virtio_net.Guest.Win2012.x86_64.io-github-autotest-qemu.virtio_serial_file_transfer.unix_socket.max_ports.q35/

BSOD pictures as attachment, job 1) report "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED(Wdf01000.sys)", job 2) report "PAGE_FAULT_IN_NONPAGED_AREA(Wdf01000.sys)".

Only hit this issue on win2012-64 guest, not hit this issue on win2012-r2, win8, win10, win8.1, win2016 and win2019.

Version-Release number of selected component (if applicable):
kernel-4.18.0-171.el8.x86_64
qemu-kvm-4.2.0-7.module+el8.2.0+5520+4e5817f3.x86_64
seabios-1.13.0-1.module+el8.2.0+5520+4e5817f3.x86_64
virtio-win-prewhql-178

How reproducible:
Not 100%, about 30%.

Steps to Reproduce:
1. Boot a win2012 guest up with virtio-serial-pci.
===============================================================================
MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 14336  \
    -smp 24,maxcpus=24,cores=12,threads=1,dies=1,sockets=2  \
    -cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_ng28eq6t/monitor-qmpmonitor1-20200123-195806-lh4Cw0Mi,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_ng28eq6t/monitor-catch_monitor-20200123-195806-lh4Cw0Mi,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idJY554y \
    -chardev socket,id=chardev_serial0,nowait,server,path=/var/tmp/avocado_ng28eq6t/serial-serial0-20200123-195806-lh4Cw0Mi \
    -device isa-serial,id=serial0,chardev=chardev_serial0 \
    -chardev socket,id=chardev_vs1,nowait,server,path=/var/tmp/avocado_ng28eq6t/serial-vs1-20200123-195806-lh4Cw0Mi \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pcie.0-root-port-2,addr=0x0 \
    -device virtserialport,id=vs1,name=vs1,chardev=chardev_vs1,bus=virtio_serial_pci0.0,nr=1  \
    -chardev socket,id=seabioslog_id_20200123-195806-lh4Cw0Mi,path=/var/tmp/avocado_ng28eq6t/seabios-20200123-195806-lh4Cw0Mi,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20200123-195806-lh4Cw0Mi,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-3,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/win2012-64-virtio.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pcie.0-root-port-4,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:94:16:d5:78:28,id=idif9SCd,netdev=idQP8R1B,bus=pcie.0-root-port-5,addr=0x0  \
    -netdev tap,id=idQP8R1B,vhost=on,vhostfd=21,fd=14 \
    -blockdev node-name=file_cd1,driver=file,read-only=on,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
    -device ide-cd,id=cd1,drive=drive_cd1,bootindex=1,write-cache=on,bus=ide.0,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=6,chassis=6,addr=0x6,bus=pcie.0
===============================================================================

2. Install the vioser driver for virtio-serial-pci device.

3. Copy test scripts to guest.
Copy file from /usr/share/avocado-plugins-vt/shared/deps/serial/VirtIoChannel_guest_send_receive.py to 10.73.73.202:C:\
Copy file from /usr/share/avocado-plugins-vt/shared/deps/serial/windows_support.py to 10.73.73.202:C:\

4. Creating 10MB file on host.
19:58:30 INFO | Running 'dd if=/dev/zero of=/var/tmp/avocado_ng28eq6t/tmp-bAd52vDY bs=1M count=10'
19:58:30 DEBUG| [stderr] 10+0 records in
19:58:30 INFO | Command 'dd if=/dev/zero of=/var/tmp/avocado_ng28eq6t/tmp-bAd52vDY bs=1M count=10' finished with 0 after 0.005354881286621094s
19:58:30 DEBUG| [stderr] 10+0 records out
19:58:30 DEBUG| [stderr] 10485760 bytes (10 MB, 10 MiB) copied, 0.00476057 s, 2.2 GB/s

5. Creating 10Mb file on guest.
19:58:30 INFO | Context: (run) --> (transfer_data) --> Creating 10MB file on guest
19:58:30 DEBUG| Sending command: dd if=/dev/zero of=%TEMP%/tmp-YlI84tAE bs=1M count=10
19:58:31 DEBUG| Sending command: echo %errorlevel%

6. Send command in host:
19:58:31 INFO | Context: (run) --> (transfer_data) --> Send host command: `command -v python python3 | head -1` /usr/share/avocado-plugins-vt/shared/deps/serial/serial_host_send_receive.py -t unix_socket -s /var/tmp/avocado_ng28eq6t/serial-vs1-20200123-195806-lh4Cw0Mi -f /var/tmp/avocado_ng28eq6t/tmp-bAd52vDY -a both

7. Send command in guest:
19:58:34 DEBUG| Sending command: python2.7 C:\/VirtIoChannel_guest_send_receive.py -d vs1 -f %TEMP%/tmp-YlI84tAE -a both

8. check the guest status.

Actual results:
BSOD

Expected results:
data transfer success.

Additional info:
1. Tried download the vioser driver to last release version virtio-win-prewhql-154, tried test 30 times, not hit this issue.

Comment 1 Peixiu Hou 2020-01-24 01:49:18 UTC
Created attachment 1654937 [details]
BSOD picture-2

Comment 2 Peixiu Hou 2020-01-24 01:50:24 UTC
Created attachment 1654938 [details]
host run script serial_host_send_receive.py

Comment 3 Peixiu Hou 2020-01-24 01:51:02 UTC
Created attachment 1654939 [details]
VirtioChannel_guest_send_reveive.py

Comment 5 xiagao 2020-02-03 06:45:56 UTC
didn't hit this issue with virtio-win-1.9.10-3.el8, so it's regression.

Comment 6 ybendito 2020-02-04 12:52:37 UTC
Created attachment 1657542 [details]
Batch to run ETW logger

Comment 8 xiagao 2020-02-05 06:14:58 UTC
This issue is found when run automation test, couldn't reproduce by manual.
And I checked the auto log again and found BSOD occurred during shutting down guest not data transferring.

the detailed steps:
1. Creating 10MB file on host
dd if=/dev/zero of=/var/tmp/avocado_bnkcjtzv/tmp-WZv6SM2v bs=1M count=10
2. Creating 10MB file on guest
dd if=/dev/zero of=%TEMP%/tmp-Twa1lzwM bs=1M count=10
3. run script in host.
`command -v python python3 | head -1` /usr/share/avocado-plugins-vt/shared/deps/serial/serial_host_send_receive.py -t unix_socket -s /var/tmp/avocado_bnkcjtzv/serial-channel1-20200112-221011-uY8xn3yq -f /var/tmp/avocado_bnkcjtzv/tmp-WZv6SM2v -a both
4. run script in guest.
python2.7 C:\/VirtIoChannel_guest_send_receive.py -d vs1 -f %TEMP%/tmp-Twa1lzwM -a both
error occurred
2020-01-12 22:11:17: Traceback (most recent call last):
2020-01-12 22:11:17:   File "C:\/VirtIoChannel_guest_send_receive.py", line 255, in <module>
2020-01-12 22:11:17:     md5_ori = receive(device, filename, p_size=p_size)
2020-01-12 22:11:17:   File "C:\/VirtIoChannel_guest_send_receive.py", line 175, in receive
2020-01-12 22:11:17:     size = vio.shake_hand(action="receive")
2020-01-12 22:11:17:   File "C:\/VirtIoChannel_guest_send_receive.py", line 105, in shake_hand
2020-01-12 22:11:17:     self.send(txt)
2020-01-12 22:11:17:   File "C:\/VirtIoChannel_guest_send_receive.py", line 78, in send
2020-01-12 22:11:17:     self._vport.write(message)
2020-01-12 22:11:17:   File "C:\windows_support.py", line 39, in write
2020-01-12 22:11:17:     win32file.GetOverlappedResult(self._hfile, self._write_ovrlpd, True)
2020-01-12 22:11:17: pywintypes.error: (995, 'GetOverlappedResult', 'The I/O operation has been aborted because of either a thread exit or an application request.')

5.shutdown guest
22:11:27 DEBUG| Destroying VM avocado-vt-vm1 (PID 33075)

after step 5,BSOD occurred.

Comment 9 ybendito 2020-02-05 07:19:34 UTC
Created attachment 1657759 [details]
REG file to start serial driver log automatically

Comment 10 ybendito 2020-02-05 07:20:15 UTC
Created attachment 1657760 [details]
Fix candidate 1

Comment 18 ybendito 2020-02-10 11:05:08 UTC
Created attachment 1662117 [details]
REG file to start serial driver log automatically

Comment 19 ybendito 2020-02-10 11:07:04 UTC
Please use updated REG file (comment #18) for automatic logging. With previous one the log works, but stops too early, before crash time.

Comment 21 ybendito 2020-02-10 17:01:00 UTC
Created attachment 1662206 [details]
REG file to start serial driver log automatically

Comment 22 ybendito 2020-02-10 17:04:16 UTC
Please use another updated REG file (comment #21) for automatic logging. With previous one I can see the logs, but the internal buffer is not so big, so the point of interest is lost.

Comment 26 Vadim Rozenfeld 2020-02-12 00:21:21 UTC
Please give a try to new build b179
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1083820

Comment 27 xiagao 2020-02-12 03:26:37 UTC
Test pass with 179, so change bug status to verify.

Comment 33 errata-xmlrpc 2020-04-28 16:05:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1757

Comment 34 ybendito 2020-05-01 03:17:14 UTC
In my opinion, no special documentation required as the build with the bug was never released.
The problem exists only in intermediate build


Note You need to log in before you can comment on or make changes to this bug.