Bug 1972079 - Windows Installation blocked on 4k disk when using blk+raw+iothread
Summary: Windows Installation blocked on 4k disk when using blk+raw+iothread
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.0
Hardware: All
OS: All
Priority: medium
Severity: high
Target Milestone: beta
: ---
Assignee: Kevin Wolf
QA Contact: qing.wang
URL:
Whiteboard:
Depends On:
Blocks: 1972515 2002631
Reported: 2021-06-15 08:27 UTC by qing.wang
Modified: 2021-12-07 21:26 UTC (History)

Fixed In Version: qemu-kvm-6.0.0-11.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1972515 2002631 (view as bug list)
Environment:
Last Closed: 2021-12-07 21:24:13 UTC
Type: ---
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Gitlab redhat/centos-stream/src qemu-kvm merge_requests 31 0 None None None 2021-08-05 17:01:17 UTC

Description qing.wang 2021-06-15 08:27:08 UTC
Description of problem:

When installing a Windows guest (for example Win10, Win2016 or Win2019),
the guest reboots automatically after most of the installation steps have finished.

The Windows guest then ends up at a black screen.
The installation cannot finish with this specific configuration:
raw image + virtio + iothread on a 4k disk.



Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux release 9.0 Beta (Plow)
5.13.0-0.rc4.33.el9.x86_64
qemu-kvm-common-6.0.0-5.el9.x86_64
edk2-ovmf-20200602gitca407c7246bf-2.el9.noarch
virtio-win-prewhql-0.1-201.iso


How reproducible:
100% on specific host (4k disk)

Disk /dev/mapper/rhel_dell--per440--10-home: 455.5 GiB, 489093595136 bytes, 119407616 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Other hosts do not reproduce this issue.
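
To check whether another host has the same kind of disk, the logical block size can be read from sysfs instead of from the fdisk output above. This is only a minimal sketch: the function names are made up for illustration, and the device name (for example "dm-0" for a device-mapper volume) is an assumption that has to be looked up on the host in question.

```python
import os


def logical_block_size(device, sysfs_root="/sys/block"):
    """Return the logical block size (in bytes) the kernel reports for a block device.

    `device` is a kernel device name such as "sda" or "dm-0" (hypothetical
    examples). `sysfs_root` is only a parameter so the function can be tried
    against a fake directory tree.
    """
    path = os.path.join(sysfs_root, device, "queue", "logical_block_size")
    with open(path) as f:
        return int(f.read().strip())


def is_4k_native(device, sysfs_root="/sys/block"):
    """True if the device exposes 4096-byte logical sectors, as on the affected host."""
    return logical_block_size(device, sysfs_root) == 4096
```

On the affected host this should report 4096 for the device backing /home, matching the "Sector size (logical/physical): 4096 bytes / 4096 bytes" line above.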

Steps to Reproduce:
1. Create a raw image file
qemu-img create -f raw /home/kvm_autotest_root/images/win10.raw 30g

2. Boot the VM with blk+raw+iothread
/usr/libexec/qemu-kvm \
    \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35,memory-backend=mem-machine_mem \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 12288 \
    -object memory-backend-ram,size=12288M,id=mem-machine_mem  \
    -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2  \
    -cpu 'Cascadelake-Server-noTSX',hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,+kvm_pv_unhalt \
   \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=native,filename=/home/kvm_autotest_root/images/win10.raw,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=raw,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,serial=SYSTEM_DISK0,bus=pcie-root-port-2,addr=0x0,iothread=iothread0 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:a6:6b:c2:93:56,id=idd0M4NV,netdev=idtL9U8k,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idtL9U8k,vhost=on \
    -blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=native,filename=/home/kvm_autotest_root/iso/ISO/Win10/en_windows_10_business_editions_version_21h1_x64_dvd_ec5a76c1.iso,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
    -device ide-cd,id=cd1,drive=drive_cd1,bootindex=1,write-cache=on,bus=ide.0,unit=0 \
    -blockdev node-name=file_unattended,driver=file,auto-read-only=on,discard=unmap,aio=native,filename=/home/kvm_autotest_root/iso/windows/virtio-win-prewhql-0.1-201.iso,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_unattended,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_unattended \
    -device ide-cd,id=unattended,drive=drive_unattended,bootindex=3,write-cache=on,bus=ide.2,unit=0  \
    -vnc :5  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=d,strict=off \
    -enable-kvm -monitor stdio \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5


3. Start the installation

Actual results:
The Windows guest ends up at a black screen and does not respond.

Expected results:
The installation succeeds.

Additional info:
automation:
python ConfigTest.py --testcase=unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=Win10 --driveformat=virtio_blk --nicmodel=virtio_net --imageformat=raw --machines=q35  --customsparams="cd_format=ide\nimage_aio=native"


The installation may pass with the following combinations:
blk+raw
blk+qcow2+iothread 
scsi+raw+iothread

It may also pass if the raw file is placed on a non-4k disk, such as NFS:
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=native,filename=/home/nfs/win10.raw,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=raw,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \



So this issue looks related to the combination blk+raw+iothread+4k disk?

Comment 1 Klaus Heinrich Kiwi 2021-06-15 13:27:45 UTC
Kevin,

 assigning this to you for now. Care to take a look?

I see that the machine type is q35 - that edk2-based image should in theory support 4k disks, doesn't it?

Qing Wang: do you have results for this exact same test on other versions (e.g., RHEL 8)? That would help us clarify whether this is a regression or whether it simply never worked.

 -Klaus

Comment 2 qing.wang 2021-06-16 06:08:45 UTC
4k disks can be emulated with a targetcli iSCSI server, for example:
targetcli /backstores/fileio/disk set attribute block_size=4096

o- / ..................................................................... [...]
  o- backstores .......................................................... [...]
  | o- block .............................................. [Storage Objects: 0]
  | o- fileio ............................................. [Storage Objects: 1]
  | | o- disk1 ........... [/home/iscsi/onex.img (40.0GiB) write-back activated]
  | |   o- alua ............................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp ................... [ALUA state: Active/optimized]
  | o- pscsi .............................................. [Storage Objects: 0]
  | o- ramdisk ............................................ [Storage Objects: 0]
  o- iscsi ........................................................ [Targets: 1]
  | o- iqn.2016-06.one.server:one-a .................................. [TPGs: 1]
  |   o- tpg1 ........................................... [no-gen-acls, no-auth]
  |     o- acls ...................................................... [ACLs: 2]
  |     | o- iqn.1994-05.com.redhat:clienta ................... [Mapped LUNs: 1]
  |     | | o- mapped_lun0 ............................ [lun0 fileio/disk1 (rw)]
  |     | o- iqn.1994-05.com.redhat:clientb ................... [Mapped LUNs: 1]
  |     |   o- mapped_lun0 ............................ [lun0 fileio/disk1 (rw)]
  |     o- luns ...................................................... [LUNs: 1]
  |     | o- lun0 ..... [fileio/disk1 (/home/iscsi/onex.img) (default_tg_pt_gp)]
  |     o- portals ................................................ [Portals: 1]
  |       o- 0.0.0.0:3260 ................................................. [OK]
  o- loopback ..................................................... [Targets: 0]

It looks like a regression. The same issue occurs on:

Red Hat Enterprise Linux release 8.5 Beta (Ootpa)
4.18.0-310.el8.x86_64
qemu-kvm-common-6.0.0-18.module+el8.5.0+11243+5269aaa1.x86_64

But it does not occur on:
Red Hat Enterprise Linux release 8.4 (Ootpa)
4.18.0-305.el8.x86_64
qemu-kvm-common-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64

Comment 3 Kevin Wolf 2021-07-15 14:35:10 UTC
(In reply to qing.wang from comment #0)
> Disk /dev/mapper/rhel_dell--per440--10-home: 455.5 GiB, 489093595136 bytes,
> 119407616 sectors
> Units: sectors of 1 * 4096 = 4096 bytes
> Sector size (logical/physical): 4096 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes

This path doesn't appear in the command line at all, so I assume you're not directly using the block device for the VM. Is /home/kvm_autotest_root the mountpoint for this volume? If so, can you please tell me which filesystem it uses?

Is /home/kvm_autotest_root/iso/ on the same filesystem, i.e. do we know that it's win10.raw that causes the problem or could in theory any of the other images be part of the problem, too?

With some basic tests, 4k support seemed to work fine for me in RHEL-AV-8.5 and RHEL 9. I'll try to check next whether I can reproduce with a Windows image.

(In reply to Klaus Heinrich Kiwi from comment #1)
> I see that the machine type is q35 - that edk2-based image should in theory
> support 4k disks, doesn't it?

Maybe I'm missing something, but I didn't see edk2 in the command line, so I think this guest uses SeaBIOS. It's still the default with q35.

Either way, the 4k native disk is on the host. The guest devices use the default sector size in the command line options, which is 512 bytes, so the guest shouldn't see a difference. So at first sight, the emulation of 512-byte sectors on top of 4k host disks could have a bug. Basic operation seems to work fine, but maybe something is wrong with the serialisation of concurrent requests to adjacent sectors (just taking a guess there, though).
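
To illustrate why adjacent 512-byte sectors are a concern here: with cache.direct=on, I/O to the host file has to be aligned to the host's 4096-byte logical block size, so a 512-byte guest request must be widened to the surrounding 4k block and handled as a read-modify-write. The following is a rough sketch of that arithmetic only; the helper names are invented for illustration and this is not QEMU's code.

```python
HOST_ALIGN = 4096  # logical block size of the host disk in this report


def padded_range(offset, length, align=HOST_ALIGN):
    """Widen the byte range [offset, offset + length) to multiples of `align`.

    Returns (padded_offset, padded_length).
    """
    start = (offset // align) * align
    end = -(-(offset + length) // align) * align  # round the end up to `align`
    return start, end - start


def needs_rmw(offset, length, align=HOST_ALIGN):
    """A write that is not fully aligned needs a read-modify-write cycle."""
    return offset % align != 0 or length % align != 0


def overlaps(a, b):
    """True if two (offset, length) byte ranges intersect."""
    (a_off, a_len), (b_off, b_len) = a, b
    return a_off < b_off + b_len and b_off < a_off + a_len


# Two guest writes to neighbouring 512-byte sectors (sectors 8 and 9).
req_a = (8 * 512, 512)
req_b = (9 * 512, 512)

# The guest-visible ranges do not overlap, but the padded host ranges do,
# so the two read-modify-write cycles must not run concurrently.
guest_overlap = overlaps(req_a, req_b)
host_overlap = overlaps(padded_range(*req_a), padded_range(*req_b))
```

Here guest_overlap is False while host_overlap is True: both writes land in the same 4k host block, which is the situation where the serialisation mentioned above matters.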

Comment 4 qing.wang 2021-07-16 07:06:29 UTC
(In reply to Kevin Wolf from comment #3)
> (In reply to qing.wang from comment #0)
> > Disk /dev/mapper/rhel_dell--per440--10-home: 455.5 GiB, 489093595136 bytes,
> > 119407616 sectors
> > Units: sectors of 1 * 4096 = 4096 bytes
> > Sector size (logical/physical): 4096 bytes / 4096 bytes
> > I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> 
> This path doesn't appear in the command line at all, so I assume you're not
> directly using the block device for the VM. Is /home/kvm_autotest_root the
> mountpoint for this volume? If so, can you please tell me which filesystem
> it uses?

This host has a 4k physical disk. The folder was created by the OS installation, and the filesystem is the default, XFS.
The point is that the image file is located on the 4k disk.


> Is /home/kvm_autotest_root/iso/ on the same filesystem, i.e. do we know that
> it's win10.raw that causes the problem or could in theory any of the other
> images be part of the problem, too?
> 

/home/kvm_autotest_root/iso/ is on NFS. I think putting the OS ISO on local storage would give the same issue.

This issue is related to the combination blk+raw+iothread+4k during installation; if any one factor is removed, the installation may succeed.



> With some basic tests, 4k support seemed to work fine for me in RHEL-AV-8.5
> and RHEL 9. I'll try to check next whether I can reproduce with a Windows
> image.
> 
> (In reply to Klaus Heinrich Kiwi from comment #1)
> > I see that the machine type is q35 - that edk2-based image should in theory
> > support 4k disks, doesn't it?
> 
> Maybe I'm missing something, but I didn't see edk2 in the command line, so I
> think this guest uses SeaBIOS. It's still the default with q35.
> 
edk2 is not involved.

> Either way, the 4k native disk is on the host. The guest devices use the
> default sector size in the command line options, which is 512 bytes, so the
> guest shouldn't see a difference. So at the first sight, the emulation of
> 512 byte sectors on top of 4k host disks could have a bug. Basic operation
> seems to work fine, but maybe something is wrong about the serialisation of
> concurrent requests to adjacent sectors (just taking a guess there, though).

So far it has only happened during installation. If we keep using the image afterwards, it may boot successfully.

Comment 5 Kevin Wolf 2021-07-26 16:51:08 UTC
I can reproduce a hang, but it's not just Windows that is hanging, but the whole QEMU process. Can you confirm that the monitor isn't responsive any more for you?

It hangs while trying to drain the disk image:

(gdb) bt
#0  0x00007f70645b14fe in ppoll () at /lib64/libc.so.6
#1  0x00005604b796d11f in qemu_poll_ns (fds=0x7f6fc07d47f0, nfds=2, timeout=-1) at ../util/qemu-timer.c:336
#2  0x00005604b7996983 in fdmon_poll_wait (ctx=0x5604b9bb84d0, ready_list=0x7f7055ce9cf0, timeout=-1) at ../util/fdmon-poll.c:80
#3  0x00005604b7983979 in aio_poll (ctx=0x5604b9bb84d0, blocking=true) at ../util/aio-posix.c:607
#4  0x00005604b77d6e79 in bdrv_do_drained_begin (bs=0x5604b9e69820, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at ../block/io.c:473
#5  0x00005604b77d6ee5 in bdrv_drained_begin (bs=0x5604b9e69820) at ../block/io.c:479
#6  0x00005604b7894da2 in bdrv_set_aio_context_ignore (bs=0x5604b9e69820, new_context=0x5604b9bb84d0, ignore=0x7f7055ce9ee0) at ../block.c:6859
#7  0x00005604b7894ebd in bdrv_set_aio_context_ignore (bs=0x5604b9e6fb00, new_context=0x5604b9bb84d0, ignore=0x7f7055ce9ee0) at ../block.c:6881
#8  0x00005604b7895336 in bdrv_child_try_set_aio_context (bs=0x5604b9e6fb00, ctx=0x5604b9bb84d0, ignore_child=0x5604bb021c50, errp=0x0) at ../block.c:6998
#9  0x00005604b780110b in blk_do_set_aio_context (blk=0x5604bb0f4aa0, new_context=0x5604b9bb84d0, update_root_node=true, errp=0x0) at ../block/block-backend.c:2066
#10 0x00005604b78011a0 in blk_set_aio_context (blk=0x5604bb0f4aa0, new_context=0x5604b9bb84d0, errp=0x0) at ../block/block-backend.c:2087
#11 0x00005604b76f2b43 in virtio_blk_data_plane_stop (vdev=0x5604bb0f1a20) at ../hw/block/dataplane/virtio-blk.c:337
#12 0x00005604b737366a in virtio_bus_stop_ioeventfd (bus=0x5604bb0f1998) at ../hw/virtio/virtio-bus.c:250
#13 0x00005604b7303852 in virtio_pci_stop_ioeventfd (proxy=0x5604bb0e97a0) at ../hw/virtio/virtio-pci.c:295
#14 0x00005604b73042ff in virtio_write_config (pci_dev=0x5604bb0e97a0, address=4, val=1280, len=2) at ../hw/virtio/virtio-pci.c:628
#15 0x00005604b724767e in pci_host_config_write_common (pci_dev=0x5604bb0e97a0, addr=4, limit=4096, val=1280, len=2) at ../hw/pci/pci_host.c:83

The block node to be drained has a relatively large in_flight counter, but no tracked requests at all:

(gdb) p bs.in_flight
$5 = 24
(gdb) p bs.tracked_requests 
$7 = {lh_first = 0x0}

Maybe we forget to decrease the in_flight counter somewhere in the 512-on-4k emulation code. Otherwise, I'll have to find out what operation outside of a tracked request is still in flight.
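
The state seen in gdb (in_flight > 0 with an empty tracked_requests list) can be pictured with a small toy model. This is only an illustration of the guess above, not QEMU's implementation: the Node class, its methods and the "early error" path are hypothetical, and only the names in_flight and tracked_requests come from the gdb output. The poll limit exists so the sketch terminates; the drain in the backtrace waits without any limit.

```python
class Node:
    """Toy block node with an in-flight counter and a list of tracked requests."""

    def __init__(self):
        self.in_flight = 0
        self.tracked_requests = []

    def submit(self, req, fail_early=False, leak_on_error=False):
        """Start a request; optionally fail before it becomes a tracked request."""
        self.in_flight += 1
        if fail_early:
            # Hypothetical error path: a correct one undoes the increment,
            # a buggy one returns without decrementing the counter.
            if not leak_on_error:
                self.in_flight -= 1
            return "error"
        self.tracked_requests.append(req)
        return "pending"

    def complete(self, req):
        """Finish a tracked request and drop its in-flight reference."""
        self.tracked_requests.remove(req)
        self.in_flight -= 1

    def drain(self, max_polls=1000):
        """Poll until nothing is in flight; False means it would wait forever."""
        polls = 0
        while self.in_flight > 0:
            if polls == max_polls:
                return False
            if self.tracked_requests:
                self.complete(self.tracked_requests[0])
            polls += 1
        return True


healthy = Node()
healthy.submit("write-1")
healthy.submit("write-2", fail_early=True)
healthy_ok = healthy.drain()  # all references are released, so drain finishes

leaky = Node()
leaky.submit("write-1")
leaky.submit("write-2", fail_early=True, leak_on_error=True)
leaky_ok = leaky.drain()  # counter never reaches zero, so drain gives up
```

After the second drain, leaky.in_flight is still 1 while leaky.tracked_requests is empty, which has the same shape as the values printed by gdb above (24 in flight, no tracked requests).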

Comment 6 qing.wang 2021-07-27 05:49:14 UTC
Yes, both QEMU and the guest hang.
I also reproduced it on:
Red Hat Enterprise Linux release 8.5 Beta (Ootpa)
4.18.0-315.el8.x86_64
qemu-kvm-common-6.0.0-25.module+el8.5.0+11890+8e7c3f51.x86_64




[Switching to thread 10 (Thread 0x7f5c1e5fc700 (LWP 132865))]
#0  0x00007f5f337d5b36 in ppoll () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f5f337d5b36 in ppoll () from /lib64/libc.so.6
#1  0x000055d5083ec4e9 in ppoll (__ss=0x0, __timeout=0x0, 
    __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, 
    timeout=timeout@entry=-1) at ../util/qemu-timer.c:336
#3  0x000055d50840e859 in fdmon_poll_wait (ctx=0x55d50ad16290, 
    ready_list=0x7f5c1e5fb220, timeout=-1) at ../util/fdmon-poll.c:80
#4  0x000055d5083fcd51 in aio_poll (ctx=0x55d50ad16290, 
    blocking=blocking@entry=true) at ../util/aio-posix.c:607
#5  0x000055d50831ad43 in bdrv_do_drained_begin (poll=<optimized out>, 
    ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x55d50aea2930)
    at ../block/io.c:443
#6  bdrv_do_drained_begin (bs=0x55d50aea2930, recursive=<optimized out>, 
    parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>)
    at ../block/io.c:409
#7  0x000055d5083103f3 in bdrv_set_aio_context_ignore (bs=0x55d50aea2930, 
    new_context=new_context@entry=0x55d50ad16290, 
    ignore=ignore@entry=0x7f5c1e5fb350) at ../block.c:6520
#8  0x000055d5083104cb in bdrv_set_aio_context_ignore (
    bs=bs@entry=0x55d50aea9000, new_context=new_context@entry=0x55d50ad16290, 
    ignore=ignore@entry=0x7f5c1e5fb350) at ../block.c:6542
#9  0x000055d508310ab3 in bdrv_child_try_set_aio_context (
    bs=bs@entry=0x55d50aea9000, ctx=0x55d50ad16290, 
    ignore_child=<optimized out>, errp=<optimized out>) at ../block.c:6659
#10 0x000055d5083359cb in blk_do_set_aio_context (blk=0x55d50bf2fc60, 
    new_context=0x55d50ad16290, update_root_node=update_root_node@entry=true, 
    errp=errp@entry=0x0) at ../block/block-backend.c:2052
#11 0x000055d5083381a1 in blk_set_aio_context (blk=<optimized out>, 
    new_context=<optimized out>, errp=errp@entry=0x0)
    at ../block/block-backend.c:2073
#12 0x000055d508255331 in virtio_blk_data_plane_stop (vdev=<optimized out>)
    at ../hw/block/dataplane/virtio-blk.c:329
#13 0x000055d5080ca2d0 in virtio_bus_stop_ioeventfd (
    bus=bus@entry=0x55d50bf2d5f8) at ../hw/virtio/virtio-bus.c:250
#14 0x000055d5080caa5f in virtio_bus_stop_ioeventfd (
    bus=bus@entry=0x55d50bf2d5f8) at ../hw/virtio/virtio-bus.c:242
#15 0x000055d5081311f2 in virtio_pci_stop_ioeventfd (proxy=0x55d50bf254e0)
    at ../hw/virtio/virtio-pci.c:617
#16 virtio_write_config (pci_dev=0x55d50bf254e0, address=4, val=1280, len=2)
    at ../hw/virtio/virtio-pci.c:617
#17 0x000055d5082cb108 in memory_region_write_accessor (mr=<optimized out>, 
    addr=<optimized out>, value=<optimized out>, size=<optimized out>, 
    shift=<optimized out>, mask=<optimized out>, attrs=...)
    at ../softmmu/memory.c:491
#18 0x000055d5082c9cfe in access_with_adjusted_size (addr=addr@entry=4194308,

Comment 7 Kevin Wolf 2021-08-02 10:34:14 UTC
Sent a patch to fix this upstream:
https://lists.gnu.org/archive/html/qemu-block/2021-07/msg00786.html

Comment 9 Yanan Fu 2021-08-10 01:30:16 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 12 qing.wang 2021-08-11 02:05:30 UTC
Test passed on:
Red Hat Enterprise Linux release 9.0 Beta (Plow)
5.14.0-0.rc4.35.el9.x86_64
qemu-kvm-common-6.0.0-11.el9.x86_64
edk2-ovmf-20210527gite1999b264f1f-3.el9.noarch
virtio-win-prewhql-0.1-203.iso

