Bug 2020998
| Field | Value |
|---|---|
| Summary | [virtio-win] Windows 10 "Optimize drive"/Trim/Discard causes all data to be rewritten |
| Product | Red Hat Enterprise Linux 9 |
| Component | virtio-win |
| Sub component | virtio-win-prewhql |
| Version | 9.0 |
| Hardware | x86_64 |
| OS | Windows |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | medium |
| Reporter | Peixiu Hou <phou> |
| Assignee | Vadim Rozenfeld <vrozenfe> |
| QA Contact | Peixiu Hou <phou> |
| CC | ailan, fdeutsch, gveitmic, mdean, menli, qizhu, vrozenfe, ymankad |
| Keywords | Triaged, ZStream |
| Target Milestone | rc |
| Target Release | --- |
| Doc Type | If docs needed, set a value |
| Bug Blocks | 2145213, 2154127 |
| Deadline | 2023-02-13 |
| Last Closed | 2023-05-09 07:55:10 UTC |
Description (Peixiu Hou, 2021-11-08 03:41:37 UTC)

---
Qianqian Zhu:

Hi Vadim,

The DTM is set to 16; does this mean the fix has already been included in a recent release? Would you please help to move it to ON_QA if so? Thanks.

Regards,
Qianqian

---

Vadim Rozenfeld (in reply to Qianqian Zhu from comment #1):

Hi Qianqian,

Moved it to 20. Honestly, it is still not clear whether the problem can be solved by fixing the drivers alone or whether some extra QEMU fixes will be required.

Best,
Vadim

---

menli:

Hit this issue on a win10-32 (pc) guest with viostor.

Packages:
kernel-5.14.0-145.el9.x86_64
qemu-kvm-7.0.0-9.el9.x86_64
seabios-bin-1.16.0-4.el9.noarch
RHEL-9.1.0-20220814.1
virtio-win-prewhql-224
cdrom_cd1 = isos/ISO/Win10/en-us_windows_10_business_editions_version_21h2_updated_april_2022_x86_dvd_691b7024.iso

Auto case: trim_support_test
http://virtqetools.lab.eng.pek2.redhat.com/autotest_static_job_log/6924299/test-results/19-Host_RHEL.m9.u1.qcow2.virtio_blk.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.trim_support_test/

Thanks,
Menghuan

---

Vadim Rozenfeld:

I've just pushed the viostor-related fix:
https://github.com/virtio-win/kvm-guest-drivers-windows/pull/824

vioscsi doesn't need any changes. In both cases (viostor and vioscsi), setting discard_granularity to 16M/32M (Hyper-V uses 32M) makes Windows work with large slabs (clusters), which reduces the defragmentation time significantly.

Below is the "defrag.exe e: /u /v /h /o" execution time for a 10G volume on a Win10 21H2 system:

| discard_granularity | 4K | 32K | 256K | 2M | 16M | 32M |
|---|---|---|---|---|---|---|
| Optimal unmap granularity (512-byte blocks) | 8 | 64 | 512 | 4096 | 32768 | 65536 |
| virtio-blk defrag time (sec) | 615.61 | 78.77 | 15.48 | 4.29 | 1.43 | 1.22 |
| virtio-scsi defrag time (sec) | 575.77 | 149 | 15.50 | 3.25 | 1.44 | 1.72 |

QEMU command line for virtio-blk:

```
-drive file=$DSK0,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,discard=unmap,aio=native \
-device virtio-blk-pci,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=-1,serial=xru001i,discard_granularity=32M \
```

and for virtio-scsi:

```
-drive file=$DSK0,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,cache=none,aio=native,id=drive-vioscsi0 \
-device virtio-scsi-pci,id=scsi-vioscsi0 \
-device scsi-hd,drive=drive-vioscsi0,id=vioscsi0,bus=scsi-vioscsi0.0,lun=0,scsi-id=0,bootindex=-1,discard_granularity=32M \
```
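---

A rough sanity check on the numbers in the table above (back-of-the-envelope arithmetic only, assuming 512-byte logical blocks and the 10G test volume used for the measurements):

```
# Optimal unmap granularity reported to the guest, in 512-byte blocks:
echo $((4 * 1024 / 512))                  # discard_granularity=4K  ->     8
echo $((32 * 1024 * 1024 / 512))          # discard_granularity=32M -> 65536

# Slabs Windows has to retrim on a 10 GiB volume:
echo $((10 * 1024 * 1024 * 1024 / (4 * 1024)))          # at 4K  -> 2621440
echo $((10 * 1024 * 1024 * 1024 / (32 * 1024 * 1024)))  # at 32M ->     320
```

With 4K granularity the guest ends up retrimming millions of tiny slabs, which is why the defrag times above drop by two orders of magnitude once the granularity is raised to 16M/32M; the 320-slab figure also lines up with the "Slab count = 319" that defrag reports further down in this bug for the same 10G disk.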
---

The viostor-related fix was included in build 226:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2176313

---

menli:

When I ran the win10 guest viostor test loop with the 227 build, a new issue seems to have been introduced by this change. Feel free to correct me if I am wrong.

1) Boot a win10 guest with a 10G data disk:

```
-blockdev node-name=file_stg2,driver=file,cache.direct=on,cache.no-flush=off,filename=data.qcow2,aio=threads,discard=unmap \
-blockdev node-name=drive_stg2,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_stg2,discard=unmap \
-device virtio-blk-pci,id=stg2,drive=drive_stg2,bus=pci.6,addr=0x0 \
```

2) Guest: format the new volume with "Quick Format" disabled.

3) Guest: re-trim the volume via cmd (Administrator): defrag E: /u /v /h

Actual result: after step 3 the output below is shown; it seems no space is trimmed (after the re-trim, the disk size is not smaller than before).

```
C:\>defrag.exe E: /l /u /v

Invoking retrim on New Volume (E:)...

Retrim:  100% complete.

Slab size is too small.
```

Thanks,
Menghuan

---

Vadim Rozenfeld (in reply to menli from comment #8):

Thanks a lot, Menghuan. I will fix it in the next build. Meanwhile, can you try adding ",discard_granularity=32M" as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2020998#c6 and see if it solves the problem?

Best,
Vadim

---

menli (in reply to Vadim Rozenfeld from comment #9):

Yes, after adding discard_granularity=32M the result is as expected:

```
-blockdev node-name=file_stg2,driver=file,cache.direct=on,cache.no-flush=off,filename=data.qcow2,aio=threads,discard=unmap \
-blockdev node-name=drive_stg2,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_stg2,discard=unmap \
-device virtio-blk-pci,id=stg2,drive=drive_stg2,bus=pci.6,addr=0x0,discard_granularity=32M \
```

```
C:\>defrag.exe E: /l /u /v

Invoking retrim on New Volume (E:)...

Retrim:  100% complete.

The operation completed successfully.

Post Defragmentation Report:

    Volume Information:
        Volume size          = 9.99 GB
        Cluster size         = 4 KB
        Used space           = 37.13 MB
        Free space           = 9.95 GB

    Allocation Units:
        Slab count           = 319
        Slab size            = 32 MB
        Slab alignment       = 31.00 MB
        In-use slabs         = 2

    Retrim:
        Backed allocations   = 319
        Allocations trimmed  = 316
        Total space trimmed  = 9.87 GB
```

Thanks,
Menghuan
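---

A note on checking the result from the host side: whether a retrim actually released space can also be seen in the qcow2 image's allocation. This is only a sketch and assumes the data disk is the data.qcow2 file used in the command lines above:

```
# Host-side check (assumes the guest data disk is the data.qcow2 image above).
# "disk size" in qemu-img info is the allocated size; with discard=unmap on
# both blockdev layers it should shrink after a successful retrim in the guest.
qemu-img info data.qcow2
du -h data.qcow2    # allocated size as seen by the host filesystem
```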
---

Vadim Rozenfeld:

The issue from comment #8, seen when "discard_granularity" is not specified on the QEMU command line, has been addressed in the following PR:
https://github.com/virtio-win/kvm-guest-drivers-windows/pull/847

---

Peixiu Hou:

Hi Vadim,

I tried to test this bug with vioscsi + the virtio-win-prewhql-227 build.

QEMU commands with discard=unmap:

```
-device virtio-scsi-pci,id=scsi1,bus=pci.4,addr=0x0 \
-blockdev driver=file,filename=/home/kvm_autotest_root/images/storage.qcow2,node-name=libvirt-1-storage,cache.direct=on,cache.no-flush=off,auto-read-only=on,discard=unmap \
-blockdev node-name=libvirt-1-format,read-only=off,discard=unmap,detect-zeroes=unmap,cache.direct=on,cache.no-flush=off,driver=qcow2,file=libvirt-1-storage \
-device scsi-hd,bus=scsi1.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-1,drive=libvirt-1-format,id=scsi0-0-0-1,write-cache=on \
```

3) Format the new volume with "Quick Format" disabled.

4) Guest: re-trim the volume via cmd (Administrator): defrag F: /u /v /h /o

The retrim completes within 2 minutes. So for vioscsi this BZ seems to be fixed, but I saw you mentioned (comment #7) that you only sent the patch for viostor and did not send one for vioscsi. Could I know the reason?

Thanks a lot,
Peixiu

---

Should be fixed in build 228:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2226327

---

menli:

Hi Vadim,

Unfortunately, I can still hit the issue from comment #8 with the 228 build:

```
C:\>defrag.exe E: /l /u /v

Invoking retrim on New Volume (E:)...

Retrim:  100% complete.

Slab size is too small.
```

After adding "discard_granularity=32M" it works normally.

---

Vadim Rozenfeld:

How does it work with discard_granularity=4K? Can you please tell me all the steps to reproduce this issue?

---

Vadim Rozenfeld:

Posted upstream:
https://github.com/virtio-win/kvm-guest-drivers-windows/pull/858

Please check with build 229:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=48990264

Thanks,
Vadim

---

menli:

Steps (package: build 229 from comment 22; guest: win10 (64), q35):

1) Create a data disk image:
qemu-img create -f qcow2 data.qcow2 10G

2) Boot a win10 guest with the 10G data disk:

```
-blockdev node-name=file_stg2,driver=file,cache.direct=on,cache.no-flush=off,filename=data.qcow2,aio=threads,discard=unmap \
-blockdev node-name=drive_stg2,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_stg2,discard=unmap \
-device virtio-blk-pci,id=stg2,drive=drive_stg2,bus=pci.6,addr=0x0 \
```

3) Guest: format the new volume with "Quick Format" disabled.

4) Guest: re-trim the volume via cmd (Administrator): defrag E: /u /v /h

Actual result: after step 4 the trim completes successfully, but it takes a fairly long time (e.g. 3 min 25 s).

Additional info:
1. With 'discard_granularity=32M' it works normally and the trim completes within 10 s.
2. With 'discard_granularity=4k' it works, but the trim takes more than 2 minutes (e.g. 3 min 13 s).

The trim time is still a little long based on the above results. What's your suggestion?

Thanks,
Menghuan
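---

For completeness, the same override also applies on the vioscsi side that Peixiu tested above; the line below is a sketch only, reusing Peixiu's node and device names and the scsi-hd discard_granularity property shown in Vadim's earlier -drive style example:

```
# Sketch: Peixiu's scsi-hd device line with the 32M granularity appended
# (node/device names reused from the vioscsi test above).
-device scsi-hd,bus=scsi1.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-1,drive=libvirt-1-format,id=scsi0-0-0-1,write-cache=on,discard_granularity=32M \
```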
---

menli:

Hi Vadim,

To verify this bug, I tried to compare against the result described in comment 0.

Steps (guest: win10_64, q35; tried both win10 21h1 and win10 21h2):

1) Create a data disk image:
qemu-img create -f qcow2 data.qcow2 20G

2) Boot a win10 guest with the 20G data disk:

```
-blockdev node-name=file_stg2,driver=file,cache.direct=on,cache.no-flush=off,filename=data.qcow2,aio=threads,discard=unmap \
-blockdev node-name=drive_stg2,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_stg2,discard=unmap \
-device virtio-blk-pci,id=stg2,drive=drive_stg2,bus=pci.6,addr=0x0 \
```

3) Guest: format the new volume with "Quick Format" disabled.

4) Guest: re-trim the volume via cmd (Administrator): defrag d: /u /v /h /o

Actual result: after step 4, for both the 214 build and the 229 build, the retrim takes about 8 min 16 s.

So there seems to be no change in the result. My question is whether retrim time is the right checkpoint, or do I also need to pay attention to other aspects?

Thanks in advance.

---

Vadim Rozenfeld (in reply to menli from comment #27):

That is fine. No "discard_granularity" specified, or "discard_granularity=4K", should give more or less the same time. The optimal performance can be achieved with discard_granularity equal to 16MB or 32MB.

Best,
Vadim

---

Thanks for the explanation. Based on comment 28, changing the status to VERIFIED.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virtio-win bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2451