Bug 2020998
| Field | Value |
|---|---|
| Summary | [virtio-win] Windows 10 "Optimize drive"/Trim/Discard causes all data to be rewritten |
| Product | Red Hat Enterprise Linux 9 |
| Component | virtio-win |
| Sub component | virtio-win-prewhql |
| Version | 9.0 |
| Hardware | x86_64 |
| OS | Windows |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | medium |
| Reporter | Peixiu Hou <phou> |
| Assignee | Vadim Rozenfeld <vrozenfe> |
| QA Contact | Peixiu Hou <phou> |
| CC | ailan, fdeutsch, gveitmic, mdean, menli, qizhu, vrozenfe, ymankad |
| Keywords | Triaged, ZStream |
| Target Milestone | rc |
| Target Release | --- |
| Doc Type | If docs needed, set a value |
| Bug Blocks | 2145213, 2154127 |
| Deadline | 2023-02-13 |
| Last Closed | 2023-05-09 07:55:10 UTC |
Description (Peixiu Hou, 2021-11-08 03:41:37 UTC)

---
Qianqian Zhu:

Hi Vadim,

The DTM is set to 16; does this mean the fix has already been included in a recent release? Would you please help to move it to ON_QA if so? Thanks.

Regards,
Qianqian

---

Vadim Rozenfeld (in reply to Qianqian Zhu from comment #1):

Hi Qianqian,

Moved it to 20. Honestly, it is still not clear whether the problem can be solved by fixing the drivers alone or whether some extra QEMU fixes will be required.

Best,
Vadim

---

menli:

Hit this issue on a win10-32 (pc) guest with viostor.

Packages:
kernel-5.14.0-145.el9.x86_64
qemu-kvm-7.0.0-9.el9.x86_64
seabios-bin-1.16.0-4.el9.noarch
RHEL-9.1.0-20220814.1
virtio-win-prewhql-224
cdrom_cd1 = isos/ISO/Win10/en-us_windows_10_business_editions_version_21h2_updated_april_2022_x86_dvd_691b7024.iso

Auto case: trim_support_test
http://virtqetools.lab.eng.pek2.redhat.com/autotest_static_job_log/6924299/test-results/19-Host_RHEL.m9.u1.qcow2.virtio_blk.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.trim_support_test/

Thanks,
Menghuan

---

Vadim Rozenfeld:

I've just pushed the viostor-related fix:
https://github.com/virtio-win/kvm-guest-drivers-windows/pull/824

vioscsi doesn't need any changes. In both cases (viostor and vioscsi), setting discard_granularity to 16M/32M (Hyper-V uses 32M) makes Windows work with large slabs (clusters), which reduces the defragmentation time significantly.

Below is the "defrag.exe e: /u /v /h /o" execution time for a 10G volume on a Win10 21H2 system:

| discard_granularity | 4K | 32K | 256K | 2M | 16M | 32M |
|---|---|---|---|---|---|---|
| Optimal unmap granularity (512-byte blocks) | 8 | 64 | 512 | 4096 | 32768 | 65536 |
| virtio-blk defrag time (sec) | 615.61 | 78.77 | 15.48 | 4.29 | 1.43 | 1.22 |
| virtio-scsi defrag time (sec) | 575.77 | 149 | 15.50 | 3.25 | 1.44 | 1.72 |

QEMU command line for virtio-blk:

```
-drive file=$DSK0,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,discard=unmap,aio=native \
-device virtio-blk-pci,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=-1,serial=xru001i,discard_granularity=32M \
```

and for virtio-scsi:

```
-drive file=$DSK0,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,cache=none,aio=native,id=drive-vioscsi0 \
-device virtio-scsi-pci,id=scsi-vioscsi0 \
-device scsi-hd,drive=drive-vioscsi0,id=vioscsi0,bus=scsi-vioscsi0.0,lun=0,scsi-id=0,bootindex=-1,discard_granularity=32M \
```
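---

A rough sanity check on the numbers in the table above (back-of-the-envelope arithmetic only, assuming 512-byte logical blocks and the 10G test volume used for the measurements):

```
# Optimal unmap granularity reported to the guest, in 512-byte blocks:
echo $((4 * 1024 / 512))                  # discard_granularity=4K  ->     8
echo $((32 * 1024 * 1024 / 512))          # discard_granularity=32M -> 65536

# Slabs Windows has to retrim on a 10 GiB volume:
echo $((10 * 1024 * 1024 * 1024 / (4 * 1024)))          # at 4K  -> 2621440
echo $((10 * 1024 * 1024 * 1024 / (32 * 1024 * 1024)))  # at 32M ->     320
```

With 4K granularity the guest ends up retrimming millions of tiny slabs, which is why the defrag times above drop by two orders of magnitude once the granularity is raised to 16M/32M; the 320-slab figure also lines up with the "Slab count = 319" that defrag reports further down in this bug for the same 10G disk.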
---

The viostor-related fix was included in build 226:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2176313

---

menli:

When I ran the win10 guest viostor test loop with the 227 build, a new issue seems to have been introduced by this change. Feel free to correct me if I am wrong.

1) Boot a win10 guest with a 10G data disk:

```
-blockdev node-name=file_stg2,driver=file,cache.direct=on,cache.no-flush=off,filename=data.qcow2,aio=threads,discard=unmap \
-blockdev node-name=drive_stg2,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_stg2,discard=unmap \
-device virtio-blk-pci,id=stg2,drive=drive_stg2,bus=pci.6,addr=0x0 \
```

2) Guest: format the new volume with "Quick Format" disabled.

3) Guest: re-trim the volume via cmd (Administrator): defrag E: /u /v /h

Actual result: after step 3 the output below is shown; it seems no space is trimmed (after the re-trim, the disk size is not smaller than before).

```
C:\>defrag.exe E: /l /u /v

Invoking retrim on New Volume (E:)...

Retrim:  100% complete.

Slab size is too small.
```

Thanks,
Menghuan

---

Vadim Rozenfeld (in reply to menli from comment #8):

Thanks a lot, Menghuan. I will fix it in the next build. Meanwhile, can you try adding ",discard_granularity=32M" as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2020998#c6 and see if it solves the problem?

Best,
Vadim

---

menli (in reply to Vadim Rozenfeld from comment #9):

Yes, after adding discard_granularity=32M the result is as expected:

```
-blockdev node-name=file_stg2,driver=file,cache.direct=on,cache.no-flush=off,filename=data.qcow2,aio=threads,discard=unmap \
-blockdev node-name=drive_stg2,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_stg2,discard=unmap \
-device virtio-blk-pci,id=stg2,drive=drive_stg2,bus=pci.6,addr=0x0,discard_granularity=32M \
```

```
C:\>defrag.exe E: /l /u /v

Invoking retrim on New Volume (E:)...

Retrim:  100% complete.

The operation completed successfully.

Post Defragmentation Report:

    Volume Information:
        Volume size          = 9.99 GB
        Cluster size         = 4 KB
        Used space           = 37.13 MB
        Free space           = 9.95 GB

    Allocation Units:
        Slab count           = 319
        Slab size            = 32 MB
        Slab alignment       = 31.00 MB
        In-use slabs         = 2

    Retrim:
        Backed allocations   = 319
        Allocations trimmed  = 316
        Total space trimmed  = 9.87 GB
```

Thanks,
Menghuan
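---

A note on checking the result from the host side: whether a retrim actually released space can also be seen in the qcow2 image's allocation. This is only a sketch and assumes the data disk is the data.qcow2 file used in the command lines above:

```
# Host-side check (assumes the guest data disk is the data.qcow2 image above).
# "disk size" in qemu-img info is the allocated size; with discard=unmap on
# both blockdev layers it should shrink after a successful retrim in the guest.
qemu-img info data.qcow2
du -h data.qcow2    # allocated size as seen by the host filesystem
```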
---

Vadim Rozenfeld:

The issue from comment #8, seen when "discard_granularity" is not specified on the QEMU command line, has been addressed in the following PR:
https://github.com/virtio-win/kvm-guest-drivers-windows/pull/847

---

Peixiu Hou:

Hi Vadim,

I tried to test this bug with vioscsi + the virtio-win-prewhql-227 build.

QEMU commands with discard=unmap:

```
-device virtio-scsi-pci,id=scsi1,bus=pci.4,addr=0x0 \
-blockdev driver=file,filename=/home/kvm_autotest_root/images/storage.qcow2,node-name=libvirt-1-storage,cache.direct=on,cache.no-flush=off,auto-read-only=on,discard=unmap \
-blockdev node-name=libvirt-1-format,read-only=off,discard=unmap,detect-zeroes=unmap,cache.direct=on,cache.no-flush=off,driver=qcow2,file=libvirt-1-storage \
-device scsi-hd,bus=scsi1.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-1,drive=libvirt-1-format,id=scsi0-0-0-1,write-cache=on \
```

3) Format the new volume with "Quick Format" disabled.

4) Guest: re-trim the volume via cmd (Administrator): defrag F: /u /v /h /o

The retrim completes within 2 minutes. So for vioscsi this BZ seems to be fixed, but I saw you mentioned (comment #7) that you only sent the patch for viostor and did not send one for vioscsi. Could I know the reason?

Thanks a lot,
Peixiu

---

Should be fixed in build 228:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2226327

---

menli:

Hi Vadim,

Unfortunately, I can still hit the issue from comment #8 with the 228 build:

```
C:\>defrag.exe E: /l /u /v

Invoking retrim on New Volume (E:)...

Retrim:  100% complete.

Slab size is too small.
```

After adding "discard_granularity=32M" it works normally.

---

Vadim Rozenfeld:

How does it work with discard_granularity=4K? Can you please tell me all the steps to reproduce this issue?

---

Vadim Rozenfeld:

Posted upstream:
https://github.com/virtio-win/kvm-guest-drivers-windows/pull/858

Please check with build 229:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=48990264

Thanks,
Vadim

---

menli:

Steps (package: build 229 from comment 22; guest: win10 (64), q35):

1) Create a data disk image:
qemu-img create -f qcow2 data.qcow2 10G

2) Boot a win10 guest with the 10G data disk:

```
-blockdev node-name=file_stg2,driver=file,cache.direct=on,cache.no-flush=off,filename=data.qcow2,aio=threads,discard=unmap \
-blockdev node-name=drive_stg2,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_stg2,discard=unmap \
-device virtio-blk-pci,id=stg2,drive=drive_stg2,bus=pci.6,addr=0x0 \
```

3) Guest: format the new volume with "Quick Format" disabled.

4) Guest: re-trim the volume via cmd (Administrator): defrag E: /u /v /h

Actual result: after step 4 the trim completes successfully, but it takes a fairly long time (e.g. 3 min 25 s).

Additional info:
1. With 'discard_granularity=32M' it works normally and the trim completes within 10 s.
2. With 'discard_granularity=4k' it works, but the trim takes more than 2 minutes (e.g. 3 min 13 s).

The trim time is still a little long based on the above results. What's your suggestion?

Thanks,
Menghuan
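---

For completeness, the same override also applies on the vioscsi side that Peixiu tested above; the line below is a sketch only, reusing Peixiu's node and device names and the scsi-hd discard_granularity property shown in Vadim's earlier -drive style example:

```
# Sketch: Peixiu's scsi-hd device line with the 32M granularity appended
# (node/device names reused from the vioscsi test above).
-device scsi-hd,bus=scsi1.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-1,drive=libvirt-1-format,id=scsi0-0-0-1,write-cache=on,discard_granularity=32M \
```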
---

menli:

Hi Vadim,

To verify this bug, I tried to compare against the result described in comment 0.

Steps (guest: win10_64, q35; tried both win10 21h1 and win10 21h2):

1) Create a data disk image:
qemu-img create -f qcow2 data.qcow2 20G

2) Boot a win10 guest with the 20G data disk:

```
-blockdev node-name=file_stg2,driver=file,cache.direct=on,cache.no-flush=off,filename=data.qcow2,aio=threads,discard=unmap \
-blockdev node-name=drive_stg2,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_stg2,discard=unmap \
-device virtio-blk-pci,id=stg2,drive=drive_stg2,bus=pci.6,addr=0x0 \
```

3) Guest: format the new volume with "Quick Format" disabled.

4) Guest: re-trim the volume via cmd (Administrator): defrag d: /u /v /h /o

Actual result: after step 4, for both the 214 build and the 229 build, the retrim takes about 8 min 16 s.

So there seems to be no change in the result. My question is whether retrim time is the right checkpoint, or do I also need to pay attention to other aspects?

Thanks in advance.

---

Vadim Rozenfeld (in reply to menli from comment #27):

That is fine. No "discard_granularity" specified, or "discard_granularity=4K", should give more or less the same time. The optimal performance can be achieved with discard_granularity equal to 16MB or 32MB.

Best,
Vadim

---

Thanks for the explanation. Based on comment 28, changing the status to VERIFIED.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virtio-win bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2451