Bug 1528541

Summary: qemu-img check reports many leaked clusters after restarting the NFS service to resume writing data in the guest
Product: Red Hat Enterprise Linux 7
Reporter: Ping Li <pingl>
Component: qemu-kvm-rhev
Assignee: Kevin Wolf <kwolf>
Status: CLOSED ERRATA
QA Contact: Tingting Mao <timao>
Severity: low
Priority: low
Version: 7.5
CC: ailan, areis, chayang, coli, juzhang, michen, mrezanin, ngu, pingl, qzhang, virt-maint, xiagao, yhong
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-rhev-2.12.0-7.el7
Last Closed: 2018-11-01 11:04:00 UTC
Type: Bug

Comment 2 Ping Li 2017-12-22 05:34:06 UTC
The default timeo for the NFS mount is 600, in tenths of a second (60 s). If the NFS outage lasts less than 60 s, QEMU is not paused with an "io-error" status, and the issue does not occur.
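The unit conversion above can be checked against a live client mount. A minimal sketch, assuming the share is mounted at /home/share as in Comment 10; timeo_seconds is a hypothetical helper name, and the conversion relies on nfs(5) defining timeo in tenths of a second:

```shell
#!/bin/sh
# Hypothetical helper: print the timeo value from an NFS mount-option
# string in whole seconds (nfs(5) specifies timeo in tenths of a second).
timeo_seconds() {
    opts=$1
    # Split the comma-separated options and keep the digits after "timeo=".
    timeo=$(printf '%s\n' "$opts" | tr ',' '\n' | sed -n 's/^timeo=\([0-9][0-9]*\)$/\1/p')
    if [ -z "$timeo" ]; then
        echo "no timeo option found" >&2
        return 1
    fi
    # Integer division truncates fractions of a second (600 -> 60).
    echo $(( timeo / 10 ))
}

# Example: read the live options for the share (field 4 of /proc/mounts):
#   timeo_seconds "$(awk '$2 == "/home/share" {print $4}' /proc/mounts)"
```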

Comment 5 Kevin Wolf 2018-06-28 16:03:56 UTC
This won't be a full solution, but I think the following patch series that I just posted upstream will go a long way towards making this a rarer event.

[PATCH 0/3] qcow2: Fix cluster leaks on write error
https://lists.gnu.org/archive/html/qemu-block/2018-06/msg01339.html

Comment 7 Miroslav Rezanina 2018-07-04 08:19:23 UTC
Fix included in qemu-kvm-rhev-2.12.0-7.el7

Comment 9 Jeff Nelson 2018-07-10 21:55:35 UTC
*** Bug 1302929 has been marked as a duplicate of this bug. ***

Comment 10 Tingting Mao 2018-07-11 06:14:56 UTC
Verified this issue as follows.

Tested packages:
qemu-kvm-rhev-2.12.0-7.el7
kernel-3.10.0-918.el7

Steps:
1. Mount the NFS export on a local directory with the soft option
# mount -t nfs -o soft 10.73.224.153:/home/nfs /home/share/

2. Copy the installed base image to /home/share/, then boot a VM from it
/usr/libexec/qemu-kvm \
        -name 'guest-rhel7.5' \
        -machine pc \
        -nodefaults \
        -vga qxl \
        -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x8 \
        -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=unsafe,format=qcow2,file=/home/share/base.qcow2 \
        -device scsi-hd,id=image1,drive=drive_image1,bootindex=0 \
        -vnc :0 \
        -monitor stdio \
        -m 8192 \
        -smp 8 \
        -device virtio-net-pci,mac=9a:b5:b6:b1:b5:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pci.0,addr=0x9 \
        -netdev tap,id=idxgXAlm

3. Write data in the guest with dd
# dd if=/dev/urandom of=/home/ftest bs=1M count=4096

4. Interrupt the NFS service for a short period
# service nfs stop
# service nfs start
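Because Comment 2 notes that an outage shorter than 60 s does not pause QEMU, it can help to make the outage length explicit. A hedged sketch, meant to run on the NFS server; nfs_outage, run, and the DRY_RUN switch are hypothetical names, and the 90 s default is an arbitrary value chosen to exceed the 60 s default timeo (the actual soft-mount failure time also depends on retrans):

```shell
#!/bin/sh
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Keep the NFS service down for the given number of seconds.
nfs_outage() {
    secs=${1:-90}   # ASSUMPTION: 90 s, longer than the 60 s default timeo
    run service nfs stop
    run sleep "$secs"
    run service nfs start
}

# Example (prints the commands without running them):
#   DRY_RUN=1 nfs_outage 90
```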

5. Shut down the guest after the dd process finishes (if the guest was started with "rerror=stop,werror=stop", resume the VM first)

6. Check the image
# qemu-img check /home/share/base.qcow2
No errors were found on the image.
93509/327680 = 28.54% allocated, 14.95% fragmented, 0.00% compressed clusters
Image end offset: 6129844224
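The figures in this output can be cross-checked with simple arithmetic. A sketch assuming the default qcow2 cluster size of 64 KiB, which the report does not state; under that assumption the 327680 clusters correspond to a 20 GiB virtual disk:

```shell
#!/bin/sh
# Cross-check the qemu-img check figures from step 6.
# ASSUMPTION: default qcow2 cluster size of 64 KiB (65536 bytes).
summarize_check() {
    awk -v a=93509 -v t=327680 -v c=65536 'BEGIN {
        gib = 1073741824                                     # bytes per GiB
        printf "allocated: %.2f%%\n", a / t * 100            # allocated / total clusters
        printf "virtual size: %.0f GiB\n", t * c / gib       # total clusters x cluster size
        printf "allocated data: %.2f GiB\n", a * c / gib     # allocated clusters x cluster size
    }'
}

summarize_check
```

Under the same assumption, the allocated clusters cover 6128205824 bytes, only 1638400 bytes (25 clusters) short of the reported image end offset.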

The result is also correct when the "rerror=stop,werror=stop" options are added to the qemu-kvm command line used to boot the base image.
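For reference, a sketch of the step 2 -drive option with these error policies appended. rerror and werror are standard -drive suboptions; with stop, QEMU pauses the guest on a read or write error until it is resumed, for example with the monitor's cont command. This is an excerpt, not a complete command:

```shell
# Excerpt: replacement for the -drive line in step 2
        -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=unsafe,format=qcow2,file=/home/share/base.qcow2,rerror=stop,werror=stop \
```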

Setting the bug to VERIFIED.
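The pass/fail decision can be scripted from the exit status of qemu-img check, which qemu-img(1) documents as 0 (consistent), 1 (check not completed), 2 (corrupted), 3 (leaked clusters only) and 63 (checks not supported). A hedged sketch; check_verdict and verify_image are hypothetical helper names, and the image should be checked only after the guest has shut down:

```shell
#!/bin/sh
# Hypothetical helper: map a qemu-img check exit status to a verdict.
check_verdict() {
    case $1 in
        0)  echo "PASS: no errors found" ;;
        3)  echo "FAIL: leaked clusters (qemu-img check -r leaks can reclaim them)" ;;
        2)  echo "FAIL: image is corrupted" ;;
        1)  echo "ERROR: check could not be completed" ;;
        63) echo "ERROR: format does not support checks" ;;
        *)  echo "ERROR: unexpected exit status $1" ;;
    esac
}

# Run the check quietly and report only the verdict.
verify_image() {
    qemu-img check "$1" >/dev/null 2>&1
    check_verdict $?
}

# Example: verify_image /home/share/base.qcow2
```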

Comment 11 errata-xmlrpc 2018-11-01 11:04:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3443