Bug 1526212

Summary: qemu-img should not need a write lock for creating the overlay image
Product: Red Hat Enterprise Linux 7
Reporter: Han Han <hhan>
Component: qemu-kvm-rhev
Assignee: Fam Zheng <famz>
Status: CLOSED ERRATA
QA Contact: Ping Li <pingl>
Severity: high
Docs Contact:
Priority: high
Version: 7.5
CC: areis, chayang, chhu, coli, dyuan, dzheng, famz, hhan, juzhang, lmiksik, michen, mrezanin, ngu, nsoffer, pingl, ratamir, timao, virt-maint, xuzhang, yafu, yanqzhan
Target Milestone: rc
Keywords: Regression, TestBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.10.0-17.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-11 00:55:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1525303, 1533155

Description Han Han 2017-12-15 02:15:32 UTC
Description of problem:
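As in the summary: qemu-img fails with a locking error when creating an overlay whose backing image is in use by a running VM.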

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.10.0-12.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a VM from a base image
# qemu-img create /tmp/base 100M -f qcow2                                                                                                                                
Formatting '/tmp/base', fmt=qcow2 size=104857600 cluster_size=65536 lazy_refcounts=off refcount_bits=16

# /usr/libexec/qemu-kvm /tmp/base                                  
VNC server running on ::1:5900


2. Try to create an overlay image backed by the base image
# /usr/bin/qemu-img create -f qcow2 -o compat=1.1 -b /tmp/base -F qcow2 /tmp/top                                                                                         
qemu-img: /tmp/top: Failed to get shared "write" lock
Is another process using the image?
Could not open backing image to determine size.


Actual results:
The overlay is not created; qemu-img fails with the locking error shown in step 2.

Expected results:
Overlay image created without error.

Additional info:
This bug blocks snapshot creation in RHV. See https://bugzilla.redhat.com/show_bug.cgi?id=1415252#c8

Comment 3 Fam Zheng 2017-12-15 14:43:02 UTC
Posted the fix upstream:

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg02816.html

Comment 4 yafu 2017-12-19 12:33:25 UTC
The issue also occurs when trying to convert an image that is in use by a guest.
Steps to reproduce:
1. # /usr/libexec/qemu-kvm /var/lib/libvirt/images/data.img
VNC server running on ::1:5900

2. # /usr/bin/qemu-img convert -f qcow2 -O qcow2 -o compat=1.1 /var/lib/libvirt/images/data.img /var/lib/libvirt/images/data.img-clone
qemu-img: Could not open '/var/lib/libvirt/images/data.img': Failed to get shared "write" lock
Is another process using the image?

Comment 5 Fam Zheng 2017-12-20 06:36:40 UTC
(In reply to yafu from comment #4)
> The issue also exists when try to convert a image used by guest.

Don't do this. Live format conversion of an image can be done with drive-mirror, and live backup with drive-backup, both through QMP.

Reading guest-visible qcow2 data from qemu-img or any other program is not allowed while the image is in use by a guest.
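
For illustration only, a minimal Python sketch of the QMP messages that the two commands named above would need. The device name is borrowed from the -drive id used in comment 18, and the target paths are hypothetical. A real client would also have to connect to the QMP socket and send qmp_capabilities first, which this sketch leaves out.

```python
import json


def qmp_command(execute, **arguments):
    """Serialize a single QMP command as one JSON line."""
    return json.dumps({"execute": execute, "arguments": arguments},
                      sort_keys=True)


# Hypothetical device id and target paths, not taken from a real setup.
mirror = qmp_command("drive-mirror",
                     device="drive_image1",
                     target="/tmp/data-clone.qcow2",
                     format="qcow2",
                     sync="full")

backup = qmp_command("drive-backup",
                     device="drive_image1",
                     target="/tmp/data-backup.qcow2",
                     format="qcow2",
                     sync="full")

print(mirror)
print(backup)
```

Both jobs run inside the QEMU process that owns the image, so no second process needs to open it.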

Comment 6 Ping Li 2018-01-03 09:22:23 UTC
*** Bug 1529990 has been marked as a duplicate of this bug. ***

Comment 7 Nir Soffer 2018-01-04 01:36:30 UTC
Fam, we get the same error when running qemu-img info on an image used by a guest:

From vdsm log:

Error: Command ['/usr/bin/qemu-img', 'info', '--output', 'json', '-f', 'qcow2', u'/rhev/data-center/mnt/rich-nfs-server2.usersys.redhat.com:_home_storage_sd5/43bdddd5-2edd-45f5-a55e-c08cd36648a6/images/64e7f158-7829-4933-8b80-e785b72ebf6d/244d8cd8-9782-4e1e-8b84-77016ca11406'] failed with rc=1 out='' err='qemu-img: Could not open \'/rhev/data-center/mnt/rich-nfs-server2.usersys.redhat.com:_home_storage_sd5/43bdddd5-2edd-45f5-a55e-c08cd36648a6/images/64e7f158-7829-4933-8b80-e785b72ebf6d/244d8cd8-9782-4e1e-8b84-77016ca11406\': Failed to get shared "write" lock\nIs another process using the image?\n'

We run qemu-img info on an image to get the qcow2 compat value, and taking a lock in this
case breaks released versions of RHV.

I think info should behave in the same way as create in this case.

Comment 8 Fam Zheng 2018-01-04 02:16:09 UTC
The idea is that '-U' should be used with "qemu-img info":

    # qemu-img info -U ...

when the image is qcow2 AND in use by a guest.

The reason is that reading the image is inherently racy with the running QEMU process and may yield unexpected results if the image is being updated.

This is the same situation as the other behavior changes caused by image locking. Similarly, <shareable/> won't work with new QEMU and old libvirt.

Can RHV be updated to use "-U" in this case?
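
As a rough sketch of what the caller-side change could look like, the Python helper below builds the qemu-img info argument list and adds -U only in the case described above (qcow2 and in use by a guest). The function name and the in_use_by_guest flag are assumptions made for illustration; they are not vdsm's actual API.

```python
def qemu_img_info_cmd(path, fmt=None, in_use_by_guest=False):
    """Build the argv for 'qemu-img info', adding -U only when needed.

    Hypothetical helper: the caller is assumed to know whether a guest
    currently has the image open.
    """
    cmd = ["qemu-img", "info", "--output", "json"]
    if fmt == "qcow2" and in_use_by_guest:
        # -U (force share) skips taking the image lock.
        cmd.append("-U")
    if fmt is not None:
        cmd += ["-f", fmt]
    cmd.append(path)
    return cmd


print(qemu_img_info_cmd("/tmp/base", fmt="qcow2", in_use_by_guest=True))
```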

Comment 9 Nir Soffer 2018-01-04 02:52:29 UTC
We will use -U in RHV 4.2 [1], but RHV 4.1, which must also run on 7.5, is already
released. Users updating to 7.5 will be hit by this backward-incompatible change.

We know that accessing an image header is racy, but we need it only to get
the qcow2 compat value, which is not expected to be modified by qemu.

[1] https://gerrit.ovirt.org/85874
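
To make the use case concrete, here is a small Python sketch of reading the compat value from the JSON that qemu-img info --output json prints. The SAMPLE string is hand-written to mirror the human-readable output shown elsewhere in this bug; it is not captured from a real run, and the function name is hypothetical.

```python
import json

# Hand-written sample, shaped like 'qemu-img info --output json' output.
SAMPLE = """
{
    "format": "qcow2",
    "virtual-size": 1073741824,
    "format-specific": {
        "type": "qcow2",
        "data": {
            "compat": "1.1",
            "lazy-refcounts": false,
            "refcount-bits": 16,
            "corrupt": false
        }
    }
}
"""


def qcow2_compat(info_json):
    """Return the qcow2 compat level from qemu-img info JSON output."""
    info = json.loads(info_json)
    if info.get("format") != "qcow2":
        raise ValueError("not a qcow2 image: %r" % info.get("format"))
    return info["format-specific"]["data"]["compat"]


print(qcow2_compat(SAMPLE))
```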

Comment 10 Fam Zheng 2018-01-05 02:45:30 UTC
OK, it is probably acceptable to fall back to -U automatically for "qemu-img info" if locking fails. I'll send a patch upstream to see what the maintainers think.
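
The same fallback idea can also be expressed on the caller's side. The Python sketch below is not the proposed QEMU patch; it is a hypothetical wrapper that retries with -U only when the first attempt fails with the lock error quoted in this bug. The runner parameter exists so the logic can be exercised without qemu-img installed.

```python
import subprocess

LOCK_ERROR = 'Failed to get shared "write" lock'


def run(cmd):
    """Run a command and return (returncode, stdout, stderr) as text."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    return proc.returncode, proc.stdout, proc.stderr


def info_with_fallback(path, runner=run):
    """Try 'qemu-img info' normally, then retry with -U on a lock failure."""
    base = ["qemu-img", "info", "--output", "json"]
    rc, out, err = runner(base + [path])
    if rc != 0 and LOCK_ERROR in err:
        # The image is locked by another process; read it without locking.
        rc, out, err = runner(base + ["-U", path])
    if rc != 0:
        raise RuntimeError(err.strip())
    return out
```

Retrying only on the specific lock message keeps other failures, such as a missing file, visible to the caller.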

Comment 11 Fam Zheng 2018-01-05 09:40:35 UTC
The patch mentioned in comment 10:

https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg00845.html

Comment 12 Nir Soffer 2018-01-05 16:53:02 UTC
(In reply to Fam Zheng from comment #11)
> https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg00845.html

We would like to test this fix when a build is available.

Comment 13 Fam Zheng 2018-01-08 03:05:59 UTC
I made a scratch build with the fix:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14889521

Comment 14 Ademar Reis 2018-01-12 21:51:19 UTC
(In reply to Fam Zheng from comment #10)
> OK, it is probably okay to "fallback" to -U automatically for "qemu-img
> info" if the locking failed. I'll send a patch to upstream to see how
> maintainers think.

(In reply to Fam Zheng from comment #11)
> The patch mentioned in comment 10:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg00845.html

Looks like the consensus upstream is to drop this patch. Nir: can you work around this in RHV, or will we need a downstream-only patch?

Besides that, is the original report still accurate? This BZ was originally opened as a TestBlocker/Regression, but this issue with qemu-img -U looks like expected behavior on the QEMU side.

Comment 16 Miroslav Rezanina 2018-01-16 13:44:52 UTC
Fix included in qemu-kvm-rhev-2.10.0-17.el7

Comment 18 Ping Li 2018-01-17 10:06:38 UTC
Hi Ademar, Nir,

Based on the test results below, I am inclined to mark this bug as verified. If we still intend to make the force-share option "-U" the default for "qemu-img info", could we create a new bug to track it? Thanks.



Packages tested:
qemu-kvm-rhev-2.10.0-17.el7
kernel-3.10.0-829.el7.x86_64

Test steps:
1. Create backing image using qcow2 format
1) Create backing image and boot the image
# qemu-img create -f qcow2 base.qcow2 1G
# /usr/libexec/qemu-kvm -nodefaults -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/tests/diskfile/base.qcow2 -device scsi-hd,id=image1,drive=drive_image1 -monitor stdio
2) Create snapshot
# qemu-img create -f qcow2 -b base.qcow2 -F qcow2 sn.qcow2
3) Get information of the snapshot 
# qemu-img info sn.qcow2 
image: sn.qcow2
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 196K
cluster_size: 65536
backing file: base.qcow2
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
4) Check the snapshot
# qemu-img check sn.qcow2 
qemu-img: Could not open 'sn.qcow2': Could not open backing file: Failed to get shared "write" lock
Is another process using the image?

2. Create backing image using raw format
1) Create backing image and boot the image
# qemu-img create -f raw base.img 1G
# /usr/libexec/qemu-kvm -nodefaults -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=/home/tests/diskfile/base.img -device scsi-hd,id=image1,drive=drive_image1 -monitor stdio
2) Create snapshot
# qemu-img create -f qcow2 -b base.img -F raw sn.qcow2
3) Get information of the snapshot 
# qemu-img info sn.qcow2 
image: sn.qcow2
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 196K
cluster_size: 65536
backing file: base.img
backing file format: raw
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
4) Check the snapshot (confirmed with Fam: raw is a bit more permissive for concurrent openers)
# qemu-img check sn.qcow2 
No errors were found on the image.
Image end offset: 262144

3. Create backing image using luks format
1) Create backing image and boot the image
# qemu-img create -f luks --object secret,id=sec0,data=base -o key-secret=sec0 base.luks 1G
# /usr/libexec/qemu-kvm -nodefaults -object secret,id=sec0,data=base -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=luks,file=/home/tests/diskfile/base.luks,key-secret=sec0 -device scsi-hd,id=image1,drive=drive_image1 -monitor stdio
2) Create snapshot
# qemu-img create -f qcow2 --object secret,id=sec0,data=base -b 'json:{"driver": "luks", "file": {"driver": "file", "filename": "/home/tests/diskfile/base.luks"}, "key-secret": "sec0"}' sn.qcow2
3) Get information of the snapshot 
# qemu-img info sn.qcow2 
image: sn.qcow2
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 196K
cluster_size: 65536
backing file: json:{"driver": "luks", "file": {"driver": "file", "filename": "/home/tests/diskfile/base.luks"}, "key-secret": "sec0"}
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
4) Check the snapshot
# qemu-img check --object secret,id=sec0,data=base --image-opts driver=qcow2,file.filename=sn.qcow2,backing.key-secret=sec0
qemu-img: Could not open 'driver=qcow2,file.filename=sn.qcow2,backing.key-secret=sec0': Could not open backing file: Failed to get shared "write" lock
Is another process using the image?

4. Create backing image using qcow2 format encrypted by luks
1) Create backing image and boot the image
# qemu-img create --object secret,id=sec0,data=base -f qcow2 -o encrypt.format=luks,encrypt.key-secret=sec0 base.qcow2 20G
# /usr/libexec/qemu-kvm -nodefaults -object secret,id=sec0,data=base -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/tests/diskfile/base.qcow2,encrypt.key-secret=sec0 -device scsi-hd,id=image1,drive=drive_image1 -monitor stdio
2) Create snapshot
# qemu-img create --object secret,id=sec0,data=base -f qcow2 -b 'json:{"encrypt.key-secret": "sec0", "driver": "qcow2", "file": {"driver": "file", "filename": "/home/tests/diskfile/base.qcow2"}}' sn.qcow2
3) Get information of the snapshot 
# qemu-img info sn.qcow2 
image: sn.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 196K
cluster_size: 65536
backing file: json:{"encrypt.key-secret": "sec0", "driver": "qcow2", "file": {"driver": "file", "filename": "/home/tests/diskfile/base.qcow2"}}
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
4) Check the snapshot
# qemu-img check --object secret,id=sec0,data=base --image-opts driver=qcow2,file.filename=sn.qcow2,backing.encrypt.key-secret=sec0
qemu-img: Could not open 'driver=qcow2,file.filename=sn.qcow2,backing.encrypt.key-secret=sec0': Could not open backing file: Failed to get shared "write" lock
Is another process using the image?
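
The json: backing file strings used in steps 3 and 4 are easy to mistype by hand. As an illustrative sketch only, the Python helper below assembles one from its parts; the function name is hypothetical, and the path and secret id are the ones used in the test steps above.

```python
import json


def json_backing(driver, filename, options=None):
    """Build a 'json:{...}' backing file string for 'qemu-img create -b'."""
    spec = {"driver": driver,
            "file": {"driver": "file", "filename": filename}}
    # Extra driver options, e.g. {"key-secret": "sec0"} for luks.
    spec.update(options or {})
    return "json:" + json.dumps(spec)


luks_backing = json_backing("luks", "/home/tests/diskfile/base.luks",
                            {"key-secret": "sec0"})
print(luks_backing)
```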

Comment 19 Ademar Reis 2018-01-18 00:43:32 UTC
(In reply to Ping Li from comment #18)
> Hi Ademar, Nir,
> 
> According to the below test result, I tend to set the bug as verified. If we
> still intend to set force shared option "-U" as default option for "qemu-img
> info", could we create a new bug to track it? Thanks


That's correct. Nir: please open a new BZ if RHV needs the -U change.

Comment 20 Ademar Reis 2018-01-18 00:44:51 UTC
(In reply to Ademar Reis from comment #19)
> That's correct. Nir: please open a new BZ if RHV needs the -U change.

NEEDINFO(Nir and Han)

Comment 21 Yanqiu Zhang 2018-01-18 12:19:24 UTC
(In reply to Ademar Reis from comment #20)
> NEEDINFO(Nir and Han)

Filed a new bug to track this: https://bugzilla.redhat.com/show_bug.cgi?id=1535992

Comment 22 Ping Li 2018-01-19 02:25:55 UTC
Based on comment 18 and comment 21, setting this bug as verified.

Comment 24 errata-xmlrpc 2018-04-11 00:55:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104