Bug 1526313

Summary: Improve QEMU lock error info for hot-plugging same qcow2 image file twice in different target to VM
Product: Red Hat Enterprise Linux 7
Reporter: jiyan <jiyan>
Component: qemu-kvm-rhev
Assignee: Hanna Czenczek <hreitz>
Status: CLOSED ERRATA
QA Contact: Tingting Mao <timao>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.5
CC: aliang, chayang, coli, dyuan, guillaume.pavese, hhan, jiyan, juzhang, knoel, lmen, meili, mtessun, ngu, pkrempa, qzhang, virt-maint, xuwei, xuzhang, yhong, yisun
Target Milestone: rc
Keywords: TestOnly
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-08-22 09:18:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1551486
Bug Blocks: 1609654

Description jiyan 2017-12-15 08:11:48 UTC
Description of problem:
Hot-plugging the same qcow2 image file twice, with different targets, into a VM raises an unclear error message

Version-Release number of selected component (if applicable):
libvirt-3.9.0-6.el7.x86_64
qemu-kvm-rhev-2.10.0-12.el7.x86_64
kernel-3.10.0-823.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
Scenario-1: Hot-plug the same qcow2 image file twice, with different targets, into a VM
1. Prepare a running VM with no disk 
# virsh domstate test1
running

# virsh dumpxml test1 |grep "<disk"
no output 

2. Hot-plug the same qcow2 image file twice, with different targets, into the VM
# virsh attach-disk --domain test1 --source /var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2 --target vda --driver qemu --subdriver qcow2
Disk attached successfully

# virsh attach-disk --domain test1 --source /var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2 --target vdb --driver qemu --subdriver qcow2
error: Failed to attach disk
error: internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk1' could not be initialized

# virsh dumpxml test1 |grep "<disk" -A7
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </disk>

Scenario-2: Hot-plug the same raw image file twice, with different targets, into a VM
1. Prepare a running VM with no disk 
# virsh domstate test1
running

# virsh dumpxml test1 |grep "<disk"
no output 

2. Hot-plug the same raw image file twice, with different targets, into the VM
# virsh attach-disk --domain test1 --source /var/lib/libvirt/images/RHEL-7.5-x86_64-latest.img --target vda --driver qemu
Disk attached successfully

# virsh attach-disk --domain test1 --source /var/lib/libvirt/images/RHEL-7.5-x86_64-latest.img --target vdb --driver qemu
error: Failed to attach disk
error: internal error: unable to execute QEMU command 'device_add': Failed to get "write" lock

# virsh dumpxml test1 |grep "<disk"  -A10
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/RHEL-7.5-x86_64-latest.img'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </disk>

Actual results:
As Scenario-1 shows: the error "Device 'drive-virtio-disk1' could not be initialized" gives no hint that the image file is locked.

Expected results:
As Scenario-2 shows: a lock-specific error such as "Failed to get "write" lock".

Additional info:
See the related bug:
Bug 1378242 - QEMU image file locking (libvirt)

Comment 2 Peter Krempa 2017-12-15 09:37:11 UTC
Moving to qemu. These errors are reported by qemu's locking infrastructure, and it isn't practical for libvirt to reimplement all of it. Libvirt's own locking would report a proper error in the second case, but the first case is okay from libvirt's POV.

Also note that the error may be weird due to the downstream command __com.redhat_drive_add; blockdev-add may report a proper error.

Comment 3 Fam Zheng 2017-12-18 14:15:30 UTC
So, when image locking fails, the error messages on the command line and in HMP are more informative (there is the hint "Is another process using the image?" after the error) than in QMP, because the QMP error info doesn't include the hint.

We could add a "hint" key to the returned error object in QMP. This would also mean some work in libvirt to return it to the user.

The easier alternative is to squash that sentence into the error message and be done.

Comment 4 Fam Zheng 2017-12-18 14:41:43 UTC
Meanwhile I'm posting a middle-ground solution upstream as an RFC: always include "hint" messages in QMP error texts. This has the advantages of both approaches: hints are included in QMP errors generically, so all errors benefit, and no libvirt change is necessary.
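As a rough illustration of what "squashing the hint into the error text" means, here is a minimal Python sketch (the real change would be in QEMU's C error-reporting code; qmp_error here is a hypothetical helper, not a QEMU function):

```python
import json

def qmp_error(desc, hint=None):
    # Hypothetical: fold the HMP-style hint into the QMP 'desc' field
    # instead of dropping it on the QMP path.
    if hint:
        desc = desc + "\n" + hint
    return {"error": {"class": "GenericError", "desc": desc}}

print(json.dumps(qmp_error('Failed to get "write" lock',
                           "Is another process using the image?")))
```

The client then sees the hint inside "desc" with no QMP schema change and no libvirt change.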

Comment 5 jiyan 2018-03-14 08:08:18 UTC
Hi, Fam.
I cannot create an external snapshot after hitting the error described in this bug. Could you please help check whether it is the same issue, or a separate issue related to snapshots? Thanks in advance.

Version:
kernel-3.10.0-860.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7.x86_64
libvirt-3.9.0-14.virtcov.el7.x86_64

Steps:
1. Prepare a shut-off VM with a bootable source file as follows:
# virsh domstate test4
shut off

# virsh dumpxml test4 |grep "<disk" -A5
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

2. Prepare an XML file; its source file is the same as in Step 1.
# cat test4.xml 
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2'/>
      <target dev='hdb' bus='virtio'/>
    </disk>

3. Start the VM and hot-plug the device; the error is raised
# virsh start test4
Domain test4 started

# virsh attach-device test4 test4.xml 
error: Failed to attach device from test4.xml
error: internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk1' could not be initialized


4. Try to create an external snapshot; it fails
# virsh snapshot-create-as test4 t4s1 --disk-only --diskspec hda,snapshot=external,file=/tmp/t4s1
error: internal error: unable to execute QEMU command 'transaction': Failed to lock byte 100

# virsh snapshot-create-as test4 t4s1 --disk-only --diskspec hda,snapshot=external,file=/tmp/t4s1
error: internal error: unable to execute QEMU command 'transaction': Failed to lock byte 100

# virsh snapshot-create-as test4 t4s1 --disk-only --diskspec hda,snapshot=external,file=/tmp/t4s1
error: internal error: unable to execute QEMU command 'transaction': Failed to lock byte 100

Comment 6 Fam Zheng 2018-03-14 11:22:48 UTC
I find that disabling SELinux fixes the issue. My guess is that the error-handling logic in libvirt triggered in step 3 has misconfigured the SELinux context of the image or something similar; as a result, a subsequent locking operation, when QEMU tries to create the snapshot, is rejected by the system security policy:

[pid  9863] fcntl(17, F_OFD_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=100, l_len=1}) = -1 EACCES (Permission denied)

I think it's a bug in libvirt; please file a separate BZ.
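The strace line above shows QEMU probing a byte-range lock at offset 100, length 1. A minimal Python sketch of the same contention pattern (hypothetical: it uses POSIX locks via fcntl.lockf rather than QEMU's OFD locks, and a scratch file rather than an image; Linux/Unix only, since it forks):

```python
import fcntl, os, tempfile

# Stand-in for the image file.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * 4096)

# "First attach": take an exclusive lock on byte 100, mirroring the
# l_start=100, l_len=1 region from the strace above.
fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 100, os.SEEK_SET)

# "Second attach" from another process: the same byte is contended,
# so the non-blocking lock attempt fails.
pid = os.fork()
if pid == 0:
    fd2 = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 100, os.SEEK_SET)
        os._exit(0)   # unexpectedly got the lock
    except OSError:
        os._exit(42)  # lock held by the first opener, as expected
status = os.waitpid(pid, 0)[1]
print("second lock denied:", os.WEXITSTATUS(status) == 42)
os.unlink(path)
```

In the SELinux case above, the second attempt fails with EACCES even though no conflicting lock is held, because the security policy rejects the fcntl call itself.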

Comment 8 Ademar Reis 2018-12-10 19:46:06 UTC
Need to wait for blockdev-add infrastructure to be available in libvirt, so we can retest. See also: bug 609654 and bug 1589627.

Comment 9 Ademar Reis 2019-02-04 21:38:22 UTC
Adding the libvirt blockdev BZ as a dependency.

(In reply to Ademar Reis from comment #8)
> Need to wait for blockdev-add infrastructure to be available in libvirt, so
> we can retest. See also: bug 609654 and bug 1589627.

I meant bug 1609654 (typo).

Comment 10 Ademar Reis 2019-02-04 21:42:15 UTC
(In reply to Fam Zheng from comment #4)
> Meanwhile I'm posting a middle ground solution to upstream as RFC: always
> include "hint" messages in QMP error texts, which has the advantages of both
> sides: include hint in QMP errors generically so all errors can benefit, and
> no libvirt change is necessary.

This seems to be in QEMU 3.1:

commit b857431d2abe3945b672b41f33690e9943a8752a
Author: Fam Zheng <famz>
Date:   Tue Sep 25 13:05:01 2018 +0800

    file-posix: Include filename in locking error message
    
    Image locking errors happening at device initialization time doesn't say
    which file cannot be locked, for instance,
    
        -device scsi-disk,drive=drive-1: Failed to get shared "write" lock
        Is another process using the image?
    
    could refer to either the overlay image or its backing image.
    
    Hoist the error_append_hint to the caller of raw_check_lock_bytes where
    file name is known, and include it in the error hint.
    
    Signed-off-by: Fam Zheng <famz>
    Reviewed-by: Eric Blake <eblake>
    Signed-off-by: Kevin Wolf <kwolf>
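A rough Python illustration of the commit's idea (the real change is in QEMU's C file-posix driver, and the exact hint wording here is an approximation): the caller, which knows the file name, folds it into the hint so the user can tell which image is contended.

```python
# Hypothetical mirror of the commit: build the hint at the call site,
# where the file name is known, instead of in raw_check_lock_bytes.
def lock_error(filename, perm="write"):
    msg = f'Failed to get "{perm}" lock'
    hint = f"Is another process using the image [{filename}]?"
    return msg, hint

msg, hint = lock_error("/var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2")
print(msg)
print(hint)
```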

Comment 11 Hanna Czenczek 2019-02-06 13:41:31 UTC
(In reply to Ademar Reis from comment #10)
> This seems to be in QEMU 3.1:

That commit is part of my current backport of BZ 1551486 (to reduce conflicts).

Max

Comment 12 Tingting Mao 2019-04-16 08:13:06 UTC
Verified this bug as follows:

Tested with:
qemu-kvm-rhev-2.12.0-26.el7
kernel-3.10.0-1025.el7

Steps:

Scenario 1 (For qcow2 files)
1. Boot a guest with no data disk
(qemu) info block
drive_image1 (#block181): /home/kvm_autotest_root/images/rhel77-64-virtio-scsi.qcow2 (qcow2)
    Attached to:      /machine/peripheral/virtio_blk_pci0/virtio-backend
    Cache mode:       writethrough, ignore flushes

2. Hotplug the same qcow2 image twice in QMP

2.1 First time

{"QMP": {"version": {"qemu": {"micro": 0, "minor": 12, "major": 2}, "package": "qemu-kvm-rhev-2.12.0-26.el7"}, "capabilities": []}}

{"execute":"qmp_capabilities"}
{"return": {}}

{
  "execute": "blockdev-add",
  "arguments": {
    "node-name": "driver2",
    "driver": "qcow2",
    "file": {
      "driver": "file",
      "filename": "/home/test/data.qcow2"
    }
  }
}
{"return": {}}

{
  "execute": "device_add",
  "arguments": {
    "driver": "virtio-scsi-pci",
    "id": "shuang-pci"
  }
}
{"return": {}}

{
  "execute": "device_add",
  "arguments": {
    "driver": "scsi-hd",
    "drive": "driver2",
    "id": "shuang-hd"
  }
}
{"return": {}}

2.2 Second time
{
  "execute": "blockdev-add",
  "arguments": {
    "node-name": "driver3",
    "driver": "qcow2",
    "file": {
      "driver": "file",
      "filename": "/home/test/data.qcow2"
    }
  }
}
{"error": {"class": "GenericError", "desc": "Failed to get \"write\" lock"}}



Scenario 2 (For raw files)

Hotplug for the first time
{
  "execute": "blockdev-add",
  "arguments": {
    "node-name": "driver3",
    "driver": "raw",
    "file": {
      "driver": "file",
      "filename": "/home/test/data.img"
    }
  }
}
{"return": {}}

{
  "execute": "device_add",
  "arguments": {
    "driver": "scsi-hd",
    "drive": "driver3",
    "id": "shuang-hd1"
  }
}
{"return": {}}

Hotplug for the second time
{
  "execute": "blockdev-add",
  "arguments": {
    "node-name": "driver4",
    "driver": "raw",
    "file": {
      "driver": "file",
      "filename": "/home/test/data.img"
    }
  }
}
{"return": {}}

{
  "execute": "device_add",
  "arguments": {
    "driver": "scsi-hd",
    "drive": "driver4",
    "id": "shuang-hd2"
  }
}
{"error": {"class": "GenericError", "desc": "Failed to get \"write\" lock"}}

Comment 15 errata-xmlrpc 2019-08-22 09:18:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2553