Bug 1378242
Summary: QEMU image file locking (libvirt)

Field | Value
---|---
Product | Red Hat Enterprise Linux 7
Reporter | Ademar Reis <areis>
Component | libvirt
Assignee | Peter Krempa <pkrempa>
Status | CLOSED ERRATA
QA Contact | jiyan <jiyan>
Severity | unspecified
Priority | high
Version | 7.0
CC | berrange, bialekr, coli, dyuan, dzheng, famz, fjin, jsuchane, juzhou, kchamart, lmen, mriedem, mtessun, pkrempa, rbalakri, rjones, virt-bugs, virt-maint, xuzhang, yalzhang, yisun
Target Milestone | rc
Keywords | FutureFeature
Target Release | 7.4
Hardware | Unspecified
OS | Unspecified
Fixed In Version | libvirt-3.9.0-3.el7
Clone Of | 1378241
Clones | 1415250, 1415252, 1513447 (view as bug list)
Last Closed | 2018-04-10 10:39:40 UTC
Type | Bug
Bug Depends On | 1378241, 1432523
Bug Blocks | 1415250, 1415252, 1417306, 1469590, 1513447
Description (Ademar Reis, 2016-09-21 21:52:53 UTC)
Please also check https://bugzilla.redhat.com/show_bug.cgi?id=1378241#c7 and the following discussion. Maybe we need some more libvirt "functionality" to solve the remote block device locking.

(In reply to Martin Tessun from comment #2)
> Please also check https://bugzilla.redhat.com/show_bug.cgi?id=1378241#c7 and
> the following discussion.
>
> Maybe we need some more libvirt "functionality" to solve the remote block
> device locking.

Replicating what I wrote there, for the record: to solve remote block locking, we have bug 1337005 - "Log event when a block device in use by a guest is open read-write by external applications". The bug description contains a good explanation of the motivation and the use cases it covers, but the actual solution still requires some research.

This is already enabled in QEMU, pending work on libvirt.

Note https://bugzilla.redhat.com/show_bug.cgi?id=1510708: that bug triggers QEMU's lock before libvirt's, which may need to be considered while fixing this issue.

When doing this we must also address https://bugzilla.redhat.com/show_bug.cgi?id=1511480 to avoid reintroducing the qcow2 corruption problem that QEMU addresses with locking.

This bug should ideally be fixed by using the share-rw=on device property for <shareable/> disks (the other possibility is the "-drive file.locking=off" option, which is not intended for libvirt use, but for libguestfs). Even if share-rw=on is specified, QEMU will still make sure a qcow2 image is _not_ written to by two QEMU instances.

Shareable disks with the new qemu were fixed by:

commit 28907b0043fbf71085a798372ab9c816ba043b93
Author: Peter Krempa <pkrempa>
Date:   Wed Nov 15 15:21:14 2017 +0100

    qemu: command: Mark <shared/> disks as such in qemu

    Qemu now has an internal mechanism for locking images to fix specific
    cases of disk corruption. This requires libvirt to mark the image as
    shared so that qemu lifts certain restrictions.
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1378242

commit 860a3c4bea1d24773d8a495f213d5de3ac48a462
Author: Peter Krempa <pkrempa>
Date:   Wed Nov 15 15:02:58 2017 +0100

    qemu: caps: Add capability for 'share-rw' disk option

    'share-rw' for the disk device configures qemu to allow concurrent
    access to the backing storage. The capability is checked in various
    supported disk frontend buses since it does not make sense to
    partially backport it.

I share raw files between two VMs (a test installation for Oracle Real Application Clusters). Example:

<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/work1/vms/cldb01/asm_u01_00.img'/>
  <target dev='vdf' bus='virtio'/>
  <shareable/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x0d' function='0x0'/>
</disk>

This configuration worked perfectly in Fedora 25/26, but after upgrading to Fedora 27, starting my second VM fails with "Failed to get "write" lock". How can I use the share-rw=on device property you mentioned above?

kernel 4.13.13-300.fc27.x86_64
qemu-system-x86-2.10.1-1.fc27.x86_64

Thank you, Robert

(In reply to bialekr from comment #13)
> I share raw files between two VMs (a test installation for Oracle Real
> Application Clusters).
>
> This configuration worked perfectly in Fedora 25/26, but after upgrading to
> Fedora 27, starting my second VM fails with "Failed to get "write" lock".

This is exactly what this bug tracks.

> How can I use the share-rw=on device property you mentioned above?
> kernel 4.13.13-300.fc27.x86_64
> qemu-system-x86-2.10.1-1.fc27.x86_64

This BZ was cloned to Fedora as https://bugzilla.redhat.com/show_bug.cgi?id=1513447. You'll need to wait until Fedora builds the package with the commits mentioned in comment 2 there.

Using a block disk makes the guest fail to start:

<disk type='block' device='lun' rawio='no' sgio='unfiltered'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/sdb'/>
  <target dev='sdb' bus='scsi'/>
  <shareable/>
  <address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>

# virsh start vm1
error: Failed to start domain vm1
error: internal error: qemu unexpectedly closed the monitor: 2017-11-28T09:19:07.595325Z qemu-kvm: -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=1,share-rw=on,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1: Property '.share-rw' not found

(In reply to Dan Zheng from comment #15)
> Using a block disk makes the guest fail to start:
> <disk type='block' device='lun' rawio='no' sgio='unfiltered'>

Please file a QEMU bug; the underlying "share-rw" property should be available on SCSI passthrough devices as well.

Thanks, filed a new bug 1518482.

Hi Peter, could you please help check whether the following issue is a problem? Thanks in advance.

Description: 2 VMs with the same source qcow2/raw file and different bus types for their disks; both VMs can boot successfully at the same time.
Test components' versions:
libvirt-3.9.0-5.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.12.x86_64
kernel-3.10.0-801.el7.x86_64

Test scenarios:

Scenario 1: 2 VMs with the same source raw file and different bus types for their disks; both VMs can boot successfully at the same time

# virsh domstate generic
shut off
# virsh domstate generic1
shut off
# virsh dumpxml generic --inactive | grep "<disk" -A6
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/test.img'/>
  <target dev='hda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
# virsh dumpxml generic1 --inactive | grep "<disk" -A6
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/test.img'/>
  <target dev='hda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
# virsh start generic
Domain generic started
# virsh start generic1
Domain generic1 started

Scenario 2: 2 VMs with the same source qcow2 file and different bus types for their disks; both VMs can boot successfully at the same time

# virsh domstate generic
shut off
# virsh domstate generic1
shut off
# virsh dumpxml generic --inactive | grep "<disk" -A6
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2'/>
  <target dev='hda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
# virsh dumpxml generic1 --inactive | grep "<disk" -A6
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2'/>
  <target dev='hda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
# virsh start generic
Domain generic started
# virsh start generic1
Domain generic1 started

Sorry for the outdated version of qemu-kvm-rhev in https://bugzilla.redhat.com/show_bug.cgi?id=1378242#c18. The same issue in
qemu-kvm-rhev-2.10.0-11.el7.x86_64 cannot be reproduced.

I'm running some tests against qemu 2.10 and libvirt 3.6 using a shareable raw disk and hit the write-lock issue. I thought raw disks would be OK and only qcow images before libvirt 3.10 were an issue? This is the snippet of the failing run from OpenStack Nova CI testing: http://paste.openstack.org/show/639185/

Note that this is a block device; does that make a difference?

(In reply to Matt Riedemann from comment #20)
> I'm running some tests against qemu 2.10 and libvirt 3.6 using a shareable
> raw disk and hit the write-lock issue. I thought raw disks would be OK and
> only qcow images before libvirt 3.10 were an issue?

Oh right, I did not notice that this is with libvirt 3.6, which did not properly tell qemu that the disk needs to be shared. The issue is that qemu 2.10 requires being told that the disk needs to be shared, and that was fixed only in libvirt 3.10. Older libvirt versions will not handle it properly, and that includes even raw disks, since locking applies there as well. So unfortunately you'll need a version of libvirt which has the commit mentioned in comment 10.

We can also add an entry to the domain capabilities XML noting that a given qemu version requires locking, but that still will not close the window of incompatibility.
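The locking semantics discussed above can be illustrated outside of QEMU. Since 2.10, QEMU locks the image file itself on open (the real implementation uses fcntl byte-range OFD locks to encode the requested permissions; the plain flock() calls below are a simplified, Linux-only stand-in), and share-rw=on is what makes each opener request a compatible shared mode instead of an exclusive one. A minimal sketch, assuming nothing beyond the Python standard library:

```python
import fcntl
import os
import tempfile

# Stand-in for a disk image file.
fd, img_path = tempfile.mkstemp()
os.close(fd)

# First "QEMU" opens the image and takes an exclusive (write) lock.
f1 = open(img_path, "r+b")
fcntl.flock(f1, fcntl.LOCK_EX | fcntl.LOCK_NB)

# Second "QEMU" tries the same open and is refused, which is the
# "Failed to get 'write' lock" error seen in the logs above.
f2 = open(img_path, "r+b")
try:
    fcntl.flock(f2, fcntl.LOCK_EX | fcntl.LOCK_NB)
    conflict = False
except BlockingIOError:
    conflict = True
print("second exclusive lock refused:", conflict)

# With shared locks (the analogue of share-rw=on) both opens coexist.
fcntl.flock(f1, fcntl.LOCK_SH | fcntl.LOCK_NB)  # convert the first lock
fcntl.flock(f2, fcntl.LOCK_SH | fcntl.LOCK_NB)
print("both shared locks held")

f1.close()
f2.close()
os.unlink(img_path)
```

This mirrors the behavior in the transcripts below: a fixed libvirt translates <shareable/> into share-rw=on so that both QEMU processes take compatible locks, while `qemu-img info -U` sidesteps the conflict by skipping lock acquisition entirely.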
Versions:
kernel-3.10.0-829.el7.x86_64
qemu-kvm-rhev-2.10.0-15.el7.x86_64
kernel-3.10.0-823.el7.x86_64
libvirt-3.9.0-7.virtcov.el7.x86_64

Test scenarios:

Test scenario-1: Boot 1 VM with the same qcow2 image attached as two virtual disks

1.1 Qcow2 image in local storage - PASS

# virsh domstate vm1
shut off
# virsh -q domblklist vm1
hda        /var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2
hdb        /var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2
# virsh start vm1
error: Failed to start domain vm1
error: internal error: process exited while connecting to monitor: profiling:/builddir/build/BUILD/libvirt-3.9.0/src/util/.libs/libvirt_util_la-virbuffer.gcda:Cannot open
2018-01-15T10:55:41.819399Z qemu-kvm: -drive file=/var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2,format=qcow2,if=none,id=drive-ide0-0-1: Failed to get "write" lock
Is another process using the image?

2.1 Qcow2 image in NFS storage

# mount | grep nfs4
(NFSServerIP):/home/test on /tmp type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=(NFSClientIP),local_lock=none,addr=(NFSServerIP))
# virsh domstate vm1
shut off
# virsh -q domblklist vm1
hda        /tmp/RHEL-7.5-x86_64-latest.qcow2
hdb        /tmp/RHEL-7.5-x86_64-latest.qcow2
# virsh start vm1
error: Failed to start domain vm1
error: internal error: qemu unexpectedly closed the monitor: 2018-01-16T02:32:48.235557Z qemu-kvm: -drive file=/tmp/RHEL-7.5-x86_64-latest.qcow2,format=qcow2,if=none,id=drive-ide0-0-1: Failed to get "write" lock
Is another process using the image?
Test scenario-2: Boot 2 VMs with the same qcow2 image, and check the image info

1.1 Qcow2 image in local storage - PASS

# virsh domstate vm1; virsh domstate vm2
shut off
shut off
# virsh -q domblklist vm1; virsh -q domblklist vm2
hda        /var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2
hda        /var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2
# virsh start vm1; virsh start vm2
Domain vm1 started
error: Failed to start domain vm2
error: internal error: process exited while connecting to monitor: 2018-01-16T01:35:54.426683Z qemu-kvm: -drive file=/var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2,format=qcow2,if=none,id=drive-ide0-0-0: Failed to get "write" lock
Is another process using the image?
# qemu-img info /var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2
qemu-img: Could not open '/var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2': Failed to get shared "write" lock
Is another process using the image?
# qemu-img info /var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2 -U
image: /var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 1.3G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

2.1 Qcow2 image in NFS storage

# mount | grep nfs4
(NFSServerIP):/home/test on /tmp type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=(NFSClientIP),local_lock=none,addr=(NFSServerIP))
# virsh -q domstate vm1; virsh -q domstate vm2
shut off
shut off
# virsh -q domblklist vm1; virsh -q domblklist vm2
hda        /tmp/RHEL-7.5-x86_64-latest.qcow2
hda        /tmp/RHEL-7.5-x86_64-latest.qcow2
# virsh start vm1; virsh start vm2
Domain vm1 started
error: Failed to start domain vm2
error: internal error: process exited while connecting to monitor: 2018-01-16T02:34:47.753522Z qemu-kvm: -drive file=/tmp/RHEL-7.5-x86_64-latest.qcow2,format=qcow2,if=none,id=drive-ide0-0-0: Failed to get "write" lock
Is another process using the image?
# qemu-img info /tmp/RHEL-7.5-x86_64-latest.qcow2
qemu-img: Could not open '/tmp/RHEL-7.5-x86_64-latest.qcow2': Failed to get shared "write" lock
Is another process using the image?
# qemu-img info /tmp/RHEL-7.5-x86_64-latest.qcow2 -U
image: /tmp/RHEL-7.5-x86_64-latest.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 1.3G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Test scenario-3: Boot 1 VM with a qcow2 image, boot another VM with its backing file, and check the image info

1.1 Qcow2 base image in local storage

# virsh domstate vm1; virsh domstate vm2
shut off
shut off
# virsh dumpxml vm1 | grep "<disk" -A5; virsh dumpxml vm2 | grep "<disk" -A5
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/backingfileisqcow2.qcow2'/>
  <target dev='hda' bus='ide'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2'/>
  <target dev='hda' bus='ide'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
# virsh start vm1; virsh start vm2
Domain vm1 started
error: Failed to start domain vm2
error: internal error: process exited while connecting to monitor: 2018-01-16T02:09:16.708415Z qemu-kvm: -drive file=/var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2,format=qcow2,if=none,id=drive-ide0-0-0: Failed to get "write" lock
Is another process using the image?
# qemu-img info /var/lib/libvirt/images/backingfileisqcow2.qcow2
qemu-img: Could not open '/var/lib/libvirt/images/backingfileisqcow2.qcow2': Failed to get shared "write" lock
Is another process using the image?
# qemu-img info /var/lib/libvirt/images/backingfileisqcow2.qcow2 -U
image: /var/lib/libvirt/images/backingfileisqcow2.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 6.3M
cluster_size: 65536
backing file: /var/lib/libvirt/images/RHEL-7.5-x86_64-latest.qcow2
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

2.1 Qcow2 base image in NFS storage

# mount | grep nfs4
(NFSServerIP):/home/test on /tmp type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=(NFSClientIP),local_lock=none,addr=(NFSServerIP))
# virsh -q domstate vm1; virsh -q domstate vm2
shut off
shut off
# virsh dumpxml vm1 | grep "<disk" -A5; virsh dumpxml vm2 | grep "<disk" -A5
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/tmp/backingfileisqcow2.qcow2'/>
  <target dev='hda' bus='ide'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/tmp/RHEL-7.5-x86_64-latest.qcow2'/>
  <target dev='hda' bus='ide'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
# virsh start vm1; virsh start vm2
Domain vm1 started
error: Failed to start domain vm2
error: internal error: process exited while connecting to monitor: 2018-01-16T02:40:29.214643Z qemu-kvm: -drive file=/tmp/RHEL-7.5-x86_64-latest.qcow2,format=qcow2,if=none,id=drive-ide0-0-0: Failed to get "write" lock
Is another process using the image?
# qemu-img info /tmp/backingfileisqcow2.qcow2
qemu-img: Could not open '/tmp/backingfileisqcow2.qcow2': Failed to get shared "write" lock
Is another process using the image?
# qemu-img info /tmp/backingfileisqcow2.qcow2 -U
image: /tmp/backingfileisqcow2.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 7.9M
cluster_size: 65536
backing file: /tmp/RHEL-7.5-x86_64-latest.qcow2
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

All the results are as expected; moving this bug to 'verified'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704