Bug 1510708

Summary: Incorrect error messages when 2 guests on 1 host pointing at the same disk in non-shareable mode while virtlockd is enabled
Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.5
Status: CLOSED NOTABUG
Severity: low
Priority: unspecified
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Reporter: Meina Li <meili>
Assignee: Libvirt Maintainers <libvirt-maint>
QA Contact: Fangge Jin <fjin>
CC: dyuan, hhan, jiyan, lmen, pkrempa, rbalakri, xuzhang, yisun, zpeng
Last Closed: 2017-11-13 12:10:42 UTC
Type: Bug

Description Meina Li 2017-11-08 03:22:01 UTC
Description of problem:
Incorrect error messages are reported when 2 guests on 1 host point at the same disk in non-shareable mode while virtlockd is enabled.

Version-Release number of selected component (if applicable):
libvirt-3.9.0-1.el7.x86_64
kernel-3.10.0-766.el7.x86_64
qemu-kvm-rhev-2.10.0-4.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare two guests that use the same image without the <shareable> element.
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/lmn.qcow2'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
2. Configure /etc/libvirt/qemu-lockd.conf and /etc/libvirt/qemu.conf.
    # vim /etc/libvirt/qemu-lockd.conf
    auto_disk_leases = 1
    require_lease_for_disks = 1
    file_lockspace_dir = "/var/lib/libvirt/lockd/files"
    # vim /etc/libvirt/qemu.conf
    lock_manager = "lockd"
3. Start virtlockd and restart libvirtd.
    # systemctl start virtlockd
    # systemctl restart libvirtd
4. Start the two guests.
    # virsh start lmn
    Domain lmn started
    # virsh start test
    error: Failed to start domain test
    error: internal error: process exited while connecting to monitor: 2017-11-07T09:53:44.435636Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/4 (label charserial0)
    2017-11-07T09:53:44.437831Z qemu-kvm: -drive file=/var/lib/libvirt/images/lmn.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native: Failed to get "write" lock
    Is another process using the image?

Actual results:
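As shown above: the second guest fails with the qemu-kvm error (Failed to get "write" lock) instead of an error from the libvirt lock manager.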

Expected results:
# virsh start test
error: Failed to start domain test
error: resource busy: Lockspace resource '9e03fed04cb6a2f32f6624b47d212acb06d97653a02627d824b89fcf74c175bb' is locked

Additional info:
The expected error message is reported in the following scenarios:
(1) The above test steps with libvirt-3.2.0-14.el7_4.3.x86_64;
(2) Pointing at the same lvm/scsi device in non-shareable mode;
(3) 2 guests on 2 hosts pointing at the same disk in non-shareable mode.

Comment 2 Peter Krempa 2017-11-13 12:10:42 UTC
This is expected. The locking driver in libvirt tries to acquire the lock only when the vCPUs are being started. Since qemu gained internal image locking, qemu reports its own error at the point when the image is opened, which is before the vCPUs are started.

Since qemu is able to handle the locks only on one host, the locking daemon is still necessary. (The daemon covers the iscsi/lvm/scsi case too.)
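
To illustrate what a host-local lock looks like, here is a minimal Python sketch. It uses flock(2) purely as a stand-in; this is an assumption for illustration and not qemu's actual image locking mechanism. The names try_exclusive_lock and demo are hypothetical.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Open path and try to take a non-blocking exclusive advisory lock.

    Returns the open file descriptor on success, or None if another
    open file description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else on this host already holds the lock.
        os.close(fd)
        return None
    return fd


def demo():
    """Simulate two 'guests' opening the same image on one host."""
    with tempfile.NamedTemporaryFile() as image:
        first = try_exclusive_lock(image.name)
        second = try_exclusive_lock(image.name)
        result = (first is not None, second is not None)
        for fd in (first, second):
            if fd is not None:
                os.close(fd)
        return result
```

In this sketch the first opener gets the lock and the second is refused immediately, which mirrors the "Failed to get "write" lock" failure at image-open time. A lock of this kind is arbitrated by the kernel of a single host, so a second host needs a separate mechanism such as the locking daemon.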

Note that for case (1) in the additional info, the difference is also in the qemu package, which is what actually reports the error.

Shared mode would not work, though; that is tracked by a combination of
https://bugzilla.redhat.com/show_bug.cgi?id=1378242
and
https://bugzilla.redhat.com/show_bug.cgi?id=1511480

The behavior here is expected.
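
For reference, the 64-character resource name in the expected "Lockspace resource ... is locked" message has the shape of a SHA-256 hex digest. With file_lockspace_dir set, the lockd driver is documented to take its locks against a SHA256 checksum of the fully qualified disk path. The following is a minimal sketch under that assumption; the function name is hypothetical, and it is not claimed to reproduce the exact value quoted in the report, since that depends on the path string libvirt actually hashes.

```python
import hashlib


def lockspace_resource_name(disk_path):
    """Return the SHA-256 hex digest of a disk path string.

    Assumption: the digest is taken over the UTF-8 bytes of the
    fully qualified path, with no trailing newline.
    """
    return hashlib.sha256(disk_path.encode("utf-8")).hexdigest()


name = lockspace_resource_name("/var/lib/libvirt/images/lmn.qcow2")
print(len(name))  # 64 hex characters, same length as the name in the error
```

Because both guests reference the same path, they map to the same resource name, which is why the second guest would be refused by the lock manager if it reached that point.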