Bug 1547095 - QEMU image locking on NFSv3 prevents VMs from getting restarted on different hosts upon an host crash, seen on RHEL 7.5
Summary: QEMU image locking on NFSv3 prevents VMs from getting restarted on different ...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: pre-dev-freeze
Assignee: Fam Zheng
QA Contact: Ping Li
URL:
Whiteboard:
: 1547033 1592582 (view as bug list)
Depends On:
Blocks: 1547033 1550016 1553154 1556957
TreeView+ depends on / blocked
 
Reported: 2018-02-20 13:53 UTC by Simone Tiraboschi
Modified: 2020-10-02 03:11 UTC (History)
40 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1553154 (view as bug list)
Environment:
Last Closed: 2018-05-30 18:45:43 UTC
Target Upstream Version:


Attachments (Terms of Use)
qemu strace (120.00 KB, text/plain)
2018-02-20 15:52 UTC, Simone Tiraboschi
ovf from the OVF_STORE (15.46 KB, application/xml)
2018-02-21 08:33 UTC, Simone Tiraboschi
vdsm logs (260.05 KB, application/x-xz)
2018-02-21 22:11 UTC, Simone Tiraboschi


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1378242 high CLOSED QEMU image file locking (libvirt) 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 1504606 high CLOSED [Blocked] Use the Domain XML to create the HE VM 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 1550127 unspecified CLOSED [NFS v3] HA VM is stuck in unkown status after ungraceful shutdown 2021-01-20 06:05:38 UTC

Internal Links: 1378242 1504606 1550127

Description Simone Tiraboschi 2018-02-20 13:53:14 UTC
Description of problem:
Seen on RHEL 7.5.
Deploy hosted-engine (on NFS, in the reporter's case) on RHEL 7.5 hosts.
Add two additional 7.5 hosts, then abruptly power off the host where the hosted-engine VM was running.

The other two hosts fail to restart the engine VM.
In /var/log/libvirt/qemu/HostedEngine.log we see:

 2018-02-20 13:09:30.169+0000: starting up libvirt version: 3.9.0, package: 13.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-02-13-06:21:28, x86-041.build.eng.bos.redhat.com), qemu version: 2.10.0(qemu-kvm-rhev-2.10.0-20.el7), hostname: rose05.qa.lab.tlv.redhat.com
 LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=HostedEngine,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-7-HostedEngine/master-key.aes -machine pc-i440fx-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off -cpu Conroe,vmx=on -m 8192 -realtime mlock=off -smp 2,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 28f35d31-d0fc-4902-a44c-9d2251f09e21 -smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.5-6.el7,serial=4C4C4544-0057-5610-8056-C4C04F4D5731,uuid=28f35d31-d0fc-4902-a44c-9d2251f09e21' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-7-HostedEngine/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2018-02-20T13:09:30,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive file=/var/run/vdsm/storage/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618,format=raw,if=none,id=drive-virtio-disk0,serial=e62bf4a4-6132-4c14-8aba-f292febdc4f9,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-1-0,readonly=on,werror=stop,rerror=stop -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=33 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:10:9f,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/28f35d31-d0fc-4902-a44c-9d2251f09e21.com.redhat.rhevm.vdsm,server,nowait -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/28f35d31-d0fc-4902-a44c-9d2251f09e21.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel2,path=/var/lib/libvirt/qemu/channels/28f35d31-d0fc-4902-a44c-9d2251f09e21.org.ovirt.hosted-engine-setup.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.ovirt.hosted-engine-setup.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -vnc 0:0,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -msg timestamp=on
 2018-02-20 13:09:30.178+0000: 21779: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f4fc80efcb0
 2018-02-20T13:09:30.297036Z qemu-kvm: -chardev pty,id=charconsole0: char device redirected to /dev/pts/1 (label charconsole0)
 2018-02-20T13:09:30.298806Z qemu-kvm: -drive file=/var/run/vdsm/storage/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618,format=raw,if=none,id=drive-virtio-disk0,serial=e62bf4a4-6132-4c14-8aba-f292febdc4f9,cache=none,werror=stop,rerror=stop,aio=threads: 'serial' is deprecated, please use the corresponding option of '-device' instead
 2018-02-20T13:09:30.317745Z qemu-kvm: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1: Failed to get "write" lock
 Is another process using the image?
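The "Failed to get \"write\" lock" message comes from the image locking that QEMU introduced in 2.10, which takes advisory locks on the image file at open time. A minimal stdlib sketch of the underlying conflict (illustrative only, not QEMU code; plain POSIX record locks stand in for QEMU's open-file-description locks):

```python
# Illustrative sketch only (not QEMU code): QEMU >= 2.10 guards image files
# with advisory locks; a second opener that requests a conflicting lock fails
# with "Failed to get ... lock". POSIX record locks stand in for QEMU's
# open-file-description locks here.
import fcntl
import multiprocessing
import tempfile

def try_lock(path, q):
    # Second "QEMU": attempt a non-blocking exclusive lock on the same image.
    with open(path, "r+b") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            q.put("acquired")
        except OSError:
            q.put("busy")

def demo():
    with tempfile.NamedTemporaryFile() as img:
        img.truncate(512)  # stand-in "image"
        holder = open(img.name, "r+b")
        fcntl.lockf(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)  # first "QEMU" holds the lock
        q = multiprocessing.Queue()
        p = multiprocessing.Process(target=try_lock, args=(img.name, q))
        p.start()
        result = q.get()
        p.join()
        holder.close()
        return result

if __name__ == "__main__":
    print(demo())
```

On NFSv3 such locks are tracked by the server's lock manager on behalf of the client, so a lock held by a crashed host is not released until NLM lock recovery runs, which matches the symptom in the bug summary.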

On older ovirt-hosted-engine-ha releases we started the engine VM from a JSON vm.conf, which contains:
 # Editing the hosted engine VM is only possible via the manager UI\API
 cpuType=Conroe
 emulatedMachine=pc-i440fx-rhel7.5.0
 vmId=28f35d31-d0fc-4902-a44c-9d2251f09e21
 smp=2
 memSize=8192
 maxVCpus=16
 spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
xmlBase64=PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz48ZG9tYWluIHR5cGU9Imt2bSIgeG1sbnM6b3ZpcnQtdHVuZT0iaHR0cDovL292aXJ0Lm9yZy92bS90dW5lLzEuMCIgeG1sbnM6b3ZpcnQtdm09Imh0dHA6Ly9vdmlydC5vcmcvdm0vMS4wIj48bmFtZT5Ib3N0ZWRFbmdpbmU8L25hbWU+PHV1aWQ+MjhmMzVkMzEtZDBmYy00OTAyLWE0NGMtOWQyMjUxZjA5ZTIxPC91dWlkPjxtZW1vcnk+ODM4ODYwODwvbWVtb3J5PjxjdXJyZW50TWVtb3J5PjgzODg2MDg8L2N1cnJlbnRNZW1vcnk+PG1heE1lbW9yeSBzbG90cz0iMTYiPjE2Nzc3MjE2PC9tYXhNZW1vcnk+PHZjcHUgY3VycmVudD0iMiI+MTY8L3ZjcHU+PHN5c2luZm8gdHlwZT0ic21iaW9zIj48c3lzdGVtPjxlbnRyeSBuYW1lPSJtYW51ZmFjdHVyZXIiPm9WaXJ0PC9lbnRyeT48ZW50cnkgbmFtZT0icHJvZHVjdCI+T1MtTkFNRTo8L2VudHJ5PjxlbnRyeSBuYW1lPSJ2ZXJzaW9uIj5PUy1WRVJTSU9OOjwvZW50cnk+PGVudHJ5IG5hbWU9InNlcmlhbCI+SE9TVC1TRVJJQUw6PC9lbnRyeT48ZW50cnkgbmFtZT0idXVpZCI+MjhmMzVkMzEtZDBmYy00OTAyLWE0NGMtOWQyMjUxZjA5ZTIxPC9lbnRyeT48L3N5c3RlbT48L3N5c2luZm8+PGNsb2NrIG9mZnNldD0idmFyaWFibGUiIGFkanVzdG1lbnQ9IjAiPjx0aW1lciBuYW1lPSJydGMiIHRpY2twb2xpY3k9ImNhdGNodXAiPjwvdGltZXI+PHRpbWVyIG5hbWU9InBpdCIgdGlja3BvbGljeT0iZGVsYXkiPjwvdGltZXI+PHRpbWVyIG5hbWU9ImhwZXQiIHByZXNlbnQ9Im5vIj48L3RpbWVyPjwvY2xvY2s+PGZlYXR1cmVzPjxhY3BpPjwvYWNwaT48L2ZlYXR1cmVzPjxjcHUgbWF0Y2g9ImV4YWN0Ij48bW9kZWw+Q29ucm9lPC9tb2RlbD48dG9wb2xvZ3kgY29yZXM9IjEiIHRocmVhZHM9IjEiIHNvY2tldHM9IjE2Ij48L3RvcG9sb2d5PjxudW1hPjxjZWxsIGNwdXM9IjAsMSIgbWVtb3J5PSI4Mzg4NjA4Ij48L2NlbGw+PC9udW1hPjwvY3B1PjxjcHV0dW5lPjwvY3B1dHVuZT48ZGV2aWNlcz48aW5wdXQgdHlwZT0idGFibGV0IiBidXM9InVzYiI+PC9pbnB1dD48Y2hhbm5lbCB0eXBlPSJ1bml4Ij48dGFyZ2V0IHR5cGU9InZpcnRpbyIgbmFtZT0ib3ZpcnQtZ3Vlc3QtYWdlbnQuMCI+PC90YXJnZXQ+PHNvdXJjZSBtb2RlPSJiaW5kIiBwYXRoPSIvdmFyL2xpYi9saWJ2aXJ0L3FlbXUvY2hhbm5lbHMvMjhmMzVkMzEtZDBmYy00OTAyLWE0NGMtOWQyMjUxZjA5ZTIxLm92aXJ0LWd1ZXN0LWFnZW50LjAiPjwvc291cmNlPjwvY2hhbm5lbD48Y2hhbm5lbCB0eXBlPSJ1bml4Ij48dGFyZ2V0IHR5cGU9InZpcnRpbyIgbmFtZT0ib3JnLnFlbXUuZ3Vlc3RfYWdlbnQuMCI+PC90YXJnZXQ+PHNvdXJjZSBtb2RlPSJiaW5kIiBwYXRoPSIvdmFyL2xpYi9saWJ2aXJ0L3FlbXUvY2hhbm5lbHMvMjhmMzVkMzEtZDBmYy00OTAyLWE0NGMtOWQyMjUxZjA5ZTIxLm9yZy5xZW11Lmd1ZXN0X2FnZW50LjAiPjwvc291cmNlPjwvY2hhbm5lbD
48Z3JhcGhpY3MgdHlwZT0idm5jIiBwb3J0PSItMSIgYXV0b3BvcnQ9InllcyIgcGFzc3dkPSIqKioqKiIgcGFzc3dkVmFsaWRUbz0iMTk3MC0wMS0wMVQwMDowMDowMSIga2V5bWFwPSJlbi11cyI+PGxpc3RlbiB0eXBlPSJuZXR3b3JrIiBuZXR3b3JrPSJ2ZHNtLW92aXJ0bWdtdCI+PC9saXN0ZW4+PC9ncmFwaGljcz48Y29udHJvbGxlciB0eXBlPSJ2aXJ0aW8tc2VyaWFsIiBpbmRleD0iMCIgcG9ydHM9IjE2Ij48YWRkcmVzcyBidXM9IjB4MDAiIGRvbWFpbj0iMHgwMDAwIiBmdW5jdGlvbj0iMHgwIiBzbG90PSIweDA1IiB0eXBlPSJwY2kiPjwvYWRkcmVzcz48L2NvbnRyb2xsZXI+PGNvbnNvbGUgdHlwZT0icHR5Ij48dGFyZ2V0IHR5cGU9InZpcnRpbyIgcG9ydD0iMCI+PC90YXJnZXQ+PC9jb25zb2xlPjxjb250cm9sbGVyIHR5cGU9InVzYiIgbW9kZWw9InBpaXgzLXVoY2kiIGluZGV4PSIwIj48YWRkcmVzcyBidXM9IjB4MDAiIGRvbWFpbj0iMHgwMDAwIiBmdW5jdGlvbj0iMHgyIiBzbG90PSIweDAxIiB0eXBlPSJwY2kiPjwvYWRkcmVzcz48L2NvbnRyb2xsZXI+PGNvbnRyb2xsZXIgdHlwZT0ic2NzaSIgbW9kZWw9InZpcnRpby1zY3NpIiBpbmRleD0iMCI+PGFkZHJlc3MgYnVzPSIweDAwIiBkb21haW49IjB4MDAwMCIgZnVuY3Rpb249IjB4MCIgc2xvdD0iMHgwNCIgdHlwZT0icGNpIj48L2FkZHJlc3M+PC9jb250cm9sbGVyPjxjb250cm9sbGVyIHR5cGU9ImlkZSIgaW5kZXg9IjAiPjxhZGRyZXNzIGJ1cz0iMHgwMCIgZG9tYWluPSIweDAwMDAiIGZ1bmN0aW9uPSIweDEiIHNsb3Q9IjB4MDEiIHR5cGU9InBjaSI+PC9hZGRyZXNzPjwvY29udHJvbGxlcj48bWVtYmFsbG9vbiBtb2RlbD0ibm9uZSI+PC9tZW1iYWxsb29uPjxpbnRlcmZhY2UgdHlwZT0iYnJpZGdlIj48bW9kZWwgdHlwZT0idmlydGlvIj48L21vZGVsPjxsaW5rIHN0YXRlPSJ1cCI+PC9saW5rPjxzb3VyY2UgYnJpZGdlPSJvdmlydG1nbXQiPjwvc291cmNlPjxhZGRyZXNzIGJ1cz0iMHgwMCIgZG9tYWluPSIweDAwMDAiIGZ1bmN0aW9uPSIweDAiIHNsb3Q9IjB4MDMiIHR5cGU9InBjaSI+PC9hZGRyZXNzPjxtYWMgYWRkcmVzcz0iMDA6MWE6NGE6MTY6MTA6OWYiPjwvbWFjPjxmaWx0ZXJyZWYgZmlsdGVyPSJ2ZHNtLW5vLW1hYy1zcG9vZmluZyI+PC9maWx0ZXJyZWY+PGJhbmR3aWR0aD48L2JhbmR3aWR0aD48L2ludGVyZmFjZT48ZGlzayB0eXBlPSJmaWxlIiBkZXZpY2U9ImNkcm9tIiBzbmFwc2hvdD0ibm8iPjxkcml2ZXIgbmFtZT0icWVtdSIgdHlwZT0icmF3IiBlcnJvcl9wb2xpY3k9InJlcG9ydCI+PC9kcml2ZXI+PHNvdXJjZSBmaWxlPSIiIHN0YXJ0dXBQb2xpY3k9Im9wdGlvbmFsIj48L3NvdXJjZT48dGFyZ2V0IGRldj0iaGRjIiBidXM9ImlkZSI+PC90YXJnZXQ+PHJlYWRvbmx5PjwvcmVhZG9ubHk+PGFkZHJlc3MgYnVzPSIxIiBjb250cm9sbGVyPSIwIiB1bml0PSIwIiB0eXBlPSJkcml2ZSIgdGFyZ2V0PSIwIj48L2FkZHJlc3M+PC9kaXNrPjxkaXNrIH
NuYXBzaG90PSJubyIgdHlwZT0iZmlsZSIgZGV2aWNlPSJkaXNrIj48dGFyZ2V0IGRldj0idmRhIiBidXM9InZpcnRpbyI+PC90YXJnZXQ+PHNvdXJjZSBmaWxlPSIvcmhldi9kYXRhLWNlbnRlci8wMDAwMDAwMC0wMDAwLTAwMDAtMDAwMC0wMDAwMDAwMDAwMDAvMmE3MzM0ZTctYzFkMy00MWZkLTk1NTItMmFhY2JmYTRmOWFmL2ltYWdlcy9lNjJiZjRhNC02MTMyLTRjMTQtOGFiYS1mMjkyZmViZGM0ZjkvOTc2ZWNiYTgtNzEyYi00YjljLWIzZDMtOWQ2ZmU5ZDdlNjE4Ij48L3NvdXJjZT48ZHJpdmVyIG5hbWU9InFlbXUiIGlvPSJ0aHJlYWRzIiB0eXBlPSJyYXciIGVycm9yX3BvbGljeT0ic3RvcCIgY2FjaGU9Im5vbmUiPjwvZHJpdmVyPjxhZGRyZXNzIGJ1cz0iMHgwMCIgZG9tYWluPSIweDAwMDAiIGZ1bmN0aW9uPSIweDAiIHNsb3Q9IjB4MDYiIHR5cGU9InBjaSI+PC9hZGRyZXNzPjxzZXJpYWw+ZTYyYmY0YTQtNjEzMi00YzE0LThhYmEtZjI5MmZlYmRjNGY5PC9zZXJpYWw+PC9kaXNrPjwvZGV2aWNlcz48cG0+PHN1c3BlbmQtdG8tZGlzayBlbmFibGVkPSJubyI+PC9zdXNwZW5kLXRvLWRpc2s+PHN1c3BlbmQtdG8tbWVtIGVuYWJsZWQ9Im5vIj48L3N1c3BlbmQtdG8tbWVtPjwvcG0+PG9zPjx0eXBlIGFyY2g9Ing4Nl82NCIgbWFjaGluZT0icGMtaTQ0MGZ4LXJoZWw3LjUuMCI+aHZtPC90eXBlPjxzbWJpb3MgbW9kZT0ic3lzaW5mbyI+PC9zbWJpb3M+PC9vcz48bWV0YWRhdGE+PG92aXJ0LXR1bmU6cW9zPjwvb3ZpcnQtdHVuZTpxb3M+PG92aXJ0LXZtOnZtPjxtaW5HdWFyYW50ZWVkTWVtb3J5TWIgdHlwZT0iaW50Ij44MTkyPC9taW5HdWFyYW50ZWVkTWVtb3J5TWI+PGNsdXN0ZXJWZXJzaW9uPjQuMjwvY2x1c3RlclZlcnNpb24+PG92aXJ0LXZtOmN1c3RvbT48L292aXJ0LXZtOmN1c3RvbT48b3ZpcnQtdm06ZGV2aWNlIG1hY19hZGRyZXNzPSIwMDoxYTo0YToxNjoxMDo5ZiI+PG92aXJ0LXZtOmN1c3RvbT48L292aXJ0LXZtOmN1c3RvbT48L292aXJ0LXZtOmRldmljZT48b3ZpcnQtdm06ZGV2aWNlIGRldnR5cGU9ImRpc2siIG5hbWU9InZkYSI+PG92aXJ0LXZtOmltYWdlSUQ+ZTYyYmY0YTQtNjEzMi00YzE0LThhYmEtZjI5MmZlYmRjNGY5PC9vdmlydC12bTppbWFnZUlEPjxvdmlydC12bTpwb29sSUQ+MDAwMDAwMDAtMDAwMC0wMDAwLTAwMDAtMDAwMDAwMDAwMDAwPC9vdmlydC12bTpwb29sSUQ+PG92aXJ0LXZtOnZvbHVtZUlEPjk3NmVjYmE4LTcxMmItNGI5Yy1iM2QzLTlkNmZlOWQ3ZTYxODwvb3ZpcnQtdm06dm9sdW1lSUQ+PG92aXJ0LXZtOmRvbWFpbklEPjJhNzMzNGU3LWMxZDMtNDFmZC05NTUyLTJhYWNiZmE0ZjlhZjwvb3ZpcnQtdm06ZG9tYWluSUQ+PC9vdmlydC12bTpkZXZpY2U+PGxhdW5jaFBhdXNlZD5mYWxzZTwvbGF1bmNoUGF1c2VkPjxyZXN1bWVCZWhhdmlvcj5hdXRvX3Jlc3VtZTwvcmVzdW1lQmVoYXZpb3I+PC9vdmlydC12bTp2bT48L21ldGFkYXRhPjwvZG9tYWluPg==
 vmName=HostedEngine
 display=vnc
 devices={index:0,iface:virtio,format:raw,bootOrder:1,address:{type:pci,slot:0x06,bus:0x00,domain:0x0000,function:0x0},volumeID:976ecba8-712b-4b9c-b3d3-9d6fe9d7e618,imageID:e62bf4a4-6132-4c14-8aba-f292febdc4f9,readonly:false,domainID:2a7334e7-c1d3-41fd-9552-2aacbfa4f9af,deviceId:e62bf4a4-6132-4c14-8aba-f292febdc4f9,poolID:00000000-0000-0000-0000-000000000000,device:disk,shared:exclusive,propagateErrors:off,type:disk}
 devices={nicModel:pv,macAddr:00:1a:4a:16:10:9f,linkActive:true,network:ovirtmgmt,deviceId:a029a17b-c0d2-4045-9e89-e9b7a0e23b80,address:{type:pci,slot:0x03,bus:0x00,domain:0x0000,function:0x0},device:bridge,type:interface}
 devices={device:vnc,type:graphics,deviceId:30f608bc-8161-4db3-bd8d-c1c567f7ad75,address:None}
 devices={index:2,iface:ide,shared:false,readonly:true,deviceId:8c3179ac-b322-4f5c-9449-c52e3665e0ae,address:{controller:0,target:0,unit:0,bus:1,type:drive},device:cdrom,path:,type:disk}
 devices={device:usb,specParams:{index:0,model:piix3-uhci},type:controller,deviceId:b30ade5c-5394-421f-85d7-c499341c0027,address:{type:pci,slot:0x01,bus:0x00,domain:0x0000,function:0x2}}
 devices={specParams:{index:0,model:virtio-scsi},deviceId:bc5e64f4-98b6-482e-8223-03fc525ae522,address:{type:pci,slot:0x04,bus:0x00,domain:0x0000,function:0x0},device:scsi,model:virtio-scsi,type:controller}
 devices={device:ide,specParams:{index:0},type:controller,deviceId:f565c69a-2b0f-4d4c-b004-3da303c43da5,address:{type:pci,slot:0x01,bus:0x00,domain:0x0000,function:0x1}}
 devices={device:virtio-serial,specParams:{index:0},type:controller,deviceId:48a86d50-518a-4f8c-8d93-a81c868ca022,address:{type:pci,slot:0x05,bus:0x00,domain:0x0000,function:0x0}}
 devices={device:console,type:console,deviceId:4af63e2a-1590-41fc-9a31-11d19ec2ada8,address:None}
 devices={device:virtio,specParams:{source:urandom},model:virtio,type:rng}

Please note "shared:exclusive".
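For reference, the xmlBase64 value above is just the libvirt domain XML base64-encoded into vm.conf. A small stdlib sketch of decoding it, shown on a stand-in blob since the value above is wrapped across lines:

```python
# Sketch: recover the domain XML embedded in vm.conf's xmlBase64 value.
# SAMPLE is a stand-in blob; the real value is the line-wrapped one above.
import base64

def decode_vm_conf_xml(blob):
    # Tolerate line wrapping by stripping all whitespace before decoding.
    return base64.b64decode("".join(blob.split())).decode("utf-8")

SAMPLE = base64.b64encode(
    b'<domain type="kvm"><name>HostedEngine</name></domain>'
).decode("ascii")
```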

Now (as of https://bugzilla.redhat.com/1504606 ) we start the engine VM directly from the libvirt XML that the engine stores in the OVF_STORE.
In our case we have:
 <?xml version="1.0" encoding="UTF-8"?>
 <domain xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
    <name>HostedEngine</name>
    <uuid>28f35d31-d0fc-4902-a44c-9d2251f09e21</uuid>
    <memory>8388608</memory>
    <currentMemory>8388608</currentMemory>
    <maxMemory slots="16">16777216</maxMemory>
    <vcpu current="2">16</vcpu>
    <sysinfo type="smbios">
       <system>
          <entry name="manufacturer">oVirt</entry>
          <entry name="product">OS-NAME:</entry>
          <entry name="version">OS-VERSION:</entry>
          <entry name="serial">HOST-SERIAL:</entry>
          <entry name="uuid">28f35d31-d0fc-4902-a44c-9d2251f09e21</entry>
       </system>
    </sysinfo>
    <clock offset="variable" adjustment="0">
       <timer name="rtc" tickpolicy="catchup" />
       <timer name="pit" tickpolicy="delay" />
       <timer name="hpet" present="no" />
    </clock>
    <features>
       <acpi />
    </features>
    <cpu match="exact">
       <model>Conroe</model>
       <topology cores="1" threads="1" sockets="16" />
       <numa>
          <cell cpus="0,1" memory="8388608" />
       </numa>
    </cpu>
    <cputune />
    <devices>
       <input type="tablet" bus="usb" />
       <channel type="unix">
          <target type="virtio" name="ovirt-guest-agent.0" />
          <source mode="bind" path="/var/lib/libvirt/qemu/channels/28f35d31-d0fc-4902-a44c-9d2251f09e21.ovirt-guest-agent.0" />
       </channel>
       <channel type="unix">
          <target type="virtio" name="org.qemu.guest_agent.0" />
          <source mode="bind" path="/var/lib/libvirt/qemu/channels/28f35d31-d0fc-4902-a44c-9d2251f09e21.org.qemu.guest_agent.0" />
       </channel>
       <graphics type="vnc" port="-1" autoport="yes" passwd="*****" passwdValidTo="1970-01-01T00:00:01" keymap="en-us">
          <listen type="network" network="vdsm-ovirtmgmt" />
       </graphics>
       <controller type="virtio-serial" index="0" ports="16">
          <address bus="0x00" domain="0x0000" function="0x0" slot="0x05" type="pci" />
       </controller>
       <console type="pty">
          <target type="virtio" port="0" />
       </console>
       <controller type="usb" model="piix3-uhci" index="0">
          <address bus="0x00" domain="0x0000" function="0x2" slot="0x01" type="pci" />
       </controller>
       <controller type="scsi" model="virtio-scsi" index="0">
          <address bus="0x00" domain="0x0000" function="0x0" slot="0x04" type="pci" />
       </controller>
       <controller type="ide" index="0">
          <address bus="0x00" domain="0x0000" function="0x1" slot="0x01" type="pci" />
       </controller>
       <memballoon model="none" />
       <interface type="bridge">
          <model type="virtio" />
          <link state="up" />
          <source bridge="ovirtmgmt" />
          <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci" />
          <mac address="00:1a:4a:16:10:9f" />
          <filterref filter="vdsm-no-mac-spoofing" />
          <bandwidth />
       </interface>
       <disk type="file" device="cdrom" snapshot="no">
          <driver name="qemu" type="raw" error_policy="report" />
          <source file="" startupPolicy="optional" />
          <target dev="hdc" bus="ide" />
          <readonly />
          <address bus="1" controller="0" unit="0" type="drive" target="0" />
       </disk>
       <disk snapshot="no" type="file" device="disk">
          <target dev="vda" bus="virtio" />
          <source file="/rhev/data-center/00000000-0000-0000-0000-000000000000/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/images/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618" />
          <driver name="qemu" io="threads" type="raw" error_policy="stop" cache="none" />
          <address bus="0x00" domain="0x0000" function="0x0" slot="0x06" type="pci" />
          <serial>e62bf4a4-6132-4c14-8aba-f292febdc4f9</serial>
       </disk>
    </devices>
    <pm>
       <suspend-to-disk enabled="no" />
       <suspend-to-mem enabled="no" />
    </pm>
    <os>
       <type arch="x86_64" machine="pc-i440fx-rhel7.5.0">hvm</type>
       <smbios mode="sysinfo" />
    </os>
    <metadata>
       <ovirt-tune:qos />
       <ovirt-vm:vm>
          <minGuaranteedMemoryMb type="int">8192</minGuaranteedMemoryMb>
          <clusterVersion>4.2</clusterVersion>
          <ovirt-vm:custom />
          <ovirt-vm:device mac_address="00:1a:4a:16:10:9f">
             <ovirt-vm:custom />
          </ovirt-vm:device>
          <ovirt-vm:device devtype="disk" name="vda">
             <ovirt-vm:imageID>e62bf4a4-6132-4c14-8aba-f292febdc4f9</ovirt-vm:imageID>
             <ovirt-vm:poolID>00000000-0000-0000-0000-000000000000</ovirt-vm:poolID>
             <ovirt-vm:volumeID>976ecba8-712b-4b9c-b3d3-9d6fe9d7e618</ovirt-vm:volumeID>
             <ovirt-vm:domainID>2a7334e7-c1d3-41fd-9552-2aacbfa4f9af</ovirt-vm:domainID>
          </ovirt-vm:device>
          <launchPaused>false</launchPaused>
          <resumeBehavior>auto_resume</resumeBehavior>
       </ovirt-vm:vm>
    </metadata>
 </domain>

Please note that
      <disk snapshot="no" type="file" device="disk">
         <target dev="vda" bus="virtio" />
         <source file="/rhev/data-center/00000000-0000-0000-0000-000000000000/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/images/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618" />
         <driver name="qemu" io="threads" type="raw" error_policy="stop" cache="none" />
         <address bus="0x00" domain="0x0000" function="0x0" slot="0x06" type="pci" />
         <serial>e62bf4a4-6132-4c14-8aba-f292febdc4f9</serial>
      </disk>

is entirely missing the <shareable/> element.
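A quick programmatic check for the element, as a hedged sketch using the stdlib XML parser on a trimmed copy of the disk snippet above:

```python
# Sketch: test whether a libvirt <disk> element carries <shareable/>.
import xml.etree.ElementTree as ET

def disk_is_shareable(disk_xml):
    return ET.fromstring(disk_xml).find("shareable") is not None

# Trimmed version of the disk element above: it has no <shareable/> child.
DISK = """<disk snapshot="no" type="file" device="disk">
  <target dev="vda" bus="virtio"/>
  <serial>e62bf4a4-6132-4c14-8aba-f292febdc4f9</serial>
</disk>"""
```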


Version-Release number of selected component (if applicable):
 ovirt-hosted-engine-ha.noarch          2.2.5-1.el7ev              @rhv-4.2.2    

 vdsm.x86_64                            4.20.18-1.el7ev            @rhv-4.2.2    

 libvirt-client.x86_64                  3.9.0-13.el7               @rhel-7.5-base
 libvirt-lock-sanlock.x86_64            3.9.0-13.el7               @rhel-7.5-optional
 qemu-kvm-common-rhev.x86_64            10:2.10.0-20.el7           @rhevh-75     

 qemu-kvm-rhev.x86_64                   10:2.10.0-20.el7           @rhevh-75   

How reproducible:
seen on RHEL 7.5

Steps to Reproduce:
1. deploy hosted-engine, add additional hosts
2. kill the host where the engine VM is running
3. check other hosts

Actual results:
ovirt-ha-agent fails to start the engine VM on other hosts.
In /var/log/libvirt/qemu/HostedEngine.log:
 qemu-kvm: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1: Failed to get "write" lock
 Is another process using the image?


Expected results:
ovirt-ha-agent can restart the engine VM on other hosts.

Additional info:
RHEL 7.5 specific?

Comment 1 Simone Tiraboschi 2018-02-20 15:50:27 UTC
A direct qemu invocation fails as well.

[root@rose05 ~]# LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm  -nographic -drive file=/var/run/vdsm/storage/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618,format=raw,if=none,id=drive-virtio-disk0,serial=e62bf4a4-6132-4c14-8aba-f292febdc4f9,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
qemu-kvm: -drive file=/var/run/vdsm/storage/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618,format=raw,if=none,id=drive-virtio-disk0,serial=e62bf4a4-6132-4c14-8aba-f292febdc4f9,cache=none,werror=stop,rerror=stop,aio=threads: 'serial' is deprecated, please use the corresponding option of '-device' instead
qemu-kvm: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1: Failed to get "write" lock
Is another process using the image?

Attaching strace output

Comment 2 Simone Tiraboschi 2018-02-20 15:52:22 UTC
Created attachment 1398270 [details]
qemu strace

Comment 3 Michal Skrivanek 2018-02-21 06:15:56 UTC
So it works in 7.5 with that shareable element?

Comment 4 Michal Skrivanek 2018-02-21 07:49:52 UTC
can you attach the actual OVF? Does it have the ovf:shareable element?
If not (so it's the same as the xml) then your VM definition doesn't have the disk set as shareable.

Comment 7 Simone Tiraboschi 2018-02-21 08:32:30 UTC
Attaching 28f35d31-d0fc-4902-a44c-9d2251f09e21.ovf as extracted from the OVF_STORE.

No shareable there.

Comment 8 Simone Tiraboschi 2018-02-21 08:33:09 UTC
Created attachment 1398567 [details]
ovf from the OVF_STORE

Comment 9 Michal Skrivanek 2018-02-21 09:25:03 UTC
it's set to ovf:shareable="false", so it's not there. Though I do not quite understand what the desired state is: do you intend to have <shareable> in the libvirt XML definition, or something else?

Comment 10 Martin Sivák 2018-02-21 10:51:02 UTC
We do not want the disk to be shared. We use the lock to ensure exclusive access.

Comment 11 Simone Tiraboschi 2018-02-21 10:55:14 UTC
(In reply to Martin Sivák from comment #10)
> We do not want the disk to be shared. We use the lock to ensure exclusive
> access.

at vdsm level shared could be: none, exclusive, shared, transient

We need exclusive, but it seems that at the engine level we can only set shareable true/false.
I'm trying to understand whether this conflicts with shared: exclusive.
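For reference, a purely illustrative stdlib enum of the four vdsm sharing levels listed above, annotated with the behavior later comments in this bug attribute to them (this is not vdsm source code):

```python
from enum import Enum

class DriveShared(Enum):
    # Illustrative mapping only; values follow the vdsm levels quoted above.
    NONE = "none"            # ordinary exclusive-use disk, no lease
    EXCLUSIVE = "exclusive"  # vdsm is expected to attach a volume lease
    SHARED = "shared"        # the disk XML gains <shareable/>
    TRANSIENT = "transient"  # (behavior not discussed in this bug)
```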

Comment 12 Michal Skrivanek 2018-02-21 12:00:56 UTC
ok, that makes sense now. That is supported and should work if you set it in the metadata section, which is currently missing.

So the question is how to get it there on the engine side

Comment 13 Michal Skrivanek 2018-02-21 12:08:09 UTC
can you paste how exactly it differs between 4.1 and 4.2? It generates a lease for the drive, right?

Comment 14 Simone Tiraboschi 2018-02-21 12:47:27 UTC
(In reply to Michal Skrivanek from comment #13)
> can you paste how exactly it differs between 4.1 and 4.2? It generates a
> lease for the drive, right?

No, we were already using 'shared: exclusive' before VM lease support existed on the engine side (since 3.4, I think), and nothing has changed in that area on the ovirt-ha-agent side.

Comment 17 Michal Skrivanek 2018-02-21 14:44:55 UTC
(In reply to Simone Tiraboschi from comment #14)
> (In reply to Michal Skrivanek from comment #13)
> > can you paste how exactly it differs between 4.1 and 4.2? It generates a
> > lease for the drive, right?
> 
> No, we were already using 'shared: exclusive' before VM lease support
> existed on the engine side (since 3.4, I think), and nothing has changed in
> that area on the ovirt-ha-agent side.

in the resulting libvirt xml, I mean. IIUC the current code, it's supposed to generate a lease

Comment 19 Simone Tiraboschi 2018-02-21 14:59:03 UTC
I tried reverting https://gerrit.ovirt.org/#/c/86435/ and reproducing without that and the issue is still there.

I don't think that the issue is related to https://bugzilla.redhat.com/show_bug.cgi?id=1504606

Comment 20 Michal Skrivanek 2018-02-21 17:54:03 UTC
Thanks Simone, interesting; then this has likely been broken for a long time and only showed up because of the stricter 7.5 qemu locking (similar to bug 1395941).
There are things to fix within virt (adding "shared" to the metadata), but I'm afraid this needs storage involvement anyway.

Allon, can anyone take a look at the HE lease mechanism? Doesn't seem to be related to the  gap in vmxml

Comment 21 Allon Mureinik 2018-02-21 17:59:13 UTC
(In reply to Michal Skrivanek from comment #20)
> Thanks Simone, interesting; then this has likely been broken for a long
> time and only showed up because of the stricter 7.5 qemu locking (similar
> to bug 1395941).
> There are things to fix within virt (adding "shared" to the metadata), but
> I'm afraid this needs storage involvement anyway.
> 
> Allon, can anyone take a look at the HE lease mechanism? Doesn't seem to be
> related to the  gap in vmxml
Sure.
Nir, Ala, can one of you take a look please?

Comment 22 Nir Soffer 2018-02-21 18:57:40 UTC
HE uses shared:exclusive, which acquires the volume lease for this drive. The
libvirt XML should contain a lease element with the lease path and offset of the
active volume lease.

Simone: please attach the vm xml to the bug.

The error we see comes from qemu; it looks like libvirt's local image locking
conflicts with qemu's image locking.

Daniel: how do you suggest debugging this in libvirt/qemu?

Comment 23 Michal Skrivanek 2018-02-21 19:02:11 UTC
Nir, as per comment #19 Simone reproduced the same behavior when using the legacy vm conf (reverted msivak's change to use vm xml), so I assume (and Simone, please confirm) that it used shared=exclusive there; there was no change on the HE side regarding that part.
This led me to think that it is not related to the vm xml, but to RHEL 7.5 and/or some other vdsm refactoring.

Comment 24 Simone Tiraboschi 2018-02-21 19:13:09 UTC
Engine generated libvirt XML for sure doesn't contain shared=exclusive and so we have also: https://bugzilla.redhat.com/1547479

But I reproduced it also reverting https://gerrit.ovirt.org/#/c/86435/ , and in that case we use a json vm configuration that contained for sure shared=exclusive, and the issue is there also in that case.

Comment 25 Nir Soffer 2018-02-21 19:32:24 UTC
(In reply to Michal Skrivanek from comment #23)
I think that "shared=exclusive" is a vdsm thing; you will not find it in the vm
xml.

We use "shared=shared" to add the "shareable" disk attribute (not related to volume
leases).

When using "shared=exclusive", we add a volume lease to the xml here:

    for dev_objs in self._devices.values():
        for dev in dev_objs:
            for elem in dev.get_extra_xmls():
                domxml._devices.appendChild(element=elem)

The vm xml must have a <lease> element with the details of the volume lease.

  <lease>
    <key>volume-uuid</key>
    <lockspace>sd-uuid</lockspace>
    <target offset="123" path=".../leases" />
  </lease>

Francesco is maintaining this area.
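The lease element sketched above can be built with a few lines of stdlib code. A hedged sketch (function and argument names are mine, not vdsm's get_extra_xmls implementation):

```python
# Sketch: build the <lease> element vdsm is expected to add for a
# shared=exclusive drive. Names and signature are illustrative only.
import xml.etree.ElementTree as ET

def volume_lease_xml(volume_id, sd_id, lease_path, offset=0):
    lease = ET.Element("lease")
    ET.SubElement(lease, "key").text = volume_id    # volume UUID
    ET.SubElement(lease, "lockspace").text = sd_id  # storage domain UUID
    ET.SubElement(lease, "target", {"offset": str(offset), "path": lease_path})
    return ET.tostring(lease, encoding="unicode")
```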

Comment 26 Michal Skrivanek 2018-02-21 20:07:04 UTC
(In reply to Nir Soffer from comment #25)
> (In reply to Michal Skrivanek from comment #23)
> I think that "shared=exclusive" is a vdsm thing; you will not find it in
> the vm xml.

yes, that is understood; it's a gap currently. But that's not the point here considering comment #24: the same behavior now happens without vm xml on RHEL 7.5

> When using "shared=exclusive", we add a volume lease to the xml here:
> 
> for dev_objs in self._devices.values():
>     for dev in dev_objs:
>         for elem in dev.get_extra_xmls():
>             domxml._devices.appendChild(element=elem)
> 
> The vm xml must have a <lease> element with the details of the volume lease.

understood. This gap needs to be closed regardless. But first it needs to work without vm xml

Comment 27 Nir Soffer 2018-02-21 20:40:55 UTC
Adding back needinfo for Daniel, see comment 22.

Comment 28 Nir Soffer 2018-02-21 20:43:11 UTC
Simone, can you confirm that you have the lease element in the vm xml when using
vm conf? See comment 25.

Comment 29 Simone Tiraboschi 2018-02-21 22:10:41 UTC
(In reply to Nir Soffer from comment #28)
> Simone, can you confirm that you have the lease element in the vm xml when
> using
> vm conf? See comment 25.

Skipping the libvirt XML generated by the engine (i.e. behaving as in 4.1), we have shared=exclusive in the JSON sent to vdsm:
2018-02-21 23:08:26,916+0200 INFO  (jsonrpc/5) [api.virt] START create(vmParams={u'emulatedMachine': u'pc-i440fx-rhel7.5.0', u'vmId': u'28f35d31-d0fc-4902-a44c-9d2251f09e21', u'devices': [{u'index': u'0', u'iface': u'virtio', u'format': u'raw', u'bootOrder': u'1', u'address': {u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'volumeID': u'976ecba8-712b-4b9c-b3d3-9d6fe9d7e618', u'imageID': u'e62bf4a4-6132-4c14-8aba-f292febdc4f9', u'readonly': u'false', u'domainID': u'2a7334e7-c1d3-41fd-9552-2aacbfa4f9af', u'deviceId': u'e62bf4a4-6132-4c14-8aba-f292febdc4f9', u'poolID': u'00000000-0000-0000-0000-000000000000', u'device': u'disk', u'shared': u'exclusive', u'propagateErrors': u'off', u'type': u'disk'}, {u'nicModel': u'pv', u'macAddr': u'00:1a:4a:16:10:9f', u'linkActive': u'true', u'network': u'ovirtmgmt', u'deviceId': u'a029a17b-c0d2-4045-9e89-e9b7a0e23b80', u'address': {u'slot': u'0x03', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'bridge', u'type': u'interface'}, {u'device': u'vnc', u'type': u'graphics', u'deviceId': u'30f608bc-8161-4db3-bd8d-c1c567f7ad75', u'address': u'None'}, {u'index': u'2', u'iface': u'ide', u'readonly': u'true', u'deviceId': u'8c3179ac-b322-4f5c-9449-c52e3665e0ae', u'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'device': u'cdrom', u'shared': u'false', u'path': u'', u'type': u'disk'}, {u'device': u'usb', u'specParams': {u'index': u'0', u'model': u'piix3-uhci'}, u'type': u'controller', u'deviceId': u'b30ade5c-5394-421f-85d7-c499341c0027', u'address': {u'slot': u'0x01', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x2'}}, {u'specParams': {u'index': u'0', u'model': u'virtio-scsi'}, u'deviceId': u'bc5e64f4-98b6-482e-8223-03fc525ae522', u'address': {u'slot': u'0x04', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, u'device': u'scsi', u'model': 
u'virtio-scsi', u'type': u'controller'}, {u'device': u'ide', u'specParams': {u'index': u'0'}, u'type': u'controller', u'deviceId': u'f565c69a-2b0f-4d4c-b004-3da303c43da5', u'address': {u'slot': u'0x01', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x1'}}, {u'device': u'virtio-serial', u'specParams': {u'index': u'0'}, u'type': u'controller', u'deviceId': u'48a86d50-518a-4f8c-8d93-a81c868ca022', u'address': {u'slot': u'0x05', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}}, {u'device': u'console', u'type': u'console', u'deviceId': u'4af63e2a-1590-41fc-9a31-11d19ec2ada8', u'address': u'None'}, {u'device': u'virtio', u'specParams': {u'source': u'urandom'}, u'model': u'virtio', u'type': u'rng'}], u'smp': u'2', u'memSize': u'8192', u'cpuType': u'Conroe', u'spiceSecureChannels': u'smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir', u'vmName': u'HostedEngine', u'display': u'vnc', u'maxVCpus': u'16'}) from=::1,46156 (api:46)


and so the lease element in the XML sent to libvirt:

        <lease>
            <key>976ecba8-712b-4b9c-b3d3-9d6fe9d7e618</key>
            <lockspace>2a7334e7-c1d3-41fd-9552-2aacbfa4f9af</lockspace>
            <target offset="0" path="/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/images/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618.lease"/>
        </lease>


And the lease volume is there and we can definitely read it:
[root@alma07 ~]# ls -l /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/images/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618.lease
-rw-rw----. 1 vdsm kvm 1048576 21 feb 14.11 /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/images/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618.lease
[root@alma07 ~]# dd if=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_alukiano_compute-ge-he-1/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/images/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618.lease of=/dev/null bs=4k count=1
1+0 records in
1+0 records out
4096 bytes (4,1 kB) copied, 0,000121077 s, 33,8 MB/s


but then the VM fails to start:
2018-02-21 23:08:28,173+0200 ERROR (vm/28f35d31) [virt.vm] (vmId='28f35d31-d0fc-4902-a44c-9d2251f09e21') The vm start process failed (vm:939)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 868, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2774, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: internal error: qemu unexpectedly closed the monitor: 2018-02-21T21:08:27.917516Z qemu-kvm: -drive file=/var/run/vdsm/storage/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618,format=raw,if=none,id=drive-virtio-disk0,serial=e62bf4a4-6132-4c14-8aba-f292febdc4f9,cache=none,werror=stop,rerror=stop,aio=threads: 'serial' is deprecated, please use the corresponding option of '-device' instead
2018-02-21T21:08:27.950047Z qemu-kvm: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1: Failed to get "write" lock
Is another process using the image?


SELinux is clean
[root@alma07 ~]# ausearch -m avc
<no matches>

Attaching the whole vdsm log file.

Comment 30 Simone Tiraboschi 2018-02-21 22:11:28 UTC
Created attachment 1399037 [details]
vdsm logs

Comment 31 Simone Tiraboschi 2018-02-21 22:17:31 UTC
[root@alma07 ~]# vdsm-client Volume getInfo volumeID=976ecba8-712b-4b9c-b3d3-9d6fe9d7e618 imageID=e62bf4a4-6132-4c14-8aba-f292febdc4f9 storagepoolID=00000000-0000-0000-0000-000000000000 storagedomainID=2a7334e7-c1d3-41fd-9552-2aacbfa4f9af
{
    "status": "OK", 
    "lease": {
        "owners": [
            1
        ], 
        "version": 6
    }, 
    "domain": "2a7334e7-c1d3-41fd-9552-2aacbfa4f9af", 
    "capacity": "53687091200", 
    "voltype": "LEAF", 
    "description": "Hosted Engine Image", 
    "parent": "00000000-0000-0000-0000-000000000000", 
    "format": "RAW", 
    "generation": 0, 
    "image": "e62bf4a4-6132-4c14-8aba-f292febdc4f9", 
    "uuid": "976ecba8-712b-4b9c-b3d3-9d6fe9d7e618", 
    "disktype": "2", 
    "legality": "LEGAL", 
    "mtime": "0", 
    "apparentsize": "53687091200", 
    "truesize": "5334265856", 
    "type": "SPARSE", 
    "children": [], 
    "pool": "", 
    "ctime": "1519059771"
}

Comment 32 Daniel Berrangé 2018-02-22 09:09:43 UTC
(In reply to Nir Soffer from comment #22)
> The error we see come from qemu, looks like libvirt local image locking
> conflicts with qemu image locking.

I don't see any evidence that libvirt is doing locking on this image. Libvirt's fcntl-based locking is disabled by default and oVirt has presumably enabled sanlock instead. Even if libvirt's fcntl locks were enabled, libvirt locks at a different byte offset to QEMU, so they can co-exist. The error message:

 qemu-kvm: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1: Failed to get "write" lock
 Is another process using the image?

Is referring to the disk

       <disk snapshot="no" type="file" device="disk">
          <target dev="vda" bus="virtio" />
          <source file="/rhev/data-center/00000000-0000-0000-0000-000000000000/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/images/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618" />
          <driver name="qemu" io="threads" type="raw" error_policy="stop" cache="none" />
          <address bus="0x00" domain="0x0000" function="0x0" slot="0x06" type="pci" />
          <serial>e62bf4a4-6132-4c14-8aba-f292febdc4f9</serial>
       </disk>

So it simply appears that 2 processes both have

/rhev/data-center/00000000-0000-0000-0000-000000000000/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/images/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618

open at the same time. If you don't have 2 QEMU's running with it at once, perhaps you have a qemu-img process with it open, or qemu-nbd.

Comment 33 Nir Soffer 2018-02-22 13:13:22 UTC
(In reply to Daniel Berrange from comment #32)
...
> So it simply appears that 2 processes both have
> 
> /rhev/data-center/00000000-0000-0000-0000-000000000000/2a7334e7-c1d3-41fd-
> 9552-2aacbfa4f9af/images/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-
> 4b9c-b3d3-9d6fe9d7e618
> 
> open at the same time. If you don't have 2 QEMU's running with it at once,
> perhaps you have a qemu-img process with it open, or qemu-nbd.

Vdsm is not accessing the image when starting a vm with qemu-img or qemu-nbd.
I think this bug should move to qemu to investigate why locking the image failed.

Comment 34 Nir Soffer 2018-02-22 13:17:43 UTC
(In reply to Simone Tiraboschi from comment #31)
> [root@alma07 ~]# vdsm-client Volume getInfo
...
>     "lease": {
>         "owners": [
>             1
>         ], 
>         "version": 6
>     },

This shows that the volume lease XML was generated correctly and that libvirt
acquired the lease.

Comment 37 Daniel Berrangé 2018-02-22 14:46:35 UTC
(In reply to Nir Soffer from comment #33)
> (In reply to Daniel Berrange from comment #32)
> ...
> > So it simply appears that 2 processes both have
> > 
> > /rhev/data-center/00000000-0000-0000-0000-000000000000/2a7334e7-c1d3-41fd-
> > 9552-2aacbfa4f9af/images/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-
> > 4b9c-b3d3-9d6fe9d7e618
> > 
> > open at the same time. If you don't have 2 QEMU's running with it at once,
> > perhaps you have a qemu-img process with it open, or qemu-nbd.
> 
> Vdsm is not accessing the image when starting a vm with qemu-img or qemu-nbd.
> I think this bug should move to qemu to investigate why locking the image
> failed.

Something must be accessing it to get this error - try running 'lslocks' on the server in question when the error happens to see which other processes hold locks on that file.
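Daniel's suggestion can be complemented with a direct probe of the OFD lock API. The sketch below is illustrative only (the helper name is made up; this is not QEMU or vdsm code): it asks the kernel whether the write lock QEMU wants on the image is already held by some other open file description. Note that for a conflicting OFD lock the kernel reports l_pid as -1, so lslocks is still needed to identify the actual holder.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Illustrative helper: would an OFD write lock on byte 0 of `path`
 * conflict with a lock held on another open file description
 * (possibly belonging to another process)?
 * Returns 1 on conflict, 0 when the lock is free, -1 on error. */
int ofd_wrlock_conflicts(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;      /* the "write" lock qemu-kvm failed to get */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;
    fl.l_pid = 0;             /* must be zero for OFD lock requests */

    int rc = fcntl(fd, F_OFD_GETLK, &fl);
    close(fd);
    if (rc < 0)
        return -1;

    /* F_UNLCK means nobody holds a conflicting lock.  Otherwise one is
     * held, but l_pid is reported as -1 for OFD locks, so lslocks (or
     * the NFS server) is needed to find the owner. */
    return fl.l_type == F_UNLCK ? 0 : 1;
}
```

Running this against the image path on the failing host would at least distinguish "a conflicting lock is held somewhere" from "the lock is free and the failure has another cause".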

Comment 38 Fam Zheng 2018-02-22 14:56:23 UTC
Is the same image, pointed to as /rhev/data-center/00000000-0000-0000-0000-000000000000/2a7334e7-c1d3-41fd-9552-2aacbfa4f9af/images/e62bf4a4-6132-4c14-8aba-f292febdc4f9/976ecba8-712b-4b9c-b3d3-9d6fe9d7e618, already open by a QEMU process running on another host (since the $subject says "on a different host")? If so, at libvirt level, "<shareable />" must be used for this setup to work, because from QEMU's point of view, this image _is_ shared.

Comment 39 Simone Tiraboschi 2018-02-22 15:00:30 UTC
The VM was running on host1, so that lease was open there; then we forcefully shut down host1 with 'poweroff -f' and, even after many hours, we are not able to restart that VM on host2.

Comment 40 Daniel Berrangé 2018-02-22 15:04:30 UTC
Presumably this image is on NFS.  Forcibly shutting down an NFS client does *not* release any fcntl() locks it held. IIRC, the locks will only get released when that NFS client boots up, comes back online, and flushes its stale state on the NFS server.

Comment 41 Martin Sivák 2018-02-22 15:06:46 UTC
I believe we use sanlock leases for locking for exactly that reason.

Comment 42 Nir Soffer 2018-02-22 15:14:56 UTC
(In reply to Daniel Berrange from comment #40)
> Presumably this image is on NFS.  Forceably shutting down an NFS client does
> *not* release any fcntl() locks it held. IIRC, the locks will only get
> released when that NFS client boots up and comes back online and flushes
> stale state on the NFS server.

We cannot use file based locking which is not released when qemu is killed.

I think the qemu locking is not compatible with oVirt file based storage, and must
be disabled in this case. We should use it only for localfs storage.

To use qemu locking, qemu must use a local resource (e.g. a semaphore or a local file)
for locking.

Comment 43 Daniel Berrangé 2018-02-22 15:19:42 UTC
(In reply to Nir Soffer from comment #42)
> (In reply to Daniel Berrange from comment #40)
> > Presumably this image is on NFS.  Forceably shutting down an NFS client does
> > *not* release any fcntl() locks it held. IIRC, the locks will only get
> > released when that NFS client boots up and comes back online and flushes
> > stale state on the NFS server.
> 
> We cannot use file based locking which is not released when qemu is killed.

The locks *are* released when QEMU is killed. The problem you've hit here is when the *host* is killed and then never powered back on.
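The release-on-death behavior of OFD locks is easy to demonstrate on a local filesystem; the sketch below is illustrative only (the helper names and test path are made up). A child process takes the lock, playing the part of QEMU, and is SIGKILLed; the lock becomes available immediately. On NFSv3, by contrast, a dead client host's lock lingers on the server until the client comes back, which is the failure mode seen here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to take a non-blocking OFD write lock on byte 0 of fd.
 * Returns 0 on success, -1 if another file description holds it. */
static int try_ofd_wrlock(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;
    return fcntl(fd, F_OFD_SETLK, &fl);
}

/* Child takes the lock ("QEMU"), parent observes the conflict, SIGKILLs
 * the child, then acquires the lock immediately.  Returns 0 when both
 * observations match the expected behavior. */
int lock_released_on_death(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    int pipefd[2];
    if (fd < 0 || pipe(pipefd) < 0)
        return 1;

    pid_t pid = fork();
    if (pid == 0) {                        /* "QEMU": lock, then hang */
        int cfd = open(path, O_RDWR);
        try_ofd_wrlock(cfd);
        if (write(pipefd[1], "x", 1) != 1) /* tell parent the lock is held */
            _exit(1);
        pause();                           /* hold the lock until killed */
        _exit(0);
    }

    char c;
    if (read(pipefd[0], &c, 1) != 1)       /* wait until the child holds it */
        return 1;
    int while_held = try_ofd_wrlock(fd);   /* expect -1: conflict */

    kill(pid, SIGKILL);                    /* "kill QEMU" */
    waitpid(pid, NULL, 0);
    int after_kill = try_ofd_wrlock(fd);   /* expect 0: lock was released */

    close(fd);
    return (while_held == -1 && after_kill == 0) ? 0 : 1;
}
```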

Comment 44 Simone Tiraboschi 2018-02-22 15:25:09 UTC
(In reply to Daniel Berrange from comment #43)
> The locks *are* released when QEMU is killed. The problem you've hit here is
> when the *host* is killed and then never powered back on.

Exactly.
As soon as the host is powered on again, we are able to restart the VM on any other host.

The share is on NFS v3!!!

The share is under /Compute_NFS, and on the storage server side we see a lot of locks there:

lslocks -o COMMAND,PID,TYPE,SIZE,MODE,M,START,END,PATH,BLOCKER
COMMAND           PID  TYPE SIZE MODE  M START        END PATH                              BLOCKER
libvirtd         1234 POSIX   4B WRITE 0     0          0 /run/libvirtd.pid                 
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2437 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2440 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2439 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /RHV_NFS                          
lockd            2432 POSIX   0B READ  0   201        201 /Compute_NFS                      
nfsd             2440 POSIX   0B READ  0   100        101 /Compute_NFS                      
nfsd             2440 POSIX   0B READ  0   103        103 /Compute_NFS                      
nfsd             2440 POSIX   0B READ  0   201        201 /Compute_NFS                      
nfsd             2440 POSIX   0B READ  0   203        203 /Compute_NFS                      
nfsd             2440 POSIX   0B READ  0   100        101 /Compute_NFS                      
nfsd             2440 POSIX   0B READ  0   103        103 /Compute_NFS                      
nfsd             2440 POSIX   0B READ  0   201        201 /Compute_NFS                      
nfsd             2440 POSIX   0B READ  0   203        203 /Compute_NFS                      
(unknown)        1306 FLOCK   0B WRITE 0     0          0 /run                              
nfsd             2442 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2438 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /Storage_NFS                      
lockd            2432 POSIX   0B READ  0   100        101 /Compute_NFS                      
nfsd             2439 LEASE   0B READ  0     0          0 /                                 
lvmetad           471 POSIX   4B WRITE 0     0          0 /run/lvmetad.pid                  
abrtd             695 POSIX   4B WRITE 0     0          0 /run/abrt/abrtd.pid               
rhsmcertd        1258 FLOCK   0B WRITE 0     0          0 /run/lock/subsys/rhsmcertd        
nfsd             2441 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /RHV_NFS                          
nfsd             2439 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Compute_NFS                      
lockd            2432 POSIX   0B READ  0   100        101 /Compute_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2440 POSIX   0B READ  0   100        101 /RHV_NFS                          
iscsid           1312 POSIX   5B WRITE 0     0          0 /run/iscsid.pid                   
nfsd             2441 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2435 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2439 POSIX   0B READ  0   201        201 /RHV_NFS                          
nfsd             2442 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2442 POSIX   0B READ  0   100        101 /Compute_NFS                      
nfsd             2442 POSIX   0B READ  0   201        201 /Compute_NFS                      
multipathd        500 POSIX   3B WRITE 0     0          0 /run/multipathd/multipathd.pid    
crond            1192 FLOCK   5B WRITE 0     0          0 /run/crond.pid                    
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2440 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2440 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /RHV_NFS                          
nfsd             2442 LEASE   0B READ  0     0          0 /RHV_NFS                          
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 POSIX   0B READ  0   100        101 /Compute_NFS                      
nfsd             2442 POSIX   0B READ  0   103        103 /Compute_NFS                      
lockd            2432 POSIX   0B READ  0   100        101 /Compute_NFS                      
lockd            2432 POSIX   0B READ  0   103        103 /Compute_NFS                      
lockd            2432 POSIX   0B READ  0   201        201 /Compute_NFS                      
lockd            2432 POSIX   0B READ  0   203        203 /Compute_NFS                      
lockd            2432 POSIX   0B READ  0   100        100 /Compute_NFS                      
lockd            2432 POSIX   0B READ  0   201        201 /Compute_NFS                      
lockd            2432 POSIX   0B READ  0   203        203 /Compute_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 POSIX   0B READ  0   100        101 /Compute_NFS                      
nfsd             2441 POSIX   0B READ  0   201        201 /Compute_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2435 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2441 POSIX   0B READ  0   201        201 /Compute_NFS                      
nfsd             2441 POSIX   0B READ  0   203        203 /Compute_NFS                      
nfsd             2441 POSIX   0B READ  0   100        101 /RHV_NFS                          
nfsd             2442 POSIX   0B READ  0   201        201 /RHV_NFS                          
atd              1194 POSIX   5B WRITE 0     0          0 /run/atd.pid                      
master           1686 FLOCK  33B WRITE 0     0          0 /var/spool/postfix/pid/master.pid 
master           1686 FLOCK  33B WRITE 0     0          0 /var/lib/postfix/master.lock      
nfsd             2437 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2435 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2439 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2440 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2436 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2440 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2441 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Storage_NFS                      
nfsd             2440 LEASE   0B READ  0     0          0 /Compute_NFS                      
nfsd             2442 LEASE   0B READ  0     0          0 /Compute_NFS                      
lockd            2432 POSIX   0B READ  0   100        101 /RHV_NFS                          
lockd            2432 POSIX   0B READ  0   103        103 /RHV_NFS                          
lockd            2432 POSIX   0B READ  0   201        201 /RHV_NFS                          
lockd            2432 POSIX   0B READ  0   203        203 /RHV_NFS                          
lockd            2432 POSIX   0B READ  0   100        100 /RHV_NFS                          
lockd            2432 POSIX   0B READ  0   201        201 /RHV_NFS                          
lockd            2432 POSIX   0B READ  0   203        203 /RHV_NFS                          
nfsd             2441 LEASE   0B READ  0     0          0 /Compute_NFS                      
lockd            2432 POSIX   0B READ  0   100        101 /QE_images                        
lockd            2432 POSIX   0B READ  0   103        103 /QE_images                        
lockd            2432 POSIX   0B READ  0   201        201 /QE_images                        
lockd            2432 POSIX   0B READ  0   203        203 /QE_images                        
lockd            2432 POSIX   0B READ  0   100        100 /QE_images                        
lockd            2432 POSIX   0B READ  0   201        201 /QE_images                        
lockd            2432 POSIX   0B READ  0   203        203 /QE_images

Comment 45 Daniel Berrangé 2018-02-22 15:27:37 UTC
(In reply to Simone Tiraboschi from comment #44)
> (In reply to Daniel Berrange from comment #43)
> > The locks *are* released when QEMU is killed. The problem you've hit here is
> > when the *host* is killed and then never powered back on.
> 
> Exactly.
> As soon as the host got power on again, we are able to restart the VM on any
> other host.
> 
> The share is on NFS v3!!!

FYI i get the impression locking works better on NFS v4, as its a standard part of the protocol, rather than an out of band side-service. This might mean dead client detection is better, but I've no test env available to test this. Regardless, NFSv4 is generally a better choice than v3 no matter what.

Comment 46 Nir Soffer 2018-02-22 15:31:08 UTC
(In reply to Daniel Berrange from comment #43)
> The locks *are* released when QEMU is killed. The problem you've hit here is
> when the *host* is killed and then never powered back on.

This is the same issue from our point of view. We cannot use locking that requires
the host to be up again. If a host loses power, we must be able to start a VM
on another host.

We are using a sanlock lease to make this safe, and it supports this use case.

How do we disable locking in the libvirt XML?

Comment 47 Daniel Berrangé 2018-02-22 15:34:56 UTC
(In reply to Nir Soffer from comment #46)
> (In reply to Daniel Berrange from comment #43)
> > The locks *are* released when QEMU is killed. The problem you've hit here is
> > when the *host* is killed and then never powered back on.
> 
> This is the same issue from our point of view. We cannot use locking that
> require
> the host to be up again. If a host loose power, we must be able to start a VM
> on another host.
> 
> We are using sanlock lease to make this safe, and it supports this use case.
> 
> How do disable locking in libvirt xml?

There's no support for controlling QEMU file locking in libvirt at this time - it was turned on unconditionally in QEMU with no interaction from libvirt.

Comment 48 Nir Soffer 2018-02-22 15:47:07 UTC
Fam, this looks like another backward-incompatible change in qemu that may be good
for some users, but is not compatible with the RHV use case.

Can we disable locking in qemu-kvm-rhev until we have a better solution?

Comment 49 Fam Zheng 2018-02-23 02:17:06 UTC
This now seems like the result of a rare host crash combined with the odd NFSv3 behavior. I'm not sure it is worth reverting QEMU image locking and losing all of its protection just for that.

Comment 50 Simone Tiraboschi 2018-02-23 08:27:30 UTC
(In reply to Fam Zheng from comment #49)
> This now seems like a result of a rare host crash combined with the odd
> NFSv3 behavior. I'm not sure it is worth reverting QEMU image locking as
> lose all the protection just for that.

HA is there precisely to restart VMs on host failures: this can break HA capabilities.
In oVirt we also have host fencing via IPMI, and via sanlock for network-unresponsive hosts, so a sudden host reboot is not that rare an event.

Moving to qemu since the issue seems to be there.

Comment 52 Michal Skrivanek 2018-02-23 08:45:11 UTC
(In reply to Fam Zheng from comment #49)
> This now seems like a result of a rare host crash combined with the odd
> NFSv3 behavior. I'm not sure it is worth reverting QEMU image locking as
> lose all the protection just for that.

Sadly, it's what RHV supports and what customers rely on. QEMU's new locking is useless in RHV, hence the request to either control it via libvirt or disable it unconditionally in qemu-kvm-rhev.

Comment 53 Martin Sivák 2018-02-23 08:56:29 UTC
I just got an idea we could maybe use in RHV. Mounting the NFS storage with "-o nolock" will disable remote locking, so the other nodes will never learn about the locks. I am not sure whether we use the lock for something else as well, though.

Nir? What do you think?
Daniel, Fam? How will qemu react to FS with no locking support?

Btw Fam: a rare host crash (or lost connectivity) is exactly what distributed systems like RHV and OpenStack need to handle. And this will probably affect OpenStack as well.

Comment 54 Fam Zheng 2018-02-23 09:17:49 UTC
(In reply to Martin Sivák from comment #53)
> Daniel, Fam? How will qemu react to FS with no locking support?

As long as the OFD lock API (fcntl(fd, F_OFD_SETLK, ..)) doesn't work, QEMU will disable image locking automatically.
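A minimal sketch of such a probe is below (illustrative only; QEMU's actual probe lives in its own source tree and differs in detail). F_OFD_GETLK needs Linux >= 3.15, and glibc exposes the F_OFD_* constants only with _GNU_SOURCE; when the call fails, a caller following this pattern would simply skip image locking.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Illustrative probe: do OFD locks work on this file descriptor?
 * On kernels or filesystems without support the fcntl fails (e.g.
 * with EINVAL), and image locking would then be disabled. */
int ofd_locks_work(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_RDLCK;      /* query only -- this takes no lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;
    return fcntl(fd, F_OFD_GETLK, &fl) == 0;
}
```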

> 
> Btw Fam: rare host crash (or lost connectivity) is exactly what all the
> distributed systems like RHV and OpenStack need to handle. And this will
> probably affect OpenStack as well.

OK, thanks for explaining.

Comment 55 Simone Tiraboschi 2018-02-23 09:31:42 UTC
We already mount with local_lock=none.

Let's see if nolock works.

Comment 56 Martin Sivák 2018-02-23 09:34:38 UTC
Which is exactly what configures distributed locking:

local_lock:

"If this option is not specified, or if  none  is  speci‐
 fied, the client assumes that the locks are not local."


Disabling locking might actually do what we want, unless we are limited by something else:

nolock:

"When using the nolock option, applications
 can lock files, but such locks  provide  exclusion  only
 against  other  applications running on the same client.
 Remote applications are not affected by these locks."

Comment 57 Simone Tiraboschi 2018-02-23 10:19:18 UTC
(In reply to Martin Sivák from comment #56)
> Disabling locking might actually do what we want, unless we are limited by
> something else:
> 
> nolock:
> 
> "When using the nolock option, applications
>  can lock files, but such locks  provide  exclusion  only
>  against  other  applications running on the same client.
>  Remote applications are not affected by these locks."

I confirm that the issue is not reproducible when mounting the NFS share with the nolock option.
Not sure whether this can introduce side effects somewhere else.

Comment 58 Daniel Berrangé 2018-02-23 10:27:32 UTC
FYI, I'm told by a storage maintainer that this is only really a problem with NFSv3. With NFSv4, locks use an active lease mechanism with the client having to refresh the lease periodically for it to remain valid.  So if you are using NFSv4 and the client dies with locks held, they should be revoked by the server after the lease renewal timeout is reached, allowing another host to acquire them.

Some more info here about NFSv4 locking here:

https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.idan400/lockv4.htm

Given NFSv3 is a legacy protocol, I don't think it justifies disabling locking on the QEMU side. The nolock mount option seems like a reasonable workaround for v3, if the sites in question really can't use v4.

Comment 59 Martin Sivák 2018-02-23 10:44:39 UTC
On the other hand, we might not want to depend on two separate locking mechanisms at the same time. Especially when one would be cluster wide only on NFS. That would be a support nightmare.

What we have now is "battle tested" and works even when we use LVM on top of iSCSI/FC or with Gluster as the backing storage tech.

But we really need an answer from our storage folks here.

Btw, someone should tell the OpenStack team so they can check whether this affects them as well.

Comment 61 Nir Soffer 2018-02-25 13:17:46 UTC
(In reply to Martin Sivák from comment #59)
> On the other hand, we might not want to depend on two separate locking
> mechanisms at the same time. Especially when one would be cluster wide only
> on NFS. That would be a support nightmare.

I agree, we don't want to depend on 2 locking solutions. We have a locking 
solution that works with *any* storage supported by RHV. We don't want to use
a second locking solution that may work (never tested it) on NFS 4.

I think the basic issue is (again) changing the default behavior of qemu in a
backward-incompatible way. This does not work for RHV and probably other solutions
built on qemu.

This change will break existing RHV 4.1 installations, which must work with RHEL
7.5 - so we must have a solution for 7.5.

We cannot fix this using the NFS "nolock" option, since:
- RHV 4.1 does not support this option
- This option works only for NFS - we need a solution for GlusterFS, CephFS,
  or any other POSIX-like file system that RHV can use today

What we need is:
- 7.5: disable locking or make locking optional
- 7.6: if locking is made the default, add option to disable it

We also need these changes upstream - they break oVirt on Fedora 27.

Comment 63 Daniel Berrangé 2018-02-26 09:40:30 UTC
(In reply to Nir Soffer from comment #61)
> (In reply to Martin Sivák from comment #59)
> > On the other hand, we might not want to depend on two separate locking
> > mechanisms at the same time. Especially when one would be cluster wide only
> > on NFS. That would be a support nightmare.
> 
> I agree, we don't want to depend on 2 locking solutions. We have a locking 
> solution that works with *any* storage supported by RHV. We don't want to use
> a second locking solution that may work (never tested it) on NFS 4.

The sanlock locking mechanism doesn't provide the same level of protection against data corruption that QEMU's built-in locking does, because it relies on everything being done via the RHEV mgmt app. If any application or administrator runs qemu-img / QEMU themselves they're still at risk, which is what QEMU's locking protects against.

> I think the basic issue is (again), changing the default behavior of qemu in
> a 
> backward incompatible way. This does not work for RHV and probably other
> solutions
> built on qemu.

From OpenStack POV, the QEMU locking is welcome, as it adds protection against data corruption to images.

> We cannot fix this using NFS "nolock" option, since:
> - RHV 4.1 does not support this option
> - This options works only for NFS - we need a solution for GlusterFS, CephFS,
>   or an other posix-like file system that the RHV can use today

There's no evidence that any other filesystem besides obsolete NFSv3 has a problem that needs fixing, so it doesn't matter that "nolock" doesn't work with them.

> What we need is:
> - 7.5: disable locking or make locking optional
> - 7.6: if locking is made the default, add option to disable it
> 
> We need these changes also upstream - the changes break oVirt on Fedora 27.

From upstream / Fedora POV, RHV can be made to use the "nolock" option for NFS v3.

Comment 64 Kevin Wolf 2018-02-26 09:40:57 UTC
(In reply to Nir Soffer from comment #61)
> I agree, we don't want to depend on 2 locking solutions. We have a locking 
> solution that works with *any* storage supported by RHV. We don't want to use
> a second locking solution that may work (never tested it) on NFS 4.

For some values of "working". The image locking in QEMU is made specifically for cases where users manually modify images (e.g. with qemu-img) while a VM is using them. If your locking were able to prevent this, we would have had quite a few fewer hard-to-debug bug reports that turned out not to be a corruption bug in the QEMU code, but simply a user error.

Which means that the two locking solutions aren't protecting against the same thing, so neither of them is redundant.
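The conflict that QEMU's locking reports can be reproduced with plain advisory locks; a simplified sketch (QEMU actually takes OFD fcntl locks on the image file; flock() is used here only because it demonstrates the same "image is in use" conflict in a few lines):

```python
# Sketch of the failure mode QEMU's image locking prevents: a second
# opener cannot grab an exclusive advisory lock while the image is held.
# QEMU really uses OFD fcntl locks; flock() conflicts between separate
# open file descriptions the same way, which is all we need to show.
import fcntl
import tempfile

with tempfile.NamedTemporaryFile() as image:
    vm_fd = open(image.name, "rb+")    # stands in for the running VM
    fcntl.flock(vm_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)

    tool_fd = open(image.name, "rb+")  # stands in for a concurrent qemu-img run
    try:
        fcntl.flock(tool_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        conflict = False
    except BlockingIOError:
        conflict = True                # lock held: second writer is refused

    print(conflict)  # True
    tool_fd.close()
    vm_fd.close()
```

On NFSv3 without "nolock", the same lock is registered with the server via NLM, which is where a crashed client leaves it stranded.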

> - This options works only for NFS - we need a solution for GlusterFS, CephFS,
>   or an other posix-like file system that the RHV can use today

So we established that file locking is broken in NFSv3, which is hardly a QEMU bug, but an NFS one (and apparently one that is fixed in more recent NFS versions). Did you find out that all of Gluster, Ceph and whatever else you're using are broken, too? Nobody mentioned this so far, and I would certainly hope that it's not the case.

Comment 65 Simone Tiraboschi 2018-02-26 11:52:45 UTC
*** Bug 1547033 has been marked as a duplicate of this bug. ***

Comment 76 Ademar Reis 2018-02-28 12:37:35 UTC
(In reply to Daniel Berrange from comment #58)
> FYI, I'm told by a storage maintainer that this is only really a problem
> with NFSv3. With NFSv4, locks use an active lease mechanism with the client
> having to refresh the lease periodically for it to remain valid.  So if you
> are using NFSv4 and the client dies with locks held, they should be revoked
> by the server after the lease renewal timeout is reached, allowing another
> host to acquire them.
> 
> Some more info here about NFSv4 locking here:
> 
> https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.
> idan400/lockv4.htm
> 
> Given NFSv3 is a legacy protocol, I don't think it justifies disabling
> locking from QEMU side. The nolock mount option seems like a reasonable 
> workaround for V3, if the sites in question really can't use V4.

Can the RHV team test it with NFSv4 to confirm the behavior? What is the lease timeout there?

BTW, NFSv4 is the default in RHEL-7.

Comment 77 Ademar Reis 2018-02-28 12:39:41 UTC
(In reply to Ademar Reis from comment #76)
> (In reply to Daniel Berrange from comment #58)
> > FYI, I'm told by a storage maintainer that this is only really a problem
> > with NFSv3. With NFSv4, locks use an active lease mechanism with the client
> > having to refresh the lease periodically for it to remain valid.  So if you
> > are using NFSv4 and the client dies with locks held, they should be revoked
> > by the server after the lease renewal timeout is reached, allowing another
> > host to acquire them.
> > 
> > Some more info here about NFSv4 locking here:
> > 
> > https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.
> > idan400/lockv4.htm
> > 
> > Given NFSv3 is a legacy protocol, I don't think it justifies disabling
> > locking from QEMU side. The nolock mount option seems like a reasonable 
> > workaround for V3, if the sites in question really can't use V4.
> 
> Can the RHV team test it with NFSv4 to confirm the behavior? What is the
> lease timeout there?
> 
> BTW, NFSv4 is the default in RHEL-7.

Also needinfo(QE) for some exploratory testing. Ping Li: can you please reproduce it directly without RHV, using NFSv4, and see what kind of lease timeouts are involved? Thanks.

Comment 78 Simone Tiraboschi 2018-02-28 12:46:26 UTC
(In reply to Ademar Reis from comment #76)
> Can the RHV team test it with NFSv4 to confirm the behavior? What is the
> lease timeout there?

We already tested that it's not reproducible on NFSv4:
see https://bugzilla.redhat.com/show_bug.cgi?id=1547033#c4

> BTW, NFSv4 is the default in RHEL-7.

We also have to handle upgrades from systems deployed in the past, when NFSv3 was the default (at least on the RHV side).

Comment 79 Daniel Berrangé 2018-02-28 13:01:02 UTC
IIUC, since the NFS server relies on the client notifying it when it comes back online to release the locks, there should be a way to fake that notification. ie once RHV has fenced the node to guarantee it is offline, RHV could issue a notification to the NFS server to force release the dead node's locks. This is something that tools like clustersuite probably know how to do already, since HA deployments used NFSv3 for a long time before NFSv4 fixed the locking problems.
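A hypothetical shape for such a forced release (the sm-notify flags and the statd state requirements here are assumptions to verify against nfs-utils documentation, not a tested procedure; the sketch only builds the command without running it):

```python
# Hypothetical sketch: after fencing a host, ask the NFS server to drop
# its NFSv3 (NLM) locks by sending an NSM "reboot" notification on the
# dead host's behalf. The sm-notify flags used here (-f force, -v claim
# a specific hostname) and the need for the dead host's statd state are
# assumptions; the command is constructed but not executed.
import subprocess

def build_release_command(dead_host_name):
    """Build an sm-notify invocation claiming dead_host_name rebooted."""
    return ["sm-notify", "-f", "-v", dead_host_name]

def release_dead_host_locks(dead_host_name, dry_run=True):
    cmd = build_release_command(dead_host_name)
    if dry_run:
        return cmd
    # Would also require the dead host's monitor state (e.g. /var/lib/nfs/sm)
    # to be available locally for the notification to reach the right servers.
    subprocess.run(cmd, check=True)
    return cmd

print(release_dead_host_locks("fenced-node.example.com"))
```

As noted below, this would only be safe as a manual override after the host is confirmed dead.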

Comment 80 Ademar Reis 2018-02-28 13:31:18 UTC
Looks like there's a similar case with gluster, although the testcase seems to be different (simply blocking the connection instead of a crash): 

https://bugzilla.redhat.com/show_bug.cgi?id=1550016

Comment 82 Yaniv Lavi 2018-02-28 13:35:34 UTC
(In reply to Daniel Berrange from comment #79)
> IIUC, since the NFS server relies on the client notifying it when it comes
> back online to release the locks, there should be a way to fake that
> notification. ie once RHV has fenced the node to guarantee it is offline,
> RHV could issue a notification to the NFS server to force release the dead
> nodes' locks. This is something that tools like clustersuite probably know
> how to do already, since HA deployments have been using NFSv3 for a long
> time in the past before NFSv4 fixed the locking problems.

We can do a lot of things, but we are nearing a release as well and this is not something we can address without prior notice and planning.

Comment 83 Nir Soffer 2018-02-28 16:26:52 UTC
(In reply to Daniel Berrange from comment #79)
> IIUC, since the NFS server relies on the client notifying it when it comes
> back online to release the locks, there should be a way to fake that
> notification. ie once RHV has fenced the node to guarantee it is offline,
> RHV could issue a notification to the NFS server to force release the dead
> nodes' locks. This is something that tools like clustersuite probably know
> how to do already, since HA deployments have been using NFSv3 for a long
> time in the past before NFSv4 fixed the locking problems.

We can do this only as a manual override, when the user confirms that the host is
not available.

But this is not needed, since we are going to disable NLM locks with NFSv3 by
default. This gives the same protection as block storage - locks are local.

Comment 87 Ademar Reis 2018-05-30 18:45:43 UTC
This has been worked around in RHV (NFSv3 mounts now use 'nolock'), and in Cinder they're defaulting to NFSv4 or documenting the limitations (see Bug 1556957).
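The workaround amounts to picking mount options per NFS version; an illustrative sketch of that logic (not the actual VDSM implementation):

```python
# Sketch of the workaround described above: add "nolock" only for NFSv3
# mounts, and leave NFSv4 alone since its lease-based locking recovers
# from client crashes on its own. Illustrative logic, not VDSM code.

def nfs_mount_options(nfs_version, extra_options=()):
    opts = ["vers=%s" % nfs_version]
    if str(nfs_version).startswith("3"):
        # NFSv3: avoid NLM locks that a crashed client can never release
        opts.append("nolock")
    opts.extend(extra_options)
    return ",".join(opts)

print(nfs_mount_options("3"))    # vers=3,nolock
print(nfs_mount_options("4.1"))  # vers=4.1
```

With "nolock", QEMU's image locks become host-local, so a surviving host can start the HA VM after the original host crashes.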

Hence I'm closing this BZ. Currently there are no plans to disable QEMU image locking.

Comment 90 John Ferlan 2018-08-30 11:43:43 UTC
*** Bug 1592582 has been marked as a duplicate of this bug. ***

