Created attachment 877747 [details]
engine, vdsm, libvirt and qemu logs

Description of problem:
I had a running VM with 1 disk. I deactivated the VM disk (hot unplug) and then added a new disk to the VM and hotplugged it. When I tried to hotplug the first disk back into the VM, the operation failed with a libvirt error in vdsm.

Version-Release number of selected component (if applicable):
RHEV 3.4-AV4
rhevm-3.4.0-0.10.beta2.el6ev.noarch
vdsm-4.14.5-0.1.beta2.el6ev.x86_64
libvirt-0.10.2-29.el6_5.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.6.x86_64

How reproducible:
Always

Steps to Reproduce:
On a shared DC:
1. Create and run a VM with 1 disk attached
2. Deactivate the VM disk (hot unplug)
3. Attach and activate (hotplug) a new disk to the VM
4. Try to activate the first disk

Actual results:
Hotplug fails with the following error in vdsm.log:

Thread-85::ERROR::2014-03-23 11:29:17,900::vm::3573::vm.Vm::(hotplugDisk) vmId=`ce69a69a-5e55-407f-a7db-961894a58577`::Hotplug failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 3571, in hotplugDisk
    self._dom.attachDevice(driveXml)
  File "/usr/share/vdsm/vm.py", line 859, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 399, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: internal error unable to execute QEMU command 'device_add': Device 'virtio-blk-pci' could not be initialized

Expected results:
Hotplug should succeed for all disks, even if other disks were plugged into the VM while these disks were unplugged.

Additional info:
engine, vdsm, libvirt and qemu logs
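For context, the same conflict can be demonstrated directly against libvirt with the python bindings that vdsm uses: attaching a disk whose XML carries a PCI address already occupied by another device fails in the same way. The VM name, image path and PCI slot below are made-up placeholders, so this is only a sketch of the failing attachDevice call, not what vdsm literally sends.

import libvirt

# Minimal sketch; the VM name, image path and PCI slot are assumptions.
DRIVE_XML = """
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/first-disk.img'/>
  <target dev='vdb' bus='virtio'/>
  <!-- the address the engine remembered for the first disk,
       now occupied by the newly hotplugged disk -->
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
"""

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('test-vm')        # hypothetical VM name
try:
    dom.attachDevice(DRIVE_XML)
except libvirt.libvirtError as e:
    # With the slot already taken, this fails much like the error in vdsm.log
    print('hotplug failed:', e)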
Hotplugging requires a guest; I'm not sure it /could/ work after unplugging the only (= OS) disk.
The problem is in ovirt-engine-backend and it's caused because libvirt does not accept the PCI address the engine has stored for the first disk, since that address is now being used by the second disk. When a disk is unplugged, its old address is freed and, in this case, is assigned to the second disk when it is plugged. A possible solution to this problem could be that, when a new disk is hotplugged into the VM, all the PCI addresses assigned to unplugged devices of that VM are discarded. If that's not possible, it should be done at least for the unplugged devices with the conflicting address.
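To make the suggestion concrete, here is a minimal sketch in Python, with invented field names rather than the real ovirt-engine device schema, of discarding the stored addresses of unplugged devices when a new device is hotplugged:

def discard_unplugged_addresses(vm_devices):
    """Drop the stored PCI address of every currently unplugged device,
    so it cannot clash with an address libvirt hands out from now on."""
    for dev in vm_devices:
        if not dev.get('is_plugged'):
            dev['address'] = ''          # libvirt will pick a fresh slot later
    return vm_devices

# Example: the first (unplugged) disk loses the address the second disk now uses.
devices = [
    {'name': 'disk1', 'is_plugged': False, 'address': 'slot=0x06'},
    {'name': 'disk2', 'is_plugged': True,  'address': 'slot=0x06'},
]
discard_unplugged_addresses(devices)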
Tested the scenario I described in comment #0 (VM with RHEL-6.5 installed). Disk hotplug still fails when the disk was deactivated earlier and another disk was hot-plugged in between.

Error in vdsm.log:

Thread-32::ERROR::2014-05-28 15:03:44,146::vm::3357::vm.Vm::(hotplugDisk) vmId=`e172b1c7-5bc5-4513-ab25-3496e4032901`::Hotplug failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 3355, in hotplugDisk
    self._dom.attachDevice(driveXml)
  File "/usr/share/vdsm/virt/vm.py", line 442, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 93, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 399, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: internal error unable to reserve PCI address 0:0:6.0

Engine.log:

2014-05-28 15:03:46,240 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (org.ovirt.thread.pool-6-thread-42) [5ef63342] Command HotPlugDiskVDSCommand(HostName = green-vdsa, HostId = f397fb84-ac4e-4672-b11e-f0e4b24b3d65, vmId=e172b1c7-5bc5-4513-ab25-3496e4032901, diskId = af9be9e4-f4c1-4156-a14c-07581b47794c) execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = internal error unable to reserve PCI address 0:0:6.0, code = 45

Re-opening. Checked with ovirt-engine-3.5.0_alpha1.1:
ovirt-engine-3.5.0-0.0.master.20140519181229.gitc6324d4.el6.noarch
vdsm-4.14.1-340.gitedb02ba.el6.x86_64
libvirt-0.10.2-29.el6_5.8.x86_64
Created attachment 899960 [details]
engine logs (re-open)
We tried to reproduce the issue several times and it's not 100% reproducible. With various setups I was not able to reproduce it, but Elad was able to reproduce it once, so we need to investigate a bit more.
After more investigation, we've found that the issue is caused by a race condition: when you hotplug the first disk after hotplugging the second, the second disk may not yet have the plugged status in the database when the hotplug operation of the first disk looks for plugged devices with the same address. As the hotplug operation is synchronous on the VDSM side and quite fast, the easiest solution would be to take a new exclusive lock on the whole VM when hotplugging a disk, avoiding such race conditions. Omer, what do you think?
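A minimal sketch of the idea, assuming a per-VM lock (the names below are hypothetical, not the engine's actual locking API): hotplug calls for the same VM are serialized, so the second hotplug only looks at the database after the first one has persisted its plugged status and address.

import threading
from collections import defaultdict

_vm_locks = defaultdict(threading.Lock)   # one lock per VM id (hypothetical)

def hotplug_disk(vm_id, disk, do_vdsm_hotplug, persist_device):
    # Exclusive per-VM lock: closes the window in which a concurrent hotplug
    # could read stale plugged/address data from the database.
    with _vm_locks[vm_id]:
        do_vdsm_hotplug(vm_id, disk)      # synchronous and fast on the VDSM side
        persist_device(vm_id, disk)       # status and address hit the DB before unlock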
Wouldn't the approach you suggest in comment 2 (clear the device address on unplug) solve this in a 'cleaner' way, without requiring the lock? On unplug the address is not relevant anymore, and libvirt will assign a new address on plug.
I see an issue with the solution of clearing the address on unplug:
1. Always clearing the address on unplug is problematic - if the VM is down there is no need to clear the address, and doing so means that on the next start the device might be plugged with a different address.
2. Clearing the address only on hot unplug isn't good enough either, because then if we perform a "cold" unplug (address left in place) and then start the VM, we will try to add the device with an address which is already taken.

So basically we have three options here:
1. Synchronize the hot plug executions so that at each execution we know which addresses are taken by plugged devices and which are not.
2. Always clear the address when performing hot plug and let libvirt choose the address (see the sketch after this list).
3. Ignore the race; in the worst case the user can try to hot plug the device again.

Omer, from the user perspective - for a guest with an OS installed and running, will always clearing the address on hot plug affect the user experience (driver installations, how the OS recognizes the device, etc.)?

thanks,
Liron.
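A sketch of option 2, assuming a hypothetical helper for building the drive XML (the element layout follows libvirt's disk XML; the function and its parameters are not vdsm's actual code): on hot plug the stored <address> element is simply omitted, so libvirt assigns a free PCI slot itself, while a cold plug may still reuse the stored address.

import xml.etree.ElementTree as ET

def build_drive_xml(image_path, target_dev, stored_address_xml=None, hotplug=True):
    disk = ET.Element('disk', type='file', device='disk')
    ET.SubElement(disk, 'driver', name='qemu', type='raw')
    ET.SubElement(disk, 'source', file=image_path)
    ET.SubElement(disk, 'target', dev=target_dev, bus='virtio')
    if stored_address_xml and not hotplug:
        # Cold plug: keep the address stable for the guest.
        disk.append(ET.fromstring(stored_address_xml))
    # Hot plug: no <address> element, so libvirt chooses a free slot.
    return ET.tostring(disk)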
Yes, but again, when you plug something that wasn't plugged before, you can't be sure the address will remain. I think the best approach would be to clear the address on plug; I think it will solve the issue and we could revert the previous fix that was done for this.
Clearing the address on unplug may be a bit problematic: if my VM is down and I unplug/plug things and the address is cleared, then on the next VM start the devices might be assigned new addresses, which may be a bit irritating for the user. We can always clear it on hot plug, and on a regular ("cold") plug compare it against the other devices and clear it only if it is already in use (that's what I meant in the suggested solutions in https://bugzilla.redhat.com/show_bug.cgi?id=1079697#c9; a sketch of this conditional clearing follows below). Sean/Allon - which solution do you prefer here?
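A sketch of that conditional variant (again with assumed field names rather than the real engine schema): the stored address is kept only if no currently plugged device already occupies it.

def resolve_address_conflict(device, other_devices):
    """Before a cold plug, drop the stored address only when a plugged
    device already uses it; otherwise keep it for a stable guest view."""
    used = set(d['address'] for d in other_devices
               if d.get('is_plugged') and d.get('address'))
    if device.get('address') in used:
        device['address'] = ''           # conflict: let libvirt choose a new slot
    return device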
I'm fine with clearing the address (under the KISS rule).
Disk hotplug succeeds for the scenario described in comment #0.
Checked with block and file storage, with an OS installed and without.
Verified using ovirt-3.5 RC1.1.
RHEV-M 3.5.0 has been released, closing this bug.