Created attachment 877747 [details]
engine, vdsm, libvirt and qemu logs

Description of problem:
I had a running VM with 1 disk. I deactivated the VM disk (hot unplug) and then added a new disk to the VM and hotplugged it. When I tried to hotplug the first disk back into the VM, the operation failed with a libvirt error in vdsm.

Version-Release number of selected component (if applicable):
RHEV 3.4-AV4
rhevm-3.4.0-0.10.beta2.el6ev.noarch
vdsm-4.14.5-0.1.beta2.el6ev.x86_64
libvirt-0.10.2-29.el6_5.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.6.x86_64

How reproducible:
Always

Steps to Reproduce:
On a shared DC:
1. Create and run a VM with 1 disk attached
2. Deactivate the VM disk (hot unplug)
3. Attach and activate (hotplug) a new disk to the VM
4. Try to activate the first disk

Actual results:
Hotplug fails with the following error in vdsm.log:

Thread-85::ERROR::2014-03-23 11:29:17,900::vm::3573::vm.Vm::(hotplugDisk) vmId=`ce69a69a-5e55-407f-a7db-961894a58577`::Hotplug failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 3571, in hotplugDisk
    self._dom.attachDevice(driveXml)
  File "/usr/share/vdsm/vm.py", line 859, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 399, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: internal error unable to execute QEMU command 'device_add': Device 'virtio-blk-pci' could not be initialized

Expected results:
Hotplug should succeed for all disks, even if other disks were plugged into the VM while these disks were unplugged.

Additional info:
engine, vdsm, libvirt and qemu logs
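For context, the same conflict can be demonstrated directly against libvirt with the python bindings that vdsm uses: attaching a disk whose XML carries a PCI address already occupied by another device fails in the same way. The VM name, image path and PCI slot below are made-up placeholders, so this is only a sketch of the failing attachDevice call, not what vdsm literally sends.

import libvirt

# Minimal sketch; the VM name, image path and PCI slot are assumptions.
DRIVE_XML = """
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/first-disk.img'/>
  <target dev='vdb' bus='virtio'/>
  <!-- the address the engine remembered for the first disk,
       now occupied by the newly hotplugged disk -->
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
"""

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('test-vm')        # hypothetical VM name
try:
    dom.attachDevice(DRIVE_XML)
except libvirt.libvirtError as e:
    # With the slot already taken, this fails much like the error in vdsm.log
    print('hotplug failed:', e)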
Hotplugging requires a guest; I'm not sure it /could/ work after unplugging the only (= OS) disk.
The problem is in ovirt-engine-backend and it's caused because libvirt does not accept the PCI address the engine has stored for the first disk, since that address is now being used by the second disk. When a disk is unplugged, its old address is freed and, in this case, is assigned to the second disk when it is plugged. A possible solution to this problem could be that, when a new disk is hotplugged into the VM, all the PCI addresses assigned to unplugged devices of that VM are discarded. If that's not possible, it should be done at least for the unplugged devices with the conflicting address.
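To make the suggestion concrete, here is a minimal sketch in Python, with invented field names rather than the real ovirt-engine device schema, of discarding the stored addresses of unplugged devices when a new device is hotplugged:

def discard_unplugged_addresses(vm_devices):
    """Drop the stored PCI address of every currently unplugged device,
    so it cannot clash with an address libvirt hands out from now on."""
    for dev in vm_devices:
        if not dev.get('is_plugged'):
            dev['address'] = ''          # libvirt will pick a fresh slot later
    return vm_devices

# Example: the first (unplugged) disk loses the address the second disk now uses.
devices = [
    {'name': 'disk1', 'is_plugged': False, 'address': 'slot=0x06'},
    {'name': 'disk2', 'is_plugged': True,  'address': 'slot=0x06'},
]
discard_unplugged_addresses(devices)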
Tested the scenario I described in comment #0 (VM with RHEL-6.5 installed). Disk hotplug still fails when the disk was deactivated earlier and another disk was hot-plugged in between.

Error in vdsm.log:

Thread-32::ERROR::2014-05-28 15:03:44,146::vm::3357::vm.Vm::(hotplugDisk) vmId=`e172b1c7-5bc5-4513-ab25-3496e4032901`::Hotplug failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 3355, in hotplugDisk
    self._dom.attachDevice(driveXml)
  File "/usr/share/vdsm/virt/vm.py", line 442, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 93, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 399, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: internal error unable to reserve PCI address 0:0:6.0

Engine.log:

2014-05-28 15:03:46,240 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (org.ovirt.thread.pool-6-thread-42) [5ef63342] Command HotPlugDiskVDSCommand(HostName = green-vdsa, HostId = f397fb84-ac4e-4672-b11e-f0e4b24b3d65, vmId=e172b1c7-5bc5-4513-ab25-3496e4032901, diskId = af9be9e4-f4c1-4156-a14c-07581b47794c) execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = internal error unable to reserve PCI address 0:0:6.0, code = 45

Re-opening. Checked with ovirt-engine-3.5.0_alpha1.1:
ovirt-engine-3.5.0-0.0.master.20140519181229.gitc6324d4.el6.noarch
vdsm-4.14.1-340.gitedb02ba.el6.x86_64
libvirt-0.10.2-29.el6_5.8.x86_64
Created attachment 899960 [details]
engine logs (re-open)
We tried to reproduce the issue several times and it's not 100% reproducible. With various setups I was not able to reproduce it, but Elad was able to reproduce it once, so we need to investigate a bit more.
After more investigation, we've found that the issue is caused by a race condition: when you hotplug the first disk after hotplugging the second, the second disk may not yet have the plugged status in the database when the hotplug operation of the first disk looks for plugged devices with the same address. As the hotplug operation is synchronous on the VDSM side and quite fast, the easiest solution would be to take a new exclusive lock on the whole VM when hotplugging a disk, avoiding such race conditions. Omer, what do you think?
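A minimal sketch of the idea, assuming a per-VM lock (the names below are hypothetical, not the engine's actual locking API): hotplug calls for the same VM are serialized, so the second hotplug only looks at the database after the first one has persisted its plugged status and address.

import threading
from collections import defaultdict

_vm_locks = defaultdict(threading.Lock)   # one lock per VM id (hypothetical)

def hotplug_disk(vm_id, disk, do_vdsm_hotplug, persist_device):
    # Exclusive per-VM lock: closes the window in which a concurrent hotplug
    # could read stale plugged/address data from the database.
    with _vm_locks[vm_id]:
        do_vdsm_hotplug(vm_id, disk)      # synchronous and fast on the VDSM side
        persist_device(vm_id, disk)       # status and address hit the DB before unlock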
Wouldn't the approach you suggest in comment 2 (clear the device address on unplug) solve this in a 'cleaner' way, without requiring the lock? On unplug the address is not relevant anymore, and libvirt will assign a new address on plug.
I see an issue with the solution of clearing the address on unplug:
1. Always clearing the address on unplug is problematic - if the VM is down there is no need to clear the address, and doing so means that on the next start the device might be plugged with a different address.
2. Clearing the address only on hot unplug isn't good enough either, because then if we perform a "cold" unplug (address left in place) and then start the VM, we will try to add the device with an address which is already taken.

So basically we have three options here:
1. Synchronize the hot plug executions so that at each execution we know which addresses are taken by plugged devices and which are not.
2. Always clear the address when performing hot plug and let libvirt choose the address (see the sketch after this list).
3. Ignore the race; in the worst case the user can try to hot plug the device again.

Omer, from the user perspective - for a guest with an OS installed and running, will always clearing the address on hot plug affect the user experience (driver installations, how the OS recognizes the device, etc.)?

thanks,
Liron.
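A sketch of option 2, assuming a hypothetical helper for building the drive XML (the element layout follows libvirt's disk XML; the function and its parameters are not vdsm's actual code): on hot plug the stored <address> element is simply omitted, so libvirt assigns a free PCI slot itself, while a cold plug may still reuse the stored address.

import xml.etree.ElementTree as ET

def build_drive_xml(image_path, target_dev, stored_address_xml=None, hotplug=True):
    disk = ET.Element('disk', type='file', device='disk')
    ET.SubElement(disk, 'driver', name='qemu', type='raw')
    ET.SubElement(disk, 'source', file=image_path)
    ET.SubElement(disk, 'target', dev=target_dev, bus='virtio')
    if stored_address_xml and not hotplug:
        # Cold plug: keep the address stable for the guest.
        disk.append(ET.fromstring(stored_address_xml))
    # Hot plug: no <address> element, so libvirt chooses a free slot.
    return ET.tostring(disk)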
Yes, but again, when you plug something that wasn't plugged before, you can't be sure the address will remain. I think the best approach would be to clear the address on plug; I think it will solve the issue and we could revert the previous fix that was done for this.
Clearing the address on unplug may be a bit problematic: if my VM is down and I unplug/plug things and the address is cleared, then on the next VM start the devices might be assigned new addresses, which may be a bit irritating for the user. We can always clear it on hot plug, and on a regular ("cold") plug compare it against the other devices and clear it only if it is already in use (that's what I meant in the suggested solutions in https://bugzilla.redhat.com/show_bug.cgi?id=1079697#c9; a sketch of this conditional clearing follows below). Sean/Allon - which solution do you prefer here?
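A sketch of that conditional variant (again with assumed field names rather than the real engine schema): the stored address is kept only if no currently plugged device already occupies it.

def resolve_address_conflict(device, other_devices):
    """Before a cold plug, drop the stored address only when a plugged
    device already uses it; otherwise keep it for a stable guest view."""
    used = set(d['address'] for d in other_devices
               if d.get('is_plugged') and d.get('address'))
    if device.get('address') in used:
        device['address'] = ''           # conflict: let libvirt choose a new slot
    return device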
I'm fine with clearing the address (under the KISS rule).
Disk hotplug succeeds for the scenario described in comment #0.
Checked with block and file storage, with an OS installed and without.
Verified using ovirt-3.5 RC1.1.
RHEV-M 3.5.0 has been released, closing this bug.