Bug 1044466 - VDSM ignores libvirt's success/failure when trying to disconnect a disk.
Summary: VDSM ignores libvirt's success/failure when trying to disconnect a disk.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: ---
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.5.6
Target Release: 4.16.28
Assignee: Amit Aviram
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Duplicates: 1294071
Depends On: 807023
Blocks:
 
Reported: 2013-12-18 11:13 UTC by Elad
Modified: 2019-09-12 09:41 UTC
CC List: 16 users

Fixed In Version: v4.17.5
Clone Of:
Environment:
Last Closed: 2015-12-22 13:25:32 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-3.5.z+
ylavi: planning_ack+
amureini: devel_ack+
rule-engine: testing_ack+


Attachments
engine, libvirt, qemu and vdsm logs (1.67 MB, application/x-xz)
2013-12-18 11:13 UTC, Elad
engine, libvirt, qemu and vdsm logs (new) (1.67 MB, application/x-gzip)
2013-12-19 07:30 UTC, Elad


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2107461 0 None None None 2019-09-12 09:39:32 UTC
oVirt gerrit 45138 0 master MERGED vm: Libvirt quering after disk detach operation addition. Never
oVirt gerrit 45646 0 ovirt-3.6 MERGED vm: Libvirt quering after disk detach operation addition. Never
oVirt gerrit 47629 0 ovirt-3.5 MERGED vm: Libvirt quering after disk detach operation addition. Never

Description Elad 2013-12-18 11:13:52 UTC
Created attachment 838263 [details]
engine, libvirt, qemu and vdsm logs

Description of problem:
hotplugDisk fails with: 

Thread-63::DEBUG::2013-12-18 12:02:29,095::libvirtconnection::108::libvirtconnection::(wrapper) Unknown libvirterror: ecode: 1 edom: 10 level: 2 message: internal error unable to execute QEMU command '__com.redhat_drive_add': Duplicate ID 'drive-virtio-disk1' for drive
Thread-63::ERROR::2013-12-18 12:02:29,096::vm::3448::vm.Vm::(hotplugDisk) vmId=`86a4451b-6698-407c-9444-c713c77760cc`::Hotplug failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 3446, in hotplugDisk
    self._dom.attachDevice(driveXml)
  File "/usr/share/vdsm/vm.py", line 842, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 399, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: internal error unable to execute QEMU command '__com.redhat_drive_add': Duplicate ID 'drive-virtio-disk1' for drive

Version-Release number of selected component (if applicable):
vdsm-4.13.2-0.1.rc.el6ev.x86_64
rhevm-3.3.0-0.40.rc.el6ev.noarch
libvirt-0.10.2-29.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
On file storage pool:
1) Detach a disk from the VM and restart vdsm right after
2) Attach the same disk to the VM when vdsm takes SPM again

Actual results:
hotplugDisk fails


Additional info: 
engine, libvirt, qemu and vdsm logs

Comment 1 Elad 2013-12-19 07:30:23 UTC
Created attachment 838791 [details]
engine, libvirt, qemu and vdsm logs (new)

The wrong file was uploaded; uploading the relevant file (new).

Comment 2 Allon Mureinik 2013-12-19 11:14:43 UTC
Does this reproduce if you wait for the disk to be detached and only then restart VDSM?
Assuming the answer is "no", deferring this to 3.3.1. If it is "yes", please comment on it and return the target version to 3.3

Comment 3 Elad 2013-12-19 11:57:37 UTC
(In reply to Allon Mureinik from comment #2)
> Does this reproduce if you wait for the disk to be detached and only then
> restart VDSM?
> Assuming the answer is "no", deferring this to 3.3.1. If it is "yes", please
> comment on it and return the target version to 3.3

no

Comment 5 Yeela Kaplan 2013-12-30 09:05:44 UTC
No...
What happens in the hotunplug code:
1. detach the disk from the python VM object
2. save the VM object state to disk
3. hotunplug the disk through libvirt

The restart probably happened between stages 2 and 3.
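
To make that ordering concrete, here is a minimal sketch (assuming a generic vm object exposing a conf dict and a libvirt domain handle in vm.dom; the JSON persistence and the names hotunplug_disk and state_path are illustrative placeholders, not the literal vdsm code):

import json

def hotunplug_disk(vm, drive, drive_xml, state_path):
    vm.conf['devices'].remove(drive)       # 1. detach the disk from the python VM object
    with open(state_path, 'w') as f:       # 2. persist the VM object state to disk
        json.dump(vm.conf, f)
    vm.dom.detachDevice(drive_xml)         # 3. only now hotunplug through libvirt
    # A vdsm restart between steps 2 and 3 leaves the persisted conf without
    # the drive while the guest still has it attached.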

Comment 6 Ayal Baron 2014-02-16 09:59:56 UTC
Yeela, any update on this?

Comment 8 Allon Mureinik 2014-03-17 10:06:54 UTC
(In reply to Ayal Baron from comment #6)
> Yeela, any update on this?
Ping?

Comment 10 Liron Aravot 2014-08-12 12:17:25 UTC
The bugs in that area are general and not disk specific (for example, they are relevant for NICs as well).

The issue is that we persist the VM configuration on the vdsm side for different purposes, so it might always be inconsistent with the actual state in libvirt, and we are missing handling for that.

The solution here should be more general and not for this specific case. Allon, what's your take on that?

Comment 11 Allon Mureinik 2014-08-12 20:16:24 UTC
(In reply to Liron Aravot from comment #10)
> The bugs in that area are general and not disk specific (for example, they
> are relevant for NICs as well).
> 
> The issue is that we persist the VM configuration on the vdsm side for
> different purposes, so it might always be inconsistent with the actual state
> in libvirt, and we are missing handling for that.
> 
> The solution here should be more general and not for this specific case.
> Allon, what's your take on that?

I agree.
Michal - can one of your guys take a look at this please?

Comment 12 Michal Skrivanek 2014-08-14 14:32:13 UTC
I don't see anything general; I see the hotunplugDisk() code doing exactly what's described in comment #5.
Yes, it's the case for networking too (adding Dan), but it's not the case for hotplug CPU, for example.

Comment 13 Dan Kenigsberg 2014-08-15 15:37:25 UTC
In hotplug CPU we have a very similar bug - we set (and save) conf['smp'] slightly after setVcpusFlags() succeeds, which means that a crash would create a discrepancy between libvirt and vdsm's conf. Luckily for hotplug CPU, nothing really cares about this.

We should use vm.conf to create domxml and then drop it. When it's needed for backward compatibility (list verb, migration), it has to be re-calculated from libvirt's domxml.
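
As a rough illustration of that direction, the relevant bits of conf can be rebuilt from libvirt's live domain XML instead of being trusted from disk. A sketch using the libvirt Python bindings (the element paths and the returned shape are assumptions for illustration, not the actual vdsm recovery code):

import xml.etree.ElementTree as ET

def conf_from_libvirt(dom):
    # Ask libvirt for the authoritative domain XML and derive the values
    # vdsm would otherwise read back from its persisted conf.
    root = ET.fromstring(dom.XMLDesc(0))
    disks = [t.get('dev') for t in root.findall('./devices/disk/target')]
    smp = int(root.findtext('./vcpu'))
    return {'drives': disks, 'smp': smp}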

However, this is a big task that can be solved piece-wise, hence I'm keeping this bug on storage.

Comment 14 Amit Aviram 2015-08-25 14:19:27 UTC
This problem occurs when detaching a device that has no OS installed on it. (Specifically, the problem here is that calling libvirt to detach a device in that state returns success, while the device is actually never detached. This causes the device to be duplicated, and at some point it cannot be attached anymore, with the described message.)

Elad, did you test this issue with a VM that has an OS installed?

Comment 15 Elad 2015-08-26 11:02:31 UTC
I don't remember and we cannot know from the logs.

Comment 16 Amit Aviram 2015-08-27 14:00:45 UTC
It seems like this bug originates in Bug 807023, which by now is fixed.
(Basically, libvirt didn't check whether detaching a disk succeeded and deleted it from its pci-stubs anyway. When trying to attach a device again, the device ID was already taken. The bug fix was to verify that the device was actually detached, and only then remove it from the configuration.)

VDSM doesn't even handle the device's ID; it only requests an XML description of the disk to be attached.
However, VDSM should also verify that the disk is detached, as libvirt returns success when it manages to send the guest a detach REQUEST, which doesn't mean the device is actually detached.

The posted patch adds exactly that functionality. With it in place, detaching and attaching should work correctly even if unplugging did not succeed.
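
Roughly, the added check polls the live domain XML after detachDevice() and times out if the drive never disappears. A hedged sketch of that idea (the helper name, the 30 second timeout and the alias-based lookup are illustrative; see the linked gerrit patches for the merged change):

import time
import xml.etree.ElementTree as ET

def wait_for_detach(dom, alias, timeout=30, interval=1):
    # detachDevice() only confirms that the guest was asked to release the
    # device, so keep re-reading the domain XML until the alias is gone.
    deadline = time.time() + timeout
    while time.time() < deadline:
        root = ET.fromstring(dom.XMLDesc(0))
        aliases = [a.get('name') for a in root.findall('./devices/disk/alias')]
        if alias not in aliases:
            return
        time.sleep(interval)
    raise RuntimeError('Timeout detaching drive with alias %s' % alias)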

Comment 17 Amit Aviram 2015-08-27 14:43:13 UTC
Note that if we cherry-pick this patch to 3.5.z, we must change VDSM's spec in rhel6 to require libvirt >= 0.10.2-38.el6, as this bug depends on changes made in Bug 807023.

Comment 18 Sandro Bonazzola 2015-10-30 09:36:39 UTC
In oVirt, testing is done on a single release by default. Therefore I'm removing
the 3.6 flag. If you think this bug must be tested in 3.6 as well, please
re-add the flag. Please note we might not have testing resources to handle the
3.6 clone.

Comment 19 Amit Aviram 2015-11-10 13:39:34 UTC
Verification steps for the bug:
-

Comment 20 Amit Aviram 2015-11-10 13:43:44 UTC
Verification steps for the bug:
1. unsuccessful unplugging:
- Create a VM
- Add a disk to it, DO NOT install any OS.
- run the VM
- Try to hotunplug the disk.

VDSM should throw a timeout error for not being able to unplug the disk.


2. successful unplugging:
- Run a VM with OS installed
- hotunplug a disk other than the one with OS.

The disk should be successfully unplugged.

Comment 21 Aharon Canan 2015-11-10 14:38:36 UTC
(In reply to Amit Aviram from comment #20)
> Verification steps for the bug:
> 1. unsuccessful unplugging:
> - Create a VM
> - Add a disk to it, DO NOT install any OS.
> - run the VM
> - Try to hotunplug the disk.
> 
> VDSM should throw a timeout error for not being able to unplug the disk.
> 

Thread-3456::ERROR::2015-11-10 16:27:22,006::vm::4091::vm.Vm::(hotunplugDisk) vmId=`426b742f-5ce6-477d-a9fc-b9d9dbd83afe`::Timeout detaching drive vdb


> 
> 2. successful unplugging:
> - Run a VM with OS installed
> - hotunplug a disk other than the one with OS.
> 
> The disk should be successfully unplugged.

Disk successfully unplugged

Verified using vt18.2


Amit - 
Shouldn't we trigger a vdsm restart (as stated in comment #5)?

Comment 22 Amit Aviram 2015-11-10 15:11:07 UTC
No. The issue described in the bug description originates in libvirt's Bug 807023, which is now fixed.
The problem on our side was that we didn't check at all whether libvirt succeeded in unplugging the disk; it has nothing to do with restarting VDSM while plugging/hotplugging (which Bug 980914 describes).

Comment 23 Sandro Bonazzola 2015-12-22 13:25:32 UTC
oVirt 3.5.6 has been released and this BZ has been verified; moving to CLOSED CURRENTRELEASE.

Comment 24 Amit Aviram 2016-01-24 13:12:02 UTC
*** Bug 1294071 has been marked as a duplicate of this bug. ***

